Bioequivalence Analysis (BE)

Bioequivalence.jl is a package for performing bioequivalence analysis.

The full API is available in the next section and provides the signatures and examples for using all the available functionality.

Quickstart

In order to use Bioequivalence.jl, add the package through

using Pkg
Pkg.add("Bioequivalence")

or from the Pkg REPL mode

]add Bioequivalence

You can then load the library through

using Bioequivalence

A bioequivalence study is an instance of the type BioequivalenceStudy and can be constructed through the pumas_be function.

Bioequivalence is a concept in pharmacokinetics that captures the idea that various pharmaceutical products administered in a similar manner (e.g., same molar dose of the same active ingredient, same route of administration) can be expected to have, for all intents and purposes, the same effect on individuals in a defined population.

Clinical studies collect data that can be analyzed, for example through noncompartmental analysis, to obtain descriptive summaries of the concentration curve, known as pharmacokinetic endpoints. These endpoints relate to the rate (e.g., maximum concentration, time of peak concentration) and extent of absorption (e.g., area under the curve). Bioequivalence relies on the study design and pharmacokinetic endpoints from clinical trials or simulation models to make a determination about the expected effects of formulations.

Three major types of bioequivalence are regularly used:

Average (ABE): are the mean values of the distributions of the pharmacokinetic endpoints for the reference and the test formulations similar enough? This concept is the most popular; its adoption rose in the early 1990s in the United States and the European Union. It is considered the easiest criterion for a new formulation to meet, and most regulatory agencies require it for a product to be approved under a bioequivalence process.
Population (PBE): are the distributions of the pharmacokinetic endpoints for the reference and the test formulations similar enough? In this case, the comparison is no longer just between the expected values of the distributions but between the full distributions. PBE is especially important for determining prescribability, the decision to assign a patient one of the formulations as part of a treatment for the first time.
Individual (IBE): are the distributions of the pharmacokinetic endpoints for the reference and the test formulations similar enough across a large proportion of the intended population? IBE is particularly relevant for switchability, the decision to substitute an ongoing regimen (change formulation) without detrimental effects to the patient.

PBE and IBE can be assessed through three different methods:

constant scaling: the regulatory agency provides a value to be used in determining PBE or IBE.
reference scaling: the estimated total variance of the reference formulation is used in determining PBE or IBE.
mixed scaling: use reference scaling when the estimated total variance of the reference formulation is greater than that of the test formulation and the constant scaling otherwise.
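
The scaling rules listed above can be sketched as a small selection function. This is only an illustration of the rule as worded here: the function name, its arguments, and the symbols `:constant`, `:reference`, and `:mixed` are hypothetical and are not part of the Bioequivalence.jl API.

```julia
# Hypothetical helper (not part of Bioequivalence.jl): choose the value used
# to scale the PBE/IBE criterion under each of the scaling methods.
#   var_ref   estimated total variance of the reference formulation
#   var_test  estimated total variance of the test formulation
#   constant  value provided by the regulatory agency
function scaling_value(method::Symbol, var_ref::Real, var_test::Real, constant::Real)
    if method === :constant
        return constant
    elseif method === :reference
        return var_ref
    elseif method === :mixed
        # Reference scaling only when the reference is more variable than the test
        return var_ref > var_test ? var_ref : constant
    else
        throw(ArgumentError("unknown scaling method: $method"))
    end
end
```

For instance, `scaling_value(:mixed, 0.09, 0.04, 0.04)` selects the reference variance, while `scaling_value(:mixed, 0.03, 0.05, 0.04)` falls back to the constant.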

Note

One argument for using mixed scaling is that, if the estimated total variance of the test formulation is greater than that of the reference, the bioequivalence assessment would be very conservative.

Info

The reference scaling system is most used when working with highly variable drugs (HVD), those with intrasubject variability > 30%, and narrow therapeutic index drugs (NTI), those drugs where small differences in dose or blood concentration may lead to serious therapeutic failures and/or adverse drug reactions that are life-threatening or result in persistent or significant disability or incapacity.
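
As a rough illustration of the 30% threshold, the sketch below converts a within-subject variance on the log scale into a coefficient of variation. It assumes a log-normally distributed endpoint and reads "intrasubject variability" as that coefficient of variation. The function names are hypothetical and are not part of the package.

```julia
# Coefficient of variation of a log-normal variable whose logarithm has
# variance `var_log` (standard log-normal identity).
cv_from_log_variance(var_log::Real) = sqrt(exp(var_log) - 1)

# Hypothetical check against the 30% threshold mentioned above.
is_highly_variable(var_log::Real; threshold::Real = 0.30) =
    cv_from_log_variance(var_log) > threshold
```

Under these assumptions, a log-scale within-subject variance of `log(1.09)` (about 0.086) sits right at a 30% coefficient of variation.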

Designs

There are three major categories of bioequivalence study designs.

Nonparametric for endpoints such as time of maximum concentration, which typically do not have (and cannot easily be transformed to have) a normal-like distribution.

Parallel designs which are typically used when crossover designs are not feasible.

Crossover (replicated and nonreplicated) designs which are the most commonly used by the industry.

Designs are fully characterized by:

Subjects: participants in the study (each is assigned to a sequence)
Formulations: the different formulations being compared (i.e., reference and additional test formulations)
Periods: each dosing period at which each subject is administered a formulation based on the sequence it has been assigned to
Sequences: a dosing regimen which establishes what formulation is given at each period

Warning

The periods should be spaced enough such that there are no carryover effects from dosings in the previous periods.

Nonparametric analysis (i.e., x formulations, y sequences, z periods)

The nonparametric design performs a Wilcoxon signed rank test of the null hypothesis that the distribution of the reference formulation and the distribution of an alternative formulation have the same median.

When there are no tied ranks and ≤ 50 samples, or tied ranks and ≤ 15 samples, an exact signed rank test is performed; otherwise, an approximate test is used.
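
The sample-size rule above can be written as a one-line predicate. This is a sketch of the rule as stated, not the package's internal code: both function names are hypothetical, and dropping zero differences before looking for ties is an assumption.

```julia
# Hypothetical helper: true when the paired differences produce tied ranks,
# i.e. when two nonzero differences share the same absolute value.
function has_tied_ranks(differences::AbstractVector{<:Real})
    magnitudes = abs.(filter(!iszero, differences))
    return length(unique(magnitudes)) < length(magnitudes)
end

# Exact test for n ≤ 50 without ties or n ≤ 15 with ties; approximate otherwise.
use_exact_signed_rank(n::Integer, ties::Bool) = ties ? n <= 15 : n <= 50
```

With 20 paired differences, for example, the exact test applies only if no ranks are tied.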

Since the study design can be inferred from the data argument (i.e., based on sequences, formulations, and periods), the analysis matching the inferred design is selected automatically. One can manually override this and request the nonparametric analysis by passing method = :nonpar to pumas_be.

Parallel design (i.e., x formulations, y periods, z sequences, e.g. `R|S|T`)

Performs a Welch's t-test (i.e., unequal variance two-sample t-test) of the null hypothesis that the distribution of the reference formulation and the distribution of an alternative formulation have equal means. The number of degrees of freedom of the test is given by the Welch-Satterthwaite equation:

\[ ν_{χ'} ≈ \frac{\left(\sum_{i=1}^n k_i s_i^2\right)^2}{\sum_{i=1}^n \frac{(k_i s_i^2)^2}{ν_i}}\]
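
For the two-sample case, the equation above reduces to a few lines. The sketch below assumes k_i = 1/n_i and ν_i = n_i - 1, the usual choices for a Welch's t-test. The function name is illustrative and is not part of the package.

```julia
using Statistics  # for `var` (sample variance)

# Welch-Satterthwaite degrees of freedom for two independent samples,
# assuming k_i = 1 / n_i and ν_i = n_i - 1.
function welch_satterthwaite(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    nx, ny = length(x), length(y)
    vx = var(x) / nx   # k₁ s₁²
    vy = var(y) / ny   # k₂ s₂²
    return (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))
end
```

When both groups have the same size n and the same sample variance, the result is 2(n - 1), matching the pooled-variance t-test. In general it lies between min(n₁, n₂) - 1 and n₁ + n₂ - 2.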

Crossover designs

Crossover designs are divided into two categories:

Replicated
Nonreplicated

Nonreplicated designs have subjects assigned to distinct formulations in each period.

Replicated crossover designs are those in which subjects receive the same formulation more than once. A key feature of replicated designs is that they allow estimating within-subject variances per formulation, which are a component for assessing PBE and IBE.

Common crossover designs:

Name	Number of Formulations	Number of Periods	Number of Sequences	Example	Replicated
2x2	2	2	2	RT|TR	false
Balaam	2	2	4	RR|RT|TR|TT	true
Dual	2	3	2	RTT|TRR	true
Inner	2	4	2	RRTT|TTRR	true
Outer	2	4	2	RTRT|TRTR	true
Williams 3	3	3	6	RST|RTS|SRT|STR|TRS|TSR	false
Williams 4	4	4	4	ADBC|BACD|CBDA|DCAB	false
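
The columns of this table can be derived from the sequence string in the Example column. The sketch below is a hypothetical helper, not a function of the package. It treats a design as replicated when at least one sequence repeats a formulation.

```julia
# Hypothetical helper: summarize a crossover design written as "SEQ|SEQ|...".
function describe_design(design::AbstractString)
    sequences = split(design, '|')
    periods = length(first(sequences))
    all(s -> length(s) == periods, sequences) ||
        throw(ArgumentError("all sequences must have the same number of periods"))
    formulations = sort(unique(join(sequences)))   # distinct formulation labels
    replicated = any(s -> length(unique(s)) < length(s), sequences)
    return (formulations = length(formulations),
            periods = periods,
            sequences = length(sequences),
            replicated = replicated)
end
```

For example, `describe_design("RTRT|TRTR")` reports 2 formulations, 4 periods, 2 sequences, and a replicated design.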

Replicated designs are preferred, particularly the inner and outer designs.

If employing the dual design, it is recommended to have a larger sample size in order to achieve the same level of statistical power.

Note

In the United States, there is a minimum requirement of at least 12 evaluable subjects for any bioequivalence study.

For analyzing more than two formulations at a time, the Williams design, a generalized Latin square, is preferred given its statistical power.

There are two ways to analyze crossover designs:

Linear model

Performs a linear regression with the following model

\[ \ln\left(endpoint\right) = β₀ + β₁ formulation + β₂ sequence + β₃ period + β₄ id + ε\]

where βⱼ, j ∈ {1, 2, 3, 4}, are the coefficient vectors of the corresponding features; formulation uses dummy variable coding, while sequence and period use contrast coding.

Linear mixed model

log(endpoint) ~ formulation + sequence + period + (1 | id)
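
Written as an equation in the notation of the linear model above, this formula can be read as follows. This is a sketch: the symbols u, σ_b, and σ_w are introduced here for illustration only.

\[ \ln\left(endpoint\right) = β₀ + β₁ formulation + β₂ sequence + β₃ period + u_{id} + ε, \quad u_{id} \sim N\left(0, σ_b^2\right), \quad ε \sim N\left(0, σ_w^2\right)\]

The fixed subject term β₄ id of the linear model is replaced by a random intercept u per subject, so σ_b² and σ_w² correspond to the between-subject (id) and residual variance components reported when the model is fitted.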

Info

The linear mixed model corresponds to

proc mixed data = data method = ml;
class sequence subject period formulation;
model ln_endpoint = sequence period formulation;
random subject(sequence);

in SAS.

Tip

To use the restricted maximum likelihood (REML) objective, which is the SAS default, pass reml = true to pumas_be.

Tip

Per the Food and Drug Administration (US regulatory agency) guidance, replicated crossover designs should employ the linear mixed model approach, while nonreplicated crossover designs should employ a linear model (linear mixed models for nonreplicated crossover designs are also acceptable).

Validation

Each design has been tested using various sources including:

Chow, Shein-Chung, and Jen-pei Liu. 2009. Design and Analysis of Bioavailability and Bioequivalence Studies. 3rd ed. Chapman & Hall/CRC Biostatistics Series 27. Boca Raton: CRC Press. DOI: 10.1201/9781420011678.
Fuglsang, Anders, Helmut Schütz, and Detlew Labes. 2015. "Reference Datasets for Bioequivalence Trials in a Two-Group Parallel Design." The AAPS Journal 17 (2): 400–404. DOI: 10.1208/s12248-014-9704-6.
Patterson, Scott D, and Byron Jones. 2017. Bioequivalence and Statistics in Clinical Pharmacology. 2nd ed. Chapman & Hall/CRC Biostatistics Series. DOI: 10.1201/9781315374161.
Schütz, Helmut, Detlew Labes, and Anders Fuglsang. 2014. "Reference Datasets for 2-Treatment, 2-Sequence, 2-Period Bioequivalence Studies." The AAPS Journal 16 (6): 1292–97. DOI: 10.1208/s12248-014-9661-0.

API

Public

Bioequivalence.Bioequivalence — Module

Bioequivalence.jl

This module offers a suite of routines for bioequivalence (BE) analysis.

Bioequivalence.BioequivalenceStudy — Type

BioequivalenceStudy

Return a bioequivalence study.

See also: pumas_be.

Fields

data::DataFrame data used for the study
data_stats::NamedTuple
- total::Int refers to the number of observations the data passed to the function had.
- used_for_analysis::Int refers to the number of observations used for fitting the model (e.g., drop missing values)
- formulation::DataFrame gives a DataFrame with the summary statistics of the statistical model's response by formulation
- sequence::DataFrame gives a DataFrame with the summary statistics of the statistical model's response by sequence
- period::DataFrame gives a DataFrame with the summary statistics of the statistical model's response by period
design::NamedTuple number of subjects in each sequence
model statistical models used for the analysis
result::DataFrame results for inference

Examples

julia> data = Bioequivalence.testdata("PJ2017_4_5")
186×5 DataFrame
 Row │ id     sequence  period  AUC      Cmax
     │ Int64  Cat…      Int64   Int64?   Int64?
─────┼───────────────────────────────────────────
   1 │     1  SRT            1     7260     1633
   2 │     1  SRT            2     6463     1366
   3 │     1  SRT            3     8759     2141
   4 │     2  RTS            1     3457      776
   5 │     2  RTS            2     6556     2387
   6 │     2  RTS            3     4081     1355
   7 │     4  TSR            1     4006     1326
   8 │     4  TSR            2     4879     1028
   9 │     4  TSR            3     3817     1052
  10 │     5  STR            1     4250      945
  11 │     5  STR            2     3487     1041
  ⋮  │   ⋮       ⋮        ⋮        ⋮        ⋮
 177 │    61  RTS            3     3779     1144
 178 │    62  SRT            1     5787     1461
 179 │    62  SRT            2     7069     1995
 180 │    62  SRT            3     6530     1236
 181 │    63  TRS            1     2204      495
 182 │    63  TRS            2     2927      770
 183 │    63  TRS            3  missing  missing
 184 │    67  RST            1     4045     1025
 185 │    67  RST            2     7865     2668
 186 │    67  RST            3  missing  missing
                                 165 rows omitted

julia> output = pumas_be(data, endpoint = :Cmax, method = :lmm, reml = true)
Design: RST|RTS|SRT|STR|TRS|TSR

Sequences: RST|RTS|SRT|STR|TRS|TSR (6)
Periods: 1:3 (3)
Subjects per Sequence: (RST = 9, RTS = 11, SRT = 11, STR = 10, TRS = 11, TSR = 10)

Average Bioequivalence
───────────────────────────────────────────────────────────────────────
             PE         SE      lnLB      lnUB      GMR      LB      UB
───────────────────────────────────────────────────────────────────────
S - R  0.468471  0.0525592  0.381334  0.555607  1.59755  1.4642  1.743
T - R  0.259687  0.0525372  0.172587  0.346787  1.29652  1.1883  1.4146
───────────────────────────────────────────────────────────────────────

julia> output.data_stats.formulation
3×10 DataFrame
 Row │ formulation  exp_mean  mean     std       min      q25      median   q75      max      n
     │ Cat…         Float64   Float64  Float64   Float64  Float64  Float64  Float64  Float64  Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────
   1 │ R             837.478  6.7304   0.466938  5.89715  6.3257   6.66568  7.09589  7.71334     62
   2 │ S            1339.96   7.20039  0.419893  6.09131  6.92952  7.20117  7.5251   8.02027     61
   3 │ T            1078.08   6.98294  0.473224  5.75574  6.62539  7.00851  7.29641  7.90286     61

julia> output.data_stats.sequence
6×10 DataFrame
 Row │ sequence  exp_mean  mean     std       min      q25      median   q75      max      n
     │ Cat…      Float64   Float64  Float64   Float64  Float64  Float64  Float64  Float64  Int64
─────┼───────────────────────────────────────────────────────────────────────────────────────────
   1 │ RST        997.282  6.90503  0.493722  5.90263  6.6385   6.92755  7.17642  7.88908     26
   2 │ RTS       1084.03   6.98844  0.499403  6.09131  6.4677   7.05618  7.32449  7.77779     33
   3 │ SRT       1187.43   7.07954  0.482725  5.89715  6.63068  7.11964  7.45124  7.90286     33
   4 │ STR        869.299  6.76769  0.404857  6.04501  6.4758   6.8663   6.95607  7.71913     30
   5 │ TRS       1180.5    7.07369  0.466733  6.20456  6.71254  7.11698  7.39368  7.96797     32
   6 │ TSR       1071.51   6.97682  0.556906  5.75574  6.69448  7.02452  7.28049  8.02027     30

julia> output.data_stats.period
3×10 DataFrame
 Row │ period  exp_mean  mean     std       min      q25      median   q75      max      n
     │ Cat…    Float64   Float64  Float64   Float64  Float64  Float64  Float64  Float64  Int64
─────┼─────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1        1009.89  6.9176   0.498246  5.75574  6.46653  6.97018  7.28186  7.72356     62
   2 │ 2        1108.56  7.01082  0.460876  5.89715  6.63035  7.06641  7.31235  8.02027     62
   3 │ 3        1076.82  6.98176  0.516518  6.03787  6.63167  6.99805  7.30986  7.96797     60

julia> output.model
Linear mixed model fit by REML
 Cmax ~ 1 + formulation + sequence + period + (1 | id)
 REML criterion at convergence: 197.54813471940628

Variance components:
            Column   Variance Std.Dev.
id       (Intercept)  0.121775 0.348963
Residual              0.084569 0.290808
 Number of obs: 184; levels of grouping factors: 62

  Fixed-effects parameters:
─────────────────────────────────────────────────────────
                      Coef.  Std. Error       z  Pr(>|z|)
─────────────────────────────────────────────────────────
(Intercept)      6.72421      0.0578144  116.31    <1e-99
formulation: S   0.468471     0.0525592    8.91    <1e-18
formulation: T   0.259687     0.0525372    4.94    <1e-06
sequence: RTS    0.0215114    0.107371     0.20    0.8412
sequence: SRT    0.112618     0.107371     1.05    0.2942
sequence: STR   -0.19924      0.111523    -1.79    0.0740
sequence: TRS    0.101829     0.107712     0.95    0.3445
sequence: TSR    0.0098946    0.111523     0.09    0.9293
period: 2        0.0505335    0.0302924    1.67    0.0953
period: 3        0.00726902   0.0306267    0.24    0.8124
─────────────────────────────────────────────────────────

julia> output.model_stats.Wald
────────────────────────────────────────────────────────
                  Wald             Distribution  p-value
────────────────────────────────────────────────────────
Formulation  39.914     FDist(ν1=2.0, ν2=116.0)   <1e-13
Sequence      0.898625  FDist(ν1=5.0, ν2=56.0)    0.4885
Period        2.17727   FDist(ν1=2.0, ν2=116.0)   0.1180
────────────────────────────────────────────────────────

julia> output.model_stats.lsmeans
──────────────────────────────────────────────────────────────────────────────
   exp_Mean     Mean  Standard Deviation  t-statistic    Distribution  p-value
──────────────────────────────────────────────────────────────────────────────
R   832.312  6.72421           0.0578144      116.307  TDist(ν=118.0)   <1e-99
S  1329.66   7.19268           0.0580685      123.865  TDist(ν=118.0)   <1e-99
T  1079.11   6.98389           0.0581118      120.18   TDist(ν=118.0)   <1e-99
──────────────────────────────────────────────────────────────────────────────
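
The GMR, LB, and UB columns of the Average Bioequivalence table above appear to be the exponentials of PE, lnLB, and lnUB. The sketch below reproduces that back-transformation and compares the interval with the conventional 0.80 to 1.25 acceptance limits. The limits and the function name are assumptions added for illustration; they are not read from the package.

```julia
# Back-transform log-scale estimates and check the interval against
# acceptance limits (0.80-1.25 is an assumed, conventional default).
function abe_summary(pe::Real, lnlb::Real, lnub::Real; limits = (0.80, 1.25))
    gmr, lb, ub = exp(pe), exp(lnlb), exp(lnub)
    within_limits = limits[1] <= lb && ub <= limits[2]
    return (GMR = gmr, LB = lb, UB = ub, within_limits = within_limits)
end

# Values copied from the "S - R" and "T - R" rows of the table above
s_vs_r = abe_summary(0.468471, 0.381334, 0.555607)
t_vs_r = abe_summary(0.259687, 0.172587, 0.346787)
```

Under these assumed limits, neither comparison in the example falls inside the acceptance interval, since both lower bounds already exceed 1.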

Bioequivalence.generate_design — Method

generate_design(design::AbstractString,
                amt::Union{Number,AbstractVector{<:Number}},
                formulation::AbstractVector,
                subjects_per_sequence::Union{<:Integer,AbstractVector{<:Integer}},
                )::DataFrame

Returns a DataFrame with id, sequence, period, formulation, amt, evid, cmt, and time. It can be used to quickly set up data for Pumas, NCA, and Bioequivalence. In order to add covariates, use innerjoin to join the result of this function with another DataFrame with covariates.

The following designs are available:

"Parallel" => 'A':'A' + num_formulations - 1
"2x2" => ["RT", "TR"]
"Balaam" => ["RR", "RT", "TR", "TT"]
"Dual" => ["RTT", "TRR"]
"2S4P1" => ["RTTR", "TRRT"]
"2S4P2" => ["RTRT", "TRTR"]
"WD3F" => ["ABC", "ACB", "BAC", "BCA", "CAB", "CBA"]
"WD4F" => ["ABCD", "CADB", "DCBA", "BDAC"]

Examples

julia> using DataFrames, Random, StableRNGs

julia> skeleton = generate_design("Parallel", 100, ["tablet", "soft", "hard"], 10)
30×8 DataFrame
 Row │ id     sequence  period  formulation  amt    time   evid   cmt
     │ Int64  Cat…      Int64   Cat…         Int64  Int64  Int64  Int64
─────┼──────────────────────────────────────────────────────────────────
   1 │     1  A              1  tablet         100      0      4      1
   2 │     2  A              1  tablet         100      0      4      1
   3 │     3  A              1  tablet         100      0      4      1
   4 │     4  A              1  tablet         100      0      4      1
   5 │     5  A              1  tablet         100      0      4      1
   6 │     6  A              1  tablet         100      0      4      1
   7 │     7  A              1  tablet         100      0      4      1
   8 │     8  A              1  tablet         100      0      4      1
   9 │     9  A              1  tablet         100      0      4      1
  10 │    10  A              1  tablet         100      0      4      1
  11 │    11  B              1  soft           100      0      4      1
  ⋮  │   ⋮       ⋮        ⋮          ⋮         ⋮      ⋮      ⋮      ⋮
  21 │    21  C              1  hard           100      0      4      1
  22 │    22  C              1  hard           100      0      4      1
  23 │    23  C              1  hard           100      0      4      1
  24 │    24  C              1  hard           100      0      4      1
  25 │    25  C              1  hard           100      0      4      1
  26 │    26  C              1  hard           100      0      4      1
  27 │    27  C              1  hard           100      0      4      1
  28 │    28  C              1  hard           100      0      4      1
  29 │    29  C              1  hard           100      0      4      1
  30 │    30  C              1  hard           100      0      4      1
                                                          9 rows omitted

julia> skeleton = generate_design("2S4P2", [50, 25], ["tablet", "capsule"], [10, 9])
76×8 DataFrame
 Row │ id     sequence  period  formulation  amt    time   evid   cmt
     │ Int64  Cat…      Int64   Cat…         Int64  Int64  Int64  Int64
─────┼──────────────────────────────────────────────────────────────────
   1 │     1  RTRT           1  tablet          50      0      4      1
   2 │     1  RTRT           2  capsule         25      0      4      1
   3 │     1  RTRT           3  tablet          50      0      4      1
   4 │     1  RTRT           4  capsule         25      0      4      1
   5 │     2  RTRT           1  tablet          50      0      4      1
   6 │     2  RTRT           2  capsule         25      0      4      1
   7 │     2  RTRT           3  tablet          50      0      4      1
   8 │     2  RTRT           4  capsule         25      0      4      1
   9 │     3  RTRT           1  tablet          50      0      4      1
  10 │     3  RTRT           2  capsule         25      0      4      1
  11 │     3  RTRT           3  tablet          50      0      4      1
  ⋮  │   ⋮       ⋮        ⋮          ⋮         ⋮      ⋮      ⋮      ⋮
  67 │    17  TRTR           3  capsule         25      0      4      1
  68 │    17  TRTR           4  tablet          50      0      4      1
  69 │    18  TRTR           1  capsule         25      0      4      1
  70 │    18  TRTR           2  tablet          50      0      4      1
  71 │    18  TRTR           3  capsule         25      0      4      1
  72 │    18  TRTR           4  tablet          50      0      4      1
  73 │    19  TRTR           1  capsule         25      0      4      1
  74 │    19  TRTR           2  tablet          50      0      4      1
  75 │    19  TRTR           3  capsule         25      0      4      1
  76 │    19  TRTR           4  tablet          50      0      4      1
                                                         55 rows omitted

julia> rng = StableRNG(123);

julia> data = innerjoin(skeleton,
                        DataFrame(id = 1:size(skeleton, 1),
                                  wt = rand(rng, 100:200, size(skeleton, 1)),
                                  age = rand(rng, 25:85, size(skeleton, 1))),
                        on = :id)
76×10 DataFrame
 Row │ id     sequence  period  formulation  amt    time   evid   cmt    wt     age
     │ Int64  Cat…      Int64   Cat…         Int64  Int64  Int64  Int64  Int64  Int64
─────┼────────────────────────────────────────────────────────────────────────────────
   1 │     1  RTRT           1  tablet          50      0      4      1    143     33
   2 │     1  RTRT           2  capsule         25      0      4      1    143     33
   3 │     1  RTRT           3  tablet          50      0      4      1    143     33
   4 │     1  RTRT           4  capsule         25      0      4      1    143     33
   5 │     2  RTRT           1  tablet          50      0      4      1    178     84
   6 │     2  RTRT           2  capsule         25      0      4      1    178     84
   7 │     2  RTRT           3  tablet          50      0      4      1    178     84
   8 │     2  RTRT           4  capsule         25      0      4      1    178     84
   9 │     3  RTRT           1  tablet          50      0      4      1    195     80
  10 │     3  RTRT           2  capsule         25      0      4      1    195     80
  11 │     3  RTRT           3  tablet          50      0      4      1    195     80
  ⋮  │   ⋮       ⋮        ⋮          ⋮         ⋮      ⋮      ⋮      ⋮      ⋮      ⋮
  67 │    17  TRTR           3  capsule         25      0      4      1    137     28
  68 │    17  TRTR           4  tablet          50      0      4      1    137     28
  69 │    18  TRTR           1  capsule         25      0      4      1    100     46
  70 │    18  TRTR           2  tablet          50      0      4      1    100     46
  71 │    18  TRTR           3  capsule         25      0      4      1    100     46
  72 │    18  TRTR           4  tablet          50      0      4      1    100     46
  73 │    19  TRTR           1  capsule         25      0      4      1    147     70
  74 │    19  TRTR           2  tablet          50      0      4      1    147     70
  75 │    19  TRTR           3  capsule         25      0      4      1    147     70
  76 │    19  TRTR           4  tablet          50      0      4      1    147     70
                                                                       55 rows omitted
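
The row counts in the examples above (30 and 76) follow from the design: each subject contributes one row per period of its sequence. The sketch below recomputes them from the sequence lists given earlier. The helper names are hypothetical, and the logic is inferred from the examples rather than taken from the package source.

```julia
# Sequence lists copied from the designs listed for generate_design
designs = Dict(
    "2x2"    => ["RT", "TR"],
    "Balaam" => ["RR", "RT", "TR", "TT"],
    "Dual"   => ["RTT", "TRR"],
    "2S4P1"  => ["RTTR", "TRRT"],
    "2S4P2"  => ["RTRT", "TRTR"],
)

# "Parallel" uses one single-period sequence per formulation: "A", "B", ...
parallel_sequences(num_formulations::Integer) = string.('A':'A' + num_formulations - 1)

# Expected number of rows: subjects in each sequence times periods in that sequence.
function expected_rows(sequences, subjects_per_sequence)
    n = subjects_per_sequence isa Integer ?
        fill(subjects_per_sequence, length(sequences)) : subjects_per_sequence
    length(n) == length(sequences) ||
        throw(ArgumentError("need one subject count per sequence"))
    return sum(ni * length(s) for (ni, s) in zip(n, sequences))
end
```

Here `expected_rows(designs["2S4P2"], [10, 9])` gives 76 and `expected_rows(parallel_sequences(3), 10)` gives 30, matching the DataFrame sizes shown above.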

Bioequivalence.pumas_be — Method

pumas_be(data::AbstractDataFrame;
         endpoint::Union{Integer, Symbol} = :AUC,
         logtransformed::Bool = false,
         σw₀::Real = 0.1,
         𝛥::Real = 1.11,
         id::Union{Integer, Symbol} = :id,
         sequence::Union{Integer, Symbol} = :sequence,
         period::Union{Integer, Symbol} = :period,
         reference::Union{Nothing, Char} = nothing,
         method::Symbol = occursin(r"(?i)tmax", string(endpoint)) ? :nonpar : :fda,
         reml::Bool = false,
         )::BioequivalenceStudy

BioequivalenceStudy constructor.

Private

Base.summary — Method

reference_scaled!(obj::BioequivalenceStudy)

Adds the reference-scaled parameters when applicable.

Bioequivalence._compute_σ²wv — Method

_compute_σ²wv(data::AbstractDataFrame, sequence::Union{AbstractString,Symbol})::DataFrame

Used by compute_σ²wv internally to compute the value per formulation. The data argument assumes a single formulation.

Bioequivalence.abe — Function

abe(::Type{T},
    data::AbstractDataFrame,
    test::AbstractChar,
    reference::AbstractChar) where T <: Union{ApproximateSignedRankTest,
                                              UnequalVarianceTTest}::DataFrame

Average Bioequivalence Modeling

Bioequivalence.compute_σ²wv — Method

compute_σ²wv(data::AbstractDataFrame, id::Symbol, sequence::Symbol, endpoint::Symbol)::DataFrame

Returns the values per formulation for assessing population bioequivalence.