Model Selection Methods
In addition to structural PK/PD model development, Pumas can be used to perform covariate model selection. These methods are commonly used in pharmacometric practice to support the selection of covariates in PK/PD models. Among the several model selection methods in pharmacometrics [2] [3] and others, Pumas currently only implements the Stepwise Covariate Model (SCM). SCM is a model building strategy that is used to identify the best covariate model for a given dataset and makes certain assumptions in the process. Important definitions and considerations for automated covariate modeling are discussed at [4].
Under the SCM, there are two possible ways to screen for covariate effects, given a selection criterion (e.g. AIC or BIC) to evaluate the models:
The forward addition of covariates:
- Start with a model with no covariates;
- Add covariates one at a time, and select the covariate that improves the criterion the most; and
- Repeat until no covariates can be added that improve the criterion.
The backward elimination of covariates:
- Start with a model with all covariates;
- Remove covariates one at a time, and select the covariate that improves the criterion the most; and
- Repeat until no covariates can be removed that improve the criterion.
Pumas supports both forward addition and backward elimination of covariates. SCM is performed using the covariate_select
function. It uses the same syntax as fit
regarding positional arguments, but with different keyword arguments. Positional arguments are:
model
: anyPumasModel
model created with the@model
macro;population
: aPopulation
object created with theread_pumas
function;initial_parameters
: aNamedTuple
of initial parameters; andlikelihood_approximation
: a fitting method to use, e.g.FOCE()
.
The keyword arguments are:
control_param
: aTuple
filled withSymbol
s of covariates parameters to be tested;criterion
: criterion for covariate selection, this needs to be a function that accepts aPumasFittedModel
, e.g.aic
orbic
, and returns a real value;method
: method for covariate selection, eitherCovariateSelection.Forward
orCovariateSelection.Backward
; andfit_options
: aNamedTuple
of options to be passed tofit
during the SCM procedure.
The model
should be written such that it includes not only all relevant covariates, but also its functional forms, i.e. the covariate effect. This is because the SCM procedure will be performed on the effect of the covariate, such that a zero restriction on a parameter listed in control_param
eliminates the effect of the covariate (see example below). Hence, once the parameter listed is restricted to zero, the covariate effect is completely removed from the model.
Currently, the two methods CovariateSelection.Forward
(the default) and CovariateSelection.Backward
are supported.
The method CovariateSelection.Forward
starts from the models where all parameters in control_param
are restricted to zero. In the next step, a model is fit where all parameters in control_param
but one are restricted to zero. If none of these models attain a smaller value of criterion
then the model with all zero restrictions is chosen. Otherwise, the model with the lowest criterion
is chosen, and the procedure continues by fitting all models with two free parameters, where one of the free parameters must be the one corresponding to the best fit in the previous step. New steps are taken as long as the criterion is reduced.
If the CovariateSelection.Backward
method is chosen, then the procedure starts with all parameters in control_param
unrestricted and then one parameter is restricted at a time until no there is no further improvement.
Any callable object that returns a real value when a fitted model is passed can be used as criterion
. It is always assumed that a smaller value is better. The default value is the Akaike Information Criterion aic
. Another commonly used criterion the Bayesian Information Criterion bic
, which puts a higher penalty on the number of parameters compared to aic
.
To avoid that logging messages from the calls to fit
fills up the output window or default log, Suppressing this output might hide important issues with the model fitting, so it is possible to enable complete logging by passing verbose=true
. This might hide important issues with the model fitting so for it is possible to enable complete logging by passing verbose=true
.
Here is the function signature of the covariate_select
function:
Pumas.covariate_select
— Functionfunction covariate_select(
model::AbstractPumasModel,
population::Population,
param::NamedTuple,
approx::LikelihoodApproximation;
control_param::Union{Tuple,Nothing},
criterion = aic,
method::CovariateSelection.T = CovariateSelection.Forward,
fit_options::NamedTuple = (;),
verbose::Bool = false,
) -> CovariateSelectionResults
Use a covariate selection procedure to find the best fit of model
to population
with starting values param
and estimation method/likelihood approximation approx
according to criterion
by restricting parameters to zero for subsets of control_param
traversed according to method
.
The model
should be written such that it includes all relevant covariates, possibly included with multiple different functional forms, such that that restricting on a parameter listed in control_param
to zero eliminates the effect of the covariate (see example below).
Currently, the two methods CovariateSelection.Forward
(the default) and CovariateSelection.Backward
are supported. The method CovariateSelection.Forward
starts from the models where all parameters in control_param
are restricted to zero. In the next step, a model is fit where all parameters in control_param
but one are restricted to zero. If none of these models attain a smaller value of criterion
then the model that restricts all parameters to zero is chosen. Otherwise, the model with the lowest criterion
is chosen and the procedure continues by fitting all models with two estimated parameters where one of the estimated parameters must be the one corresponding to the best fit in the previous step. New steps are taken as long as the criterion is reduced. If the CovariateSelection.Backward
method is chosen then the procedure starts with all parameters in control_param
unrestricted and then one parameter is restricted at a time until no there is no further improvement.
Any callable object that returns a real value when a fitted model is passed can be used as criterion
. It is always assumed that a smaller value is better. The default value is the Akaike Information Criterion aic
but another commonly used criterion the Bayesian Information Criterion bic
which puts a higher penalty on the number of parameters compared to aic
.
To avoid that logging messages from the calls to fit
fills up the output window or default log, all logging messages produced by fit
are by default suppressed. This might hide important issues with the model fitting so for it is possible to enable complete logging by passing verbose = true
.
Example
model = @model begin
@param begin
β0 ∈ RealDomain()
β1 ∈ RealDomain()
β2 ∈ RealDomain()
β3 ∈ RealDomain()
β4 ∈ RealDomain()
end
@covariates x1 x2 x3 x4
@pre μ = β0 + β1 * x1 + β2 * x2 + β3 * x3 + β4 * x4
@derived begin
y ~ Normal.(μ, 0.1)
end
end
selection = covariate_select(
model,
population,
init_params(model),
NaivePooled();
control_param = (:β1, :β2, :β3, :β4),
method = CovariateSelection.Forward,
criterion = bic,
)
Example with one functional form per covariate
For the SCM example we'll use the po_sad_1
dataset from PharmaDatasets
:
using Pumas
using PharmaDatasets
using DataFrames
df = dataset("po_sad_1")
describe(df)
Row | variable | mean | min | median | max | nmissing | eltype |
---|---|---|---|---|---|---|---|
Symbol | Union… | Any | Union… | Any | Int64 | Type | |
1 | id | 9.5 | 1 | 9.5 | 18 | 0 | Int64 |
2 | time | 19.0312 | 0.0 | 10.0 | 72.0 | 0 | Float64 |
3 | dv | 353.73 | 17.8654 | 268.471 | 2232.39 | 18 | Union{Missing, Float64} |
4 | amt | 60.0 | 30.0 | 60.0 | 90.0 | 270 | Union{Missing, Float64} |
5 | evid | 0.0625 | 0 | 0.0 | 1 | 0 | Int64 |
6 | cmt | 1.0 | 1 | 1.0 | 1 | 270 | Union{Missing, Int64} |
7 | rate | 0.0 | 0.0 | 0.0 | 0.0 | 0 | Float64 |
8 | age | 36.6111 | 20 | 37.5 | 55 | 0 | Int64 |
9 | wt | 80.1667 | 30 | 72.0 | 120 | 0 | Int64 |
10 | doselevel | 60.0 | 30 | 60.0 | 90 | 0 | Int64 |
11 | isPM | no | yes | 0 | String3 | ||
12 | isfed | no | yes | 0 | String3 | ||
13 | sex | female | male | 0 | String7 | ||
14 | route | ev | ev | 0 | String3 |
Here we have three covariates wt
, isPM
and isfed
that we want to test. wt
is a continuous covariate and isPM
and isfed
are binary covariates. We can read the data into a Population
object with the read_pumas
function:
pop = read_pumas(df; observations = [:dv], covariates = [:wt, :isPM, :isfed])
Population
Subjects: 18
Covariates: wt, isPM, isfed
Observations: dv
Now we define our model with all the covariates we want to test included in the equations for the parameters for which they will be tested. SCM in Pumas needs a model with all the covariates:
model = @model begin
@metadata begin
desc = "final covariate model"
timeu = u"hr"
end
@param begin
"""
Clearance (L/hr)
"""
tvcl ∈ RealDomain(; lower = 0)
"""
Central Volume (L)
"""
tvvc ∈ RealDomain(; lower = 0)
"""
Peripheral Volume (L)
"""
tvvp ∈ RealDomain(; lower = 0)
"""
Distributional Clearance (L/hr)
"""
tvq ∈ RealDomain(; lower = 0)
"""
Fed - Absorption rate constant (h-1)
"""
tvkafed ∈ RealDomain(; lower = 0)
"""
Fasted - Absorption rate constant (h-1)
"""
tvkafasted ∈ RealDomain(; lower = 0)
"""
Power exponent on weight for Clearance
"""
dwtcl ∈ RealDomain()
"""
Proportional change in CL due to PM
"""
ispmoncl ∈ RealDomain(; lower = -1, upper = 1)
"""
Components: CL, Vc, Ka, Q, Vp
"""
Ω ∈ PDiagDomain(5)
"""
Proportional RUV (SD scale)
"""
σ_p ∈ RealDomain(; lower = 0)
end
@random begin
η ~ MvNormal(Ω)
end
@covariates begin
"""
Poor Metabolizer
"""
isPM
"""
Fed status
"""
isfed
"""
Weight (kg)
"""
wt
end
@pre begin
CL = tvcl * (wt / 70)^dwtcl * (1 + ispmoncl * (isPM == "yes")) * exp(η[1])
Vc = tvvc * (wt / 70) * exp(η[2])
Ka = (tvkafed + (isfed == "no") * (tvkafasted)) * exp(η[3])
Q = tvq * (wt / 70)^0.75 * exp(η[4])
Vp = tvvp * (wt / 70) * exp(η[5])
end
@dynamics Depots1Central1Periph1
@derived begin
cp := @. 1000 * (Central / Vc)
"""
DrugY Concentration (ng/mL)
"""
dv ~ @. Normal(cp, cp * σ_p)
end
end
PumasModel
Parameters: tvcl, tvvc, tvvp, tvq, tvkafed, tvkafasted, dwtcl, ispmoncl, Ω, σ_p
Random effects: η
Covariates: isPM, isfed, wt
Dynamical system variables: Depot, Central, Peripheral
Dynamical system type: Closed form
Derived: dv
Observed: dv
Note that the tvkafasted
, dwtcl
, and ispmoncl
are parameters in the @param
block that define the typical values of the magnitudes of the effects of the covariates isfed
, wt
(on clearance), and isPM
; respectively. These will be the parameters that we will be testing for the covariate selection.
Now we define our initial parameters:
iparams = (;
tvkafed = 0.4,
tvkafasted = 1.5,
tvcl = 4.0,
tvvc = 70.0,
tvq = 4.0,
tvvp = 50.0,
dwtcl = 0.75,
ispmoncl = -0.7,
Ω = Diagonal([0.04, 0.04, 0.04, 0.04, 0.04]),
σ_p = 0.1,
)
(tvkafed = 0.4,
tvkafasted = 1.5,
tvcl = 4.0,
tvvc = 70.0,
tvq = 4.0,
tvvp = 50.0,
dwtcl = 0.75,
ispmoncl = -0.7,
Ω = [0.04 0.0 … 0.0 0.0; 0.0 0.04 … 0.0 0.0; … ; 0.0 0.0 … 0.04 0.0; 0.0 0.0 … 0.0 0.04],
σ_p = 0.1,)
Finally, we can run the covariate selection procedure. We'll use FOCE()
as our estimation method and the tvkafasted
and ispmoncl
as the parameters to test for covariate selection. All others keyword arguments will be left to their default values, which implies aic
as the criterion and the forward inclusion method.
covar_result = covariate_select(
model,
pop,
iparams,
FOCE();
control_param = (:dwtcl, :tvkafasted, :ispmoncl), # this is a Tuple
)
Pumas.CovariateSelectionResults
Criterion: aic
Method: Forward
Best model:
Unrestricted parameters: dwtcl, tvkafasted, ispmoncl
Zero restricted parameters:
Criterion value: 2641.59
Fitted model summary:
7×4 DataFrame
Row │ estimated_parameters restricted_parameters fitted_model criterion
│ Tuple… Tuple… FittedPumas… Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ () (:dwtcl, :tvkafasted, :ispmoncl) FittedPumasModel(...) 2714.96
2 │ (:dwtcl,) (:tvkafasted, :ispmoncl) FittedPumasModel(...) 2713.25
3 │ (:tvkafasted,) (:dwtcl, :ispmoncl) FittedPumasModel(...) 2680.07
4 │ (:ispmoncl,) (:dwtcl, :tvkafasted) FittedPumasModel(...) 2702.59
5 │ (:dwtcl, :tvkafasted) (:ispmoncl,) FittedPumasModel(...) 2678.3
6 │ (:tvkafasted, :ispmoncl) (:dwtcl,) FittedPumasModel(...) 2667.72
7 │ (:dwtcl, :tvkafasted, :ispmoncl) () FittedPumasModel(...) 2641.59
You can access the results of the covariate selection procedure by accessing the fits
field of the covar_result
object, and convert it to a DataFrame
in order to perform any data wrangling or plotting. Here's an example of how to sort the results by the criterion
value:
sort(DataFrame(covar_result.fits), :criterion)
Row | estimated_parameters | restricted_parameters | fitted_model | criterion |
---|---|---|---|---|
Tuple… | Tuple… | FittedPu… | Float64 | |
1 | (:dwtcl, :tvkafasted, :ispmoncl) | () | FittedPumasModel(...) | 2641.59 |
2 | (:tvkafasted, :ispmoncl) | (:dwtcl,) | FittedPumasModel(...) | 2667.72 |
3 | (:dwtcl, :tvkafasted) | (:ispmoncl,) | FittedPumasModel(...) | 2678.3 |
4 | (:tvkafasted,) | (:dwtcl, :ispmoncl) | FittedPumasModel(...) | 2680.07 |
5 | (:ispmoncl,) | (:dwtcl, :tvkafasted) | FittedPumasModel(...) | 2702.59 |
6 | (:dwtcl,) | (:tvkafasted, :ispmoncl) | FittedPumasModel(...) | 2713.25 |
7 | () | (:dwtcl, :tvkafasted, :ispmoncl) | FittedPumasModel(...) | 2714.96 |
As you can see, after running the covariate selection procedure, we get a summary of the best model found and the fitted model for each step of the procedure.
You can recover the best model fit by accessing the best_model
field of the covar_result
object:
covar_result.best_model
FittedPumasModel
Successful minimization: true
Likelihood approximation: FOCE
Likelihood Optimizer: BFGS
Dynamical system type: Closed form
Log-likelihood value: -1306.793
Number of subjects: 18
Number of parameters: Fixed Optimized
0 14
Observation records: Active Missing
dv: 270 0
Total: 270 0
-------------------------
Estimate
-------------------------
tvcl 3.6888
tvvc 70.044
tvvp 48.706
tvq 4.3506
tvkafed 0.41114
tvkafasted 1.0894
dwtcl 0.81856
ispmoncl -0.62382
Ω₁,₁ 0.022271
Ω₂,₂ 0.050192
Ω₃,₃ 0.031194
Ω₄,₄ 0.10595
Ω₅,₅ 0.035072
σ_p 0.098043
-------------------------
Changing the Criterion and the Selection Method
As stated before, the default criterion is aic
and the default method is CovariateSelection.Forward
. You can change these by passing the criterion
and method
keyword arguments to the covariate_select
function. Any criterion function that takes a FittedPumasModel
as input and returns a real number (Float64
) can be used. The same applies to the selection method, which can be any of the following:
CovariateSelection.Forward
(default): forward inclusion method; orCovariateSelection.Backward
: backward elimination method.
Here's the previous example, but now using the bic
criterion and the backward inclusion method:
covar_result = covariate_select(
model,
pop,
iparams,
FOCE();
control_param = (:dwtcl, :tvkafasted, :ispmoncl),
criterion = bic,
method = CovariateSelection.Backward,
)
Pumas.CovariateSelectionResults
Criterion: bic
Method: Backward
Best model:
Unrestricted parameters: dwtcl, tvkafasted, ispmoncl
Zero restricted parameters:
Criterion value: 2691.96
Fitted model summary:
4×4 DataFrame
Row │ estimated_parameters restricted_parameters fitted_model criterion
│ Tuple… Tuple{Vararg{Symbol}} FittedPumas… Float64
─────┼───────────────────────────────────────────────────────────────────────────────────────────
1 │ (:dwtcl, :tvkafasted, :ispmoncl) () FittedPumasModel(...) 2691.96
2 │ (:tvkafasted, :ispmoncl) (:dwtcl,) FittedPumasModel(...) 2714.5
3 │ (:dwtcl, :ispmoncl) (:tvkafasted,) FittedPumasModel(...) 2724.05
4 │ (:dwtcl, :tvkafasted) (:ispmoncl,) FittedPumasModel(...) 2725.08
As you can see the results are the same from the previous example, but with different criterion
values. This is because the bic
criterion value is penalized more for the number of parameters than the aic
criterion value.
Example with a mixed routine
One common mixed routine is to first perform a forward selection, followed by a backwards elimination. In all of these SCM routines, we'll use a likelihood-ratio test to determine if the covariate should be included or excluded.
The criterion for the likelihood-ratio test is the $\chi^2$ distribution with one degree of freedom:
lrt_criterion(f; α = 0.01) = dof(f) * cquantile(Chisq(1), α) - (2 * loglikelihood(f))
lrt_criterion (generic function with 1 method)
Additionally, we'll use different α
values for the forward and backward selection steps:
α = 0.01
for the forward selection step; andα = 0.001
for the backward selection step.
The setup is similar to the previous examples, with the same model and dataset. First, we'll run covariate_select
with lrt_criterion
as a criterion and the forward selection method. Hence, our α = 0.01
(we use the default value defined in lrt_criterion
):
covar_result_fs = covariate_select(
model,
pop,
iparams,
FOCE();
control_param = (:dwtcl, :tvkafasted, :ispmoncl),
criterion = lrt_criterion,
method = CovariateSelection.Forward,
)
Pumas.CovariateSelectionResults
Criterion: lrt_criterion
Method: Forward
Best model:
Unrestricted parameters: dwtcl, tvkafasted, ispmoncl
Zero restricted parameters:
Criterion value: 2706.47
Fitted model summary:
7×4 DataFrame
Row │ estimated_parameters restricted_parameters fitted_model criterion
│ Tuple… Tuple… FittedPumas… Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ () (:dwtcl, :tvkafasted, :ispmoncl) FittedPumasModel(...) 2765.94
2 │ (:dwtcl,) (:tvkafasted, :ispmoncl) FittedPumasModel(...) 2768.87
3 │ (:tvkafasted,) (:dwtcl, :ispmoncl) FittedPumasModel(...) 2735.69
4 │ (:ispmoncl,) (:dwtcl, :tvkafasted) FittedPumasModel(...) 2758.21
5 │ (:dwtcl, :tvkafasted) (:ispmoncl,) FittedPumasModel(...) 2738.55
6 │ (:tvkafasted, :ispmoncl) (:dwtcl,) FittedPumasModel(...) 2727.97
7 │ (:dwtcl, :tvkafasted, :ispmoncl) () FittedPumasModel(...) 2706.47
Now we extract the best model from the covar_result
by accessing the best_model
field. That gives us a model fit, which we can use to extract the best model:
best_model_fs = covar_result_fs.best_model.model
PumasModel
Parameters: tvcl, tvvc, tvvp, tvq, tvkafed, tvkafasted, dwtcl, ispmoncl, Ω, σ_p
Random effects: η
Covariates: isPM, isfed, wt
Dynamical system variables: Depot, Central, Peripheral
Dynamical system type: Closed form
Derived: dv
Observed: dv
Finally, we run covariate_select
with lrt_criterion
as a criterion and the backward elimination method. Hence, our α = 0.001
(we need to override the default value defined in lrt_criterion
):
covar_result_bs = covariate_select(
best_model_fs,
pop,
iparams,
FOCE();
control_param = (:dwtcl, :tvkafasted, :ispmoncl),
criterion = x -> lrt_criterion(x; α = 0.001),
method = CovariateSelection.Backward,
)
Pumas.CovariateSelectionResults
Criterion: #16
Method: Backward
Best model:
Unrestricted parameters: dwtcl, tvkafasted, ispmoncl
Zero restricted parameters:
Criterion value: 2765.17
Fitted model summary:
4×4 DataFrame
Row │ estimated_parameters restricted_parameters fitted_model criterion
│ Tuple… Tuple{Vararg{Symbol}} FittedPumas… Float64
─────┼───────────────────────────────────────────────────────────────────────────────────────────
1 │ (:dwtcl, :tvkafasted, :ispmoncl) () FittedPumasModel(...) 2765.17
2 │ (:tvkafasted, :ispmoncl) (:dwtcl,) FittedPumasModel(...) 2782.48
3 │ (:dwtcl, :ispmoncl) (:tvkafasted,) FittedPumasModel(...) 2792.03
4 │ (:dwtcl, :tvkafasted) (:ispmoncl,) FittedPumasModel(...) 2793.06