Model Selection Methods

In addition to structural PK/PD model development, Pumas can be used to perform covariate model selection. These methods are commonly used in pharmacometric practice to support the selection of covariates in PK/PD models. Among the several model selection methods in pharmacometrics [4] [5] and others, Pumas currently only implements the Stepwise Covariate Model (SCM). SCM is a model building strategy that is used to identify the best covariate model for a given dataset and makes certain assumptions in the process. Important definitions and considerations for automated covariate modeling are discussed at [6].

Under the SCM, there are two possible ways to screen for covariate effects, given a selection criterion (e.g. AIC or BIC) to evaluate the models:

The forward addition of covariates:
- Start with a model with no covariates;
- Add covariates one at a time, and select the covariate that improves the criterion the most; and
- Repeat until no covariates can be added that improve the criterion.
The backward elimination of covariates:
- Start with a model with all covariates;
- Remove covariates one at a time, and select the covariate that improves the criterion the most; and
- Repeat until no covariates can be removed that improve the criterion.

Pumas supports both forward addition and backward elimination of covariates. SCM is performed using the covariate_select function. It uses the same syntax as fit regarding positional arguments, but with different keyword arguments. Positional arguments are:

model: any PumasModel model created with the @model macro;
population: a Population object created with the read_pumas function;
initial_parameters: a NamedTuple of initial parameters; and
likelihood_approximation: a fitting method to use, e.g. FOCE().

The keyword arguments are:

control_param: a Tuple filled with Symbols of covariates parameters to be tested;
criterion: criterion for covariate selection, this needs to be a function that accepts a PumasFittedModel, e.g. aic or bic, and returns a real value;
method: method for covariate selection, either CovariateSelection.Forward or CovariateSelection.Backward; and
fit_options: a NamedTuple of options to be passed to fit during the SCM procedure.

The model should be written such that it includes not only all relevant covariates, but also its functional forms, i.e. the covariate effect. This is because the SCM procedure will be performed on the effect of the covariate, such that a zero restriction on a parameter listed in control_param eliminates the effect of the covariate (see example below). Hence, once the parameter listed is restricted to zero, the covariate effect is completely removed from the model.

Currently, the two methods CovariateSelection.Forward (the default) and CovariateSelection.Backward are supported.

The method CovariateSelection.Forward starts from the models where all parameters in control_param are restricted to zero. In the next step, a model is fit where all parameters in control_param but one are restricted to zero. If none of these models attain a smaller value of criterion then the model with all zero restrictions is chosen. Otherwise, the model with the lowest criterion is chosen, and the procedure continues by fitting all models with two free parameters, where one of the free parameters must be the one corresponding to the best fit in the previous step. New steps are taken as long as the criterion is reduced.

If the CovariateSelection.Backward method is chosen, then the procedure starts with all parameters in control_param unrestricted and then one parameter is restricted at a time until no there is no further improvement.

Any callable object that returns a real value when a fitted model is passed can be used as criterion. It is always assumed that a smaller value is better. The default value is the Akaike Information Criterion aic. Another commonly used criterion the Bayesian Information Criterion bic, which puts a higher penalty on the number of parameters compared to aic.

To avoid that logging messages from the calls to fit fills up the output window or default log, Suppressing this output might hide important issues with the model fitting, so it is possible to enable complete logging by passing verbose=true. This might hide important issues with the model fitting so for it is possible to enable complete logging by passing verbose=true.

Here is the function signature of the covariate_select function:

Pumas.covariate_select — Function

function covariate_select(
    model::AbstractPumasModel,
    population::Population,
    param::NamedTuple,
    approx::LikelihoodApproximation;
    control_param::Union{Tuple,Nothing},
    criterion = aic,
    method::CovariateSelection.T = CovariateSelection.Forward,
    fit_options::NamedTuple = (;),
    verbose::Bool = false,
) -> CovariateSelectionResults

Use a covariate selection procedure to find the best fit of model to population with starting values param and estimation method/likelihood approximation approx according to criterion by restricting parameters to zero for subsets of control_param traversed according to method.

The model should be written such that it includes all relevant covariates, possibly included with multiple different functional forms, such that that restricting on a parameter listed in control_param to zero eliminates the effect of the covariate (see example below).

Currently, the two methods CovariateSelection.Forward (the default) and CovariateSelection.Backward are supported. The method CovariateSelection.Forward starts from the models where all parameters in control_param are restricted to zero. In the next step, a model is fit where all parameters in control_param but one are restricted to zero. If none of these models attain a smaller value of criterion then the model that restricts all parameters to zero is chosen. Otherwise, the model with the lowest criterion is chosen and the procedure continues by fitting all models with two estimated parameters where one of the estimated parameters must be the one corresponding to the best fit in the previous step. New steps are taken as long as the criterion is reduced. If the CovariateSelection.Backward method is chosen then the procedure starts with all parameters in control_param unrestricted and then one parameter is restricted at a time until no there is no further improvement.

Any callable object that returns a real value when a fitted model is passed can be used as criterion. It is always assumed that a smaller value is better. The default value is the Akaike Information Criterion aic but another commonly used criterion the Bayesian Information Criterion bic which puts a higher penalty on the number of parameters compared to aic.

To avoid that logging messages from the calls to fit fills up the output window or default log, all logging messages produced by fit are by default suppressed. This might hide important issues with the model fitting so for it is possible to enable complete logging by passing verbose = true.

Example

model = @model begin
  @param begin
    β0 ∈ RealDomain()
    β1 ∈ RealDomain()
    β2 ∈ RealDomain()
    β3 ∈ RealDomain()
    β4 ∈ RealDomain()
  end
  @covariates x1 x2 x3 x4
  @pre μ = β0 + β1 * x1 + β2 * x2 + β3 * x3 + β4 * x4
  @derived begin
    y ~ Normal.(μ, 0.1)
  end
end

selection = covariate_select(
  model,
  population,
  init_params(model),
  NaivePooled();
  control_param = (:β1, :β2, :β3, :β4),
  method = CovariateSelection.Forward,
  criterion = bic,
  fit_options = (; constantcoef = (:β2,)),
)

Example with one functional form per covariate

For the SCM example we'll use the po_sad_1 dataset from PharmaDatasets:

using Pumas
using PharmaDatasets
using DataFrames

df = dataset("po_sad_1")
describe(df)

14×7 DataFrame

Row	variable	mean	min	median	max	nmissing	eltype
	Symbol	Union…	Any	Union…	Any	Int64	Type
1	id	9.5	1	9.5	18	0	Int64
2	time	19.0312	0.0	10.0	72.0	0	Float64
3	dv	353.73	17.8654	268.471	2232.39	18	Union{Missing, Float64}
4	amt	60.0	30.0	60.0	90.0	270	Union{Missing, Float64}
5	evid	0.0625	0	0.0	1	0	Int64
6	cmt	1.0	1	1.0	1	270	Union{Missing, Int64}
7	rate	0.0	0.0	0.0	0.0	0	Float64
8	age	36.6111	20	37.5	55	0	Int64
9	wt	80.1667	30	72.0	120	0	Int64
10	doselevel	60.0	30	60.0	90	0	Int64
11	isPM		no		yes	0	String3
12	isfed		no		yes	0	String3
13	sex		female		male	0	String7
14	route		ev		ev	0	String3

Here we have three covariates wt, isPM and isfed that we want to test. wt is a continuous covariate and isPM and isfed are binary covariates. We can read the data into a Population object with the read_pumas function:

pop = read_pumas(df; observations = [:dv], covariates = [:wt, :isPM, :isfed])

Population
  Subjects: 18
  Covariates: wt, isPM, isfed
  Observations: dv

Now we define our model with all the covariates we want to test included in the equations for the parameters for which they will be tested. SCM in Pumas needs a model with all the covariates:

model = @model begin
    @metadata begin
        desc = "final covariate model"
        timeu = u"hr"
    end

    @param begin
        """
        Clearance (L/hr)
        """
        tvcl ∈ RealDomain(; lower = 0)
        """
        Central Volume (L)
        """
        tvvc ∈ RealDomain(; lower = 0)
        """
        Peripheral Volume (L)
        """
        tvvp ∈ RealDomain(; lower = 0)
        """
        Distributional Clearance (L/hr)
        """
        tvq ∈ RealDomain(; lower = 0)
        """
        Fed - Absorption rate constant (h-1)
        """
        tvkafed ∈ RealDomain(; lower = 0)
        """
        Fasted - Absorption rate constant (h-1)
        """
        tvkafasted ∈ RealDomain(; lower = 0)
        """
        Power exponent on weight for Clearance
        """
        dwtcl ∈ RealDomain()
        """
        Proportional change in CL due to PM
        """
        ispmoncl ∈ RealDomain(; lower = -1, upper = 1)
        """
        Components: CL, Vc, Ka, Q, Vp
        """
        Ω ∈ PDiagDomain(5)
        """
        Proportional RUV (SD scale)
        """
        σ_p ∈ RealDomain(; lower = 0)
    end

    @random begin
        η ~ MvNormal(Ω)
    end

    @covariates begin
        """
        Poor Metabolizer
        """
        isPM
        """
        Fed status
        """
        isfed
        """
        Weight (kg)
        """
        wt
    end

    @pre begin
        CL = tvcl * (wt / 70)^dwtcl * (1 + ispmoncl * (isPM == "yes")) * exp(η[1])
        Vc = tvvc * (wt / 70) * exp(η[2])
        Ka = (tvkafed + (isfed == "no") * (tvkafasted)) * exp(η[3])
        Q = tvq * (wt / 70)^0.75 * exp(η[4])
        Vp = tvvp * (wt / 70) * exp(η[5])
    end

    @dynamics Depots1Central1Periph1

    @derived begin
        cp := @. 1000 * (Central / Vc)
        """
        DrugY Concentration (ng/mL)
        """
        dv ~ @. Normal(cp, cp * σ_p)
    end
end

PumasModel
  Parameters: tvcl, tvvc, tvvp, tvq, tvkafed, tvkafasted, dwtcl, ispmoncl, Ω, σ_p
  Random effects: η
  Covariates: isPM, isfed, wt
  Dynamical system variables: Depot, Central, Peripheral
  Dynamical system type: Closed form
  Derived: dv
  Observed: dv

Note that the tvkafasted, dwtcl, and ispmoncl are parameters in the @param block that define the typical values of the magnitudes of the effects of the covariates isfed, wt (on clearance), and isPM; respectively. These will be the parameters that we will be testing for the covariate selection.

Now we define our initial parameters:

iparams = (;
    tvkafed = 0.4,
    tvkafasted = 1.5,
    tvcl = 4.0,
    tvvc = 70.0,
    tvq = 4.0,
    tvvp = 50.0,
    dwtcl = 0.75,
    ispmoncl = -0.7,
    Ω = Diagonal([0.04, 0.04, 0.04, 0.04, 0.04]),
    σ_p = 0.1,
)

(tvkafed = 0.4,
 tvkafasted = 1.5,
 tvcl = 4.0,
 tvvc = 70.0,
 tvq = 4.0,
 tvvp = 50.0,
 dwtcl = 0.75,
 ispmoncl = -0.7,
 Ω = [0.04 0.0 … 0.0 0.0; 0.0 0.04 … 0.0 0.0; … ; 0.0 0.0 … 0.04 0.0; 0.0 0.0 … 0.0 0.04],
 σ_p = 0.1,)

Finally, we can run the covariate selection procedure. We'll use FOCE() as our estimation method and the tvkafasted and ispmoncl as the parameters to test for covariate selection. All others keyword arguments will be left to their default values, which implies aic as the criterion and the forward inclusion method.

covar_result = covariate_select(
    model,
    pop,
    iparams,
    FOCE();
    control_param = (:dwtcl, :tvkafasted, :ispmoncl), # this is a Tuple
)

Pumas.CovariateSelectionResults
Criterion:                                               aic
Method:                                              Forward

Best model:
  Unrestricted parameters:       dwtcl, tvkafasted, ispmoncl
  Zero restricted parameters:                               
  Criterion value:                                   2641.59

Fitted model summary:
7×4 DataFrame
 Row │ estimated_parameters              restricted_parameters             fitted_model           criterion
     │ Tuple…                            Tuple…                            FittedPumas…           Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ ()                                (:dwtcl, :tvkafasted, :ispmoncl)  FittedPumasModel(...)    2714.96
   2 │ (:dwtcl,)                         (:tvkafasted, :ispmoncl)          FittedPumasModel(...)    2713.25
   3 │ (:tvkafasted,)                    (:dwtcl, :ispmoncl)               FittedPumasModel(...)    2680.07
   4 │ (:ispmoncl,)                      (:dwtcl, :tvkafasted)             FittedPumasModel(...)    2702.59
   5 │ (:dwtcl, :tvkafasted)             (:ispmoncl,)                      FittedPumasModel(...)    2678.3
   6 │ (:tvkafasted, :ispmoncl)          (:dwtcl,)                         FittedPumasModel(...)    2667.72
   7 │ (:dwtcl, :tvkafasted, :ispmoncl)  ()                                FittedPumasModel(...)    2641.59

You can access the results of the covariate selection procedure by accessing the fits field of the covar_result object, and convert it to a DataFrame in order to perform any data wrangling or plotting. Here's an example of how to sort the results by the criterion value:

sort(DataFrame(covar_result.fits), :criterion)

7×4 DataFrame

Row	estimated_parameters	restricted_parameters	fitted_model	criterion
	Tuple…	Tuple…	FittedPu…	Float64
1	(:dwtcl, :tvkafasted, :ispmoncl)	()	FittedPumasModel(...)	2641.59
2	(:tvkafasted, :ispmoncl)	(:dwtcl,)	FittedPumasModel(...)	2667.72
3	(:dwtcl, :tvkafasted)	(:ispmoncl,)	FittedPumasModel(...)	2678.3
4	(:tvkafasted,)	(:dwtcl, :ispmoncl)	FittedPumasModel(...)	2680.07
5	(:ispmoncl,)	(:dwtcl, :tvkafasted)	FittedPumasModel(...)	2702.59
6	(:dwtcl,)	(:tvkafasted, :ispmoncl)	FittedPumasModel(...)	2713.25
7	()	(:dwtcl, :tvkafasted, :ispmoncl)	FittedPumasModel(...)	2714.96

As you can see, after running the covariate selection procedure, we get a summary of the best model found and the fitted model for each step of the procedure.

You can recover the best model fit by accessing the best_model field of the covar_result object:

covar_result.best_model

FittedPumasModel

Dynamical system type:                 Closed form

Number of subjects:                             18

Observation records:         Active        Missing
    dv:                         270              0
    Total:                      270              0

Number of parameters:      Constant      Optimized
                                  0             14

Likelihood approximation:                     FOCE
Likelihood optimizer:                         BFGS

Termination Reason:              NoObjectiveChange
Log-likelihood value:                    -1306.793

-----------------------
             Estimate
-----------------------
tvcl          3.6888
tvvc         70.044
tvvp         48.706
tvq           4.3506
tvkafed       0.41114
tvkafasted    1.0894
dwtcl         0.81856
ispmoncl     -0.62382
Ω₁,₁          0.022271
Ω₂,₂          0.050192
Ω₃,₃          0.031194
Ω₄,₄          0.10595
Ω₅,₅          0.035072
σ_p           0.098043
-----------------------

Changing the Criterion and the Selection Method

As stated before, the default criterion is aic and the default method is CovariateSelection.Forward. You can change these by passing the criterion and method keyword arguments to the covariate_select function. Any criterion function that takes a FittedPumasModel as input and returns a real number (Float64) can be used. The same applies to the selection method, which can be any of the following:

CovariateSelection.Forward (default): forward inclusion method; or
CovariateSelection.Backward: backward elimination method.

Here's the previous example, but now using the bic criterion and the backward inclusion method:

covar_result = covariate_select(
    model,
    pop,
    iparams,
    FOCE();
    control_param = (:dwtcl, :tvkafasted, :ispmoncl),
    criterion = bic,
    method = CovariateSelection.Backward,
)

Pumas.CovariateSelectionResults
Criterion:                                               bic
Method:                                             Backward

Best model:
  Unrestricted parameters:       dwtcl, tvkafasted, ispmoncl
  Zero restricted parameters:                               
  Criterion value:                                   2691.96

Fitted model summary:
4×4 DataFrame
 Row │ estimated_parameters              restricted_parameters  fitted_model           criterion
     │ Tuple…                            Tuple{Vararg{Symbol}}  FittedPumas…           Float64
─────┼───────────────────────────────────────────────────────────────────────────────────────────
   1 │ (:dwtcl, :tvkafasted, :ispmoncl)  ()                     FittedPumasModel(...)    2691.96
   2 │ (:tvkafasted, :ispmoncl)          (:dwtcl,)              FittedPumasModel(...)    2714.5
   3 │ (:dwtcl, :ispmoncl)               (:tvkafasted,)         FittedPumasModel(...)    2724.05
   4 │ (:dwtcl, :tvkafasted)             (:ispmoncl,)           FittedPumasModel(...)    2725.08

As you can see the results are the same from the previous example, but with different criterion values. This is because the bic criterion value is penalized more for the number of parameters than the aic criterion value.

Example with a mixed routine

One common mixed routine is to first perform a forward selection, followed by a backwards elimination. In all of these SCM routines, we'll use a likelihood-ratio test to determine if the covariate should be included or excluded.

The criterion for the likelihood-ratio test is the $\chi^2$ distribution with one degree of freedom:

lrt_criterion(f; α = 0.01) = dof(f) * cquantile(Chisq(1), α) - (2 * loglikelihood(f))

lrt_criterion (generic function with 1 method)

Additionally, we'll use different α values for the forward and backward selection steps:

α = 0.01 for the forward selection step; and
α = 0.001 for the backward selection step.

The setup is similar to the previous examples, with the same model and dataset. First, we'll run covariate_select with lrt_criterion as a criterion and the forward selection method. Hence, our α = 0.01 (we use the default value defined in lrt_criterion):

covar_result_fs = covariate_select(
    model,
    pop,
    iparams,
    FOCE();
    control_param = (:dwtcl, :tvkafasted, :ispmoncl),
    criterion = lrt_criterion,
    method = CovariateSelection.Forward,
)

Pumas.CovariateSelectionResults
Criterion:                                     lrt_criterion
Method:                                              Forward

Best model:
  Unrestricted parameters:       dwtcl, tvkafasted, ispmoncl
  Zero restricted parameters:                               
  Criterion value:                                   2706.47

Fitted model summary:
7×4 DataFrame
 Row │ estimated_parameters              restricted_parameters             fitted_model           criterion
     │ Tuple…                            Tuple…                            FittedPumas…           Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ ()                                (:dwtcl, :tvkafasted, :ispmoncl)  FittedPumasModel(...)    2765.94
   2 │ (:dwtcl,)                         (:tvkafasted, :ispmoncl)          FittedPumasModel(...)    2768.87
   3 │ (:tvkafasted,)                    (:dwtcl, :ispmoncl)               FittedPumasModel(...)    2735.69
   4 │ (:ispmoncl,)                      (:dwtcl, :tvkafasted)             FittedPumasModel(...)    2758.21
   5 │ (:dwtcl, :tvkafasted)             (:ispmoncl,)                      FittedPumasModel(...)    2738.55
   6 │ (:tvkafasted, :ispmoncl)          (:dwtcl,)                         FittedPumasModel(...)    2727.97
   7 │ (:dwtcl, :tvkafasted, :ispmoncl)  ()                                FittedPumasModel(...)    2706.47

Now we extract the best model from the covar_result by accessing the best_model field. That gives us a model fit, which we can use to extract the best model:

best_model_fs = covar_result_fs.best_model.model

PumasModel
  Parameters: tvcl, tvvc, tvvp, tvq, tvkafed, tvkafasted, dwtcl, ispmoncl, Ω, σ_p
  Random effects: η
  Covariates: isPM, isfed, wt
  Dynamical system variables: Depot, Central, Peripheral
  Dynamical system type: Closed form
  Derived: dv
  Observed: dv

Finally, we run covariate_select with lrt_criterion as a criterion and the backward elimination method. Hence, our α = 0.001 (we need to override the default value defined in lrt_criterion):

covar_result_bs = covariate_select(
    best_model_fs,
    pop,
    iparams,
    FOCE();
    control_param = (:dwtcl, :tvkafasted, :ispmoncl),
    criterion = x -> lrt_criterion(x; α = 0.001),
    method = CovariateSelection.Backward,
)

Pumas.CovariateSelectionResults
Criterion:                                               #16
Method:                                             Backward

Best model:
  Unrestricted parameters:       dwtcl, tvkafasted, ispmoncl
  Zero restricted parameters:                               
  Criterion value:                                   2765.17

Fitted model summary:
4×4 DataFrame
 Row │ estimated_parameters              restricted_parameters  fitted_model           criterion
     │ Tuple…                            Tuple{Vararg{Symbol}}  FittedPumas…           Float64
─────┼───────────────────────────────────────────────────────────────────────────────────────────
   1 │ (:dwtcl, :tvkafasted, :ispmoncl)  ()                     FittedPumasModel(...)    2765.17
   2 │ (:tvkafasted, :ispmoncl)          (:dwtcl,)              FittedPumasModel(...)    2782.48
   3 │ (:dwtcl, :ispmoncl)               (:tvkafasted,)         FittedPumasModel(...)    2792.03
   4 │ (:dwtcl, :tvkafasted)             (:ispmoncl,)           FittedPumasModel(...)    2793.06