Covariate Modeling
Pumas can handle subject covariates, both time-varying and constant. They are handled inside the @covariates model block and parsed in a Population with the covariates keyword argument from the read_pumas function.
Let's show an example with the nlme_sample dataset from thePharmaDatasets package:
using Pumas
using PharmaDatasets
using DataFrames
pkdata = dataset("nlme_sample")
describe(pkdata)| Row | variable | mean | min | median | max | nmissing | eltype |
|---|---|---|---|---|---|---|---|
| Symbol | Union… | Any | Union… | Any | Int64 | Type | |
| 1 | ID | 15.5 | 1 | 15.5 | 30 | 0 | Int64 |
| 2 | TIME | 82.6527 | 0.0 | 84.0 | 168.0 | 0 | Float64 |
| 3 | DV | 157.315 | 0.0336074 | 133.221 | 597.654 | 240 | Union{Missing, Float64} |
| 4 | AMT | 750.0 | 500 | 750.0 | 1000 | 540 | Union{Missing, Int64} |
| 5 | EVID | 0.307692 | 0 | 0.0 | 1 | 0 | Int64 |
| 6 | CMT | 1.0 | 1 | 1.0 | 1 | 540 | Union{Missing, Int64} |
| 7 | RATE | 115.385 | 0 | 0.0 | 500 | 0 | Int64 |
| 8 | WT | 81.6 | 57 | 82.0 | 100 | 0 | Int64 |
| 9 | AGE | 40.0333 | 21 | 41.0 | 60 | 0 | Int64 |
| 10 | SEX | F | M | 0 | String1 | ||
| 11 | CRCL | 72.5667 | 32 | 72.5 | 117 | 0 | Int64 |
| 12 | GROUP | 1000 mg | 750 mg | 0 | String7 | ||
| 13 | ROUTE | Inf | Inf | Inf | Inf | 0 | Float64 |
| 14 | DURATION | 2.0 | 2 | 2.0 | 2 | 540 | Union{Missing, Int64} |
| 15 | OCC | 4.15385 | 1 | 4.5 | 8 | 0 | Int64 |
pkdata here is an intravenous infusion given over 2 hours with demographic information (age, weight, sex, and creating clearance). The first thing we need to do is to parse pkdata into a Population with read_pumas while adding the desired columns to be parsed as covariates in the covariates keyword argument:
pop = read_pumas(
pkdata;
id = :ID,
time = :TIME,
amt = :AMT,
covariates = [:WT, :AGE, :SEX, :CRCL, :GROUP],
observations = [:DV],
cmt = :CMT,
evid = :EVID,
rate = :RATE,
)Population
Subjects: 30
Covariates: WT, AGE, SEX, CRCL, GROUP
Observations: DVFor the model part, we need to declare the covariates in the @covariates model block. Then, we can use it in the @pre and @dosecontrol blocks. Here's a two-compartment combined-error model that uses WT as a covariate in allometric scaling of both CL and Vc PK parameters:
model = @model begin
@param begin
tvcl ∈ RealDomain(; lower = 0)
tvvc ∈ RealDomain(; lower = 0)
tvq ∈ RealDomain(; lower = 0)
tvvp ∈ RealDomain(; lower = 0)
Ω ∈ PDiagDomain(2)
σ²_add ∈ RealDomain(; lower = 0)
σ²_prop ∈ RealDomain(; lower = 0)
end
@random begin
η ~ MvNormal(Ω)
end
@covariates WT
@pre begin
wtCL = (WT / 70)^0.75
wtV = (WT / 70)
CL = tvcl * wtCL * exp(η[1])
Vc = tvvc * wtV * exp(η[2])
Q = tvq
Vp = tvvp
end
@dynamics Central1Periph1
@derived begin
cp := @. Central / Vc
DV ~ @. Normal(cp, sqrt(abs(cp)^2 * σ²_prop + σ²_add))
end
endPumasModel
Parameters: tvcl, tvvc, tvq, tvvp, Ω, σ²_add, σ²_prop
Random effects: η
Covariates: WT
Dynamical system variables: Central, Peripheral
Dynamical system type: Closed form
Derived: DV
Observed: DVWe first declare WT in the @covariates block then we are free to use WT in the @pre block.
Now an example with multiple covariates:
model_multiple = @model begin
@param begin
tvcl ∈ RealDomain(; lower = 0)
tvvc ∈ RealDomain(; lower = 0)
tvq ∈ RealDomain(; lower = 0)
tvvp ∈ RealDomain(; lower = 0)
Ω ∈ PDiagDomain(2)
σ²_add ∈ RealDomain(; lower = 0)
σ²_prop ∈ RealDomain(; lower = 0)
end
@random begin
η ~ MvNormal(Ω)
end
@covariates begin
WT
CRCL
end
@pre begin
wtCL = (WT / 70)^0.75
wtV = (WT / 70)
crcl_eff = (CRCL / 95)^0.75
CL = tvcl * wtCL * crcl_eff * exp(η[1])
Vc = tvvc * wtV * exp(η[2])
Q = tvq
Vp = tvvp
end
@dynamics Central1Periph1
@derived begin
cp := @. Central / Vc
DV ~ @. Normal(cp, sqrt(abs(cp)^2 * σ²_prop + σ²_add))
end
endPumasModel
Parameters: tvcl, tvvc, tvq, tvvp, Ω, σ²_add, σ²_prop
Random effects: η
Covariates: WT, CRCL
Dynamical system variables: Central, Peripheral
Dynamical system type: Closed form
Derived: DV
Observed: DVThe @covariates model block, like other Pumas' model blocks, supports one statement per line inside begin ... end.
Creating Binary Covariates
You can use binary covariates inside a Pumas model using either ifelse function or the ternary operator ?. The nlme_sample dataset has the following values for the :SEX column:
unique(pkdata.SEX)2-element Vector{InlineStrings.String1}:
"M"
"F"Here's an example where we are using different residual unexplained variance (the σ²s parameters) for each unique value of :SEX:
model_multiple_errors = @model begin
@param begin
tvcl ∈ RealDomain(; lower = 0)
tvvc ∈ RealDomain(; lower = 0)
tvq ∈ RealDomain(; lower = 0)
tvvp ∈ RealDomain(; lower = 0)
Ω ∈ PDiagDomain(2)
σ²_add_F ∈ RealDomain(; lower = 0)
σ²_prop_F ∈ RealDomain(; lower = 0)
σ²_add_M ∈ RealDomain(; lower = 0)
σ²_prop_M ∈ RealDomain(; lower = 0)
end
@random begin
η ~ MvNormal(Ω)
end
@covariates SEX
@pre begin
CL = tvcl * exp(η[1])
Vc = tvvc * exp(η[2])
Q = tvq
Vp = tvvp
# a different σ for SEX
σ²_add = ifelse(SEX == "M", σ²_add_M, σ²_add_F)
σ²_prop = ifelse(SEX == "M", σ²_prop_M, σ²_prop_F)
end
@dynamics Central1Periph1
@derived begin
cp := @. Central / Vc
DV ~ @. Normal(cp, sqrt(abs(cp)^2 * σ²_prop + σ²_add))
end
endPumasModel
Parameters: tvcl, tvvc, tvq, tvvp, Ω, σ²_add_F, σ²_prop_F, σ²_add_M, σ²_prop_M
Random effects: η
Covariates: SEX
Dynamical system variables: Central, Peripheral
Dynamical system type: Closed form
Derived: DV
Observed: DVYou can nest as many ifelse statements as needed if your covariate has more than 2 categories. For example with :GROUP column:
unique(pkdata.GROUP)3-element Vector{InlineStrings.String7}:
"1000 mg"
"750 mg"
"500 mg"You perform the following nested ifelse:
σ = ifelse(GROUP == "500mg", σ_500, ifelse(GROUP == "750mg", σ_750, σ_1000))Covariates with Coefficients
Some covariates need coefficients. In Pumas models, this can be accomplished by adding the coefficient in the @param block, the covariate in the @covariates block, and the pre-computation of the covariate effect in the @pre block. Here's an example of a linear relationship with no intercept:
@param begin
tvcovar ∈ RealDomain() # the covariate effect coefficient
end
@covariates COVAR
@pre begin
COVAR_effect = tvcovar * COVAR
endFitting Covariate Models
Covariate model fitting is the same as any other Pumas model fitting. You'll need to pass the required positional arguments to the fit function:
- model
- population
- initial parameter values
- estimation method
iparams = (;
tvvc = 5,
tvcl = 0.02,
tvq = 0.01,
tvvp = 10,
Ω = Diagonal([0.01, 0.01]),
σ²_add = 0.01,
σ²_prop = 0.01,
)(tvvc = 5,
tvcl = 0.02,
tvq = 0.01,
tvvp = 10,
Ω = [0.01 0.0; 0.0 0.01],
σ²_add = 0.01,
σ²_prop = 0.01,)model_fit = fit(
model_multiple,
pop,
iparams,
FOCE();
# hide the trace output during fitting
optim_options = (; show_trace = false),
)FittedPumasModel
Dynamical system type: Closed form
Number of subjects: 30
Observation records: Active Missing
DV: 540 0
Total: 540 0
Number of parameters: Constant Optimized
0 8
Likelihood approximation: FOCE
Likelihood optimizer: BFGS
Termination Reason: GradientNorm
Log-likelihood value: -1891.253
--------------------
Estimate
--------------------
tvcl 0.1788
tvvc 3.9762
tvq 0.042664
tvvp 3.8242
Ω₁,₁ 0.17121
Ω₂,₂ 0.081303
σ²_add 0.0010351
σ²_prop 0.0037221
--------------------Annotating Covariates
Users can also annotate covariates inside the @covariates block by adding a string before the covariate. For example:
model_annotated = @model begin
@param begin
tvcl ∈ RealDomain(; lower = 0)
tvvc ∈ RealDomain(; lower = 0)
tvq ∈ RealDomain(; lower = 0)
tvvp ∈ RealDomain(; lower = 0)
Ω ∈ PDiagDomain(2)
σ²_add ∈ RealDomain(; lower = 0)
σ²_prop ∈ RealDomain(; lower = 0)
end
@random begin
η ~ MvNormal(Ω)
end
@covariates begin
"""
Weight
"""
WT
"""
Creatine Clearance
"""
CRCL
end
@pre begin
wtCL = (WT / 70)^0.75
wtV = (WT / 70)
crcl_eff = (CRCL / 95)^0.75
CL = tvcl * wtCL * crcl_eff * exp(η[1])
Vc = tvvc * wtV * exp(η[2])
Q = tvq
Vp = tvvp
end
@dynamics Central1Periph1
@derived begin
cp := @. Central / Vc
DV ~ @. Normal(cp, sqrt(abs(cp)^2 * σ²_prop + σ²_add))
end
endPumasModel
Parameters: tvcl, tvvc, tvq, tvvp, Ω, σ²_add, σ²_prop
Random effects: η
Covariates: WT, CRCL
Dynamical system variables: Central, Peripheral
Dynamical system type: Closed form
Derived: DV
Observed: DVMost of the Pumas functions and plots will use the annotated information in their output:
model_annotated_fit = fit(
model_annotated,
pop,
iparams,
FOCE();
# hide the trace output during fitting
optim_options = (; show_trace = false),
)FittedPumasModel
Dynamical system type: Closed form
Number of subjects: 30
Observation records: Active Missing
DV: 540 0
Total: 540 0
Number of parameters: Constant Optimized
0 8
Likelihood approximation: FOCE
Likelihood optimizer: BFGS
Termination Reason: GradientNorm
Log-likelihood value: -1891.253
--------------------
Estimate
--------------------
tvcl 0.1788
tvvc 3.9762
tvq 0.042664
tvvp 3.8242
Ω₁,₁ 0.17121
Ω₂,₂ 0.081303
σ²_add 0.0010351
σ²_prop 0.0037221
--------------------model_inspect = inspect(model_annotated_fit)FittedPumasModelInspection
Likelihood approximation used for weighted residuals: FOCEusing PumasUtilities
empirical_bayes_vs_covariates(model_inspect)
As you can see WT and CRCL which were annotated in our Pumas model has enhanced labels in empirical_bayes_vs_covariates.
Additionally, notice that even if our model specifies only two covariates, empirical_bayes_vs_covariates uses the underlying information in the Population that the model was fitted to plot all the remaining covariates along with the ones the model specifies.
Time-varying Covariates
Pumas can handle time-varying covariates. This happens automatically if, when parsing a dataset, read_pumas detects that covariate values change over time.
Here's an example with an ordinal regression using the painord dataset from PharmaDatasets. :painord is our observations measuring the perceived pain in a scale from 0 to 3, which we need to have the range shifted by 1 (1 to 4). Additionally, we'll use the concentration in plasma, :conc as a covariate. Of course, :conc varies with time, thus, it is a time-varying covariate:
using Pumas
using PharmaDatasets
using DataFrames
painord = dataset("pumas/pain_remed")
describe(painord)| Row | variable | mean | min | median | max | nmissing | eltype |
|---|---|---|---|---|---|---|---|
| Symbol | Float64 | Real | Float64 | Real | Int64 | DataType | |
| 1 | id | 80.5 | 1 | 80.5 | 160 | 0 | Int64 |
| 2 | arm | 1.5 | 0 | 1.5 | 3 | 0 | Int64 |
| 3 | dose | 26.25 | 0 | 12.5 | 80 | 0 | Int64 |
| 4 | time | 3.375 | 0.0 | 2.75 | 8.0 | 0 | Float64 |
| 5 | conc | 0.93018 | 0.0 | 0.22869 | 8.47195 | 0 | Float64 |
| 6 | painord | 1.50208 | 0 | 1.0 | 3 | 0 | Int64 |
| 7 | dv | 0.508333 | 0 | 1.0 | 1 | 0 | Int64 |
| 8 | remed | 0.059375 | 0 | 0.0 | 1 | 0 | Int64 |
using DataFramesMeta
@rtransform! painord :painord = :painord + 1
pop =
read_pumas(painord; observations = [:painord], covariates = [:conc], event_data = false)Population
Subjects: 160
Covariates: conc (heterogenous)
Observations: painord (heterogenous)The following model uses the Categorical distribution inside the @derived for our observation :painord. There are also some computations in the @pre block to calculate the log-cumulative-odds and probabilities for each categorical value:
ordinal_model = @model begin
@param begin
b₁ ∈ RealDomain(; init = 0)
b₂ ∈ RealDomain(; init = 1)
b₃ ∈ RealDomain(; init = 1)
slope ∈ RealDomain(; init = 0)
end
@covariates conc # time varying
@pre begin
effect = slope * conc
# Logit of cumulative probabilities
lge₁ = b₁ + effect
lge₂ = lge₁ - b₂
lge₃ = lge₂ - b₃
# Probabilities of >=1 and >=2 and >=3
pge₁ = logistic(lge₁)
pge₂ = logistic(lge₂)
pge₃ = logistic(lge₃)
# Probabilities of Y=1,2,3,4
p₁ = 1.0 - pge₁
p₂ = pge₁ - pge₂
p₃ = pge₂ - pge₃
p₄ = pge₃
end
@derived begin
painord ~ @. Categorical(p₁, p₂, p₃, p₄)
end
endPumasModel
Parameters: b₁, b₂, b₃, slope
Random effects:
Covariates: conc
Dynamical system variables:
Dynamical system type: No dynamical model
Derived: painord
Observed: painordAs expected this model fits without problems using NaivePooled as an estimation method:
ordinal_fit = fit(
ordinal_model,
pop,
init_params(ordinal_model),
NaivePooled();
# hide the trace output during fitting
optim_options = (; show_trace = false),
)FittedPumasModel
Dynamical system type: No dynamical model
Number of subjects: 160
Observation records: Active Missing
painord: 1920 0
Total: 1920 0
Number of parameters: Constant Optimized
0 4
Likelihood approximation: NaivePooled
Likelihood optimizer: BFGS
Termination Reason: GradientNorm
Log-likelihood value: -2316.3554
-----------------
Estimate
-----------------
b₁ 2.5112
b₂ 2.1951
b₃ 1.9643
slope -0.38871
-----------------Missing Values in Covariates
The way that Pumas handles missing values inside covariates depends on whether the covariate is constant or time-varying. For both cases Pumas will interpolate the available values to fill in the missing values. If, for any subject, all the covariate's values are missing, Pumas will throw an error while parsing the data with read_pumas.
For both missing constant and time-varying covariates, Pumas, by default, does piece-wise constant interpolation with "next observation carried backward" (NOCB, NONMEM default). Of course for constant covariates the interpolated values over the missing values will be constant values. This can be adjusted with the covariates_direction keyword argument of read_pumas. The default value :right is NOCB and :left is "last observation carried forward" (LOCF, Monolix default).
NONMEM default interpolation can be unintuitive. Imagine a covariate is observed at times [0,1,2,3] and has values [1,2,3,4] then the default interpolation (:right) would interpolate as follows:
| time | value |
|---|---|
| [0] | 1 |
| (0,1] | 2 |
| (1,2] | 3 |
| (2,3] | 4 |
Hence, if your covariate is indicating a shift in occasion you probably did not mean for it to change like that, you wanted it to be constant until you see an updated value - and to be honest that's what I want for all covariates.
For example, at time = 0.5 the covariate value is 2, but since the value was set to 1 at time 0 and 2 only at time 1, you probably would expect the value to be 1 (first occasion for example).
Hence, for LOCF, you can use the following:
pop = read_pumas(pkdata; covariates_direction = :left)along with any other required keyword arguments for column mapping.