Simulation of Pumas Models
Simulation of a PumasModel
is performed via the simobs
function. The function is given by the values:
Pumas.simobs
— Functionsimobs(
model::AbstractPumasModel,
population::Union{Subject, Population},
param,
randeffs=sample_randeffs(model, param, population);
obstimes=nothing,
ensemblealg=EnsembleSerial(),
diffeq_options=NamedTuple(),
rng=Random.default_rng(),
simulate_error=Val(true),
)
Simulate random observations from model
for population
with parameters param
at obstimes
(by default, use the times of the existing observations for each subject). If no randeffs
is provided, then random ones are generated according to the distribution in the model.
Arguments
model
may either be aPumasModel
or aPumasEMModel
.population
may either be aPopulation
ofSubject
s or a singleSubject
.param
should be either a single parameter set, in the form of aNamedTuple
, or a vector of such parameter sets. If a vector then each of the parameter sets in that vector will be applied in turn. Example:(; tvCL=1., tvKa=2., tvV=1.)
randeffs
is an optional argument that, if used, should specify the random effects for each subject inpopulation
. Such random effects are specified byNamedTuple
s forPumasModels
(e.g.(; tvCL=1., tvKa=2.)
) and byTuples
forPumasEMModel
s (e.g.(1., 2.)
). Ifpopulation
is a singleSubject
(without being enclosed in a vector) then a single such random effect specifier should be passed. If, however,population
is aPopulation
of multipleSubject
s thenrandeffs
should be a vector containing one such specifier for eachSubject
. The functionsinit_randeffs
,zero_randeffs
, andsample_randeffs
are sometimes convenient for generatingrandeffs
:
randeffs = zero_randeffs(model, param, population)
solve(model, population, param, randeffs)
If no randeffs
is provided, then random ones are generated according to the distribution in the model.
obstimes
is a keyword argument where you can pass a vector of times at which to simulate observations. The default,nothing
, ensures the use of the existing observation times for eachSubject
.ensemblealg
is a keyword argument that allows you to choose between different modes of parallelization. Options includeEnsembleSerial()
,EnsembleThreads()
andEnsembleDistributed()
.diffeq_options
is a keyword argument that takes aNamedTuple
of options to pass on to the differential equation solver.rng
is a keyword argument where you can specify which random number generator to use.simulate_error
is a keyword argument that allows you to choose whether a sample (Val(true)
ortrue
) or the mean (Val(false)
orfalse
) of the predictive distribution's error model will be returned for each subject.
simobs(model::PumasModel, population::Population; samples::Int = 10, simulate_error = true, rng::AbstractRNG = default_rng())
Simulates model predictions for population
using the prior distributions for the population and subject-specific parameters. samples
is the number of simulated predictions returned per subject. If simulate_error
is false
, the mean of the predictive distribution's error model will be returned, otherwise a sample from the predictive distribution's error model will be returned for each subject. rng
is the pseudo-random number generator used in the sampling.
simobs(model::PumasModel, subject::Subject; samples::Int = 10, simulate_error = true, rng::AbstractRNG = default_rng())
Simulates model predictions for subject
using the prior distributions for the population and subject-specific parameters. samples
is the number of simulated predictions returned. If simulate_error
is false
, the mean of the predictive distribution's error model will be returned, otherwise a sample from the predictive distribution's error model will be returned for each subject. rng
is the pseudo-random number generator used in the sampling.
simobs(b::BayesMCMCResults; subject = nothing, samples::Int = 10, simulate_error = false, rng::AbstractRNG = default_rng())
Simulates model predictions using the posterior samples for the population and subject-specific parameters. The predictions are either for the subject with index subject
or the whole population if no index is input. samples
is the number of simulated predictions returned per subject. If simulate_error
is false
, the mean of the predictive distribution's error model will be returned, otherwise a sample from the predictive distribution's error model will be returned for each subject. rng
is the pseudo-random number generator used in the sampling.
simobs(pmi::FittedPumasModelInference, population::Population = pmi.fpm.data, randeffs::Union{Nothing, AbstractVector{<:NamedTuple}} = nothing; samples::Int, rng = default_rng(), kwargs...,)
Simulates observations from the fitted model using a truncated multivariate normal distribution for the parameter values when a covariance estimate is available and using non-parametric sampling for simulated inference and bootstrapping. When a covariance estimate is available, the optimal parameter values are used as the mean and the variance-covariance estimate is used as the covariance matrix. Rejection sampling is used to avoid parameter values outside the parameter domains. Each sample uses a different parameter value. samples
is the number of samples to sample. population
is the population of subjects which defaults to the population associated the fitted model. randeffs
can be set to a vector of named tuples, one for each sample. If randeffs
is not specified (the default behaviour), it will be sampled from its distribution.
simobs(fpm::FittedPumasModel, [population::Population,] vcov::AbstractMatrix, randeffs::Union{Nothing, AbstractVector{<:NamedTuple}} = nothing; samples::Int, rng = default_rng(), kwargs...)
Simulates observations from the fitted model using a truncated multi-variate normal distribution for the parameter values. The optimal parameter values are used for the mean and the user supplied variance-covariance (vcov
) is used as the covariance matrix. Rejection sampling is used to avoid parameter values outside the parameter domains. Each sample uses a different parameter value. samples
is the number of samples to sample. population
is the population of subjects which defaults to the population associated the fitted model so it's optional to pass. randeffs
can be set to a vector of named tuples, one for each sample. If randeffs
is not specified (the default behaviour), it will be sampled from its distribution.
For the examples showcase here, consider the following, model, population and parameters that we will use to simulate from:
using Pumas
using PumasUtilities
onecomp = @model begin
@param begin
tvcl ∈ RealDomain(; lower = 0, init = 4)
tvvc ∈ RealDomain(; lower = 0, init = 70)
Ω ∈ PDiagDomain(init = [0.04, 0.04])
σ ∈ RealDomain(; lower = 0.00001, init = 0.1)
end
@random begin
η ~ MvNormal(Ω)
end
@pre begin
CL = tvcl * exp(η[1])
Vc = tvvc * exp(η[2])
end
@dynamics Central1
@derived begin
cp := @. Central / Vc
dv ~ @. Normal(cp, cp * σ)
end
end
model_params = init_params(onecomp)
ext_param = (; tvcl = 3, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1)
regimen = DosageRegimen(100)
pop = map(i -> Subject(; id = i, events = regimen, observations = (:dv,)), 1:3)
sims = simobs(onecomp, pop, ext_param; obstimes = 1:24)
Simulated population (Vector{<:Subject})
Simulated subjects: 3
Simulated variables: dv
Options and settings for simobs
Below we will go through visualization and post-processing of results and some common simulation patterns in terms of subject or population level simulations and how to set or draw fixed effects and random effects. Before we get to that, let us establish how to control parallelization and settings for the ordinary differential equation (ODE) solver. The main options are:
obstimes
ensemblealg
diffeq_options
rng
simulate_error
Adding times to the simulations with obstimes
When defining the subjects to be used in simulations it is possible to input the time
information. This can be the values in the time
column of a DataFrame
with read_pumas
or given directly to a subject constructor Subject(; id = "1", time = [0.0, 1.0, 5.0])
. If it is desired at the time of simulation to add more time points, these can be added to the simulations using the obstimes
keyword.
simobs(model, population, parameters; obstimes = [2.0, 8.0, 24.0])
This will add the time points 2
, 8
, and 24
to the simulated time points on top of the existing time points in the subejct.
Choosing mode of parallelization ensemblealg
Simulations from simobs
are multithreaded by default. This ensures that all simulations uses all the available cores or virtual CPUs on the machine being used. Other possibilities are EnsembleSerial()
or EnsembleDistributed()
. EnsembleSerial()
is mostly present as an option for backwards compatibility. Prior to v2.6.0 of Pumas a simobs
call with EnsembleThreads
would not necessarily produce stable results even if a seed was set. simobs
is stable across runs even with EnsembleThreads
so there is no longer a good reason to use EnsembleSerial()
. EnsembleDistributed()
uses the distributed computing functionality in Julia. This generally means that work is divided to workers using the pmap
function. Simulations are rarely sufficiently expensive per subject that this mode of parallelization makes much sense. It is possible to use it, but it is mostly relevant for large datasets with expensive model solutions in a fit
context or in covariate model selection using covariate_select
that always uses a pmap
.
Setting numerical ODE solver options with diffeq_options
When specifying a model we very often have to include a dynamical model to account for the pharmacokinetics and pharmacodynamics. Such a model can have a closed form, or analytical, solution if it is sufficiently simple. However, in many cases there is no closed form solution for example if there are non-linear terms in the dynamics or if the model depends explicitly on t
. Then, we need to solve the model numerically. This is called numerical integration because the solution of the model is the integral of the differential equations over time.
If the model is to be solved numerically there are three settings we may have to set in some circumstances. Sometimes a model does not solve well or fast enough and it is sometimes possible to change the numerical integration method (the "solver" or "algorithm") and sometimes we may want to adjust tolerances if it is impractical to solve the model to the usual tolerances. The three options we want to highlight are:
abstol
roughly interpreted as the error around0
reltol
roughly interpreted as controlling the number of correct digits (1e-3
would then mean3
)alg
is the solver that will actually perform the numerical integration
The tolerances for the solution control the following scaled error
err_scaled = err/(abstol + (max(uprev, u)*reltol))
where u
and uprev
are two candidates for the solution. The scaled error is guaranteed to be less than 1 and the interpretation of the two tolerances are as explained in the list above.
The default values for the tolerances depend on the context in Pumas.
- In
fit
we use theAutoVern7
automatic switching algorithm withRodas5P
inside. This means thatVern7
is used by default and if stiffness is detected we switch toRodas5P
. In terms of tolerances, we setabstol=1e-12
andreltol=1e-8
. - Outside of
fit
we use theAutoTsit5
automatic switching algorithm withRosenbrock23
inside. This means thatTsit5
is used by default and if stiffness is detected we switch toRosenbrock23
. In terms of tolerances, we setabstol=1e-6
andreltol=1e-3
.
Understanding the diffeq_options
in terms of NONMEM terminology
To map the advice above to what users might know from the NONMEM $SUBROUTINES
functionality consider the following list:
TOL
sets the relative tolerance and can be a number or a user providedTOL
routine. If the input is a numbern
it must be a an integer and sets the approximate number of correct digits. In terms of Pumas options, this corresponds todiffeq_options = (; reltol = 1e-n)
.ATOL
sets the absolute tolerance and should be an integern
. This corresponds todiffeq_options = (; abstol = 1e-n)
in Pumas.ADVAN
s such asADVAN6
,ADVAN9
,ADVAN13
corresponds to setting thealg
option as described above. Note, that the use ofADVAN
does not exactly correspond todiffeq_options = (; alg = ...)
becausealg
is only use for numerical integrator choice in Pumas whereas it is also used to specify closed form solutions in NONMEM.
Turning simulation of derived
distributions off or on
When running simobs
there are several levels of random draws that might be made. First, if the random effects are not passed in after the fixed effects the random effects will be sampled. Second, in the @derived
block all dependent variables will also be drawn. Consider the following:
@derived begin
cp = @. 1_000 * (Central / Vc)
dv1 ~ @. Normal(cp, sqrt(cp^2 * σ_prop))
dv2 ~ @. Normal(cp, σ_add)
end
This model would simulate both a proportional and an additive dependent variable for each subject with the same mean. If we run simobs
as usual (not options set) the output will contain two variables with the same mean, but with different variances. This source of randomness is often called Residual Unexplained Variability or RUV. To run simulations with uncertainty using the parameter uncertainty of estimated fixed effects or some other simulation that requires you to simulate without RUV we use simulate_error = false
simobs(model, population, parameters, randeffs; simulate_error = false)
This will use the mean
of the derived variables instead of sampling from them and in this case that would mean that all dv1
and dv2
observations would be simulated to be the exact same values.
Seeding the simulations with rng
As mentioned above, simobs
might sample both Between Subject Variability (BSV) or Residual Unexplained Variability (RUV). The exact values that are sampled is controlled through the random number generator (RNG).
To set a seed for the default random number generator it is possible to use code as the following:
using Random
Random.seed!(859)
This will set the seed of the default RNG and as such we can simply call simobs
after without no additional settings. It is also possible to assign a RNG instance to a variables and manually seed the simobs
called for example using Random
again:
using Random
my_rng = Random.seed!(999)
simobs(model, population, parameters; rng = my_rng)
The seed!
approach without manually setting the keywords should work fine in most cases though.
Seeded random sequences and simulation of sampled variables in @derived
There is an important point to make about how the random components are sampled in simobs
and what it means for situations where you might want to first run a simulation for some time interval and later expand that time interval. An example could be an adaptive dosing trial where you want to first simulate up until some time point that designates a time where the dose may be modified. You look at the values, add some doses to the subjects to match their doses from the time of the visit to the clinic, and then you simulate onwards in time.
When running simobs
we take the model, subject(s), fixed effects, random effects, and potentially additional obstimes
and use them to first solve the underlying model. This model solution can then be used to define variables in @derived
that have measurement error or some other RUV component added on top of the variables in the dynamical system as above:
@derived begin
cp = @. 1_000 * (Central / Vc)
dv1 ~ @. Normal(cp, sqrt(cp^2 * σ_prop))
dv2 ~ @. Normal(cp, σ_add)
end
Consider the case where we are simulating the model that contains the above @derived
block. If we simulate at the time points [0.0,1.0,2.0]
we will then get two sequences of numbers for dv1
and dv2
. These values will be sampled according to the seed that we set. If no seed is explicitly set the current global RNG state is used.
Let us imagine that we looked at the values and want to continue the simulation onwards such that the full time vector becomes [0.0,1.0,2.0,4.0,5.0,6.0]
then we have to be careful in how we actually sample the @derived
variables. Internally we sample all variables together per time point. First, we sample dv1
and dv2
at time = 0.0
, then we sample dv1
and dv2
at time = 1.0
, and so on until the final time point. This ensures that if we expand our time vector beyond the original time vector, we still get the same results out for the first part of the time vector given the same seed.
The point may be a little subtle, but it can be illustrated as follows. First, consider setting a seed and drawing two uniform values that will stand in for the dependent variables:
julia> Random.seed!(234)
TaskLocalRNG()
julia> a = rand(2)
2-element Vector{Float64}:
0.6393022684352254
0.04151237262182306
julia> b = rand(2)
2-element Vector{Float64}:
0.6035818204037028
0.8866568936656358
Imagine, that we now want to simulate three values in a
and b
instead. Then we could re-set the seed and get:
julia> Random.seed!(234)
TaskLocalRNG()
julia> a = rand(3)
3-element Vector{Float64}:
0.6393022684352254
0.04151237262182306
0.6035818204037028
julia> b = rand(3)
3-element Vector{Float64}:
0.8866568936656358
0.952245838494935
0.7773247257534464
However, this is not what we wanted! We wanted to original values from a
and b
and then an additional sample for each of them. If you look closely, you will see that the first three values of the new a
are exactly the first two values of the original a
and then first value of the original b
. For b
we then get the last value of the original b
and then two new values.
To ensure that Pumas models always simulate the same way in the first part of the time domain even if we expand it with additional times we take another approach than sampling the full dv
vectors at once. Instead, we do something analogous to the following:
julia> Random.seed!(234)
TaskLocalRNG()
julia> a_alt = []
Any[]
julia> b_alt = []
Any[]
julia> for i = 1:2
push!(a_alt, rand())
push!(b_alt, rand())
end
julia> a_alt
2-element Vector{Any}:
0.6393022684352254
0.6035818204037028
julia> b_alt
2-element Vector{Any}:
0.04151237262182306
0.8866568936656358
julia> Random.seed!(234)
TaskLocalRNG()
julia> a_alt = []
Any[]
julia> b_alt = []
Any[]
julia> for i = 1:3
push!(a_alt, rand())
push!(b_alt, rand())
end
julia> a_alt
3-element Vector{Any}:
0.6393022684352254
0.6035818204037028
0.952245838494935
julia> b_alt
3-element Vector{Any}:
0.04151237262182306
0.8866568936656358
0.7773247257534464
While it is beyond the scope of this document to show a full example this is important for the use-case mentioned above where you have an adaptive trial and you want to sequentially run the simobs
up until some time point, check the state of the subjects, add some doses to the subject, and then simobs
again but this time going further to the next visit to the clinic.
Visualizing Simulated Returns
The object returned by simobs
have automatic plotting and DataFrame
conversion. To plot a simulation return, simply call sim_plot
on the output:
sim_plot(sims)
This generates a plot for each variable(s) specified in observations
keyword. For example, sim_plot(m, obs; observations=[:dv1, :dv2])
would only plot the values dv1
and dv2
. In addition, all the specified attributes can be used in this plot
command. For more advanced use of the underlying plotting ecosystem, please see the Makie layout tutorial. Note that if the simulated return is a SimulatedPopulation
, then the plots overlay the results of the various subjects.
To generate the DataFrame
associated with the observed outputs, simply call DataFrame
on the simulated return. For example, the following builds the tabular output from the returned object:
df = DataFrame(sims)
first(df, 5)
Row | id | time | dv | evid | amt | cmt | rate | duration | ss | ii | route | Central | CL | Vc | η_1 | η_2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
String | Float64 | Float64? | Int64 | Float64? | Symbol? | Float64? | Float64? | Int8? | Float64? | NCA.Route? | Float64? | Float64? | Float64? | Float64 | Float64 | |
1 | 1 | 0.0 | missing | 1 | 100.0 | Central | 0.0 | 0.0 | 0 | 0.0 | NullRoute | missing | 2.61489 | 56.5201 | -0.137389 | -0.0597491 |
2 | 1 | 1.0 | 1.93378 | 0 | missing | missing | missing | missing | missing | missing | missing | 95.4789 | 2.61489 | 56.5201 | -0.137389 | -0.0597491 |
3 | 1 | 2.0 | 1.67916 | 0 | missing | missing | missing | missing | missing | missing | missing | 91.1622 | 2.61489 | 56.5201 | -0.137389 | -0.0597491 |
4 | 1 | 3.0 | 1.64777 | 0 | missing | missing | missing | missing | missing | missing | missing | 87.0407 | 2.61489 | 56.5201 | -0.137389 | -0.0597491 |
5 | 1 | 4.0 | 1.66429 | 0 | missing | missing | missing | missing | missing | missing | missing | 83.1055 | 2.61489 | 56.5201 | -0.137389 | -0.0597491 |
Different usage patterns of simobs
Below, we showcase the different usage patterns of simobs
.
Subject level simulation
In this pattern, you can use simobs
to do a single subject simulation by passing a Subject
instead of a Population
:
sims2 = simobs(onecomp, pop[1], ext_param; obstimes = 1:24)
SimulatedObservations
Simulated variables: dv
Time: 1:24
where, onecomp
is the model, pop[1]
represents one subject in the Population
, pop
. The parameters of the model are passed in as the NamedTuple
defined earlier.
Subject level simulation with passing random effects
In this pattern, you can use simobs
to do a single subject simulation by passing an additional positional argument of individual subject random effects represented as vector. The length of the random effect vector should be the same as the number of parameters for each subject:
ext_randeffs = (; η = [0.7, -0.44])
sims3 = simobs(onecomp, pop[1], ext_param, ext_randeffs; obstimes = 1:24)
SimulatedObservations
Simulated variables: dv
Time: 1:24
where, onecomp
is the model, pop[1]
represents one subject in the Population
, pop
. The parameters of the model are passed in as the NamedTuple
defined earlier. ext_randeffs
are the specific values of the η
's that are passed in, which means that simobs
uses these values as opposed to those sampled from the distributions specified in the @random
block.
Subject level simulation with passing in an array of parameters
In this pattern, you can use simobs
to do a single subject simulation by passing a positional argument, an array of parameters that generates a unique simulation solution for each element of the parameter array. E.g. let's simulate pop[1]
with three different combinations of tvcl
and tvvc
. In the example below, we change the value of tvcl
to represent lo
, med
, hi
of 3
, 6
, 9
, but we don't change the other values (Note that are more efficient programmatic ways to generate such arrays) using the syntax below:
ext_param_array = [
(tvcl = 3, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1),
(tvcl = 6, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1),
(tvcl = 9, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1),
]
sims4 = simobs(onecomp, pop[1], ext_param_array; obstimes = 1:24)
Simulated population (Vector{<:Subject})
Simulated subjects: 3
Simulated variables: dv
where, onecomp
is the model, pop[1]
represents one subject in the Population
, pop
. The parameters of the model are passed in as the vector of NamedTuple
s ext_param_array
.
A DataFrame of parameter values can be converted to a NamedTuple
easily using Tables.rowtable
syntax. See the example below.
myparams = DataFrame(;
tvcl = [1, 2, 3],
tvvc = [20, 30, 40],
Ω = [Diagonal([0.04, 0.04]), Diagonal([0.04, 0.04]), Diagonal([0.04, 0.04])],
σ = 0.1,
)
Row | tvcl | tvvc | Ω | σ |
---|---|---|---|---|
Int64 | Int64 | Diagonal… | Float64 | |
1 | 1 | 20 | [0.04 0.0; 0.0 0.04] | 0.1 |
2 | 2 | 30 | [0.04 0.0; 0.0 0.04] | 0.1 |
3 | 3 | 40 | [0.04 0.0; 0.0 0.04] | 0.1 |
This can be converted to a NamedTuple
myparam_tuple = Tables.rowtable(myparams)
3-element Vector{@NamedTuple{tvcl::Int64, tvvc::Int64, Ω::Diagonal{Float64, Vector{Float64}}, σ::Float64}}:
(tvcl = 1, tvvc = 20, Ω = [0.04 0.0; 0.0 0.04], σ = 0.1)
(tvcl = 2, tvvc = 30, Ω = [0.04 0.0; 0.0 0.04], σ = 0.1)
(tvcl = 3, tvvc = 40, Ω = [0.04 0.0; 0.0 0.04], σ = 0.1)
myparam_tuple
is of the same form as the ext_param_array
.
Subject level simulation with passing in an array of parameters and random effects
The two usage patterns above can be combined such that one can pass in an array of parameters and corresponding random effects with the usage pattern below:
ext_randeffs = [(; η = [0.7, -0.44]), (; η = [0.6, -0.9]), (; η = [-0.7, 0.6])]
sims5 = simobs(onecomp, pop[1], ext_param_array, ext_randeffs; obstimes = 1:24)
Simulated population (Vector{<:Subject})
Simulated subjects: 3
Simulated variables: dv
Population level simulations
In this pattern, you can use simobs
to do a whole population simulation by passing a Population
instead of a Subject
:
sims6 = simobs(onecomp, pop, ext_param; obstimes = 1:24)
Simulated population (Vector{<:Subject})
Simulated subjects: 3
Simulated variables: dv
where, onecomp
is the model, pop
is the Population
. The parameters of the model are passed in as the NamedTuple
defined earlier.
Population level simulation passing random effects
In this pattern, you can use simobs
to do a population simulation by passing an additional positional argument of individual subject random effects represented as vector. The length of the random effect vector should be the same as the number of parameters for each subject:
You can use this method to pass in the empirical_bayes
of a FittedPumasModel
.
ext_randeffs = [(; η = rand(2)), (; η = rand(2)), (; η = rand(2))]
sims8 = simobs(onecomp, pop, ext_param, ext_randeffs; obstimes = 1:24)
Simulated population (Vector{<:Subject})
Simulated subjects: 3
Simulated variables: dv
where, onecomp
is the model, pop
represents the Population
, pop
. The parameters of the model are explicitly passed in as the NamedTuple
defined earlier. ext_randeffs
are the specific values of the η
's that are passed in, which means that simobs
uses these values as opposed to those sampled from the distributions specified in the @random
block.
Population level simulation with passing in an array of parameters
The two usage patterns above can be combined such that one can pass in an array of parameters and corresponding random effects.
Here, a population can be simulated with an array of parameters that generates a unique simulation solution for each element of the parameter array. E.g. let's simulate pop
with three different combinations of tvcl
and tvvc
. In the example below, we change the value of tvcl
to represent lo
, med
, hi
of 3, 6, 9, but we don't change the other values (Note that are more efficient programmatic ways to generate such arrays):
ext_param_array = [
(; tvcl = 3, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1),
(; tvcl = 6, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1),
(; tvcl = 9, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1),
]
sims9 = simobs(onecomp, pop, ext_param_array; obstimes = 1:24)
3-element Vector{Vector{Pumas.SimulatedObservations{PumasModel{(tvcl = 1, tvvc = 1, Ω = 2, σ = 1), 2, (:Central,), (:CL, :Vc), ParamSet{@NamedTuple{tvcl::RealDomain{Int64, TransformVariables.Infinity{true}, Int64}, tvvc::RealDomain{Int64, TransformVariables.Infinity{true}, Int64}, Ω::PDiagDomain{PDMats.PDiagMat{Float64, Vector{Float64}}}, σ::RealDomain{Float64, TransformVariables.Infinity{true}, Float64}}}, Main.var"#1#6", Main.var"#2#7", Nothing, Main.var"#3#8", Central1, Pumas.DerivedObj{(:dv,), Main.var"#4#9"}, Main.var"#5#10", Nothing, Pumas.PumasModelOptions}, Subject{@NamedTuple{dv::Vector{Missing}}, Pumas.ConstantCovar{@NamedTuple{}}, DosageRegimen{Float64, Symbol, Float64, Float64, Float64, Float64}, Nothing}, UnitRange{Int64}, @NamedTuple{tvcl::Int64, tvvc::Int64, Ω::Diagonal{Float64, Vector{Float64}}, σ::Float64}, @NamedTuple{η::Vector{Float64}}, @NamedTuple{}, @NamedTuple{callback::Nothing, continuity::Symbol, saveat::UnitRange{Int64}, alg::CompositeAlgorithm{1, Tuple{OrdinaryDiffEq.Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, Rosenbrock23{1, true, LinearSolve.GenericLUFactorization{LinearAlgebra.RowMaximum}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!)}}, OrdinaryDiffEq.AutoSwitch{OrdinaryDiffEq.Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, Rosenbrock23{1, true, LinearSolve.GenericLUFactorization{LinearAlgebra.RowMaximum}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!)}, Rational{Int64}, Int64}}}, Pumas.PKPDAnalyticalSolution{StaticArraysCore.SVector{1, Float64}, 2, Vector{StaticArraysCore.SVector{1, Float64}}, Vector{Float64}, Vector{StaticArraysCore.SVector{1, Float64}}, Vector{StaticArraysCore.SVector{1, Float64}}, Returns{@NamedTuple{CL::Float64, Vc::Float64}}, Pumas.AnalyticalPKPDProblem{StaticArraysCore.SVector{1, Float64}, Float64, false, Central1, Vector{Pumas.Event{Float64, Float64, Float64}}, Vector{Float64}, Returns{@NamedTuple{CL::Float64, Vc::Float64}}}}, @NamedTuple{}, @NamedTuple{CL::Vector{Float64}, Vc::Vector{Float64}}, @NamedTuple{dv::Vector{Float64}}, @NamedTuple{}}}}:
Simulated population (Vector{<:Subject}), n = 3, variables: dv
Simulated population (Vector{<:Subject}), n = 3, variables: dv
Simulated population (Vector{<:Subject}), n = 3, variables: dv
where, onecomp
is the model, pop
represents the Population
, pop
. The parameters of the model are passed in as a vector of the NamedTuple
s ext_param_array
as defined earlier represents the array of parameter values we want to simulate the model at.
Population level simulation with passing in an array of parameters and random effects
The two usage patterns above can be combined such that one can pass in an array of parameters and corresponding random effects with the usage pattern below:
ext_randeffs_array = [(; η = rand(2)) for i = 1:length(ext_param_array)*length(pop)]
sims10 = simobs(onecomp, pop, ext_param_array, ext_randeffs_array; obstimes = 1:24)
3-element Vector{Vector{Pumas.SimulatedObservations{PumasModel{(tvcl = 1, tvvc = 1, Ω = 2, σ = 1), 2, (:Central,), (:CL, :Vc), ParamSet{@NamedTuple{tvcl::RealDomain{Int64, TransformVariables.Infinity{true}, Int64}, tvvc::RealDomain{Int64, TransformVariables.Infinity{true}, Int64}, Ω::PDiagDomain{PDMats.PDiagMat{Float64, Vector{Float64}}}, σ::RealDomain{Float64, TransformVariables.Infinity{true}, Float64}}}, Main.var"#1#6", Main.var"#2#7", Nothing, Main.var"#3#8", Central1, Pumas.DerivedObj{(:dv,), Main.var"#4#9"}, Main.var"#5#10", Nothing, Pumas.PumasModelOptions}, Subject{@NamedTuple{dv::Vector{Missing}}, Pumas.ConstantCovar{@NamedTuple{}}, DosageRegimen{Float64, Symbol, Float64, Float64, Float64, Float64}, Nothing}, UnitRange{Int64}, @NamedTuple{tvcl::Int64, tvvc::Int64, Ω::Diagonal{Float64, Vector{Float64}}, σ::Float64}, @NamedTuple{η::Vector{Float64}}, @NamedTuple{}, @NamedTuple{callback::Nothing, continuity::Symbol, saveat::UnitRange{Int64}, alg::CompositeAlgorithm{1, Tuple{OrdinaryDiffEq.Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, Rosenbrock23{1, true, LinearSolve.GenericLUFactorization{LinearAlgebra.RowMaximum}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!)}}, OrdinaryDiffEq.AutoSwitch{OrdinaryDiffEq.Tsit5{typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!), Static.False}, Rosenbrock23{1, true, LinearSolve.GenericLUFactorization{LinearAlgebra.RowMaximum}, typeof(OrdinaryDiffEq.DEFAULT_PRECS), Val{:forward}, true, nothing, typeof(OrdinaryDiffEq.trivial_limiter!), typeof(OrdinaryDiffEq.trivial_limiter!)}, Rational{Int64}, Int64}}}, Pumas.PKPDAnalyticalSolution{StaticArraysCore.SVector{1, Float64}, 2, Vector{StaticArraysCore.SVector{1, Float64}}, Vector{Float64}, Vector{StaticArraysCore.SVector{1, Float64}}, Vector{StaticArraysCore.SVector{1, Float64}}, Returns{@NamedTuple{CL::Float64, Vc::Float64}}, Pumas.AnalyticalPKPDProblem{StaticArraysCore.SVector{1, Float64}, Float64, false, Central1, Vector{Pumas.Event{Float64, Float64, Float64}}, Vector{Float64}, Returns{@NamedTuple{CL::Float64, Vc::Float64}}}}, @NamedTuple{}, @NamedTuple{CL::Vector{Float64}, Vc::Vector{Float64}}, @NamedTuple{dv::Vector{Float64}}, @NamedTuple{}}}}:
Simulated population (Vector{<:Subject}), n = 3, variables: dv
Simulated population (Vector{<:Subject}), n = 3, variables: dv
Simulated population (Vector{<:Subject}), n = 3, variables: dv
Simulation using a FittedPumasModelInference
The usage pattern is represented below:
Pumas.simobs
— Methodsimobs(pmi::FittedPumasModelInference, population::Population = pmi.fpm.data, randeffs::Union{Nothing, AbstractVector{<:NamedTuple}} = nothing; samples::Int, rng = default_rng(), kwargs...,)
Simulates observations from the fitted model using a truncated multivariate normal distribution for the parameter values when a covariance estimate is available and using non-parametric sampling for simulated inference and bootstrapping. When a covariance estimate is available, the optimal parameter values are used as the mean and the variance-covariance estimate is used as the covariance matrix. Rejection sampling is used to avoid parameter values outside the parameter domains. Each sample uses a different parameter value. samples
is the number of samples to sample. population
is the population of subjects which defaults to the population associated the fitted model. randeffs
can be set to a vector of named tuples, one for each sample. If randeffs
is not specified (the default behaviour), it will be sampled from its distribution.
Any model that has been fitted results in a FittedPumasModelInference
. simobs
can simulate from the inferred object where it utilizes the variance-covariance-matrix of the infer
and the design
of the data that was fitted.
Post-processing simulations
The output of a simobs
call stores the simulated observations but also all the intermediate values computed such as the parameters used, individual coefficients, dose control parameters, covariates, differential equation solution, etc. There are a number of post-processing operations you can do on the simulation output to compute various queries and summary statistics based on the simulations.
The postprocess
function is a powerful tool that allows you to make various queries using the simulation results.
Pumas.postprocess
— Functionpostprocess(obs::SimulatedObservations)
postprocess(func::Function, obs::SimulatedObservations)
postprocess(obs::SimulatedPopulation; stat = identity)
postprocess(func::Function, obs::SimulatedPopulation; stat = identity)
Post-processes the output of simobs
. The output of a simobs
call stores the simulated observations but also all the intermediate values computed such as the parameters used, individual coefficients, dose control parameters, covariates, differential equation solution, etc. There are a number of post-processing operations you can do on the simulation output to compute various queries and summary statistics based on the simulations.
The postprocess
function is a powerful tool that allows you to make various queries using the simulation results. There are multiple ways to use the postprocess
function. The first way to use the postprocess
function is to extract all the information stored in the simulation result in the form of a vector of named tuples. Each named tuple has all the intermediate values evaluated when simulating 1 run. Let sims
be the output of any simobs
operation.
generated = Pumas.postprocess(sims)
generated
is a vector of simulation runs. Hence, generated[i]
is the result from the i
th simulation run. Time-dependent variables are stored as a vector with 1 element for each time point where there is an observation. Alternatively, if obstimes
was set instead when calling simobs
, the time-dependent variables will be evaluated at the time points in obstimes
instead.
The second way to use postprocess
is by passing in a post-processing function. The post-processing function can be used to:
- Transform the simulated quantities, or
- Compare the simulated quantities to the observations.
We use the do
syntax here which is short for passing in a function as the first argument to postprocess
. The do
syntax to pass in a post-processing function is:
Pumas.postprocess(sims) do gen, obs
...
end
where gen
is the named tuple of all generated quantities from 1 simulation run and obs
is the named tuple of observations. For instance to query the ratio of simulated observations dv
that are higher than the observed quantity dv
at the respective time points, you can use:
Pumas.postprocess(sims) do gen, obs
sum(gen.dv .> obs.dv) / length(gen.dv)
end
gen.dv
is the simulated vector of dv
whose length is the same as the number of observation time points. obs.dv
is the observation vector dv
. gen.dv .> obs.dv
returns a vector of true
/false
, with one element for each time point. The sum of this vector gives the number of time points where the simulation was higher than the observation. Dividing by the number of time points gives the ratio. When using postprocess
in this way, the output is always a vector of the query results. In the query function body, you can choose to use only gen
or only obs
but the header must always have both gen
and obs
:
Pumas.postprocess(sims) do gen, obs
...
end
The third way to use the postprocess
function is to compute summary statistics of the simulated quantities or of functions of the simulated quantities. Summary statistics can be computed by passing a statistic function as the stat
keyword argument. For example in order to estimate the probability that a simulated value is higher than an observation, you can use:
Pumas.postprocess(sims, stat = mean) do gen, obs
gen.dv .> obs.dv
end
This function will do 2 things:
- Concatenate the query results (e.g.
gen.dv .> obs.dv
) from all the simulation runs into a single vector. - Compute the mean value of the combined vector.
Alternatively, you can use the mean
function to do the same thing without using the keyword argument. The following will call the postprocess
function under the hood.
mean(sims) do gen, obs
gen.dv .> obs.dv
end
The result of this operation will be a scalar equal to the mean value of the concatenated vector of queries.
In order to get the probability that the simulated quantity is higher than the observation for each time point, you can call the mean
function externally as such:
generated = Pumas.postprocess(sims) do gen, obs
gen.dv .> obs.dv
end
mean(generated)
This returns a vector of probabilities of the same length as the number of time points without concatenating all the queries together.
To compute a summary statistic of all the generated quantity, you can also call:
Pumas.postprocess(sims, stat = mean)
without specifying a post-processing function or for short:
mean(sims)
Besides mean
, you can also use any of the following summary statistic functions in the same way:
std
for element-wise standard deviationvar
for element-wise variancecor
for correlation between multiple quantitiescov
for covariance between multiple quantities
These functions can be passed in as the stat
keyword argument to postprocess
or they can be used in the short form, e.g.:
generated = Pumas.postprocess(sims, stat = std) do gen, obs
...
end
std(sims) do gen, obs
...
end
std(sims)
generated = Pumas.postprocess(sims, stat = var) do gen, obs
...
end
var(sims) do gen, obs
...
end
var(sims)
The cor
and cov
statistics are unique in that they require a post-processing function that outputs a vector. For example, to estimate the correlation between the CL
and Vc
parameters in a 1-compartment model, you can use any of the following:
Pumas.postprocess(s, stat = cor) do gen, obs
[gen.CL[1], gen.Vc[1]]
end
cor(s) do gen, obs
[gen.CL[1], gen.Vc[1]]
end
Note that gen.CL
is a vector of simulated CL
values for all the time points. But since the value is constant across time, we can use the first element gen.CL[1]
only. cov
can be used instead of cor
to compute the covariance matrix. The output of this operation is either a correlation or a covariance matrix.