Simulation of Pumas Models
The simobs Function
Simulation of a PumasModel is performed via the simobs function. The function is given by the values:
Pumas.simobs — Functionsimobs(
model::AbstractPumasModel,
population::Union{Subject, Population}
param,
randeffs=sample_randeffs(model, param, population);
obstimes=nothing,
ensemblealg=EnsembleSerial(),
diffeq_options=NamedTuple(),
rng=Random.default_rng(),
)Simulate random observations from model for population with parameters param at obstimes (by default, use the times of the existing observations for each subject). If no randeffs is provided, then random ones are generated according to the distribution in the model.
Arguments
modelmay either be aPumasModelor aPumasEMModel.populationmay either be aPopulationofSubjects or a singleSubject.paramshould be either a single parameter set, in the form of aNamedTuple, or a vector of such parameter sets. If a vector then each of the parameter sets in that vector will be applied in turn. Example:(; tvCL=1., tvKa=2., tvV=1.)
randeffsis an optional argument that, if used, should specify the random effects for each subject inpopulation. Such random effects are specified byNamedTuples forPumasModels(e.g.(; tvCL=1., tvKa=2.)) and byTuplesforPumasEMModels (e.g.(1., 2.)). Ifpopulationis a singleSubject(without being enclosed in a vector) then a single such random effect specifier should be passed. If, however,populationis aPopulationof multipleSubjects thenrandeffsshould be a vector containing one such specifier for eachSubject. The functionsinit_randeffs,zero_randeffs, andsample_randeffsare sometimes convenient for generatingrandeffs:
randeffs = zero_randeffs(model, param, population)
solve(model, population, param, randeffs)If no randeffs is provided, then random ones are generated according to the distribution in the model.
obstimesis a keyword argument where you can pass a vector of times at which to simulate observations. The default,nothing, ensures the use of the existing observation times for eachSubject.ensemblealgis a keyword argument that allows you to choose between different modes of parallelization. Options includeEnsembleSerial(),EnsembleThreads()andEnsembleDistributed().diffeq_optionsis a keyword argument that takes aNamedTupleof options to pass on to the differential equation solver.rngis a keyword argument where you can specify which random number generator to use.
simobs(model::PumasModel, population::Population; samples::Int = 10, simulate_error = true, rng::AbstractRNG = default_rng())Simulates model predictions for population using the prior distributions for the population and subject-specific parameters. samples is the number of simulated predictions returned per subject. If simulate_error is false, the mean of the predictive distribution's error model will be returned, otherwise a sample from the predictive distribution's error model will be returned for each subject. rng is the pseudo-random number generator used in the sampling.
simobs(model::PumasModel, subject::Subject; samples::Int = 10, simulate_error = true, rng::AbstractRNG = default_rng())Simulates model predictions for subject using the prior distributions for the population and subject-specific parameters. samples is the number of simulated predictions returned. If simulate_error is false, the mean of the predictive distribution's error model will be returned, otherwise a sample from the predictive distribution's error model will be returned for each subject. rng is the pseudo-random number generator used in the sampling.
simobs(b::BayesMCMCResults; subject = nothing, samples::Int = 10, simulate_error = false, rng::AbstractRNG = default_rng())Simulates model predictions using the posterior samples for the population and subject-specific parameters. The predictions are either for the subject with index subject or the whole population if no index is input. samples is the number of simulated predictions returned per subject. If simulate_error is false, the mean of the predictive distribution's error model will be returned, otherwise a sample from the predictive distribution's error model will be returned for each subject. rng is the pseudo-random number generator used in the sampling.
simobs(pmi::FittedPumasModelInference, population::Population = pmi.fpm.data, randeffs::Union{Nothing, AbstractVector{<:NamedTuple}} = nothing; samples::Int, rng = default_rng(), kwargs...,)Simulates observations from the fitted model using a truncated multivariate normal distribution for the parameter values when a covariance estimate is available and using non-parametric sampling for simulated inference and bootstrapping. When a covariance estimate is available, the optimal parameter values are used as the mean and the variance-covariance estimate is used as the covariance matrix. Rejection sampling is used to avoid parameter values outside the parameter domains. Each sample uses a different parameter value. samples is the number of samples to sample. population is the population of subjects which defaults to the population associated the fitted model. randeffs can be set to a vector of named tuples, one for each sample. If randeffs is not specified (the default behaviour), it will be sampled from its distribution.
simobs(fpm::FittedPumasModel, [population::Population,] vcov::AbstractMatrix, randeffs::Union{Nothing, AbstractVector{<:NamedTuple}} = nothing; samples::Int, rng = default_rng(), kwargs...)Simulates observations from the fitted model using a truncated multi-variate normal distribution for the parameter values. The optimal parameter values are used for the mean and the user supplied variance-covariance (vcov) is used as the covariance matrix. Rejection sampling is used to avoid parameter values outside the parameter domains. Each sample uses a different parameter value. samples is the number of samples to sample. population is the population of subjects which defaults to the population associated the fitted model so it's optional to pass. randeffs can be set to a vector of named tuples, one for each sample. If randeffs is not specified (the default behaviour), it will be sampled from its distribution.
Handling Simulated Returns
When running:
sim = simobs(m, data, param)sim is a SimulatedPopulation which a collection of SimulatedObservations, and when indexed like sim[i] it returns the SimulatedObservation of the ith simulation subject. The fields of SimulatedObservation are:
subject: theSubjectused to generate the observationtime: the times associated with the observationsobservations: the resulting observations of the simulationretcode: the status of the simulation process, success or failure.
Here, observations is a NamedTuple where the names give the associated values.
Visualizing Simulated Returns
These objects have automatic plotting and dataframe visualization. To plot a simulation return, simply call sim_plot on the output. For example, the following will run a simulation and plot the observed variables:
sim = simobs(m, data, param)
using PumasUtilities
sim_plot(m, obs; observations=[:dv])This generates a plot for each variable(s) specified in observations keyword. For example, sim_plot(m, obs; observations=[:dv1, :dv2]) would only plot the values dv1 and dv2. In addition, all of the specified attributes can be used in this plot command. For more advanced use of the underlying plotting ecosystem, please see the CairoMakie tutorial. Note that if the simulated return is a SimulatedPopulation, then the plots overlay the results of the various subjects.
To generate the DataFrame associated with the observed outputs, simply call DataFrame on the simulated return. For example, the following builds the tabular output from the returned object:
sim = simobs(m, data, param)
df = DataFrame(sim)Different usage patterns of simobs
Below, we showcase the different usage patterns of simobs. Consider the following model, population and parameters that we will use to simulate from.
onecomp = @model begin
@param begin
tvcl ∈ RealDomain(; lower=0, init=4)
tvvc ∈ RealDomain(; lower=0, init=70)
Ω ∈ PDiagDomain(init=[0.04,0.04])
σ ∈ RealDomain(; lower=0.00001, init=0.1)
end
@random begin
η ~ MvNormal(Ω)
end
@pre begin
CL = tvcl * exp(η[1])
Vc = tvvc * exp(η[2])
end
@dynamics Central1
@derived begin
cp = @. Central/Vc
dv ~ @. Normal(cp, cp*σ)
end
end
model_params = init_params(onecomp)
ext_param = (; tvcl=3, tvvc=60, Ω=Diagonal([0.04, 0.04]), σ=0.1)
regimen = DosageRegimen(100)
pop = map(i -> Subject(; id=i, events=regimen), 1:3)NOTE: In all the examples below, the default simulation time is till 24 hours.
Subject level simulation
In this pattern, you can use simobs to do a single subject simulation using the syntax below.
simobs(model::PumasModel, subject::Subject, param::NamedTuple)Particularly, for this example, it translates to:
sims2 = simobs(onecomp, pop[1], ext_param)where, onecomp is the model, pop[1] represents one subject in the Population, pop. The parameters of the model are passed in as the NamedTuple defined earlier.
Subject level simulation with passing random effects
In this pattern, you can use simobs to do a single subject simulation using the syntax below.
simobs(model::PumasModel, subject::Subject, param::NamedTuple, randeffs::Union{Nothing, NamedTuple})Here individual subject random effects can be passed in as vector. The length of the random effect vector should be the same as the number of parameters for each subject.
ext_randeffs = (; η=[0.7, -0.44])Particularly, for this example, it translates to
sims3 = simobs(onecomp, pop[1], ext_param, ext_randeffs)where, onecomp is the model, pop[1] represents one subject in the Population, pop. The parameters of the model are passed in as the NamedTuple defined earlier. ext_randeffs are the specific values of the η's that are passed in, which means that simobs uses these values as opposed to those sampled from the distributions specified in the @random block.
Subject level simulation with passing in an array of parameters
In this pattern, you can use simobs to do a single subject simulation using the syntax below.
simobs(model::PumasModel, subject::Subject, vparam::AbstractArray{<:NamedTuple,1})Here, a subject can be simulated with an array of parameters that generates a unique simulation solution for each element of the parameter array. E.g. let's simulate pop[1] with three different combinations of tvcl and tvvc. In the example below, we change the value of tvcl to represent lo, med, hi of 3, 6, 9 but we don't change the other values (Note that are more efficient programmatic ways to generate such arrays)
ext_param_array = [(tvcl=3, tvvc=60, Ω=Diagonal([0.04, 0.04]), σ=0.1),
(tvcl=6, tvvc=60, Ω=Diagonal([0.04, 0.04]), σ=0.1),
(tvcl=9, tvvc=60, Ω=Diagonal([0.04, 0.04]), σ=0.1)]Particularly, for this example, it translates to
sims4 = simobs(onecomp, pop[1], ext_param_array)where, onecomp is the model, pop[1] represents one subject in the Population, pop. The parameters of the model are passed in as the vector of NamedTuples ext_param_array.
A DataFrame of parameter values can be converted to a NamedTuple easily using Tables.rowtable syntax. See the example below.
julia> myparams = DataFrame(
tvcl=[1, 2, 3],
tvvc=[20, 30, 40],
Ω=[Diagonal([0.04, 0.04]), Diagonal([0.04, 0.04]), Diagonal([0.04, 0.04])],
σ=0.1,
)
3×4 DataFrame
│ Row │ tvcl │ tvvc │ Ω │ σ │
│ │ Int64 │ Int64 │ Diagonal… │ Float64 │
├─────┼───────┼───────┼──────────────────────┼─────────┤
│ 1 │ 1 │ 20 │ [0.04 0.0; 0.0 0.04] │ 0.1 │
│ 2 │ 2 │ 30 │ [0.04 0.0; 0.0 0.04] │ 0.1 │
│ 3 │ 3 │ 40 │ [0.04 0.0; 0.0 0.04] │ 0.1 │This can be converted to a NamedTuple
julia> using Tables
julia> myparam_tuple = Tables.rowtable(myparams)
3-element Array{NamedTuple{(:tvcl, :tvvc, :Ω, :σ),Tuple{Int64,Int64,Diagonal{Float64,Array{Float64,1}},Float64}},1}:
(tvcl = 1, tvvc = 20, Ω = [0.04 0.0; 0.0 0.04], σ = 0.1)
(tvcl = 2, tvvc = 30, Ω = [0.04 0.0; 0.0 0.04], σ = 0.1)
(tvcl = 3, tvvc = 40, Ω = [0.04 0.0; 0.0 0.04], σ = 0.1)myparam_tuple is of the same form as the ext_param_array.
Subject level simulation with passing in an array of parameters and random effects
The two usage patterns above can be combined such that one can pass in an array of parameters and corresponding random effects with the usage pattern below
simobs(model::PumasModel, subject::Subject, vparam::AbstractArray{<:NamedTuple,1}, vrandeffs::Union{Nothing, AbstractArray{<:NamedTuple,1}})ext_randeffs = [(; η=[0.7, -0.44]),
(; η=[0.6, -0.9]),
(; η=[-0.7, 0.6])]sims5 = simobs(onecomp, pop[1], ext_param_array, ext_randeffs)Population level simulations
Population level simulation with passing in parameters
In this pattern, you can use simobs to do a population simulation using the syntax below.
simobs(model::PumasModel, population::AbstractArray{<:Subject,1}, param::NamedTuple)Particularly, for this example, it translates to
sims7 = simobs(onecomp, pop, ext_param)where, onecomp is the model, pop represents the Population, pop. The parameters of the model are passed in as the NamedTuple defined earlier.
Population level simulation passing random effects
In this pattern, you can use simobs to do a population simulation using the syntax below.
simobs(model::PumasModel, population::AbstractArray{<:Subject,1}, param::NamedTuple, vrandeffs::Union{Nothing, AbstractArray{<:NamedTuple,1}})Here individual subject random effects can be passed in as vector. The length of the random effect vector should be the same as the number of parameters for each subject.
You can use this method to pass in the empirical_bayes() of a FittedPumasModel.
ext_randeffs = [(; η = rand(2)),
(; η = rand(2)),
(; η = rand(2))]Particularly, for this example, it translates to
sims8 = simobs(onecomp, pop, ext_param, ext_randeffs)where, onecomp is the model, pop represents the Population, pop. The parameters of the model are explicitly passed in as the NamedTuple defined earlier. ext_randeffs are the specific values of the η's that are passed in, which means that simobs uses these values as opposed to those sampled from the distributions specified in the @random block.
Population level simulation with passing in an array of parameters
In this pattern, you can use simobs to do a population simulation using the syntax below.
simobs(model::PumasModel, population::AbstractArray{<:Subject,1}, vparam::AbstractArray{<:NamedTuple,1})Here, a population can be simulated with an array of parameters that generates a unique simulation solution for each element of the parameter array. E.g. let's simulate pop with three different combinations of tvcl and tvvc. In the example below, we change the value of tvcl to represent lo, med, hi of 3, 6, 9 but we don't change the other values (Note that are more efficient programmatic ways to generate such arrays)
ext_param_array = [(; tvcl=3, tvvc=60, Ω=Diagonal([0.04, 0.04]), σ=0.1),
(; tvcl=6, tvvc=60, Ω=Diagonal([0.04, 0.04]), σ=0.1),
(; tvcl=9, tvvc=60, Ω=Diagonal([0.04, 0.04]), σ=0.1)]Particularly, for this example, it translates to
sims9 = simobs(onecomp, pop, ext_param_array)where, onecomp is the model, pop represents the Population, pop. The parameters of the model are passed in as a vector of the NamedTuples ext_param_array as defined earlier represents the array of parameter values we want to simulate the model at.
Population level simulation with passing in an array of parameters and random effects
The two usage patterns above can be combined such that one can pass in an array of parameters and corresponding random effects with the usage pattern below
simobs(model::PumasModel, population::AbstractArray{T,1} where T<:Subject, vparam::AbstractArray{T,1} where T, vrandeffs::Union{Nothing, AbstractArray{T,1} where T}, args...; rng, kwargs...)ext_randeffs_array = [(; η=rand(2)) for i in 1:length(ext_param_array)*length(pop)]sims10 = simobs(onecomp, pop, ext_param_array, ext_randeffs_array)Simulation using FittedPumasModel
The usage pattern is represented below
simobs(fpm::Pumas.FittedPumasModel)Any model that has been fitted results in a FittedPumasModel. simobs can simulate from the object where it utilizes the final parameter estimates of the fit and the design of the data that was fitted.
Simulation using a FittedPumasModelInference
The usage pattern is represented below
simobs(fpm::Pumas.FittedPumasModelInference)Any model that has been fitted results in a FittedPumasModelInference. simobs can simulate from the inferred object where it utilizes the variance-covariance-matrix of the infer and the design of the data that was fitted.
Post-processing simulations
The output of a simobs call stores the simulated observations but also all the intermediate values computed such as the parameters used, individual coefficients, dose control parameters, covariates, differential equation solution, etc. There are a number of post-processing operations you can do on the simulation output to compute various queries and summary statistics based on the simulations.
The postprocess function is a powerful tool that allows you to make various queries using the simulation results. There are multiple ways to use the postprocess function. The first way to use the postprocess function is to extract all the information stored in the simulation result in the form of a vector of named tuples. Each named tuple has all the intermediate values evaluated when simulating 1 run. Let sims be the output of any simobs operation.
generated = Pumas.postprocess(sims)generated is a vector of simulation runs. Hence, generated[i] is the result from the ith simulation run. Time-dependent variables are stored as a vector with 1 element for each time point where there is an observation. Alternatively if obstimes was set instead when calling simobs, the time-dependent variables will be evaluated at the time points in obstimes instead.
The second way to use postprocess is by passing in a post-processing function. The post-processing function can be used to:
- Transform the simulated quantities, or
- Compare the simulated quantities to the observations.
We use the do syntax here which is short for passing in a function as the first argument to postprocess. The do syntax to pass in a post-processing function is:
Pumas.postprocess(sims) do gen, obs
...
endwhere gen is the named tuple of all generated quantities from 1 simulation run and obs is the named tuple of observations. For instance to query the ratio of simulated observations dv that are higher than the observed quantity dv at the respective time points, you can use:
Pumas.postprocess(sims) do gen, obs
sum(gen.dv .> obs.dv) / length(gen.dv)
endgen.dv is the simulated vector of dv whose length is the same as the number of observation time points. obs.dv is the observation vector dv. gen.dv .> obs.dv returns a vector of true/false, with one element for each time point. The sum of this vector gives the number of time points where the simulation was higher than the observation. Dividing by the number of time points gives the ratio. When using postprocess in this way, the output is always a vector of the query results. In the query function body, you can choose to use only gen or only obs but the header must always have both gen and obs:
Pumas.postprocess(sims) do gen, obs
...
endThe third way to use the postprocess function is to compute summary statistics of the simulated quantities or of functions of the simulated quantities. Summary statistics can be computed by passing a statistic function as the stat keyword argument. For example in order to estimate the probability that a simulated value is higher than an observation, you can use:
Pumas.postprocess(sims; stat= mean) do gen, obs
gen.dv .> obs.dv
endThis function will do 2 things:
- Concatenate the query results (e.g.
gen.dv .> obs.dv) from all the simulation runs into a single vector. - Compute the mean value of the combined vector.
Alternatively, you can use the mean function to do the same thing without using the keyword argument. The following will call the postprocess function under the hood.
mean(sims) do gen, obs
gen.dv .> obs.dv
endThe result of this operation will be a scalar equal to the mean value of the concatenated vector of queries.
In order to get the probability that the simulated quantity is higher than the observation for each time point, you can call the mean function externally as such:
generated = Pumas.postprocess(sims) do gen, obs
gen.dv .> obs.dv
end
mean(generated)This returns a vector of probabilities of the same length as the number of time points without concatenating all the queries together.
To compute a summary statistic of all the generated quantity, you can also call:
Pumas.postprocess(sims; stat=mean)without specifying a post-processing function or for short:
mean(sims)Beside mean, you can also use any of the following summary statistic functions in the same way:
stdfor element-wise standard deviationvarfor element-wise variancecorfor correlation between multiple quantitiescovfor covariance between multiple quantities
These functions can be passed in as the stat keyword argument to postprocess or they can be used in the short form, e.g.:
generated = Pumas.postprocess(sims; stat=std) do gen, obs
...
end
std(sims) do gen, obs
...
end
std(sims)
generated = Pumas.postprocess(sims; stat= var) do gen, obs
...
end
var(sims) do gen, obs
...
end
var(sims)The cor and cov statistics are unique in that they require a post-processing function which outputs a vector. For example to estimate the correlation between the CL and Vc parameter in a 1 compartment model, you can use any of the following:
Pumas.postprocess(sims; stat=cor) do gen, obs
[gen.CL[1], gen.Vc[1]]
end
cor(s) do gen, obs
[gen.CL[1], gen.Vc[1]]
endNote that gen.CL is a vector of simulated CL values for all the time points. But since the value is constant across time, we can use the first element gen.CL[1] only. cov can be used instead of cor to compute the covariance matrix. The output of this operation is either a correlation or a covariance matrix.