# Simulation of Pumas Models

## The `simobs`

Function

Simulation of a `PumasModel`

are performed via the `simobs`

function. The function is given by the values:

`Pumas.simobs`

— Function```
simobs(
model::AbstractPumasModel,
population::Union{Subject, Population}
param,
randeffs=sample_randeffs(model, param, population);
obstimes=nothing,
ensemblealg=EnsembleSerial(),
diffeq_options=NamedTuple(),
rng=Random.default_rng(),
)
```

Simulate random observations from `model`

for `population`

with parameters `param`

at `obstimes`

(by default, use the times of the existing observations for each subject). If no `randeffs`

is provided, then random ones are generated according to the distribution in the model.

**Arguments**

`model`

may either be a`PumasModel`

or a`PumasEMModel`

.`population`

may either be a`Population`

of`Subject`

s or a single`Subject`

.`param`

should be either a single parameter set, in the form of a`NamedTuple`

, or a vector of such parameter sets. If a vector then each of the parameter sets in that vector will be applied in turn. Example:`(; tvCL=1., tvKa=2., tvV=1.)`

`randeffs`

is an optional argument that, if used, should specify the random effects for each subject in`population`

. Such random effects are specified by`NamedTuple`

s for`PumasModels`

(e.g.`(; tvCL=1., tvKa=2.)`

) and by`Tuples`

for`PumasEMModel`

s (e.g.`(1., 2.)`

). If`population`

is a single`Subject`

(without being enclosed in a vector) then a single such random effect specifier should be passed. If, however,`population`

is a`Population`

of multiple`Subject`

s then`randeffs`

should be a vector containing one such specifier for each`Subject`

. The functions`init_randeffs`

,`zero_randeffs`

, and`sample_randeffs`

are sometimes convenient for generating`randeffs`

:

```
randeffs = zero_randeffs(model, param, population)
solve(model, population, param, randeffs)
```

If no `randeffs`

is provided, then random ones are generated according to the distribution in the model.

`obstimes`

is a keyword argument where you can pass a vector of times at which to simulate observations. The default,`nothing`

, ensures the use of the existing observation times for each`Subject`

.`ensemblealg`

is a keyword argument that allows you to choose between different modes of parallelization. Options include`EnsembleSerial()`

,`EnsembleThreads()`

and`EnsembleDistributed()`

.`diffeq_options`

is a keyword argument that takes a`NamedTuple`

of options to pass on to the differential equation solver.`rng`

is a keyword argument where you can specify which random number generator to use.

`simobs(model::PumasModel, population::Population; samples::Int = 10, simulate_error = true, rng::AbstractRNG = default_rng())`

Simulates model predictions for `population`

using the prior distributions for the population and subject-specific parameters. `samples`

is the number of simulated predictions returned per subject. If `simulate_error`

is `false`

, the mean of the predictive distribution's error model will be returned, otherwise a sample from the predictive distribution's error model will be returned for each subject. `rng`

is the pseudo-random number generator used in the sampling.

`simobs(model::PumasModel, subject::Subject; samples::Int = 10, simulate_error = true, rng::AbstractRNG = default_rng())`

Simulates model predictions for `subject`

using the prior distributions for the population and subject-specific parameters. `samples`

is the number of simulated predictions returned. If `simulate_error`

is `false`

, the mean of the predictive distribution's error model will be returned, otherwise a sample from the predictive distribution's error model will be returned for each subject. `rng`

is the pseudo-random number generator used in the sampling.

`simobs(b::BayesMCMCResults; subject = nothing, samples::Int = 10, simulate_error = false, rng::AbstractRNG = default_rng())`

Simulates model predictions using the posterior samples for the population and subject-specific parameters. The predictions are either for the subject with index `subject`

or the whole population if no index is input. `samples`

is the number of simulated predictions returned per subject. If `simulate_error`

is `false`

, the mean of the predictive distribution's error model will be returned, otherwise a sample from the predictive distribution's error model will be returned for each subject. `rng`

is the pseudo-random number generator used in the sampling.

`simobs(pmi::FittedPumasModelInference, population::Population = pmi.fpm.data, randeffs::Union{Nothing, AbstractVector{<:NamedTuple}} = nothing; samples::Int, rng = default_rng(), kwargs...,)`

Simulates observations from the fitted model using a truncated multivariate normal distribution for the parameter values when a covariance estimate is available and using non-parametric sampling for simulated inference and bootstrapping. When a covariance estimate is available, the optimal parameter values are used as the mean and the variance-covariance estimate is used as the covariance matrix. Rejection sampling is used to avoid parameter values outside the parameter domains. Each sample uses a different parameter value. `samples`

is the number of samples to sample. `population`

is the population of subjects which defaults to the population associated the fitted model. `randeffs`

can be set to a vector of named tuples, one for each sample. If `randeffs`

is not specified (the default behaviour), it will be sampled from its distribution.

`simobs(fpm::FittedPumasModel, [population::Population,] vcov::AbstractMatrix, randeffs::Union{Nothing, AbstractVector{<:NamedTuple}} = nothing; samples::Int, rng = default_rng(), kwargs...)`

Simulates observations from the fitted model using a truncated multi-variate normal distribution for the parameter values. The optimal parameter values are used for the mean and the user supplied variance-covariance (`vcov`

) is used as the covariance matrix. Rejection sampling is used to avoid parameter values outside the parameter domains. Each sample uses a different parameter value. `samples`

is the number of samples to sample. `population`

is the population of subjects which defaults to the population associated the fitted model so it's optional to pass. `randeffs`

can be set to a vector of named tuples, one for each sample. If `randeffs`

is not specified (the default behaviour), it will be sampled from its distribution.

## Handling Simulated Returns

When running

`sim = simobs(m, data, param)`

`sim`

is a `SimulatedPopulation`

which a collection of `SimulatedObservation`

s, and when indexed like `sim[i]`

it returns the `SimulatedObservation`

of the `i`

th simulation subject. The fields of `SimulatedObservation`

are:

`subject`

: the`Subject`

used to generate the observation`time`

: the times associated with the observations`observations`

: the resulting observations of the simulation`retcode`

: the status of the simulation process, success or failure.

Here, `observations`

is a `NamedTuple`

where the names give the associated values.

## Visualizing Simulated Returns

These objects have automatic plotting and dataframe visualization. To plot a simulation return, simply call `sim_plot`

on the output. For example, the following will run a simulation and plot the observed variables:

```
sim = simobs(m, data, param)
using PumasUtilities
sim_plot(m, obs, observations = [:dv])
```

This generates a plot for each variable(s) specified in `observations`

keyword. For example, `sim_plot(m, obs, observations = [:dv1, :dv2])`

would only plot the values `dv1`

and `dv2`

. In addition, all of the specified attributes can be used in this `plot`

command. For more advanced use of the underlying plotting ecosystem, please see the CairoMakie tutorial. Note that if the simulated return is a `SimulatedPopulation`

, then the plots overlay the results of the various subjects.

To generate the DataFrame associated with the observed outputs, simply call `DataFrame`

on the simulated return. For example, the following builds the tabular output from the returned object:

```
sim = simobs(m, data, param)
df = DataFrame(sim)
```

## Different usage patterns of `simobs`

Below, we showcase the different usage patterns of `simobs`

. Consider the following model, population and parameters that we will use to simulate from.

```
onecomp = @model begin
@param begin
tvcl ∈ RealDomain(lower=0, init=4)
tvvc ∈ RealDomain(lower=0, init=70)
Ω ∈ PDiagDomain(init=[0.04,0.04])
σ ∈ RealDomain(lower=0.00001, init = 0.1)
end
@random begin
η ~ MvNormal(Ω)
end
@pre begin
CL = tvcl * exp(η[1])
Vc = tvvc * exp(η[2])
end
@dynamics Central1
@derived begin
cp = @. Central/Vc
dv ~ @. Normal(cp, cp*σ)
end
end
model_params = init_params(onecomp)
ext_param = (tvcl = 3, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1)
regimen = DosageRegimen(100)
pop = map(i -> Subject(id=i, events = regimen), 1:3)
```

NOTE: In all the examples below, the default simulation time is till 24 hours.

### Subject level simulation

In this pattern, you can use `simobs`

to do a single subject simulation using the syntax below.

`simobs(model::PumasModel, subject::Subject, param::NamedTuple)`

Particularly, for this example, it translates to

`sims2 = simobs(onecomp, pop[1], ext_param)`

where, `onecomp`

is the model, `pop[1]`

represents one subject in the `Population`

, `pop`

. The parameters of the model are passed in as the `NamedTuple`

defined earlier.

### Subject level simulation with passing random effects

In this pattern, you can use `simobs`

to do a single subject simulation using the syntax below.

`simobs(model::PumasModel, subject::Subject, param::NamedTuple, randeffs::Union{Nothing, NamedTuple})`

Here individual subject random effects can be passed in as vector. The length of the random effect vector should be the same as the number of parameters for each subject.

`ext_randeffs = (η = [0.7, -0.44],)`

Particularly, for this example, it translates to

`sims3 = simobs(onecomp, pop[1], ext_param, ext_randeffs)`

where, `onecomp`

is the model, `pop[1]`

represents one subject in the `Population`

, `pop`

. The parameters of the model are passed in as the `NamedTuple`

defined earlier. `ext_randeffs`

are the specific values of the `η`

's that are passed in, which means that `simobs`

uses these values as opposed to those sampled from the distributions specified in the `@random`

block.

### Subject level simulation with passing in an array of parameters

In this pattern, you can use `simobs`

to do a single subject simulation using the syntax below.

`simobs(model::PumasModel, subject::Subject, vparam::AbstractArray{<:NamedTuple,1})`

Here, a subject can be simulated with an array of parameters that generates a unique simulation solution for each element of the parameter array. E.g. let's simulate `pop[1]`

with three different combinations of `tvcl`

and `tvvc`

. In the example below, we change the value of `tvcl`

to represent `lo`

, `med`

, `hi`

of 3, 6, 9 but we don't change the other values (Note that are more efficient programmatic ways to generate such arrays)

```
ext_param_array = [(tvcl = 3, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1),
(tvcl = 6, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1),
(tvcl = 9, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1)]
```

Particularly, for this example, it translates to

`sims4 = simobs(onecomp, pop[1], ext_param_array)`

where, `onecomp`

is the model, `pop[1]`

represents one subject in the `Population`

, `pop`

. The parameters of the model are passed in as the vector of `NamedTuple`

s `ext_param_array`

.

A DataFrame of parameter values can be converted to a NamedTuple easily using `Tables.rowtable`

syntax. See the example below

```
julia> myparams = DataFrame(
tvcl = [1, 2, 3],
tvvc =[20, 30, 40],
Ω = [Diagonal([0.04, 0.04]), Diagonal([0.04, 0.04]), Diagonal([0.04, 0.04])],
σ = 0.1)
3×4 DataFrame
│ Row │ tvcl │ tvvc │ Ω │ σ │
│ │ Int64 │ Int64 │ Diagonal… │ Float64 │
├─────┼───────┼───────┼──────────────────────┼─────────┤
│ 1 │ 1 │ 20 │ [0.04 0.0; 0.0 0.04] │ 0.1 │
│ 2 │ 2 │ 30 │ [0.04 0.0; 0.0 0.04] │ 0.1 │
│ 3 │ 3 │ 40 │ [0.04 0.0; 0.0 0.04] │ 0.1 │
```

This can be converted to a `NamedTuple`

```
julia> using Tables
julia> myparam_tuple = Tables.rowtable(myparams)
3-element Array{NamedTuple{(:tvcl, :tvvc, :Ω, :σ),Tuple{Int64,Int64,Diagonal{Float64,Array{Float64,1}},Float64}},1}:
(tvcl = 1, tvvc = 20, Ω = [0.04 0.0; 0.0 0.04], σ = 0.1)
(tvcl = 2, tvvc = 30, Ω = [0.04 0.0; 0.0 0.04], σ = 0.1)
(tvcl = 3, tvvc = 40, Ω = [0.04 0.0; 0.0 0.04], σ = 0.1)
```

`myparam_tuple`

is of the same form as the `ext_param_array`

.

### Subject level simulation with passing in an array of parameters and random effects

The two usage patterns above can be combined such that one can pass in an array of parameters and corresponding random effects with the usage pattern below

`simobs(model::PumasModel, subject::Subject, vparam::AbstractArray{<:NamedTuple,1}, vrandeffs::Union{Nothing, AbstractArray{<:NamedTuple,1}})`

```
ext_randeffs = [(η = [0.7, -0.44],),
(η = [0.6, -0.9],),
(η = [-0.7, 0.6],)]
```

`sims5 = simobs(onecomp, pop[1], ext_param_array, ext_randeffs)`

### Population level simulation

### Population level simulation with passing in parameters

In this pattern, you can use `simobs`

to do a population simulation using the syntax below.

`simobs(model::PumasModel, population::AbstractArray{<:Subject,1}, param::NamedTuple)`

Particularly, for this example, it translates to

`sims7 = simobs(onecomp, pop, ext_param)`

where, `onecomp`

is the model, `pop`

represents the `Population`

, `pop`

. The parameters of the model are passed in as the `NamedTuple`

defined earlier.

### Population level simulation passing random effects

In this pattern, you can use `simobs`

to do a population simulation using the syntax below.

`simobs(model::PumasModel, population::AbstractArray{<:Subject,1}, param::NamedTuple, vrandeffs::Union{Nothing, AbstractArray{<:NamedTuple,1}})`

Here individual subject random effects can be passed in as vector. The length of the random effect vector should be the same as the number of parameters for each subject.

You can use this method to pass in the `empirical_bayes()`

of a `FittedPumasModel`

```
ext_randeffs = [(η = rand(2),),
(η = rand(2),),
(η = rand(2),)]
```

Particularly, for this example, it translates to

`sims8 = simobs(onecomp, pop, ext_param, ext_randeffs)`

where, `onecomp`

is the model, `pop`

represents the `Population`

, `pop`

. The parameters of the model are explicitly passed in as the `NamedTuple`

defined earlier. `ext_randeffs`

are the specific values of the `η`

's that are passed in, which means that `simobs`

uses these values as opposed to those sampled from the distributions specified in the `@random`

block.

### Population level simulation with passing in an array of parameters

In this pattern, you can use `simobs`

to do a population simulation using the syntax below.

`simobs(model::PumasModel, population::AbstractArray{<:Subject,1}, vparam::AbstractArray{<:NamedTuple,1})`

Here, a population can be simulated with an array of parameters that generates a unique simulation solution for each element of the parameter array. E.g. let's simulate `pop`

with three different combinations of `tvcl`

and `tvvc`

. In the example below, we change the value of `tvcl`

to represent `lo`

, `med`

, `hi`

of 3, 6, 9 but we don't change the other values (Note that are more efficient programmatic ways to generate such arrays)

```
ext_param_array = [(tvcl = 3, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1),
(tvcl = 6, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1),
(tvcl = 9, tvvc = 60, Ω = Diagonal([0.04, 0.04]), σ = 0.1)]
```

Particularly, for this example, it translates to

`sims9 = simobs(onecomp, pop, ext_param_array)`

where, `onecomp`

is the model, `pop`

represents the `Population`

, `pop`

. The parameters of the model are passed in as a vector of the `NamedTuple`

s `ext_param_array`

as defined earlier represents the array of parameter values we want to simulate the model at.

### Population level simulation with passing in an array of parameters and random effects

The two usage patterns above can be combined such that one can pass in an array of parameters and corresponding random effects with the usage pattern below

`simobs(model::PumasModel, population::AbstractArray{T,1} where T<:Subject, vparam::AbstractArray{T,1} where T, vrandeffs::Union{Nothing, AbstractArray{T,1} where T}, args...; rng, kwargs...)`

`ext_randeffs_array = [(η = rand(2),) for i in 1:length(ext_param_array)*length(pop)]`

`sims10 = simobs(onecomp, pop, ext_param_array, ext_randeffs_array)`

### Simulation using `FittedPumasModel`

The usage pattern is represented below

`simobs(fpm::Pumas.FittedPumasModel)`

Any model that has been *fitted* results in a `FittedPumasModel`

. `simobs`

can simulate from the object where it utilizes the final parameter estimates of the `fit`

and the `design`

of the data that was *fitted*.

### Simulation using a `FittedPumasModelInference`

The usage pattern is represented below

`simobs(fpm::Pumas.FittedPumasModelInference)`

Any model that has been *fitted* results in a `FittedPumasModelInference`

. `simobs`

can simulate from the inferred object where it utilizes the *variance-covariance-matrix* of the `infer`

and the `design`

of the data that was *fitted*.

## Post-processing simulations

The output of a `simobs`

call stores the simulated observations but also all the intermediate values computed such as the parameters used, individual coefficients, dose control parameters, covariates, differential equation solution, etc. There are a number of post-processing operations you can do on the simulation output to compute various queries and summary statistics based on the simulations.

The `postprocess`

function is a powerful tool that allows you to make various queries using the simulation results. There are multiple ways to use the `postprocess`

function. The first way to use the `postprocess`

function is to extract all the information stored in the simulation result in the form of a vector of named tuples. Each named tuple has all the intermediate values evaluated when simulating 1 run. Let `sims`

be the output of any `simobs`

operation.

`generated = Pumas.postprocess(sims)`

`generated`

is a vector of simulation runs. Hence, `generated[i]`

is the result from the `i`

th simulation run. Time-dependent variables are stored as a vector with 1 element for each time point where there is an observation. Alternatively if `obstimes`

was set instead when calling `simobs`

, the time-dependent variables will be evaluated at the time points in `obstimes`

instead.

The second way to use `postprocess`

is by passing in a post-processing function. The post-processing function can be used to:

- Transform the simulated quantities, or
- Compare the simulated quantities to the observations.

We use the `do`

syntax here which is short for passing in a function as the first argument to `postprocess`

. The `do`

syntax to pass in a post-processing function is:

```
Pumas.postprocess(sims) do gen, obs
...
end
```

where `gen`

is the named tuple of all generated quantities from 1 simulation run and `obs`

is the named tuple of observations. For instance to query the ratio of simulated observations `dv`

that are higher than the observed quantity `dv`

at the respective time points, you can use:

```
Pumas.postprocess(sims) do gen, obs
sum(gen.dv .> obs.dv) / length(gen.dv)
end
```

`gen.dv`

is the simulated vector of `dv`

whose length is the same as the number of observation time points. `obs.dv`

is the observation vector `dv`

. `gen.dv .> obs.dv`

returns a vector of `true`

/`false`

, with one element for each time point. The sum of this vector gives the number of time points where the simulation was higher than the observation. Dividing by the number of time points gives the ratio. When using `postprocess`

in this way, the output is always a vector of the query results. In the query function body, you can choose to use only `gen`

or only `obs`

but the header must always have both `gen`

and `obs`

:

```
Pumas.postprocess(sims) do gen, obs
...
end
```

The third way to use the `postprocess`

function is to compute summary statistics of the simulated quantities or of functions of the simulated quantities. Summary statistics can be computed by passing a statistic function as the `stat`

keyword argument. For example in order to estimate the probability that a simulated value is higher than an observation, you can use:

```
Pumas.postprocess(sims, stat = mean) do gen, obs
gen.dv .> obs.dv
end
```

This function will do 2 things:

- Concatenate the query results (e.g.
`gen.dv .> obs.dv`

) from all the simulation runs into a single vector. - Compute the mean value of the combined vector.

Alternatively, you can use the `mean`

function to do the same thing without using the keyword argument. The following will call the `postprocess`

function under the hood.

```
mean(sims) do gen, obs
gen.dv .> obs.dv
end
```

The result of this operation will be a scalar equal to the mean value of the concatenated vector of queries.

In order to get the probability that the simulated quantity is higher than the observation for each time point, you can call the `mean`

function externally as such:

```
generated = Pumas.postprocess(sims) do gen, obs
gen.dv .> obs.dv
end
mean(generated)
```

This returns a vector of probabilities of the same length as the number of time points without concatenating all the queries together.

To compute a summary statistic of all the generated quantity, you can also call:

`Pumas.postprocess(sims, stat = mean)`

without specifying a post-processing function or for short:

`mean(sims)`

Beside `mean`

, you can also use any of the following summary statistic functions in the same way:

`std`

for element-wise standard deviation`var`

for element-wise variance`cor`

for correlation between multiple quantities`cov`

for covariance between multiple quantities

These functions can be passed in as the `stat`

keyword argument to `postprocess`

or they can be used in the short form, e.g.:

```
generated = Pumas.postprocess(sims, stat = std) do gen, obs
...
end
std(sims) do gen, obs
...
end
std(sims)
generated = Pumas.postprocess(sims, stat = var) do gen, obs
...
end
var(sims) do gen, obs
...
end
var(sims)
```

The `cor`

and `cov`

statistics are unique in that they require a post-processing function which outputs a vector. For example to estimate the correlation between the `CL`

and `Vc`

parameter in a 1 compartment model, you can use any of the following:

```
Pumas.postprocess(s, stat = cor) do gen, obs
[gen.CL[1], gen.Vc[1]]
end
cor(s) do gen, obs
[gen.CL[1], gen.Vc[1]]
end
```

Note that `gen.CL`

is a vector of simulated `CL`

values for all the time points. But since the value is constant across time, we can use the first element `gen.CL[1]`

only. `cov`

can be used instead of `cor`

to compute the covariance matrix. The output of this operation is either a correlation or a covariance matrix.