# Model-based optimal design of experiments

In Pumas, users can do model-based optimal design of experiments. Currently, different variants of sample time optimization are supported and we will walk through them in this tutorial.

## Loading the package

The first step is to load the `OptimalDesign`

package using:

`using OptimalDesign`

By loading `OptimalDesign`

, `Pumas`

will be automatically loaded as well.

## Define a model

Once `OptimalDesign`

is loaded, you can define a normal `Pumas`

model as follows.

```
model = @model begin
@param begin
θ ∈ VectorDomain(3, lower=zeros(3), init=ones(3))
Ω ∈ PSDDomain(2)
end
@random η ~ MvNormal(Ω)
@pre begin
Ka = θ[1]
CL = θ[2]*exp(η[1])
Vc = θ[3]*exp(η[2])
end
@dynamics Depots1Central1
@derived begin
cp := @. Central / Vc
dv ~ @. Normal(cp, 1e-2)
end
end
```

## Define a subject

The next step is to define an observation-free subject.

```
subject = Subject(
events=DosageRegimen(100),
observations=(dv = nothing,),
)
```

Note that the `observations`

field should be defined using a `NamedTuple`

whose names match the observed variables' names in the `@derived`

block. The value of the observations can be `nothing`

but any other value can be used and will be ignored.

## Specify the fixed effects

Currently, only local optimal design is available so the user must specify the values of the fixed effects to use.

`params = (θ = [1.5, 1.0, 30.0], Ω = [0.1 0.0; 0.0 0.1])`

## Time window specification

### Date and time API

In `OptimalDesign`

, users can use the `Dates`

standard Julia package to specify the time bounds for optimization using specific dates and times through the `DateTime`

function. Regular floating point number bounds, i.e. `Float64`

, are also supported but the `Dates`

user interface allows a pain-free incoroporation of hospital schedules. If the time bounds are passed in as `Float64`

, the optimal design will be in `Float64`

. And if the time bounds are passed in using `DateTime`

, the optimal design will be in `DateTime`

.

`t0`

and `tend`

below define the start and end times of a time window.

```
t0 = DateTime(2022, 1, 25, 9) # 25 Jan 2022 - 9:00 am
tend = DateTime(2022, 1, 25, 12) # 25 Jan 2022 - 12:00 pm
```

The `DateTime`

function's inputs are of the form: (Year, Month, Day, Hour, Minute, Second, MilliSecond). Dropping arguments from the end assigns their value to 0, e.g. the minute, second and millisecond values in `t0`

and `tend`

above are 0.

### DateTime arithmetic

One can use arithmetic functions to increment dates and times as follows:

- Next second:
`t0 + Second(1)`

- Next minute:
`t0 + Minute(1)`

- Next hour:
`t0 + Hour(1)`

- Next day:
`t0 + Day(1)`

- Next week:
`t0 + Day(7)`

- Next month:
`t0 + Month(1)`

- Next year:
`t0 + Year(1)`

Multiplication with integers and subtraction are also supported.

### Intervals

To define time bounds, `OptimalDesign`

makes use of intervals. One can define a time window that starts at `t0`

and ends at `tend`

using:

`t0..tend`

Arithmetic on intervals is also supported to increment the whole interval by 1 day or 1 hour. However, parentheses must be used carefully. The following is the correct syntax to increment the start and finish bounds by 1 day:

`(t0..tend) + Day(1)`

The following syntax parses differently incrementing only the finish bound by a day.

`t0..tend + Day(1)`

## Decision definition

### Fixed samples per time window

When doing sample time optimization, one way to specify the decision is using a fixed number of samples per time window per subject. The following dictionary can be used to associate a number of samples for each time window using the interval and arithmetic feature presented above:

```
bounds = Dict(
(t0..tend) => 10,
(t0..tend) + Day(1) => 10,
)
```

After defining the time bounds and number of samples per time window, a decision can be constructed using:

```
dec = decision(
model, subject, params,
type = :observation_times,
bounds = bounds,
minimum_offset = Minute(15),
model_time_unit = Hour(1),
)
```

for a single subject or for a whole population using:

```
population = [subject, subject]
dec = decision(
model, population, params,
type = :observation_times,
bounds = bounds,
minimum_offset = Minute(15),
model_time_unit = Hour(1),
)
```

Since models in `Pumas`

are unitless, when `DateTime`

decisions are used, a `model_time_unit`

must be passed in to specify the unit of time assumed in the model definition and dynamics model. The `minimum_offset`

argument can be optionally specified to enforce a minimum duration between any 2 samples. The default value of `minimum_offset`

is `1e-4`

.

Some additional options which can be specified here include:

`rng`

: the random number generator used to generate random initial of sample times. The default value is`Random.GLOBAL_RNG`

.`init`

: The initial sample times. The default value is`nothing`

. If`nothing`

is input, random sample times are used to initialize the optimization. The initial sample times for a subject is a vector of`DateTime`

or`Float64`

. The initial sample times for a population is a vector of vectors of`DateTime`

or`Float64`

.`subject_multiple`

: The number of times each subject is counted. This is a number if only a single subject is passed in the second argument, or a vector if a population is used. The default value is 1 for an individual subject or a vector of ones for a population.`time_multiples`

: a vector of integers used to replicate each sample multiple times. If not specified, each sample will be counted only once.

### Total number of samples for all windows

The second way to specify a sample times decision in `OptimalDesign`

is using a total number of observations to be allocated to different time windows. Mixed integer nonlinear programming will be used to find the optimal design. In this case, `bounds`

should be a vector of intervals:

`bounds = [(t0..tend), (t0..tend) + Day(1)]`

and an additional keyword argument `N`

must be passed to indicate the total number of samples per subject. Example:

```
dec = decision(
model, subject, params,
type = :observation_times,
bounds = bounds, N = 20,
minimum_offset = Minute(5),
model_time_unit = Hour(1),
)
```

Similarly, a population can be passed in as the second argument.

Some additional options which can be specified here include:

`rng`

: the random number generator used to generate random initial of sample times. The default value is`Random.GLOBAL_RNG`

.`init`

: The initial sample times. The default value is`nothing`

. If`nothing`

is input, random sample times are used to initialize the optimization. The initial sample times for a subject is a vector of`DateTime`

or`Float64`

. The initial sample times for a population is a vector of vectors of`DateTime`

or`Float64`

.`subject_multiple`

: The number of times each subject is counted. This is a number if only a single subject is passed in the second argument, or a vector if a population is used. The default value is 1 for an individual subject or a vector of ones for a population.`time_multiples`

: a vector of integers used to replicate each sample multiple times. If not specified, each sample will be counted only once.`minimum_per_window`

: a vector of integers indicating the minimum number of samples to allocate to each time window per subject.`maximum_per_window`

: a vector of integers indicating the maximum number of samples to allocate to each time window per subject.

## Design

In `OptimalDesign`

, a number of types of optimal design are supported. The optimality type can be specified using the keyword argument `optimality`

which can take the following values:

`:aoptimal`

: minimizes the trace of the inverse of the expected information matrix.`:doptimal`

: maximizes the (log) determinant of the expected information matrix. In the optimization, negative the log determinant is minimized instead.`:toptimal`

: maximizes the trace of the expected information matrix. In the optimization, negative the trace is minimized instead.

To perform the sample time optimization using the previously constructed decision `dec`

, run:

`result = design(dec, optimality = :doptimal)`

for example where `:doptimal`

can be replaced with `:aoptimal`

or `:toptimal`

.

Additionally, the following options can be specified using keyword argument:

`verbose`

: can be`true`

or`false`

. If`true`

, the optimization progress will be displayed. The default value is`false`

.`time_limit`

: time limit in seconds, default value is 1200 seconds (20 minutes)`nlp_tol`

: tolerance for the optimization solver`auto_initialize`

: auto-intializes the optimizer to a feasible design, default value is true.`processors`

: number of processors to use in the mixed integer optimizer.

## Query the design

The output of `design`

can be queried using:

`optimaltimes(result)`

: returns the optimal sample times for each subject. This will be a vector of vectors for a population.`initvalue(result)`

: returns the initial objective value.`optimalvalue(result)`

: returns the optimal objective value.`defficiency(result)`

: returns`(det(FIM_2) / det(FIM_1))^(1/M)`

where`FIM_2`

is the Fisher information matrix (FIM) at the optimal design,`FIM_1`

is the FIM at the initial design, and`M`

is the number of model parameters.

## Distributed mixed integer optimization on JuliaHub

If the decision is defined using multiple time windows and a shared total number of samples to be allocated, mixed integer optimization is performed to find the optimal allocation. In mixed integer optimization, you can use multiple processors in a distributed computing environment such as JuliaHub to speed up the computation. For that, you must make sure that `OptimalDesign`

is visible and loaded on all the processors used. To do this:

- Start Julia in an environment with
`OptimalDesign`

available. - Run the following to load
`OptimalDesign`

on all the processors.

```
using Distributed
@everywhere using Pkg
@everywhere pkg"activate ."
@everywhere pkg"instantiate"
using OptimalDesign
@everywhere using OptimalDesign
```

- Define the model, subject(s), parameters and decision as described in the above sections.
- Use the
`processors`

option in`design`

to set the number of processors to use during the mixed integer optimization, e.g.`processors = nprocs() - 1`

where`nprocs()`

returns the total number of processors available.

## Full example

The following is an example of optimal design of sample times for a warfarin model.

```
using OptimalDesign, PharmaDatasets
datapath = dataset("warfarin.csv", String)
df = CSV.read(datapath, DataFrame, missingstrings=[".", "", "NA"])
data = read_pumas(df)
model = @model begin
@param begin
θ₁ ∈ RealDomain(lower=0.0, init=0.15)
θ₂ ∈ RealDomain(lower=0.0, init=8.0)
θ₃ ∈ RealDomain(lower=0.0, init=1.0)
Ω ∈ PSDDomain(3)
σ ∈ RealDomain(lower=0.0001, init=sqrt(0.01))
end
@random begin
η ~ MvNormal(Ω)
end
@pre begin
Tvcl = θ₁
Tvv = θ₂
Tvka = θ₃
CL = Tvcl*exp(η[1])
Vc = Tvv*exp(η[2])
Ka = Tvka*exp(η[3])
end
@dynamics Depots1Central1
@vars begin
conc = Central / Vc
end
@derived begin
dv ~ @. Normal(log(conc), σ)
end
end
params = (
θ₁ = 0.15,
θ₂ = 8.0,
θ₃ = 1.0,
Ω = [0.07 0.0 0.0; 0.0 0.02 0.0; 0.0 0.0 0.6],
σ = sqrt(0.01),
)
# 25 Jan 2021 - 9:00 am
t0 = DateTime(2021, 1, 25, 9)
# 25 Jan 2021 - 9:00 pm
tend = DateTime(2021, 1, 25, 21)
time_windows = [t0..tend]
# Decision variables
Nsubjects = length(data)
dec = decision(
model, data, params,
type = :observation_times,
bounds = time_windows, N = 15,
subject_multiples = fill(3, Nsubjects),
minimum_offset = Minute(30),
model_time_unit = Hour(1),
)
result = design(
dec, optimality = :doptimal,
time_limit = 400.0,
nlp_tol = 1e-4,
verbose = true,
)
optimaltimes(result)
```