# Model-based optimal design of experiments

In Pumas, users can do model-based optimal design of experiments. Currently, different variants of sample time optimization are supported and we will walk through them in this tutorial.

The first step is to load the OptimalDesign package using:

using OptimalDesign

By loading OptimalDesign, Pumas will be automatically loaded as well.

## Define a model

Once OptimalDesign is loaded, you can define a normal Pumas model as follows.

model = @model begin
@param begin
θ ∈ VectorDomain(3, lower=zeros(3), init=ones(3))
Ω ∈ PSDDomain(2)
end
@random η ~ MvNormal(Ω)
@pre begin
Ka = θ
CL = θ*exp(η)
Vc = θ*exp(η)
end
@dynamics Depots1Central1
@derived begin
cp := @. Central / Vc
dv ~ @. Normal(cp, 1e-2)
end
end

## Define a subject

The next step is to define an observation-free subject.

subject = Subject(
events=DosageRegimen(100),
observations=(dv = nothing,),
)

Note that the observations field should be defined using a NamedTuple whose names match the observed variables' names in the @derived block. The value of the observations can be nothing but any other value can be used and will be ignored.

## Specify the fixed effects

Currently, only local optimal design is available so the user must specify the values of the fixed effects to use.

params = (θ = [1.5, 1.0, 30.0], Ω = [0.1 0.0; 0.0 0.1])

## Time window specification

### Date and time API

In OptimalDesign, users can use the Dates standard Julia package to specify the time bounds for optimization using specific dates and times through the DateTime function. Regular floating point number bounds, i.e. Float64, are also supported but the Dates user interface allows a pain-free incoroporation of hospital schedules. If the time bounds are passed in as Float64, the optimal design will be in Float64. And if the time bounds are passed in using DateTime, the optimal design will be in DateTime.

t0 and tend below define the start and end times of a time window.

t0 = DateTime(2022, 1, 25, 9) # 25 Jan 2022 - 9:00 am
tend = DateTime(2022, 1, 25, 12) # 25 Jan 2022 - 12:00 pm

The DateTime function's inputs are of the form: (Year, Month, Day, Hour, Minute, Second, MilliSecond). Dropping arguments from the end assigns their value to 0, e.g. the minute, second and millisecond values in t0 and tend above are 0.

### DateTime arithmetic

One can use arithmetic functions to increment dates and times as follows:

• Next second: t0 + Second(1)
• Next minute: t0 + Minute(1)
• Next hour: t0 + Hour(1)
• Next day: t0 + Day(1)
• Next week: t0 + Day(7)
• Next month: t0 + Month(1)
• Next year: t0 + Year(1)

Multiplication with integers and subtraction are also supported.

### Intervals

To define time bounds, OptimalDesign makes use of intervals. One can define a time window that starts at t0 and ends at tend using:

t0..tend

Arithmetic on intervals is also supported to increment the whole interval by 1 day or 1 hour. However, parentheses must be used carefully. The following is the correct syntax to increment the start and finish bounds by 1 day:

(t0..tend) + Day(1)

The following syntax parses differently incrementing only the finish bound by a day.

t0..tend + Day(1)

## Decision definition

### Fixed samples per time window

When doing sample time optimization, one way to specify the decision is using a fixed number of samples per time window per subject. The following dictionary can be used to associate a number of samples for each time window using the interval and arithmetic feature presented above:

bounds = Dict(
(t0..tend) => 10,
(t0..tend) + Day(1) => 10,
)

After defining the time bounds and number of samples per time window, a decision can be constructed using:

dec = decision(
model, subject, params,
type = :observation_times,
bounds = bounds,
minimum_offset = Minute(15),
model_time_unit = Hour(1),
)

for a single subject or for a whole population using:

population = [subject, subject]
dec = decision(
model, population, params,
type = :observation_times,
bounds = bounds,
minimum_offset = Minute(15),
model_time_unit = Hour(1),
)

Since models in Pumas are unitless, when DateTime decisions are used, a model_time_unit must be passed in to specify the unit of time assumed in the model definition and dynamics model. The minimum_offset argument can be optionally specified to enforce a minimum duration between any 2 samples. The default value of minimum_offset is 1e-4.

Some additional options which can be specified here include:

• rng: the random number generator used to generate random initial of sample times. The default value is Random.GLOBAL_RNG.
• init: The initial sample times. The default value is nothing. If nothing is input, random sample times are used to initialize the optimization. The initial sample times for a subject is a vector of DateTime or Float64. The initial sample times for a population is a vector of vectors of DateTime or Float64.
• subject_multiple: The number of times each subject is counted. This is a number if only a single subject is passed in the second argument, or a vector if a population is used. The default value is 1 for an individual subject or a vector of ones for a population.
• time_multiples: a vector of integers used to replicate each sample multiple times. If not specified, each sample will be counted only once.

### Total number of samples for all windows

The second way to specify a sample times decision in OptimalDesign is using a total number of observations to be allocated to different time windows. Mixed integer nonlinear programming will be used to find the optimal design. In this case, bounds should be a vector of intervals:

bounds = [(t0..tend), (t0..tend) + Day(1)]

and an additional keyword argument N must be passed to indicate the total number of samples per subject. Example:

dec = decision(
model, subject, params,
type = :observation_times,
bounds = bounds, N = 20,
minimum_offset = Minute(5),
model_time_unit = Hour(1),
)

Similarly, a population can be passed in as the second argument.

Some additional options which can be specified here include:

• rng: the random number generator used to generate random initial of sample times. The default value is Random.GLOBAL_RNG.
• init: The initial sample times. The default value is nothing. If nothing is input, random sample times are used to initialize the optimization. The initial sample times for a subject is a vector of DateTime or Float64. The initial sample times for a population is a vector of vectors of DateTime or Float64.
• subject_multiple: The number of times each subject is counted. This is a number if only a single subject is passed in the second argument, or a vector if a population is used. The default value is 1 for an individual subject or a vector of ones for a population.
• time_multiples: a vector of integers used to replicate each sample multiple times. If not specified, each sample will be counted only once.
• minimum_per_window: a vector of integers indicating the minimum number of samples to allocate to each time window per subject.
• maximum_per_window: a vector of integers indicating the maximum number of samples to allocate to each time window per subject.

## Design

In OptimalDesign, a number of types of optimal design are supported. The optimality type can be specified using the keyword argument optimality which can take the following values:

• :aoptimal: minimizes the trace of the inverse of the expected information matrix.
• :doptimal: maximizes the (log) determinant of the expected information matrix. In the optimization, negative the log determinant is minimized instead.
• :toptimal: maximizes the trace of the expected information matrix. In the optimization, negative the trace is minimized instead.

To perform the sample time optimization using the previously constructed decision dec, run:

result = design(dec, optimality = :doptimal)

for example where :doptimal can be replaced with :aoptimal or :toptimal.

Additionally, the following options can be specified using keyword argument:

• verbose: can be true or false. If true, the optimization progress will be displayed. The default value is false.
• time_limit: time limit in seconds, default value is 1200 seconds (20 minutes)
• nlp_tol: tolerance for the optimization solver
• auto_initialize: auto-intializes the optimizer to a feasible design, default value is true.
• processors: number of processors to use in the mixed integer optimizer.

## Query the design

The output of design can be queried using:

• optimaltimes(result): returns the optimal sample times for each subject. This will be a vector of vectors for a population.
• initvalue(result): returns the initial objective value.
• optimalvalue(result): returns the optimal objective value.
• defficiency(result): returns (det(FIM_2) / det(FIM_1))^(1/M) where FIM_2 is the Fisher information matrix (FIM) at the optimal design, FIM_1 is the FIM at the initial design, and M is the number of model parameters.

## Distributed mixed integer optimization on JuliaHub

If the decision is defined using multiple time windows and a shared total number of samples to be allocated, mixed integer optimization is performed to find the optimal allocation. In mixed integer optimization, you can use multiple processors in a distributed computing environment such as JuliaHub to speed up the computation. For that, you must make sure that OptimalDesign is visible and loaded on all the processors used. To do this:

1. Start Julia in an environment with OptimalDesign available.
2. Run the following to load OptimalDesign on all the processors.
using Distributed
@everywhere using Pkg
@everywhere pkg"activate ."
@everywhere pkg"instantiate"
using OptimalDesign
@everywhere using OptimalDesign
1. Define the model, subject(s), parameters and decision as described in the above sections.
2. Use the processors option in design to set the number of processors to use during the mixed integer optimization, e.g. processors = nprocs() - 1 where nprocs() returns the total number of processors available.

## Full example

The following is an example of optimal design of sample times for a warfarin model.

using OptimalDesign, PharmaDatasets

datapath = dataset("warfarin.csv", String)
df = CSV.read(datapath, DataFrame, missingstrings=[".", "", "NA"])

model = @model begin
@param begin
θ₁ ∈ RealDomain(lower=0.0, init=0.15)
θ₂ ∈ RealDomain(lower=0.0, init=8.0)
θ₃ ∈ RealDomain(lower=0.0, init=1.0)
Ω  ∈ PSDDomain(3)
σ  ∈ RealDomain(lower=0.0001, init=sqrt(0.01))
end

@random begin
η ~ MvNormal(Ω)
end

@pre begin
Tvcl = θ₁
Tvv  = θ₂
Tvka = θ₃
CL   = Tvcl*exp(η)
Vc   = Tvv*exp(η)
Ka   = Tvka*exp(η)
end

@dynamics Depots1Central1

@vars begin
conc = Central / Vc
end

@derived begin
dv ~ @. Normal(log(conc), σ)
end
end

params = (
θ₁ = 0.15,
θ₂ = 8.0,
θ₃ = 1.0,
Ω  = [0.07 0.0 0.0; 0.0 0.02 0.0; 0.0 0.0 0.6],
σ  = sqrt(0.01),
)

# 25 Jan 2021 - 9:00 am
t0 = DateTime(2021, 1, 25, 9)
# 25 Jan 2021 - 9:00 pm
tend = DateTime(2021, 1, 25, 21)

time_windows = [t0..tend]

# Decision variables
Nsubjects = length(data)
dec = decision(
model, data, params,
type = :observation_times,
bounds = time_windows, N = 15,
subject_multiples = fill(3, Nsubjects),
minimum_offset = Minute(30),
model_time_unit = Hour(1),
)

result = design(
dec, optimality = :doptimal,
time_limit = 400.0,
nlp_tol = 1e-4,
verbose = true,
)

optimaltimes(result)