Introduction
In the pharmaceutical industry, there are multiple reasons to avoid collecting excessive data when developing drugs, such as:
- Collecting more samples from the same individual is inconvenient for the patients, who may have to stay in the hospital for longer or visit more frequently, making it more difficult to recruit individuals in studies.
- Taking more samples than necessary from pediatric patients, who have small veins, is risky and unethical.
- Increasing costs to collect more data, either by recruiting more individuals or by keeping patients in the clinic for longer to collect more samples.
- Requiring more nursing time, hospital bed time, and lab testing time to collect and analyze the samples.
- Slowing down clinical trials and time-to-market, especially for rare diseases, for which recruiting more individuals in studies can take years.
To avoid these problems, model-based optimal design of experiments [11] can be used to design more efficient studies, minimizing the number of subjects and of samples per subject; and maximizing the utility of the data collected, which afterwards is used to fit a pharmacometric model. In general, a better experiment is one that enables higher confidence on the estimation of model parameters or derived quantities. In other words, the goal is to maximize the information learned about the main quantities of interest in the model. One commonly used measure of information in optimal design is the Fisher information matrix (FIM). OptimalDesign.jl provides a multitude of procedures and utilities to perform optimal design of experiments in pharmacometrics.
Tutorials
- Introduction to Optimal Design: covers mathematical background and basic usage of OptimalDesign.jl.
Docstrings
OptimalDesign.design
— Functiondesign(
decisions::Union{AbstractVector{<:ObsTimesDecision},Tuple{ObsTimesDecision}};
optimality::Symbol = :doptimal,
param_weights::Union{Nothing,Vector} = nothing,
inv_fim_reducer::Union{Nothing,AbstractMatrix} = nothing,
time_limit,
nlp_tol,
processors,
verbose
) -> OptimalSchedule
Optimize the designs in decisions
, obtaining optimal sample times. Returns an OptimalSchedule
object, which includes initial and optimal designs, FIMs, objective values and more. optimality
defines the objective type, and can be set as:
:toptimal
: maximize the trace of the FIM.:aoptimal
: minimize the trace of the inverse of the FIM.:doptimal
: maximize the (log) determinant of the FIM.
Possible keyword arguments:
param_weights
(default:nothing
): optional weights vector to assign different weights on the parameters when the optimality criteria is either:toptimal
or:aoptimal
.inv_fim_reducer
(default:nothing
): an optional matrix used to compute the reduced inverse FIM asinv_fim_reducer * inv(FIM) * inv_fim_reducer'
. This is useful when minimizing the uncertainty wrt derived quantities in the model. Passinginv_fim_reducer
as the Jacobian of the derived quantities wrt the model parameters modifies the objective function value to maximize the information learned about the derived quantities rather than the model parameters themselves. Wheninv_fim_reducer
is notnothing
, theparam_weights
keyword argument will refer to the weights of the derived quantities, rather than the model parameters.time_limit
: When prespecifying the distribution of samples across windows through a ("bounds") dictionary, this sets a time limit for the optimization. Default is 1200s.solutions
: When letting the solver also optimize the distribution of samples across windows, this limits the number of attempts by the solver. Default is unlimited.nlp_tol
: optimization convergence tolerance. Default is 1e-5.processors
: number of processors used in distributed optimization. Only used when a fixed number of samples per window is not specified.max_iter
: maximum number of iterations allowed for the optimization. Default is 1000.
The keyword argument verbose
can be set to true
to display statistics about the optimization progress. The output includes the information below as columns for every iteration. Further details are included in Ipopt's documentation.
iter
: iteration index.objective
: current objective value.inf_pr
: current infeasibility in primal optimization.inf_du
: current infeasibility in dual optimization.lg(mu)
: log<sub>10</sub> of the value of the internal barrier parametermu
.||d||
: greatest absolute value of component of step in primal problem.lg(rg)
: log<sub>10</sub> of the value of the regularization term for the Hessian of the Lagrangian. A dash indicates no regularization.alpha_du
: stepsize for the dual variables.alpha_pr
: stepsize for the primal variables.ls
: number of backtracking line search steps.
OptimalDesign.decision
— Functiondecision(
model::Pumas.PumasModel,
data::Union{Pumas.Subject,Pumas.Population},
param;
type = :observation_times,
init,
time_multiples,
subject_multiples,
minimum_offset,
model_time_unit,
bounds,
N,
minimum_per_window,
maximum_per_window
) -> ObsTimesDecision
Define a decision
object for subject or population data
. A decision
represents a design yet to be optimized, including a model
and parameters param
. General keyword arguments:
init
: optional input to initialize thedecision
with specific sample times. For a single subject, provide a vector of time instants. For a population, provide a vector containing a vector of time instants for each subject. If this keyword argument is not used, thedecision
is initialized with random sample times within the time windows.time_multiples
: a vector of integers used to replicate each sample multiple times. If not specified, each sample will be counted only once. This is useful when multiple sub-samples can be taken from each sample giving multiple measurements at the same sample time.subject_multiples
: optional information to specify how many times (i.e. the group size) each unique subject should be counted when computing the expected FIM. This implies repeated subjects share the same covariates and dosing regimens.minimum_offset
: minimum interval between measurements.model_time_unit
: unit of time used inmodel
, only relevant when usingDateTime
for the sample times and time windows.
When fixing the number of samples per time window, provide the keyword argument bounds
as a dictionary that maps begin (wb
) and end (we
) times of each window to the number of time samples (n
), i.e. Dict((wb .. we) => n)
.
An alternative is to provide the total number N
of samples across all windows, and letting the optimizer distribute that amount among the windows. In this case, bounds
should be a vector of time window begin and end times, i.e. [wb1 .. we1, wb2 .. we2]
. Additionally, integers can be provided to the keyword arguments minimum_per_window
and maximum_per_window
to serve as lower and upper limits of amount of samples per window, respectively, e.g. minimum_per_window = [1, 1]
and maximum_per_window = [4, 4]
.
Time values, such as window bounds and minimum_offset
, can be provided using either types from the standard Julia package Dates or Float64
. However, these options should not be mixed. When creating a decision, use only one or the other for all inputs. For example, a time window could be defined using Float64
by bounds = Dict((0.0 .. 5.0) => 3)
; and with time definitions from Dates
by:
t0 = DateTime(2021, 1, 25, 9) # 25 Jan 2021 - 9:00 am
tend = DateTime(2021, 1, 25, 12) # 25 Jan 2021 - 12:00 pm
bounds = Dict((t0 .. tend) => 3)
OptimalDesign.optimaltimes
— Functionoptimaltimes(result::OptimalSchedule)
Extract the optimal sample times from an OptimalSchedule
object resulting from calling design
.
OptimalDesign.initvalue
— Functioninitvalue(result::OptimalSchedule)
Extract initial objective value from OptimalSchedule
object resulting from calling design
.
OptimalDesign.optimalvalue
— Functionoptimalvalue(result::OptimalSchedule)
Extract final objective value from OptimalSchedule
object resulting from calling design
.
OptimalDesign.defficiency
— Functiondefficiency(result::OptimalSchedule)
Determine the efficiency of a D-optimal design, defined by the equation below. It is based on the determinants of the initial and final FIMs ($FIM_{i}$ and $FIM_{f}$) and the number of parameters $p$.
\[\eta = \left[\frac{det(FIM_{f})}{det(FIM_{i})}\right]^{\frac{1}{p}}\]
OptimalDesign.check_feasibility
— Functioncheck_feasibility(
decisions::Union{AbstractVector{<:ObsTimesDecision},Tuple{ObsTimesDecision}},
sample_times::Vector{Vector},
)
Given decisions
, make the following verifications in the design sample_times
:
- For each subject, there are no repeated (simultaneous) samples.
- Sample times are non-negative.
- All samples are within a window.
- For each subject, samples are sorted in time.
- Samples are at least
decision.minimum_offset
apart.
OptimalDesign.named_fim
— Functionnamed_fim(fpm::Pumas.FittedPumasModel; observed = false)
Returns the observed/expected Fisher information matrix (FIM) using the fitted parameters in fpm
as a NamedArray
indexed by parameter names. If observed
is true
, the observed FIM is returned. If false
(default), the expected FIM is returned.
named_fim(
model::Pumas.PumasModel,
data::Union{Pumas.Population,Pumas.Subject},
param::NamedTuple,
times::Union{OptimalDesign.ObsTimes,AbstractVector{<:OptimalDesign.ObsTimes}}
)
Returns the expected Fisher information matrix (FIM) of a subject/population data
at param
as a NamedArray
indexed by parameter names. The observation time points times
are used to calculate the expected FIM.
named_fim(
model::Pumas.PumasModel,
data::Union{Pumas.Population,Pumas.Subject},
param::NamedTuple,
)
Returns the expected Fisher information matrix (FIM) of a subject/population data
at param
as a NamedArray
indexed by parameter names. The observation time points in the subjects are used to calculate the expected FIM.
OptimalDesign.evaluate_design
— Functionevaluate_design(
decisions::AbstractVector{<:ObsTimesDecision},
population::Union{Pumas.Population,Tuple{Pumas.Subject}},
new_times::Vector{<:Vector},
obj_type::Symbol;
param_weights = nothing,
)
Check feasibility of design new_times
for given decisions
, population
, and obj_type
. Then return a DesignEvaluation
object with the resulting FIM and objective value.
OptimalDesign.plot_time
— Functionplot_time(
dec::ObsTimesDecision,
all_times::Vector;
split_plot = :gap_criterion,
) -> Makie.Figure
plot_time(
dec::AbstractVector{<:ObsTimesDecision},
all_times::Vector{Vector};
split_plot = :gap_criterion,
) -> Makie.Figure
Plot time samples all_times
using the window definitions in the decision(s) dec
. Time instants are plotted in the x-axis, and each color-alternating line refers to a subject. If necessary, vertical lines indicate a relevant time cycle (in blue), and the inner window bounds (in yellow). For example, for the time windows
Dict(
(DateTime(2024, 1, 1, 0) .. DateTime(2024, 1, 5, 6)) => 4,
(DateTime(2024, 1, 6, 9) .. DateTime(2024, 1, 10, 15)) => 4,
)
and time_model_unit = Day(1)
in the decision(s), a blue vertical line at x = 7 days indicates a week cycle. And yellow vertical lines indicate the inner window bounds of DateTime(2024, 1, 5, 6)
and DateTime(2024, 1, 6, 9)
. The blue lines are not plotted when defining times with Float64
. Also, 8 black dots will be distributed horizontally for each patient, marking the 4 sample instants inside each window. Lastly, the keyword argument split_plot
accepts the following values:
:split
: create separate plots for each time window.:whole
: include all time windows in a single plot.:gap_criterion
(default): let the function decide to split the plots, in case there would be too much empty space in a single plot.
The function returns a Figure
object from the Makie.jl plotting package. Different backends can be used to plot the Figure
, each with its own package. The most relevant ones for this context are:
GLMakie.jl
: interactive plotting. With the code below, a window with the plot is created.
using Makie, GLMakie
GLMakie.activate!()
display(my_figure)
CairoMakie.jl
: non-interactive publication-quality vector graphics. With the code below, a PDF file with the plot is stored at the desired location.
using Makie, CairoMakie
CairoMakie.activate!()
save("[path]/saved_figure.pdf", my_figure)
OptimalDesign.analyze_identifiability
— Functionanalyze_identifiability(
model::Pumas.PumasModel,
data::Union{Pumas.Population, Pumas.Subject},
param::NamedTuple;
fim_options::NamedTuple = (;),
time_limit::Float64 = 120.0,
psd_tol::Float64 = 1e-6,
mip_infeas_tol::Float64 = 1e-8,
mip_rel_gap::Float64 = 1e-6,
prior_strength::Int = 2,
nsol::Int = 2,
normalization_threshold::Float64 = 1e-3,
cannot_fix::Vector{Int} = Int[],
assume_fixed::Vector{Int} = Int[],
)
Analyzes the identifiability of the model given the experiment design in data
using the (scaled) Fisher information matrix (FIM). The function then suggests different combinations of parameters to fix to make the model identifiable. These parameter combinations are found by finding the largest sub-matrices of the (scaled) information matrix that are invertible. The problem is formulated as a mixed integer semi-definite program (MISDP) and solved to global optimality. Unique solutions are then obtained by adding additional uniqueness constraints and resolving the MISDP.
The options available and their default values are shown below:
fim_options
: (default is(;)
) the options used to compute the FIM.time_limit
: (default = 120.0) time limit for the optimization solverpsd_tol
: (default = 1e-6) maximum infeasibility allowed in the positive definiteness constraint.prior_strength
: (default = 2) prior relative strength to mimic fixing a parameter.nsol
: (default = 2) maximum number of unique solutions to be returned. Each solution is a combination of parameters to fix to make the model locally identifiable.normalization_threshold
: (default = 1e-3) when any diagonal element in the information matrix has an absolute value larger thannormalization_threshold
, this row and column will be scaled such that the diagonal is 1. This makes the identifiability analysis more robust to the scale of the parameters.cannot_fix
: (default = Int[]) list of parameter indices to avoid fixing in the identifiability analysis. The algorithm will either return a combination of other parameters to fix excluding the parameters whose indices are incannot_fix
or it will fail with if the model cannot be made identifiable while keeping those parameters free.assume_fixed
: (default = Int[]) list of parameter indices to force fixing in the identifiability analysis.
analyze_identifiability(
fpm::Pumas.FittedPumasModel;
observed::Bool = false,
fim_options::NamedTuple = (;),
time_limit::Float64 = 120.0,
psd_tol::Float64 = 1e-6,
mip_infeas_tol::Float64 = 1e-8,
mip_rel_gap::Float64 = 1e-6,
prior_strength::Int = 2,
nsol::Int = 2,
normalization_threshold::Float64 = 1e-3,
cannot_fix::Vector{Int} = Int[],
assume_fixed::Vector{Int} = Int[],
)
Analyzes the identifiability of the model fitted in fpm
given the experiment design the fitting data using the (scaled) expected/observed (Fisher) information matrix. The function then suggests different combinations of parameters to fix to make the model identifiable. These parameter combinations are found by finding the largest sub-matrices of the (scaled) information matrix that are invertible. The problem is formulated as a mixed integer semi-definite program (MISDP) and solved to global optimality. Unique solutions are then obtained by adding additional uniqueness constraints and resolving the MISDP.
If observed
is true
, the observed information matrix is used. If observed
is false
(default), the expected (Fisher) information matrix is used. The options available and their default values are shown below:
observed
: (default isfalse
) controls whether the observed information matrix (if set totrue
) or (expected) Fisher information matrix (if set tofalse
) will be used for the identifiability analysis.fim_options
: (default is(;)
) can be used to define the options for the FIM calculation only ifobserved
is set tofalse
. By default, the differential equation optionsdiffeq_options
will be taken from the fitted Pumas modelfpm
.time_limit
: (default = 120.0) time limit for the optimization solverpsd_tol
: (default = 1e-6) maximum infeasibility allowed in the positive definiteness constraint.prior_strength
: (default = 2) prior relative strength to mimic fixing a parameter.nsol
: (default = 2) maximum number of unique solutions to be returned. Each solution is a combination of parameters to fix to make the model locally identifiable.normalization_threshold
: (default = 1e-3) when any diagonal element in the information matrix has an absolute value larger thannormalization_threshold
, this row and column will be scaled such that the diagonal is 1. This makes the identifiability analysis more robust to the scale of the parameters.cannot_fix
: (default = Int[]) list of parameter indices to avoid fixing in the identifiability analysis. The algorithm will either return a combination of other parameters to fix excluding the parameters whose indices are incannot_fix
or it will fail with if the model cannot be made identifiable while keeping those parameters free.assume_fixed
: (default = Int[]) list of parameter indices to force fixing in the identifiability analysis.
analyze_identifiability(
fim::AbstractArray;
time_limit::Float64 = 120.0,
psd_tol::Float64 = 1e-6,
mip_infeas_tol::Float64 = 1e-8,
mip_rel_gap::Float64 = 1e-6,
prior_strength::Int = 2,
nsol::Int = 2,
normalization_threshold::Float64 = 1e-3,
cannot_fix::Vector{Int} = Int[],
assume_fixed::Vector{Int} = Int[],
)
Analyzes the identifiability of the model using the input Fisher information matrix fim
. The function then suggests different combinations of parameters to fix to make the model identifiable. These parameter combinations are found by finding the largest sub-matrices of the (scaled) information matrix that are invertible. The problem is formulated as a mixed integer semi-definite program (MISDP) and solved to global optimality. Unique solutions are then obtained by adding additional uniqueness constraints and resolving the MISDP.
The options available and their default values are shown below:
time_limit
: (default = 120.0) time limit for the optimization solverpsd_tol
: (default = 1e-6) maximum infeasibility allowed in the positive definiteness constraint.prior_strength
: (default = 2) prior relative strength to mimic fixing a parameter.nsol
: (default = 2) maximum number of unique solutions to be returned. Each solution is a combination of parameters to fix to make the model locally identifiable.normalization_threshold
: (default = 1e-3) when any diagonal element in the information matrix has an absolute value larger thannormalization_threshold
, this row and column will be scaled such that the diagonal is 1. This makes the identifiability analysis more robust to the scale of the parameters.cannot_fix
: (default = Int[]) list of parameter indices to avoid fixing in the identifiability analysis. The algorithm will either return a combination of other parameters to fix excluding the parameters whose indices are incannot_fix
or it will fail with if the model cannot be made identifiable while keeping those parameters free.assume_fixed
: (default = Int[]) list of parameter indices to force fixing in the identifiability analysis.
OptimalDesign.crlb
— Functioncrlb(model::Pumas.PumasModel, param::NamedTuple, res::OptimalSchedule)
For model
, parameters param
, and the optimal Fisher Information Matrix (FIM) in the optimization results res
, compute the Cramer-Rao lower bounds on standard errors of parameter estimates. A DataFrame
object is returned, containing the lower bounds on the standard errors and the relative standard errors.