Introduction

In the pharmaceutical industry, there are multiple reasons to avoid collecting excessive data when developing drugs, such as:

  • Collecting more samples from the same individual is inconvenient for the patients, who may have to stay in the hospital for longer or visit more frequently, making it more difficult to recruit individuals in studies.
  • Taking more samples than necessary from pediatric patients, who have small veins, is risky and unethical.
  • Increasing costs to collect more data, either by recruiting more individuals or by keeping patients in the clinic for longer to collect more samples.
  • Requiring more nursing time, hospital bed time, and lab testing time to collect and analyze the samples.
  • Slowing down clinical trials and time-to-market, especially for rare diseases, for which recruiting more individuals in studies can take years.

To avoid these problems, model-based optimal design of experiments [11] can be used to design more efficient studies, minimizing the number of subjects and of samples per subject; and maximizing the utility of the data collected, which afterwards is used to fit a pharmacometric model. In general, a better experiment is one that enables higher confidence on the estimation of model parameters or derived quantities. In other words, the goal is to maximize the information learned about the main quantities of interest in the model. One commonly used measure of information in optimal design is the Fisher information matrix (FIM). OptimalDesign.jl provides a multitude of procedures and utilities to perform optimal design of experiments in pharmacometrics.

Tutorials

  1. Introduction to Optimal Design: covers mathematical background and basic usage of OptimalDesign.jl.

Docstrings

OptimalDesign.designFunction
design(
  decisions::Union{AbstractVector{<:ObsTimesDecision},Tuple{ObsTimesDecision}};
  optimality::Symbol = :doptimal,
  param_weights::Union{Nothing,Vector} = nothing,
  inv_fim_reducer::Union{Nothing,AbstractMatrix} = nothing,
  time_limit,
  nlp_tol,
  processors,
  verbose
) -> OptimalSchedule

Optimize the designs in decisions, obtaining optimal sample times. Returns an OptimalSchedule object, which includes initial and optimal designs, FIMs, objective values and more. optimality defines the objective type, and can be set as:

  • :toptimal: maximize the trace of the FIM.
  • :aoptimal: minimize the trace of the inverse of the FIM.
  • :doptimal: maximize the (log) determinant of the FIM.

Possible keyword arguments:

  • param_weights (default: nothing): optional weights vector to assign different weights on the parameters when the optimality criteria is either :toptimal or :aoptimal.
  • inv_fim_reducer (default: nothing): an optional matrix used to compute the reduced inverse FIM as inv_fim_reducer * inv(FIM) * inv_fim_reducer'. This is useful when minimizing the uncertainty wrt derived quantities in the model. Passing inv_fim_reducer as the Jacobian of the derived quantities wrt the model parameters modifies the objective function value to maximize the information learned about the derived quantities rather than the model parameters themselves. When inv_fim_reducer is not nothing, the param_weights keyword argument will refer to the weights of the derived quantities, rather than the model parameters.
  • time_limit: When prespecifying the distribution of samples across windows through a ("bounds") dictionary, this sets a time limit for the optimization. Default is 1200s.
  • solutions: When letting the solver also optimize the distribution of samples across windows, this limits the number of attempts by the solver. Default is unlimited.
  • nlp_tol: optimization convergence tolerance. Default is 1e-5.
  • processors: number of processors used in distributed optimization. Only used when a fixed number of samples per window is not specified.
  • max_iter: maximum number of iterations allowed for the optimization. Default is 1000.

The keyword argument verbose can be set to true to display statistics about the optimization progress. The output includes the information below as columns for every iteration. Further details are included in Ipopt's documentation.

  • iter: iteration index.
  • objective: current objective value.
  • inf_pr: current infeasibility in primal optimization.
  • inf_du: current infeasibility in dual optimization.
  • lg(mu): log<sub>10</sub> of the value of the internal barrier parameter mu.
  • ||d||: greatest absolute value of component of step in primal problem.
  • lg(rg): log<sub>10</sub> of the value of the regularization term for the Hessian of the Lagrangian. A dash indicates no regularization.
  • alpha_du: stepsize for the dual variables.
  • alpha_pr: stepsize for the primal variables.
  • ls: number of backtracking line search steps.
OptimalDesign.decisionFunction
decision(
  model::Pumas.PumasModel,
  data::Union{Pumas.Subject,Pumas.Population},
  param;
  type = :observation_times,
  init,
  time_multiples,
  subject_multiples,
  minimum_offset,
  model_time_unit,
  bounds,
  N,
  minimum_per_window,
  maximum_per_window
) -> ObsTimesDecision

Define a decision object for subject or population data. A decision represents a design yet to be optimized, including a model and parameters param. General keyword arguments:

  • init: optional input to initialize the decision with specific sample times. For a single subject, provide a vector of time instants. For a population, provide a vector containing a vector of time instants for each subject. If this keyword argument is not used, the decision is initialized with random sample times within the time windows.
  • time_multiples: a vector of integers used to replicate each sample multiple times. If not specified, each sample will be counted only once. This is useful when multiple sub-samples can be taken from each sample giving multiple measurements at the same sample time.
  • subject_multiples: optional information to specify how many times (i.e. the group size) each unique subject should be counted when computing the expected FIM. This implies repeated subjects share the same covariates and dosing regimens.
  • minimum_offset: minimum interval between measurements.
  • model_time_unit: unit of time used in model, only relevant when using DateTime for the sample times and time windows.

When fixing the number of samples per time window, provide the keyword argument bounds as a dictionary that maps begin (wb) and end (we) times of each window to the number of time samples (n), i.e. Dict((wb .. we) => n).

An alternative is to provide the total number N of samples across all windows, and letting the optimizer distribute that amount among the windows. In this case, bounds should be a vector of time window begin and end times, i.e. [wb1 .. we1, wb2 .. we2]. Additionally, integers can be provided to the keyword arguments minimum_per_window and maximum_per_window to serve as lower and upper limits of amount of samples per window, respectively, e.g. minimum_per_window = [1, 1] and maximum_per_window = [4, 4].

Time values, such as window bounds and minimum_offset, can be provided using either types from the standard Julia package Dates or Float64. However, these options should not be mixed. When creating a decision, use only one or the other for all inputs. For example, a time window could be defined using Float64 by bounds = Dict((0.0 .. 5.0) => 3); and with time definitions from Dates by:

t0 = DateTime(2021, 1, 25, 9) # 25 Jan 2021 - 9:00 am
tend = DateTime(2021, 1, 25, 12) # 25 Jan 2021 - 12:00 pm
bounds = Dict((t0 .. tend) => 3)
OptimalDesign.optimaltimesFunction
optimaltimes(result::OptimalSchedule)

Extract the optimal sample times from an OptimalSchedule object resulting from calling design.

OptimalDesign.initvalueFunction
initvalue(result::OptimalSchedule)

Extract initial objective value from OptimalSchedule object resulting from calling design.

OptimalDesign.optimalvalueFunction
optimalvalue(result::OptimalSchedule)

Extract final objective value from OptimalSchedule object resulting from calling design.

OptimalDesign.defficiencyFunction
defficiency(result::OptimalSchedule)

Determine the efficiency of a D-optimal design, defined by the equation below. It is based on the determinants of the initial and final FIMs ($FIM_{i}$ and $FIM_{f}$) and the number of parameters $p$.

\[\eta = \left[\frac{det(FIM_{f})}{det(FIM_{i})}\right]^{\frac{1}{p}}\]

OptimalDesign.check_feasibilityFunction
check_feasibility(
  decisions::Union{AbstractVector{<:ObsTimesDecision},Tuple{ObsTimesDecision}},
  sample_times::Vector{Vector},
)

Given decisions, make the following verifications in the design sample_times:

  • For each subject, there are no repeated (simultaneous) samples.
  • Sample times are non-negative.
  • All samples are within a window.
  • For each subject, samples are sorted in time.
  • Samples are at least decision.minimum_offset apart.
OptimalDesign.named_fimFunction
named_fim(fpm::Pumas.FittedPumasModel; observed = false)

Returns the observed/expected Fisher information matrix (FIM) using the fitted parameters in fpm as a NamedArray indexed by parameter names. If observed is true, the observed FIM is returned. If false (default), the expected FIM is returned.

named_fim(
  model::Pumas.PumasModel,
  data::Union{Pumas.Population,Pumas.Subject},
  param::NamedTuple,
  times::Union{OptimalDesign.ObsTimes,AbstractVector{<:OptimalDesign.ObsTimes}}
)

Returns the expected Fisher information matrix (FIM) of a subject/population data at param as a NamedArray indexed by parameter names. The observation time points times are used to calculate the expected FIM.

named_fim(
  model::Pumas.PumasModel,
  data::Union{Pumas.Population,Pumas.Subject},
  param::NamedTuple,
)

Returns the expected Fisher information matrix (FIM) of a subject/population data at param as a NamedArray indexed by parameter names. The observation time points in the subjects are used to calculate the expected FIM.

OptimalDesign.evaluate_designFunction
evaluate_design(
  decisions::AbstractVector{<:ObsTimesDecision},
  population::Union{Pumas.Population,Tuple{Pumas.Subject}},
  new_times::Vector{<:Vector},
  obj_type::Symbol;
  param_weights = nothing,
)

Check feasibility of design new_times for given decisions, population, and obj_type. Then return a DesignEvaluation object with the resulting FIM and objective value.

OptimalDesign.plot_timeFunction
plot_time(
  dec::ObsTimesDecision,
  all_times::Vector;
  split_plot = :gap_criterion,
) -> Makie.Figure

plot_time(
  dec::AbstractVector{<:ObsTimesDecision},
  all_times::Vector{Vector};
  split_plot = :gap_criterion,
) -> Makie.Figure

Plot time samples all_times using the window definitions in the decision(s) dec. Time instants are plotted in the x-axis, and each color-alternating line refers to a subject. If necessary, vertical lines indicate a relevant time cycle (in blue), and the inner window bounds (in yellow). For example, for the time windows

Dict(
  (DateTime(2024, 1, 1, 0) .. DateTime(2024, 1, 5, 6)) => 4,
  (DateTime(2024, 1, 6, 9) .. DateTime(2024, 1, 10, 15)) => 4,
)

and time_model_unit = Day(1) in the decision(s), a blue vertical line at x = 7 days indicates a week cycle. And yellow vertical lines indicate the inner window bounds of DateTime(2024, 1, 5, 6) and DateTime(2024, 1, 6, 9). The blue lines are not plotted when defining times with Float64. Also, 8 black dots will be distributed horizontally for each patient, marking the 4 sample instants inside each window. Lastly, the keyword argument split_plot accepts the following values:

  • :split: create separate plots for each time window.
  • :whole: include all time windows in a single plot.
  • :gap_criterion (default): let the function decide to split the plots, in case there would be too much empty space in a single plot.

The function returns a Figure object from the Makie.jl plotting package. Different backends can be used to plot the Figure, each with its own package. The most relevant ones for this context are:

  • GLMakie.jl: interactive plotting. With the code below, a window with the plot is created.
using Makie, GLMakie
GLMakie.activate!()
display(my_figure)
  • CairoMakie.jl: non-interactive publication-quality vector graphics. With the code below, a PDF file with the plot is stored at the desired location.
using Makie, CairoMakie
CairoMakie.activate!()
save("[path]/saved_figure.pdf", my_figure)
OptimalDesign.analyze_identifiabilityFunction
analyze_identifiability(
  model::Pumas.PumasModel,
  data::Union{Pumas.Population, Pumas.Subject},
  param::NamedTuple;
  fim_options::NamedTuple = (;),
  time_limit::Float64 = 120.0,
  psd_tol::Float64 = 1e-6,
  mip_infeas_tol::Float64 = 1e-8,
  mip_rel_gap::Float64 = 1e-6,
  prior_strength::Int = 2,
  nsol::Int = 2,
  normalization_threshold::Float64 = 1e-3,
  cannot_fix::Vector{Int} = Int[],
  assume_fixed::Vector{Int} = Int[],
)

Analyzes the identifiability of the model given the experiment design in data using the (scaled) Fisher information matrix (FIM). The function then suggests different combinations of parameters to fix to make the model identifiable. These parameter combinations are found by finding the largest sub-matrices of the (scaled) information matrix that are invertible. The problem is formulated as a mixed integer semi-definite program (MISDP) and solved to global optimality. Unique solutions are then obtained by adding additional uniqueness constraints and resolving the MISDP.

The options available and their default values are shown below:

  • fim_options: (default is (;)) the options used to compute the FIM.
  • time_limit: (default = 120.0) time limit for the optimization solver
  • psd_tol: (default = 1e-6) maximum infeasibility allowed in the positive definiteness constraint.
  • prior_strength: (default = 2) prior relative strength to mimic fixing a parameter.
  • nsol: (default = 2) maximum number of unique solutions to be returned. Each solution is a combination of parameters to fix to make the model locally identifiable.
  • normalization_threshold: (default = 1e-3) when any diagonal element in the information matrix has an absolute value larger than normalization_threshold, this row and column will be scaled such that the diagonal is 1. This makes the identifiability analysis more robust to the scale of the parameters.
  • cannot_fix: (default = Int[]) list of parameter indices to avoid fixing in the identifiability analysis. The algorithm will either return a combination of other parameters to fix excluding the parameters whose indices are in cannot_fix or it will fail with if the model cannot be made identifiable while keeping those parameters free.
  • assume_fixed: (default = Int[]) list of parameter indices to force fixing in the identifiability analysis.
analyze_identifiability(
  fpm::Pumas.FittedPumasModel;
  observed::Bool = false,
  fim_options::NamedTuple = (;),
  time_limit::Float64 = 120.0,
  psd_tol::Float64 = 1e-6,
  mip_infeas_tol::Float64 = 1e-8,
  mip_rel_gap::Float64 = 1e-6,
  prior_strength::Int = 2,
  nsol::Int = 2,
  normalization_threshold::Float64 = 1e-3,
  cannot_fix::Vector{Int} = Int[],
  assume_fixed::Vector{Int} = Int[],
)

Analyzes the identifiability of the model fitted in fpm given the experiment design the fitting data using the (scaled) expected/observed (Fisher) information matrix. The function then suggests different combinations of parameters to fix to make the model identifiable. These parameter combinations are found by finding the largest sub-matrices of the (scaled) information matrix that are invertible. The problem is formulated as a mixed integer semi-definite program (MISDP) and solved to global optimality. Unique solutions are then obtained by adding additional uniqueness constraints and resolving the MISDP.

If observed is true, the observed information matrix is used. If observed is false (default), the expected (Fisher) information matrix is used. The options available and their default values are shown below:

  • observed: (default is false) controls whether the observed information matrix (if set to true) or (expected) Fisher information matrix (if set to false) will be used for the identifiability analysis.
  • fim_options: (default is (;)) can be used to define the options for the FIM calculation only if observed is set to false. By default, the differential equation options diffeq_options will be taken from the fitted Pumas model fpm.
  • time_limit: (default = 120.0) time limit for the optimization solver
  • psd_tol: (default = 1e-6) maximum infeasibility allowed in the positive definiteness constraint.
  • prior_strength: (default = 2) prior relative strength to mimic fixing a parameter.
  • nsol: (default = 2) maximum number of unique solutions to be returned. Each solution is a combination of parameters to fix to make the model locally identifiable.
  • normalization_threshold: (default = 1e-3) when any diagonal element in the information matrix has an absolute value larger than normalization_threshold, this row and column will be scaled such that the diagonal is 1. This makes the identifiability analysis more robust to the scale of the parameters.
  • cannot_fix: (default = Int[]) list of parameter indices to avoid fixing in the identifiability analysis. The algorithm will either return a combination of other parameters to fix excluding the parameters whose indices are in cannot_fix or it will fail with if the model cannot be made identifiable while keeping those parameters free.
  • assume_fixed: (default = Int[]) list of parameter indices to force fixing in the identifiability analysis.
analyze_identifiability(
  fim::AbstractArray;
  time_limit::Float64 = 120.0,
  psd_tol::Float64 = 1e-6,
  mip_infeas_tol::Float64 = 1e-8,
  mip_rel_gap::Float64 = 1e-6,
  prior_strength::Int = 2,
  nsol::Int = 2,
  normalization_threshold::Float64 = 1e-3,
  cannot_fix::Vector{Int} = Int[],
  assume_fixed::Vector{Int} = Int[],
)

Analyzes the identifiability of the model using the input Fisher information matrix fim. The function then suggests different combinations of parameters to fix to make the model identifiable. These parameter combinations are found by finding the largest sub-matrices of the (scaled) information matrix that are invertible. The problem is formulated as a mixed integer semi-definite program (MISDP) and solved to global optimality. Unique solutions are then obtained by adding additional uniqueness constraints and resolving the MISDP.

The options available and their default values are shown below:

  • time_limit: (default = 120.0) time limit for the optimization solver
  • psd_tol: (default = 1e-6) maximum infeasibility allowed in the positive definiteness constraint.
  • prior_strength: (default = 2) prior relative strength to mimic fixing a parameter.
  • nsol: (default = 2) maximum number of unique solutions to be returned. Each solution is a combination of parameters to fix to make the model locally identifiable.
  • normalization_threshold: (default = 1e-3) when any diagonal element in the information matrix has an absolute value larger than normalization_threshold, this row and column will be scaled such that the diagonal is 1. This makes the identifiability analysis more robust to the scale of the parameters.
  • cannot_fix: (default = Int[]) list of parameter indices to avoid fixing in the identifiability analysis. The algorithm will either return a combination of other parameters to fix excluding the parameters whose indices are in cannot_fix or it will fail with if the model cannot be made identifiable while keeping those parameters free.
  • assume_fixed: (default = Int[]) list of parameter indices to force fixing in the identifiability analysis.
OptimalDesign.crlbFunction
crlb(model::Pumas.PumasModel, param::NamedTuple, res::OptimalSchedule)

For model, parameters param, and the optimal Fisher Information Matrix (FIM) in the optimization results res, compute the Cramer-Rao lower bounds on standard errors of parameter estimates. A DataFrame object is returned, containing the lower bounds on the standard errors and the relative standard errors.