# Estimating Parameters of Pumas Models

Pumas can use the observational data of a `Subject`

or `Population`

to estimate the parameters in many types of models. This is done by two classes of methods. First, maximum likelihood methods find the parameters such that the observational data has the highest probability of occurring according to the chosen error distributions. Second, bayesian methods find a posterior probability distribution for the parameters to describe the chance that a parameter has a given value given the data. The following section describes how to fit an NLME model in Pumas via the two methods.

```
- Maximum likelihood methods find the parameters such that the observational
data has the highest probability of occurring according to the chosen error
distributions.
- Bayesian methods find a posterior probability distribution for
the parameters to describe the chance that a parameter has a given value given
the data.
- Maximum A Posteriori (MAP) can be considered as something in between maximum
likelihood estimation and Bayesian inference. It incorporates priors of some or
all variables and can be interpreted as penalized maximum likelihood or finding
the mode of the posterior distribution in a Bayesian context.
```

The following sections describe how to fit a NLME model in Pumas via the two methods.

## Defining Data for Estimation

The observed data should be parsed using the same names as those found in the model. For example, if `subject.observations`

is a `NamedTuple`

with names `dv`

and `resp`

, the `derived`

block or function in the `PumasModel`

should define distributions with matching names. Similarly, the `error`

block should define distrubtions with matching named if using a `PumasEMModel`

instead. If `dv`

is a scalar in the observation data, then `dv`

from `derived`

should also be a scalar. Likewise, if `dv`

is an array like a time series, then `dv`

should be a size-matching time series when returned from `derived`

. The likelihood of observing multiple dependent variables is calculated under the assumtion of independence between the two.

## Maximum Likelihood Estimation

Maximum Likelihood Estimation (MLE) is performed using the `fit`

function. This function's signature is:

```
Distributions.fit(model::AbstractPumasModel,
data::Population,
param::NamedTuple,
approx::LikelihoodApproximation;
optimize_fn = DEFAULT_OPTIMIZE_FN,
constantcoef::NamedTuple = NamedTuple(),
omegas::Tuple = tuple(),
ensemblealg::DiffEqBase.EnsembleAlgorithm = EnsembleSerial(),
checkidentification=true,
diffeq_options)
```

Fit the Pumas model `model`

to the dataset `data`

with starting values `param`

using the estimation method `approx`

. Currently supported values for the `approx`

argument are `FO`

, `FOCE`

, `LaplaceI`

, `NaivePooled`

, and `BayesMCMC`

. See the bottom of this page for how to do Maximum A Posteriori (MAP) estimation based on these likelihood approximations. Example fit syntax

`myfit = fit(mymodel, drug_x_population, drug_x_parameters, Pumas.FOCE())`

Note that the `approx`

has to be qualified with `Pumas`

and has to end with the empty parenthesis.

The argument `optimize_fn`

is used for optimizing the objective function for all `approx`

methods except `BayesMCMC`

. The default optimization function uses the quasi-Newton routine `BFGS`

method from the `Optim`

package. Optimization specific arguments can be passed to `DefaultOptimizeFN`

, e.g. the optimization trace can be disabled by passing `DefaultOptimizeFN(show_trace=false)`

. See `Optim`

for more details. Example syntax

```
myfit = fit(mymodel, drug_x_population, drug_x_parameters, Pumas.FOCE(),
optimize_fn=DefaultOptimizeFN(show_trace=false))
```

If `model`

is a `PumasModel`

, it is possible to fix It is possible to fix one or more parameters of the fit by passing a `NamedTuple`

as the `constantcoef`

argument with keys and values corresponding to the names and values of the fixed parameters, e.g. `constantcoef=(σ=0.1,)`

. Example syntax

```
myfit = fit(mymodel, drug_x_population, drug_x_parameters, Pumas.FOCE(),
constantcoef = (tvcl = 4, ),
optimize_fn=DefaultOptimizeFN(show_trace=false))
```

When models include an `@random`

block and fitting with `NaivePooled`

is requested, it is required that the user supplies the names of the parameters of the random effects as the `omegas`

argument such that these can be ignored in the optimization, e.g. `omegas=(Ω,)`

. Example syntax

```
myfit = fit(mymodel, drug_x_population, drug_x_parameters, Pumas.FOCEI(),
omegas=(Ω,),
optimize_fn=DefaultOptimizeFN(show_trace=false))
```

Parallelization of the optimization is supported for most estimation methods via the ensemble interface of DifferentialEquations.jl. The default is `EnsembleThreads()`

. This is currently the only supported parallelization for model fitting. It is also possible to ask for a non-parallelized fit by setting the keyword to `EnsembleSerial()`

. Example fit syntax for this is

```
myfit = fit(mymodel, drug_x_population, drug_x_parameters, Pumas.FOCEI(),
ensemblealg = EnsembleSerial())
```

The `fit`

function will check if any gradients and throw an exception if any of the elements are exactly zero unless `checkidentification`

is set to `false`

. `checkidentification`

currently requires using a `PumasModel`

.

Keyword arguments that control the differential equation solver can be passed via the `diffeq_options`

argument. This accepts a `NamedTuple`

with arguments to the differential equations solver such as `alg`

, `abstol`

, and `reltol`

. The default values for these are `AutoVern7(Rodas5())`

, `1e-12`

, and `1e-8`

respectively. See the DifferentialEquations.jl documentation for more details.

The return type of `fit`

is a `FittedPumasModel`

.

### Marginal Likelihood Approximations

The following choices are available for the likelihood approximations:

`FO()`

: first order approximation.`FOCE()`

: first order conditional estimation (with automatic detection for when to include the interaction terms).`LaplaceI()`

: second order Laplace approximation with interaction.

Only `LaplceI()`

is supported when using a `PumasEMModel`

.

### FittedPumasModel

The relevant fields of a `FittedPumasModel`

are:

`model`

: the`model`

used in the estimation process.`data`

: the`Population`

that was estimated.`optim`

: the result returned by the optimizer`approx`

: the marginal likelihood approximation that was used.`param`

: the optimal parameters.

## Bayesian Estimation

Bayesian parameter estimation is performed by using the `fit`

function as follows:

```
Distributions.fit(
model::PumasModel,
data::Population,
param::NamedTuple,
::BayesMCMC;
nadapts::Integer=2000,
nsamples::Integer=10000,
progress = Base.is_interactive,
kwargs...
)
```

We use a `NUTS`

sampler with the generalized no-U-turn termination criterion and multinomial sampling on Hamiltonian system

whose kinetic energy is specified with a diagonal metric (diagonal matrix with positive diagonal entries). For numerical intergation of the Hamiltonian system, we use the ordinary leapfrog integrator. The adaptation is done

using the Stan’s windowed adaptation routine with a target acceptance ratio of `0.8`

.

The arguments are:

`model`

: a`PumasModel`

, either defined by the`@model`

DSL or the function-based interface.`data`

: a`Population`

.`param`

: a named tuple of parameters. Used as the initial condition for the sampler.- The
`approx`

must be`BayesMCMC()`

. `nsamples`

determines the number of samples taken along each chain.- Extra
`args`

and`kwargs`

are passed on to the internal`simobs`

call and thus control the behavior of the differential equation solvers.

The result is a `BayesMCMCResults`

type.

### BayesMCMCResults

The MCMC chain is stored in the `chain`

field of the returned `BayesMCMCResults`

. Additionally the result can be converted into a `Chains`

object from MCMCChains.jl, allowing utlilization of diagnostics and visualization tooling. This is discussed further in the Bayesian Estimation tutorial.

The following functions help with querying common results on the Bayesian posterior:

`param_mean(br)`

: returns a named tuple of parameters which represents the mean of each parameter's posterior distribution`param_var(br)`

: returns a named tuple of parameters which represents the variance of each parameter's posterior distribution`param_std(br)`

: returns a named tuple of parameters which represents the standard deviation of each parameter's posterior distribution

## Maximum A Posteriori (MAP) Estimation

MAP estimation is carried out using the `fit`

function and by wrapping the chosen likelihood approximation in the `MAP`

type. The `fit`

signature is the same as for maximum likelihood. If we have a model (`model`

), a `Population`

(`data`

), and an initial parameter specification (`param`

) we can carry out the MAP estimation with the FOCE approximation of the log-likelihood using

`result = fit(model, data, param, Pumas.MAP(Pumas.FOCE()))`

The usual post-processing and diagnostics are available and are calculated according to the approximation wrapped in `MAP`

.