# Inference

In Pumas, inference is performed after model fitting to obtain uncertainty estimates of the model's population-level parameters (fixed effects), which can then be used to derive standard errors and confidence intervals. This is accomplished by calling the `infer` function on the object returned by `fit` (an object of type `FittedPumasModel` or `FittedPumasEMModel`). The `infer` function, described below, has several options to control how the inference is performed.

The general signature of the `infer` function is:

`infer(fpm::AbstractFittedPumasModel[, type]; level=0.95[, type_specific...])`

The function can be specialized to behave differently by setting the second (optional) argument `type` to different values. Details of each specialization are given below.

The `infer` function accepts the following arguments:

- `fpm`: a `FittedPumasModel` or `FittedPumasEMModel`, typically returned by `fit`.
- `type`: the second (optional) argument, which determines the inferential approach used. Details are given below.
- `level`: for confidence intervals, the confidence level to use. Typically `0.95`, corresponding to 95% confidence intervals.
- `...`: further arguments, whose meaning depends on the `type` of inference. Details are given below.

The function returns an object of type `FittedPumasModelInference`. Further functions can be called on this object, such as `stderror` to obtain the standard errors.

We now describe in more detail the different inferential approaches that can be selected through the `type` argument. The default is to use the asymptotic variance-covariance matrix obtained from the "robust" (also called "sandwich") estimator (White, 1982). The other approaches are described in the sections below.

### Inference from asymptotic variance-covariance matrices

This inferential approach is the default when no `type` is specified in `infer`, which then has the following signature:

`infer(fpm::FittedPumasModel; level=0.95, rethrow_error=false, sandwich_estimator=true)`

The additional optional arguments are:

- `rethrow_error`: whether to throw an error when it is not possible to compute the variance-covariance matrix (e.g. because the observed information matrix is not numerically positive definite). The default is to return a failed `FittedPumasModelInference` object without throwing an error.
- `sandwich_estimator`: whether to use the "robust" variance-covariance estimator (White, 1982). If `false`, the inverse of the observed Fisher information (i.e. the negative inverse Hessian of the log-likelihood) is used.

Note that the variance-covariance matrix can also be obtained directly by calling the `vcov` function on a `FittedPumasModel` object.
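For instance, assuming `my_fit` is a fitted model:

`my_vcov = vcov(my_fit) # variance-covariance matrix of the fixed-effects estimates`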

Here is an example of obtaining confidence intervals based on robust standard errors:

`my_infer = infer(my_fit)`

Or, setting `sandwich_estimator` to `false` in order not to use the robust estimator:

`my_infer = infer(my_fit; sandwich_estimator=false)`

The `FittedPumasModelInference` object returned by `infer` has a `show` method that prints the estimates, standard errors, and confidence intervals in the REPL. To extract this information as a `DataFrame` (e.g. for markdown reporting), the `coeftable` function can be used:

`coeftable(my_infer) # Returns a DataFrame`

Inference using the robust variance-covariance estimator is consistent under heteroscedasticity of the residuals, but relies on asymptotic normality of the maximum likelihood estimator. While this is often a reasonable approximation for the parameters associated with the mean of the response variable, it is often a poor approximation for the variance parameters. Furthermore, confidence intervals derived using this approach do not necessarily respect bounds on the parameters, and it is not unlikely for a parameter that is by definition strictly positive (such as a clearance, volume, or variance) to have an associated confidence interval that contains negative values. For that reason, inference based on other approaches, such as `Pumas.Bootstrap` and `Pumas.SIR`, may be preferable in some situations.

### Bootstrap-based inference

Bootstrap-based inference is obtained by calling the function `Pumas.Bootstrap` and passing the returned object of type `Bootstrap` as the `type` argument of `infer`. The resulting function signature is:

`infer(fpm::FittedPumasModel, type::Bootstrap; level=0.95)`

There are no additional keyword arguments to the `infer` function, but `Pumas.Bootstrap` has a number of optional arguments that can be used to control the bootstrap procedure. It has the following signature:

`Bootstrap(; rng=Random.default_rng, samples=200, stratify_by=nothing, ensemblealg=EnsembleThreads())`

The keyword arguments are as follows:

- `rng`: the pseudo-random number generator to be used within the bootstrap procedure.
- `samples`: the number of bootstrap replicates to generate.
- `stratify_by`: covariate(s) to stratify by when resampling, specified as a single `Symbol` or an array of `Symbol`s. These must refer to covariates that were declared when the `Subject`s were created, e.g. by `read_pumas`, and they should be discrete (categorical, not continuous) covariates. In general, the number of strata should not be too large, and each stratum should contain a sufficient number of individuals for resampling to make sense, though it is difficult to give specific guidance.
- `ensemblealg`: a `BasicEnsembleAlgorithm` to use for parallel execution. When running Pumas in a distributed setting, it can be beneficial to specify `EnsembleThreads` when fitting the model and then `EnsembleDistributed` for the bootstrap. The bootstrap samples will then be distributed across Julia processes, while each associated model fit will benefit from multi-threaded execution.

The type of bootstrap procedure used is the *nonparametric bootstrap*, also called the *paired bootstrap* (Efron, 1979). Bootstrap datasets are obtained by resampling *individuals* from the original dataset with replacement (all records associated with a sampled individual are included), after stratifying by any covariates specified. The stratification ensures that the number of individuals in each stratum remains the same in each resampled dataset, which can help with the stability of the estimator. For example, if an analysis consists of two studies, a small study with rich sampling and a large study with sparse sampling, stratifying on the study will ensure that each resampled dataset has the same number of subjects with rich sampling. The fitting algorithm is applied to each resampled dataset to obtain $N$ estimates for each parameter, where $N$ is the number of replicates specified in `samples`. This collection is treated as a random draw from the sampling distribution of the estimator of the parameters, and a nonparametric 95% confidence interval is obtained from the 2.5th and 97.5th quantiles of this sample.

Inference based on the nonparametric bootstrap approach is often considered more reliable than inference based on the asymptotic variance-covariance matrix, but it is computationally intensive. Unlike the latter approach, bounds on the parameters are respected (e.g. the confidence interval for a strictly positive parameter cannot contain negative values).

Here is an example of how to run a nonparametric bootstrap in Pumas using the default options:

`my_bootstrap = infer(my_fit, Pumas.Bootstrap())`

And here is an example with all options specified:

```
using Random
my_bootstrap = infer(my_fit, Pumas.Bootstrap(
    rng = MersenneTwister(1234),
    samples = 500,
    stratify_by = [:sex, :trt],
    ensemblealg = EnsembleThreads()))
```

The `coeftable` function will work on the returned object, and the confidence intervals and standard errors will be bootstrap-based.
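For example, assuming `my_bootstrap` holds the result of a bootstrap `infer` call:

`coeftable(my_bootstrap) # DataFrame with bootstrap-based standard errors and confidence intervals`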

The bootstrap requires a `FittedPumasModel`; `FittedPumasEMModel` support is coming in a future version of Pumas.

### Sampling importance resampling (SIR)

As an alternative to the nonparametric bootstrap procedure (which can be computationally intensive or infeasible with small sample sizes), Dosne et al. (2016) suggest the sampling importance resampling (SIR) procedure for estimating the uncertainty. This can be obtained by calling the function `Pumas.SIR` and passing the returned object of type `SIR` as the `type` argument of `infer`. The resulting function signature is:

`infer(fpm::FittedPumasModel, type::SIR; level=0.95, ensemblealg=EnsembleThreads())`
There is one additional keyword argument:

- `ensemblealg`: a `BasicEnsembleAlgorithm` that controls the mode of parallel execution used for evaluating the weights in the procedure.

The `Pumas.SIR` constructor accepts some arguments which control the procedure:

`SIR(; rng=Random.default_rng, samples, resamples)`

The keyword arguments are as follows:

- `rng`: the pseudo-random number generator to be used within the SIR procedure.
- `samples`: the number of samples to draw from the *proposal distribution*, a truncated multivariate normal distribution based on the maximum-likelihood estimates and the robust variance-covariance matrix.
- `resamples`: the number of values from the original sample to resample (by weighted resampling without replacement).

Both `samples` and `resamples` must be supplied, as there are no defaults. Additionally, it is required that `samples` be greater than `resamples`.

Note that the proposal distribution for the SIR procedure is based on the robust variance-covariance matrix and will therefore fail whenever the computation of the robust variance-covariance matrix fails.

Confidence intervals based on SIR respect bounds on the parameters (e.g. the confidence interval for a strictly positive parameter cannot contain negative values).

Here is an example of how to use SIR in Pumas:

`my_sir = infer(my_fit, Pumas.SIR(samples=200, resamples=70))`

The `coeftable` function will work on the returned object, and the confidence intervals and standard errors will be based on SIR.

Note that if the robust variance-covariance matrix has already been computed by a call to `infer`, it is possible to avoid recomputing it by calling `infer` again with the output of the first call to `infer` as the first argument and the result of `Pumas.SIR` as the second argument. For example:

```
my_inference = infer(fpm)
my_sir = infer(my_inference, Pumas.SIR(samples=300, resamples=100))
```

`Pumas.SIR` requires a `FittedPumasModel`; `FittedPumasEMModel` support is coming in a future version of Pumas.

## Extracting standard errors (`stderror`)

The `stderror` function is used to extract standard errors from either a `FittedPumasModel` object returned by `fit` or a `FittedPumasModelInference` object returned by `infer`:

```
stderror(fpm::FittedPumasModelInference)
stderror(fpm::FittedPumasModel)
```

When calling `stderror` on a `FittedPumasModelInference` object, the type of standard errors returned will depend on the options passed to `infer`. For instance, if nonparametric bootstrap inference was performed, then the standard errors will be bootstrap-based.

Calling `stderror` directly on an `AbstractFittedPumasModel` object will result in using the default arguments to `infer`, i.e. standard errors based on the robust variance-covariance matrix. This can be inefficient because it may result in the variance-covariance matrix being recalculated unnecessarily; it is therefore recommended not to call `stderror` directly on an `AbstractFittedPumasModel` object, but to call `infer` first and store the resulting object, which is then passed to `stderror`.
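The recommended workflow can be sketched as follows (assuming `my_fit` is a fitted model):

```
my_infer = infer(my_fit)  # computes the variance-covariance matrix once
se = stderror(my_infer)   # reuses the stored result to extract standard errors
```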