# PumasModel-Error models

The error models in Pumas are formulated as the conditional distribution of the observed variable, $Y$, conditionally on the random effects, $η$. In the following, $μ = \mathrm{E}(Y|η)$ is the individual prediction. The individual prediction $μ$ generally depends on the random effect $η$, but the dependency will be suppressed in the notation for simplicity.

A broad range of distributions are supported and a subset of the more common ones is presented below. A small code example is presented for each of the error models presented as well as the conditional mean (individual prediction) and variance of each of the models. A summary of all error models is presented as a table towards the end of this page.

In Pumas, we don't differentiate between FOCE (First Order Conditional Estimation) and FOCEI (FOCE with interaction). We detect if it is necessary to include the interaction terms as part of our automatic differentiation instead.

## Gaussian models

The Gaussian models are the traditional workhorse models for non-linear mixed effects models. In Pumas, the Gaussian models are defined by using the Normal structure which is parameterized with the mean and the standard deviation.

@derived begin
y ~ @. Normal(μ, σ)
end
$$$\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = σ^2$$$

The dispersion parameter of this model is $σ^2$.

### Proportional

@derived begin
y ~ @. Normal(μ, μ * σ)
end
$$$\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = (μσ)^2$$$

The dispersion parameter of this model is $(μσ)^2$, i.e. a function of $η$ via $μ$.

In some cases when using numerical ODE integration, it can be difficult to ensure that $μ$ remains positive. In such cases, it usually helps to writeNormal(μ, abs(μ) * σ) or alternatively Normal(μ, μ * σ + 1e-10).

### Combined (additive and proportional) without correlation

@derived begin
y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2))
end
$$$\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = σ_{add}^2 + (μσ_{prop})^2$$$

The dispersion parameter of this model is $σ_{add}^2 + (μσ_{prop})^2$, i.e. a function of $η$ via $μ$.

### Combined (additive and proportional) with correlation

@derived begin
y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2 + 2 * μ * σ_cov))
end
$$$\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = σ_{add}^2 + (μσ_{prop})^2 + 2μ\sigma_{cov}$$$

The dispersion parameter of this model is $σ_{add}^2 + (μσ_{prop})^2 + 2μ\sigma_{cov}$, i.e. a function of $η$ via $μ$.

### Log-normal (exponential)

@derived begin
y ~ @. LogNormal(log(μ), σ)
end
$$$\mathrm{E}(Y|η) = \mathrm{E}\left(\exp\left(\log(μ) + \frac{σ^2}{2}\right)|η\right) \approx μ \quad \mathrm{Var}(Y|η) \approx (μσ)^2$$$

for small values of $σ$ so this model is very similar to the Proportional error model.

The dispersion parameter of this model is $σ^2$.

The log-normal model is equivalent to the following model

@derived begin
logy ~ @. Normal(log(μ), σ)
end

where logy = log.(y). Notice that this is different from some other software packages where the log-normal model is equivalent to a proportional error model when using the first-order approximation methods.

## Other continuous models

### Student's $t$

For data with some extreme observations, it is sometimes useful to use an error model with heavier tails such that Student's $t$ distribution. The LocationScale structure is used to allow for location shift and scaling of one-parameter TDist(ν) distribution type.

@derived begin
y ~ @. LocationScale(μ, σ, TDist(ν))
end
$$$\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = σ^2 \quad \textrm{if } ν>2$$$

The Student's t distributed model does not allow for FOCE estimation but requires LaplaceI().

### Gamma

The Gamma model is traditionally used in the literature on generalized linear models, see [5], for data where the standard deviation is proportional to the mean which is the same property that the Gaussian proportional error model has.

In contrast to the Gaussian proportional error model, the dispersion parameter of the Gamma model is $ν^{-1}$ which does not depend on the mean parameter.

@derived begin
y ~ @. Gamma(ν, μ / ν)
end
$$$\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = \frac{μ^2}{ν}$$$

which shows how the Gamma model is very similar to the Gaussian proportional error model. They share the first two moments when setting $σ^2 = ν^{-1}$.

## Models for discrete data

### Bernoulli (logistic regression)

@derived begin
y ~ @. Bernoulli(μ)
end
$$$\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = μ(1 - μ)$$$

The dispersion parameter of this model is $1$.

### Poisson

@derived begin
y ~ @. Poisson(μ)
end
$$$\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = μ$$$

The dispersion parameter of this model is $1$ which allows for estimation with the FOCE() approximation.

### Negative binomial

Notice that this distribution has many parameterizations in the literature.

@derived begin
p := n / (μ + n)
y ~ @. NegativeBinomial(n, p)
end

where $p$ is the probability of success and $n$ is the size (or dispersion) parameter.

$$$\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = \frac{μ(μ + n)}{n}$$$

### Categorical

Categorical error models can be used for modeling unordered (nominal) and ordered (ordinal) data. The categorical error model is slightly different from other discrete error model in that the magnitude of the observed variable is irrelevant. For conditional mean $\mathrm{E}(Y|η)$ is not defined for the categorical error model, and instead we consider the probability $P(Y=y|η)$ of each of the outcomes $y \in 1,...,k$. This can be formulated in Pumas as

@derived begin
y ~ @. Categorical(p₁, p₂, p₃)
end

where the pᵢ parameters should sum to one. This model can be estimated with the FOCE() estimation methods. Once the fit has been obtained, it is possible to get a table with outcome probabilities using the probstable function. If the fit results are stored in result the table is generated by running probstable(result).

## Models for incomplete data

The two most common models for incomplete data with a known truncation points are the truncated and censored Gaussian models, see [6]. These kinds of models were introduced in pharmacometrics in [7] for handling data below the limit of quantification. The paper used the name M2 for the truncated model and M3 for the censored model.

A common assumption for the two models below is that the core interest is in the latent variable and not in the truncated or censored version of the variable. In consequence, predictions and weighted residuals are computed for the latent variable, not the truncated/censored variable. The residuals associated with censored observations are computed but might not be easily interpreted.

Currently, it is not possible to fit the models for incomplete data using the FOCE approximation.

### Truncated (M2)

In the truncated model, all observations outside the upper and lower truncation points $u$ and $l$ are omitted and the probability mass between the truncation points is adjusted accordingly. It is assumed that the data can be modeled with some latent variable $Y^*$ but that only the variable $Y$ is observed where

$$$Y = \begin{cases} Y^* \quad &\textrm{if } l \leqslant Y^* \leqslant u \\ missing \quad &\textrm{if } Y^* < l \vee Y^* > u \end{cases}$$$

When fitting data to this model all incomplete data records should be either removed or coded as missing

@derived begin
y ~ @. truncated(Normal(μ, σ); lower = l, upper = u)
end
$$$\mathrm{E}(Y^*|η) = μ \quad \mathrm{Var}(Y^*|η) = σ^2$$$

Notice, the mean and variance are computed for the latent variable $Y^*$.

The FOCE() estimation method is not applicable for this error model. For random effects models based on a truncated error model, the LaplaceI() estimation method should be used.

### Censored (M3)

In the censored model, the number of censored observations is assumed to be known. Hence, instead of simply removed probability mass associated with observations outside the truncation points $u$ and $l$, the censored model introduces positive probability mass at the truncation points. It is assumed that the data can be modeled with latent variable $Y^*$ but that only the variable $Y$ is observed where

$$$Y = \begin{cases} u\quad &\textrm{if } Y^* \geqslant u \\ Y^* \quad &\textrm{if } l < Y^* < u \\ l \quad &\textrm{if } Y^* \leqslant l \end{cases}$$$

Hence, incomplete data records should be encoded with a value below (above) or equal to the lower (upper) truncation point.

@derived begin
y ~ @. Censored(Normal(μ, σ), l, u)
end
$$$\mathrm{E}(Y^*|η) = μ \quad \mathrm{Var}(Y^*|η) = σ^2$$$

Notice, the mean and variance are computed for the latent variable $Y^*$.

The FOCE() estimation method is not applicable for this error model. For random effects models based on a Censored error model, the LaplaceI() estimation method should be used.

## Summary of Error Models

DataModelDerived Block ExpressionDispersionApproximation Method
Continuous
Additivey ~ @. Normal(μ, σ)σ^2FOCE()
Proportionaly ~ @. Normal(μ, μ * σ)(μσ)^2FOCE()
Combined (no correlation)y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2))σ_add^2 + (μ * σ_prop)^2FOCE()
Combined (with correlation)y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2 + 2 * μ * σ_cov))σ_add^2 + (μ * σ_prop)^2 + σ_add^2 + (μ * σ_prop)^2 + 2 * μ * σ_covFOCE()
Log-Normaly ~ @. LogNormal(log(μ), σ)σ^2FOCE()
Log-Normallogy ~ @. Normal(log(μ), σ)σ^2FOCE()
Student's Ty ~ @. LocationScale(μ, σ, TDist(ν))LaplaceI()
Gammay ~ @. Gamma(ν, μ/ν)ν^-1FOCE()
Discrete
Bernoulliy ~ @. Bernoulli(μ)1FOCE()
Poissony ~ @. Poisson(μ)1FOCE()
Negative Binomialp := n/(μ + n)<br> y ~ @. NegativeBinomial(n, p)nFOCE()
Categoricaly ~ @. Categorical(p₁, p₂, p₃)FOCE()
Incomplete Data
Truncated (M2)y ~ @. truncated(Normal(μ, σ); lower=l, upper=u)LaplaceI()
Censored (M3)y ~ @. Censored(Normal(μ, σ), lower, upper)LaplaceI()