# PumasModel-Error models

The error models in Pumas are formulated as the conditional distribution of the observed variable, $Y$, conditional on the random effects, $η$. In the following, $μ = \mathrm{E}(Y|η)$ is the individual prediction. The individual prediction $μ$ generally depends on the random effect $η$, but the dependency is suppressed in the notation for simplicity.

A broad range of distributions is supported, and a subset of the more common ones is presented below. Each error model is accompanied by a small code example together with its conditional mean (individual prediction) and variance. A table summarizing all error models appears towards the end of this page.

In Pumas, we don't differentiate between FOCE (First Order Conditional Estimation) and FOCEI (FOCE with interaction). Instead, we detect whether the interaction terms need to be included as part of our automatic differentiation.

## Gaussian models

The Gaussian models are the traditional workhorse models for non-linear mixed effects models. In Pumas, the Gaussian models are defined by using the `Normal` structure, which is parameterized with the mean and the standard deviation.

### Additive

```
@derived begin
y ~ @. Normal(μ, σ)
end
```

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = σ^2\]

The dispersion parameter of this model is $σ^2$.

### Proportional

```
@derived begin
y ~ @. Normal(μ, μ * σ)
end
```

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = (μσ)^2\]

The dispersion parameter of this model is $(μσ)^2$, i.e. a function of $η$ via $μ$.

In some cases when using numerical ODE integration, it can be difficult to ensure that $μ$ remains positive. In such cases, it usually helps to write `Normal(μ, abs(μ) * σ)` or alternatively `Normal(μ, μ * σ + 1e-10)`.

### Combined (additive and proportional) *without* correlation

```
@derived begin
y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2))
end
```

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = σ_{add}^2 + (μσ_{prop})^2\]

The dispersion parameter of this model is $σ_{add}^2 + (μσ_{prop})^2$, i.e. a function of $η$ via $μ$.
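To get a feel for how the two components trade off, consider a small worked example with the hypothetical values $σ_{add} = 0.5$ and $σ_{prop} = 0.1$ (chosen only for illustration). The conditional standard deviation is then

\[\sqrt{0.5^2 + (1 \cdot 0.1)^2} \approx 0.51 \quad \textrm{at } μ = 1 \qquad \sqrt{0.5^2 + (100 \cdot 0.1)^2} \approx 10.01 \quad \textrm{at } μ = 100\]

so the additive component dominates for small predictions and the proportional component dominates for large ones.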

### Combined (additive and proportional) *with* correlation

```
@derived begin
y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2 + 2 * μ * σ_cov))
end
```

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = σ_{add}^2 + (μσ_{prop})^2 + 2μσ_{cov}\]

The dispersion parameter of this model is $σ_{add}^2 + (μσ_{prop})^2 + 2μσ_{cov}$, i.e. a function of $η$ via $μ$.
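One way to see where the extra term comes from is to write the observation with two correlated residual terms. This is a sketch under that assumption; the symbols $ε_{add}$ and $ε_{prop}$ are introduced here for illustration and are not part of the Pumas syntax:

\[Y = μ + ε_{add} + μ ε_{prop} \quad \mathrm{Var}(ε_{add}) = σ_{add}^2 \quad \mathrm{Var}(ε_{prop}) = σ_{prop}^2 \quad \mathrm{Cov}(ε_{add}, ε_{prop}) = σ_{cov}\]

\[\mathrm{Var}(Y|η) = \mathrm{Var}(ε_{add}) + μ^2 \mathrm{Var}(ε_{prop}) + 2μ \mathrm{Cov}(ε_{add}, ε_{prop}) = σ_{add}^2 + (μσ_{prop})^2 + 2μσ_{cov}\]

Under this representation, $σ_{cov}$ has to satisfy $|σ_{cov}| \leqslant σ_{add}σ_{prop}$ for the variance to be non-negative for every value of $μ$.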

### Log-normal (exponential)

```
@derived begin
y ~ @. LogNormal(log(μ), σ)
end
```

\[\mathrm{E}(Y|η) = \exp\left(\log(μ) + \frac{σ^2}{2}\right) = μ e^{σ^2/2} \approx μ \quad \mathrm{Var}(Y|η) \approx (μσ)^2\]

for small values of $σ$, so this model is very similar to the proportional error model.

The dispersion parameter of this model is $σ^2$.
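The quality of the approximation can be checked against the exact moments of the log-normal distribution. With log-scale location $\log(μ)$ and log-scale standard deviation $σ$,

\[\mathrm{E}(Y|η) = μ e^{σ^2/2} \quad \mathrm{Var}(Y|η) = μ^2 e^{σ^2}\left(e^{σ^2} - 1\right)\]

For example, $σ = 0.1$ gives $\mathrm{E}(Y|η) \approx 1.005μ$ and $\mathrm{Var}(Y|η) \approx 1.015(μσ)^2$, so both approximations are accurate to within about 1.5%. The gap widens as $σ$ grows: for $σ = 0.5$ the exact variance is about $1.46(μσ)^2$. Note that it is the conditional median, not the conditional mean, that equals $μ$ exactly.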

The log-normal model is equivalent to the following model

```
@derived begin
logy ~ @. Normal(log(μ), σ)
end
```

where `logy = log.(y)`. Notice that this is different from some other software packages where the log-normal model is equivalent to a proportional error model when using the first-order approximation methods.
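The equivalence of the two formulations follows from the change-of-variables formula. As a sketch, write $f_Y$ for the log-normal density of $Y$ and $f_{\log Y}$ for the normal density of $\log Y$ (notation introduced here):

\[f_Y(y) = \frac{1}{y} f_{\log Y}(\log y) \quad \Rightarrow \quad \log f_Y(y) = \log f_{\log Y}(\log y) - \log y\]

The term $\log y$ depends only on the data, so the two log-likelihoods differ by a constant and are maximized by the same parameter values. The log-likelihood values themselves would differ by $\sum_i \log y_i$ between the two formulations, which matters when comparing them with other models.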

## Other continuous models

### Student's $t$

For data with some extreme observations, it is sometimes useful to use an error model with heavier tails, such as Student's $t$ distribution. The `LocationScale` structure is used to allow for a location shift and scaling of the one-parameter `TDist(ν)` distribution type.

```
@derived begin
y ~ @. LocationScale(μ, σ, TDist(ν))
end
```

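\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = σ^2 \frac{ν}{ν - 2} \quad \textrm{if } ν>2\]

Here $σ$ is a scale parameter rather than the standard deviation, because a standard $t$ variable with $ν$ degrees of freedom has variance $ν/(ν - 2)$.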

The Student's $t$ distributed model does not allow for `FOCE()` estimation but requires `LaplaceI()`.

### Gamma

The Gamma model is traditionally used in the literature on generalized linear models, see [5], for data where the standard deviation is proportional to the mean which is the same property that the Gaussian proportional error model has.

In contrast to the Gaussian proportional error model, the dispersion parameter of the Gamma model is $ν^{-1}$ which does not depend on the mean parameter.

```
@derived begin
y ~ @. Gamma(ν, μ / ν)
end
```

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = \frac{μ^2}{ν}\]

which shows how the Gamma model is very similar to the Gaussian proportional error model. They share the first two moments when setting $σ^2 = ν^{-1}$.
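The moments follow from the shape-scale parameterization, in which the first argument of `Gamma` is the shape and the second is the scale (as in Distributions.jl):

\[\mathrm{E}(Y|η) = ν \cdot \frac{μ}{ν} = μ \quad \mathrm{Var}(Y|η) = ν \cdot \left(\frac{μ}{ν}\right)^2 = \frac{μ^2}{ν} \quad \frac{\sqrt{\mathrm{Var}(Y|η)}}{\mathrm{E}(Y|η)} = \frac{1}{\sqrt{ν}}\]

For instance, a coefficient of variation of 20% ($σ = 0.2$ in the Gaussian proportional error model) corresponds to $ν = 1/0.2^2 = 25$.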

## Models for discrete data

### Bernoulli (logistic regression)

```
@derived begin
y ~ @. Bernoulli(μ)
end
```

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = μ(1 - μ)\]

The dispersion parameter of this model is $1$.
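The name logistic regression refers to how $μ$ is usually constructed rather than to the distribution itself. A minimal sketch, assuming a linear predictor with a fixed effect $θ$ and a single random effect $η$ (both names are hypothetical and chosen for illustration), maps the predictor to a probability with the logistic function:

\[μ = P(Y = 1|η) = \frac{1}{1 + e^{-(θ + η)}} \quad \Leftrightarrow \quad \log\frac{μ}{1 - μ} = θ + η\]

This keeps $μ$ strictly between 0 and 1, as a probability passed to `Bernoulli(μ)` has to be.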

### Poisson

```
@derived begin
y ~ @. Poisson(μ)
end
```

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = μ\]

The dispersion parameter of this model is $1$, which allows for estimation with the `FOCE()` approximation.

### Negative binomial

Notice that this distribution has many parameterizations in the literature.

```
@derived begin
p := n / (μ + n)
y ~ @. NegativeBinomial(n, p)
end
```

where $p$ is the probability of success and $n$ is the size (or dispersion) parameter.

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = \frac{μ(μ + n)}{n}\]
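These moments can be verified from the standard moments of the negative binomial distribution with size $n$ and success probability $p$, the parameterization used by `NegativeBinomial(n, p)` in Distributions.jl. With $p = n/(μ + n)$, so that $1 - p = μ/(μ + n)$:

\[\mathrm{E}(Y|η) = \frac{n(1 - p)}{p} = μ \quad \mathrm{Var}(Y|η) = \frac{n(1 - p)}{p^2} = \frac{μ}{p} = μ + \frac{μ^2}{n}\]

The variance exceeds the Poisson variance $μ$ by $μ^2/n$, so smaller values of $n$ give more overdispersion, and the model approaches the Poisson model as $n \to \infty$.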

### Categorical

Categorical error models can be used for modeling unordered (nominal) and ordered (ordinal) data. The categorical error model is slightly different from the other discrete error models in that the magnitude of the observed variable is irrelevant. The conditional mean $\mathrm{E}(Y|η)$ is not defined for the categorical error model, and instead we consider the probability $P(Y=y|η)$ of each of the outcomes $y \in \{1, \ldots, k\}$. This can be formulated in Pumas as

```
@derived begin
y ~ @. Categorical(p₁, p₂, p₃)
end
```

where the `pᵢ` parameters should sum to one. This model can be estimated with the `FOCE()` estimation method. Once the fit has been obtained, it is possible to get a table with outcome probabilities using the `probstable` function. If the fit results are stored in `result`, the table is generated by running `probstable(result)`.
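For ordered data, one common way to obtain probabilities that sum to one is a cumulative logit (proportional odds) construction. The following is only a sketch for $k = 3$; the cut points $c_1 < c_2$ and the use of a single random effect $η$ are assumptions made for illustration:

\[P(Y \leqslant j|η) = \frac{1}{1 + e^{-(c_j - η)}} \quad j = 1, 2\]

\[p_1 = P(Y \leqslant 1|η) \quad p_2 = P(Y \leqslant 2|η) - P(Y \leqslant 1|η) \quad p_3 = 1 - P(Y \leqslant 2|η)\]

Because $c_1 < c_2$, each of the three probabilities is positive, and they sum to one by construction.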

## Models for incomplete data

The two most common models for incomplete data with known truncation points are the truncated and censored Gaussian models, see [6]. These kinds of models were introduced in pharmacometrics in [7] for handling data below the limit of quantification. The paper used the name *M2* for the truncated model and *M3* for the censored model.

A common assumption for the two models below is that the core interest is in the latent variable and not in the truncated or censored version of the variable. In consequence, predictions and weighted residuals are computed for the latent variable, not the truncated/censored variable. The residuals associated with censored observations are computed but might not be easily interpreted.

Currently, it is not possible to fit the models for incomplete data using the `FOCE()` approximation.

### Truncated (M2)

In the truncated model, all observations outside the upper and lower truncation points $u$ and $l$ are omitted and the probability mass between the truncation points is adjusted accordingly. It is assumed that the data can be modeled with some latent variable $Y^*$ but that only the variable $Y$ is observed where

\[Y = \begin{cases} Y^* \quad &\textrm{if } l \leqslant Y^* \leqslant u \\ \textrm{missing} \quad &\textrm{if } Y^* < l \vee Y^* > u \end{cases}\]

When fitting data to this model, all incomplete data records should be either removed or coded as `missing`.

```
@derived begin
y ~ @. truncated(Normal(μ, σ); lower = l, upper = u)
end
```

\[\mathrm{E}(Y^*|η) = μ \quad \mathrm{Var}(Y^*|η) = σ^2\]

Notice, the mean and variance are computed for the latent variable $Y^*$.

The `FOCE()` estimation method is not applicable for this error model. For random effects models based on a truncated error model, the `LaplaceI()` estimation method should be used.
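The adjustment of the probability mass described above can be made explicit. Writing $φ$ and $Φ$ for the standard normal density and distribution function (notation introduced here), the density of an observed $Y$ under the truncated model is

\[f(y|η) = \frac{\frac{1}{σ} φ\left(\frac{y - μ}{σ}\right)}{Φ\left(\frac{u - μ}{σ}\right) - Φ\left(\frac{l - μ}{σ}\right)} \quad \textrm{for } l \leqslant y \leqslant u\]

The denominator is the probability that the latent variable $Y^*$ falls between the truncation points, so the density integrates to one over the interval from $l$ to $u$.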

### Censored (M3)

In the censored model, the number of censored observations is assumed to be known. Hence, instead of simply removing the probability mass associated with observations outside the truncation points $u$ and $l$, the censored model introduces positive probability mass at the truncation points. It is assumed that the data can be modeled with a latent variable $Y^*$ but that only the variable $Y$ is observed where

\[Y = \begin{cases} u\quad &\textrm{if } Y^* \geqslant u \\ Y^* \quad &\textrm{if } l < Y^* < u \\ l \quad &\textrm{if } Y^* \leqslant l \end{cases}\]

Hence, incomplete data records should be encoded with a value below (above) or equal to the lower (upper) truncation point.

```
@derived begin
y ~ @. Censored(Normal(μ, σ), l, u)
end
```

\[\mathrm{E}(Y^*|η) = μ \quad \mathrm{Var}(Y^*|η) = σ^2\]

Notice, the mean and variance are computed for the latent variable $Y^*$.

The `FOCE()` estimation method is not applicable for this error model. For random effects models based on a `Censored` error model, the `LaplaceI()` estimation method should be used.
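The point masses at the truncation points can be written out explicitly. With $φ$ and $Φ$ denoting the standard normal density and distribution function (notation introduced here), the likelihood contribution of a single observation under the censored model is

\[L(y|η) = \begin{cases} Φ\left(\frac{l - μ}{σ}\right) \quad &\textrm{if } y \leqslant l \\ \frac{1}{σ} φ\left(\frac{y - μ}{σ}\right) \quad &\textrm{if } l < y < u \\ 1 - Φ\left(\frac{u - μ}{σ}\right) \quad &\textrm{if } y \geqslant u \end{cases}\]

so a record below the limit of quantification contributes the probability of falling below the limit rather than a density value.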

## Summary of Error Models

| Data | Model | Derived Block Expression | Dispersion | Approximation Method |
|---|---|---|---|---|
| Continuous | | | | |
| | Additive | `y ~ @. Normal(μ, σ)` | `σ^2` | `FOCE()` |
| | Proportional | `y ~ @. Normal(μ, μ * σ)` | `(μσ)^2` | `FOCE()` |
| | Combined (no correlation) | `y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2))` | `σ_add^2 + (μ * σ_prop)^2` | `FOCE()` |
| | Combined (with correlation) | `y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2 + 2 * μ * σ_cov))` | `σ_add^2 + (μ * σ_prop)^2 + 2 * μ * σ_cov` | `FOCE()` |
| | Log-Normal | `y ~ @. LogNormal(log(μ), σ)` | `σ^2` | `FOCE()` |
| | Log-Normal | `logy ~ @. Normal(log(μ), σ)` | `σ^2` | `FOCE()` |
| | Student's T | `y ~ @. LocationScale(μ, σ, TDist(ν))` | | `LaplaceI()` |
| | Gamma | `y ~ @. Gamma(ν, μ/ν)` | `ν^-1` | `FOCE()` |
| Discrete | | | | |
| | Bernoulli | `y ~ @. Bernoulli(μ)` | `1` | `FOCE()` |
| | Poisson | `y ~ @. Poisson(μ)` | `1` | `FOCE()` |
| | Negative Binomial | `p := n/(μ + n)`<br>`y ~ @. NegativeBinomial(n, p)` | `n` | `FOCE()` |
| | Categorical | `y ~ @. Categorical(p₁, p₂, p₃)` | | `FOCE()` |
| Incomplete Data | | | | |
| | Truncated (M2) | `y ~ @. truncated(Normal(μ, σ); lower=l, upper=u)` | | `LaplaceI()` |
| | Censored (M3) | `y ~ @. Censored(Normal(μ, σ), l, u)` | | `LaplaceI()` |