PumasModel-Error models

The error models in Pumas are formulated as the conditional distribution of the observed variable, $Y$, conditionally on the random effects, $η$. In the following, $μ = \mathrm{E}(Y|η)$ is the individual prediction. The individual prediction $μ$ generally depends on the random effect $η$, but the dependency will be suppressed in the notation for simplicity.

A broad range of distributions are supported and a subset of the more common ones is presented below. A small code example is presented for each of the error models presented as well as the conditional mean (individual prediction) and variance of each of the models. A summary of all error models is presented as a table towards the end of this page.

In Pumas, we don't differentiate between FOCE (First Order Conditional Estimation) and FOCEI (FOCE with interaction). We detect if it is necessary to include the interaction terms as part of our automatic differentiation instead.

Gaussian models

The Gaussian models are the traditional workhorse models for non-linear mixed effects models. In Pumas, the Gaussian models are defined by using the Normal structure which is parameterized with the mean and the standard deviation.

Additive

@derived begin
    y ~ @. Normal(μ, σ)
end

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = σ^2\]

The dispersion parameter of this model is $σ^2$.

Proportional

@derived begin
    y ~ @. Normal(μ, μ * σ)
end

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = (μσ)^2\]

The dispersion parameter of this model is $(μσ)^2$, i.e. a function of $η$ via $μ$.

In some cases when using numerical ODE integration, it can be difficult to ensure that $μ$ remains positive. In such cases, it usually helps to writeNormal(μ, abs(μ) * σ) or alternatively Normal(μ, μ * σ + 1e-10).

Combined (additive and proportional) without correlation

@derived begin
    y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2))
end

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = σ_{add}^2 + (μσ_{prop})^2\]

The dispersion parameter of this model is $σ_{add}^2 + (μσ_{prop})^2$, i.e. a function of $η$ via $μ$.

Combined (additive and proportional) with correlation

@derived begin
    y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2 + 2 * μ * σ_cov))
end

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = σ_{add}^2 + (μσ_{prop})^2 + 2μ\sigma_{cov}\]

The dispersion parameter of this model is $σ_{add}^2 + (μσ_{prop})^2 + 2μ\sigma_{cov}$, i.e. a function of $η$ via $μ$.

Log-normal (exponential)

@derived begin
    y ~ @. LogNormal(log(μ), σ)
end

\[\mathrm{E}(Y|η) = \mathrm{E}\left(\exp\left(\log(μ) + \frac{σ^2}{2}\right)|η\right) \approx μ \quad \mathrm{Var}(Y|η) \approx (μσ)^2\]

for small values of $σ$ so this model is very similar to the Proportional error model.

The dispersion parameter of this model is $σ^2$.

The log-normal model is equivalent to the following model

@derived begin
    logy ~ @. Normal(log(μ), σ)
end

where logy = log.(y). Notice that this is different from some other software packages where the log-normal model is equivalent to a proportional error model when using the first-order approximation methods.

It is possible to specify a LogNormal model where the second parameter is a function of the mean similarly to the proportional error model and use FOCE() as the approximation method.

@derived begin
    y ~ @. LogNormal(log(μ), μ * σ)
end

In the usual LogNormal model presented above the log-transformed ys have standard deviations that don't depend on the modeled mean. The approximate proportionality between the standard deviation of the natural level, y, and the modeled mean is a consequence of the exponentiation, i.e. the log-normal distribution. In the latter example, the log-transformed ys themselves follow a proportional error model. In this model, the coefficient of variation of the y is approximately proportional to the modeled mean.

Other continuous models

Student's $t$

For data with some extreme observations, it is sometimes useful to use an error model with heavier tails such as the Student's $t$ distribution. It is modeled by a, possibly shifted and scaled, one-parameter TDist(ν) distribution.

@derived begin
    y ~ @. μ + σ * TDist(ν)
end

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = σ^2 \quad \textrm{if } ν>2\]

Note that a Student's $t$ distributed model does not allow for FOCE estimation but requires LaplaceI().

Gamma

The Gamma model is traditionally used in the literature on generalized linear models, see [5], for data where the standard deviation is proportional to the mean which is the same property that the Gaussian proportional error model has.

In contrast to the Gaussian proportional error model, the dispersion parameter of the Gamma model is $ν^{-1}$ which does not depend on the mean parameter.

@derived begin
    y ~ @. Gamma(ν, μ / ν)
end

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = \frac{μ^2}{ν}\]

which shows how the Gamma model is very similar to the Gaussian proportional error model. They share the first two moments when setting $σ^2 = ν^{-1}$.

Models for discrete data

Bernoulli (logistic regression)

@derived begin
    y ~ @. Bernoulli(μ)
end

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = μ(1 - μ)\]

The dispersion parameter of this model is $1$.

Poisson

@derived begin
    y ~ @. Poisson(μ)
end

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = μ\]

The dispersion parameter of this model is $1$ which allows for estimation with the FOCE() approximation.

Negative binomial

Notice that this distribution has many parameterizations in the literature.

@derived begin
    p := n / (μ + n)
    y ~ @. NegativeBinomial(n, p)
end

where $p$ is the probability of success and $n$ is the size (or dispersion) parameter.

\[\mathrm{E}(Y|η) = μ \quad \mathrm{Var}(Y|η) = \frac{μ(μ + n)}{n}\]

Categorical

Categorical error models can be used for modeling unordered (nominal) and ordered (ordinal) data. The categorical error model is slightly different from other discrete error model in that the magnitude of the observed variable is irrelevant. For conditional mean $\mathrm{E}(Y|η)$ is not defined for the categorical error model, and instead we consider the probability $P(Y=y|η)$ of each of the outcomes $y \in 1,...,k$. This can be formulated in Pumas as

@derived begin
    y ~ @. Categorical(p₁, p₂, p₃)
end

where the pᵢ parameters should sum to one. This model can be estimated with the FOCE() estimation methods. Once the fit has been obtained, it is possible to get a table with outcome probabilities using the probstable function. If the fit results are stored in result the table is generated by running probstable(result).

Models for incomplete data

The two most common models for incomplete data with a known truncation points are the truncated and censored Gaussian models, see [6]. These kinds of models were introduced in pharmacometrics in [7] for handling data below the limit of quantification. The paper used the name M2 for the truncated model and M3 for the censored model.

A common assumption for the two models below is that the core interest is in the latent variable and not in the truncated or censored version of the variable. In consequence, predictions and weighted residuals are computed for the latent variable, not the truncated/censored variable. The residuals associated with censored observations are computed but might not be easily interpreted.

Currently, it is not possible to fit the models for incomplete data using the FOCE approximation.

Truncated (M2)

In the truncated model, all observations outside the upper and lower truncation points $u$ and $l$ are omitted and the probability mass between the truncation points is adjusted accordingly. It is assumed that the data can be modeled with some latent variable $Y^*$ but that only the variable $Y$ is observed where

\[Y = \begin{cases} Y^* \quad &\textrm{if } l \leqslant Y^* \leqslant u \\ missing \quad &\textrm{if } Y^* < l \vee Y^* > u \end{cases}\]

When fitting data to this model all incomplete data records should be either removed or coded as missing

@derived begin
    y ~ @. truncated_latent(Normal(μ, σ); lower = l, upper = u)
end

If l and u are numbers defined outside of the model the keyword version above can be used. This allows Pumas to optimize for the case where lower or upper is omitted. Cases where this is possible includes analyses with only one study or assay used. Then the limits are probably the same for all subjects. However, if a bound is given as a covariate value because a better assay was introduced in later studies we have to use positional arguments. Both limits should then be specified even if they are infinite.

@derived begin
    y ~ @. truncated_latent(Normal(μ, σ), l, u)
end

The reason for the differences in ways of specifying the limits is because broadcasting (the @. in the code) does not work for vector valued keyword inputs.

As the name suggests, this does not describe a model where simulated quantities will themselves be truncated. Instead, simulations will come from the underlying model. Though in estimation, the log-likelihood contributions will take into consideration that observations outside the given interval is missing from the dataset.

\[\mathrm{E}(Y^*|η) = μ \quad \mathrm{Var}(Y^*|η) = σ^2\]

Notice, the mean and variance are computed for the latent variable $Y^*$.

The FOCE() estimation method is not applicable for this error model. For random effects models based on a truncated error model, the LaplaceI() estimation method should be used.

Censored (M3)

In the censored model, the number of censored observations is assumed to be known. Hence, instead of simply removed probability mass associated with observations outside the truncation points $u$ and $l$, the censored model introduces positive probability mass at the truncation points. It is assumed that the data can be modeled with latent variable $Y^*$ but that only the variable $Y$ is observed where

\[Y = \begin{cases} u\quad &\textrm{if } Y^* \geqslant u \\ Y^* \quad &\textrm{if } l < Y^* < u \\ l \quad &\textrm{if } Y^* \leqslant l \end{cases}\]

Hence, incomplete data records should be encoded with a value below (above) or equal to the lower (upper) truncation point.

@derived begin
    y ~ @. censored_latent(Normal(μ, σ), lower = l, upper = u)
end

As described above for the truncated (M2) case, positional arguments have to be used if the limits are coming from the model.

Again, the name suggests an important feature. We do not expect that the modelled quantity cannot be below l or above u in the subject. Rather, we just fail to observe the exact value when the value is below the Lower Limit of Quantification (LLOQ) or above the upper limit. In estimation we take advantage of this information, but in simulations the latent variable will be simulated and returned.

\[\mathrm{E}(Y^*|η) = μ \quad \mathrm{Var}(Y^*|η) = σ^2\]

Notice, the mean and variance are computed for the latent variable $Y^*$.

The FOCE() estimation method is not applicable for this error model. For random effects models based on a censored error model, the LaplaceI() estimation method should be used.

Summary of Error Models

Data	Model	Derived Block Expression	Dispersion	Approximation Method
Continuous
	Additive	`y ~ @. Normal(μ, σ)`	`σ^2`	`FOCE()`
	Proportional	`y ~ @. Normal(μ, μ * σ)`	`(μσ)^2`	`FOCE()`
	Combined (no correlation)	`y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2))`	`σ_add^2 + (μ * σ_prop)^2`	`FOCE()`
	Combined (with correlation)	`y ~ @. Normal(μ, sqrt(σ_add^2 + (μ * σ_prop)^2 + 2 * μ * σ_cov))`	`σ_add^2 + (μ * σ_prop)^2 + σ_add^2 + (μ * σ_prop)^2 + 2 * μ * σ_cov`	`FOCE()`
	Log-Normal	`y ~ @. LogNormal(log(μ), σ)`	`σ^2`	`FOCE()`
	Log-Normal	`logy ~ @. Normal(log(μ), σ)`	`σ^2`	`FOCE()`
	Log-Normal (proportional)	`y ~ @. LogNormal(log(μ), μ * σ)`	`(μσ)^2`	`FOCE()`
	Log-Normal (proportional)	`logy ~ @. Normal(log(μ), μ * σ)`	`(μσ)^2`	`FOCE()`
	Student's T	`y ~ @. μ + σ * TDist(ν)`		`LaplaceI()`
	Gamma	`y ~ @. Gamma(ν, μ/ν)`	`ν^-1`	`FOCE()`
Discrete
	Bernoulli	`y ~ @. Bernoulli(μ)`	`1`	`FOCE()`
	Poisson	`y ~ @. Poisson(μ)`	`1`	`FOCE()`
	Negative Binomial	`p := n/(μ + n)<br> y ~ @. NegativeBinomial(n, p)`	`n`	`FOCE()`
	Categorical	`y ~ @. Categorical(p₁, p₂, p₃)`		`FOCE()`
Incomplete Data
	Truncated (M2)	`y ~ @. truncated(Normal(μ, σ); lower=l, upper=u)`		`LaplaceI()`
	Censored (M3)	`y ~ @. censored(Normal(μ, σ); lower, upper)`		`LaplaceI()`