Dosage Regimens, Subjects, and Populations

In Pumas, subjects are represented by the Subject type and collections of subjects are represented as Vectors of Subjects and aliased as a Population. Subjects are defined by their identifier, observations, covariates, and events. In this section we will specify the methods used for defining Subjects programmatically or using the read_pumas function that reads in data that follows the Pumas NLME Data Format (PumasNDF) data format. Before we look at Subjects, we will take a look at how to define events as represented by the DosageRegimen type.

Dosage Regimen Terminology

When subjects are subjected to treatment it is represented by an event in Pumas. Administration of a drug is represented by a DosageRegimen that describes the amount, type, frequency and route. DosageRegimens can either be constructed programmatically using the DosageRegimen constructor or from a data source in the PumasNDF format using read_pumas. The names of the inputs are the same independent of how the DosageRegimen is constructed. The definition of the values are as follows:

  • amt: the amount of the dose. This is the only required value.
  • time: the time at which the dose is given. Defaults to 0.
  • evid: the event id. 1 specifies a normal event. 3 means it's a reset event, meaning that the value of the dynamical variable is reset to the amt at the dosing event. If 4, then the dynamical value and time are reset, and then a final dose is given. Defaults to 1.
  • ii: the inter-dose interval. For steady state events, this is the length of time between successive doses. When addl is specified, this is the length of time between dose events. Defaults to 0.
  • addl: the number of additional events of the same type, spaced by ii. Defaults to 0.
  • rate: the rate of administration. If 0, then the dose is instantaneous. Otherwise the dose is administrated at a constant rate for a duration equal to amt/rate.
  • ss: an indicator for whether the dose is a steady state dose. A steady state dose is defined as the result of having applied the dose with the interval ii infinitely many successive times. 0 indicates that the dose is not a steady state dose. 1 indicates that the dose is a steady state dose. 2 indicates that it is a steady state dose that is added to the previous amount. The default is 0.
  • route: route of administration to be used in NCA analysis if it is carried out with the integrated interface inside @model. Defaults to NullRoute which is basically no route specified.

This specification leads to the following default constructor for the DosageRegimen type:

Pumas.DosageRegimenType
DosageRegimen(
  amt::Numeric;
  time::Numeric = 0,
  cmt::Union{Numeric,Symbol} = 1,
  evid::Numeric = 1,
  ii::Numeric = zero.(time),
  addl::Numeric = 0,
  rate::Numeric = zero.(amt)./oneunit.(time),
  duration::Numeric = zero(amt)./oneunit.(time),
  ss::Numeric = 0,
  route::NCA.Route = NCA.NullRoute,
)

Lazy representation of a series of events.

Examples

julia> DosageRegimen(100; ii = 24, addl = 6)
DosageRegimen
 Row │ time     cmt    amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Int64  Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼───────────────────────────────────────────────────────────────────────────────────
   1 │     0.0      1    100.0     1     24.0      6      0.0       0.0     0  NullRoute

julia> DosageRegimen(50; ii = 12, addl = 13)
DosageRegimen
 Row │ time     cmt    amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Int64  Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼───────────────────────────────────────────────────────────────────────────────────
   1 │     0.0      1     50.0     1     12.0     13      0.0       0.0     0  NullRoute

julia> DosageRegimen(200; ii = 24, addl = 2)
DosageRegimen
 Row │ time     cmt    amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Int64  Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼───────────────────────────────────────────────────────────────────────────────────
   1 │     0.0      1    200.0     1     24.0      2      0.0       0.0     0  NullRoute

julia> DosageRegimen(200; ii = 24, addl = 2, route = NCA.IVBolus)
DosageRegimen
 Row │ time     cmt    amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Int64  Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼───────────────────────────────────────────────────────────────────────────────────
   1 │     0.0      1    200.0     1     24.0      2      0.0       0.0     0  IVBolus

You can also create a new DosageRegimen from various existing DosageRegimens:

evs = DosageRegimen(
  regimen1::DosageRegimen,
  regimen2::DosageRegimen;
  offset = nothing
)

offset specifies if regimen2 should start after an offset following the end of the last event in regimen1.

Examples

julia> e1 = DosageRegimen(100; ii = 24, addl = 6)
DosageRegimen
 Row │ time     cmt    amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Int64  Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼───────────────────────────────────────────────────────────────────────────────────
   1 │     0.0      1    100.0     1     24.0      6      0.0       0.0     0  NullRoute

julia> e2 = DosageRegimen(50; ii = 12, addl = 13)
DosageRegimen
 Row │ time     cmt    amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Int64  Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼───────────────────────────────────────────────────────────────────────────────────
   1 │     0.0      1     50.0     1     12.0     13      0.0       0.0     0  NullRoute

julia> evs = DosageRegimen(e1, e2)
DosageRegimen
 Row │ time     cmt    amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Int64  Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼───────────────────────────────────────────────────────────────────────────────────
   1 │     0.0      1    100.0     1     24.0      6      0.0       0.0     0  NullRoute
   2 │     0.0      1     50.0     1     12.0     13      0.0       0.0     0  NullRoute

julia> DosageRegimen(e1, e2; offset = 10)
DosageRegimen
 Row │ time     cmt    amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Int64  Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼───────────────────────────────────────────────────────────────────────────────────
   1 │     0.0      1    100.0     1     24.0      6      0.0       0.0     0  NullRoute
   2 │   178.0      1     50.0     1     12.0     13      0.0       0.0     0  NullRoute

Steady State Dosing

In addition to applying the specified dose, steady state dosing resets the compartments to a steady state (if ss = 1) or adds the the steady state values to the current compartment values (if ss = 2).

Warn

Steady state dosing can only be used with models for which a non-trivial steady state exists. For instance, if the @dynamics block includes an equation for the AUC of a compartment, then the system can be in steady state only if the corresponding compartment is zero.

There are three classes of steady state doses:

ClassAmount (amt)Rate (rate)
Constant infusion0>0
Multiple infusions>0>0
Multiple bolus doses>00

The steady state of the system can be viewed as the result of having applied the steady state dose from the infinite past, depending on the type of the dose as a constant infusion or as a sequence of doses. These so-called implied past doses do not affect the values of the system prior to the time point (time) of the dose record.

Constant Infusion

The time point (time) of a steady state dose record with constant infusion denotes the time point at which the infusion from the infinite past ends. Therefore, absorption lag (lags) does not apply to constant infusions and the dose event cannot be applied multiple times (ii and addl must be zero). Since the dose amount (amt) is zero, also other dose control parameters (bioav, rate, and duration) do not apply to steady state events with constant infusion.

Multiple Doses

For steady state events with multiple infusions or bolus doses, the time point (time) of the dose record denotes the time point at which the infusion is started or the bolus is injected. Dose control parameters (lags, bioav, rate, and duration) at the time point (time) of the dose record apply to all doses, including the implied doses prior to the dose event.

The Subject Constructor

The dosage regimen is only a subset of what we need to fully specify a subject. As mentioned above, we use the Subject type to represent individuals in Pumas. As seen below, then can either be constructed from primitives (time, events covariates, and observations) or read from a tabular format that will be mapped to these concepts. They can be constructed using the Subject constructor programmatically:

Pumas.SubjectType
Subject

The data corresponding to a single subject:

Fields:

  • id: identifier
  • observations: a named tuple of the dependent variables
  • covariates: a named tuple containing the covariates, or nothing.
  • events: a DosageRegimen, or nothing.
  • time: a vector of time stamps for the observations

When there are time varying covariates, each covariate is interpolated with a common covariate time support. The interpolated values are then used to build a multi-valued interpolant for the complete time support.

From the multi-valued interpolant, certain discontinuities are flagged in order to use that information for the differential equation solvers and to correctly apply the analytical solution per region as applicable.

Constructor

Subject(;id = "1",
         observations = NamedTuple(),
         events = nothing,
         time = observations isa AbstractDataFrame ? observations.time : nothing,
         covariates::Union{Nothing, NamedTuple} = nothing,
         covariates_time = observations isa AbstractDataFrame ? observations.time : nothing,
         covariates_direction = :left)

Subject may be constructed from an <:AbstractDataFrame with the appropriate schema or by providing the arguments directly through separate DataFrames / structures.

Examples:

julia> Subject()
Subject
  ID: 1

julia> Subject(
         id = 20,
         events = DosageRegimen(200, ii = 24, addl = 2),
         covariates = (WT = 14.2, HT = 5.2),
       )
Subject
  ID: 20
  Events: 3
  Covariates: WT, HT

julia> Subject(covariates = (WT = [14.2, 14.7], HT = fill(5.2, 2)), covariates_time = [0, 3])
Subject
  ID: 1
  Covariates: WT, HT

or using read_pumas from tabular data:

Pumas.read_pumasFunction
read_pumas(filepath::AbstractString; missingstring = ["", ".", "NA"], kwargs...)
read_pumas(df::AbstractDataFrame; kwargs...)

Import NMTRAN-formatted data. You can either pass a CSV file path or a data frame as the first and only positional argument.

Keyword Arguments

  • observations (default: [:dv]): a vector of column names of dependent variables.

  • covariates (default: Symbol[]): a vector of column names of covariates.

  • id::Symbol (default: :id): the name of the column with the IDs of the individuals. Each individual should have a unique integer or string.

  • time::Symbol (default: :time): the name of the column with the time corresponding to the row. Time should be unique per ID, i.e., there should be no duplicate time values for a given subject.

  • evid::Union{Symbol,Nothing} (default: nothing): the name of the column with event IDs, or nothing. Possible event IDs are:

    • 0: observation
    • 1: dose event
    • 2: other type event
    • 3: reset event (the amounts in each compartment are reset to zero and the on/off status of each compartment is reset to its initial status)
    • 4: reset and dose event

    The event ID defaults to 0 if the dose amount is 0 or missing, and 1 otherwise.

  • amt::Symbol (default: :amt): the name of the column of dose amounts. If the event ID is specified and non-zero, the dose amount should be non-zero. The default dose amount is 0.

  • addl::Symbol (default: :addl): the name of the column that indicates the number of repeated dose events. The number of additional doses defaults to 0.

  • ii (default: :ii): the name of the column of inter-dose intervals. When the number of additional doses is specified and non-zero, this is the time to the next dose. For steady-state events with multiple infusions or bolus doses, this is the time between implied doses. The default inter-dose interval is 0. It is required to be non-zero for steady-state events with multiple infusions or bolus doses, and it is required to be zero for steady-state events with constant infusion.

  • cmt::Symbol (default: :cmt): the name of the column with the compartment to be dosed. Compartments can be specified by integers, strings and Symbols. The default compartment is 1.

  • rate::Symbol (default: :rate): the name of the column with the rate of administration. A rate=-2 allows the rate to be determined by Dose Control Parameters (DCP). Defaults to 0. Possible values are:

    • 0: instantaneous bolus dose
    • >0: infusion dose administered at a constant rate for a duration equal to amt/rate
    • -2: infusion rate or duration specified by the dose control parameters (see @dosecontrol)
  • ss::Symbol (default: :ss): the name of the column that indicates whether a dose is a steady-state dose. A steady-state dose is defined as the result of having applied the same dose from the infinite past. Possible values of the steady-state indicator are:

    • 0: dose is not a steady state dose.
    • 1: dose is a steady state dose, and the compartment amounts are to be reset to the steady-state amounts resulting from the given dose. Compartment amounts resulting from prior dose event records are "zeroed out", and infusions in progress or pending additional doses are cancelled.
    • 2: dose is a steady state dose and the compartment amounts are to be set to the sum of the steady-state amounts resulting from the given dose plus the amounts at the event time were the steady-state dose not given.

    The default value is 0.

  • route::Symbol: the name of the column that specifies the route of administration.

  • mdv::Union{Symbol,Nothing} (default: nothing): the name of the column that indicates if observations are missing, or nothing.

  • event_data::Bool (default: true): toggles assertions applicable to event data. More specifically, checks if the following columns are present in the DataFrame, either as the default values or as user-defined values: :id, :time, and :amt. If no :evid column is present, then a warning will be thrown, and :evid is set to 1 when :amt values are >0 or not missing, or :evid is set to 0 when :amt values are missing and observations are not missing. Otherwise, read_pumas will throw an error.

  • covariates_direction::Symbol (default: :left): the direction of covariate interpolation. Either :left (Last Observation Carried Forward, LOCF) (default), or :right (Next Observation Carried Backward, NOCB). Notice, that for models with occasion variables it is important to use :left for the correct behavior of the interpolation.

  • check::Bool (default: event_data): toggles NMTRAN compliance check of the input data. More specifically, checks if the following columns are present in the DataFrame, either as the default values or as user-defined values: :id, :time, :amt, :cmt, :evid, :addl, :ii, :ss, and :route. Additional checks are:

    • all variables in observations are numeric, i.e. Integer or AbstractFloat.
    • :amt column is numeric, i.e. Integer or AbstractFloat.
    • :cmt column is either a positive Integer, an AbstractString, or a Symbol.
    • :amt values must be missing or 0 when evid = 0; or >=0 otherwise.
    • all variables in observation are missing when evid = 1.
    • :ii column must be present if :ss is specified or present.
    • :ii values must be missing or 0 when evid = 0.
    • :ii column must be >0 if :addl values are >0, and vice-versa.
    • :addl column, if present, must be >=0 when evid = 1.
    • :evid values must be !=0 when :amt values are >0, or :addl and :ii values are >0.
  • adjust_evid34::Bool (default: true): toggles adjustment of time vector for reset events (evid = 3 and evid = 4). If true (the default) then the time of the previous event is added to the time on record to ensure that the time vector is monotonically increasing.

If a column does not exist, its values are imputed to be the default values.

You can also create Subjects from an existing Subject:

Pumas.SubjectMethod
Subject(subject::Subject; id::AbstractString, events)

Construct a Subject from an existing subject while replacing information according to the keyword arguments. The possible keyword arguments are

  • id, sets the subject identifier to the string id. Defaults to the identifier already in subject.
  • events, sets the subject events to the input DosageRegimen. Defaults to the events already present in subject.
  • covariates and covariates_times, used as input to add covariate information

Examples:

julia> s1 = Subject(; id = "AKJ491", events = DosageRegimen(1.0; time = 1.0))
Subject
  ID: AKJ491
  Events: 1

julia> Subject(s1; id = "AKJ492")
Subject
  ID: AKJ492
  Events: 1

The simulated output of a PumasModel as a result of simobs is Pumas.SimulatedObservations. Passing this simobs output into a vectorized Subject will result in a Population (an alias for a vector of Subjects) that is equivalent to the output of read_pumas. This is a convenient feature that allows one to simulate data and turn it back into a Population that can then be passed into a fit function:

Pumas.SubjectMethod

Subject

Constructor

Subject(simsubject::SimulatedObservations)

Roundtrip the result of simobs, i.e. SimulatedObservations to a Subject/Population

Example:

sims = simobs(model, pop, params)

To convert sims to a Population, broadcast as below

Subject.(sims)

Understanding the Subject

The Subject is a central concept when it comes to simulation and fitting in Pumas. To do either of the two we need a model and data. The model is built using the PumasModel or PumasEMModel but carries no information about events, covariate values, or observed values. These all come from the Subject. In Pumas material you will also see Population mentioned and that is nothing but a collection of Subjects. Specifically, the collection type is a Vector such that if sub1, sub2, and sub3 are Subjects then [sub1, sub2, sub3] is a Population.

As mentioned, a model alone does not carry all the information needed to perform a simulation. In theory, we could divide the information into two categories:

  1. Observations
  2. Covariates

Both sets of information is always associated with a specific time point that refers to when the information was collected. In the case of dosing information the information is typically not collected but taken from protocol even if we would ideally always want to know the actual dosing time rather than the protocol time in a fitting context.

Covariates can further be divided into to categories: events and actual covariates. Events are typically dosing records including information such as additional doses, whether to treat the dose as if it has been given from an infinite past (steady-state dosing), and so on. Actual covariates would be information about the subject that does not directly affect events and that are not observations of derived variables in the statistical model. This could be demographics, formulation information that does not fit the dosing regimen, occasion indices, and so on. In Pumas, these two sets of covariate information fall into the two sub-categories events and covariates.

Observations

The data used for fitting model parameters and the data that will be simulated in simobs is specified in what we call observations

  Subject(id = "1", time = [...], observations = [...])

If we have data on record we will often use the read_pumas function that is designed to map tabular data to Subjects. This will be explained below. It is also possible to input data directly in the Subject constructor as follows:

Subject(
    id = "1",
    time = [0.0, 2.0, 4.0, 24.0],
    observations = (; dv = [3.0, 1.2, 0.8, 0.001]),
)

This requires the time vector to match up with the vectors inside the NamedTuple passed to observations. If we had two dependent variables we would simply add the second one to the NamedTuple

Subject(
    id = "1",
    time = [0.0, 2.0, 4.0, 24.0],
    observations = (; dv = [3.0, 1.2, 0.8, 0.001], dv2 = [0.1, 0.6, 0.9, 1.05]),
)

Where each element of each vector in the observations input matches the time vector. Some times there will not be a complete overlap between the two sets of sampling time points and in that case we need to add missing values. Let us say that dv2 is not sampled for the first two hours but it is additionally sampled at 48 hours then it might look like the following:

Subject(
    id = "1",
    time = [0.0, 2.0, 4.0, 24.0, 48.0],
    observations = (;
        dv = [3.0, 1.2, 0.8, 0.001, missing],
        dv2 = [missing, missing, 0.9, 1.05, 0.2],
    ),
)

Notice, that one dependent variable cannot have two observations at the same time point. Sometimes this can occur in data if pre-dose plasma samples are labelled as having the same time value as the dose and the post-dose sample. This is an example in the category mentioned above where the actual sample time is not reflected in the data set as you cannot realistically take a pre-dose sample, dose, and take a post-dose sample with a nanosecond. If this occurs, you need to shift the time point of the pre-dose sample slightly. Below, we add a pre-dose sample 3 minutes prior to dosing by setting time to -3/60 = -0.05:

Subject(
    id = "1",
    time = [-0.05, 0.0, 2.0, 4.0, 24.0, 48.0],
    observations = (;
        dv = [0.0, 3.0, 1.2, 0.8, 0.001, missing],
        dv2 = [missing, missing, missing, 0.9, 1.05, 0.2],
    ),
)

The same can be done in a DataFrame if using read_pumas.

Note

Notice, that when setting negative time points in the Subject, the reference time of the model is shifted from t0=0 to the earliest time point contained in the set of time of first observation, time of first dose, and time of first covariate value being observed.

As mentioned above, if we have data on record it is more often the case that we use read_pumas. The direct construction of Subjects is often related to simulation exercises where we want to simulate data instead. When simulating data, we can either choose to specify what variables to simulate and store in the simulated subject through the observations input or we can choose to not input anything and in that case all variables returned from @derived will be treated as relevant. Consider the following @derived block:

  @derived begin
    cp1 := @. Central / Vc
    dv1 ~ @. Normal(cp1, abs(cp1) * σ)
    cp2 := @. Metabolite / Vm
    dv2 ~ @. Normal(cp2, abs(cp2) * σ)
  end

This is a parent-metabolite model with proportional error models. cp1 and cp2 are suppressed from the output through := so the relevant variables to consider for data simulation is parent concentration dv1 and metabolite concentration dv2. If we want to simulate data from this model we need to prepare our subjects in one of the following two ways:

Subject(id = "1-both-1", time = [0.0, 1.0, 3.0, 4.0])

or

Subject(id = "1-both-2", time = [0.0, 1.0, 3.0, 4.0], observations = (:dv1, :dv2))

They will both contain the same information in this case because the omission of observations means "simulate everything" and in the second case we include observations and we mention everything. The list of variables to simulate should be given as a Tuple of Symbols (the names of the dv with a : in front).

However, if we wanted to simulate only parent or only metabolite we would have to do something like the following:

Subject(id = "1-parent", time = [0.0, 1.0, 3.0, 4.0], observations = (:dv1,))
Subject(id = "1-metabolite", time = [0.0, 1.0, 3.0, 4.0], observations = (:dv2,))

and then we could create a Subject from the output of simobs called with each subject by itself and we would get two different subjects back: one with only parent concentrations, and one with only metabolite concentrations.

Note

Selective simulation of dependent variables was described above. The selective application of dependent variables also works similarly in fitting. If you only have dv1 in your Subject but the model was both dv1 and dv2 the model will simply be fit under the assumption that dv2 does not add anything to the log-likelihood. This means that you can have a Population with only a subset of the dependent variables defined with ~ in your @derived block. A case where that could be useful is exactly the model above where you might have parent-metabolite relationships to consider, but you want to model the parent in a first stage and a metabolite in a second stage.

PumasNDF

The PumasNDF is a specification for building a Population (an alias for a vector of Subjects) from tabular data. Generally this tabular data is given by a database like a CSV. The CSV has columns described as follows:

  • id: the ID of the individual. Each individual should have a unique integer, or string.
  • time: the time corresponding to the row. Should be unique per id, i.e. no duplicate time values for a given subject.
  • evid: the event id. 1 specifies a normal dose event. 3 means it's a reset event, meaning that the value of the dynamical variable is reset to the amt at the dosing event. If 4, then the dynamical value and time are reset, and then a final dose is given. Defaults to 0 if amt is 0 or missing, and 1 otherwise.
  • amt: the amount of the dose. If the evid column exists and is non-zero, this value should be non-zero. Defaults to 0.
  • ii: the inter-dose interval. When addl is specified, this is the length of time to the next dose. For steady state events, this is the length of time between successive doses. Defaults to 0, and is required to be non-zero on rows where a steady-state event is specified.
  • addl: the number of additional doses of the same amt to give. Defaults to 0.
  • rate: the rate of administration. If 0, then the dose is instantaneous. Otherwise, the dose is administered at a constant rate for a duration equal to amt/rate. A rate=-2 allows the rate to be determined by Dose Control Parameters (DCP). Defaults to 0.
  • ss: an indicator for whether the dose is a steady state dose. A steady state dose is defined as the result of having applied the dose with the interval ii infinitely many successive times. 0 indicates that the dose is not a steady state dose. 1 indicates that the dose is a steady state dose. 2 indicates that it is a steady state dose that is added to the previous amount. The default is 0.
  • cmt: the compartment being dosed. Defaults to 1.
  • duration: the duration of administration. If 0, then the dose is instantaneous. Otherwise, the dose is administered at a constant rate equal to amt/duration. Defaults to 0.
  • Observation and covariate columns should be given as a time series of values of matching type. Constant covariates should be constant through the full column. Time points without a measurement should be denoted by a dot (.).

If a column does not exist, its values are imputed to be the defaults. Special notes:

  • If rate and duration exist, then it is enforced that amt=rate*duration.
  • All values and header names are interpreted as lower case.
Tip

Given the information above, it is important to understand how to read a dataset using the CSV package. We recommend that all blanks (""), .'s, NA's and any other character elements in your dataset be passed to the missingstring keyword argument when reading the file as below:

using CSV
data = CSV.read(
    joinpath("pathtomyfile", "mydata.csv"),
    DataFrame;
    missingstring = ["", ".", "NA", "BQL"],
)

For more information check out the CSV documentation.

PumasNDF Checks

The read_pumas function performs some general checks on the provided data and informs the user about inconsistency in the data. It throws an error in case of invalid data, reporting the row number and column name causing the problem such that the user can resolve the issue.

Following is the list of checks applied by the read_pumas function.

Necessary columns in case of event and non-event data

In case of event_data=true (the default), the dataset must contain id, time, amt, and observations columns.

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    cmt = [1, 2],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
    evid = [1, 0],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

[ Info: The input has keys: [:id, :time, :cmt, :dv, :age, :sex, :evid]
ERROR: PumasDataError: The input must have: `id, time, amt, and observations` when `event_data` is `true`
[...]

In case of event_data=false, only the id column is required.

df = DataFrame(
    time = [0, 1],
    cmt = [1, 2],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
    evid = [1, 0],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex], event_data = false)

# output

ERROR: ArgumentError: column name "id" not found in the data frame; existing most similar names are: "dv" and "evid"
[...]

Event data without evid column

In case of event_data=true (the default), a warning is displayed if the provided dataset doesn't have an evid column.

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [10, 0],
    cmt = [1, 2],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

┌ Warning: Your dataset has dose event but it hasn't an evid column. We are adding 1 for dosing rows and 0 for others in evid column. If this is not the case, please add your evid column.
│
└ @ Pumas
Population
  Subjects: 1
  Covariates: age, sex
  Observations: dv

Non-numeric/string entries in an observation column

If there are non-numeric or string entries in an observation column, read_pumas throws an error and reports row(s) and column(s) having this issue.

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [10, 0],
    cmt = [1, 2],
    dv = [missing, "k@"],
    age = [45, 45],
    sex = ["M", "M"],
    evid = [1, 0],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: [1], row = [2], col = dv]  We expect the dv column to be of numeric type.
These are the unique non-numeric values present in the column dv: ("k@",)
[...]

Non-numeric/string entries in the amt column

Similarly, if there are non-numeric or string entries in an observation column, read_pumas throws an error and reports row(s) and column(s) having this issue.

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = ["k8", 0],
    cmt = [1, 2],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
    evid = [1, 0],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: [1], row = [1], col = amt]  We expect the amt column to be of numeric type.
These are the unique non-numeric values present in the column amt: ("k8",)
[...]

cmt must be a positive integer or valid string/symbol for non-zero evid data record

The cmt column should contain positive numbers or string/symbol identifiers of the compartment being dosed.

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [10, 0],
    cmt = [-1, 2],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
    evid = [1, 0],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = cmt] column had invalid value: -1. Pumas does not currently support negative compartment values to turn off compartments.
[...]

amt can only be missing or zero when evid is zero

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [10, 5],
    cmt = [1, 2],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
    evid = [1, 0],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 2, col = evid] amt can only be missing or 0 when evid is 0
[...]

amt can only be positive or zero when evid is 1

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [-10, 0],
    cmt = [1, 2],
    evid = [1, 0],
    dv = [10, 8],
    age = [45, 45],
    sex = ["M", "M"],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = evid] amt can only be positive or zero when evid is 1
[...]

Observations at the time of dose

Observations should be missing at the time of dose (or when amt > 0).

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [10, 0],
    cmt = [1, 2],
    evid = [1, 0],
    dv = [10, 8],
    age = [45, 45],
    sex = ["M", "M"],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = dv] an observation is present at the time of dose in column dv. A blank record (`missing`) is required at time of dosing, i.e. when `amt` is positive.
[...]

Steady-state dosing requires ii column

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [10, 0],
    ss = [1, 0],
    cmt = [1, 2],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
    evid = [1, 0],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: your dataset does not have ii which is a required column for steady state dosing.
[...]

Steady-state dosing with multiple infusions or bolus doses requires positive ii

In case of a steady-state event with multiple infusions or bolus doses (ss = 1 or ss = 2 and amt > 0), the value of the dose interval column ii must be non-zero.

If the rate column is not provided it is assumed to be zero.

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [10, 0],
    ss = [1, 0],
    ii = [0, 0],
    cmt = [1, 2],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
    evid = [1, 0],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = ii] for steady-state dosing the value of the interval column ii must be non-zero but was 0
[...]

Steady-state dosing with constant infusion requires zero ii

In case of a steady-state event with constant infusion (ss = 1 or ss = 2, amt = 0 and rate > 0), the value of the dose interval column ii must be zero.

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [0, 0],
    ss = [1, 0],
    rate = [2, 0],
    ii = [1, 0],
    cmt = [1, 2],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
    evid = [1, 0],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = ii] for steady-state infusion the value of the interval column ii must be zero but was 1
[...]

Steady-state dosing with constant infusion requires zero addl

In case of a steady-state event with constant infusion (ss = 1 or ss = 2, amt = 0 and rate > 0), the value of the additional dose column addl must be zero.

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [0, 0],
    ss = [1, 0],
    rate = [2, 0],
    ii = [0, 0],
    addl = [5, 0],
    cmt = [1, 2],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
    evid = [1, 0],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = addl] for steady-state infusion the value of the additional dose column addl must be zero but was 5
[...]

Non-zero addl requires ii column

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [10, 0],
    addl = [5, 0],
    cmt = [1, 2],
    evid = [1, 0],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: The dataset has an `addl` column specified with non-zero entries but does not have an `ii` column specified. It is not possible to specify the additional doses without providing the interval of time between them. Please specify which column the `ii` information is present in in your `read_pumas` call.
[...]

Non-zero addl requires positive ii

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [10, 0],
    addl = [5, 0],
    ii = [0, 0],
    cmt = ["Depot", "Central"],
    evid = [1, 0],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = ii]  ii must be positive for addl > 0
[...]

Positive ii requires non-zero addl

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [10, 0],
    addl = [0, 0],
    ii = [12, 0],
    cmt = ["Depot", "Central"],
    evid = [1, 0],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = addl]  addl must be positive for ii > 0
[...]

ii can only be missing or zero when evid is zero

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [10, 0],
    addl = [5, 2],
    ii = [12, 4],
    cmt = ["Depot", "Central"],
    evid = [1, 0],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 2, col = evid]  ii can only be missing or zero when evid is 0
[...]

addl can only be positive or zero when evid is 1

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [10, 0],
    addl = [-10, 0],
    ii = [12, 0],
    cmt = ["Depot", "Central"],
    evid = [1, 0],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = evid]  addl can only be positive or zero when evid is 1
[...]

evid must be nonzero when amt is positive or addl and ii are positive

When amt is positive, evid must be non-zero as evid=0 indicates an observation record.

df = DataFrame(
    id = [1, 1],
    time = [0, 1],
    amt = [10, 0],
    addl = [5, 0],
    ii = [12, 0],
    cmt = ["Depot", "Central"],
    evid = [0, 0],
    dv = [missing, 8],
    age = [45, 45],
    sex = ["M", "M"],
)
read_pumas(df; observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = evid] amt can only be missing or 0 when evid is 0
[...]