# Dosage Regimens, Subjects, and Populations

In Pumas, subjects are represented by the Subject type, and collections of subjects are represented as Vectors of Subjects, aliased as Population. Subjects are defined by their identifier, observations, covariates, and events. In this section we describe how to define Subjects programmatically or with the read_pumas function, which reads data that follows the Pumas NLME Data Format (PumasNDF). Before we look at Subjects, we will look at how to define events, which are represented by the DosageRegimen type.

## Dosage Regimen Terminology

When a subject receives treatment, this is represented by an event in Pumas. Administration of a drug is represented by a DosageRegimen that describes the amount, type, frequency, and route. A DosageRegimen can be constructed either programmatically using the DosageRegimen constructor or from a data source in the PumasNDF format using read_pumas. The names of the inputs are the same regardless of how the DosageRegimen is constructed. The values are defined as follows:

• amt: the amount of the dose. This is the only required value.
• time: the time at which the dose is given. Defaults to 0.
• cmt: the compartment being dosed, given as an index or a name. Defaults to 1.
• evid: the event id. 1 specifies a normal event. 3 means it's a reset event, meaning that the value of the dynamical variable is reset to the amt at the dosing event. If 4, then the dynamical value and time are reset, and then a final dose is given. Defaults to 1.
• ii: the interdose interval. For steady state events, this is the length of time between successive doses. When addl is specified, this is the length of time to the next dose. Defaults to 0.
• addl: the number of additional events of the same types, spaced by ii. Defaults to 0.
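• rate: the rate of administration. If 0, then the dose is instantaneous. Otherwise the dose is administered at a constant rate for a duration equal to amt/rate.
• duration: the duration of administration. If 0, then the dose is instantaneous. Otherwise the dose is administered at a constant rate equal to amt/duration.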
• ss: an indicator for whether the dose is a steady state dose. A steady state dose is defined as the result of having applied the dose with the interval ii infinitely many successive times. 0 indicates that the dose is not a steady state dose. 1 indicates that the dose is a steady state dose. 2 indicates that it is a steady state dose that is added to the previous amount. The default is 0.
• route: the route of administration, used in NCA analysis if it is carried out with the integrated interface inside @model. Defaults to NullRoute, meaning that no route is specified.

This specification leads to the following default constructor for the DosageRegimen type:

DosageRegimen(amt::Numeric;
              time::Numeric = 0,
              cmt::Union{Numeric,Symbol} = 1,
              evid::Numeric = 1,
              ii::Numeric = zero.(time),
              addl::Numeric = 0,
              rate::Numeric = zero.(amt)./oneunit.(time),
              duration::Numeric = zero(amt)./oneunit.(time),
              ss::Numeric = 0,
              route::NCA.Route = NCA.NullRoute)

Each of the values can be either a vector or a scalar. All vectors must be of the same length, and each elementwise combination defines an event (with scalars being repeated).
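
The elementwise rule can be illustrated in plain Julia without Pumas. The sketch below is only an analogy: expand_events is a hypothetical helper, not part of the Pumas API, that pairs vectors element by element and repeats scalars in the way described above.

```julia
# Hypothetical helper (not a Pumas function): pair inputs elementwise,
# repeating scalars, to mimic how vector inputs define several events.
function expand_events(amt; time = 0.0, rate = 0.0)
    # `broadcast` repeats scalars and throws a DimensionMismatch when
    # two vectors have different lengths.
    return broadcast((a, t, r) -> (amt = a, time = t, rate = r), amt, time, rate)
end

# Two events: the vectors are matched by position, the scalar rate is shared.
events = expand_events([9.0, 18.0]; time = [1.0, 5.0], rate = 0.1)
for ev in events
    println(ev)
end
```

The actual DosageRegimen constructor may validate its inputs differently; the sketch only shows the pairing logic.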

A DosageRegimen can be converted to its tabular form using the DataFrame function: DataFrame(dr).

Let us try to construct a few dosage regimens to see how these inputs change the constructed DosageRegimens. First, a simple instantaneous (default) dose with the amount 9:

DosageRegimen(9)

# output

DosageRegimen
 Row │ time     cmt    amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Int64  Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼───────────────────────────────────────────────────────────────────────────────────
   1 │     0.0      1      9.0     1      0.0      0      0.0       0.0     0  NullRoute


We see that the default compartment, rate, etc. were set for us. We recommend always setting a compartment name or index, so let us do that and change the dosage regimen to a constant rate of 0.1. With an amount of 9, this implies a duration of 9/0.1 = 90:

DosageRegimen(9.0; cmt=:Central, rate=0.1)

# output

DosageRegimen
 Row │ time     cmt      amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Symbol   Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼─────────────────────────────────────────────────────────────────────────────────────
   1 │     0.0  Central      9.0     1      0.0      0      0.1      90.0     0  NullRoute
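
The duration in the table follows from dividing the amount by the rate. A minimal plain-Julia sketch of that arithmetic is given here; infusion_duration is a hypothetical helper for illustration and not a Pumas function.

```julia
# Hypothetical helper: duration implied by an amount and a constant rate.
# A rate of zero denotes an instantaneous dose, so the duration is zero.
infusion_duration(amt, rate) = iszero(rate) ? 0.0 : amt / rate

d_infusion = infusion_duration(9.0, 0.1)  # amount 9 at rate 0.1
d_bolus = infusion_duration(9.0, 0.0)     # instantaneous dose
println((d_infusion, d_bolus))
```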

We can also construct a dosage regimen that is composed of several DosageRegimens. This is done by passing several DosageRegimen instances to the DosageRegimen constructor:

dr1 = DosageRegimen(9.0; cmt=:Central, rate=0.1)
dr2 = DosageRegimen(9.0; time=1.0, cmt=:Central, rate=0.1)

DosageRegimen(dr1, dr2)

# output

DosageRegimen
 Row │ time     cmt      amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Symbol   Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼─────────────────────────────────────────────────────────────────────────────────────
   1 │     0.0  Central      9.0     1      0.0      0      0.1      90.0     0  NullRoute
   2 │     1.0  Central      9.0     1      0.0      0      0.1      90.0     0  NullRoute

In this case, the second dose was simply a repetition of the first after 1 unit of time. We could therefore also have passed dr1 twice together with the offset keyword to DosageRegimen:

dr1 = DosageRegimen(9.0; cmt=:Central, rate=0.1)
DosageRegimen(dr1, dr1, offset = 1.0)

# output

DosageRegimen
 Row │ time     cmt      amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Symbol   Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼─────────────────────────────────────────────────────────────────────────────────────
   1 │     0.0  Central      9.0     1      0.0      0      0.1      90.0     0  NullRoute
   2 │     1.0  Central      9.0     1      0.0      0      0.1      90.0     0  NullRoute


We could also have used the ii and addl keywords to construct a more compact representation of the same dosage regimen:

DosageRegimen(9.0; cmt=:Central, rate=0.1, addl=1, ii=1.0)

# output

DosageRegimen
 Row │ time     cmt      amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Symbol   Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼─────────────────────────────────────────────────────────────────────────────────────
   1 │     0.0  Central      9.0     1      1.0      1      0.1      90.0     0  NullRoute
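
One way to read the compact form: the single row stands for the initial dose plus addl further doses, each given ii time units after the previous one. The sketch below spells out the implied dose times in plain Julia; dose_times is a hypothetical helper introduced for illustration, not a Pumas function.

```julia
# Hypothetical helper: time of the first dose followed by `addl` additional
# doses spaced `ii` apart.
dose_times(time, ii, addl) = [time + k * ii for k in 0:addl]

two_doses = dose_times(0.0, 1.0, 1)     # the regimen above: addl = 1, ii = 1.0
three_doses = dose_times(0.0, 24.0, 2)  # a dose every 24 time units, three in total
println(two_doses)
println(three_doses)
```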

Next, we show the vector form mentioned above. If we input vectors instead of scalars, we can simultaneously define several administrations in one constructor as follows:

DosageRegimen([9.0, 18]; cmt=:Central, rate=[0.1, 1.0], time=[1.0, 5.0], addl=1, ii=2)

# output

DosageRegimen
 Row │ time     cmt      amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Symbol   Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼─────────────────────────────────────────────────────────────────────────────────────
   1 │     1.0  Central      9.0     1      2.0      1      0.1      90.0     0  NullRoute
   2 │     5.0  Central     18.0     1      2.0      1      1.0      18.0     0  NullRoute

Finally, if you are carrying out NCA analysis through the integrated interface, you need to specify the route, as shown below.

DosageRegimen(200, ii = 24, addl = 2, route = NCA.IVBolus)

# output

DosageRegimen
 Row │ time     cmt    amt      evid  ii       addl   rate     duration  ss    route
     │ Float64  Int64  Float64  Int8  Float64  Int64  Float64  Float64   Int8  NCA.Route
─────┼───────────────────────────────────────────────────────────────────────────────────
   1 │     0.0      1    200.0     1     24.0      2      0.0       0.0     0  IVBolus

## The Subject Constructor

The dosage regimen is only a subset of what we need to fully specify a subject. As mentioned above, we use the Subject type to represent individuals in Pumas. They can be constructed programmatically using the Subject constructor or from tabular data using read_pumas. The constructor has the following keywords and default values:

Subject(;id = "1",
         observations = nothing,
         events = Pumas.Event[],
         time = observations isa AbstractDataFrame ? observations.time : nothing,
         event_data = true,
         covariates::Union{Nothing, NamedTuple} = nothing,
         covariates_time = observations isa AbstractDataFrame ? observations.time : nothing,
         covariates_direction = :right)

The definitions of the arguments are as follows:

• id is the id of the subject. Defaults to "1" and can be a Number or String.
• observations holds the observational data. When using the @model interface, this must be a NamedTuple whose names match those of the derived variables.
• events is a DosageRegimen or a Vector{<:Pumas.Event}. Defaults to an empty event list.
• time holds the times at which the observations are measured.
• event_data is a Boolean that defaults to true and enforces that the specified events adhere to the PumasNDF. When set to false, the checks for PumasNDF are turned off.
• covariates are the covariates for the subject given as a NamedTuple of covariate name and value pairs. Defaults to nothing, meaning no covariates.
• covariates_time is a Vector of times at which the covariates are observed or, if some covariates are observed at different times than others, a NamedTuple of covariate name and observation time pairs.
• covariates_direction is a Symbol that determines end-point handling in the piecewise constant interpolation for time-varying covariates. Allowed values are :left and :right.

### Constructing Subjects

Let's create a few Subjects to get the general idea of how to work programmatically with subjects in Pumas. We can start with the simplest subject:

Subject()

# output

Subject
ID: 1

Suppose we want to construct a Subject with a custom identifier, a dosing event, some observed glucose levels, and time-varying blood pressure measurements as a covariate. We can construct the subject in the following way:

Subject(;
    id="AKJ491",
    events=DosageRegimen(1.0; time=1.0),
    observations=(glucose=[8.31, 7.709, 5.19],),
    time=[0.3, 0.9, 3.0],
    covariates=(bloodpressure=[(143,95.01), (141.3, 94.8), (130.4, 85.1)],),
    covariates_time=[0.0,0.5, 5.0])

# output

Subject
ID: AKJ491
Events: 1
Observations: glucose: (n=3)
Covariates: bloodpressure

### Reading Subjects from tabular data

The read_pumas function allows you to read in subjects from tabular data:

read_pumas(data; covariates=Symbol[], observations=Symbol[:dv],
           id=:id, time=:time, evid=:evid, amt=:amt, addl=:addl,
           ii=:ii, cmt=:cmt, rate=:rate, ss=:ss,
           event_data=true, covariates_direction=:right,
           parse_tad = true, check=event_data)

The only required argument is data. It is the tabular data source given as an actual table (for example using DataFrame) or a string that contains the path to something that can be parsed by CSV.jl.

The other arguments are optional keyword arguments that allow changing the column names from their defaults. The keywords id, time, covariates, and observations tell read_pumas which columns to parse as which type of information, and covariates_direction is used as in the Subject constructor. The keywords evid, amt, cmt, addl, ii, rate, and ss tell read_pumas which columns to use for the different types of event information we saw when we constructed DosageRegimens above, and event_data states whether the events must adhere to the PumasNDF. The keyword parse_tad detects and parses time-after-dose values in the time column, and check is used to turn off the data checks described in the PumasNDF Checks section below.

### Subject constructor for simobs output

The simulated output of a PumasModel as a result of simobs is of type Pumas.SimulatedObservations. Passing this simobs output into Subject (broadcast over the simulated subjects, as in the example below) will result in a Population that is equivalent to the output of read_pumas. This is a convenient feature that allows one to simulate data and turn it back into a Population that can then be passed into a fit function. Below is an example from the introduction tutorial.

Given a simobs call as below,

sims = simobs(
    inf_2cmt_lin_turnover,
    pop,
    turnover_params,
    obstimes = sd_obstimes)

it can be turned into a Population as below

julia> Subject.(sims)
Population
Subjects: 10
Covariates:
Observables: dv, resp

## PumasNDF

The PumasNDF is a specification for building a Population (an alias for a vector of Subjects) from tabular data. Generally this tabular data is given as a data file such as a CSV file. The columns are described as follows:

• id: the ID of the individual. Each individual should have a unique integer, or string.
• time: the time corresponding to the row. Should be unique per id, i.e. no duplicate time values for a given subject.
• evid: the event id. 1 specifies a normal event. 3 means it's a reset event, meaning that the value of the dynamical variable is reset to the amt at the dosing event. If 4, then the dynamical value and time are reset, and then a final dose is given. Defaults to 0 if amt is 0 or missing, and 1 otherwise.
• amt: the amount of the dose. If the evid column exists and is non-zero, this value should be non-zero. Defaults to 0.
• ii: the interdose interval. When addl is specified, this is the length of time to the next dose. For steady state events, this is the length of time between successive doses. Defaults to 0, and is required to be non-zero on rows where a steady-state event is specified.
• addl: the number of additional doses of the same type to give. Defaults to 0.
• rate: the rate of administration. If 0, then the dose is instantaneous. Otherwise the dose is administered at a constant rate for a duration equal to amt/rate. A rate=-2 allows the rate to be determined by Dose Control Parameters (DCP). Defaults to 0.
• ss: an indicator for whether the dose is a steady state dose. A steady state dose is defined as the result of having applied the dose with the interval ii infinitely many successive times. 0 indicates that the dose is not a steady state dose. 1 indicates that the dose is a steady state dose. 2 indicates that it is a steady state dose that is added to the previous amount. The default is 0.
• cmt: the compartment being dosed. Defaults to 1.
• duration: the duration of administration. If 0, then the dose is instantaneous. Otherwise the dose is administered at a constant rate equal to amt/duration. Defaults to 0.
• Observation and covariate columns should be given as a time series of values of matching type. Constant covariates should be constant through the full column. Time points without a measurement should be denoted by a period (.).

If a column does not exist, its values are imputed to be the defaults. Special notes:

• If rate and duration both exist, then it is enforced that amt = rate*duration.
• All values and header names are interpreted as lower case.

Tip

Given the information above, it is important to understand how to read a dataset using the CSV.jl package. We recommend that all blanks (""), periods (.), NAs, and any other character entries in your dataset be passed to the missingstrings keyword argument when reading the file, as below:

using CSV, DataFrames
data = CSV.read(joinpath("pathtomyfile", "mydata.csv"), DataFrame, missingstrings = ["", ".", "NA", "BQL"])

### PumasNDF Checks

The read_pumas function performs general checks on the provided data and informs the user about inconsistencies. If the data is invalid, it throws an error that reports the row number and column name causing the problem, so that the user can resolve the issue.

The following is a list of the checks applied by read_pumas, with examples.

#### Necessary columns in case of event and non-event data

When event_data is true, the dataset must contain the id, time, amt, and observations columns.

When event_data = false, the only required column is id.

df = DataFrame(id = [1,1], time = [0,1], cmt = [1,2], dv = [missing,8],
age = [45,45], sex  =  ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex], event_data = true)

# output

[ Info: The input has keys: [:id, :time, :cmt, :dv, :age, :sex, :evid]
ERROR: PumasDataError: The input must have: id, time, amt, and observations when event_data is true
[...]
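
The column requirement can be checked before calling read_pumas. The following is a plain-Julia sketch of the rule as stated above, not the check Pumas performs internally; missing_columns is a hypothetical helper.

```julia
# Hypothetical pre-check: which required columns are absent?
# With event_data = true: id, time, amt and the observation columns.
# With event_data = false: only id.
function missing_columns(cols; observations = [:dv], event_data = true)
    required = event_data ? [:id, :time, :amt, observations...] : [:id]
    return setdiff(required, cols)
end

# Column names of the data frame in the example above (no amt column).
absent = missing_columns([:id, :time, :cmt, :dv, :age, :sex, :evid])
println(absent)
```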

#### No evid column but event_data argument is set to true

When the provided dataset doesn't have an evid column but event_data=true is passed to read_pumas, a warning is issued and an evid column is added, with 1 for dosing rows and 0 for all other rows.

df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], cmt = [1,2], dv = [missing,8],
age = [45,45], sex  =  ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex], event_data = true)

# output

┌ Warning: Your dataset has dose event but it hasn't an evid column. We are adding 1 for dosing rows and 0 for others in evid column. If this is not the case, please add your evid column.
│
[...]
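
The imputation described by the warning matches the evid default in the PumasNDF column list: 0 if amt is 0 or missing, and 1 otherwise. A plain-Julia sketch of that rule, with impute_evid as a hypothetical helper that is not part of Pumas:

```julia
# Hypothetical re-statement of the documented default, for illustration only:
# evid is 0 when amt is 0 or missing, and 1 otherwise.
impute_evid(amt) = (ismissing(amt) || iszero(amt)) ? 0 : 1

amt = [10, 0, missing]
evid = impute_evid.(amt)  # apply the rule row by row
println(evid)
```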

#### Non-numeric/string entries in an observation column

If there are non-numeric entries, such as strings, in an observation column, read_pumas throws an error and reports the row(s) and column(s) with this issue.

df = DataFrame(id = [1,1], time = [0,1], amt = [10,0],
cmt = [1,2], dv = [missing,"k@"],
age = [45,45], sex  =  ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: [1], row = [2], col = dv]  We expect the dv column to be of numeric type.
These are the unique non-numeric values present in the column dv: ("k@",)
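
To find such entries before calling read_pumas, one can scan a column for values that are neither missing nor numeric. This is a plain-Julia sketch under that reading of the check; non_numeric is a hypothetical helper, not a Pumas function.

```julia
# Hypothetical pre-check: unique values that are neither `missing` nor numbers.
non_numeric(col) = unique(filter(x -> !(ismissing(x) || x isa Number), col))

dv = [missing, "k@", 8]
offending = non_numeric(dv)
println(offending)
```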

#### Non-numeric/string entries in amt column

This check is similar to the one above, but applies to the amt column.

df = DataFrame(id = [1,1], time = [0,1], amt = ["k8",0],
cmt = [1,2], dv = [missing,8],
age = [45,45], sex  =  ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: [1], row = [1], col = amt]  We expect the amt column to be of numeric type.
These are the unique non-numeric values present in the column amt: ("k8",)

#### cmt must be a positive integer or valid string/symbol for non-zero evid data record

The cmt column should contain positive numbers or string/symbol identifiers of the compartment being dosed.

df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], cmt = [-1,2], dv = [missing,8],
age = [45,45], sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = cmt] cmt column should be positive

#### amt can only be missing or zero when evid = 0

df = DataFrame(id = [1,1], time = [0,1], amt = [10,5], cmt = [1,2], dv = [missing,8],
age = [45,45], sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 2, col = evid] amt can only be missing or 0 when evid is 0
[...]

#### amt can only be positive or zero when evid = 1

df = DataFrame(id = [1,1], time = [0,1], amt = [-10,0],
cmt = [1,2], evid = [1,0], dv = [10,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = evid] amt can only be positive or zero when evid is 1
[...]

#### Observation (dv) at the time of dose

The observation should be missing at the time of a dose (that is, when amt > 0).

df = DataFrame(id = [1,1], time = [0,1], amt = [10,0],
cmt = [1,2], evid = [1,0], dv = [10,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = dv] an observation is present at the time of dose in column dv. A blank record (missing) is required at time of dosing, i.e. when amt is positive.
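
A plain-Julia sketch of this rule can locate the offending rows in advance. It is an illustration of the rule as stated here, not the Pumas implementation, and obs_at_dose_rows is a hypothetical helper.

```julia
# Hypothetical pre-check: indices of rows where a dose is given (amt > 0)
# but the observation is not missing.
function obs_at_dose_rows(amt, dv)
    return [i for i in eachindex(amt, dv)
            if !ismissing(amt[i]) && amt[i] > 0 && !ismissing(dv[i])]
end

bad_rows = obs_at_dose_rows([10, 0], [10, 8])       # the data from the example above
ok_rows = obs_at_dose_rows([10, 0], [missing, 8])   # observation blanked at the dose
println((bad_rows, ok_rows))
```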

#### Steady-state column (ss) requires ii column

df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], ss = [1, 0],
cmt = [1,2], dv = [missing,8], age = [45,45],
sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: your dataset does not have ii which is a required column for steady state dosing.

#### Steady-state dosing requires ii > 0

In case of steady-state dosing, the value of the interval column ii must be non-zero.

If the rate column is not provided, it is assumed to be zero.

df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], ss = [1, 0], ii = [0,0],
cmt = [1,2], dv = [missing,8],
age = [45,45], sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = ii] for steady-state dosing the value of the interval column ii must be non-zero but was 0

#### Steady-state infusion requires ii = 0

In case of steady-state infusion, the value of the interval column ii must be zero.

df = DataFrame(id = [1,1], time = [0,1], amt = [0,0], ss = [1, 0], rate = [2, 0], ii = [1, 0],
cmt = [1,2], dv = [missing,8],
age = [45,45], sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = ii] for steady-state infusion the value of the interval column ii must be zero but was 1

#### Steady-state infusion requires addl = 0

In case of steady-state infusion, the value of the additional dose column addl must be zero.

df = DataFrame(id = [1,1], time = [0,1], amt = [0,0], ss = [1, 0], rate = [2, 0], ii = [0, 0],
addl = [5, 0], cmt = [1,2], dv = [missing,8],
age = [45,45], sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = addl] for steady-state infusion the value of the additional dose column addl must be zero but was 5
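
The three steady-state value rules can be collected into one row-level pre-check. The sketch below is plain Julia and hypothetical throughout: check_steady_state is not a Pumas function, and it assumes, based only on the examples in this section, that a steady-state infusion is a row with ss > 0, amt equal to 0, and rate greater than 0, while any other row with ss > 0 counts as steady-state dosing.

```julia
# Hypothetical pre-check of the steady-state rules for a single row.
# Assumption (inferred from the examples): steady-state infusion means
# ss != 0, amt == 0 and rate > 0; other rows with ss != 0 are steady-state dosing.
function check_steady_state(; amt, ss, ii, rate = 0, addl = 0)
    problems = String[]
    ss == 0 && return problems      # not a steady-state row, nothing to check
    if amt == 0 && rate > 0         # steady-state infusion
        ii == 0 || push!(problems, "ii must be zero but was $ii")
        addl == 0 || push!(problems, "addl must be zero but was $addl")
    else                            # steady-state dosing
        ii == 0 && push!(problems, "ii must be non-zero but was $ii")
    end
    return problems
end

println(check_steady_state(amt = 10, ss = 1, ii = 0))
println(check_steady_state(amt = 0, ss = 1, rate = 2, ii = 0, addl = 5))
```

Returning a list of messages rather than throwing lets all problems in a row be reported at once; read_pumas itself throws an error instead.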

#### addl column is present but ii is not

df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], addl = [5,0],
cmt = [1,2], evid = [1,0], dv = [missing,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: your dataset does not have ii which is a required column when addl is specified.

#### ii must be positive for addl > 0

df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], addl = [5,0], ii = [0,0],
cmt = ["Depot","Central"], evid = [1,0], dv = [missing,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = ii]  ii must be positive for addl > 0

#### addl must be positive for ii > 0

df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], addl = [0,0], ii = [12,0],
cmt = ["Depot","Central"], evid = [1,0], dv = [missing,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = addl]  addl must be positive for ii > 0

#### ii can only be missing or zero when evid = 0

df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], addl = [5,2], ii = [12,4],
cmt = ["Depot","Central"], evid = [1,0], dv = [missing,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 2, col = evid]  ii can only be missing or zero when evid is zero

#### addl can only be positive or zero when evid = 1

df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], addl = [-10,0], ii = [12,0],
cmt = ["Depot","Central"], evid = [1,0], dv = [missing,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = evid]  addl can only be positive or zero when evid is one
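
The four ii and addl rules above can be sketched as one plain-Julia function that returns the first violated rule for a row. This is a hypothetical illustration, not the Pumas implementation: check_addl_ii is an invented name, the order of the checks is an assumption, rows with any non-zero evid are treated as dosing rows, and steady-state rows (where ii plays a different role) are not handled.

```julia
# Hypothetical pre-check for one row; returns a message or `nothing`.
function check_addl_ii(; evid, ii, addl)
    if evid == 0
        # Observation record: ii must be missing or zero.
        (ismissing(ii) || ii == 0) && return nothing
        return "ii can only be missing or zero when evid is zero"
    end
    # Dosing record (ii and addl are assumed to be non-missing here).
    addl < 0 && return "addl can only be positive or zero when evid is one"
    addl > 0 && ii <= 0 && return "ii must be positive for addl > 0"
    ii > 0 && addl == 0 && return "addl must be positive for ii > 0"
    return nothing
end

println(check_addl_ii(evid = 1, ii = 0, addl = 5))
println(check_addl_ii(evid = 1, ii = 12, addl = 5))
```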

#### evid must be nonzero when amt > 0 or addl and ii are positive

When amt is positive, evid must be non-zero as evid = 0 indicates an observation record.

df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], addl = [5,0], ii = [12,0],
cmt = ["Depot","Central"], evid = [0,0], dv = [missing,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])

# output

ERROR: PumasDataError: [Subject id: 1, row = 1, col = evid] amt can only be missing or 0 when evid is 0