Dosage Regimens, Subjects, and Populations
In Pumas, subjects are represented by the Subject type, and collections of subjects are represented as Vectors of Subjects, aliased as Population. Subjects are defined by their identifier, observations, covariates, and events. This section describes how to define Subjects programmatically or with the read_pumas function, which reads data that follows the Pumas NLME Data Format (PumasNDF). Before we look at Subjects, we will look at how to define events, which are represented by the DosageRegimen type.
Dosage Regimen Terminology
When a subject receives treatment, it is represented by an event in Pumas. Administration of a drug is represented by a DosageRegimen that describes the amount, type, frequency, and route. DosageRegimens can be constructed either programmatically using the DosageRegimen constructor or from a data source in the PumasNDF format using read_pumas. The names of the inputs are the same regardless of how the DosageRegimen is constructed. The values are defined as follows:
- amt: the amount of the dose. This is the only required value.
- time: the time at which the dose is given. Defaults to 0.
- evid: the event id. 1 specifies a normal event. 3 means it's a reset event, meaning that the value of the dynamical variable is reset to the amt at the dosing event. If 4, then the dynamical value and time are reset, and then a final dose is given. Defaults to 1.
- ii: the interdose interval. For steady state events, this is the length of time between successive doses. When addl is specified, this is the length of time to the next dose. Defaults to 0.
- addl: the number of additional events of the same type, spaced by ii. Defaults to 0.
- rate: the rate of administration. If 0, then the dose is instantaneous. Otherwise the dose is administered at a constant rate for a duration equal to amt/rate.
- ss: an indicator for whether the dose is a steady state dose. A steady state dose is defined as the result of having applied the dose with the interval ii infinitely many successive times. 0 indicates that the dose is not a steady state dose. 1 indicates that the dose is a steady state dose. 2 indicates that it is a steady state dose that is added to the previous amount. The default is 0.
- route: route of administration to be used in NCA analysis if it is carried out with the integrated interface inside @model. Defaults to NullRoute, meaning no route is specified.
This specification leads to the following default constructor for the DosageRegimen type:
DosageRegimen(amt::Numeric;
time::Numeric = 0,
cmt::Union{Numeric,Symbol} = 1,
evid::Numeric = 1,
ii::Numeric = zero.(time),
addl::Numeric = 0,
rate::Numeric = zero.(amt)./oneunit.(time),
duration::Numeric = zero(amt)./oneunit.(time),
ss::Numeric = 0,
route::NCA.Route = NCA.NullRoute)
Each of the values can be either a vector or a scalar. All vectors must be of the same length, and the elementwise combinations each define an event (with scalars being repeated).
A DosageRegimen can be converted to its tabular form using the DataFrame function: DataFrame(dr).
Let us construct a few dosage regimens to see how these inputs change the constructed DosageRegimens. First, a simple instantaneous (default) dose with the amount 9:
DosageRegimen(9)
# output
DosageRegimen
Row │ time cmt amt evid ii addl rate duration ss route
│ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 NCA.Route
─────┼───────────────────────────────────────────────────────────────────────────────────
1 │ 0.0 1 9.0 1 0.0 0 0.0 0.0 0 NullRoute
We see that the default compartment, rate, etc. were set for us. We recommend always setting a compartment name or index, so let us do that, and change the dosage regimen to a constant rate of 0.1. This implies a duration of 90 (amt/rate = 9/0.1):
DosageRegimen(9.0; cmt=:Central, rate=0.1)
# output
DosageRegimen
Row │ time cmt amt evid ii addl rate duration ss route
│ Float64 Symbol Float64 Int8 Float64 Int64 Float64 Float64 Int8 NCA.Route
─────┼─────────────────────────────────────────────────────────────────────────────────────
1 │ 0.0 Central 9.0 1 0.0 0 0.1 90.0 0 NullRoute
We can also construct a dosage regimen that is composed of several DosageRegimens. This is done by passing several DosageRegimen instances to the DosageRegimen constructor:
dr1 = DosageRegimen(9.0; cmt=:Central, rate=0.1)
dr2 = DosageRegimen(9.0; time=1.0, cmt=:Central, rate=0.1)
DosageRegimen(dr1, dr2)
# output
DosageRegimen
Row │ time cmt amt evid ii addl rate duration ss route
│ Float64 Symbol Float64 Int8 Float64 Int64 Float64 Float64 Int8 NCA.Route
─────┼─────────────────────────────────────────────────────────────────────────────────────
1 │ 0.0 Central 9.0 1 0.0 0 0.1 90.0 0 NullRoute
2 │ 1.0 Central 9.0 1 0.0 0 0.1 90.0 0 NullRoute
In this case, the second dose was simply a repetition of the first after 1 unit of time. In this instance, we could also have used dr1 together with the offset keyword to DosageRegimen:
dr1 = DosageRegimen(9.0; cmt=:Central, rate=0.1)
DosageRegimen(dr1, dr1, offset = 1.0)
# output
DosageRegimen
Row │ time cmt amt evid ii addl rate duration ss route
│ Float64 Symbol Float64 Int8 Float64 Int64 Float64 Float64 Int8 NCA.Route
─────┼─────────────────────────────────────────────────────────────────────────────────────
1 │ 0.0 Central 9.0 1 0.0 0 0.1 90.0 0 NullRoute
2 │ 1.0 Central 9.0 1 0.0 0 0.1 90.0 0 NullRoute
We could also have used the ii and addl keywords to construct a more compact representation of the same dosage regimen:
DosageRegimen(9.0; cmt=:Central, rate=0.1, addl=1, ii=1.0)
# output
DosageRegimen
Row │ time cmt amt evid ii addl rate duration ss route
│ Float64 Symbol Float64 Int8 Float64 Int64 Float64 Float64 Int8 NCA.Route
─────┼─────────────────────────────────────────────────────────────────────────────────────
1 │ 0.0 Central 9.0 1 1.0 1 0.1 90.0 0 NullRoute
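To make the relationship between the compact and the expanded forms concrete, here is a minimal plain-Julia sketch that lists the dose times implied by time, ii, and addl. The helper name expand_dose_times is hypothetical and not part of Pumas; it only mirrors the definitions of ii and addl given earlier.

```julia
# Hypothetical helper, not part of Pumas: list the dose times implied by a
# first dose at `time`, followed by `addl` additional doses spaced `ii` apart.
expand_dose_times(time, ii, addl) = [time + k * ii for k in 0:addl]

# The compact regimen above: first dose at 0.0, one additional dose 1.0 later.
println(expand_dose_times(0.0, 1.0, 1))

# A first dose at time 0 with ii = 24 and addl = 2 gives three doses in total.
println(expand_dose_times(0.0, 24.0, 2))
```

Under this reading, addl = 1 with ii = 1.0 reproduces the dose times 0.0 and 1.0 of the two-row regimen built from dr1 and dr2.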
Next, we show the vector form mentioned above. If we input vectors instead of scalars, we can simultaneously define several administrations in one constructor as follows:
DosageRegimen([9.0, 18]; cmt=:Central, rate=[0.1, 1.0], time=[1.0, 5.0], addl=1, ii=2)
# output
DosageRegimen
Row │ time cmt amt evid ii addl rate duration ss route
│ Float64 Symbol Float64 Int8 Float64 Int64 Float64 Float64 Int8 NCA.Route
─────┼─────────────────────────────────────────────────────────────────────────────────────
1 │ 1.0 Central 9.0 1 2.0 1 0.1 90.0 0 NullRoute
2 │ 5.0 Central 18.0 1 2.0 1 1.0 18.0 0 NullRoute
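The way scalars are repeated across vector inputs can be imitated with ordinary Julia broadcasting. The sketch below does not call Pumas and is only an illustration of the rule under that assumption; the variable names and the NamedTuple layout are chosen for this example.

```julia
# Plain-Julia illustration (no Pumas required): broadcasting repeats the
# scalar inputs so that every event receives a full set of values.
amt  = [9.0, 18.0]
time = [1.0, 5.0]
rate = [0.1, 1.0]
addl = 1      # scalar, reused for both events
ii   = 2.0    # scalar, reused for both events

events = broadcast(amt, time, rate, addl, ii) do a, t, r, n, i
    # duration follows from amt / rate, as described for the rate input
    (time = t, amt = a, rate = r, duration = a / r, addl = n, ii = i)
end

foreach(println, events)
```

Each element of events corresponds to one row of the table above: the vector inputs vary per row, while addl and ii are shared.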
Finally, if you are carrying out NCA analysis through the integrated interface, you need to specify route as shown below.
julia> DosageRegimen(200, ii = 24, addl = 2, route = NCA.IVBolus)
# output
DosageRegimen
Row │ time cmt amt evid ii addl rate duration ss route
│ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 NCA.Route
─────┼───────────────────────────────────────────────────────────────────────────────────
1 │ 0.0 1 200.0 1 24.0 2 0.0 0.0 0 IVBolus
The Subject Constructor
The dosage regimen is only a subset of what we need to fully specify a subject. As mentioned above, we use the Subject type to represent individuals in Pumas. Subjects can be constructed programmatically using the Subject constructor or from tabular data using read_pumas. The constructor has the following keywords and default values:
Subject(;id = "1",
observations = nothing,
events = Pumas.Event[],
time = observations isa AbstractDataFrame ? observations.time : nothing,
event_data = true,
covariates::Union{Nothing, NamedTuple} = nothing,
covariates_time = observations isa AbstractDataFrame ? observations.time : nothing,
covariates_direction = :right)
The definitions of the arguments are as follows:
- id: the id of the subject. Defaults to "1" and can be a Number or String.
- observations: holds the observational data. When using the @model interface, this must be a NamedTuple whose names match those of the derived variables.
- events: a DosageRegimen or a Vector{<:Pumas.Event}. Defaults to an empty event list.
- time: the times at which the observations are measured.
- event_data: a boolean which defaults to true and checks that the specified events adhere to the PumasNDF. When set to false, the checks for PumasNDF are turned off.
- covariates: the covariates for the subject given as a NamedTuple of covariate name and value pairs. Defaults to nothing, meaning no covariates.
- covariates_time: a Vector of times at which the covariates are observed or, if some covariates are observed at different times than others, a NamedTuple of covariate name and observation time pairs.
- covariates_direction: a symbol that determines end-point handling in the piecewise constant interpolation for time-varying covariates. Allowed values are :left and :right.
Constructing Subjects
Let's create a few Subjects to get the general idea of how to work programmatically with subjects in Pumas. We can start with the simplest subject:
Subject()
# output
Subject
ID: 1
Suppose we want to construct a Subject with a custom identifier, a dosing event, some observed glucose levels, and blood pressure recorded as a time-varying covariate. We can construct the subject in the following way:
Subject(;
id="AKJ491",
events=DosageRegimen(1.0; time=1.0),
observations=(glucose=[8.31, 7.709, 5.19],),
time=[0.3, 0.9, 3.0],
covariates=(bloodpressure=[(143,95.01), (141.3, 94.8), (130.4, 85.1)],),
covariates_time=[0.0,0.5, 5.0])
# output
Subject
ID: AKJ491
Events: 1
Observations: glucose: (n=3)
Covariates: bloodpressure
Reading Subjects from tabular data
The read_pumas function allows you to read in tabular data and build a Population from it. Its signature is:
read_pumas(data; covariates=Symbol[], observations=Symbol[:dv];
id=:id, time=:time, evid=:evid, amt=:amt, addl=:addl,
ii=:ii, cmt=:cmt, rate=:rate, ss=:ss,
event_data=true, covariates_direction=:right,
parse_tad = true, check=event_data)
The only required argument is data. It is the tabular data source given as an actual table (for example, a DataFrame) or a string that contains the path to something that can be parsed by CSV.jl.
The other arguments are optional keyword arguments that allow changing the column names from their defaults. The keywords id, time, covariates, and observations tell read_pumas which columns to parse as which type of information, and covariates_direction is used as in the Subject constructor. The keywords evid, time, amt, cmt, addl, ii, rate, ss, and event_data tell read_pumas which columns to use for the different types of information we saw when we constructed DosageRegimens above. The keyword parse_tad detects and parses time-after-dose time in the time column, and check is used to turn off the data checks described below in the PumasNDF section.
Subject constructor for simobs output
The simulated output of a PumasModel returned by simobs is of type Pumas.SimulatedObservations. Passing this simobs output into Subject will result in a Population that is equivalent to the output of read_pumas. This is a convenient feature that allows one to simulate data and turn it back into a Population that can then be passed into a fit function. Below is an example from the introduction tutorial.
Given a simobs call as below,
sims = simobs(
inf_2cmt_lin_turnover,
pop,
turnover_params,
obstimes = sd_obstimes)
it can be turned into a Population as below:
julia> Subject.(sims)
Population
Subjects: 10
Covariates:
Observables: dv, resp
PumasNDF
The PumasNDF is a specification for building a Population (an alias for a vector of Subjects) from tabular data. Typically, this tabular data is stored in a file such as a CSV. The CSV has columns described as follows:
- id: the ID of the individual. Each individual should have a unique integer or string.
- time: the time corresponding to the row. Should be unique per id, i.e. no duplicate time values for a given subject.
- evid: the event id. 1 specifies a normal event. 3 means it's a reset event, meaning that the value of the dynamical variable is reset to the amt at the dosing event. If 4, then the dynamical value and time are reset, and then a final dose is given. Defaults to 0 if amt is 0 or missing, and 1 otherwise.
- amt: the amount of the dose. If the evid column exists and is non-zero, this value should be non-zero. Defaults to 0.
- ii: the interdose interval. When addl is specified, this is the length of time to the next dose. For steady state events, this is the length of time between successive doses. Defaults to 0, and is required to be non-zero on rows where a steady-state event is specified.
- addl: the number of additional doses of the same type to give. Defaults to 0.
- rate: the rate of administration. If 0, then the dose is instantaneous. Otherwise the dose is administered at a constant rate for a duration equal to amt/rate. A rate=-2 allows the rate to be determined by Dose Control Parameters (DCP). Defaults to 0.
- ss: an indicator for whether the dose is a steady state dose. A steady state dose is defined as the result of having applied the dose with the interval ii infinitely many successive times. 0 indicates that the dose is not a steady state dose. 1 indicates that the dose is a steady state dose. 2 indicates that it is a steady state dose that is added to the previous amount. The default is 0.
- cmt: the compartment being dosed. Defaults to 1.
- duration: the duration of administration. If 0, then the dose is instantaneous. Otherwise the dose is administered at a constant rate equal to amt/duration. Defaults to 0.
- Observation and covariate columns should be given as a time series of values of matching type. Constant covariates should be constant through the full column. Time points without a measurement should be denoted by a single period (.).
If a column does not exist, its values are imputed to be the defaults. Special notes:
- If rate and duration exist, then it is enforced that amt=rate*duration.
- All values and header names are interpreted as lower case.
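Two of the rules above, the default for evid and the consistency requirement between amt, rate, and duration, can be written down in a few lines of plain Julia. This is a sketch under stated assumptions rather than Pumas code: the names default_evid and consistent_infusion are hypothetical, and treating a zero rate or zero duration as "nothing to check" is an assumption made for this example.

```julia
# Hypothetical helpers, not Pumas internals, mirroring two rules stated above.

# evid defaults to 0 when amt is 0 or missing, and to 1 otherwise.
default_evid(amt) = (ismissing(amt) || iszero(amt)) ? 0 : 1

# When both rate and duration are given, amt must equal rate * duration.
# Assumption: a zero rate or zero duration means there is nothing to compare.
function consistent_infusion(amt, rate, duration)
    (iszero(rate) || iszero(duration)) && return true
    return isapprox(amt, rate * duration)
end

println(default_evid(missing))               # observation row
println(default_evid(10))                    # dosing row
println(consistent_infusion(9.0, 0.1, 90.0)) # amt matches rate * duration
```

The comparison uses isapprox rather than == so that floating-point rounding in rate * duration does not cause a spurious mismatch.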
Given the information above, it is important to understand how to read a dataset using the CSV.jl package. We recommend that all blanks (""), .'s, NA's, and any other character elements in your dataset be passed to the missingstrings keyword argument when reading the file, as below:
using CSV, DataFrames
# Read the file into a DataFrame, treating blanks, ".", "NA", and "BQL" as missing
data = CSV.read(joinpath("pathtomyfile", "mydata.csv"), DataFrame; missingstrings = ["", ".", "NA", "BQL"])
For more information, check out the CSV.jl documentation.
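The effect of declaring such strings as missing can be illustrated without any packages. The following is a conceptual sketch only; it is not how CSV.jl is implemented, and the names MISSING_STRINGS and parse_dv are hypothetical.

```julia
# Conceptual sketch in plain Julia (not CSV.jl's implementation): entries that
# match one of the declared strings become `missing`; the rest parse as numbers.
const MISSING_STRINGS = ["", ".", "NA", "BQL"]

parse_dv(x) = x in MISSING_STRINGS ? missing : parse(Float64, x)

raw = ["", "8.31", ".", "BQL", "5.19"]   # a made-up observation column
dv = parse_dv.(raw)

println(dv)
```

With the placeholders mapped to missing, the column holds only numbers and missing values, which is what the observation-column check in read_pumas expects.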
PumasNDF Checks
The read_pumas function performs general checks on the provided data and informs the user about inconsistencies. If the data are invalid, it throws an error that reports the row number and column name causing the problem, so that the user can resolve the issue.
The following is a list of the checks applied by the read_pumas function, with examples.
Necessary columns in case of event and non-event data
When event_data is true, the dataset must contain id, time, amt, and observations columns. When event_data = false, the only requirement is id.
df = DataFrame(id = [1,1], time = [0,1], cmt = [1,2], dv = [missing,8],
age = [45,45], sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex], event_data = true)
# output
[ Info: The input has keys: [:id, :time, :cmt, :dv, :age, :sex, :evid]
ERROR: PumasDataError: The input must have: `id, time, amt, and observations` when `event_data` is `tru
[...]
No evid column but event_data argument is set to true
When the provided dataset doesn't have an evid column but event_data=true is passed to read_pumas, a warning is issued and an evid column is added, with 1 for dosing rows and 0 for others.
df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], cmt = [1,2], dv = [missing,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex], event_data = true)
# output
┌ Warning: Your dataset has dose event but it hasn't an evid column. We are adding 1 for dosing rows and 0 for others in evid column. If this is not the case, please add your evid column.
│
[...]
Non-numeric/string entries in an observation column
If there are non-numeric or string entries in an observation column, read_pumas throws an error and reports the row(s) and column(s) having this issue.
df = DataFrame(id = [1,1], time = [0,1], amt = [10,0],
cmt = [1,2], dv = [missing,"k@"],
age = [45,45], sex = ["M","M"], evid = [1,0])
read_pumas(df, observations=[:dv], covariates=[:age, :sex])
# output
ERROR: PumasDataError: [Subject id: [1], row = [2], col = dv] We expect the dv column to be of numeric type.
These are the unique non-numeric values present in the column dv: ("k@",)
Non-numeric/string entries in amt column
This check is similar to the one above.
df = DataFrame(id = [1,1], time = [0,1], amt = ["k8",0],
cmt = [1,2], dv = [missing,8],
age = [45,45], sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: [Subject id: [1], row = [1], col = amt] We expect the amt column to be of numeric type.
These are the unique non-numeric values present in the column amt: ("k8",)
cmt must be a positive integer or valid string/symbol for non-zero evid data record
The cmt column should contain positive numbers or string/symbol identifiers of the compartment being dosed.
df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], cmt = [-1,2], dv = [missing,8],
age = [45,45], sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: [Subject id: 1, row = 1, col = cmt] cmt column should be positive
amt can only be missing or zero when evid = 0
df = DataFrame(id = [1,1], time = [0,1], amt = [10,5], cmt = [1,2], dv = [missing,8],
age = [45,45], sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: [Subject id: 1, row = 2, col = evid] amt can only be missing or 0 when evid is 0
[...]
amt can only be positive or zero when evid = 1
df = DataFrame(id = [1,1], time = [0,1], amt = [-10,0],
cmt = [1,2], evid = [1,0], dv = [10,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: [Subject id: 1, row = 1, col = evid] amt can only be positive or zero when evid is 1
[...]
Observation (dv) at the time of dose
The observation should be missing at the time of dose (or when amt > 0).
df = DataFrame(id = [1,1], time = [0,1], amt = [10,0],
cmt = [1,2], evid = [1,0], dv = [10,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: [Subject id: 1, row = 1, col = dv] an observation is present at the time of dose in column dv. A blank record (`missing`) is required at time of dosing, i.e. when `amt` is positive.
Steady-state column (ss) requires ii column
df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], ss = [1, 0],
cmt = [1,2], dv = [missing,8], age = [45,45],
sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: your dataset does not have ii which is a required column for steady state dosing.
Steady-state dosing requires ii > 0
In case of steady-state dosing, the value of the interval column ii must be non-zero. If the rate column is not provided, it is assumed to be zero.
df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], ss = [1, 0], ii = [0,0],
cmt = [1,2], dv = [missing,8],
age = [45,45], sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: [Subject id: 1, row = 1, col = ii] for steady-state dosing the value of the interval column ii must be non-zero but was 0
Steady-state infusion requires ii = 0
In case of steady-state infusion, the value of the interval column ii must be zero.
df = DataFrame(id = [1,1], time = [0,1], amt = [0,0], ss = [1, 0], rate = [2, 0], ii = [1, 0],
cmt = [1,2], dv = [missing,8],
age = [45,45], sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: [Subject id: 1, row = 1, col = ii] for steady-state infusion the value of the interval column ii must be zero but was 1
Steady-state infusion requires addl = 0
In case of steady-state infusion, the value of the additional dose column addl must be zero.
df = DataFrame(id = [1,1], time = [0,1], amt = [0,0], ss = [1, 0], rate = [2, 0], ii = [0, 0],
addl = [5, 0], cmt = [1,2], dv = [missing,8],
age = [45,45], sex = ["M","M"], evid = [1,0])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: [Subject id: 1, row = 1, col = addl] for steady-state infusion the value of the additional dose column addl must be zero but was 5
addl column is present but ii is not
df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], addl = [5,0],
cmt = [1,2], evid = [1,0], dv = [missing,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: your dataset does not have ii which is a required column when addl is specified.
ii must be positive for addl > 0
df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], addl = [5,0], ii = [0,0],
cmt = ["Depot","Central"], evid = [1,0], dv = [missing,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: [Subject id: 1, row = 1, col = ii] ii must be positive for addl > 0
addl must be positive for ii > 0
df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], addl = [0,0], ii = [12,0],
cmt = ["Depot","Central"], evid = [1,0], dv = [missing,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: [Subject id: 1, row = 1, col = addl] addl must be positive for ii > 0
ii can only be missing or zero when evid = 0
df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], addl = [5,2], ii = [12,4],
cmt = ["Depot","Central"], evid = [1,0], dv = [missing,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: [Subject id: 1, row = 2, col = evid] ii can only be missing or zero when evid is zero
addl can only be positive or zero when evid = 1
df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], addl = [-10,0], ii = [12,0],
cmt = ["Depot","Central"], evid = [1,0], dv = [missing,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: [Subject id: 1, row = 1, col = evid] addl can only be positive or zero when evid is one
evid must be nonzero when amt > 0 or addl and ii are positive
When amt is positive, evid must be non-zero, as evid = 0 indicates an observation record.
df = DataFrame(id = [1,1], time = [0,1], amt = [10,0], addl = [5,0], ii = [12,0],
cmt = ["Depot","Central"], evid = [0,0], dv = [missing,8],
age = [45,45], sex = ["M","M"])
read_pumas(df, observations = [:dv], covariates = [:age, :sex])
# output
ERROR: PumasDataError: [Subject id: 1, row = 1, col = evid] amt can only be missing or 0 when evid is 0
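Several of the row-level rules above can be summarised in a small plain-Julia checker. This is a hedged sketch and not the read_pumas implementation: the function name check_row and the NamedTuple row layout are hypothetical, only a handful of rules are covered, and steady-state rows are deliberately left out.

```julia
# Hypothetical row checker, not the Pumas implementation. It covers a few of
# the rules above and returns a vector of messages (empty if none apply).
function check_row(row)
    problems = String[]
    amt  = coalesce(row.amt, 0)    # missing is treated like zero
    ii   = coalesce(row.ii, 0)
    addl = coalesce(row.addl, 0)
    if row.evid == 0
        amt == 0 || push!(problems, "amt can only be missing or 0 when evid is 0")
        ii == 0 || push!(problems, "ii can only be missing or zero when evid is zero")
    else
        amt >= 0 || push!(problems, "amt can only be positive or zero when evid is 1")
        addl >= 0 || push!(problems, "addl can only be positive or zero when evid is one")
        (addl > 0 && ii <= 0) && push!(problems, "ii must be positive for addl > 0")
    end
    return problems
end

good = (evid = 1, amt = 10, ii = 12, addl = 5)
bad  = (evid = 0, amt = 5, ii = missing, addl = 0)

println(check_row(good))
println(check_row(bad))
```

Collecting messages instead of throwing on the first failure is a design choice for this example; as the outputs above show, read_pumas itself reports an error with the subject id, row, and column.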