Data Format and parsing for NCA

Once source data is read into Pumas-NCA, the next step is to check that the data are correct for NCA. This check requires the data to be presented to Pumas-NCA in a specific format, the Pumas-NCA data format (PumasNCADF), which is discussed next.

Pumas-NCA data format - PumasNCADF

PumasNCADF is a standardized format for the tabular source data that Pumas-NCA requires. The full list of columns is given below, although not every column is required for a given analysis.

id: The numeric or string id of the subject.
time: The time at which the observations were measured.
observations: The observation (e.g. concentration) time series measurements. Values must be numbers or missing.
amt: The amount of a dose. Must be the dosing amount at each dosing time and otherwise missing.
route: The route of administration. Possible choices are iv for intravenous, ev for extravascular, and inf for infusion. These can be specified as lower, upper or mixed case. E.g. iv, IV or Ev are accepted.
duration: The infusion duration. Should be the duration value or missing.
blq: The Below the lower Limit of Quantification (BLQ) flag. Marks whether an observation is BLQ: the column takes the value 1 for a BLQ observation and 0 otherwise. Handling of missing and BLQ data is described in a separate section.
ii: The interdose interval, equivalent to tau. Used to specify the interval length for steady-state dosing. Defaults to the :ii column. If it is specified and ss is true, the analysis returns steady-state parameters, e.g. cminss, cavgss, cmaxss, by computing the accumulationindex.
ss: The steady-state flag. Specifies whether a dose is a steady-state dose: the column takes the value 1 for a steady-state dose and 0 otherwise. Defaults to the :ss column. If ss is set to 1 for a subject, ii should be greater than 0.

For urine analysis, the format has the following columns:

id: The numeric or string id of the subject.
observations: The observation (e.g. urine concentration) time series measurements.
volume: Collected urine volume.
start_time: The beginning of the urine collection time.
end_time: The end of the urine collection time.
amt: The amount of a dose. Must be the dosing amount at each dosing time, and otherwise missing.
route: The route of administration. Possible choices are iv for intravenous, ev for extravascular, and inf for infusion. These can be specified as lower, upper or mixed case. E.g. iv, IV or Ev are accepted.
duration: The infusion duration. Should be the duration value or missing.
blq: The Below the lower Limit of Quantification (BLQ) flag. Marks whether an observation is BLQ: the column takes the value 1 for a BLQ observation and 0 otherwise. Handling of BLQ data is described in a separate section.
ii: The interdose interval. Used to specify the interval length for steady-state dosing. Defaults to the :ii column.
ss: The steady-state flag. Specifies whether a dose is a steady-state dose: the column takes the value 1 for a steady-state dose and 0 otherwise. Defaults to the :ss column. If ss is set to 1 for a subject, ii should be greater than 0.

Any additional columns in the data may be chosen for grouping the output, as discussed under the group keyword argument below.

read_nca

The parsing function for the PumasNCADF is read_nca which has the following signature:

read_nca(df;
        id                = :id,
        time              = :time,
        observations      = :conc,
        start_time        = :start_time,
        end_time          = :end_time,
        volume            = :volume,
        amt               = :amt,
        route             = :route,
        duration          = :duration,
        blq               = :blq,
        ii                = :ii,
        ss                = :ss,
        group             = nothing,
        concu             = true,
        timeu             = true,
        amtu              = true,
        volumeu           = true,
        verbose           = true,
        lambdazidxs       = nothing,
        lambdazslopetimes = nothing,
        kwargs...)

These arguments are:

df: The required positional argument. This is either a string which is the path to a CSV file, or a DataFrame of tabular data for use in the NCA.

In case column names are distinct from the default names discussed above, id, time, observations, start_time, end_time, volume, amt, route, duration, blq, ii and ss keyword arguments can be used to pass in the corresponding column's name.

In addition to the keyword arguments for column names, the following are supported as well:

group: The columns to group the data by. Subjects are split based on the group information associated with them. Defaults to no grouping.
llq: The Lower Limit of Quantification (LLQ). Defaults to nothing.
concblq: The scheme for handling BLQ values. Defaults to the dictionary Dict(:first=>:keep, :middle=>:drop, :last=>:keep). Further explanation is available in the Handling BLQ Data section.
concu: The units for observations (e.g. concentration). Defaults to no units.
amtu: The units for dosing amount. Defaults to no units.
timeu: The units for time. Defaults to no units.
volumeu: The units for volume. Defaults to no units.
verbose: When true, warnings will be thrown when the output does not match PumasNCADF. Defaults to true.
lambdazidxs: The time points to use in the lambdaz calculation, given as indices into each subject's time points. It takes one array of indices per subject, i.e. an array of arrays. To skip a subject, set the corresponding element to nothing.
lambdazslopetimes: Similar to lambdazidxs, but takes the actual time points instead of their indices.

Examples

The examples below provide various patterns of using read_nca. In addition to showcasing correct usage, we also showcase the expected errors when the function is used incorrectly. This can serve as a quick reference in the event a user faces an error.

Standard dataframe with no errors

julia> df1 = DataFrame(id = [1,1,1,1,1,2,2,2,2,2],
                              time = [0,1,2,3,4,0,1,2,3,4],
                              amt=[10,0,0,0,0,10,0,0,0,0],
                              conc=[missing,8,6,4,2,missing,8,6,4,2],
                              route = ["iv","iv","iv","iv","iv","iv","iv","iv","iv","iv"])
10×5 DataFrame
 Row │ id     time   amt    conc     route
     │ Int64  Int64  Int64  Int64?   String
─────┼──────────────────────────────────────
   1 │     1      0     10  missing  iv
   2 │     1      1      0        8  iv
   3 │     1      2      0        6  iv
   4 │     1      3      0        4  iv
   5 │     1      4      0        2  iv
   6 │     2      0     10  missing  iv
   7 │     2      1      0        8  iv
   8 │     2      2      0        6  iv
   9 │     2      3      0        4  iv
  10 │     2      4      0        2  iv

julia> df1_r = read_nca(df1)
NCAPopulation (2 subjects):
Number of missing observations: 2
Number of blq observations: 0

The other keyword arguments give more control over how read_nca creates the population. The example below passes units from Unitful.jl for concentration and time via concu and timeu, and specifies the time points used for the lambdaz fit via lambdazslopetimes. It uses the data frame df3 that is constructed in the `amt` can be `missing` at time of observations example further down.

julia> using Unitful
julia> time_unit = u"hr"
julia> concentration_unit = u"mg/L"
julia> custom_slopetimes = [[3,4], [2,3]]
2-element Vector{Vector{Int64}}:
 [3, 4]
 [2, 3]
julia> df3_r = read_nca(df3, concu = concentration_unit, timeu = time_unit, lambdazslopetimes = custom_slopetimes)

One can also pass the indices of the time points instead of the actual time values as below, with the lambdazidxs keyword argument.

julia> custom_slopetimes_idxs = [[4,5], [3,4]]
2-element Vector{Vector{Int64}}:
 [4, 5]
 [3, 4]
julia> df3_r = read_nca(df3, concu = concentration_unit, timeu = time_unit, lambdazidxs = custom_slopetimes_idxs)
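
The two keywords describe the same selection in different terms: time values for lambdazslopetimes and positions in the subject's time vector for lambdazidxs. As a minimal sketch in plain Julia (the helper name slopetimes_to_idxs is hypothetical and not part of Pumas-NCA), the conversion from one to the other for a subject sampled at times 0 to 4 could look like this:

```julia
# Map the time values chosen for the lambdaz fit to their positions in the
# subject's time vector. Hypothetical helper, not part of Pumas-NCA.
slopetimes_to_idxs(t, slopetimes) = findall(in(slopetimes), t)

t = [0, 1, 2, 3, 4]                      # time points of one subject
slopetimes = [[3, 4], [2, 3]]            # one entry per subject
idxs = [slopetimes_to_idxs(t, s) for s in slopetimes]

println(idxs)   # [[4, 5], [3, 4]]
```

Both subjects in the example share the same time vector, which is why a single t is reused here; with differing sampling schedules each subject would need its own time vector.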

Missing required column `route`

julia> df2 = DataFrame(id = [1,1,1,1,1,2,2,2,2,2],
                              time = [0,1,2,3,4,0,1,2,3,4],
                              amt=[10,0,0,0,0,10,0,0,0,0],
                              conc=[missing,8,6,4,2,missing,8,6,4,2])
10×4 DataFrame
 Row │ id     time   amt    conc
     │ Int64  Int64  Int64  Int64?
─────┼──────────────────────────────
   1 │     1      0     10  missing
   2 │     1      1      0        8
   3 │     1      2      0        6
   4 │     1      3      0        4
   5 │     1      4      0        2
   6 │     2      0     10  missing
   7 │     2      1      0        8
   8 │     2      2      0        6
   9 │     2      3      0        4
  10 │     2      4      0        2

julia> df2_r = read_nca(df2, observations = :conc)
┌ Warning: No dosage information has passed. If the dataset has dosage information, you can pass the column names by `amt=:amt, route=:route`.
└ @ NCA /builds/PumasAI/PumasDocs-jl/.julia/packages/NCA/9RagW/src/data_parsing.jl:78
┌ Warning: Dosage information requires the presence of both amt & route information. Looks like you only entered the amt and not the route. If your dataset does not have route, please add a column that specifies the route of administration and then pass both columns as `amt=:amt, route=:route.`
└ @ NCA /builds/PumasAI/PumasDocs-jl/.julia/packages/NCA/9RagW/src/data_parsing.jl:80
NCAPopulation (2 subjects):
Number of missing observations: 2
Number of blq observations: 0

The warning message above is verbose: it explains the consequence of not passing in the route column, and how to pass it in if it is missing from the source data.

`amt` can be `missing` at time of observations

julia> df3 = DataFrame(id = [1,1,1,1,1,2,2,2,2,2],
                              time = [0,1,2,3,4,0,1,2,3,4],
                              amt=[10,missing,missing,missing,missing,10,missing,missing,missing,missing],
                              conc=[missing,8,6,4,2,missing,8,6,4,2],
                              route = ["iv","iv","iv","iv","iv","iv","iv","iv","iv","iv"])
10×5 DataFrame
 Row │ id     time   amt      conc     route
     │ Int64  Int64  Int64?   Int64?   String
─────┼────────────────────────────────────────
   1 │     1      0       10  missing  iv
   2 │     1      1  missing        8  iv
   3 │     1      2  missing        6  iv
   4 │     1      3  missing        4  iv
   5 │     1      4  missing        2  iv
   6 │     2      0       10  missing  iv
   7 │     2      1  missing        8  iv
   8 │     2      2  missing        6  iv
   9 │     2      3  missing        4  iv
  10 │     2      4  missing        2  iv

julia> df3_r = read_nca(df3, observations = :conc)
NCAPopulation (2 subjects):
Number of missing observations: 2
Number of blq observations: 0

String (non-numeric) observations

julia> df4 = DataFrame(id = [1,1,1,1,1,2,2,2,2,2],
                              time = [0,1,2,3,4,0,1,2,3,4],
                              amt=[10,missing,missing,missing,missing,10,missing,missing,missing,missing],
                              conc=[missing,8,6,4,"<LOQ",missing,8,6,4,2],
                              route = ["iv","iv","iv","iv","iv","iv","iv","iv","iv","iv"])
10×5 DataFrame
 Row │ id     time   amt      conc     route
     │ Int64  Int64  Int64?   Any      String
─────┼────────────────────────────────────────
   1 │     1      0       10  missing  iv
   2 │     1      1  missing  8        iv
   3 │     1      2  missing  6        iv
   4 │     1      3  missing  4        iv
   5 │     1      4  missing  <LOQ     iv
   6 │     2      0       10  missing  iv
   7 │     2      1  missing  8        iv
   8 │     2      2  missing  6        iv
   9 │     2      3  missing  4        iv
  10 │     2      4  missing  2        iv

julia> df4_r = read_nca(df4, observations = :conc)
ERROR: ArgumentError: conc has non-numeric values at index=[5]. We expect the names column to be of numeric type. Please fix your input data before proceeding further.

The observations column can only be numeric. The error message above appears when the column contains a string element, in this example <LOQ. One way to circumvent this error is to specify the missingstrings keyword in CSV.read, e.g. CSV.read("./pkdata", DataFrame, missingstrings=["<LOQ"]). This way, all string elements matching that text are converted to missing.

With the missingstrings approach, it is no longer possible to tell why a given observation is missing. If that distinction matters, a two-step approach such as the one below is recommended.

# Flag the BLQ rows; isequal treats missing values as ordinary values
df4.isBLQ = ifelse.(isequal.(df4.conc, "<LOQ"), 1, 0)
# Turn "<LOQ" (or any other unparseable string) into missing; numbers stay as they are
df4.conc = [x isa AbstractString ? something(tryparse(Float64, x), missing) : x for x in df4.conc]

Here the data was pre-processed and an extra column isBLQ was set to keep a tally of the <LOQ values. Then the observations column, conc, was converted to a numeric column.

When the data comes from a CSV file, the same steps are:

read the data in without missingstrings in the CSV.read function.
create an isBLQ column as before: df4.isBLQ = ifelse.(isequal.(df4.conc, "<LOQ"), 1, 0)
convert the observations to numeric: df4.conc = [x isa AbstractString ? something(tryparse(Float64, x), missing) : x for x in df4.conc]

`amt` column can only be numeric

julia> df5 = DataFrame(id = [1,1,1,1,1,2,2,2,2,2],
                              time = [0,1,2,3,4,0,1,2,3,4],
                              amt=["10",missing,missing,missing,missing,"10",missing,missing,missing,missing],
                              conc=[missing,8,6,4,2,missing,8,6,4,2],
                              route = ["iv","iv","iv","iv","iv","iv","iv","iv","iv","iv"])
10×5 DataFrame
 Row │ id     time   amt      conc     route
     │ Int64  Int64  String?  Int64?   String
─────┼────────────────────────────────────────
   1 │     1      0  10       missing  iv
   2 │     1      1  missing        8  iv
   3 │     1      2  missing        6  iv
   4 │     1      3  missing        4  iv
   5 │     1      4  missing        2  iv
   6 │     2      0  10       missing  iv
   7 │     2      1  missing        8  iv
   8 │     2      2  missing        6  iv
   9 │     2      3  missing        4  iv
  10 │     2      4  missing        2  iv

julia> df5_r = read_nca(df5, observations = :conc)
ERROR: ArgumentError: amt has non-numeric values at index=[1, 6]. We expect the names column to be of numeric type. Please fix your input data before proceeding further.
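
The offending rows can also be located before calling read_nca. The sketch below is plain Julia and only mimics the idea behind the error message; the helper name nonnumeric_idxs is hypothetical and this is not how Pumas-NCA performs the check internally. It treats a value as acceptable if it is a number or missing.

```julia
# Positions of values that are neither numbers nor missing.
# Hypothetical helper for pre-checking a column, not part of Pumas-NCA.
nonnumeric_idxs(col) = findall(x -> !(ismissing(x) || x isa Number), col)

# The amt column of df5 and the conc column of df4 from the examples above
amt  = ["10", missing, missing, missing, missing, "10", missing, missing, missing, missing]
conc = [missing, 8, 6, 4, "<LOQ", missing, 8, 6, 4, 2]

println(nonnumeric_idxs(amt))    # [1, 6]
println(nonnumeric_idxs(conc))   # [5]
```

The reported positions agree with the index values in the two error messages above.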

Concentration at dosing rows are not ignored

julia> df6 = DataFrame(id = [1,1,1,1,1,2,2,2,2,2],
                              time = [0,1,2,3,4,0,1,2,3,4],
                              amt=[10,missing,missing,missing,missing,10,missing,missing,missing,missing],
                              conc=[10,8,6,4,2,10,8,6,4,2],
                              route = ["iv","iv","iv","iv","iv","iv","iv","iv","iv","iv"])
10×5 DataFrame
 Row │ id     time   amt      conc   route
     │ Int64  Int64  Int64?   Int64  String
─────┼──────────────────────────────────────
   1 │     1      0       10     10  iv
   2 │     1      1  missing      8  iv
   3 │     1      2  missing      6  iv
   4 │     1      3  missing      4  iv
   5 │     1      4  missing      2  iv
   6 │     2      0       10     10  iv
   7 │     2      1  missing      8  iv
   8 │     2      2  missing      6  iv
   9 │     2      3  missing      4  iv
  10 │     2      4  missing      2  iv

julia> df6_r = read_nca(df6, observations = :conc)
NCAPopulation (2 subjects):
Number of missing observations: 0
Number of blq observations: 0

The example above shows that concentrations in dose rows are not ignored. We can compare this with df1_r from the example Standard dataframe with no errors, where the concentration at the dosing time is missing. The computed AUCs differ between the two, as can be seen below.

julia> NCA.auc(df1_r)
2×2 DataFrame
 Row │ id     auc
     │ Int64  Float64
─────┼────────────────
   1 │     1  27.9743
   2 │     2  27.9743

julia> NCA.auc(df6_r)
2×2 DataFrame
 Row │ id     auc
     │ Int64  Float64
─────┼────────────────
   1 │     1   27.641
   2 │     2   27.641
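
The two values can be reproduced by hand under a few assumptions that this section does not state and that may not match the library's actual defaults: a linear trapezoidal rule, extrapolation to infinity with a terminal slope fitted log-linearly through the last three points, and, when the concentration at an IV bolus dosing time is missing, a log-linear back-extrapolation from the first two observations. The function names below are hypothetical; this is a sketch of the arithmetic, not the Pumas-NCA implementation.

```julia
# Linear-trapezoidal area between the first and the last observation.
auc_last(t, c) = sum((t[i+1] - t[i]) * (c[i] + c[i+1]) / 2 for i in 1:length(t)-1)

# Terminal slope from a log-linear least-squares fit through the last n points.
function lambdaz_sketch(t, c; n = 3)
    ts = float.(t[end-n+1:end])
    ys = log.(c[end-n+1:end])
    tbar = sum(ts) / n
    ybar = sum(ys) / n
    slope = sum((ts .- tbar) .* (ys .- ybar)) / sum((ts .- tbar) .^ 2)
    return -slope
end

# Observed area plus the extrapolated tail clast / lambdaz.
auc_inf_sketch(t, c) = auc_last(t, c) + c[end] / lambdaz_sketch(t, c)

t = [0, 1, 2, 3, 4]

# df6: the concentration of 10 recorded at the dosing time is used as is.
auc_df6 = auc_inf_sketch(t, [10, 8, 6, 4, 2])

# df1: the dosing-time concentration is missing, so it is back-extrapolated
# log-linearly from the first two observations (8 at t = 1, 6 at t = 2).
c0 = exp(log(8) + (log(8) - log(6)))     # equals 8^2 / 6
auc_df1 = auc_inf_sketch(t, [c0, 8, 6, 4, 2])

println(round(auc_df6, digits = 3))   # 27.641
println(round(auc_df1, digits = 4))   # 27.9743
```

Under these assumptions the only difference between the two results is the first trapezoid, which uses the recorded value 10 in one case and the back-extrapolated value of about 10.67 in the other.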

`route` can either be upper or lowercase or mixedcase `ev`, `iv` or `inf`

julia> df7 = DataFrame(id = [1,1,1,1,1,2,2,2,2,2],
                              time = [0,1,2,3,4,0,1,2,3,4],
                              amt=[10,missing,missing,missing,missing,10,missing,missing,missing,missing],
                              conc=[10,8,6,4,2,10,8,6,4,2],
                              route = ["Iv","Iv","Iv","Iv","Iv","IV","IV","IV","IV","IV"])
10×5 DataFrame
 Row │ id     time   amt      conc   route
     │ Int64  Int64  Int64?   Int64  String
─────┼──────────────────────────────────────
   1 │     1      0       10     10  Iv
   2 │     1      1  missing      8  Iv
   3 │     1      2  missing      6  Iv
   4 │     1      3  missing      4  Iv
   5 │     1      4  missing      2  Iv
   6 │     2      0       10     10  IV
   7 │     2      1  missing      8  IV
   8 │     2      2  missing      6  IV
   9 │     2      3  missing      4  IV
  10 │     2      4  missing      2  IV

julia> df7_r = read_nca(df7, observations = :conc)
NCAPopulation (2 subjects):
Number of missing observations: 0
Number of blq observations: 0

julia> df7_r[1].dose
NCADose:
  time:         0
  amt:          10
  duration:     0
  route:        IVBolus
  ss:           false

While mixed case is accommodated, it is recommended that users provide route information in one consistent case, preferably lower case.
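
One way to follow this recommendation is to normalize the column before parsing. The sketch below uses plain Julia on a vector of route labels; the helper name normalize_route is hypothetical and the check against the three accepted values is an extra safeguard added for illustration, not something Pumas-NCA requires from the user.

```julia
# Accepted route labels, as listed in the PumasNCADF description.
const ALLOWED_ROUTES = ("iv", "ev", "inf")

# Lower-case every label and fail early on anything unexpected.
# Hypothetical helper, not part of Pumas-NCA.
function normalize_route(route)
    normalized = lowercase.(route)
    unknown = setdiff(unique(normalized), ALLOWED_ROUTES)
    isempty(unknown) || throw(ArgumentError("unknown route value(s): $(join(unknown, ", "))"))
    return normalized
end

route = ["Iv", "Iv", "IV", "IV", "ev", "INF"]
println(normalize_route(route))   # ["iv", "iv", "iv", "iv", "ev", "inf"]
```

With a data frame, the same call would be applied to the route column before it is handed to read_nca.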

Non-monotonic time is not allowed within an individual

julia> df8 = DataFrame(id = [1,1,1,1,1,2,2,2,2,2],
                              time = [0,1,2,3,3,0,1,2,3,3],
                              amt=[10,missing,missing,missing,missing,10,missing,missing,missing,missing],
                              conc=[10,8,6,4,2,10,8,6,4,2],
                              route = ["iv","iv","iv","iv","iv","iv","iv","iv","iv","iv"])
10×5 DataFrame
 Row │ id     time   amt      conc   route
     │ Int64  Int64  Int64?   Int64  String
─────┼──────────────────────────────────────
   1 │     1      0       10     10  iv
   2 │     1      1  missing      8  iv
   3 │     1      2  missing      6  iv
   4 │     1      3  missing      4  iv
   5 │     1      3  missing      2  iv
   6 │     2      0       10     10  iv
   7 │     2      1  missing      8  iv
   8 │     2      2  missing      6  iv
   9 │     2      3  missing      4  iv
  10 │     2      3  missing      2  iv

julia> df8_r = read_nca(df8, observations = :conc)
[ Info: ID 1 errored
ERROR: ArgumentError: Time must be monotonically increasing. Errored at `time=3` (index 4)

Users have to ensure that time is monotonically increasing within a subject (a repeated time point, as in the example above, is also rejected), unless a grouping variable is specified.
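
A quick pre-check on the raw columns can point to the affected subjects before read_nca is called. The sketch below is plain Julia; the helper name nonmonotonic_ids is hypothetical, it assumes that no time value is missing, and it treats repeated time points as violations, in line with the error shown above.

```julia
# Ids of subjects whose time vector is not strictly increasing.
# Hypothetical helper, not part of Pumas-NCA; assumes no missing times.
function nonmonotonic_ids(ids, times)
    bad = eltype(ids)[]
    for id in unique(ids)
        t = times[ids .== id]
        # diff(t) holds the step between consecutive time points
        any(diff(t) .<= 0) && push!(bad, id)
    end
    return bad
end

id = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

println(nonmonotonic_ids(id, [0, 1, 2, 3, 3, 0, 1, 2, 3, 3]))   # [1, 2]
println(nonmonotonic_ids(id, [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]))   # Int64[]
```

The first call uses the id and time columns of df8 and flags both subjects; the second uses those of df1 and flags none.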

Missing Time is not allowed

julia> df9 = DataFrame(id = [1,1,1,1,1,2,2,2,2,2],
                              time = [0,1,2,3,missing,0,1,2,3,missing],
                              amt=[10,missing,missing,missing,missing,10,missing,missing,missing,missing],
                              conc=[10,8,6,4,2,10,8,6,4,2],
                              route = ["iv","iv","iv","iv","iv","iv","iv","iv","iv","iv"])
10×5 DataFrame
 Row │ id     time     amt      conc   route
     │ Int64  Int64?   Int64?   Int64  String
─────┼────────────────────────────────────────
   1 │     1        0       10     10  iv
   2 │     1        1  missing      8  iv
   3 │     1        2  missing      6  iv
   4 │     1        3  missing      4  iv
   5 │     1  missing  missing      2  iv
   6 │     2        0       10     10  iv
   7 │     2        1  missing      8  iv
   8 │     2        2  missing      6  iv
   9 │     2        3  missing      4  iv
  10 │     2  missing  missing      2  iv

julia> df9_r = read_nca(df9, observations = :conc)
[ Info: ID 1 errored
ERROR: ArgumentError: Time may not be missing (missing occured at index 5)

Multiple dose within a subject requires contiguous time

julia> df10 = DataFrame(id = [1,1,1,1,1,1,1,1,1,1],
                              time = [0,1,2,3,4,5,6,7,8,9],
                              amt=[10,0,0,0,0,10,0,0,0,0],
                              conc=[missing,8,6,4,2,missing,8,6,4,2],
                              route = ["iv","iv","iv","iv","iv","iv","iv","iv","iv","iv"],
                              iii = [5,0,0,0,0,5,0,0,0,0])
10×6 DataFrame
 Row │ id     time   amt    conc     route   iii
     │ Int64  Int64  Int64  Int64?   String  Int64
─────┼─────────────────────────────────────────────
   1 │     1      0     10  missing  iv          5
   2 │     1      1      0        8  iv          0
   3 │     1      2      0        6  iv          0
   4 │     1      3      0        4  iv          0
   5 │     1      4      0        2  iv          0
   6 │     1      5     10  missing  iv          5
   7 │     1      6      0        8  iv          0
   8 │     1      7      0        6  iv          0
   9 │     1      8      0        4  iv          0
  10 │     1      9      0        2  iv          0

julia> df10_r = read_nca(df10, observations = :conc)
NCAPopulation (1 subjects):
Number of missing observations: 1
Number of blq observations: 0

julia> df10_r[1].dose
2-element Vector{NCADose{Int64, Int64}}:
 NCADose:
  time:         0
  amt:          10
  duration:     0
  route:        IVBolus
  ss:           false
 NCADose:
  time:         5
  amt:          10
  duration:     0
  route:        IVBolus
  ss:           false

Multiple dose with `ii` specified allows computation of steady-state values

julia> df11 = DataFrame(id = [1,1,1,1,1,1,1,1,1,1],
                              time = [0,1,2,3,4,5,6,7,8,9],
                              amt=[10,0,0,0,0,10,0,0,0,0],
                              conc=[missing,8,6,4,2,missing,8,6,4,2],
                              route = ["iv","iv","iv","iv","iv","iv","iv","iv","iv","iv"],
                              iii = [5,0,0,0,0,5,0,0,0,0])
10×6 DataFrame
 Row │ id     time   amt    conc     route   iii
     │ Int64  Int64  Int64  Int64?   String  Int64
─────┼─────────────────────────────────────────────
   1 │     1      0     10  missing  iv          5
   2 │     1      1      0        8  iv          0
   3 │     1      2      0        6  iv          0
   4 │     1      3      0        4  iv          0
   5 │     1      4      0        2  iv          0
   6 │     1      5     10  missing  iv          5
   7 │     1      6      0        8  iv          0
   8 │     1      7      0        6  iv          0
   9 │     1      8      0        4  iv          0
  10 │     1      9      0        2  iv          0

julia> df11_r = read_nca(df11, observations=:conc, ii=:iii)
NCAPopulation (1 subjects):
Number of missing observations: 1
Number of blq observations: 0

julia> df11_r[1].dose
2-element Vector{NCADose{Int64, Int64}}:
 NCADose:
  time:         0
  amt:          10
  duration:     0
  route:        IVBolus
  ss:           false
 NCADose:
  time:         5
  amt:          10
  duration:     0
  route:        IVBolus
  ss:           false

As you can see, df10_r and df11_r look identical, even though the latter maps the ii argument to the iii column of the dataset. ii specifies tau, the dosing interval. This information allows Pumas-NCA to compute the steady-state parameters cmaxss, cminss, cavgss, accumulationindex and tau. We can confirm this by looking at the differences between the two: df10_r, which has no ii information, cannot compute the accumulationindex, whereas df11_r can.

julia> NCA.accumulationindex(df10_r)
2×2 DataFrame
 Row │ id     accumulationindex
     │ Int64  Missing
─────┼──────────────────────────
   1 │     1            missing
   2 │     1            missing

julia> NCA.accumulationindex(df11_r)
2×2 DataFrame
 Row │ id     accumulationindex
     │ Int64  Float64
─────┼──────────────────────────
   1 │     1            1.06855
   2 │     1            1.06855
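
The value 1.06855 is consistent with the textbook accumulation index 1 / (1 - exp(-lambdaz * tau)), which makes clear why tau is needed. The sketch below assumes that this formula applies and that lambdaz is fitted log-linearly through the last three observations of a dosing interval (6, 4 and 2 at 2, 3 and 4 time units after the dose); neither assumption is confirmed by this section, and the function name is hypothetical.

```julia
# Textbook accumulation index for terminal slope lambdaz and dosing interval tau.
# Hypothetical helper, not the Pumas-NCA implementation.
accumulation_index_sketch(lambdaz, tau) = 1 / (1 - exp(-lambdaz * tau))

# For three equally spaced points the least-squares slope of log(conc) versus
# time reduces to (y3 - y1) / (t3 - t1), so lambdaz = (log(6) - log(2)) / (4 - 2).
lambdaz = (log(6) - log(2)) / (4 - 2)

tau = 5   # the interdose interval given in the iii column

println(round(accumulation_index_sketch(lambdaz, tau), digits = 5))   # 1.06855
```

Without a value for tau the expression cannot be evaluated, which matches the missing results returned for df10_r.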

Subjects with dosing record only and no observations will result in missing results

julia> df12  = DataFrame(id = 1, time = 0, amt=10, conc=missing, route="iv")
1×5 DataFrame
 Row │ id     time   amt    conc     route
     │ Int64  Int64  Int64  Missing  String
─────┼──────────────────────────────────────
   1 │     1      0     10  missing  iv

julia> df12_r = read_nca(df12, observations=:conc)
┌ Warning: Subject 1: All concentration data is missing between times 0 and 0
└ @ NCA /builds/PumasAI/PumasDocs-jl/.julia/packages/NCA/9RagW/src/utils.jl:54
NCAPopulation (1 subjects):
Number of missing observations: 1
Number of blq observations: 0

julia> NCA.auc(df12_r)
1×2 DataFrame
 Row │ id     auc
     │ Int64  Missing
─────┼────────────────
   1 │     1  missing

Multiple dosing - all provided doses should have corresponding observations vectors

In the example below, for subject 1, the first dose has observations, but the second dose at time 5 has no associated observations, which results in the warning shown.

julia> df13 = DataFrame(id = [1,1,1,1,1,1],
                              time = [0,1,2,3,4,5],
                              amt=[10,0,0,0,0,10],
                              conc=[missing,8,6,4,2,missing],
                              route = ["iv","iv","iv","iv","iv","iv"])
6×5 DataFrame
 Row │ id     time   amt    conc     route
     │ Int64  Int64  Int64  Int64?   String
─────┼──────────────────────────────────────
   1 │     1      0     10  missing  iv
   2 │     1      1      0        8  iv
   3 │     1      2      0        6  iv
   4 │     1      3      0        4  iv
   5 │     1      4      0        2  iv
   6 │     1      5     10  missing  iv

julia> df13_r = read_nca(df13,   observations=:conc)
┌ Warning: Subject 1: All concentration data is missing between times 5 and 5
└ @ NCA /builds/PumasAI/PumasDocs-jl/.julia/packages/NCA/9RagW/src/utils.jl:54
NCAPopulation (1 subjects):
Number of missing observations: 1
Number of blq observations: 0

steady-state flag `ss` requires `ii>0`

At the moment, ii and ss work interchangeably for the computation of steady-state parameters. The rules are as follows:

When ii is specified, ss is not required, and the information of tau from ii is used to compute parameters specific to multiple dose.
When ss is specified, ii is required as most steady-state parameters require tau as information.
When ii or ss are not specified for multiple dose data, none of the steady-state parameters are computed.
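
The second rule, together with the requirement that ii be greater than 0 wherever ss is 1, can be checked on the raw columns before parsing. This is a minimal sketch in plain Julia with a hypothetical helper name; it assumes both columns are free of missing values and is not a check that Pumas-NCA exposes.

```julia
# Rows flagged as steady-state (ss == 1) whose interdose interval is not positive.
# Hypothetical helper, not part of Pumas-NCA; assumes no missing values.
ss_rows_without_ii(ss, ii) = findall(i -> ss[i] == 1 && !(ii[i] > 0), eachindex(ss))

sss = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

# Steady-state doses with an interdose interval of 5
println(ss_rows_without_ii(sss, [5, 0, 0, 0, 0, 5, 0, 0, 0, 0]))   # Int64[]

# The same flags with the interval left at 0 on the dosing rows
println(ss_rows_without_ii(sss, zeros(Int, 10)))                    # [1, 6]
```

An empty result means every steady-state dose carries a usable tau.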

julia> df14 = DataFrame(id = [1,1,1,1,1,1,1,1,1,1],
                              time = [0,1,2,3,4,5,6,7,8,9],
                              amt=[10,0,0,0,0,10,0,0,0,0],
                              conc=[missing,8,6,4,2,missing,8,6,4,2],
                              route = ["iv","iv","iv","iv","iv","iv","iv","iv","iv","iv"],
                              iii = [5,0,0,0,0,5,0,0,0,0],
                              sss = [1,0,0,0,0,1,0,0,0,0])
10×7 DataFrame
 Row │ id     time   amt    conc     route   iii    sss
     │ Int64  Int64  Int64  Int64?   String  Int64  Int64
─────┼────────────────────────────────────────────────────
   1 │     1      0     10  missing  iv          5      1
   2 │     1      1      0        8  iv          0      0
   3 │     1      2      0        6  iv          0      0
   4 │     1      3      0        4  iv          0      0
   5 │     1      4      0        2  iv          0      0
   6 │     1      5     10  missing  iv          5      1
   7 │     1      6      0        8  iv          0      0
   8 │     1      7      0        6  iv          0      0
   9 │     1      8      0        4  iv          0      0
  10 │     1      9      0        2  iv          0      0

julia> df14_r1 = read_nca(df14, observations=:conc)
NCAPopulation (1 subjects):
Number of missing observations: 1
Number of blq observations: 0

julia> NCA.accumulationindex(df14_r1)
2×2 DataFrame
 Row │ id     accumulationindex
     │ Int64  Missing
─────┼──────────────────────────
   1 │     1            missing
   2 │     1            missing

julia> df14_r2 = read_nca(df14, observations=:conc, ss=:sss, ii = :iii)
NCAPopulation (1 subjects):
Number of missing observations: 1
Number of blq observations: 0

julia> NCA.accumulationindex(df14_r2)
2×2 DataFrame
 Row │ id     accumulationindex
     │ Int64  Float64
─────┼──────────────────────────
   1 │     1            1.06855
   2 │     1            1.06855

julia> df14_r3 = read_nca(df14, observations=:conc, ii=:iii)
NCAPopulation (1 subjects):
Number of missing observations: 1
Number of blq observations: 0

julia> NCA.accumulationindex(df14_r3)
2×2 DataFrame
 Row │ id     accumulationindex
     │ Int64  Float64
─────┼──────────────────────────
   1 │     1            1.06855
   2 │     1            1.06855

julia> df14_r4 = read_nca(df14, observations=:conc, ss=:sss, ii=:iii)
NCAPopulation (1 subjects):
Number of missing observations: 1
Number of blq observations: 0

julia> NCA.accumulationindex(df14_r4)
2×2 DataFrame
 Row │ id     accumulationindex
     │ Int64  Float64
─────┼──────────────────────────
   1 │     1            1.06855
   2 │     1            1.06855

Specification of Groups

NCA is always done on a per-NCASubject, per-doseevent basis.
Grouping during an NCA analysis can be at the

NCASubject level - e.g. after a single dose, a subject has observations of parent and metabolite, so grouping happens at the analyte level; after multiple doses (a single dose every day), a subject has measurements every day, so grouping happens per day; a subject has received single ascending doses, so grouping is per dose.
NCAPopulation level - e.g. the study population is divided into multiple dose groups, so grouping is done by dose; some subjects receive tablets and some subjects receive capsules, so grouping is done by formulation.

Groups specified in read_nca via the group argument get carried forward into the result data frame, whether a complete report or the result of a single function.
At the NCASubject level, specifying group allows Pumas-NCA to break the subject's profile down into multiple groups, which ensures that the Non-monotonic time is not allowed within an individual requirement is respected.
At the NCAPopulation level, specifying group provides a convenient way to carry that variable forward into the result data frame.
More than one group can be passed in via the group argument using the array of symbols syntax, e.g. group = [:dose, :day].

Multiple observation support

As of v2.0, Pumas-NCA accepts only one observations column at a time.

The example below emphasizes the grouping at the NCAPopulation level

julia> df17 = DataFrame(id = [1,1,1,1,1,2,2,2,2,2],
                              time = [0,1,2,3,4,0,1,2,3,4],
                              amt=[10,0,0,0,0,10,0,0,0,0],
                              conc=[missing,8,6,4,6,missing,8,6,4,2],
                              route = ["iv","iv","iv","iv","iv","iv","iv","iv","iv","iv"],
                              formulation =   ["T","T", "T","T","T","R", "R", "R", "R", "R"])
10×6 DataFrame
 Row │ id     time   amt    conc     route   formulation
     │ Int64  Int64  Int64  Int64?   String  String
─────┼───────────────────────────────────────────────────
   1 │     1      0     10  missing  iv      T
   2 │     1      1      0        8  iv      T
   3 │     1      2      0        6  iv      T
   4 │     1      3      0        4  iv      T
   5 │     1      4      0        6  iv      T
   6 │     2      0     10  missing  iv      R
   7 │     2      1      0        8  iv      R
   8 │     2      2      0        6  iv      R
   9 │     2      3      0        4  iv      R
  10 │     2      4      0        2  iv      R

julia> df17_r = read_nca(df17, observations=:conc, group = [:formulation])
NCAPopulation (2 subjects):
  Group: ["formulation" => "R", "formulation" => "T"]
Number of missing observations: 2
Number of blq observations: 0