Data format and parsing for NCA

Once source data has been read into the Pumas NCA package, the next step is to check that the data is correct for NCA analysis. This check requires the data to be presented to the Pumas NCA package in a specific format, the Pumas-NCA data format (PumasNCADF), which is discussed next.

Pumas-NCA data format - PumasNCADF

PumasNCADF is a standardized format for the tabular source data required by Pumas NCA analyses. The full list of requirements is given below in the docstring of the read_nca function:

NCA.read_nca — Function

read_nca(file::AbstractString; kwargs...)
read_nca(df_obs::AbstractDataFrame, df_dose::AbstractDataFrame; id = :id, time = :time, kwargs...)
read_nca(df::DataFrame; id=:id, time=:time, observations=:conc, nominal_time = :nominal_time,
                        start_time=:start_time, end_time=:end_time, volume=:volume,
                        amt=:amt, route=:route, duration=:duration, blq=:blq,
                        ii=:ii, ss=:ss, group=nothing, concu=true, timeu=true, amtu=true, volumeu=true,
                        verbose=true, sparse = false, kwargs...)

Parse a DataFrame object or a CSV file into an NCAPopulation. An NCAPopulation holds an array of NCASubjects, each of which contains the relevant data for an individual subject.

Remark

Concentrations at dosing rows are NOT ignored in read_nca.

df : DataFrame containing the data for the analysis.

Two dataframes, an observations dataframe followed by a dosing dataframe, can also be passed to read_nca; the rest of the arguments stay the same in this case.

The following keyword arguments are used to specify column names in the df:

id : The numeric or string id of the subject. Defaults to :id.
time : The actual time at which the observations were measured. Defaults to :time.
observations: The observation (e.g. concentration) time series measurements. Values must be numbers or missing. Defaults to :conc.
amt: The amount of a dose. Either the dosing amount is given at each dosing time and is missing otherwise, or the dosing amount is present at every time, in which case the first time (for a subject in a subgroup) is considered the dosing time. Defaults to :amt.
route: The route of administration. Possible choices are iv for intravenous, ev for extravascular, and inf for infusion. These can be specified as lower, upper or mixed case. E.g. iv, IV or Ev are accepted. Defaults to :route.
duration: The infusion duration. Should be the duration value or missing. Defaults to :duration.
blq: Below the lower Limit of Quantification (BLQ). Used to specify the observation is BLQ. The BLQ column can take a value of 1 for BLQ observation and 0 otherwise. Defaults to :blq.
ii: The interdose interval, equivalent to tau. Used to specify the interval length for steady-state dosing. Defaults to the :ii column. If specified and ss is true, the analysis returns steady-state parameters, e.g. cminss, cavgss and cmaxss, and computes the accumulationindex.
ss: The steady-state flag. Used to specify whether a dose is a steady-state dose: a steady-state dose takes the value 1, and 0 otherwise. Defaults to the :ss column. If ss is set to 1 for a subject, ii should be greater than 0.
group: The columns to group the data by, splits the subjects based on the group information associated with them. Defaults to no grouping.
llq: The Lower Limit of Quantification (LLQ). Defaults to nothing.
concblq: The scheme for handling of BLQ values. Defaults to the dictionary Dict(:first=>:keep, :middle=>:drop, :last=>:keep), further explanation is available in the Handling BLQ Data section.
concu: The units for observations (e.g. concentration). Defaults to no units.
amtu: The units for dosing amount. Defaults to no units.
timeu: The units for time. Defaults to no units.
volumeu: The units for volume. Defaults to no units.
verbose: When true, warnings will be thrown when the output does not match PumasNCADF. Defaults to true.
nominal_time: The nominal time corresponding to the observations. Defaults to :nominal_time.
sparse: Boolean flag to indicate if the dataset should be treated as a case of sparse sampling. Defaults to false.

Urine analysis requires the following columns, which are not used for plasma data.

start_time : The beginning of the urine collection time. Defaults to :start_time.
end_time : The end of the urine collection time. Defaults to :end_time.
volume: Collected urine volume. Defaults to :volume.

For details about the handling of concentration values below the lower limit of quantification, please check out the documentation of NCA.cleanblq. All the keyword arguments of NCA.cleanblq are applicable to read_nca, too.
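
The default concblq dictionary can be read as a lookup from the position of a BLQ value to an action. The sketch below is plain Julia and is not the NCA.cleanblq implementation. It assumes that :first refers to BLQ values before the first quantifiable observation, :last to those after the last quantifiable observation, and :middle to those in between; the Handling BLQ Data section remains the authoritative description. The helper name blq_region and the example flags are hypothetical.

```julia
# Hypothetical helper: label each observation by where it sits relative to
# the first and last quantifiable (blq == 0) observation.
function blq_region(blq::AbstractVector{<:Integer})
    first_q = findfirst(==(0), blq)
    last_q = findlast(==(0), blq)
    # Illustrative choice: with no quantifiable value, label everything :first.
    first_q === nothing && return fill(:first, length(blq))
    regions = Symbol[]
    for (i, b) in enumerate(blq)
        if b == 0
            push!(regions, :quantifiable)
        elseif i < first_q
            push!(regions, :first)
        elseif i > last_q
            push!(regions, :last)
        else
            push!(regions, :middle)
        end
    end
    return regions
end

# Default rule quoted in the docstring above.
concblq = Dict(:first => :keep, :middle => :drop, :last => :keep)

blq = [1, 0, 1, 0, 0, 1]  # made-up 0/1 BLQ flags for one subject
regions = blq_region(blq)

# Quantifiable rows are always kept; BLQ rows follow the rule for their region.
keep = map(r -> r == :quantifiable || concblq[r] == :keep, regions)
```

With these made-up flags, only the BLQ value that sits between two quantifiable observations is marked for dropping.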

Examples

The examples below provide various patterns of using read_nca. In addition to showcasing correct usage, we also showcase the expected errors when the function is used incorrectly. This can serve as a quick reference in the event a user faces an error.

Standard DataFrames with no errors

julia> df1 = DataFrame(;
           id = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
           time = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
           amt = [10, 0, 0, 0, 0, 10, 0, 0, 0, 0],
           conc = [missing, 8, 6, 4, 2, missing, 8, 6, 4, 2],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv"],
       )
10×5 DataFrame
 Row │ id     time   amt    conc     route  
     │ Int64  Int64  Int64  Int64?   String 
─────┼──────────────────────────────────────
   1 │     1      0     10  missing  iv
   2 │     1      1      0        8  iv
   3 │     1      2      0        6  iv
   4 │     1      3      0        4  iv
   5 │     1      4      0        2  iv
   6 │     2      0     10  missing  iv
   7 │     2      1      0        8  iv
   8 │     2      2      0        6  iv
   9 │     2      3      0        4  iv
  10 │     2      4      0        2  iv

julia> df1_r = read_nca(df1)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 0

The other keyword arguments give more control over how read_nca creates the population. For example, units from Unitful can be passed for concentration and time with concu and timeu:

julia> time_unit = u"hr"
hr

julia> concentration_unit = u"mg/L"
mg L^-1

julia> df1_r2 = read_nca(df1; concu = concentration_unit, timeu = time_unit)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 0
  Units:
    concentration: mg L^-1
    time:          hr
    dose:

Missing required column `route`

The warnings below are verbose: they explain the consequence of not passing the route column and how to pass it if it is missing from the source data:

julia> df2 = DataFrame(;
           id = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
           time = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
           amt = [10, 0, 0, 0, 0, 10, 0, 0, 0, 0],
           conc = [missing, 8, 6, 4, 2, missing, 8, 6, 4, 2],
       )
10×4 DataFrame
 Row │ id     time   amt    conc    
     │ Int64  Int64  Int64  Int64?  
─────┼──────────────────────────────
   1 │     1      0     10  missing 
   2 │     1      1      0        8
   3 │     1      2      0        6
   4 │     1      3      0        4
   5 │     1      4      0        2
   6 │     2      0     10  missing 
   7 │     2      1      0        8
   8 │     2      2      0        6
   9 │     2      3      0        4
  10 │     2      4      0        2

julia> df2_r = read_nca(df2; observations = :conc)
┌ Warning: No dosage information has passed. If the dataset has dosage information, you can pass the column names by `amt=:amt, route=:route`.
└ @ NCA ~/run/_work/PumasDocs.jl/PumasDocs.jl/custom_julia_depot/packages/NCA/cDvmE/src/data_parsing.jl:179
┌ Warning: Dosage information requires the presence of both amt & route information. Looks like you only entered the amt and not the route. If your dataset does not have route, please add a column that specifies the route of administration and then pass both columns as `amt=:amt, route=:route.`
└ @ NCA ~/run/_work/PumasDocs.jl/PumasDocs.jl/custom_julia_depot/packages/NCA/cDvmE/src/data_parsing.jl:181
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 0

`amt` can be `missing` at time of observations

This is totally fine and won't error or emit warnings:

julia> df3 = DataFrame(;
           id = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
           time = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
           amt = [10, missing, missing, missing, missing, 10, missing, missing, missing, missing],
           conc = [missing, 8, 6, 4, 2, missing, 8, 6, 4, 2],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv"],
       )
10×5 DataFrame
 Row │ id     time   amt      conc     route  
     │ Int64  Int64  Int64?   Int64?   String 
─────┼────────────────────────────────────────
   1 │     1      0       10  missing  iv
   2 │     1      1  missing        8  iv
   3 │     1      2  missing        6  iv
   4 │     1      3  missing        4  iv
   5 │     1      4  missing        2  iv
   6 │     2      0       10  missing  iv
   7 │     2      1  missing        8  iv
   8 │     2      2  missing        6  iv
   9 │     2      3  missing        4  iv
  10 │     2      4  missing        2  iv

julia> df3_r = read_nca(df3; observations = :conc)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 0

String (non-numeric) observations

The observations column can only contain numbers or missing. The error below is raised if the column contains a string element, <LOQ in this example:

julia> df4 = DataFrame(;
           id = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
           time = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
           amt = [10, missing, missing, missing, missing, 10, missing, missing, missing, missing],
           conc = [missing, 8, 6, 4, "<LOQ", missing, 8, 6, 4, 2],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv"],
       )
10×5 DataFrame
 Row │ id     time   amt      conc     route  
     │ Int64  Int64  Int64?   Any      String 
─────┼────────────────────────────────────────
   1 │     1      0       10  missing  iv
   2 │     1      1  missing  8        iv
   3 │     1      2  missing  6        iv
   4 │     1      3  missing  4        iv
   5 │     1      4  missing  <LOQ     iv
   6 │     2      0       10  missing  iv
   7 │     2      1  missing  8        iv
   8 │     2      2  missing  6        iv
   9 │     2      3  missing  4        iv
  10 │     2      4  missing  2        iv

julia> df4_r = read_nca(df4; observations = :conc)
ERROR: ArgumentError: conc has non-numeric values at index=[5]. We expect the names column to be of numeric type. Please fix your input data before proceeding further.

To avoid this error, mark such strings as missing when reading the file by specifying the missingstrings keyword in CSV.read, e.g. CSV.read("pkdata.csv", DataFrame; missingstrings = ["<LOQ"]). All elements that match that text are then converted to missing. Depending on the CSV.jl version, the keyword may be spelled missingstring instead.
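
If the data is already in memory rather than being read from a CSV file, the same clean-up can be done by hand. The sketch below uses plain Julia to turn "<LOQ" entries into missing and to record them in a 0/1 flag that could be supplied as the blq column. The helper names is_loq and to_number, and the decision to flag these rows as BLQ, are assumptions made for illustration.

```julia
# Raw observation column as it might arrive from a lab export.
raw = Any[missing, 8, 6, 4, "<LOQ", missing, 8, 6, 4, 2]

# Hypothetical helper: true only for the "<LOQ" marker string.
is_loq(x) = x isa AbstractString && strip(x) == "<LOQ"

# Numeric column: the marker and existing missings both become missing.
to_number(x) = (ismissing(x) || is_loq(x)) ? missing : Float64(x)
conc = map(to_number, raw)

# 0/1 flag in the layout described for the blq keyword.
blq = map(x -> is_loq(x) ? 1 : 0, raw)
```

The resulting conc vector contains only numbers and missing, which is what the observations column requires.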

`amt` column can only be numeric

The amt column must be of a numeric type, otherwise read_nca will error:

julia> df5 = DataFrame(;
           id = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
           time = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
           amt = [
               "10",
               missing,
               missing,
               missing,
               missing,
               "10",
               missing,
               missing,
               missing,
               missing,
           ],
           conc = [missing, 8, 6, 4, 2, missing, 8, 6, 4, 2],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv"],
       )
10×5 DataFrame
 Row │ id     time   amt      conc     route  
     │ Int64  Int64  String?  Int64?   String 
─────┼────────────────────────────────────────
   1 │     1      0  10       missing  iv
   2 │     1      1  missing        8  iv
   3 │     1      2  missing        6  iv
   4 │     1      3  missing        4  iv
   5 │     1      4  missing        2  iv
   6 │     2      0  10       missing  iv
   7 │     2      1  missing        8  iv
   8 │     2      2  missing        6  iv
   9 │     2      3  missing        4  iv
  10 │     2      4  missing        2  iv

julia> df5_r = read_nca(df5; observations = :conc)
ERROR: ArgumentError: amt has non-numeric values at index=[1, 6]. We expect the names column to be of numeric type. Please fix your input data before proceeding further.
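
A column like this can be inspected and repaired before calling read_nca. The following is a minimal sketch in plain Julia, assuming that every string entry is a well-formed number; the helper name to_amount is hypothetical.

```julia
amt_raw = ["10", missing, missing, missing, missing, "10", missing, missing, missing, missing]

# Positions that are neither missing nor numeric.
bad = findall(x -> !(ismissing(x) || x isa Number), amt_raw)

# Hypothetical helper: parse strings, pass numbers and missing through unchanged.
to_amount(x) = x isa AbstractString ? parse(Float64, x) : x
amt = map(to_amount, amt_raw)
```

For this example, bad contains the same positions that the error message reports.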

Concentrations at dosing rows are not ignored

The example below emphasizes the fact that concentrations in dose rows are not ignored:

julia> df6 = DataFrame(;
           id = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
           time = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
           amt = [10, missing, missing, missing, missing, 10, missing, missing, missing, missing],
           conc = [10, 8, 6, 4, 2, 10, 8, 6, 4, 2],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv"],
       )
10×5 DataFrame
 Row │ id     time   amt      conc   route  
     │ Int64  Int64  Int64?   Int64  String 
─────┼──────────────────────────────────────
   1 │     1      0       10     10  iv
   2 │     1      1  missing      8  iv
   3 │     1      2  missing      6  iv
   4 │     1      3  missing      4  iv
   5 │     1      4  missing      2  iv
   6 │     2      0       10     10  iv
   7 │     2      1  missing      8  iv
   8 │     2      2  missing      6  iv
   9 │     2      3  missing      4  iv
  10 │     2      4  missing      2  iv

julia> df6_r = read_nca(df6; observations = :conc)
NCAPopulation (2 subjects):
  Number of missing observations: 0
  Number of blq observations: 0

`route` can either be upper or lowercase or mixed-case `ev`, `iv` or `inf`

Although mixed case is accepted, it is recommended that users provide route information in one consistent case, preferably lower case.

julia> df7 = DataFrame(;
           id = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
           time = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
           amt = [10, missing, missing, missing, missing, 10, missing, missing, missing, missing],
           conc = [10, 8, 6, 4, 2, 10, 8, 6, 4, 2],
           route = ["Iv", "Iv", "Iv", "Iv", "Iv", "IV", "IV", "IV", "IV", "IV"],
       )
10×5 DataFrame
 Row │ id     time   amt      conc   route  
     │ Int64  Int64  Int64?   Int64  String 
─────┼──────────────────────────────────────
   1 │     1      0       10     10  Iv
   2 │     1      1  missing      8  Iv
   3 │     1      2  missing      6  Iv
   4 │     1      3  missing      4  Iv
   5 │     1      4  missing      2  Iv
   6 │     2      0       10     10  IV
   7 │     2      1  missing      8  IV
   8 │     2      2  missing      6  IV
   9 │     2      3  missing      4  IV
  10 │     2      4  missing      2  IV

julia> df7_r = read_nca(df7; observations = :conc)
NCAPopulation (2 subjects):
  Number of missing observations: 0
  Number of blq observations: 0

julia> df7_r[1].dose
NCADose:
  time:         0
  amt:          10
  duration:     0
  route:        IVBolus
  ss:           false
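
To follow the recommendation of a single consistent case, route labels can be normalized before parsing. The sketch below is plain Julia; the helper normalize_route, the constant ACCEPTED_ROUTES and the "oral" label are hypothetical, and only the three route names come from the docstring.

```julia
# Route names listed in the docstring.
const ACCEPTED_ROUTES = ("iv", "ev", "inf")

# Hypothetical helper: trim whitespace and convert to lower case.
normalize_route(r::AbstractString) = lowercase(strip(r))

routes = ["Iv", "IV", "ev", "INF", "oral"]  # "oral" is a made-up invalid label
normalized = normalize_route.(routes)

# Labels that are still not one of the accepted names after normalization.
unknown = filter(r -> !(normalize_route(r) in ACCEPTED_ROUTES), routes)
```

Any label left in unknown needs to be mapped to one of the accepted route names by hand.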

Non-monotonic time is not allowed within an individual

Time must be strictly increasing within a subject (a repeated time point, as in the example below, also fails), unless a grouping variable is specified:

julia> df8 = DataFrame(;
           id = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
           time = [0, 1, 2, 3, 3, 0, 1, 2, 3, 3],
           amt = [10, missing, missing, missing, missing, 10, missing, missing, missing, missing],
           conc = [10, 8, 6, 4, 2, 10, 8, 6, 4, 2],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv"],
       )
10×5 DataFrame
 Row │ id     time   amt      conc   route  
     │ Int64  Int64  Int64?   Int64  String 
─────┼──────────────────────────────────────
   1 │     1      0       10     10  iv
   2 │     1      1  missing      8  iv
   3 │     1      2  missing      6  iv
   4 │     1      3  missing      4  iv
   5 │     1      3  missing      2  iv
   6 │     2      0       10     10  iv
   7 │     2      1  missing      8  iv
   8 │     2      2  missing      6  iv
   9 │     2      3  missing      4  iv
  10 │     2      3  missing      2  iv

julia> df8_r = read_nca(df8; observations = :conc)
[ Info: ID 1 errored
ERROR: ArgumentError: Time must be monotonically increasing. Errored at `time=3` (index 4)
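
The same condition can be checked on a time vector before parsing. This is a minimal sketch in plain Julia, not the check used inside read_nca; the helper name first_nonincreasing is hypothetical.

```julia
# Hypothetical helper: index of the first time point that is not followed by a
# strictly larger one, or `nothing` when the vector is strictly increasing.
function first_nonincreasing(t::AbstractVector)
    for i in 1:length(t)-1
        t[i+1] > t[i] || return i
    end
    return nothing
end

subject_times = [0, 1, 2, 3, 3]  # subject 1 from the example above
problem_index = first_nonincreasing(subject_times)
```

For the times of subject 1, the helper returns index 4, the first of the two rows that share time 3.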

Missing time is not allowed

Values in the time column must not be missing:

julia> df9 = DataFrame(;
           id = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
           time = [0, 1, 2, 3, missing, 0, 1, 2, 3, missing],
           amt = [10, missing, missing, missing, missing, 10, missing, missing, missing, missing],
           conc = [10, 8, 6, 4, 2, 10, 8, 6, 4, 2],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv"],
       )
10×5 DataFrame
 Row │ id     time     amt      conc   route  
     │ Int64  Int64?   Int64?   Int64  String 
─────┼────────────────────────────────────────
   1 │     1        0       10     10  iv
   2 │     1        1  missing      8  iv
   3 │     1        2  missing      6  iv
   4 │     1        3  missing      4  iv
   5 │     1  missing  missing      2  iv
   6 │     2        0       10     10  iv
   7 │     2        1  missing      8  iv
   8 │     2        2  missing      6  iv
   9 │     2        3  missing      4  iv
  10 │     2  missing  missing      2  iv

julia> df9_r = read_nca(df9; observations = :conc)
[ Info: ID 1 errored
ERROR: ArgumentError: Time may not be missing (missing occured at index 5)
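
Because the error stops at the first affected subject, it can help to list all missing times up front. The sketch below uses plain Julia; the helper name missing_time_positions is hypothetical, and positions are counted within each subject.

```julia
# Hypothetical helper: for each subject, the within-subject positions at which
# time is missing. Subjects without missing times are left out.
function missing_time_positions(ids, times)
    bad = Dict{eltype(ids),Vector{Int}}()
    for subj in unique(ids)
        rows = findall(==(subj), ids)
        pos = findall(ismissing, times[rows])
        isempty(pos) || (bad[subj] = pos)
    end
    return bad
end

ids = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
times = [0, 1, 2, 3, missing, 0, 1, 2, 3, missing]
bad = missing_time_positions(ids, times)
```

Here both subjects have a missing time at their fifth row, although read_nca only reports subject 1 before stopping.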

Multiple doses within a subject require contiguous time

julia> df10 = DataFrame(;
           id = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
           time = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
           amt = [10, 0, 0, 0, 0, 10, 0, 0, 0, 0],
           conc = [missing, 8, 6, 4, 2, missing, 8, 6, 4, 2],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv"],
           iii = [5, 0, 0, 0, 0, 5, 0, 0, 0, 0],
       )
10×6 DataFrame
 Row │ id     time   amt    conc     route   iii   
     │ Int64  Int64  Int64  Int64?   String  Int64 
─────┼─────────────────────────────────────────────
   1 │     1      0     10  missing  iv          5
   2 │     1      1      0        8  iv          0
   3 │     1      2      0        6  iv          0
   4 │     1      3      0        4  iv          0
   5 │     1      4      0        2  iv          0
   6 │     1      5     10  missing  iv          5
   7 │     1      6      0        8  iv          0
   8 │     1      7      0        6  iv          0
   9 │     1      8      0        4  iv          0
  10 │     1      9      0        2  iv          0

julia> df10_r = read_nca(df10; observations = :conc)
NCAPopulation (1 subjects):
  Number of missing observations: 1
  Number of blq observations: 0

julia> df10_r[1].dose
2-element Vector{NCADose{Int64, Int64}}:
 NCADose:
  time:         0
  amt:          10
  duration:     0
  route:        IVBolus
  ss:           false
 NCADose:
  time:         5
  amt:          10
  duration:     0
  route:        IVBolus
  ss:           false
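
One reading of "contiguous time" is that time keeps counting across doses instead of restarting at each dose, as in df10 above where the second dose is given at time 5. If a dataset records time after dose instead, a contiguous time column can be rebuilt from the dosing interval. The sketch below is plain Julia; the names tad and occasion and the fixed interval tau are assumptions for illustration.

```julia
# Made-up layout: time after dose restarts at 0 for each dosing occasion.
tad = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
occasion = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
tau = 5  # assumed constant dosing interval

# Shift each occasion by the number of elapsed dosing intervals.
contiguous_time = tad .+ (occasion .- 1) .* tau
```

The result matches the time column of df10, running from 0 to 9 without a reset.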

Multiple dose with `ii` specified allows computation of steady-state values

julia> df11 = DataFrame(;
           id = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
           time = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
           amt = [10, 0, 0, 0, 0, 10, 0, 0, 0, 0],
           conc = [missing, 8, 6, 4, 2, missing, 8, 6, 4, 2],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv"],
           iii = [5, 0, 0, 0, 0, 5, 0, 0, 0, 0],
       )
10×6 DataFrame
 Row │ id     time   amt    conc     route   iii   
     │ Int64  Int64  Int64  Int64?   String  Int64 
─────┼─────────────────────────────────────────────
   1 │     1      0     10  missing  iv          5
   2 │     1      1      0        8  iv          0
   3 │     1      2      0        6  iv          0
   4 │     1      3      0        4  iv          0
   5 │     1      4      0        2  iv          0
   6 │     1      5     10  missing  iv          5
   7 │     1      6      0        8  iv          0
   8 │     1      7      0        6  iv          0
   9 │     1      8      0        4  iv          0
  10 │     1      9      0        2  iv          0

julia> df11_r = read_nca(df11; observations = :conc, ii = :iii)
NCAPopulation (1 subjects):
  Number of missing observations: 1
  Number of blq observations: 0

julia> df11_r[1].dose
2-element Vector{NCADose{Int64, Int64}}:
 NCADose:
  time:         0
  amt:          10
  duration:     0
  route:        IVBolus
  ss:           false
 NCADose:
  time:         5
  amt:          10
  duration:     0
  route:        IVBolus
  ss:           false

As you can see, the results of df10_r and df11_r look identical, even though the latter accepts the ii argument mapped to the iii column of the dataset. ii specifies tau, the dosing interval. This information allows the Pumas NCA package to compute the steady-state parameters cmaxss, cminss, cavgss, accumulationindex and tau. We can confirm this by comparing the two: df10_r, which has no ii information, cannot compute the accumulationindex, whereas df11_r can.

julia> NCA.accumulationindex(df10_r)
2×2 DataFrame
 Row │ id      accumulationindex 
     │ String  Missing           
─────┼───────────────────────────
   1 │ 1                 missing 
   2 │ 1                 missing

julia> NCA.accumulationindex(df11_r)
2×2 DataFrame
 Row │ id      accumulationindex 
     │ String  Float64           
─────┼───────────────────────────
   1 │ 1                 1.06855
   2 │ 1                 1.06855
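
The value 1.06855 can be reproduced by hand under two assumptions that are not stated on this page: that the accumulation index follows the usual definition 1 / (1 - exp(-lambda_z * tau)), and that lambda_z is the log-linear slope over the last three concentrations of a dosing interval (6, 4 and 2 at times 2, 3 and 4). The sketch below is plain Julia and does not call the NCA package.

```julia
# Last three observations after a dose in df11.
t = [2.0, 3.0, 4.0]
c = [6.0, 4.0, 2.0]

# Least-squares slope of log-concentration against time.
y = log.(c)
t_mean = sum(t) / length(t)
y_mean = sum(y) / length(y)
slope = sum((t .- t_mean) .* (y .- y_mean)) / sum((t .- t_mean) .^ 2)

lambda_z = -slope  # terminal rate constant
tau = 5.0          # dosing interval taken from the iii column

accumulation = 1 / (1 - exp(-lambda_z * tau))
```

Under these assumptions accumulation evaluates to approximately 1.0685, in agreement with the table above. Without tau the last expression cannot be evaluated, which is consistent with the missing values returned for df10_r.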

Subjects with only a dosing record and no observations produce missing results

julia> df12 = DataFrame(; id = 1, time = 0, amt = 10, conc = missing, route = "iv")
1×5 DataFrame
 Row │ id     time   amt    conc     route  
     │ Int64  Int64  Int64  Missing  String 
─────┼──────────────────────────────────────
   1 │     1      0     10  missing  iv

read_nca will also give a warning when parsing:

julia> df12_r = read_nca(df12; observations = :conc)
[ Info: ID: 1. Dataset has the amt column amt populated for all rows hence the first time 0 is considered as dose time.
┌ Warning: Subject 1: All concentration data is missing between times 0 and 0
└ @ NCA ~/run/_work/PumasDocs.jl/PumasDocs.jl/custom_julia_depot/packages/NCA/cDvmE/src/utils.jl:74
NCAPopulation (1 subjects):
  Number of missing observations: 1
  Number of blq observations: 0

julia> NCA.auc(df12_r)
1×2 DataFrame
 Row │ id      auc     
     │ String  Missing 
─────┼─────────────────
   1 │ 1       missing
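
Subjects of this kind can be found before parsing. The sketch below is plain Julia; the helper name subjects_without_observations and the two-subject example data are hypothetical.

```julia
# Hypothetical helper: ids of subjects whose observations are all missing.
function subjects_without_observations(ids, conc)
    return filter(s -> all(ismissing, conc[ids .== s]), unique(ids))
end

ids = [1, 1, 1, 2]
conc = [missing, 8, 6, missing]  # subject 2 only has a dosing row
dose_only = subjects_without_observations(ids, conc)
```

Subjects listed in dose_only can then be reviewed or removed, depending on the analysis plan.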

Multiple dosing - all provided doses should have corresponding observations vectors

In the example below, for subject 1, the first dose has observations, but the second dose at time 5 has no associated observations, which results in the warning shown:

julia> df13 = DataFrame(;
           id = [1, 1, 1, 1, 1, 1],
           time = [0, 1, 2, 3, 4, 5],
           amt = [10, 0, 0, 0, 0, 10],
           conc = [missing, 8, 6, 4, 2, missing],
           route = ["iv", "iv", "iv", "iv", "iv", "iv"],
       )
6×5 DataFrame
 Row │ id     time   amt    conc     route  
     │ Int64  Int64  Int64  Int64?   String 
─────┼──────────────────────────────────────
   1 │     1      0     10  missing  iv
   2 │     1      1      0        8  iv
   3 │     1      2      0        6  iv
   4 │     1      3      0        4  iv
   5 │     1      4      0        2  iv
   6 │     1      5     10  missing  iv

julia> df13_r = read_nca(df13; observations = :conc)
┌ Warning: Subject 1: All concentration data is missing between times 5 and 5
└ @ NCA ~/run/_work/PumasDocs.jl/PumasDocs.jl/custom_julia_depot/packages/NCA/cDvmE/src/utils.jl:74
NCAPopulation (1 subjects):
  Number of missing observations: 1
  Number of blq observations: 0
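
A check along these lines can be run on one subject's rows before parsing. The sketch below is plain Julia and rests on assumptions that are not taken from the package: rows are sorted by time, a dose row is one with a non-missing amt greater than zero, and each dose owns the rows up to the next dose. The helper name doses_without_observations is hypothetical.

```julia
# Hypothetical helper: times of doses whose interval contains no observation.
function doses_without_observations(times, amt, conc)
    dose_rows = findall(a -> !ismissing(a) && a > 0, amt)
    empty_doses = eltype(times)[]
    for (k, r) in enumerate(dose_rows)
        # The interval ends just before the next dose row, or at the last row.
        last_row = k < length(dose_rows) ? dose_rows[k+1] - 1 : length(times)
        all(ismissing, conc[r:last_row]) && push!(empty_doses, times[r])
    end
    return empty_doses
end

times = [0, 1, 2, 3, 4, 5]
amt = [10, 0, 0, 0, 0, 10]
conc = [missing, 8, 6, 4, 2, missing]
empty_doses = doses_without_observations(times, amt, conc)
```

For the df13 data, the helper flags the dose at time 5, the same interval named in the warning.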

Steady-state flag `ss` requires `ii>0`

At the moment, ii and ss work interchangeably in the computation of steady-state parameters. The rules are as follows:

When ii is specified, ss is not required, and the tau information from ii is used to compute the parameters specific to multiple doses.
When ss is specified, ii is required, because most steady-state parameters need tau.
When neither ii nor ss is specified for multiple-dose data, none of the steady-state parameters are computed.

julia> df14 = DataFrame(;
           id = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
           time = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
           amt = [10, 0, 0, 0, 0, 10, 0, 0, 0, 0],
           conc = [missing, 8, 6, 4, 2, missing, 8, 6, 4, 2],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv"],
           iii = [5, 0, 0, 0, 0, 5, 0, 0, 0, 0],
           sss = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
       )
10×7 DataFrame
 Row │ id     time   amt    conc     route   iii    sss   
     │ Int64  Int64  Int64  Int64?   String  Int64  Int64 
─────┼────────────────────────────────────────────────────
   1 │     1      0     10  missing  iv          5      1
   2 │     1      1      0        8  iv          0      0
   3 │     1      2      0        6  iv          0      0
   4 │     1      3      0        4  iv          0      0
   5 │     1      4      0        2  iv          0      0
   6 │     1      5     10  missing  iv          5      1
   7 │     1      6      0        8  iv          0      0
   8 │     1      7      0        6  iv          0      0
   9 │     1      8      0        4  iv          0      0
  10 │     1      9      0        2  iv          0      0

julia> df14_r1 = read_nca(df14; observations = :conc)
NCAPopulation (1 subjects):
  Number of missing observations: 1
  Number of blq observations: 0

julia> NCA.accumulationindex(df14_r1)
2×2 DataFrame
 Row │ id      accumulationindex 
     │ String  Missing           
─────┼───────────────────────────
   1 │ 1                 missing 
   2 │ 1                 missing

julia> df14_r2 = read_nca(df14; observations = :conc, ss = :sss, ii = :iii)
NCAPopulation (1 subjects):
  Number of missing observations: 1
  Number of blq observations: 0

julia> NCA.accumulationindex(df14_r2)
2×2 DataFrame
 Row │ id      accumulationindex 
     │ String  Float64           
─────┼───────────────────────────
   1 │ 1                 1.06855
   2 │ 1                 1.06855

julia> df14_r3 = read_nca(df14; observations = :conc, ii = :iii)
NCAPopulation (1 subjects):
  Number of missing observations: 1
  Number of blq observations: 0

julia> NCA.accumulationindex(df14_r3)
2×2 DataFrame
 Row │ id      accumulationindex 
     │ String  Float64           
─────┼───────────────────────────
   1 │ 1                 1.06855
   2 │ 1                 1.06855

julia> df14_r4 = read_nca(df14; observations = :conc, ss = :sss)
ERROR: ArgumentError: ii must be greater than zero when ss=1. Got ii=0
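
The rule behind this last error can be checked on the raw columns before parsing. The sketch below is plain Julia; the helper name ss_violations is hypothetical, and treating an unmapped ii column as all zeros is an assumption based on the "Got ii=0" text in the error.

```julia
# Hypothetical helper: rows where ss is 1 but ii is not greater than zero.
function ss_violations(ss, ii)
    return findall(i -> ss[i] == 1 && !(ii[i] > 0), 1:length(ss))
end

sss = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
iii = [5, 0, 0, 0, 0, 5, 0, 0, 0, 0]

ok_rows = ss_violations(sss, iii)                # ii mapped: no violations
bad_rows = ss_violations(sss, zeros(Int, 10))    # ii not mapped, assumed all zeros
```

With the iii column mapped there are no violations, while without it both dose rows violate the rule.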

Specification of Groups

NCA is always done on a per-NCASubject, per-dose-event basis.
Grouping during an NCA analysis can happen at two levels:

NCASubject level, e.g. after a single dose, a subject has observations of parent and metabolite, so grouping happens at the analyte level; after multiple doses (a single dose every day), a subject has measurements every day, so grouping happens per day; a subject has received single ascending doses, so grouping is per dose.
NCAPopulation level, e.g. the study population is divided into multiple dose groups, so grouping is done by dose; some subjects receive tablets and some subjects receive capsules, so grouping is done by formulation.

Groups specified in read_nca via the group argument are carried forward into the result data frame, whether that is a complete report or the result of a single function.
At the NCASubject level, specifying group allows the Pumas NCA package to break the subject's profile into multiple groups, which ensures that the requirement described under "Non-monotonic time is not allowed within an individual" is respected.
At the NCAPopulation level, specifying group provides a convenient way to carry that variable forward into the result data frame.
More than one group can be passed via the group argument as an array of symbols, e.g. group = [:dose, :day].

The example below emphasizes the grouping at the NCAPopulation level:

julia> df17 = DataFrame(;
           id = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
           time = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
           amt = [10, 0, 0, 0, 0, 10, 0, 0, 0, 0],
           conc = [missing, 8, 6, 4, 6, missing, 8, 6, 4, 2],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv", "iv"],
           formulation = ["T", "T", "T", "T", "T", "R", "R", "R", "R", "R"],
       )
10×6 DataFrame
 Row │ id     time   amt    conc     route   formulation 
     │ Int64  Int64  Int64  Int64?   String  String      
─────┼───────────────────────────────────────────────────
   1 │     1      0     10  missing  iv      T
   2 │     1      1      0        8  iv      T
   3 │     1      2      0        6  iv      T
   4 │     1      3      0        4  iv      T
   5 │     1      4      0        6  iv      T
   6 │     2      0     10  missing  iv      R
   7 │     2      1      0        8  iv      R
   8 │     2      2      0        6  iv      R
   9 │     2      3      0        4  iv      R
  10 │     2      4      0        2  iv      R

julia> df17_r = read_nca(df17; observations = :conc, group = [:formulation])
NCAPopulation (2 subjects):
  Group: [["formulation" => "R"], ["formulation" => "T"]]
  Number of missing observations: 2
  Number of blq observations: 0
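
The subject-level case described earlier in this section can be illustrated without the package. The sketch below, in plain Julia, shows why a subject with parent and metabolite samples fails the time requirement as a whole but passes once the rows are split by group. The column name analyte, the data and the helper strictly_increasing are made up for illustration.

```julia
# Hypothetical helper: true when every time point is larger than the previous one.
strictly_increasing(t) = all(t[i+1] > t[i] for i in 1:length(t)-1)

# One made-up subject with two analytes sampled at the same times.
times = [0, 1, 2, 0, 1, 2]
analyte = ["parent", "parent", "parent", "metabolite", "metabolite", "metabolite"]

# The full profile restarts at time 0, so it is not strictly increasing.
whole_profile_ok = strictly_increasing(times)

# Within each analyte the times are strictly increasing.
per_group_ok = Dict(a => strictly_increasing(times[analyte .== a]) for a in unique(analyte))
```

With read_nca, the corresponding step would be to pass the grouping column through the group argument, e.g. group = [:analyte] for a column with that name.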