Handling Missing and BLQ Data

When data is read into the Pumas NCA package with read_nca, it passes through a series of sanity checks and cleaning steps. Many of these checks are covered in the read_nca examples. Below we cover some additional details on the handling of missing data and of data below the lower limit of quantification (llq), usually referred to as BLQ.

Missing data handling

Missing observations (and their associated times) and missing volumes are removed from the dataset by default. However, the missingconc and missingvolume keyword arguments of read_nca can be used to impute the missing data with a numeric value instead of dropping it.

julia> df = DataFrame(;
           id = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
           time = [0, 1, 2, 3, 4, 6, 0, 1, 2, 3, 4, 6, 8],
           amt = [10, 0, 0, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0],
           sss = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
           iii = [4, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0],
           conc = [missing, 8, 6, 4, 2, 0.1, missing, 2, 6, 3, 2, 0.5, 0.1],
           isblq = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "ev", "ev", "ev", "ev", "ev", "ev", "ev"],
       )
13×8 DataFrame
 Row │ id     time   amt    sss    iii    conc       isblq  route
     │ Int64  Int64  Int64  Int64  Int64  Float64?   Int64  String
─────┼─────────────────────────────────────────────────────────────
   1 │     1      0     10      1      4  missing        0  iv
   2 │     1      1      0      0      0        8.0      0  iv
   3 │     1      2      0      0      0        6.0      0  iv
   4 │     1      3      0      0      0        4.0      0  iv
   5 │     1      4      0      0      0        2.0      0  iv
   6 │     1      6      0      0      0        0.1      1  iv
   7 │     2      0     20      1      4  missing        0  ev
   8 │     2      1      0      0      0        2.0      0  ev
   9 │     2      2      0      0      0        6.0      0  ev
  10 │     2      3      0      0      0        3.0      0  ev
  11 │     2      4      0      0      0        2.0      0  ev
  12 │     2      6      0      0      0        0.5      0  ev
  13 │     2      8      0      0      0        0.1      1  ev

In the examples below, the observations vector of the first subject has only 5 elements, starting at 8, because the missing value at time 0 is dropped.

Missing values

The default value of missingconc is :drop, i.e., all missing values are dropped.

julia> df_m1 = read_nca(df; observations = :conc)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 0

julia> df_m1[1].observations
5-element Vector{Float64}:
 8.0
 6.0
 4.0
 2.0
 0.1

When we pass the missingconc argument to read_nca and set all missing values to 10, the first subject has a 6-element observations vector starting at 10:

julia> df_m2 = read_nca(df; observations = :conc, missingconc = 10)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 0

julia> df_m2[1].observations
6-element Vector{Float64}:
 10.0
  8.0
  6.0
  4.0
  2.0
  0.1

Note that even when missing values are imputed with missingconc, the number of missing concentrations (the num_conc_missing field of NCASubject) is recorded on the basis of the missing values in the original data:

julia> df_m2[1].num_conc_missing
1
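
The drop and impute behaviors described above can be mimicked in plain Julia. The following is only a minimal sketch, not the Pumas implementation: handle_missing is a hypothetical helper written for illustration, and it works on the first subject's time and concentration values from the example data.

```julia
# Plain-Julia sketch; `handle_missing` is a hypothetical helper, not a Pumas function.
time = [0, 1, 2, 3, 4, 6]
conc = [missing, 8, 6, 4, 2, 0.1]

function handle_missing(time, conc; missingconc = :drop)
    # The count is taken from the original data, before any imputation.
    n_missing = count(ismissing, conc)
    if missingconc === :drop
        keep = .!ismissing.(conc)
        # Drop missing observations together with their times.
        return time[keep], Float64.(conc[keep]), n_missing
    else
        # Replace every missing observation with the supplied number.
        return time, Float64.(coalesce.(conc, missingconc)), n_missing
    end
end

t_drop, c_drop, n_drop = handle_missing(time, conc)
t_imp, c_imp, n_imp = handle_missing(time, conc; missingconc = 10)
```

In this sketch c_drop has 5 elements and c_imp has 6, while the missing count is the same in both cases, mirroring the behavior shown in the examples above.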

The method for handling missing concentrations can affect the output of which points are considered BLQ.

BLQ handling

In the Pumas NCA package, the llq defaults to 0, so a value is considered BLQ if it is 0, unless it occurs at the first time point or at the time of dosing. If a blq column is mapped from the data, all rows with a 1 in that column are removed from the data. The concblq keyword argument of read_nca accepts either a scalar, which specifies what should be done with all BLQ values, or a collection with elements named :first, :middle, and :last, each set to one of the valid options discussed below.

The meaning of each of the list elements is:

  1. :first: Values up to the first non-BLQ value. Note that if all values are BLQ, this includes all values.
  2. :middle: Values that are BLQ between the first and last non-BLQ values.
  3. :last: Values that are BLQ after the last non-BLQ value.
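
To make the three positions concrete, here is a minimal plain-Julia sketch that labels each observation according to the definitions above. The helper blq_regions, the :none label for non-BLQ values, and the sample profile are assumptions made for illustration; they are not part of the Pumas API.

```julia
# Hypothetical helper: label each observation as :first, :middle, :last,
# or :none (not BLQ), following the definitions above.
function blq_regions(isblq::AbstractVector{Bool})
    first_ok = findfirst(!, isblq)   # index of the first non-BLQ value
    last_ok = findlast(!, isblq)     # index of the last non-BLQ value
    map(eachindex(isblq)) do i
        if !isblq[i]
            :none
        elseif first_ok === nothing || i < first_ok
            :first                   # also covers the case where all values are BLQ
        elseif i > last_ok
            :last
        else
            :middle
        end
    end
end

conc = [0.0, 0.1, 5.0, 0.1, 3.0, 1.0, 0.1]   # made-up profile
llq = 0.2
regions = blq_regions(conc .< llq)
```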

The valid settings for each are:

  1. :drop: Drop the BLQ values.
  2. :keep: Keep the BLQ values.
  3. A number: Impute BLQ values with that number.

The default settings for concblq are the following:

concblq = Dict(:first => :keep, :middle => :drop, :last => :keep)
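
As a rough illustration of how such a rule set acts on a profile, the sketch below applies :drop, :keep, or a numeric imputation to observations whose positions are already labeled. The function apply_blq_rules, the :none label for non-BLQ values, and the data are hypothetical and only meant to show the idea; read_nca does this internally.

```julia
# Hypothetical helper: apply :drop, :keep, or a numeric imputation per position.
function apply_blq_rules(time, conc, regions, rules)
    new_time = Float64[]
    new_conc = Float64[]
    for (t, c, r) in zip(time, conc, regions)
        rule = r === :none ? :keep : rules[r]   # non-BLQ values are always kept
        rule === :drop && continue              # skip dropped observations
        push!(new_time, t)
        push!(new_conc, rule === :keep ? c : rule)
    end
    return new_time, new_conc
end

time = [0, 1, 2, 3, 4, 6, 8]
conc = [0.0, 5.0, 0.1, 3.0, 1.0, 0.1, 0.05]   # made-up profile
regions = [:first, :none, :middle, :none, :none, :last, :last]

# Same shape as the default shown above.
default_rules = Dict(:first => :keep, :middle => :drop, :last => :keep)
t1, c1 = apply_blq_rules(time, conc, regions, default_rules)

# Drop leading and middle BLQ values, impute trailing ones with 0.15.
custom_rules = Dict(:first => :drop, :middle => :drop, :last => 0.15)
t2, c2 = apply_blq_rules(time, conc, regions, custom_rules)
```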

In practice, there are three ways of handling BLQ data:

  • Set the BLQ values to missing. The impact depends on where in the concentration-time profile the value occurs.
    • When BLQ values occur at the end of the concentration-time profile, setting them to missing truncates the AUC at the time of the last observed concentration.
    • When a BLQ value occurs between two observed concentrations, setting it to missing removes that time point from the AUC calculation. This can overestimate the AUC, because the calculation interpolates between the two observed data points and ignores the BLQ value.
  • Set the BLQ values to zero.
    • This may underestimate the AUC, but it protects against overestimation.
  • Set the BLQ values to a specific value.
    • The most common choice is 1/2 of the llq value.
    • Any other numeric value can also be used.
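
The effect of the three strategies on the AUC can be seen with a small calculation. The sketch below uses a made-up four-point profile with one BLQ value in the middle and a hand-written linear trapezoidal rule (trapz is a hypothetical helper, not a Pumas function).

```julia
# Linear trapezoidal AUC over the supplied points.
trapz(t, c) = sum((t[i+1] - t[i]) * (c[i] + c[i+1]) / 2 for i in 1:length(t)-1)

time = [1.0, 2.0, 3.0, 4.0]
conc = [8.0, 0.05, 4.0, 2.0]   # made-up profile; the value at t = 2 is BLQ
llq = 0.2
isblq = conc .< llq

# 1. Treat BLQ as missing: the point is removed and its neighbors are joined.
auc_missing = trapz(time[.!isblq], conc[.!isblq])
# 2. Set BLQ to zero.
auc_zero = trapz(time, ifelse.(isblq, 0.0, conc))
# 3. Set BLQ to llq / 2.
auc_half = trapz(time, ifelse.(isblq, llq / 2, conc))
```

For this profile, treating the BLQ value as missing gives the largest AUC, setting it to zero the smallest, and llq / 2 falls in between, which is the ordering the list above describes.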

BLQ examples

Here are some examples on how to handle BLQs.

llq argument sets a data-wide value

In the example below, llq is set to 0.6 via the llq argument to read_nca. As a result, all values below 0.6 are considered BLQ and, because concblq is set to :drop, they are dropped from the dataset. Notice that the number of reported BLQ values is three:

julia> df_b1 = read_nca(df; observations = :conc, llq = 0.6, concblq = :drop)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 3

We see that values below 0.6 are not used for any NCA computation:

julia> NCA.clast(df_b1)
2×2 DataFrame
 Row │ id      clast
     │ String  Float64
─────┼─────────────────
   1 │ 1           2.0
   2 │ 2           2.0

llq and blq contribute together

Users can pass both the llq and blq arguments to read_nca, and the conditions from both arguments are combined. In the example below, the BLQ values are the union of those mapped from the data and those set via the llq argument:

julia> df_b2 = read_nca(df; observations = :conc, llq = 0.6, blq = :isblq, concblq = :drop)
[ Info: Rows with isblq as 1 are removed from the data, for more control over BLQ handling please refer to `concblq` kwarg
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 3
julia> NCA.clast(df_b2)
2×2 DataFrame
 Row │ id      clast
     │ String  Float64
─────┼─────────────────
   1 │ 1           2.0
   2 │ 2           2.0
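
The reported count of three can be reproduced by hand from the example data. The sketch below is plain Julia, not what read_nca does internally; it assumes that "below llq" means a strict less-than comparison and uses only the non-missing rows of the conc and isblq columns.

```julia
# Non-missing concentrations from the example data, with the matching isblq flags.
conc = [8, 6, 4, 2, 0.1, 2, 6, 3, 2, 0.5, 0.1]
isblq = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
llq = 0.6

below_llq = conc .< llq          # flagged through llq (assumed strict "<")
flagged = isblq .== 1            # flagged through the mapped isblq column
either = below_llq .| flagged    # union of both conditions

counts = (count(below_llq), count(flagged), count(either))
```

Here the isblq column flags two values and llq flags three, but the two sets overlap, so the union still contains three observations.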

Use concblq to set specific rules for BLQ handling

As discussed above, the concblq argument provides a lot of flexibility in handling BLQ values. In the examples below, we showcase some of these features.

The default behavior of read_nca is shown in the example below: BLQ values in the "middle" are dropped from the dataset, while the "first" and "last" BLQ values are retained. This also explains why :drop was used in the examples above, where all BLQ values were at the end of each subject's observations ("last"):

julia> df_b3 = read_nca(
           df;
           observations = :conc,
           llq = 0.2,
           concblq = Dict(:first => :keep, :middle => :drop, :last => :keep),
       )
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 2

We can confirm this:

julia> NCA.auc(df_b3; auctype = :last)
2×2 DataFrame
 Row │ id      auc
     │ String  Float64
─────┼─────────────────
   1 │ 1       26.4333
   2 │ 2       15.1

This matches the result obtained with the default settings:

julia> df_b3a = read_nca(df; observations = :conc, llq = 0.2)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 2

julia> NCA.auc(df_b3a; auctype = :last)
2×2 DataFrame
 Row │ id      auc
     │ String  Float64
─────┼─────────────────
   1 │ 1       26.4333
   2 │ 2       15.1

Next, we see how to :drop the last value instead of the default :keep for "last":

julia> df_b4 = read_nca(
           df;
           observations = :conc,
           llq = 0.2,
           concblq = Dict(:first => :keep, :middle => :drop, :last => :drop),
       )
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 2

Compare these with the auc values obtained above:

julia> NCA.auc(df_b4; auctype = :last)
2×2 DataFrame
 Row │ id      auc
     │ String  Float64
─────┼─────────────────
   1 │ 1       24.3333
   2 │ 2       14.5
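
The drop in AUC is consistent with removing the final trapezoid of each profile. As a quick check, assuming the AUC is computed with the linear trapezoidal rule (an assumption; the method is not stated here), the area of the last segment can be computed by hand. The helper trapezoid is hypothetical.

```julia
# Area of one linear trapezoid between (t1, c1) and (t2, c2).
trapezoid(t1, c1, t2, c2) = (t2 - t1) * (c1 + c2) / 2

# Subject 1: the last segment runs from (4, 2.0) to the BLQ point (6, 0.1).
seg1 = trapezoid(4, 2.0, 6, 0.1)
# Subject 2: the last segment runs from (6, 0.5) to the BLQ point (8, 0.1).
seg2 = trapezoid(6, 0.5, 8, 0.1)

# Differences between the AUC values reported with :keep and with :drop.
diff1 = 26.4333 - 24.3333
diff2 = 15.1 - 14.5

matches = isapprox(diff1, seg1; atol = 1e-4) && isapprox(diff2, seg2; atol = 1e-4)
```

Under this assumption, the differences between the two sets of reported AUC values equal the areas of the dropped segments.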

Next, we impute a specific value for the "last" BLQ concentrations:

julia> df_b5 = read_nca(
           df;
           observations = :conc,
           blq = :isblq,
           llq = 0.6,
           concblq = Dict(:first => :drop, :middle => :drop, :last => 0.15),
       )
[ Info: Rows with isblq as 1 are removed from the data, for more control over BLQ handling please refer to `concblq` kwarg
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 3

We can confirm this:

julia> NCA.clast(df_b5)
2×2 DataFrame
 Row │ id      clast
     │ String  Float64
─────┼─────────────────
   1 │ 1          2.0
   2 │ 2          0.15