Handling Missing and BLQ Data
When data is read into the Pumas NCA package with read_nca, it passes through several sanity checks and cleaning steps. Many of these checks are covered in the read_nca examples. Below we cover additional details on the handling of missing data and of data below the lower limit of quantification (llq), usually referred to as BLQ.
Missing data handling
Missing observations (and their associated times) and missing volumes are removed from the dataset by default. However, the missingconc and missingvolume keyword arguments of read_nca impute the missing data with a numeric value instead of dropping it from the data.
julia> df = DataFrame(;
           id = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
           time = [0, 1, 2, 3, 4, 6, 0, 1, 2, 3, 4, 6, 8],
           amt = [10, 0, 0, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0],
           sss = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
           iii = [4, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0],
           conc = [missing, 8, 6, 4, 2, 0.1, missing, 2, 6, 3, 2, 0.5, 0.1],
           isblq = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "ev", "ev", "ev", "ev", "ev", "ev", "ev"],
       )
13×8 DataFrame
 Row │ id     time   amt    sss    iii    conc       isblq  route
     │ Int64  Int64  Int64  Int64  Int64  Float64?   Int64  String
─────┼─────────────────────────────────────────────────────────────
   1 │     1      0     10      1      4    missing      0  iv
   2 │     1      1      0      0      0        8.0      0  iv
   3 │     1      2      0      0      0        6.0      0  iv
   4 │     1      3      0      0      0        4.0      0  iv
   5 │     1      4      0      0      0        2.0      0  iv
   6 │     1      6      0      0      0        0.1      1  iv
   7 │     2      0     20      1      4    missing      0  ev
   8 │     2      1      0      0      0        2.0      0  ev
   9 │     2      2      0      0      0        6.0      0  ev
  10 │     2      3      0      0      0        3.0      0  ev
  11 │     2      4      0      0      0        2.0      0  ev
  12 │     2      6      0      0      0        0.5      0  ev
  13 │     2      8      0      0      0        0.1      1  ev
In the example below, we can see that the observations vector of the first subject has only 5 elements, starting with 8, as expected.
julia> df_m1 = read_nca(df; observations = :conc)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 0

julia> df_m1[1].observations
5-element Vector{Float64}:
 8.0
 6.0
 4.0
 2.0
 0.1
When we pass the missingconc argument to read_nca and set all missing values to 10, we can see that the first subject has a 6-element observations vector starting at 10:
julia> df_m2 = read_nca(df; observations = :conc, missingconc = 10)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 0

julia> df_m2[1].observations
6-element Vector{Float64}:
 10.0
  8.0
  6.0
  4.0
  2.0
  0.1
One point to note here is that even when missing values are imputed with missingconc, the number of missing concentrations (the num_conc_missing field of NCASubject) is recorded on the basis of the missing values in the original data:
julia> df_m2[1].num_conc_missing
1
The method for handling missing concentrations can affect the output of which points are considered BLQ.
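As a rough illustration of the difference between dropping and imputing, the plain-Julia sketch below mimics the two behaviours on the concentrations of subject 1. It does not call read_nca and is not a description of how Pumas implements its cleaning; the variable names (times, conc, dropped_conc, imputed) are chosen here for illustration only.

```julia
# Times and concentrations of subject 1, copied from the example data above.
times = [0, 1, 2, 3, 4, 6]
conc = [missing, 8.0, 6.0, 4.0, 2.0, 0.1]

# Dropping: remove missing concentrations together with their time points.
keep = .!ismissing.(conc)
dropped_times = times[keep]
dropped_conc = collect(skipmissing(conc))

# Imputing: replace every missing concentration with a number (10 here).
imputed = coalesce.(conc, 10.0)

# The missing count is taken from the original data, not the imputed vector.
n_missing = count(ismissing, conc)

println(dropped_conc)   # 5 values, starting at 8.0
println(imputed)        # 6 values, starting at 10.0
println(n_missing)      # 1
```

The lengths (5 and 6) and the missing count (1) agree with the read_nca output shown above for the first subject.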
BLQ handling
In the Pumas NCA package, the llq is 0 by default, so a value is considered BLQ if it is 0, unless it occurs at the first time point or at the time of dosing. If a blq column is mapped from the data, all rows with a 1 in that column are removed from the data. The concblq keyword argument of read_nca accepts either a scalar indicating what should be done with all BLQ values, or a collection with elements named "first", "middle", and "last", each set to one of the valid options discussed below.
The meaning of each of the list elements is:
- :first: Values up to the first non-BLQ value. Note that if all values are BLQ, this includes all values.
- :middle: Values that are BLQ between the first and last non-BLQ values.
- :last: Values that are BLQ after the last non-BLQ value.
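The three positions can be made concrete with a small sketch. The helper blq_positions below is hypothetical and not part of Pumas; it only labels each point from a Boolean BLQ flag according to the definitions above, and it ignores the special treatment of the first time point and the dosing time that read_nca applies under the default llq. The concentrations are made up for the example.

```julia
# Label each observation as :first, :middle or :last when it is BLQ,
# and as :quantifiable otherwise. Hypothetical helper, not a Pumas function.
function blq_positions(isblq::AbstractVector{Bool})
    first_ok = findfirst(!, isblq)   # index of the first non-BLQ value (or nothing)
    last_ok = findlast(!, isblq)     # index of the last non-BLQ value (or nothing)
    labels = Vector{Symbol}(undef, length(isblq))
    for i in eachindex(isblq)
        if !isblq[i]
            labels[i] = :quantifiable
        elseif first_ok === nothing || i < first_ok
            labels[i] = :first       # also covers the case where everything is BLQ
        elseif i > last_ok
            labels[i] = :last
        else
            labels[i] = :middle
        end
    end
    return labels
end

# Made-up profile with BLQ values at the start, in the middle and at the end.
conc = [0.0, 0.1, 2.0, 0.3, 6.0, 3.0, 0.5, 0.1]
llq = 0.6
labels = blq_positions(conc .< llq)
println(labels)
```

For this profile the first two points are labelled :first, the fourth :middle, and the last two :last.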
The valid settings for each are:
- :drop: Drop the BLQ values.
- :keep: Keep the BLQ values.
- A number: Impute BLQ values with that number.
The default settings for concblq are the following:
concblq = Dict(:first => :keep, :middle => :drop, :last => :keep)
In practice, there are three ways of handling BLQ data:
- Set the BLQ values to missing. The impact of doing this depends on where in the concentration-time profile the value occurs.
  - When BLQ values occur at the end of the concentration-time profile, setting them to missing has the effect of truncating the AUC to the time of the last observed concentration.
  - When BLQ values occur between two observed concentrations, setting the BLQ value to missing has the effect of removing that time point from the AUC calculation. This can overestimate the AUC, because interpolation occurs between the two observed data points, ignoring the BLQ value.
- Set the BLQ value to zero.
  - May result in underestimation of the AUC, but at least protects against overestimation.
- Set BLQ to a specific value.
  - Most common is to set the BLQ value to 1/2 of the llq value.
  - Users are also provided the option to set this to any numeric value of choice.
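To see how the three strategies shift the AUC, here is a minimal sketch on a made-up profile. It uses a plain linear trapezoidal rule written from scratch (the helper trapz is hypothetical and is not the Pumas NCA implementation, which may use other integration methods), and the times, concentrations and llq are invented for the example.

```julia
# Linear trapezoidal AUC over the supplied points (hypothetical helper).
trapz(t, c) = sum((t[i+1] - t[i]) * (c[i] + c[i+1]) / 2 for i in 1:length(t)-1)

# Made-up profile: the values at t = 2 and t = 4 are below llq.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
c = [8.0, 4.0, 0.3, 2.0, 0.2]
llq = 1.0
isblq = c .< llq

# 1. Treat BLQ as missing: the BLQ time points are removed entirely.
keep = .!isblq
auc_missing = trapz(t[keep], c[keep])

# 2. Set BLQ values to zero.
auc_zero = trapz(t, ifelse.(isblq, 0.0, c))

# 3. Set BLQ values to half the llq.
auc_half = trapz(t, ifelse.(isblq, llq / 2, c))

println((auc_missing, auc_zero, auc_half))   # (12.0, 10.0, 10.75)
```

In this example, treating BLQ as missing gives the largest AUC, setting BLQ to zero gives the smallest, and llq/2 falls in between, which matches the qualitative statements in the list above.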
BLQ examples
Here are some examples on how to handle BLQs.
llq argument sets a data-wide value
In the example below, llq is set to 0.6 via the argument to read_nca. As a result, all values below 0.6 are now considered BLQ and are dropped from the dataset, because the concblq argument is set to :drop. Notice that the number of reported BLQ values is three:
julia> df_b1 = read_nca(df; observations = :conc, llq = 0.6, concblq = :drop)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 3
We see that values below 0.6 are not used for any NCA computation:
julia> NCA.clast(df_b1)
2×2 DataFrame
 Row │ id      clast
     │ String  Float64
─────┼─────────────────
   1 │ 1           2.0
   2 │ 2           2.0
llq and blq contribute together
Users can pass both the llq and blq arguments to read_nca, and the conditions from both arguments are applied additively. In the example below, the BLQ values are the union of those mapped from the data and those set via the llq argument:
julia> df_b2 = read_nca(df; observations = :conc, llq = 0.6, blq = :isblq, concblq = :drop)
[ Info: Rows with isblq as 1 are removed from the data, for more control over BLQ handling please refer to `concblq` kwarg
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 3
julia> NCA.clast(df_b2)
2×2 DataFrame
 Row │ id      clast
     │ String  Float64
─────┼─────────────────
   1 │ 1           2.0
   2 │ 2           2.0
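The union of the two conditions can be reproduced by hand. The sketch below is plain Julia and does not use Pumas; the helper blq_union is hypothetical, and the strict comparison conc < llq is an assumption based on the phrase "values below the set value". The vectors are the non-missing concentrations and isblq flags of the two subjects in the example data.

```julia
# Non-missing concentrations and isblq flags, copied from the example data.
conc1 = [8.0, 6.0, 4.0, 2.0, 0.1]
flag1 = [0, 0, 0, 0, 1]
conc2 = [2.0, 6.0, 3.0, 2.0, 0.5, 0.1]
flag2 = [0, 0, 0, 0, 0, 1]
llq = 0.6

# Assumed rule: a point is BLQ if it is below llq OR flagged in the blq column.
blq_union(conc, flag, llq) = (conc .< llq) .| (flag .== 1)

n_blq = count(blq_union(conc1, flag1, llq)) + count(blq_union(conc2, flag2, llq))
println(n_blq)   # 3
```

The count of 3 agrees with the "Number of blq observations" reported by read_nca above.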
Use concblq to set specific rules for BLQ handling
As discussed above, the concblq argument provides a lot of flexibility in handling BLQ values. In the examples below, we showcase some of these features.
The example below spells out the default behavior of read_nca: all BLQ values in "middle" are dropped from the dataset, and the "first" and "last" BLQ values are retained. This also explains why :drop was used in the examples above, where all BLQ values were towards the end of each subject's observations ("last"):
julia> df_b3 = read_nca(
           df;
           observations = :conc,
           llq = 0.2,
           concblq = Dict(:first => :keep, :middle => :drop, :last => :keep),
       )
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 2
We can confirm this by computing the AUC, which:
julia> NCA.auc(df_b3; auctype = :last)
2×2 DataFrame
 Row │ id      auc
     │ String  Float64
─────┼─────────────────
   1 │ 1       26.4333
   2 │ 2       15.1
matches the default below:
julia> df_b3a = read_nca(df; observations = :conc, llq = 0.2)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 2

julia> NCA.auc(df_b3a; auctype = :last)
2×2 DataFrame
 Row │ id      auc
     │ String  Float64
─────┼─────────────────
   1 │ 1       26.4333
   2 │ 2       15.1
Next, we see how to :drop the last value instead of the default :keep for "last":
julia> df_b4 = read_nca(
           df;
           observations = :conc,
           llq = 0.2,
           concblq = Dict(:first => :keep, :middle => :drop, :last => :drop),
       )
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 2
Compare the result with the AUC values obtained above:
julia> NCA.auc(df_b4; auctype = :last)
2×2 DataFrame
 Row │ id      auc
     │ String  Float64
─────┼─────────────────
   1 │ 1       24.3333
   2 │ 2       14.5
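The drop in AUC can be checked by hand. The sketch below assumes that the final segment is integrated with a linear trapezoid (an assumption made here, not something stated by the package output) and computes the area of the segment that ends at the dropped "last" BLQ point for each subject. The helper trap is hypothetical.

```julia
# Area of one linear trapezoid between (t1, c1) and (t2, c2); hypothetical helper.
trap(t1, c1, t2, c2) = (t2 - t1) * (c1 + c2) / 2

# Subject 1: the last BLQ point is 0.1 at t = 6, preceded by 2.0 at t = 4.
seg1 = trap(4, 2.0, 6, 0.1)

# Subject 2: the last BLQ point is 0.1 at t = 8, preceded by 0.5 at t = 6.
seg2 = trap(6, 0.5, 8, 0.1)

# Differences between the AUC values reported with :last => :keep and :last => :drop.
diff1 = 26.4333 - 24.3333
diff2 = 15.1 - 14.5

println((seg1, diff1))
println((seg2, diff2))
```

Under that assumption the segment areas (about 2.1 and 0.6) match the differences between the two sets of AUC values, which is consistent with dropping the last BLQ point simply removing the final trapezoid.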
Next, we impute a specific value (0.15) for the "last" BLQ concentrations:
julia> df_b5 = read_nca(
           df;
           observations = :conc,
           blq = :isblq,
           llq = 0.6,
           concblq = Dict(:first => :drop, :middle => :drop, :last => 0.15),
       )
[ Info: Rows with isblq as 1 are removed from the data, for more control over BLQ handling please refer to `concblq` kwarg
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 3
We can confirm this:
julia> NCA.clast(df_b5)
2×2 DataFrame
 Row │ id      clast
     │ String  Float64
─────┼─────────────────
   1 │ 1          2.0
   2 │ 2          0.15