Handling Missing and BLQ Data
When data for the Pumas NCA package is read in with read_nca, it passes through a series of sanity checks and cleaning steps. Many of these checks were covered in the read_nca examples. Below we cover additional details on the handling of missing data and of data below the lower limit of quantification (llq), usually referred to as BLQ.
Missing data handling
By default, missing observations (and their associated times) and missing volumes are removed from the dataset. However, the missingconc and missingvolume keyword arguments of read_nca can be used to impute the missing data with a numeric value instead of dropping it.
julia> df = DataFrame(;
           id = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
           time = [0, 1, 2, 3, 4, 6, 0, 1, 2, 3, 4, 6, 8],
           amt = [10, 0, 0, 0, 0, 0, 20, 0, 0, 0, 0, 0, 0],
           sss = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
           iii = [4, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0],
           conc = [missing, 8, 6, 4, 2, 0.1, missing, 2, 6, 3, 2, 0.5, 0.1],
           isblq = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
           route = ["iv", "iv", "iv", "iv", "iv", "iv", "ev", "ev", "ev", "ev", "ev", "ev", "ev"],
       )
13×8 DataFrame
 Row │ id     time   amt    sss    iii    conc       isblq  route
     │ Int64  Int64  Int64  Int64  Int64  Float64?   Int64  String
─────┼─────────────────────────────────────────────────────────────
   1 │     1      0     10      1      4  missing        0  iv
   2 │     1      1      0      0      0        8.0      0  iv
   3 │     1      2      0      0      0        6.0      0  iv
   4 │     1      3      0      0      0        4.0      0  iv
   5 │     1      4      0      0      0        2.0      0  iv
   6 │     1      6      0      0      0        0.1      1  iv
   7 │     2      0     20      1      4  missing        0  ev
   8 │     2      1      0      0      0        2.0      0  ev
   9 │     2      2      0      0      0        6.0      0  ev
  10 │     2      3      0      0      0        3.0      0  ev
  11 │     2      4      0      0      0        2.0      0  ev
  12 │     2      6      0      0      0        0.5      0  ev
  13 │     2      8      0      0      0        0.1      1  ev
In the example below, the observations vector of the first subject has only 5 elements and starts with 8, as expected, because the missing value at time 0 was dropped.
The underlying default value of missingconc is :drop, i.e., all missing values are dropped.
julia> df_m1 = read_nca(df; observations = :conc)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 0

julia> df_m1[1].observations
5-element Vector{Float64}:
 8.0
 6.0
 4.0
 2.0
 0.1
When we pass the missingconc argument to read_nca and set all missing values to 10, the first subject has a 6-element observations vector starting at 10:
julia> df_m2 = read_nca(df; observations = :conc, missingconc = 10)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 0

julia> df_m2[1].observations
6-element Vector{Float64}:
 10.0
  8.0
  6.0
  4.0
  2.0
  0.1
One point to note here is that even when missing values are imputed via missingconc, the number of missing concentrations (the num_conc_missing field of NCASubject) is recorded on the basis of the original data's missing values:
julia> df_m2[1].num_conc_missing
1
The method used for handling missing concentrations can affect which points are considered BLQ.
BLQ handling
In the Pumas NCA package, the llq defaults to 0, so values are considered BLQ if they are 0, unless they occur at the first time point or at the time of dosing. If a blq column is mapped from the data, all rows with 1 in that column are removed from the data. The concblq keyword argument of read_nca accepts either a scalar indicating what should be done for all BLQ values or a collection with elements named :first, :middle, and :last, each set to one of the valid options discussed below.
The meaning of each of the elements is:
- :first: Values up to the first non-BLQ value. Note that if all values are BLQ, this includes all values.
- :middle: Values that are BLQ between the first and last non-BLQ values.
- :last: Values that are BLQ after the last non-BLQ value.
The valid settings for each are:
- :drop: Drop the BLQ values.
- :keep: Keep the BLQ values.
- A number: Impute BLQ values with that number.
The default settings for concblq are the following:
concblq = Dict(:first => :keep, :middle => :drop, :last => :keep)
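To make the position labels and the settings above concrete, here is a minimal sketch in plain Julia. The helpers blq_positions and apply_concblq and the data are hypothetical; they are not part of Pumas and only mimic the rules as described in this section.

```julia
# Hypothetical helpers for illustration only; not part of the Pumas API.

# Label each point as :observed, :first, :middle, or :last.
function blq_positions(isblq::AbstractVector{Bool})
    first_obs = findfirst(!, isblq)  # index of the first non-BLQ value
    last_obs = findlast(!, isblq)    # index of the last non-BLQ value
    return map(eachindex(isblq)) do i
        if !isblq[i]
            :observed
        elseif first_obs === nothing || i < first_obs
            :first                   # if all values are BLQ, all are :first
        elseif i > last_obs
            :last
        else
            :middle
        end
    end
end

# Apply :drop, :keep, or a numeric imputation according to position.
function apply_concblq(times, concs, isblq, rules)
    kept_times, kept_concs = Float64[], Float64[]
    for (t, c, p) in zip(times, concs, blq_positions(isblq))
        rule = p === :observed ? :keep : rules[p]
        rule === :drop && continue                     # remove the point entirely
        push!(kept_times, t)
        push!(kept_concs, rule === :keep ? c : rule)   # a number replaces the value
    end
    return kept_times, kept_concs
end

times = [0.0, 1.0, 2.0, 3.0, 4.0, 6.0]
concs = [0.1, 5.0, 0.1, 3.0, 1.0, 0.1]                 # made-up data
isblq = [true, false, true, false, false, true]

defaults = Dict(:first => :keep, :middle => :drop, :last => :keep)
t_def, c_def = apply_concblq(times, concs, isblq, defaults)

custom = Dict(:first => :drop, :middle => :drop, :last => 0.15)
t_cus, c_cus = apply_concblq(times, concs, isblq, custom)
```

With the default rules only the middle BLQ point (time 2) is removed, while the custom rules also remove the leading BLQ point and replace the trailing one with 0.15.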
In practice, there are three ways of handling BLQ data:
- Set the BLQ values to missing. The impact of doing this depends on where in the concentration-time profile the value occurs.
  - When BLQ values occur at the end of the concentration-time profile, setting them to missing has the effect of truncating the AUC to the time of the last observed concentration.
  - When BLQ values occur in between two observed concentrations, setting the BLQ value to missing has the effect of removing that time point from the AUC calculation. This can overestimate the AUC because interpolation occurs between the two observed data points, ignoring the BLQ value.
- Set the BLQ value to zero.
  - May result in underestimation of AUC, but at least protects against overestimation.
- Set the BLQ to a specific value.
  - Most common is to set the BLQ value to 1/2 of the llq value.
  - Users are also provided the option to set this to any numeric value of choice.
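The effect of the three strategies on a BLQ value in the middle of a profile can be illustrated with a short sketch. The profile, the llq value, and the trapz helper are made up for illustration, and the linear trapezoidal rule is an assumption here, not necessarily the method Pumas applies.

```julia
# Linear trapezoidal AUC over the supplied points (assumed method, for illustration).
trapz(t, c) = sum((t[i+1] - t[i]) * (c[i] + c[i+1]) / 2 for i in 1:length(t)-1)

llq = 1.0
t = [0.0, 1.0, 2.0, 3.0, 4.0]
c = [8.0, 4.0, 0.5, 2.0, 1.5]      # hypothetical profile; the value at t = 2 is below llq
isblq = c .< llq

# 1. Treat BLQ as missing: the time point is removed and the
#    trapezoid spans the two neighbouring observed values.
observed = .!isblq
auc_missing = trapz(t[observed], c[observed])

# 2. Set BLQ to zero.
auc_zero = trapz(t, ifelse.(isblq, 0.0, c))

# 3. Set BLQ to half of the llq.
auc_half = trapz(t, ifelse.(isblq, llq / 2, c))

println((auc_missing, auc_zero, auc_half))
```

For this profile the three results are ordered as the list above suggests: removing the point gives the largest AUC, zero gives the smallest, and llq/2 falls in between.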
BLQ examples
Here are some examples of how to handle BLQ values.
llq argument sets a data-wide value
In the example below, llq is set to 0.6 via the argument to read_nca. As a result, all values below 0.6 are considered BLQ and are dropped from the dataset, because concblq is set to :drop. Notice that the number of reported BLQ values is three:
julia> df_b1 = read_nca(df; observations = :conc, llq = 0.6, concblq = :drop)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 3
We see that values below 0.6 are not used for any NCA computation:
julia> NCA.clast(df_b1)
2×2 DataFrame
 Row │ id      clast
     │ String  Float64
─────┼─────────────────
   1 │ 1           2.0
   2 │ 2           2.0
llq and blq contribute together
Users may pass both the llq and blq arguments to read_nca, in which case the conditions from both arguments are combined. In the example below, the BLQ values are the union of those mapped from the data and those set via the llq argument:
julia> df_b2 = read_nca(df; observations = :conc, llq = 0.6, blq = :isblq, concblq = :drop)
[ Info: Rows with isblq as 1 are removed from the data, for more control over BLQ handling please refer to `concblq` kwarg
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 3

julia> NCA.clast(df_b2)
2×2 DataFrame
 Row │ id      clast
     │ String  Float64
─────┼─────────────────
   1 │ 1           2.0
   2 │ 2           2.0
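The count of three can be reproduced by hand from the example data. The sketch below uses plain Julia only and assumes that a value counts as BLQ when it is strictly below llq; it illustrates the union, and is not how read_nca is implemented.

```julia
# Columns copied from the example DataFrame above.
conc = [missing, 8, 6, 4, 2, 0.1, missing, 2, 6, 3, 2, 0.5, 0.1]
isblq = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
llq = 0.6

# Values below llq (assumed strict comparison); missing values are not counted as BLQ.
below_llq = coalesce.(conc .< llq, false)

# Rows flagged in the mapped blq column.
flagged = isblq .== 1

# A row flagged by both conditions is counted once.
n_blq = count(below_llq .| flagged)
println(n_blq)
```

Both flagged rows (the 0.1 values) are also below llq, so the union adds only the 0.5 observation of the second subject to them.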
Use concblq to set specific rules for BLQ handling
As discussed above, the concblq argument provides a lot of flexibility in handling BLQ values. In the examples below, we showcase some of these features.
The default behaviour of read_nca is shown in the example below: all BLQ values in the middle are dropped from the dataset, and the first and last BLQ values are retained. This should also make clear why :drop was used in the examples above, where all the BLQ values fell at the end of each subject's observations (last):
julia> df_b3 = read_nca(
           df;
           observations = :conc,
           llq = 0.2,
           concblq = Dict(:first => :keep, :middle => :drop, :last => :keep),
       )
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 2
We can confirm this:
julia> NCA.auc(df_b3; auctype = :last)
2×2 DataFrame
 Row │ id      auc
     │ String  Float64
─────┼─────────────────
   1 │ 1       26.4333
   2 │ 2       15.1
This matches the result obtained with the default settings:
julia> df_b3a = read_nca(df; observations = :conc, llq = 0.2)
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 2

julia> NCA.auc(df_b3a; auctype = :last)
2×2 DataFrame
 Row │ id      auc
     │ String  Float64
─────┼─────────────────
   1 │ 1       26.4333
   2 │ 2       15.1
Next, we see how to :drop the last BLQ value instead of using the default :keep for :last:
julia> df_b4 = read_nca(
           df;
           observations = :conc,
           llq = 0.2,
           concblq = Dict(:first => :keep, :middle => :drop, :last => :drop),
       )
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 2
Compare with the auc values obtained above:
julia> NCA.auc(df_b4; auctype = :last)
2×2 DataFrame
 Row │ id      auc
     │ String  Float64
─────┼─────────────────
   1 │ 1       24.3333
   2 │ 2       14.5
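The drop in AUC relative to the :keep results (26.4333 and 15.1) is consistent with losing one linear trapezoid over the final interval of each subject. The check below is a sketch under that assumption; the trapezoid helper is hypothetical, and the times and concentrations are taken from the example data.

```julia
# Area of a single linear trapezoid between two points (assumed rule).
trapezoid(t1, c1, t2, c2) = (t2 - t1) * (c1 + c2) / 2

# Subject 1: last interval runs from (t = 4, conc = 2.0) to the BLQ point (t = 6, conc = 0.1).
lost_subject1 = trapezoid(4, 2.0, 6, 0.1)

# Subject 2: last interval runs from (t = 6, conc = 0.5) to the BLQ point (t = 8, conc = 0.1).
lost_subject2 = trapezoid(6, 0.5, 8, 0.1)

# Differences between the reported AUC values with :last => :keep and :last => :drop.
diff_subject1 = 26.4333 - 24.3333
diff_subject2 = 15.1 - 14.5

println((lost_subject1, diff_subject1))
println((lost_subject2, diff_subject2))
```

The trapezoid areas agree with the reported differences to the displayed precision, which shows that dropping the trailing BLQ value truncates the AUC at the last quantifiable concentration.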
Next, we impute the last BLQ concentration with a specific value (0.15) instead of keeping or dropping it:
julia> df_b5 = read_nca(
           df;
           observations = :conc,
           blq = :isblq,
           llq = 0.6,
           concblq = Dict(:first => :drop, :middle => :drop, :last => 0.15),
       )
[ Info: Rows with isblq as 1 are removed from the data, for more control over BLQ handling please refer to `concblq` kwarg
NCAPopulation (2 subjects):
  Number of missing observations: 2
  Number of blq observations: 3
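We can confirm this. Consistent with the Info message, the rows flagged by isblq appear to be removed first, so only the 0.5 value of the second subject (below the llq of 0.6) is imputed: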
julia> NCA.clast(df_b5)
2×2 DataFrame
 Row │ id      clast
     │ String  Float64
─────┼─────────────────
   1 │ 1          2.0
   2 │ 2          0.15