Visual Predictive Check (VPC)
Pumas allows you to generate VPC quantiles for your data and model simulations and utilize the Julia plotting capabilities to generate the relevant VPC plots. This is allowed with the vpc
function discussed below that returns a VPC
object, this can then be passed to the vpc_plot
function to generate the plots.
Overview
vpc(fpm::AbstractFittedPumasModel;
samples::Integer=499,
qreg_method=IP(),
observations::Array{Symbol} = [keys(extract_subject(fpm.data[1]).observations)[1]],
stratify_by::Array{Symbol} = Symbol[],
quantiles::NTuple{3,Float64}=(0.1, 0.5, 0.9),
level::Real=0.95,
ensemblealg::DiffEqBase.EnsembleAlgorithm=EnsembleSerial(),
bandwidth::Real=2,
maxnumstrats::Array{<:Integer} = [6 for i in 1:length(stratify_by)],
covariates::Array{Symbol} = [:time],
smooth::Bool = true,
rng::AbstractRNG=default_rng(),
obstimes::AbstractVector = [])
The vpc
computes the quantiles for VPC for an AbstractFittedPumasModel
with simulated confidence intervals around the empirical quantiles based on reps
simulated populations.
The number of repetitions of the simulations are passed in with the reps
positional argument.
The following keyword arguments are supported:
samples
: The number of simulated populations to generate, defaults to499
.quantiles
: A three-tuple of the quantiles for which the quantiles will be computed. The default is(0.1, 0.5, 0.9)
which computes the 10th, 50th and 90th percentile.level
: Probability level to use for the simulated confidence intervals around each of the quantiles. The default is0.95
.observations
: The name of the dependent variable to use for the VPCs, passed in as aSymbol
wrapped in an array. The default is the first dependent variable in the population. Even though the argument takes an array note that only single derived variable at a time is supported.stratify_by
: The covariates to be used for stratification. Takes an array of theSymbol
s of the stratification covariates.ensemblealg
: This is passed to thesimobs
call while thereps
simulations. For more description check the docs forsimobs
.bandwidth
: The kernel bandwidth in the quantile regression. If you are seeingNaN
s or an error, increasing the bandwidth should help in most cases. With higher values of thebandwidth
you will get more smoothened plots of the quantiles so it's a good idea to check with your data the rightbandwidth
.maxnumstrats
: The maximum number of strata for each covariate. Takes an array with the number of strata for the corresponding covariate, passed instratify_by
. It defaults to6
for each of the covariates.covariates
: The independent variable for VPC, defaults to[:time]
. Similar toobservations
it requires a single element array.obstimes
: The times for simulation in case of continuous VPC, same asobstimes
insimobs
. Defaults to union of all subject's unique times in the data where observation is not missing.qreg_method
: Defaults toIP()
. For most users the method used in quantile regression is not going to be of concern, but if you see large run times switchingqreg_method
toIP(true)
should help in improving the performance with a tradeoff in the accuracy of the fitting.
Hence a vpc
call with 100 repetitions of simulation would be:
res = fit(model, data, param,Pumas.FOCE())
vpc_fpm = vpc(res, samples = 100)
Stratification on wt
covariate
vpc_fpm_stratwt = vpc(res, stratify_by=[:wt])
# run only 100 simulations
vpc_fpm_stratispm = vpc(res, samples = 100, stratify_by=[:isPM])
The VPC
object obtained as the result contains the following fields:
simulated_quantiles::DataFrame
:DataFrame
of the simulated quantiles result.popvpc::PopVPC
: APopVPC
object with the observation quantiles (data_quantiles
), data (data
), stratification covariate if specified (stratify_by
) and the dependent variable (dv
).level::Float64
: The simulation CI'slevel
.
Since the quantiles are stored as DataFrame
s it is very easy to use the result of vpc
and also export it using CSV.jl to your disk.
vpc_fpm.simulated_quantiles
vpc_fpm.popvpc.data_quantiles
using CSV
CSV.write("sim_quantiles.csv", vpc_fpm.simulated_quantiles)
In Pumas, for models with continuous derived variables, the non-parametric quantile regression approach discussed in Jamsen et al. is used. Discrete and Time to Event models are also supported through the vpc
function and can be used without any change in syntax.
Full examples:
Let's take a look at a complete example, first we define the model and generate round-trip data.
julia> model = @model begin @param begin tvcl ∈ RealDomain(lower=0) tvv ∈ RealDomain(lower=0) pmoncl ∈ RealDomain(lower = -0.99) Ω ∈ PDiagDomain(2) σ_prop ∈ RealDomain(lower=0) end @random begin η ~ MvNormal(Ω) end @covariates wt isPM @pre begin CL = tvcl * (1 + pmoncl*isPM) * (wt/70)^0.75 * exp(η[1]) Vc = tvv * (wt/70) * exp(η[2]) end @dynamics Central1 @derived begin cp = @. 1000*(Central / Vc) dv ~ @. Normal(cp, sqrt(cp^2*σ_prop)) end end
PumasModel Parameters: tvcl, tvv, pmoncl, Ω, σ_prop Random effects: η Covariates: wt, isPM Dynamical variables: Central Derived: cp, dv Observed: cp, dv
julia> ev = DosageRegimen(100, time=0, addl=2, ii=24)
DosageRegimen Row │ time cmt amt evid ii addl rate duration ss route │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 NCA.Route ─────┼─────────────────────────────────────────────────────────────────────────────────── 1 │ 0.0 1 100.0 1 24.0 2 0.0 0.0 0 NullRoute
julia> s1 = Subject(id=1, events=ev, covariates=(isPM=1, wt=70))
Subject ID: 1 Events: 3 Covariates: isPM, wt
julia> param = ( tvcl = 4.0, tvv = 70, pmoncl = -0.7, Ω = Diagonal([0.09,0.09]), σ_prop = 0.04 )
(tvcl = 4.0, tvv = 70, pmoncl = -0.7, Ω = [0.09 0.0; 0.0 0.09], σ_prop = 0.04)
julia> choose_covariates() = (isPM = rand([1, 0]), wt = rand(55:80))
choose_covariates (generic function with 1 method)
julia> pop_with_covariates = Population(map(i -> Subject(id=i, events=ev, covariates=choose_covariates()),1:10))
Population Subjects: 10 Covariates: isPM, wt Observations:
julia> obs = simobs(model, pop_with_covariates, param, obstimes=0:1:60)
Simulated population (Vector{<:Subject}) Simulated subjects: 10 Simulated variables: cp, dv
julia> simdf = DataFrame(obs)
640×10 DataFrame Row │ id time cp dv amt evid cmt rate isPM wt │ String Float64 Float64? Float64? Float64? Int8? Int64? Float64? Int64? Int64? ─────┼─────────────────────────────────────────────────────────────────────────────────────────────── 1 │ 1 0.0 missing missing 100.0 1 1 0.0 1 74 2 │ 1 0.0 1400.08 1019.75 missing 0 missing 0.0 1 74 3 │ 1 1.0 1377.5 1600.09 missing 0 missing 0.0 1 74 4 │ 1 2.0 1355.29 1439.13 missing 0 missing 0.0 1 74 5 │ 1 3.0 1333.44 1438.5 missing 0 missing 0.0 1 74 6 │ 1 4.0 1311.94 1255.26 missing 0 missing 0.0 1 74 7 │ 1 5.0 1290.78 1403.97 missing 0 missing 0.0 1 74 8 │ 1 6.0 1269.97 1449.04 missing 0 missing 0.0 1 74 9 │ 1 7.0 1249.49 1685.99 missing 0 missing 0.0 1 74 10 │ 1 8.0 1229.35 1420.19 missing 0 missing 0.0 1 74 11 │ 1 9.0 1209.52 862.887 missing 0 missing 0.0 1 74 ⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ 631 │ 10 51.0 2051.6 1493.43 missing 0 missing 0.0 1 74 632 │ 10 52.0 2008.6 1572.71 missing 0 missing 0.0 1 74 633 │ 10 53.0 1966.5 1521.81 missing 0 missing 0.0 1 74 634 │ 10 54.0 1925.28 2304.66 missing 0 missing 0.0 1 74 635 │ 10 55.0 1884.93 1704.11 missing 0 missing 0.0 1 74 636 │ 10 56.0 1845.42 1848.67 missing 0 missing 0.0 1 74 637 │ 10 57.0 1806.74 1498.26 missing 0 missing 0.0 1 74 638 │ 10 58.0 1768.87 1672.18 missing 0 missing 0.0 1 74 639 │ 10 59.0 1731.79 1723.38 missing 0 missing 0.0 1 74 640 │ 10 60.0 1695.49 1288.17 missing 0 missing 0.0 1 74 619 rows omitted
julia> data = read_pumas(simdf, time=:time, covariates=[:isPM, :wt])
Population Subjects: 10 Covariates: isPM, wt Observations: dv
julia> res = fit(model, data, param, Pumas.FOCE())
Iter Function value Gradient norm 0 4.314713e+03 1.621909e+01 * time: 0.03275179862976074 1 4.313178e+03 9.958573e+00 * time: 1.7549538612365723 2 4.313039e+03 3.271653e+00 * time: 1.7831318378448486 3 4.313027e+03 5.324438e+00 * time: 1.8000469207763672 4 4.312903e+03 2.555673e+00 * time: 1.825730800628662 5 4.312842e+03 5.836348e-01 * time: 1.8431658744812012 6 4.312825e+03 4.448720e-01 * time: 1.859853982925415 7 4.312820e+03 5.169640e-02 * time: 1.876734972000122 8 4.312820e+03 6.393095e-03 * time: 1.891650915145874 9 4.312820e+03 5.802140e-04 * time: 1.9043848514556885 FittedPumasModel Successful minimization: true Likelihood approximation: Pumas.FOCE Log-likelihood value: -4312.8201 Number of subjects: 10 Number of parameters: Fixed Optimized 0 6 Observation records: Active Missing dv: 610 0 Total: 610 0 --------------------- Estimate --------------------- tvcl 4.4698 tvv 68.07 pmoncl -0.67974 Ω₁,₁ 0.10838 Ω₂,₂ 0.071639 σ_prop 0.038034 ---------------------
julia> vpc_fpm = vpc(res);
julia> vpc_plot(vpc_fpm)
Alternatively, using a PumasEMModel
emmodel = @emmodel begin
@random begin
CLdw ~ 1 + isPM | LogNormal # CLdw = μ_CL * exp(μ_CL_isPM * isPM) * exp(η_CL)
Vcdw ~ 1 | LogNormal # Vcdw = μ_Vc * exp(η_Vc)
end
@covariates wt
@pre begin
CL = CLdw * (wt/70)^0.75
Vc = Vcdw * (wt/70)
end
@dynamics Central1
@post begin
cp = 1000 * Central / Vc
end
@error begin
dv ~ ProportionalNormal(cp)
end
end
emparam = (
CLdw = (4.0, -1.2), # -1.2 ≈ log(1 - 0.7) = log(1 - pmoncl)
Vcdw = 70,
Ω = (0.09,0.09),
σ = (0.04,)
)
emres = fit(emmodel, data, emparam, Pumas.LaplaceI())
vpc_emfpm = vpc(emres)
Note that because isPM
equals either 0
or 1
, pmoncl = exp(μ_CL_isPM) - 1
, allowing us to perform a change of variables and express the earlier PumasModel
as a PumasEMModel
.
In PumasModel
s and PumasEMModel
s, you can increase performance by preprocessing covariates. Instead of wt
, you can try storing (wt/70)
and (wt/70)^0.75
as appropriately named covariates.
vpc_plot(vpc_emfpm)
Let's generate a VPC stratified on isPM
covariate with stratify_by
. Note that multiple covariates can be passed to stratify_by
to get cross stratified VPC plots, in that case the maxnumstrats
will have to be adjusted accordingly.
resem_saem = fit(emmodel, data, (CLdw = (5.0,0.0), Vcdw = 50.0), Pumas.SAEM())
vpc_fpmem_saem = vpc(resem_saem, samples = 100)
vpc_plot(vpc_fpmem_saem)
When fitting with SAEM, it is strongly recomended to use values for the Ω
parameters much larger than your estiamtes of the true values, or to avoid providing them so they default to 1.0
along the diagonal. Using small values such as emparams
will result in convergence failure.
vpc_fpm = vpc(res, stratify_by = [:isPM])
vpc_plot(vpc_fpm)
vpc_saem_fpm = vpc(resem_saem, stratify_by = [:isPM]);
vpc_plot(vpc_saem_fpm)
Let's pass in the time grid with the obstimes
keyword argument to vpc
for the simulation time points.
vpc_fpm = vpc(res, 100, obstimes = 0.0:5:60.0)
vpc_plot(vpc_fpm)
By specifying the plot and the path the VPC plot can be saved to the disk as below:
p = vpc_plot(vpc_fpm)
CairoMakie.save("obstimesvpc.png", p)
More examples of VPC plots can be found in the plotting page's vpc_plot
section.
If your vpc
run gives you an error message similar to ERROR: LinearAlgebra.LAPACKException(2)
, increasing the bandwidth
should fix it in most cases.
For most users, the method used in quantile regression is not going to be of concern, but if you see large run times switching qreg_method
to IP(true)
after loading QuantileRegressions.jl should help in improving the performance with a tradeoff in the accuracy of the regression fitting.
Prediction corrected VPCs (pcVPCs) are work in progress and will become available soon.