Visual Predictive Check (VPC)

Pumas allows you to generate VPC quantiles for your data and model simulations and utilize the Julia plotting capabilities to generate the relevant VPC plots. This is allowed with the vpc function discussed below that returns a VPC object, this can then be passed to the vpc_plot function to generate the plots.

Overview

vpc(fpm::AbstractFittedPumasModel;
  samples::Integer=499,
  qreg_method=IP(),
  observations::Array{Symbol} = [keys(extract_subject(fpm.data[1]).observations)[1]],
  stratify_by::Array{Symbol} = Symbol[],
  quantiles::NTuple{3,Float64}=(0.1, 0.5, 0.9),
  level::Real=0.95,
  ensemblealg::DiffEqBase.EnsembleAlgorithm=EnsembleSerial(),
  bandwidth::Real=2,
  maxnumstrats::Array{<:Integer} = [6 for i in 1:length(stratify_by)],
  covariates::Array{Symbol} = [:time],
  smooth::Bool = true,
  rng::AbstractRNG=default_rng(),
  obstimes::AbstractVector = [])

The vpc computes the quantiles for VPC for an AbstractFittedPumasModel with simulated confidence intervals around the empirical quantiles based on reps simulated populations.

The number of repetitions of the simulations are passed in with the reps positional argument.

The following keyword arguments are supported:

  • samples: The number of simulated populations to generate, defaults to 499.
  • quantiles: A three-tuple of the quantiles for which the quantiles will be computed. The default is (0.1, 0.5, 0.9) which computes the 10th, 50th and 90th percentile.
  • level: Probability level to use for the simulated confidence intervals around each of the quantiles. The default is 0.95.
  • observations: The name of the dependent variable to use for the VPCs, passed in as a Symbol wrapped in an array. The default is the first dependent variable in the population. Even though the argument takes an array note that only single derived variable at a time is supported.
  • stratify_by: The covariates to be used for stratification. Takes an array of the Symbols of the stratification covariates.
  • ensemblealg: This is passed to the simobs call while the reps simulations. For more description check the docs for simobs.
  • bandwidth: The kernel bandwidth in the quantile regression. If you are seeing NaNs or an error, increasing the bandwidth should help in most cases. With higher values of the bandwidth you will get more smoothened plots of the quantiles so it's a good idea to check with your data the right bandwidth.
  • maxnumstrats: The maximum number of strata for each covariate. Takes an array with the number of strata for the corresponding covariate, passed in stratify_by. It defaults to 6 for each of the covariates.
  • covariates: The independent variable for VPC, defaults to [:time]. Similar to observations it requires a single element array.
  • obstimes: The times for simulation in case of continuous VPC, same as obstimes in simobs. Defaults to union of all subject's unique times in the data where observation is not missing.
  • qreg_method: Defaults to IP(). For most users the method used in quantile regression is not going to be of concern, but if you see large run times switching qreg_method to IP(true) should help in improving the performance with a tradeoff in the accuracy of the fitting.

Hence a vpc call with 100 repetitions of simulation would be:

res = fit(model, data, param,Pumas.FOCE())
vpc_fpm = vpc(res, samples = 100)

Stratification on wt covariate

vpc_fpm_stratwt = vpc(res, stratify_by=[:wt])
# run only 100 simulations 
vpc_fpm_stratispm = vpc(res, samples = 100, stratify_by=[:isPM])

The VPC object obtained as the result contains the following fields:

  • simulated_quantiles::DataFrame: DataFrame of the simulated quantiles result.
  • popvpc::PopVPC: A PopVPC object with the observation quantiles (data_quantiles), data (data), stratification covariate if specified (stratify_by) and the dependent variable (dv).
  • level::Float64: The simulation CI's level.

Since the quantiles are stored as DataFrames it is very easy to use the result of vpc and also export it using CSV.jl to your disk.

vpc_fpm.simulated_quantiles
vpc_fpm.popvpc.data_quantiles

using CSV
CSV.write("sim_quantiles.csv", vpc_fpm.simulated_quantiles)

In Pumas, for models with continuous derived variables, the non-parametric quantile regression approach discussed in Jamsen et al. is used. Discrete and Time to Event models are also supported through the vpc function and can be used without any change in syntax.

Full examples:

Let's take a look at a complete example, first we define the model and generate round-trip data.

julia> model = @model begin
         @param begin
           tvcl ∈ RealDomain(lower=0)
           tvv ∈ RealDomain(lower=0)
           pmoncl ∈ RealDomain(lower = -0.99)
           Ω ∈ PDiagDomain(2)
           σ_prop ∈ RealDomain(lower=0)
         end
       
         @random begin
           η ~ MvNormal(Ω)
         end
       
         @covariates wt isPM
       
         @pre begin
           CL = tvcl * (1 + pmoncl*isPM) * (wt/70)^0.75 * exp(η[1])
           Vc    = tvv * (wt/70) * exp(η[2])
         end
       
         @dynamics Central1
       
         @derived begin
           cp = @. 1000*(Central / Vc)
           dv ~ @. Normal(cp, sqrt(cp^2*σ_prop))
         end
       endPumasModel
  Parameters: tvcl, tvv, pmoncl, Ω, σ_prop
  Random effects: η
  Covariates: wt, isPM
  Dynamical variables: Central
  Derived: cp, dv
  Observed: cp, dv
julia> ev = DosageRegimen(100, time=0, addl=2, ii=24)DosageRegimen Row │ time cmt amt evid ii addl rate duration ss route │ Float64 Int64 Float64 Int8 Float64 Int64 Float64 Float64 Int8 NCA.Route ─────┼─────────────────────────────────────────────────────────────────────────────────── 1 │ 0.0 1 100.0 1 24.0 2 0.0 0.0 0 NullRoute
julia> s1 = Subject(id=1, events=ev, covariates=(isPM=1, wt=70))Subject ID: 1 Events: 3 Covariates: isPM, wt
julia> param = ( tvcl = 4.0, tvv = 70, pmoncl = -0.7, Ω = Diagonal([0.09,0.09]), σ_prop = 0.04 )(tvcl = 4.0, tvv = 70, pmoncl = -0.7, Ω = [0.09 0.0; 0.0 0.09], σ_prop = 0.04)
julia> choose_covariates() = (isPM = rand([1, 0]), wt = rand(55:80))choose_covariates (generic function with 1 method)
julia> pop_with_covariates = Population(map(i -> Subject(id=i, events=ev, covariates=choose_covariates()),1:10))Population Subjects: 10 Covariates: isPM, wt Observations:
julia> obs = simobs(model, pop_with_covariates, param, obstimes=0:1:60)Simulated population (Vector{<:Subject}) Simulated subjects: 10 Simulated variables: cp, dv
julia> simdf = DataFrame(obs)640×10 DataFrame Row │ id time cp dv amt evid cmt rate isPM wt │ String Float64 Float64? Float64? Float64? Int8? Int64? Float64? Int64? Int64? ─────┼─────────────────────────────────────────────────────────────────────────────────────────────── 1 │ 1 0.0 missing missing 100.0 1 1 0.0 1 74 2 │ 1 0.0 1400.08 1019.75 missing 0 missing 0.0 1 74 3 │ 1 1.0 1377.5 1600.09 missing 0 missing 0.0 1 74 4 │ 1 2.0 1355.29 1439.13 missing 0 missing 0.0 1 74 5 │ 1 3.0 1333.44 1438.5 missing 0 missing 0.0 1 74 6 │ 1 4.0 1311.94 1255.26 missing 0 missing 0.0 1 74 7 │ 1 5.0 1290.78 1403.97 missing 0 missing 0.0 1 74 8 │ 1 6.0 1269.97 1449.04 missing 0 missing 0.0 1 74 9 │ 1 7.0 1249.49 1685.99 missing 0 missing 0.0 1 74 10 │ 1 8.0 1229.35 1420.19 missing 0 missing 0.0 1 74 11 │ 1 9.0 1209.52 862.887 missing 0 missing 0.0 1 74 ⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ 631 │ 10 51.0 2051.6 1493.43 missing 0 missing 0.0 1 74 632 │ 10 52.0 2008.6 1572.71 missing 0 missing 0.0 1 74 633 │ 10 53.0 1966.5 1521.81 missing 0 missing 0.0 1 74 634 │ 10 54.0 1925.28 2304.66 missing 0 missing 0.0 1 74 635 │ 10 55.0 1884.93 1704.11 missing 0 missing 0.0 1 74 636 │ 10 56.0 1845.42 1848.67 missing 0 missing 0.0 1 74 637 │ 10 57.0 1806.74 1498.26 missing 0 missing 0.0 1 74 638 │ 10 58.0 1768.87 1672.18 missing 0 missing 0.0 1 74 639 │ 10 59.0 1731.79 1723.38 missing 0 missing 0.0 1 74 640 │ 10 60.0 1695.49 1288.17 missing 0 missing 0.0 1 74 619 rows omitted
julia> data = read_pumas(simdf, time=:time, covariates=[:isPM, :wt])Population Subjects: 10 Covariates: isPM, wt Observations: dv
julia> res = fit(model, data, param, Pumas.FOCE())Iter Function value Gradient norm 0 4.314713e+03 1.621909e+01 * time: 0.03275179862976074 1 4.313178e+03 9.958573e+00 * time: 1.7549538612365723 2 4.313039e+03 3.271653e+00 * time: 1.7831318378448486 3 4.313027e+03 5.324438e+00 * time: 1.8000469207763672 4 4.312903e+03 2.555673e+00 * time: 1.825730800628662 5 4.312842e+03 5.836348e-01 * time: 1.8431658744812012 6 4.312825e+03 4.448720e-01 * time: 1.859853982925415 7 4.312820e+03 5.169640e-02 * time: 1.876734972000122 8 4.312820e+03 6.393095e-03 * time: 1.891650915145874 9 4.312820e+03 5.802140e-04 * time: 1.9043848514556885 FittedPumasModel Successful minimization: true Likelihood approximation: Pumas.FOCE Log-likelihood value: -4312.8201 Number of subjects: 10 Number of parameters: Fixed Optimized 0 6 Observation records: Active Missing dv: 610 0 Total: 610 0 --------------------- Estimate --------------------- tvcl 4.4698 tvv 68.07 pmoncl -0.67974 Ω₁,₁ 0.10838 Ω₂,₂ 0.071639 σ_prop 0.038034 ---------------------
julia> vpc_fpm = vpc(res);

julia> vpc_plot(vpc_fpm)

FPM VPC

Alternatively, using a PumasEMModel

emmodel = @emmodel begin
  @random begin
    CLdw ~ 1 + isPM | LogNormal # CLdw = μ_CL * exp(μ_CL_isPM * isPM) * exp(η_CL)
    Vcdw ~ 1        | LogNormal # Vcdw = μ_Vc * exp(η_Vc)
  end
  @covariates wt
  @pre begin
    CL = CLdw * (wt/70)^0.75
    Vc = Vcdw * (wt/70)
  end
  @dynamics Central1
  @post begin
    cp = 1000 * Central / Vc
  end
  @error begin
    dv ~ ProportionalNormal(cp)
  end
end

emparam = (
    CLdw = (4.0, -1.2), # -1.2 ≈ log(1 - 0.7) = log(1 - pmoncl)
    Vcdw    = 70,
    Ω = (0.09,0.09),
    σ = (0.04,)
)

emres = fit(emmodel, data, emparam, Pumas.LaplaceI())

vpc_emfpm = vpc(emres)

Note that because isPM equals either 0 or 1, pmoncl = exp(μ_CL_isPM) - 1, allowing us to perform a change of variables and express the earlier PumasModel as a PumasEMModel.

Tip

In PumasModels and PumasEMModels, you can increase performance by preprocessing covariates. Instead of wt, you can try storing (wt/70) and (wt/70)^0.75 as appropriately named covariates.

vpc_plot(vpc_emfpm)

EMMODEL-FPM VPC

Let's generate a VPC stratified on isPM covariate with stratify_by. Note that multiple covariates can be passed to stratify_by to get cross stratified VPC plots, in that case the maxnumstrats will have to be adjusted accordingly.

resem_saem = fit(emmodel, data, (CLdw = (5.0,0.0), Vcdw = 50.0), Pumas.SAEM())

vpc_fpmem_saem = vpc(resem_saem, samples = 100)
vpc_plot(vpc_fpmem_saem)

SAEM-FPM VPC

Tip

When fitting with SAEM, it is strongly recomended to use values for the Ω parameters much larger than your estiamtes of the true values, or to avoid providing them so they default to 1.0 along the diagonal. Using small values such as emparams will result in convergence failure.

vpc_fpm = vpc(res, stratify_by = [:isPM])
vpc_plot(vpc_fpm)

vpc_fpm_strat

vpc_saem_fpm = vpc(resem_saem, stratify_by = [:isPM]);
vpc_plot(vpc_saem_fpm)

vpc_saemfpm_strat

Let's pass in the time grid with the obstimes keyword argument to vpc for the simulation time points.

vpc_fpm = vpc(res, 100, obstimes = 0.0:5:60.0)
vpc_plot(vpc_fpm)

vpc_fpm_obstimes

By specifying the plot and the path the VPC plot can be saved to the disk as below:

p = vpc_plot(vpc_fpm)
CairoMakie.save("obstimesvpc.png", p)

More examples of VPC plots can be found in the plotting page's vpc_plot section.

Tip

If your vpc run gives you an error message similar to ERROR: LinearAlgebra.LAPACKException(2), increasing the bandwidth should fix it in most cases.

Note

For most users, the method used in quantile regression is not going to be of concern, but if you see large run times switching qreg_method to IP(true) after loading QuantileRegressions.jl should help in improving the performance with a tradeoff in the accuracy of the regression fitting.

Prediction corrected VPCs (pcVPCs) are work in progress and will become available soon.