Visual Predictive Check (VPC)

Pumas allows you to generate VPC quantiles for your data and model simulations and utilize the Julia plotting capabilities to generate the relevant VPC plots. This is allowed with the vpc function discussed below that returns a VPC object. We generate the VPCs with the non-parametric quantile regression approach discussed in the Jamsen et al.

 vpc(fpm::FittedPumasModel,
                reps::Integer = 499,
                qreg_method = IP(),
                vpctype::VPCType = ContinuousVPC();
                dv::Symbol = keys(fpm.data[1].observations)[1],
                stratify_by = nothing,
                quantiles::NTuple{3,Float64}=(0.1, 0.5, 0.9),
                level::Real=0.95,
                ensemblealg=EnsembleSerial(),
                bandwidth=2,
                numstrats=stratify_by === nothing ? nothing : [4 for i in 1:length(stratify_by)])

vpc computes the quantiles for VPC for a FittedPumasModel with simulated confidence intervals around the empirical quantiles based on reps simulated populations.

The number of repetitions of the simulations are passed in with the reps positional argument.

The following keyword arguments are supported:

- `quantiles::NTuple{3,Float64}`: A three-tuple of the quantiles for which the quantiles will be computed. The default is `(0.1, 0.5, 0.9)` which computes the 10th, 50th and 90th percentile.
- `level::Real`: Probability level to use for the simulated confidence intervals around each of the quantiles. The default is `0.95`.
- `dv::Symbol`: The name of the dependent variable to use for the VPCs, passed in as a `Symbol`. The default is the first dependent variable in the dataset.
- `stratify_by`: The covariates to be used for stratification. Takes an array of the `Symbol`s of the stratification covariates.
- `ensemblealg`: This is passed to the `simobs` call while the `reps` simulations. For more description check the docs for `simobs`.
- `bandwidth`: The kernel bandwidth in the quantile regression. If you are seeing `NaN`s or an error, increasing the bandwidth should help in most cases. With higher values of the `bandwidth` you will get more smoothened plots of the quantiles so it's a good idea to check with your data the right `bandwidth`.
- `numstrats`: The number of strata to divide into based on the unique values of the covariate, takes an array with the number of strata for the corresponding covariate passed in `stratify_by`. It takes a default of `4` for each of the covariates.
- `idv`: The independent variable used for the VPC, defaults to `:time`. 
- `sim_idvs`: The independent variable's values used in the simulation (currently limited to values in case of `:time` as the 
`idv` and gets passed in as the `obstimes` kwarg to `simobs`).

Hence a vpc call with 100 nreps would be:

 res = fit(model,data,param,Pumas.FOCEI())
 vpc_fpm = vpc(res, 100)

Stratification on wt covariate

 vpc_fpm_stratwt = vpc(res, stratify_by=[:wt])
 #run only 100 simulations 
 vpc_fpm_stratispm = vpc(res, 100, stratify_by=[:isPM])

The VPC object obtained as the result contains the following fields:

- `simulated_quantiles::DataFrame`: `DataFrame` of the simulated quantiles result.
- `popvpc::PopVPC`: A `PopVPC` object with the observation quantiles (`data_quantiles`), data (`data`), stratification covariate if specified (`stratify_by`) and the dependent variable (`dv`).
- `level::Float64`: The simulation CI's level.

Since the quantiles are stored as DataFrames it is very easy to use the result of vpc and also export it using CSV.jl to your disk.

 vpc_fpm.simulated_quantiles
 vpc_fpm.popvpc.data_quantiles

 using CSV
 write("sim_quantiles.csv", vpc_fpm.simulated_quantiles)

While plotting the obtained VPC object with plot the following keyword arguments allow the option to include or exclude various components with true or false respectively:

- `observations`: Scatter plot of the real observations.
- `simquantile_medians`: The median quantile regression of each quantile from the simulations.
- `observed_quantiles`: The quantile regressions for the real observations.
- `ci_bands`: Shaded region between the upper and lower confidence bounds of each quantile from the simulations.
Info

observations and simquantile_medians are set to false by default.

 using Plots
 plot(vpc_fpm)
 plot(vpc_fpm, observations = true)
 plot(vpc_fpmstratwt, observations = true, ci_bands = false)

Full examples:

Let's take a look at a complete example, first we define the model and generate round-trip data.

model = @model begin
  @param begin
    tvcl ∈ RealDomain(lower=0)
    tvv ∈ RealDomain(lower=0)
    pmoncl ∈ RealDomain(lower = -0.99)
    Ω ∈ PDiagDomain(2)
    σ_prop ∈ RealDomain(lower=0)
  end 

  @random begin
    η ~ MvNormal(Ω)
  end 

  @covariates wt isPM 

  @pre begin
    CL = tvcl * (1 + pmoncl*isPM) * (wt/70)^0.75 * exp(η[1])
    Vc    = tvv * (wt/70) * exp(η[2])
  end 

  @dynamics Central1  

  @derived begin
    cp = @. 1000*(Central / Vc)
    dv ~ @. Normal(cp, sqrt(cp^2*σ_prop))
  end
end

ev = DosageRegimen(100, time=0, addl=2, ii=24)
s1 = Subject(id=1, events=ev, covariates=(isPM=1, wt=70))

param = (
    tvcl = 4.0,
    tvv    = 70,
    pmoncl = -0.7,
    Ω = Diagonal([0.09,0.09]),
    σ_prop = 0.04
    )

choose_covariates() = (isPM = rand([1, 0]),
wt = rand(55:80))
pop_with_covariates = Population(map(i -> Subject(id=i, events=ev, covariates=choose_covariates()),1:10))
obs = simobs(model, pop_with_covariates, param, obstimes=0:1:60)
simdf = DataFrame(obs)
simdf[rand(1:length(simdf.dv), 5), :dv] .= missing
data = read_pumas(simdf, time=:time, covariates=[:isPM, :wt])

One workflow tip we recommend is to use the ability of doing the quantile calculations only for the observed data for adjusting the bandwidth value (defaults to 2) to get a good fit.

vpc_data = vpc(data)
plot(vpc_data)

Data VPC

Bandwidth 2

vpc_data_stratwt = vpc(data, stratify_by=[:wt], bandwidth = 5)
plot(vpc_data_stratwt)

Bandwidth 5

We see the plot with higher bandwidth better captures the data, so we would use the same bandwidth value for VPC of the fitted model, stratified on wt.

Tip

If your vpc run gives you an error message similar to ERROR: LinearAlgebra.LAPACKException(2), increasing the bandwidth should fix it in most cases.

res = fit(model,data,param,Pumas.FOCEI())

vpc_fpm = vpc(res, 100)
plot(vpc_fpm)

FPM VPC

vpc_fpm_stratwt = vpc(res, 100, stratify_by=[:wt], bandwidth = 5)
plot(vpc_fpm_stratwt, legend=false)

WT VPC

Let's pass in the time grid with the sim_idvs kwarg to vpc for the simulation time points.

vpc_fpm = vpc(res, 100, sim_idvs = 0.0:5:60.0)
plot(vpc_fpm)

sim idv

By specifying the plot and the path the VPC plot can be saved to the disk as below:

p = plot(vpc_fpm)
savefig(p, "simidvsvpc.png")
Note

For most users, the method used in quantile regression is not going to be of concern, but if you see large run times switching qreg_method to IP(true) after loading QuantileRegressions.jl should help in improving the performance with a tradeoff in the accuracy of the regression fitting.

VPCs for discrete or time to event data is already implemented and undergoing internal testing, these should become available in upcoming releases. Additionally, prediction corrected VPCs (pcVPCs) are work in progress and will become available soon.