Population PK Walkthrough
A guided tutorial demonstrating how to develop a population pharmacokinetic model using PumasAide. This walkthrough follows a realistic conversation between a pharmacometrician new to Pumas and the AI assistant.
Introduction
This tutorial demonstrates a complete PopPK workflow using PumasAide's MCP (Model Context Protocol) tools. You'll see how to:
- Load and explore PK data in NMTRAN format
- Create concentration-time visualizations
- Build a Pumas population with proper column mappings
- Define a one-compartment oral PK model
- Fit the model using progressive estimation (FO → FOCE)
- Generate and interpret diagnostic plots
The conversation format shows practical questions new users often have.
Getting Started
User: I'd like to learn population PK modeling. I understand the concepts but I've never used Pumas before. Can you walk me through it?
Assistant: I'll guide you through developing a PopPK model from start to finish. Let me explain how PumasAide works first.
PumasAide connects me to your Julia session through three tools. I look up documentation and API references with pumas_knowledge, write Julia scripts to the programs/ directory, and execute them with include_julia_file. Every step produces real, reusable Julia code.
Let me read the PopPK workflow guide first.
Reads the PopPK workflow via pumas_knowledge with doc_id: ["workflows/poppk_analysis.md"]
The workflow has these main steps:
- Load and assess data - Import PK dataset in NMTRAN format
- Explore and visualize - Plot concentration-time profiles
- Build population - Map columns to Pumas requirements
- Define model - Specify compartments, parameters, and error model
- Set initial estimates - Choose reasonable starting values
- Fit model - Use progressive estimation (FO then FOCE)
- Evaluate diagnostics - Check goodness-of-fit plots
I have a sample oral PK dataset from a single ascending dose study. It has 18 subjects across 3 dose levels - a good size for learning. Shall I load it?
User: Yes, let's use that.
Loading the Data
Assistant: I'll load the sample dataset. This is in NMTRAN format, which is the standard structure for population PK data.
Writes programs/01-data/01-load-pk-data.jl and executes with include_julia_file:
using CSV, DataFrames
pk_data = CSV.read(
"../../datasets/poppk_tutorial.csv",
DataFrame;
missingstring = ["NA", ".", ""],
)Result:
Summary of `DataFrame` `pk_data`: 288 rows × 14 columns
| Column | Type | Unique | Missing | Values |
|:----------|:------------------------|-------:|--------:|:--------------------|
| id | Int64 | 18 | 0 | |
| time | Float64 | 16 | 0 | |
| dv | Union{Missing, Float64} | 271 | 18 | |
| amt | Union{Missing, Float64} | 4 | 270 | 30.0, 60.0, 90.0 |
| evid | Int64 | 2 | 0 | 0, 1 |
| cmt | Union{Missing, Int64} | 2 | 270 | 1 |
| rate | Float64 | 1 | 0 | 0.0 |
| age | Int64 | 12 | 0 | |
| wt | Int64 | 16 | 0 | |
| doselevel | Int64 | 3 | 0 | 30, 60, 90 |
| isPM | String | 2 | 0 | "no", "yes" |
| isfed | String | 2 | 0 | "no", "yes" |
| sex | String | 2 | 0 | "female", "male" |
| route | String | 1 | 0 | "ev" |
Preview: http://localhost:8081/preview/pk_dataThe data has 18 subjects across 3 oral dose levels (30, 60, 90 mg). Notice the 18 missing values in dv - these are the dosing records where no concentration was measured.
User: I see columns like evid and cmt. What do those mean?
Assistant: These are standard NMTRAN columns that define the event structure:
evid(event ID): 0 = observation, 1 = dose. The 18 missingdvvalues correspond to evid=1 records.cmt(compartment): Where the dose goes. For oral dosing, typically 1 = depot compartment.amt: Dose amount. Only populated for dosing events (evid=1).rate: Infusion rate. 0 means bolus dosing.
The dataset also has covariates: age, wt (weight), sex, isPM (poor metabolizer status), and isfed (fed/fasted state).
Data Exploration
User: Can we see the concentration-time profiles?
Assistant: I'll plot individual profiles by dose group on a semi-log scale.
Writes programs/02-exploration/01-conc-time-profiles.jl and executes with include_julia_file:
using AlgebraOfGraphics, CairoMakie, DataFramesMeta
pk_obs = @rsubset(pk_data, :evid == 0)
plt =
data(pk_obs) *
mapping(:time, :dv; group = :id => nonnumeric, layout = :doselevel => nonnumeric) *
visual(ScatterLines)
conc_time_plot = draw(
plt;
axis = (
xlabel = "Time (h)",
ylabel = "Concentration (ng/mL)",
title = "Individual Concentration-Time Profiles",
yscale = Makie.pseudolog10,
),
)Result:
Preview: http://localhost:8081/preview/conc_time_plotThe plot shows typical oral PK profiles - absorption phase followed by elimination. A few observations:
- Dose-proportional increases in Cmax across dose groups
- Similar elimination rates suggesting linear kinetics
- Some inter-individual variability, which is what PopPK models capture
Building the Population
Assistant: Now I'll create a Pumas population object. This maps the DataFrame columns to what Pumas expects for fitting.
Writes programs/03-population/01-pumas-pop.jl and executes with include_julia_file:
using Pumas
pop = read_pumas(
pk_data;
id = :id,
time = :time,
observations = [:dv],
amt = :amt,
evid = :evid,
cmt = :cmt,
rate = :rate,
covariates = [:wt, :age, :doselevel, :isPM, :isfed, :sex],
)Result:
Population
Subjects: 18
Covariates: wt, age, doselevel, isPM, isfed, sex
Observations: dvThe population is ready with all 18 subjects. The covariates are included for potential covariate modeling later, though we'll start with a base model.
Building the Model
User: What model structure should we use?
Assistant: For oral dosing with a single elimination phase visible in the profiles, a one-compartment model with first-order absorption is appropriate. I'll use:
Depots1Central1: One depot (absorption) and one central compartment- Proportional error: Common for concentration data
- IIV on Ka, CL, and Vc: Capture between-subject variability in absorption and disposition
Writes programs/04-models/01-base-1cmt-ka.jl and executes with include_julia_file:
using Pumas
base_model = @model begin
@param begin
tvKa ∈ RealDomain(lower = 0.0)
tvCL ∈ RealDomain(lower = 0.0)
tvVc ∈ RealDomain(lower = 0.0)
Ω ∈ PDiagDomain(3)
σ_prop ∈ RealDomain(lower = 0.0)
end
@random begin
η ~ MvNormal(Ω)
end
@pre begin
Ka = tvKa * exp(η[1])
CL = tvCL * exp(η[2])
Vc = tvVc * exp(η[3])
end
@dynamics Depots1Central1
@derived begin
μ := @. Central / Vc
dv ~ @. Normal(μ, abs(μ) * σ_prop)
end
endResult:
PumasModel
Parameters: tvKa, tvCL, tvVc, Ω, σ_prop
Random effects: η
Dynamical system: Depots1Central1
Derived: dv
Observed: dvThe model has:
- Typical values (
tv): Population mean parameters - Omega (
Ω): Diagonal covariance matrix for between-subject variability - Sigma (
σ_prop): Proportional residual error
Initial Parameter Estimates
User: How do you choose initial estimates?
Assistant: Initial estimates should be physiologically reasonable. For a typical small molecule:
- Ka ~1 hr⁻¹: Absorption typically complete within a few hours
- CL ~5 L/h: Moderate clearance
- Vc ~50 L: Reasonable central volume
- ω values ~0.3: About 30% CV for between-subject variability
- σ_prop ~0.15: About 15% proportional error
Writes initial parameters into the same model file:
using LinearAlgebra
init_params =
(tvKa = 1.0, tvCL = 5.0, tvVc = 50.0, Ω = Diagonal([0.09, 0.09, 0.09]), σ_prop = 0.15)These are starting points - the optimizer will find the best values that fit the data.
Model Fitting
Assistant: I'll fit using a progressive approach: start with FO (First-Order) for speed, then refine with FOCE (First-Order Conditional Estimation) for better estimates.
Writes programs/05-fitting/01-fit-fo.jl and executes with include_julia_file:
using Pumas
fo_fit = fit(base_model, pop, init_params, FO())Result:
FittedPumasModel
Likelihood approximation: FO
Termination Reason: GradientNorm
Log-likelihood value: -1510.78
-------------------
Estimate
-------------------
tvKa 0.80068
tvCL 0.0031187
tvVc 0.10692
Ω₁,₁ 4.2535
Ω₂,₂ 0.25614
Ω₃,₃ 0.20980
σ_prop 0.19541
-------------------FO converged. Now let's refine with FOCE using the FO estimates as starting point.
Writes programs/05-fitting/02-fit-foce.jl and executes with include_julia_file:
using Pumas
foce_fit = fit(base_model, pop, coef(fo_fit), FOCE())Result:
FittedPumasModel
Likelihood approximation: FOCE
Termination Reason: GradientNorm
Log-likelihood value: -1493.92
-------------------
Estimate
-------------------
tvKa 1.6828
tvCL 0.0029984
tvVc 0.11074
Ω₁,₁ 0.48921
Ω₂,₂ 0.21324
Ω₃,₃ 0.17909
σ_prop 0.18233
-------------------
# Additional metrics
-2LL: 2987.83
AIC: 3001.83
BIC: 3027.02
η-shrinkage: (η₁ = 0.027, η₂ = -0.011, η₃ = -0.009)
ε-shrinkage: (dv = 0.093)
Preview: http://localhost:8081/preview/foce_fitThe FOCE fit improved the likelihood (-1494 vs -1511). Key observations:
- Low shrinkage (<10% for all ηs): The random effects are well-estimated
- Reasonable variability: ~46% CV for CL, ~42% for Vc
- Model converged: GradientNorm termination indicates successful optimization
Model Diagnostics
User: How do we know if the model fits well?
Assistant: Standard goodness-of-fit (GOF) plots help assess model adequacy. First, I'll extract diagnostic data from the fit results and create the key plots.
Writes programs/06-diagnostics/01-gof-plots.jl and executes with include_julia_file:
using Pumas, AlgebraOfGraphics, CairoMakie, DataFramesMeta
# Extract inspection DataFrame
insp = inspect(foce_fit)
diag_data = @chain DataFrame(insp) begin
@rsubset !ismissing(:dv)
dropmissing([:dv_pred, :dv_ipred, :dv_wres])
end
# Observed vs Population Predictions
plt1 =
data(diag_data) * mapping(:dv_pred, :dv) * visual(Scatter) +
mapping([0], [1]) * visual(ABLines; color = :red, linestyle = :dash)
gof_dv_pred = draw(
plt1;
axis = (
xlabel = "Population Predicted (ng/mL)",
ylabel = "Observed (ng/mL)",
title = "Observed vs Population Predictions",
),
)
# Observed vs Individual Predictions
plt2 =
data(diag_data) * mapping(:dv_ipred, :dv) * visual(Scatter) +
mapping([0], [1]) * visual(ABLines; color = :red, linestyle = :dash)
gof_dv_ipred = draw(
plt2;
axis = (
xlabel = "Individual Predicted (ng/mL)",
ylabel = "Observed (ng/mL)",
title = "Observed vs Individual Predictions",
),
)
# Weighted Residuals vs Time
plt3 =
data(diag_data) * mapping(:time, :dv_wres) * visual(Scatter) +
mapping([0]) * visual(HLines; color = :red, linestyle = :dash)
gof_wres_time = draw(
plt3;
axis = (xlabel = "Time (h)", ylabel = "WRES", title = "Weighted Residuals vs Time"),
)Result:
Preview: http://localhost:8081/preview/gof_dv_pred
Preview: http://localhost:8081/preview/gof_dv_ipred
Preview: http://localhost:8081/preview/gof_wres_timeWhat to look for in each plot:
| Plot | Good Sign | Problem Sign |
|---|---|---|
| DV vs PRED | Points scatter around identity line | Systematic deviation from line |
| DV vs IPRED | Tighter scatter than PRED | Still wide scatter (structural issue) |
| WRES vs TIME | Random scatter around zero | Trends or patterns over time |
The low shrinkage values mean the individual predictions (IPRED) should be reliable.
Generated Files
User: Where did the code go?
Assistant: Every step generated an executable Julia script in the programs/ directory:
programs/
01-data/
01-load-pk-data.jl # Data loading
02-exploration/
01-conc-time-profiles.jl # Profile plot
03-population/
01-pumas-pop.jl # Population building
04-models/
01-base-1cmt-ka.jl # Model definition + initial parameters
05-fitting/
01-fit-fo.jl # FO estimation
02-fit-foce.jl # FOCE estimation
06-diagnostics/
01-gof-plots.jl # Goodness-of-fit plotsThese files are reproducible and can be re-run independently using include_julia_file.
Summary
User: So what did we end up with?
Assistant: Here's what we completed:
- Loaded 288 records from 18 subjects across 3 dose levels
- Explored concentration-time profiles showing typical oral PK
- Built a Pumas population with dosing and covariate information
- Defined a one-compartment oral model with proportional error
- Fit using progressive estimation (FO → FOCE)
- Evaluated model fit using standard GOF diagnostics
The final model:
- One-compartment with first-order absorption (Depots1Central1)
- Ka = 1.68 hr⁻¹ (absorption rate constant)
- CL = 0.003 L/h, Vc = 0.11 L (apparent values)
- Between-subject variability: ~70% on Ka, ~46% on CL, ~42% on Vc
- Low shrinkage indicating reliable individual estimates
Next steps could include:
- Adding covariates (weight on CL/Vc, metabolizer status on CL)
- Testing alternative structural models
- Visual Predictive Check (VPC) for model validation
Key Takeaways
Workflow
- The AI reads the PopPK workflow guide via
pumas_knowledgefor step-by-step guidance - Each step produces a Julia script in
programs/using standard Pumas packages - Visualize profiles before modeling to understand the data
- Build population with
read_pumas()- map all NMTRAN columns - Define models with the
@modelmacro - start simple - Fit progressively: FO first, then FOCE using
coef(fo_fit)as starting point - Check diagnostics with GOF plots from
inspect()
Model Quality Indicators
| Metric | Good | Concern |
|---|---|---|
| Shrinkage (η) | < 20-30% | > 30% suggests overparameterization |
| Shrinkage (ε) | < 20% | > 20% means IPRED unreliable |
| Condition number | < 1000 | > 1000 indicates instability |
| RSE% | < 30-50% | > 50% means imprecise estimates |
Common Mistakes to Avoid
- Skipping data exploration (miss outliers and structural issues)
- Starting with complex models (begin simple, add complexity as needed)
- Ignoring shrinkage (affects reliability of individual estimates)
- Not using progressive estimation (FO provides good starting point for FOCE)