Population PK Walkthrough
A guided tutorial demonstrating how to develop a population pharmacokinetic model using PumasAide. This walkthrough follows a realistic conversation between a pharmacometrician new to Pumas and the AI assistant.
Introduction
This tutorial demonstrates a complete PopPK workflow using PumasAide's MCP (Model Context Protocol) tools. You'll see how to:
- Load and explore PK data in NMTRAN format
- Create concentration-time visualizations
- Build a Pumas population with proper column mappings
- Define a one-compartment oral PK model
- Fit the model using progressive estimation (FO → FOCE)
- Generate and interpret diagnostic plots
The conversation format shows practical questions new users often have.
Getting Started
User: I'd like to learn population PK modeling. I understand the concepts but I've never used Pumas before. Can you walk me through it?
Assistant: I'll guide you through developing a PopPK model from start to finish. Let me explain how PumasAide works first.
PumasAide provides specialized tools for pharmacometric analysis. Each tool performs a specific task - loading data, building models, fitting, creating plots, etc. When I use a tool, you'll see the parameters I'm sending and the results that come back.
Let me get the PopPK workflow guidance first.
Calls analysis_workflows with analysis_type: "poppk_analysis"
The workflow has these main steps:
- Load and assess data - Import PK dataset in NMTRAN format
- Explore and visualize - Plot concentration-time profiles
- Build population - Map columns to Pumas requirements
- Define model - Specify compartments, parameters, and error model
- Set initial estimates - Choose reasonable starting values
- Fit model - Use progressive estimation (FO then FOCE)
- Evaluate diagnostics - Check goodness-of-fit plots
I have a sample oral PK dataset from a single ascending dose study. It has 18 subjects across 3 dose levels - a good size for learning. Shall I load it?
User: Yes, let's use that.
Loading the Data
Assistant: I'll load the sample dataset. This is in NMTRAN format, which is the standard structure for population PK data.
Calls load_dataset:
{
"working_directory": "/path/to/project",
"relative_path": "poppk_tutorial.csv",
"dataframe_name": "pk_data"
}Tool Result:
# Summary of DataFrame `pk_data`
288 rows and 14 columns.
- id::Int64, 18 unique, 0 missing
- time::Float64, 16 unique, 0 missing
- dv::Union{Missing, Float64}, 271 unique, 18 missing
- amt::Union{Missing, Float64}, 4 unique, 270 missing, values: 30.0, 60.0, 90.0
- evid::Int64, 2 unique, 0 missing, values: 0, 1
- cmt::Union{Missing, Int64}, 2 unique, 270 missing, values: 1
- rate::Float64, 1 unique, 0 missing, values: 0.0
- age::Int64, 12 unique, 0 missing
- wt::Int64, 16 unique, 0 missing
- doselevel::Int64, 3 unique, 0 missing, values: 30, 60, 90
- isPM::String, 2 unique, 0 missing, values: "no", "yes"
- isfed::String, 2 unique, 0 missing, values: "no", "yes"
- sex::String, 2 unique, 0 missing, values: "female", "male"
- route::String, 1 unique, 0 missing, values: "ev"
DataFrame can be viewed at http://localhost:8081/preview/pk_data
Code written to: programs/data_loading/pk_data.jlThe data has 18 subjects across 3 oral dose levels (30, 60, 90 mg). Notice the 18 missing values in dv - these are the dosing records where no concentration was measured.
User: I see columns like evid and cmt. What do those mean?
Assistant: These are standard NMTRAN columns that define the event structure:
evid(event ID): 0 = observation, 1 = dose. The 18 missingdvvalues correspond to evid=1 records.cmt(compartment): Where the dose goes. For oral dosing, typically 1 = depot compartment.amt: Dose amount. Only populated for dosing events (evid=1).rate: Infusion rate. 0 means bolus dosing.
The dataset also has covariates: age, wt (weight), sex, isPM (poor metabolizer status), and isfed (fed/fasted state).
Data Exploration
User: Can we see the concentration-time profiles?
Assistant: I'll plot individual profiles by dose group on a semi-log scale.
Calls build_plot:
{
"plot_name": "conc_time_plot",
"layers": [{
"data": "pk_data",
"mapping": ":time, :dv; group=:id => nonnumeric, layout=:doselevel => nonnumeric",
"visual": ["visual(ScatterLines)"]
}],
"title": "Individual Concentration-Time Profiles",
"xlabel": "Time (h)",
"ylabel": "Concentration (ng/mL)",
"yscale": "pseudolog10"
}Tool Result:
Plot can be viewed at http://localhost:8081/preview/conc_time_plot
Code written to: programs/results/conc_time_plot.jlThe plot shows typical oral PK profiles - absorption phase followed by elimination. A few observations:
- Dose-proportional increases in Cmax across dose groups
- Similar elimination rates suggesting linear kinetics
- Some inter-individual variability, which is what PopPK models capture
Building the Population
Assistant: Now I'll create a Pumas population object. This maps the DataFrame columns to what Pumas expects for fitting.
Calls build_pumas_population:
{
"population_name": "pop",
"dataframe_name": "pk_data",
"mappings": {
"id": "id",
"time": "time",
"observations": ["dv"],
"amt": "amt",
"evid": "evid",
"cmt": "cmt",
"rate": "rate",
"covariates": ["wt", "age", "doselevel", "isPM", "isfed", "sex"]
}
}Tool Result:
pop = Population
Subjects: 18
Covariates: wt, age, doselevel, isPM, isfed, sex
Observations: dv
Code written to: programs/data_loading/pop.jlThe population is ready with all 18 subjects. The covariates are included for potential covariate modeling later, though we'll start with a base model.
Building the Model
User: What model structure should we use?
Assistant: For oral dosing with a single elimination phase visible in the profiles, a one-compartment model with first-order absorption is appropriate. I'll use:
Depots1Central1: One depot (absorption) and one central compartment- Proportional error: Common for concentration data
- IIV on Ka, CL, and Vc: Capture between-subject variability in absorption and disposition
Calls build_pumas_model:
{
"model_name": "base_model",
"explicit_model": "Depots1Central1",
"observations": {
"names": ["dv"],
"error": "Proportional"
}
}Tool Result:
PumasModel
Parameters: tvKa, tvCL, tvVc, ω_Ka, ω_CL, ω_Vc, σ_prop
Random effects: η_Ka, η_CL, η_Vc
Dynamical system variables: Depot, Central
Derived: dv
Observed: dv
Code written to: programs/models/base_model.jlThe model has:
- Typical values (
tv): Population mean parameters - Omega parameters (
ω): Standard deviations for between-subject variability - Sigma (
σ_prop): Proportional residual error
Initial Parameter Estimates
User: How do you choose initial estimates?
Assistant: Initial estimates should be physiologically reasonable. For a typical small molecule:
- Ka ~1 hr⁻¹: Absorption typically complete within a few hours
- CL ~5 L/h: Moderate clearance
- Vc ~50 L: Reasonable central volume
- ω values ~0.3: About 30% CV for between-subject variability
- σ_prop ~0.15: About 15% proportional error
Calls pumas_model_parameters:
{
"model_name": "base_model",
"parameters_name": "init_params",
"parameters": "(tvKa=1.0, tvCL=5.0, tvVc=50.0, ω_Ka=0.3, ω_CL=0.3, ω_Vc=0.3, σ_prop=0.15)"
}Tool Result:
(tvKa = 1.0, tvCL = 5.0, tvVc = 50.0, ω_Ka = 0.3, ω_CL = 0.3, ω_Vc = 0.3, σ_prop = 0.15)
Code written to: programs/models/init_params.jlThese are starting points - the optimizer will find the best values that fit the data.
Model Fitting
Assistant: I'll fit using a progressive approach: start with FO (First-Order) for speed, then refine with FOCE (First-Order Conditional Estimation) for better estimates.
Calls fit_pumas_model:
{
"fit_results_name": "fo_fit",
"model_name": "base_model",
"population_name": "pop",
"parameters_name": "init_params",
"estimation_method": "FO"
}Tool Result:
fo_fit = FittedPumasModel
Likelihood approximation: FO
Termination Reason: GradientNorm
Log-likelihood value: -1510.78
-------------------
Estimate
-------------------
tvKa 0.80068
tvCL 0.0031187
tvVc 0.10692
ω_Ka 2.0624
ω_CL 0.5061
ω_Vc 0.45805
σ_prop 0.19541
-------------------
Code written to: programs/results/fo_fit.jlFO converged. Now let's refine with FOCE.
Calls fit_pumas_model:
{
"fit_results_name": "foce_fit",
"model_name": "base_model",
"population_name": "pop",
"parameters_name": "init_params",
"estimation_method": "FOCE"
}Tool Result:
foce_fit = FittedPumasModel
Likelihood approximation: FOCE
Termination Reason: GradientNorm
Log-likelihood value: -1493.92
-------------------
Estimate
-------------------
tvKa 1.6828
tvCL 0.0029984
tvVc 0.11074
ω_Ka 0.69944
ω_CL 0.46178
ω_Vc 0.42319
σ_prop 0.18233
-------------------
# Additional metrics
-2LL: 2987.83
AIC: 3001.83
BIC: 3027.02
η-shrinkage: (η_Ka = 0.027, η_CL = -0.011, η_Vc = -0.009)
ϵ-shrinkage: (dv = 0.093)
Model fit results can be viewed at http://localhost:8081/preview/foce_fit
Code written to: programs/results/foce_fit.jlThe FOCE fit improved the likelihood (-1494 vs -1511). Key observations:
- Low shrinkage (<10% for all ηs): The random effects are well-estimated
- Reasonable variability: ~46% CV for CL, ~42% for Vc
- Model converged: GradientNorm termination indicates successful optimization
Model Diagnostics
User: How do we know if the model fits well?
Assistant: Standard goodness-of-fit (GOF) plots help assess model adequacy. First, I need to extract diagnostic data from the fit results.
Calls data_wrangler:
{
"input_dataframe": "foce_fit",
"output_dataframe": "diag_data",
"operations": ["@rsubset !ismissing(:dv)", "dropmissing([:dv_pred, :dv_ipred, :dv_wres])"],
"comments": "Extract observation rows for diagnostic plots"
}Tool Result:
# Summary of DataFrame `diag_data`
270 rows and 8 columns.
- id, time, dv, dv_pred, dv_ipred, dv_wres, ...
Code written to: programs/data_loading/diag_data.jlThe data_wrangler tool can accept fitted model results as input - it internally calls Pumas.inspect() to extract the diagnostic DataFrame. Now I'll create three key diagnostic plots.
Calls build_plot:
{
"plot_name": "gof_dv_pred",
"layers": [
{"data": "diag_data", "mapping": ":dv_pred, :dv", "visual": ["visual(Scatter)"]},
{"data": "(INTERCEPT=0, SLOPE=1)", "mapping": ":INTERCEPT, :SLOPE", "visual": ["visual(ABLines, color=\"red\", linestyle=:dash)"]}
],
"title": "Observed vs Population Predictions",
"xlabel": "Population Predicted (ng/mL)",
"ylabel": "Observed (ng/mL)"
}Tool Result:
Plot can be viewed at http://localhost:8081/preview/gof_dv_pred
Code written to: programs/results/gof_dv_pred.jlCalls build_plot:
{
"plot_name": "gof_dv_ipred",
"layers": [
{"data": "diag_data", "mapping": ":dv_ipred, :dv", "visual": ["visual(Scatter)"]},
{"data": "(INTERCEPT=0, SLOPE=1)", "mapping": ":INTERCEPT, :SLOPE", "visual": ["visual(ABLines, color=\"red\", linestyle=:dash)"]}
],
"title": "Observed vs Individual Predictions",
"xlabel": "Individual Predicted (ng/mL)",
"ylabel": "Observed (ng/mL)"
}Tool Result:
Plot can be viewed at http://localhost:8081/preview/gof_dv_ipred
Code written to: programs/results/gof_dv_ipred.jlCalls build_plot:
{
"plot_name": "gof_wres_time",
"layers": [
{"data": "diag_data", "mapping": ":time, :dv_wres", "visual": ["visual(Scatter)"]},
{"data": "(Y=0.0,)", "mapping": ":Y", "visual": ["visual(HLines, color=\"red\", linestyle=:dash)"]}
],
"title": "Weighted Residuals vs Time",
"xlabel": "Time (h)",
"ylabel": "WRES"
}Tool Result:
Plot can be viewed at http://localhost:8081/preview/gof_wres_time
Code written to: programs/results/gof_wres_time.jlWhat to look for in each plot:
| Plot | Good Sign | Problem Sign |
|---|---|---|
| DV vs PRED | Points scatter around identity line | Systematic deviation from line |
| DV vs IPRED | Tighter scatter than PRED | Still wide scatter (structural issue) |
| WRES vs TIME | Random scatter around zero | Trends or patterns over time |
The low shrinkage values mean the individual predictions (IPRED) should be reliable.
Generated Files
User: Where did the code go?
Assistant: Every tool operation generates executable Julia code in the programs/ directory:
programs/
data_loading/
pk_data.jl # Data loading
pop.jl # Population building
models/
base_model.jl # Model definition
init_params.jl # Initial parameters
results/
fo_fit.jl # FO estimation
foce_fit.jl # FOCE estimation
conc_time_plot.jl # Profile plot
gof_dv_pred.jl # DV vs PRED
gof_dv_ipred.jl # DV vs IPRED
gof_wres_time.jl # ResidualsThese files are reproducible and can be re-run independently using include_julia_file.
Summary
User: So what did we end up with?
Assistant: Here's what we completed:
- Loaded 288 records from 18 subjects across 3 dose levels
- Explored concentration-time profiles showing typical oral PK
- Built a Pumas population with dosing and covariate information
- Defined a one-compartment oral model with proportional error
- Fit using progressive estimation (FO → FOCE)
- Evaluated model fit using standard GOF diagnostics
The final model:
- One-compartment with first-order absorption (Depots1Central1)
- Ka = 1.68 hr⁻¹ (absorption rate constant)
- CL = 0.003 L/h, Vc = 0.11 L (apparent values)
- Between-subject variability: ~70% on Ka, ~46% on CL, ~42% on Vc
- Low shrinkage indicating reliable individual estimates
Next steps could include:
- Adding covariates (weight on CL/Vc, metabolizer status on CL)
- Testing alternative structural models
- Visual Predictive Check (VPC) for model validation
For your next analysis, start a fresh conversation to keep context focused.
Key Takeaways
Tool Workflow
- Always call
analysis_workflowsfirst for guidance - Load data with
load_datasetand check the summary - Visualize profiles with
build_plotbefore modeling - Build population with
build_pumas_population- map all NMTRAN columns - Define model with
build_pumas_model- start simple - Set initial estimates with
pumas_model_parameters - Fit progressively:
fit_pumas_modelwith FO then FOCE - Check diagnostics with GOF plots
Model Quality Indicators
| Metric | Good | Concern |
|---|---|---|
| Shrinkage (η) | < 20-30% | > 30% suggests overparameterization |
| Shrinkage (ε) | < 20% | > 20% means IPRED unreliable |
| Condition number | < 1000 | > 1000 indicates instability |
| RSE% | < 30-50% | > 50% means imprecise estimates |
Common Mistakes to Avoid
- Skipping data exploration (miss outliers and structural issues)
- Starting with complex models (begin simple, add complexity as needed)
- Ignoring shrinkage (affects reliability of individual estimates)
- Not using progressive estimation (FO provides good starting point for FOCE)