NCA Analysis Walkthrough
A guided tutorial demonstrating how to perform non-compartmental analysis using PumasAide. This walkthrough follows a realistic conversation between a pharmacometrician new to Pumas and the AI assistant.
Introduction
This tutorial demonstrates a complete NCA workflow using PumasAide's MCP (Model Context Protocol) tools. You'll see how to:
- Load and explore PK data
- Create concentration-time visualizations
- Build an NCA population with proper column mappings
- Calculate NCA parameters with quality metrics
- Generate summary tables and plots
The conversation format shows practical questions new users often have.
Getting Started
User: I'd like to learn how to do NCA analysis. I'm familiar with NCA concepts but I've never used Pumas before. Can you walk me through it?
Assistant: I'll guide you through the complete NCA workflow. Before we start, let me explain how PumasAide works.
PumasAide provides specialized tools for pharmacometric analysis. Each tool performs a specific task - loading data, creating plots, running analyses, etc. When I use a tool, you'll see the parameters I'm sending and the results that come back. You can always ask me to adjust my choices before I proceed.
Let me first get the NCA workflow guidance, then we'll work through it step by step.
Calls analysis_workflows with analysis_type: "nca_analysis"
The workflow has these main steps:
- Load and validate data - Import your PK dataset
- Explore and visualize - Plot concentration-time profiles
- Build NCA population - Map your columns to NCA requirements
- Calculate parameters - Run the analysis with quality metrics
- Review results - Summary tables and visualizations
For this tutorial, I have a sample IV PK dataset we can use. It has concentration-time data from a single-dose study with multiple dose levels - perfect for learning NCA. Shall I load it?
User: Yes, let's use that sample data.
Loading the Data
Assistant: I'll load the sample dataset using load_dataset. This tool reads CSV, Arrow, SAS, and other formats into a DataFrame.
Calls load_dataset:
{
"working_directory": "/path/to/project",
"relative_path": "sample_pk_data.csv",
"dataframe_name": "pk_data"
}
Tool Result:
# Summary of DataFrame `pk_data`
480 rows and 6 columns.
- id::Int64, 30 unique, 0 missing
- dose::Int64, 3 unique, 0 missing, values: 250, 500, 750
- time::Float64, 16 unique, 0 missing
- conc::Float64, 459 unique, 0 missing
- amt::Int64, 3 unique, 0 missing, values: 250, 500, 750
- route::String, 1 unique, 0 missing, values: "iv"
DataFrame can be viewed at http://localhost:8081/preview/pk_data
Code written to: programs/data_loading/pk_data.jl
The data has 30 subjects across 3 dose groups (250, 500, 750 mg), all IV dosing. The columns are clean with no missing values. You can click the preview link to see the full dataset in your browser.
User: What's the difference between dose and amt? They seem to have the same values.
Assistant: Good question. In NMTRAN/NONMEM-style data:
- dose is typically a grouping variable indicating the dose level or cohort
- amt is the actual amount administered at a dosing event
In this dataset they're identical because each subject received their labeled dose. When we build the NCA population, we'll use amt for calculations (it's what NCA needs for clearance) and dose for grouping the results.
Data Exploration
User: Can we see what the concentration-time profiles look like first?
Assistant: Good idea. I'll plot the individual profiles by dose group on a log scale.
Calls build_plot:
{
"plot_name": "conc_time_plot",
"layers": [{
"data": "pk_data",
"mapping": ":time, :conc; group=:id => nonnumeric, layout=:dose => nonnumeric",
"visual": ["visual(ScatterLines)"]
}],
"title": "Individual Concentration-Time Profiles",
"xlabel": "Time (h)",
"ylabel": "Concentration (ng/mL)",
"yscale": "pseudolog10"
}
Tool Result:
Plot can be viewed at http://localhost:8081/preview/conc_time_plot
Code written to: programs/results/conc_time_plot.jl
The plot shows individual profiles for all 30 subjects, faceted by dose group. A few things to note:
- The pseudolog10 scale handles zero values gracefully (unlike log10, which would error)
- group=:id => nonnumeric connects each subject's points with lines
- layout=:dose => nonnumeric creates separate panels per dose
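The zero-handling behavior noted above can be illustrated with a small Python sketch. It assumes the scale follows the definition sign(x) * log10(|x| + 1), which is how the Makie plotting library documents its pseudolog10 transform; whether PumasAide applies exactly this definition is an assumption, and the helper below is illustrative only.

```python
import math


def pseudolog10(x: float) -> float:
    """Signed log10(|x| + 1): defined at zero, near log10(x) for large x.

    Assumed definition for illustration; not taken from PumasAide itself.
    """
    return math.copysign(math.log10(abs(x) + 1.0), x)


# A plain log10 is undefined at zero, while the pseudo-log stays finite.
for conc in [0.0, 1.0, 9.0, 99.0]:
    print(conc, pseudolog10(conc))
```

Because the transform adds 1 before taking the logarithm, a concentration of zero maps to zero instead of negative infinity, which is why pre-dose or fully eliminated samples can stay on the plot.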
Click the preview link to see the plot. Do the profiles look as expected?
User: Yes, that looks right. I can see clear dose-proportional increases and typical IV elimination. Let's proceed with the NCA.
Building the NCA Population
Assistant: Now we need to build an NCA population. This maps your DataFrame columns to what NCA expects.
I'll include amt and route in the mappings - these are essential for clearance and volume calculations:
- amt - the dose amount for calculating CL = Dose/AUC
- route - determines whether we report IV parameters (CL, Vz) or extravascular (CL/F, Vz/F)
Without dose information, NCA can only compute exposure metrics like AUC and Cmax.
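To make the role of the dose concrete, here is a minimal Python sketch of the standard NCA relationships CL = Dose/AUCinf and Vz = CL/lambda_z, with lambda_z = ln(2)/half-life. The input values are hypothetical round numbers chosen for illustration, not results from this dataset, and the function names are not part of PumasAide.

```python
import math


def clearance(dose_ug: float, aucinf_ug_h_per_l: float) -> float:
    """CL = Dose / AUCinf, in L/h when dose is in ug and AUC in ug*h/L."""
    return dose_ug / aucinf_ug_h_per_l


def terminal_volume(cl_l_per_h: float, half_life_h: float) -> float:
    """Vz = CL / lambda_z, where lambda_z = ln(2) / half-life."""
    lambda_z = math.log(2) / half_life_h
    return cl_l_per_h / lambda_z


# Hypothetical inputs: 500 mg dose, AUCinf of 320 ng*h/mL, half-life of 4 h.
# 1 ng*h/mL equals 1 ug*h/L, so the dose is converted from mg to ug.
dose_ug = 500.0 * 1000.0
cl = clearance(dose_ug, 320.0)
vz = terminal_volume(cl, 4.0)
print(f"CL = {cl:.1f} L/h, Vz = {vz:.1f} L")
```

Neither quantity can be computed from concentrations alone, which is why the amt mapping matters. For extravascular routes the same arithmetic yields the apparent values CL/F and Vz/F.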
Calls build_nca_population:
{
"population_name": "nca_pop",
"dataframe_name": "pk_data",
"mappings": {
"id": "id",
"time": "time",
"observations": "conc",
"amt": "amt",
"route": "route",
"group": ["dose"]
}
}
Tool Result:
NCAPopulation (30 subjects):
Group: [["dose" => 250], ["dose" => 500], ["dose" => 750]]
Number of missing observations: 0
Number of blq observations: 0
Code written to: programs/data_loading/nca_pop.jl
The population is ready with all 30 subjects grouped by dose. No missing or BLQ (below limit of quantification) observations to worry about.
Running NCA Analysis
User: What parameters should we calculate?
Assistant: For a standard single-dose IV study, we'll want exposure metrics (Cmax, Tmax, AUC), elimination parameters (half-life, clearance, volume), and quality metrics to assess reliability:
- rsq_adj_kel - Adjusted R-squared of the terminal slope fit (should be >= 0.8)
- auc_extrap_obs - Percent of AUC extrapolated to infinity (should be <= 20%)
- n_samples_kel - Number of points used for terminal slope (should be >= 3)
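For readers who want to see what these three metrics measure, the following Python sketch computes them for one synthetic profile. It is not the Pumas implementation: it assumes an ordinary least-squares fit of ln(concentration) against time, a linear trapezoidal AUC, and terminal points chosen by the caller (NCA software typically selects them automatically). All names and data are hypothetical.

```python
import math
from typing import Sequence


def terminal_slope(times: Sequence[float], concs: Sequence[float]) -> dict:
    """Least-squares fit of ln(conc) vs time over caller-chosen terminal points."""
    n = len(times)
    if n < 3:
        raise ValueError("need at least 3 points for a terminal slope")
    log_c = [math.log(c) for c in concs]
    t_mean = sum(times) / n
    y_mean = sum(log_c) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, log_c))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, log_c))
    ss_tot = sum((y - y_mean) ** 2 for y in log_c)
    rsq = 1.0 - ss_res / ss_tot
    # Adjusted R-squared for a single-predictor regression.
    rsq_adj = 1.0 - (1.0 - rsq) * (n - 1) / (n - 2)
    lambda_z = -slope
    return {
        "lambda_z": lambda_z,
        "half_life": math.log(2) / lambda_z,
        "rsq_adj": rsq_adj,
        "n_samples": n,
    }


def auc_linear(times: Sequence[float], concs: Sequence[float]) -> float:
    """AUC from first to last sample by the linear trapezoidal rule."""
    pairs = list(zip(times, concs))
    return sum((t1 - t0) * (c0 + c1) / 2.0 for (t0, c0), (t1, c1) in zip(pairs, pairs[1:]))


# Synthetic mono-exponential profile: C(t) = 100 * exp(-0.2 * t).
times = [0.0, 1.0, 2.0, 4.0, 8.0, 12.0, 24.0]
concs = [100.0 * math.exp(-0.2 * t) for t in times]

fit = terminal_slope(times[-4:], concs[-4:])       # last four points
auclast = auc_linear(times, concs)
aucinf_obs = auclast + concs[-1] / fit["lambda_z"]  # extrapolate from last observed conc
auc_extrap_pct = 100.0 * (aucinf_obs - auclast) / aucinf_obs
print(fit, round(auc_extrap_pct, 2))
```

With noise-free data the fit recovers the true rate constant and the adjusted R-squared is essentially 1. Real profiles give lower values, which is where the thresholds become useful.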
Calls run_nca_analysis:
{
"population_name": "nca_pop",
"analysis_result_name": "nca_results",
"parameters": [
"cmax",
"tmax",
"auclast",
"aucinf_obs",
"half_life",
"cl_obs",
"vz_obs",
"rsq_adj_kel",
"auc_extrap_obs",
"n_samples_kel"
]
}
Tool Result:
# Summary of DataFrame `nca_results`
30 rows and 12 columns.
- id::String, 30 unique, 0 missing
- dose::Int64, 3 unique, 0 missing, values: 250, 500, 750
- cmax::Float64, 30 unique, 0 missing
- tmax::Float64, 5 unique, 0 missing, values: 0.0, 0.25, 0.5, 0.75, 1.0
- auclast::Float64, 30 unique, 0 missing
- aucinf_obs::Float64, 30 unique, 0 missing
- half_life::Float64, 30 unique, 0 missing
- cl_obs::Float64, 30 unique, 0 missing
- vz_obs::Float64, 30 unique, 0 missing
- rsq_adj_kel::Float64, 30 unique, 0 missing
- auc_extrap_obs::Float64, 30 unique, 0 missing
- n_samples_kel::Int64, 13 unique, 0 missing
NCA results can be viewed at http://localhost:8081/preview/nca_results
Code written to: programs/results/nca_results.jl
All 30 subjects have complete results. Click the preview link to review individual values.
User: How do I know if the terminal slope fits are good enough?
Assistant: Check the quality metrics against these thresholds:
| Metric | Threshold | Meaning |
|---|---|---|
| rsq_adj_kel | >= 0.8 | Terminal slope fit is reliable |
| auc_extrap_obs | <= 20% | AUCinf not overly dependent on extrapolation |
| n_samples_kel | >= 3 | Enough points to characterize elimination |
You should flag and review any subjects failing these criteria. In regulatory submissions, subjects with poor terminal slope fits are often excluded from AUCinf summaries.
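One way to apply these thresholds outside the tool is sketched below in Python. The record type, the function, and the two example subjects are hypothetical; only the metric names and cut-offs come from the table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubjectQuality:
    subject_id: str
    rsq_adj_kel: float     # adjusted R-squared of the terminal fit
    auc_extrap_obs: float  # percent of AUCinf that is extrapolated
    n_samples_kel: int     # points used in the terminal fit


def quality_flags(
    q: SubjectQuality,
    min_rsq_adj: float = 0.8,
    max_extrap_pct: float = 20.0,
    min_points: int = 3,
) -> List[str]:
    """Return a list of failed criteria; an empty list means the subject passes."""
    flags = []
    if q.rsq_adj_kel < min_rsq_adj:
        flags.append(f"rsq_adj_kel < {min_rsq_adj}")
    if q.auc_extrap_obs > max_extrap_pct:
        flags.append(f"auc_extrap_obs > {max_extrap_pct}%")
    if q.n_samples_kel < min_points:
        flags.append(f"n_samples_kel < {min_points}")
    return flags


# Hypothetical subjects, not taken from the tutorial dataset.
subjects = [
    SubjectQuality("A", rsq_adj_kel=0.97, auc_extrap_obs=4.2, n_samples_kel=5),
    SubjectQuality("B", rsq_adj_kel=0.71, auc_extrap_obs=27.5, n_samples_kel=3),
]
for s in subjects:
    print(s.subject_id, quality_flags(s) or "passes")
```

Keeping the thresholds as keyword arguments makes it easy to match whatever criteria a given analysis plan specifies.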
Summary Tables
User: Can you create a summary table of the main PK parameters by dose group?
Assistant: I'll use table_summary to create a "Table 1" style summary with statistics by dose group.
Calls table_summary:
{
"table_name": "nca_summary",
"table": {
"type": "table_one",
"options": {
"dataframe_name": "nca_results",
"groupby_columns": ["dose"],
"columns_to_analyze": ["cmax", "tmax", "auclast", "aucinf_obs", "half_life"]
}
}
}
Tool Result:
─────────────────────────────────────────────────────────────────────────────────
dose
Total 250 500 750
─────────────────────────────────────────────────────────────────────────────────
cmax
Mean (SD) 67.5 (31.4) 34.4 (6.79) 64.2 (9.94) 104 (18.3)
Median [Min,Max] 66.2 [24,140] 33.9 [24,43.9] 66.2 [46.9,75.1] 102 [80,140]
tmax
Mean (SD) 0.258 (0.282) 0.3 (0.197) 0.225 (0.299) 0.25 (0.354)
Median [Min,Max] 0.25 [0,1] 0.25 [0,0.5] 0.125 [0,0.75] 0.125 [0,1]
auclast
Mean (SD) 329 (143) 173 (18.1) 318 (38.4) 497 (74.5)
Median [Min,Max] 315 [146,605] 179 [146,195] 315 [252,374] 487 [401,605]
aucinf_obs
Mean (SD) 337 (150) 178 (19.8) 323 (38.4) 511 (91.4)
Median [Min,Max] 321 [147,687] 181 [147,201] 321 [254,380] 491 [405,687]
half_life
Mean (SD) 3.98 (1.09) 4.13 (1.23) 3.85 (0.653) 3.97 (1.36)
Median [Min,Max] 3.73 [2.58,7.5] 4.12 [2.58,6.48] 3.73 [2.94,5.42] 3.42 [3.14,7.5]
─────────────────────────────────────────────────────────────────────────────────
Table can be viewed at http://localhost:8081/preview/nca_summary
Code written to: programs/analysis/nca_summary.jl
Key observations:
- Cmax increases proportionally with dose (34 -> 64 -> 104 ng/mL)
- AUClast shows dose-proportional increases (173 -> 318 -> 497 ng·h/mL)
- Half-life is consistent across doses (~4 hours), as expected for linear PK
- Tmax is rapid (~0.25 h) consistent with IV bolus administration
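The dose-proportionality observations can be checked quickly from the group means in the table. The Python sketch below divides each mean by its dose and also estimates the slope of ln(parameter) against ln(dose), where a slope near 1 suggests proportionality. It uses only the three group means, so it is a rough screen under that assumption; a formal assessment would fit individual values and report confidence intervals. The function names are hypothetical.

```python
import math
from typing import Dict, List

# Group means copied from the summary table above.
group_means: Dict[int, Dict[str, float]] = {
    250: {"cmax": 34.4, "auclast": 173.0},
    500: {"cmax": 64.2, "auclast": 318.0},
    750: {"cmax": 104.0, "auclast": 497.0},
}


def dose_normalized(means: Dict[int, Dict[str, float]], parameter: str) -> Dict[int, float]:
    """Parameter mean divided by dose; similar values suggest proportionality."""
    return {dose: values[parameter] / dose for dose, values in means.items()}


def power_model_slope(doses: List[float], values: List[float]) -> float:
    """Least-squares slope of ln(value) on ln(dose)."""
    x = [math.log(d) for d in doses]
    y = [math.log(v) for v in values]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


doses = sorted(group_means)
for parameter in ("cmax", "auclast"):
    normalized = dose_normalized(group_means, parameter)
    slope = power_model_slope(doses, [group_means[d][parameter] for d in doses])
    print(parameter, normalized, round(slope, 3))
```

For these means both slopes fall between 0.9 and 1.1, in line with the dose-proportional reading given in the observations.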
Parameter Visualization
User: A boxplot of AUC by dose would be helpful for my report.
Assistant: Sure, I'll create one showing AUClast across the dose groups.
Calls build_plot:
{
"plot_name": "pk_params_boxplot",
"layers": [{
"data": "nca_results",
"mapping": ":dose => nonnumeric, :auclast",
"visual": ["visual(BoxPlot)"]
}],
"title": "AUClast by Dose Group",
"xlabel": "Dose (mg)",
"ylabel": "AUClast (ng*h/mL)"
}
Tool Result:
Plot can be viewed at http://localhost:8081/preview/pk_params_boxplot
Code written to: programs/results/pk_params_boxplot.jl
The boxplot clearly shows the dose-proportional increase in exposure. The variability (box height, which spans the interquartile range) is reasonable and consistent across dose groups.
Generated Files
User: Where did all the code go? You mentioned files being written.
Assistant: Every tool operation generates executable Julia code in the programs/ directory:
programs/
  data_loading/
    pk_data.jl              # Data loading code
    nca_pop.jl              # NCA population building
  results/
    conc_time_plot.jl       # Concentration-time plot
    nca_results.jl          # NCA calculation
    pk_params_boxplot.jl    # Boxplot
  analysis/
    nca_summary.jl          # Summary table
These files are:
- Reproducible - Run them again to recreate results
- Editable - Modify parameters and re-execute
- Auditable - Each file includes metadata documenting the tool call
You can use include_julia_file to re-run any of these scripts.
Summary
User: So what did we end up with?
Assistant: Here's what we completed:
- Loaded 480 concentration-time observations from 30 subjects
- Visualized individual profiles across 3 dose groups (250, 500, 750 mg)
- Built an NCA population with proper dose and route mappings
- Calculated key PK parameters: Cmax, Tmax, AUClast, AUCinf, half-life, CL, Vz
- Assessed quality using rsq_adj_kel, auc_extrap_obs, and n_samples_kel
- Generated summary statistics by dose group
- Created a boxplot showing dose-proportional exposure
The data shows linear pharmacokinetics with:
- Dose-proportional Cmax and AUC
- Consistent half-life (~4 hours) across doses
- Rapid Tmax consistent with IV bolus administration
For your next analysis, start a fresh conversation to keep context focused. You can reference the generated scripts in programs/ to understand the code patterns.
Key Takeaways
Tool Workflow
- Always call analysis_workflows first to get step-by-step guidance
- Load data with load_dataset and check the summary
- Visualize before analyzing with build_plot
- Map columns correctly in build_nca_population - include amt and route for CL/Vz
- Include quality metrics in run_nca_analysis
Quality Thresholds
| Metric | Good | Action if Failed |
|---|---|---|
| rsq_adj_kel >= 0.8 | Reliable terminal slope | Review/exclude subject |
| auc_extrap_obs <= 20% | Reliable AUCinf | Use AUClast instead |
| n_samples_kel >= 3 | Adequate data | Collect more samples |
Common Mistakes to Avoid
- Forgetting to map amt and route (can't calculate CL/Vz without them)
- Skipping quality metrics (won't know if results are reliable)
- Not visualizing data before analysis (miss outliers and data issues)