NCA Analysis Walkthrough
A guided tutorial demonstrating how to perform non-compartmental analysis using PumasAide. This walkthrough follows a realistic conversation between a pharmacometrician new to Pumas and the AI assistant.
Introduction
This tutorial demonstrates a complete NCA workflow using PumasAide's MCP (Model Context Protocol) tools. You'll see how to:
- Load and explore PK data
- Create concentration-time visualizations
- Build an NCA population with proper column mappings
- Calculate NCA parameters with quality metrics
- Generate summary tables and plots
The conversation format shows practical questions new users often have.
Getting Started
User: I'd like to learn how to do NCA analysis. I'm familiar with NCA concepts but I've never used Pumas before. Can you walk me through it?
Assistant: I'll guide you through the complete NCA workflow. Before we start, let me explain how PumasAide works.
PumasAide connects me to your Julia session through three tools. I look up documentation and API references with pumas_knowledge, write Julia scripts to the programs/ directory, and execute them with include_julia_file. Every step produces real, reusable Julia code that you can review and edit.
Let me first read the NCA workflow guide, then we'll work through it step by step.
Reads the NCA workflow via pumas_knowledge with doc_id: ["workflows/nca_analysis.md"]
The workflow has these main steps:
- Load and validate data - Import your PK dataset
- Explore and visualize - Plot concentration-time profiles
- Build NCA population - Map your columns to NCA requirements
- Calculate parameters - Run the analysis with quality metrics
- Review results - Summary tables and visualizations
For this tutorial, I have a sample IV PK dataset we can use. It has concentration-time data from a single-dose study with multiple dose levels - perfect for learning NCA. Shall I load it?
User: Yes, let's use that sample data.
Loading the Data
Assistant: I'll write a script to load the sample dataset. CSV.read from the CSV.jl package handles CSV files, and PumasAide also supports Arrow, SAS, and other formats.
Writes programs/01-data/01-load-pk-data.jl and executes with include_julia_file:
using CSV, DataFrames
pk_data = CSV.read(
    "../../datasets/sample_pk_data.csv",
    DataFrame;
    missingstring = ["NA", ".", ""],
)
Result:
Summary of `DataFrame` `pk_data`: 480 rows × 6 columns
| Column | Type | Unique | Missing | Values |
|:-------|:--------|-------:|--------:|:----------------|
| id | Int64 | 30 | 0 | |
| dose | Int64 | 3 | 0 | 250, 500, 750 |
| time | Float64 | 16 | 0 | |
| conc | Float64 | 459 | 0 | |
| amt | Int64 | 3 | 0 | 250, 500, 750 |
| route | String | 1 | 0 | "iv" |
Preview: http://localhost:8081/preview/pk_data
The data has 30 subjects across 3 dose groups (250, 500, 750 mg), all IV dosing. The columns are clean with no missing values. You can click the preview link to see the full dataset in your browser.
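The missingstring keyword used in the loading script can be illustrated on a tiny in-memory file. This is only a sketch: the three-row CSV text and the name `toy` are hypothetical, and the sample dataset itself contains no missing values.

```julia
using CSV, DataFrames

# Hypothetical three-row file; "NA" and "." stand in for unrecorded concentrations
raw = """
id,time,conc
1,0.0,12.5
1,1.0,NA
1,2.0,.
"""

# Any cell matching one of these strings is read as `missing`
toy = CSV.read(IOBuffer(raw), DataFrame; missingstring = ["NA", ".", ""])

println("missing concentrations: ", count(ismissing, toy.conc))
```

Listing the placeholders explicitly keeps a column such as conc numeric instead of letting stray "NA" or "." entries turn it into text.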
User: What's the difference between dose and amt? They seem to have the same values.
Assistant: Good question. In NMTRAN/NONMEM-style data:
- dose is typically a grouping variable indicating the dose level or cohort
- amt is the actual amount administered at a dosing event
In this dataset they're identical because each subject received their labeled dose. When we build the NCA population, we'll use amt for calculations (it's what NCA needs for clearance) and dose for grouping the results.
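If you want to confirm that dose and amt really agree before relying on either, a quick check with DataFrames might look like the following. It is a sketch on a hypothetical miniature table (`toy_pk`), not on the tutorial's pk_data.

```julia
using DataFrames

# Hypothetical miniature of pk_data: two subjects, three samples each
toy_pk = DataFrame(
    id   = [1, 1, 1, 2, 2, 2],
    dose = [250, 250, 250, 500, 500, 500],
    amt  = [250, 250, 250, 500, 500, 500],
    time = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
)

# Rows where the administered amount differs from the labeled dose level
mismatches = subset(toy_pk, [:dose, :amt] => ByRow((d, a) -> d != a))

# Each subject should carry exactly one dose level
levels_per_subject =
    combine(groupby(toy_pk, :id), :dose => (d -> length(unique(d))) => :n_levels)

println("mismatching rows: ", nrow(mismatches))
println("subjects with several dose levels: ", count(>(1), levels_per_subject.n_levels))
```

Any non-empty `mismatches` table would be worth reviewing before choosing which column feeds the clearance calculation.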
Data Exploration
User: Can we see what the concentration-time profiles look like first?
Assistant: Good idea. I'll plot the individual profiles by dose group on a log scale.
Writes programs/02-exploration/01-conc-time-profiles.jl and executes with include_julia_file:
using AlgebraOfGraphics, CairoMakie
plt =
    data(pk_data) *
    mapping(:time, :conc; group = :id => nonnumeric, layout = :dose => nonnumeric) *
    visual(ScatterLines)
conc_time_plot = draw(
    plt;
    axis = (
        xlabel = "Time (h)",
        ylabel = "Concentration (ng/mL)",
        title = "Individual Concentration-Time Profiles",
        yscale = Makie.pseudolog10,
    ),
)
Result:
Preview: http://localhost:8081/preview/conc_time_plot
The plot shows individual profiles for all 30 subjects, faceted by dose group. A few things to note:
- The pseudolog10 scale handles zero values gracefully (unlike log10, which would error)
- group = :id => nonnumeric connects each subject's points with lines
- layout = :dose => nonnumeric creates separate panels per dose
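To see why the pseudo-log scale copes with zeros, here is a small sketch in plain Julia. The function `pseudolog10_sketch` is a hypothetical stand-in that, to my understanding, mirrors Makie's transform sign(x) * log10(|x| + 1); check the Makie documentation before depending on the exact form.

```julia
# Assumed form of the pseudo-log transform: sign(x) * log10(|x| + 1)
pseudolog10_sketch(x) = sign(x) * log10(abs(x) + 1)

# Illustrative concentrations, including a zero as might appear pre-dose
concs = [0.0, 1.0, 9.0, 99.0]

println(log10.(concs))               # the zero maps to -Inf, which an axis cannot place
println(pseudolog10_sketch.(concs))  # the zero maps to 0.0 and stays on the plot
```

For large values the two transforms are nearly identical, so the terminal phase still looks approximately linear.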
Click the preview link to see the plot. Do the profiles look as expected?
User: Yes, that looks right. I can see clear dose-proportional increases and typical IV elimination. Let's proceed with the NCA.
Building the NCA Population
Assistant: Now we need to build an NCA population. This maps your DataFrame columns to what NCA expects.
I'll include amt and route in the mappings - these are essential for clearance and volume calculations:
- amt - the dose amount for calculating CL = Dose/AUC
- route - determines whether we report IV parameters (CL, Vz) or extravascular (CL/F, Vz/F)
Without dose information, NCA can only compute exposure metrics like AUC and Cmax.
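For reference, the standard non-compartmental relationships behind these two parameters can be written as follows, where $\lambda_z$ is the terminal elimination rate constant and $AUC_{\infty}$ is the AUC extrapolated to infinity. Treating cl_obs and vz_obs as the versions based on aucinf_obs is an assumption drawn from the parameter names.

```latex
\begin{aligned}
CL  &= \frac{\text{Dose}}{AUC_{\infty}}, \\
V_z &= \frac{CL}{\lambda_z} = \frac{\text{Dose}}{\lambda_z \, AUC_{\infty}}
\end{aligned}
```

For extravascular routes the same expressions yield the apparent values CL/F and Vz/F, because the bioavailability F is not identifiable from the data alone.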
Writes programs/03-population/01-nca-population.jl and executes with include_julia_file:
using NCA
nca_pop = NCA.read_nca(
    pk_data;
    id = :id,
    time = :time,
    observations = :conc,
    amt = :amt,
    route = :route,
    group = [:dose],
)
Result:
NCAPopulation (30 subjects):
Group: [["dose" => 250], ["dose" => 500], ["dose" => 750]]
Number of missing observations: 0
Number of blq observations: 0
The population is ready with all 30 subjects grouped by dose. No missing or BLQ (below limit of quantification) observations to worry about.
Running NCA Analysis
User: What parameters should we calculate?
Assistant: For a standard single-dose IV study, we'll want exposure metrics (Cmax, Tmax, AUC), elimination parameters (half-life, clearance, volume), and quality metrics to assess reliability:
- rsq_adj_kel - Adjusted R-squared of the terminal slope fit (should be >= 0.8)
- auc_extrap_obs - Percent of AUC extrapolated to infinity (should be <= 20%)
- n_samples_kel - Number of points used for terminal slope (should be >= 3)
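To make these quantities concrete, the sketch below computes them by hand for a toy mono-exponential profile. All numbers are hypothetical, the last three points are chosen manually for the terminal fit, and this is not how the NCA package computes its results; it only shows how the terminal slope, half-life, AUCinf and percent extrapolated relate to each other.

```julia
using Statistics

# Toy IV bolus profile, C(t) = 100 * exp(-0.2 t); all values are hypothetical
times = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 12.0]   # h
concs = 100 .* exp.(-0.2 .* times)              # ng/mL

# AUClast by the linear trapezoidal rule
auclast = sum(diff(times) .* (concs[1:end-1] .+ concs[2:end]) ./ 2)

# Terminal slope: least-squares line through log-concentration of the last 3 points
n_samples_kel = 3
t_term = times[end-n_samples_kel+1:end]
logc_term = log.(concs[end-n_samples_kel+1:end])
slope = cov(t_term, logc_term) / var(t_term)
lambda_z = -slope                # recovers roughly 0.2 per hour for this toy profile
half_life = log(2) / lambda_z

# Extrapolate from the last observed concentration to infinity
aucinf_obs = auclast + concs[end] / lambda_z
auc_extrap_obs = 100 * (aucinf_obs - auclast) / aucinf_obs

println((; lambda_z, half_life, auclast, aucinf_obs, auc_extrap_obs))
```

A large auc_extrap_obs means most of AUCinf comes from the extrapolated tail, which is why it is reported alongside the fit quality of the terminal slope.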
Writes programs/05-fitting/01-run-nca.jl and executes with include_julia_file:
using NCA
nca_report = NCA.run_nca(
    nca_pop;
    parameters = [
        :cmax,
        :tmax,
        :auclast,
        :aucinf_obs,
        :half_life,
        :cl_obs,
        :vz_obs,
        :rsq_adj_kel,
        :auc_extrap_obs,
        :n_samples_kel,
    ],
)
nca_results = nca_report.reportdf
Result:
Summary of `DataFrame` `nca_results`: 30 rows × 12 columns
| Column | Type | Unique | Missing | Values |
|:---------------|:--------|-------:|--------:|:---------------------------|
| id | String | 30 | 0 | |
| dose | Int64 | 3 | 0 | 250, 500, 750 |
| cmax | Float64 | 30 | 0 | |
| tmax | Float64 | 5 | 0 | 0.0, 0.25, 0.5, 0.75, 1.0 |
| auclast | Float64 | 30 | 0 | |
| aucinf_obs | Float64 | 30 | 0 | |
| half_life | Float64 | 30 | 0 | |
| cl_obs | Float64 | 30 | 0 | |
| vz_obs | Float64 | 30 | 0 | |
| rsq_adj_kel | Float64 | 30 | 0 | |
| auc_extrap_obs | Float64 | 30 | 0 | |
| n_samples_kel | Int64 | 13 | 0 | |
Preview: http://localhost:8081/preview/nca_results
All 30 subjects have complete results. Click the preview link to review individual values.
User: How do I know if the terminal slope fits are good enough?
Assistant: Check the quality metrics against these thresholds:
| Metric | Threshold | Meaning |
|---|---|---|
| rsq_adj_kel | >= 0.8 | Terminal slope fit is reliable |
| auc_extrap_obs | <= 20% | AUCinf not overly dependent on extrapolation |
| n_samples_kel | >= 3 | Enough points to characterize elimination |
You should flag and review any subjects failing these criteria. In regulatory submissions, subjects with poor terminal slope fits are often excluded from AUCinf summaries.
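One way to flag such subjects is a row-wise filter on the results table. The sketch below uses a hypothetical three-subject stand-in (`toy_results`) with only the quality columns; the same `subset` call should apply to nca_results if its columns carry these names.

```julia
using DataFrames

# Hypothetical stand-in for nca_results, quality columns only
toy_results = DataFrame(
    id = ["1", "2", "3"],
    rsq_adj_kel = [0.98, 0.72, 0.91],
    auc_extrap_obs = [4.2, 12.5, 27.0],
    n_samples_kel = [5, 4, 3],
)

# Keep the subjects that fail at least one of the three criteria
flagged = subset(
    toy_results,
    [:rsq_adj_kel, :auc_extrap_obs, :n_samples_kel] =>
        ByRow((r2, extrap, n) -> r2 < 0.8 || extrap > 20 || n < 3),
)

println(flagged.id)
```

Here subject 2 fails on the adjusted R-squared and subject 3 on the extrapolated fraction, so both would go on the review list.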
Summary Tables
User: Can you create a summary table of the main PK parameters by dose group?
Assistant: I'll create a "Table 1" style summary with statistics by dose group using SummaryTables.
Writes programs/07-results/01-nca-summary-table.jl and executes with include_julia_file:
using SummaryTables
nca_summary = table_one(
    nca_results,
    [:cmax, :tmax, :auclast, :aucinf_obs, :half_life];
    groupby = [:dose],
)
Result:
────────────────────────────────────────────────────────────────────────────────────────
                                     dose
                    Total            250               500               750
────────────────────────────────────────────────────────────────────────────────────────
cmax
  Mean (SD)         67.5 (31.4)      34.4 (6.79)       64.2 (9.94)       104 (18.3)
  Median [Min,Max]  66.2 [24,140]    33.9 [24,43.9]    66.2 [46.9,75.1]  102 [80,140]
tmax
  Mean (SD)         0.258 (0.282)    0.3 (0.197)       0.225 (0.299)     0.25 (0.354)
  Median [Min,Max]  0.25 [0,1]       0.25 [0,0.5]      0.125 [0,0.75]    0.125 [0,1]
auclast
  Mean (SD)         329 (143)        173 (18.1)        318 (38.4)        497 (74.5)
  Median [Min,Max]  315 [146,605]    179 [146,195]     315 [252,374]     487 [401,605]
aucinf_obs
  Mean (SD)         337 (150)        178 (19.8)        323 (38.4)        511 (91.4)
  Median [Min,Max]  321 [147,687]    181 [147,201]     321 [254,380]     491 [405,687]
half_life
  Mean (SD)         3.98 (1.09)      4.13 (1.23)       3.85 (0.653)      3.97 (1.36)
  Median [Min,Max]  3.73 [2.58,7.5]  4.12 [2.58,6.48]  3.73 [2.94,5.42]  3.42 [3.14,7.5]
────────────────────────────────────────────────────────────────────────────────────────
Preview: http://localhost:8081/preview/nca_summary
Key observations:
- Cmax increases proportionally with dose (34 -> 64 -> 104 ng/mL)
- AUClast shows dose-proportional increases (173 -> 318 -> 497 ng·h/mL)
- Half-life is consistent across doses (~4 hours), as expected for linear PK
- Tmax is rapid (~0.25 h), consistent with IV bolus administration
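A quick arithmetic check of the proportionality claim is to divide the group means from the summary table by dose. This is an informal sketch, not a formal dose-proportionality assessment; the helper `spread` is a hypothetical name introduced here.

```julia
doses        = [250, 500, 750]          # mg
mean_cmax    = [34.4, 64.2, 104.0]      # ng/mL, group means from the summary table
mean_auclast = [173.0, 318.0, 497.0]    # ng·h/mL, group means from the summary table

# Dose-normalized exposure; roughly constant values suggest proportionality
cmax_per_mg    = mean_cmax ./ doses
auclast_per_mg = mean_auclast ./ doses

# Relative spread of the dose-normalized values, measured against the smallest one
spread(x) = (maximum(x) - minimum(x)) / minimum(x)

println("Cmax per mg: ", round.(cmax_per_mg; digits = 3), ", spread ", round(spread(cmax_per_mg); digits = 3))
println("AUClast per mg: ", round.(auclast_per_mg; digits = 3), ", spread ", round(spread(auclast_per_mg); digits = 3))
```

The dose-normalized means stay within about 10% of each other for both parameters, which supports the reading of approximately dose-proportional exposure.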
Parameter Visualization
User: A boxplot of AUC by dose would be helpful for my report.
Assistant: Sure, I'll create one showing AUClast across the dose groups.
Writes programs/07-results/02-auc-boxplot.jl and executes with include_julia_file:
using AlgebraOfGraphics, CairoMakie
plt = data(nca_results) * mapping(:dose => nonnumeric, :auclast) * visual(BoxPlot)
pk_params_boxplot = draw(
    plt;
    axis = (
        xlabel = "Dose (mg)",
        ylabel = "AUClast (ng·h/mL)",
        title = "AUClast by Dose Group",
    ),
)
Result:
Preview: http://localhost:8081/preview/pk_params_boxplot
The boxplot clearly shows the dose-proportional increase in exposure. The variability (box height, which spans the interquartile range) is reasonable and consistent across dose groups.
Generated Files
User: Where did all the code go? You mentioned files being written.
Assistant: Every step generated an executable Julia script in the programs/ directory:
programs/
  01-data/
    01-load-pk-data.jl          # Data loading
  02-exploration/
    01-conc-time-profiles.jl    # Concentration-time plot
  03-population/
    01-nca-population.jl        # NCA population building
  05-fitting/
    01-run-nca.jl               # NCA parameter calculation
  07-results/
    01-nca-summary-table.jl     # Summary table
    02-auc-boxplot.jl           # Boxplot
These files are:
- Reproducible - Run them again to recreate results
- Editable - Modify parameters and re-execute with include_julia_file
- Auditable - Each file uses standard Pumas packages you can verify independently
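Outside the assistant, the scripts could in principle be replayed in order with Julia's own include. The sketch below is one way to do that; the helper `rerun` is hypothetical, and it assumes each script expects its own folder as the working directory (suggested by the relative dataset path in the loading script) and that later scripts rely on variables defined by earlier ones.

```julia
# Script paths taken from the directory listing, in execution order
scripts = [
    "programs/01-data/01-load-pk-data.jl",
    "programs/02-exploration/01-conc-time-profiles.jl",
    "programs/03-population/01-nca-population.jl",
    "programs/05-fitting/01-run-nca.jl",
    "programs/07-results/01-nca-summary-table.jl",
    "programs/07-results/02-auc-boxplot.jl",
]

# Run each script that exists and return the paths that were executed.
# Assumption: each script is meant to run from inside its own folder.
function rerun(scripts)
    executed = String[]
    for script in scripts
        path = abspath(script)
        if isfile(path)
            cd(() -> include(path), dirname(path))
            push!(executed, script)
        else
            @warn "Script not found, skipping" script
        end
    end
    return executed
end

executed = rerun(scripts)
println("executed ", length(executed), " of ", length(scripts), " scripts")
```

Because include evaluates each file at global scope, objects such as pk_data and nca_pop remain available to the scripts that follow.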
Summary
User: So what did we end up with?
Assistant: Here's what we completed:
- Loaded 480 concentration-time observations from 30 subjects
- Visualized individual profiles across 3 dose groups (250, 500, 750 mg)
- Built an NCA population with proper dose and route mappings
- Calculated key PK parameters: Cmax, Tmax, AUClast, AUCinf, half-life, CL, Vz
- Assessed quality using rsq_adj_kel, auc_extrap_obs, and n_samples_kel
- Generated summary statistics by dose group
- Created a boxplot showing dose-proportional exposure
The data shows linear pharmacokinetics with:
- Dose-proportional Cmax and AUC
- Consistent half-life (~4 hours) across doses
- Rapid Tmax consistent with IV bolus administration
For your next analysis, start a fresh conversation to keep context focused. You can reference the generated scripts in programs/ to understand the code patterns.
Key Takeaways
Workflow
- The AI reads the NCA workflow guide via pumas_knowledge for step-by-step guidance
- Each step produces a Julia script in programs/ using standard Pumas packages
- Visualize before analyzing to catch data issues early
- Map columns correctly in NCA.read_nca() - include amt and route for CL/Vz
- Always include quality metrics in NCA.run_nca()
Quality Thresholds
| Criterion | Meaning When Met | Action if Failed |
|---|---|---|
| rsq_adj_kel >= 0.8 | Reliable terminal slope | Review/exclude subject |
| auc_extrap_obs <= 20% | Reliable AUCinf | Use AUClast instead |
| n_samples_kel >= 3 | Adequate data | Collect more samples |
Common Mistakes to Avoid
- Forgetting to map amt and route (can't calculate CL/Vz without them)
- Skipping quality metrics (won't know if results are reliable)
- Not visualizing data before analysis (miss outliers and data issues)