`ADaM` Docstrings

ADaM.JoinColumnsKeywordError — Type

JoinColumnsKeywordError(keyword::Symbol, err::Exception)

Custom error type for join_columns keyword argument failures.

ADaM.basic_info_pc — Method

basic_info_pc(df::DataFrame)

Returns basic information about the PC (PK concentration) dataset as a dictionary containing:

Studies involved
Number of subjects (overall)
Treatments
Sample specimens

Example

julia> pc = PharmaDatasets.dataset("SDTM/CDISCPILOT01/pc")
3556×20 DataFrame
  Row │ STUDYID       DOMAIN   USUBJID      PCSEQ    PCTESTCD  PCTEST      PCORRES            PCORRESU  PCSTRESC           PCS ⋯
      │ String15      String3  String15     Float64  String3   String15    String31           String7   String31           Flo ⋯
──────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    1 │ CDISCPILOT01  PC       01-701-1015      1.0  XAN       XANOMELINE  <BLQ               ug/ml     <BLQ                   ⋯
    2 │ CDISCPILOT01  PC       01-701-1015      2.0  XAN       XANOMELINE  <BLQ               ug/ml     <BLQ               mis
    3 │ CDISCPILOT01  PC       01-701-1015      3.0  XAN       XANOMELINE  <BLQ               ug/ml     <BLQ               mis
    4 │ CDISCPILOT01  PC       01-701-1015      4.0  XAN       XANOMELINE  <BLQ               ug/ml     <BLQ               mis
    5 │ CDISCPILOT01  PC       01-701-1015      5.0  XAN       XANOMELINE  <BLQ               ug/ml     <BLQ               mis ⋯
    6 │ CDISCPILOT01  PC       01-701-1015      6.0  XAN       XANOMELINE  <BLQ               ug/ml     <BLQ               mis
    7 │ CDISCPILOT01  PC       01-701-1015      7.0  XAN       XANOMELINE  <BLQ               ug/ml     <BLQ               mis
    8 │ CDISCPILOT01  PC       01-701-1015      8.0  XAN       XANOMELINE  <BLQ               ug/ml     <BLQ               mis
  ⋮   │      ⋮           ⋮          ⋮          ⋮        ⋮          ⋮               ⋮             ⋮              ⋮              ⋱
 3550 │ CDISCPILOT01  PC       01-718-1427      8.0  XAN       XANOMELINE  1.87286298246525   ug/ml     1.87286298246525       ⋯
 3551 │ CDISCPILOT01  PC       01-718-1427      9.0  XAN       XANOMELINE  1.8956805216499    ug/ml     1.8956805216499
 3552 │ CDISCPILOT01  PC       01-718-1427     10.0  XAN       XANOMELINE  0.575294228033741  ug/ml     0.575294228033741
 3553 │ CDISCPILOT01  PC       01-718-1427     11.0  XAN       XANOMELINE  0.173882563295603  ug/ml     0.173882563295603
 3554 │ CDISCPILOT01  PC       01-718-1427     12.0  XAN       XANOMELINE  0.015885031037154  ug/ml     0.015885031037154      ⋯
 3555 │ CDISCPILOT01  PC       01-718-1427     13.0  XAN       XANOMELINE  <BLQ               ug/ml     <BLQ               mis
 3556 │ CDISCPILOT01  PC       01-718-1427     14.0  XAN       XANOMELINE  <BLQ               ug/ml     <BLQ               mis
                                                                                                11 columns and 3541 rows omitted

julia> basic_info_pc(pc)
Dict{String, Any} with 4 entries:
  "studies"    => String15["CDISCPILOT01"]
  "subjects"   => 254
  "treatments" => String15["XANOMELINE"]
  "specimens"  => String7["PLASMA"]

ADaM.bmi_summary — Method

bmi_summary(df::DataFrame; id, bmi)

Counts the subjects (identified by id) in each BMI category. Category names are based on the National Library of Medicine.

Underweight (Below 18.5)
Normal (18.5 to 24.9)
Overweight (25.0 to 29.9)
Obese (30.0 to 39.9)
Extreme (Over 40)

id defaults to :USUBJID and bmi defaults to :BMIBL.

Example

julia> df = DataFrame(USUBJID = [1,2,3,4,5,6,7,8], BMIBL = [15,42,31,25,21,46,18,19])

julia> bmi_summary(df)
5×2 DataFrame
 Row │ BMIC         count 
     │ String       Int64 
─────┼────────────────────
   1 │ Underweight      2
   2 │ Extreme          2
   3 │ Obese            1
   4 │ Overweight       1
   5 │ Normal           2

julia> df = DataFrame(ID = [1,2,3,4,5,6,7,8], BMI = [15,42,31,25,21,46,18,19])

julia> bmi_summary(df, id="ID", bmi = "BMI")
5×2 DataFrame
 Row │ BMIC         count 
     │ String       Int64 
─────┼────────────────────
   1 │ Underweight      2
   2 │ Extreme          2
   3 │ Obese            1
   4 │ Overweight       1
   5 │ Normal           2
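
The category bands listed above can be sketched as a small classifier. This is only an illustration: bmi_category is a hypothetical helper, not part of ADaM, and the treatment of values that fall exactly on or between the listed bounds (for example 24.95 or 40.0) is an assumption.

```julia
# Hypothetical helper, not part of ADaM: maps a BMI value (kg/m²) to the
# category names listed in the docstring. Boundary handling is an assumption.
function bmi_category(bmi::Real)
    bmi < 18.5 && return "Underweight"
    bmi < 25.0 && return "Normal"
    bmi < 30.0 && return "Overweight"
    bmi < 40.0 && return "Obese"
    return "Extreme"
end

bmis = [15, 42, 31, 25, 21, 46, 18, 19]
categories = bmi_category.(bmis)

# Count how many subjects fall into each category.
counts = Dict{String,Int}()
for c in categories
    counts[c] = get(counts, c, 0) + 1
end
```

With the BMI values from the example, this sketch gives the same per-category counts as the bmi_summary output shown above.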

ADaM.body_mass_index — Method

body_mass_index(weight::Number, height::Number; kwargs...)
body_mass_index(weight::Quantity, height::Quantity)
body_mass_index(df::DataFrame; kwargs...)

Calculates BMI from weight and height, which can be provided as Quantity values, Scalars or Vectors via DataFrame. BMI Wikipedia

weight_unit and height_unit can be passed explicitly as unit values or as a Vector. Units follow DynamicQuantities.jl.

Arguments

weight: Weight value.
height: Height value.
weight_unit: Weight unit (default: "kg").
height_unit: Height unit (default: "cm").

Default Columns

The following default column names are used for DataFrame input:

weight = :WTBL
height = :HTBL
weight_unit = :WTBLU
height_unit = :HTBLU
col = :BMIBL

Output

BMI value in kg/m² as a scalar or DataFrame column with unit column.

Examples

julia> bmi = body_mass_index(60, 160)
23.437499999999996 m⁻² kg

julia> value, unit = ustrip(bmi), dimension(bmi)
(23.437499999999996, m⁻² kg)

julia> body_mass_index(60, 1.6, height_unit = :m)
23.437499999999996 m⁻² kg

julia> body_mass_index(60000, 160, weight_unit = :g)
23.437499999999996 m⁻² kg

julia> body_mass_index(60u"kg", 160u"cm")
23.437499999999996 m⁻² kg

julia> df = DataFrame(
        HTBL = [150, 160, 170, 180],
        WTBL = [50, 60, 70, 80],
        HTBLU = "cm",
        WTBLU = "kg",
    )
4×4 DataFrame
 Row │ HTBL   WTBL   HTBLU   WTBLU  
     │ Int64  Int64  String  String 
─────┼──────────────────────────────
   1 │   150     50  cm      kg
   2 │   160     60  cm      kg
   3 │   170     70  cm      kg
   4 │   180     80  cm      kg

julia> body_mass_index(df)
4×6 DataFrame
 Row │ HTBL   WTBL   HTBLU   WTBLU   BMIBL    BMIBLU    
     │ Int64  Int64  String  String  Float64  Symbolic… 
─────┼──────────────────────────────────────────────────
   1 │   150     50  cm      kg      22.2222  m⁻² kg
   2 │   160     60  cm      kg      23.4375  m⁻² kg
   3 │   170     70  cm      kg      24.2215  m⁻² kg
   4 │   180     80  cm      kg      24.6914  m⁻² kg

julia> df = DataFrame(HT = [150, 160, 170, 180], WT = [50, 60, 70, 80], HTU = "cm", WTU = "kg")
4×4 DataFrame
 Row │ HT     WT     HTU     WTU    
     │ Int64  Int64  String  String 
─────┼──────────────────────────────
   1 │   150     50  cm      kg
   2 │   160     60  cm      kg
   3 │   170     70  cm      kg
   4 │   180     80  cm      kg

julia> body_mass_index(df, height = :HT, weight = :WT, height_unit = :HTU, weight_unit = :WTU, col = :BMI)
4×6 DataFrame
 Row │ HT     WT     HTU     WTU     BMI      BMIU      
     │ Int64  Int64  String  String  Float64  Symbolic… 
─────┼──────────────────────────────────────────────────
   1 │   150     50  cm      kg      22.2222  m⁻² kg
   2 │   160     60  cm      kg      23.4375  m⁻² kg
   3 │   170     70  cm      kg      24.2215  m⁻² kg
   4 │   180     80  cm      kg      24.6914  m⁻² kg
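
For readers who want to check the numbers by hand, the underlying arithmetic can be sketched without any unit handling. The helper name bmi_kg_cm is hypothetical and this is not ADaM's implementation; it assumes weight in kg and height in cm, the documented defaults.

```julia
# Hypothetical helper, not ADaM's implementation:
# BMI in kg/m² from weight in kg and height in cm.
bmi_kg_cm(weight_kg, height_cm) = weight_kg / (height_cm / 100)^2

single = bmi_kg_cm(60, 160)

# Broadcasting over vectors mirrors the DataFrame example.
column = bmi_kg_cm.([50, 60, 70, 80], [150, 160, 170, 180])
```

The broadcast call reproduces the BMIBL column from the DataFrame example to the displayed precision.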

ADaM.body_surface_area — Method

body_surface_area(height::Number, weight::Number; kwargs...)
body_surface_area(height::Quantity, weight::Quantity; kwargs...)
body_surface_area(df::DataFrame; kwargs...)

Calculates BSA from height and weight, which can be provided as Quantity values, Scalars or Vectors via DataFrame. BSA Wikipedia

BSA can be calculated using the following formulas:

mosteller (default)
dubois-dubois
haycock
gehan-george
boyd
fujimoto
takahira

weight_unit and height_unit can be passed explicitly as unit values or as a Vector. Units follow DynamicQuantities.jl.

Arguments

height: Height value.
weight: Weight value.
height_unit: Height unit (default: "cm").
weight_unit: Weight unit (default: "kg").
formula: BSA calculation formula (default: :mosteller).

Default Columns

The following default column names are used for DataFrame input:

height = :HTBL
weight = :WTBL
height_unit = :HTBLU
weight_unit = :WTBLU
col = :BSABL

Output

BSA value in m² as a scalar or DataFrame column with unit column.

Examples

julia> bsa = body_surface_area(160, 60) # default height(cm), weight(kg), formula(mosteller)
1.632993161855452 m²

julia> value, unit = ustrip(bsa), dimension(bsa)
(1.632993161855452, m²)

julia> bsa = body_surface_area(160, 60, formula="dubois-dubois")
1.6220414635466536 m²

julia> bsa = body_surface_area(160, 60, formula=:takahira)
1.6349324596500971 m²

julia> body_surface_area(1.6, 60, height_unit = :m)
1.632993161855452 m²

julia> body_surface_area(160, 60000, weight_unit = :g)
1.632993161855452 m²

julia> body_surface_area(160u"cm", 60u"kg")
1.632993161855452 m²

julia> df = DataFrame(
        HTBL = [150, 160, 170, 180],
        WTBL = [50, 60, 70, 80],
        HTBLU = "cm",
        WTBLU = "kg",
    )
4×4 DataFrame
 Row │ HTBL   WTBL   HTBLU   WTBLU  
     │ Int64  Int64  String  String 
─────┼──────────────────────────────
   1 │   150     50  cm      kg
   2 │   160     60  cm      kg
   3 │   170     70  cm      kg
   4 │   180     80  cm      kg

julia> body_surface_area(df)
4×6 DataFrame
 Row │ HTBL   WTBL   HTBLU   WTBLU   BSABL    BSABLU    
     │ Int64  Int64  String  String  Float64  Symbolic… 
─────┼──────────────────────────────────────────────────
   1 │   150     50  cm      kg      1.44338  m²
   2 │   160     60  cm      kg      1.63299  m²
   3 │   170     70  cm      kg      1.81812  m²
   4 │   180     80  cm      kg      2.0      m²

julia> df = DataFrame(HT = [150, 160, 170, 180], WT = [50, 60, 70, 80], HTU = "cm", WTU = "kg")
4×4 DataFrame
 Row │ HT     WT     HTU     WTU    
     │ Int64  Int64  String  String 
─────┼──────────────────────────────
   1 │   150     50  cm      kg
   2 │   160     60  cm      kg
   3 │   170     70  cm      kg
   4 │   180     80  cm      kg

julia> body_surface_area(df, height = :HT, weight = :WT, height_unit = :HTU, weight_unit = :WTU, col = :BSA)
4×6 DataFrame
 Row │ HT     WT     HTU     WTU     BSA      BSAU      
     │ Int64  Int64  String  String  Float64  Symbolic… 
─────┼──────────────────────────────────────────────────
   1 │   150     50  cm      kg      1.44338  m²
   2 │   160     60  cm      kg      1.63299  m²
   3 │   170     70  cm      kg      1.81812  m²
   4 │   180     80  cm      kg      2.0      m²
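
As a rough cross-check of the values above, two of the listed formulas can be written out directly. The helper names are hypothetical and the coefficients are the commonly published ones for Mosteller and Du Bois; whether ADaM uses exactly these constants is an assumption. Inputs are height in cm and weight in kg.

```julia
# Hypothetical helpers, not ADaM's implementation. Height in cm, weight in kg.

# Mosteller: BSA (m²) = sqrt(height * weight / 3600)
bsa_mosteller(height_cm, weight_kg) = sqrt(height_cm * weight_kg / 3600)

# Du Bois and Du Bois: BSA (m²) = 0.007184 * weight^0.425 * height^0.725
bsa_dubois(height_cm, weight_kg) = 0.007184 * weight_kg^0.425 * height_cm^0.725

mosteller = bsa_mosteller(160, 60)
dubois = bsa_dubois(160, 60)
```

Both results agree with the scalar examples above to about three decimal places.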

ADaM.compress_dose_events — Method

compress_dose_events(df::DataFrame; group, order, sampling_rows)

This function compresses each sequence of consecutive dosing rows (EVID == 1) into a single row, creating the ADDL (Additional Doses) and II (Inter-dose Interval) columns.

group and order variables (Vectors or Scalars) can be passed to customise the compression.

Compression can retain either one dosing row between sampling rows (sampling_rows = :single) or two (sampling_rows = :double). Only the information from the first row of each sequence is retained.

Required columns for compression: EVID

Example

julia> df = DataFrame([
    (1, 1, 40),
    (2, 1, 90),
    (1, 0, 10),
    (2, 0, 60),
    (1, 1, 20),
    (2, 1, 70),
    (1, 1, 30),
    (2, 1, 80),
    (1, 0, 50),
    (2, 0, 100)
], [:ID, :EVID, :AFRLT]) # unordered and ungrouped dataset
10×3 DataFrame
 Row │ ID     EVID   AFRLT 
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1     40
   2 │     2      1     90
   3 │     1      0     10
   4 │     2      0     60
   5 │     1      1     20
   6 │     2      1     70
   7 │     1      1     30
   8 │     2      1     80
   9 │     1      0     50
  10 │     2      0    100

julia> compress_dose_events(df) # compress without groupby
6×4 DataFrame
 Row │ ID     EVID   AFRLT  ADDL  
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     1      1     40      1
   2 │     1      0     10      0
   3 │     2      0     60      0
   4 │     1      1     20      3
   5 │     1      0     50      0
   6 │     2      0    100      0

julia> compress_dose_events(df, group = ["ID"])
8×4 DataFrame
 Row │ ID     EVID   AFRLT  ADDL  
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     1      1     40      0
   2 │     1      0     10      0
   3 │     1      1     20      1
   4 │     1      0     50      0
   5 │     2      1     90      0
   6 │     2      0     60      0
   7 │     2      1     70      1
   8 │     2      0    100      0

julia> compress_dose_events(df, group = [:ID], order = [:AFRLT])
6×4 DataFrame
 Row │ ID     EVID   AFRLT  ADDL  
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     1      0     10      0
   2 │     1      1     20      2
   3 │     1      0     50      0
   4 │     2      0     60      0
   5 │     2      1     70      2
   6 │     2      0    100      0

julia> compress_dose_events(df, group = "ID", order = "AFRLT", sampling_rows = :double)
8×4 DataFrame
 Row │ ID     EVID   AFRLT  ADDL  
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     1      0     10      0
   2 │     1      1     20      1
   3 │     1      1     40      0
   4 │     1      0     50      0
   5 │     2      0     60      0
   6 │     2      1     70      1
   7 │     2      1     90      0
   8 │     2      0    100      0

ADaM.convert_to_missing — Method

convert_to_missing(df::DataFrame, NaStr::Vector)

Converts the given values to missing. The values to convert are passed as a Vector, for example [nothing, "", NaN, ".", "-"].

Example

julia> df = DataFrame(Col1 = [1, "."], Col2 = ["", 2], Col3 = [3, nothing], Col4 = ["-", 4])
2×4 DataFrame
 Row │ Col1  Col2  Col3    Col4 
     │ Any   Any   Union…  Any  
─────┼──────────────────────────
   1 │ 1           3       -
   2 │ .     2             4

julia> convert_to_missing(df, ["", nothing, ".", "-"])
2×4 DataFrame
 Row │ Col1     Col2     Col3     Col4
     │ Int64?   Int64?   Int64?   Int64?
─────┼────────────────────────────────────
   1 │       1  missing        3  missing
   2 │ missing        2  missing        4
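
The per-column replacement can be sketched on a plain vector. The helper to_missing is hypothetical and not ADaM's implementation; it uses isequal rather than == so that NaN in the list matches NaN in the data, which is an assumption about the intended behaviour.

```julia
# Hypothetical helper, not ADaM's implementation: replace every element that
# matches one of na_values with missing. isequal is used because NaN == NaN
# is false in Julia, while isequal(NaN, NaN) is true.
to_missing(col, na_values) =
    [any(isequal(v), na_values) ? missing : v for v in col]

col = Any[1, ".", NaN, nothing, 4]
cleaned = to_missing(col, [nothing, "", NaN, ".", "-"])
```

Because the comprehension narrows the element type, the cleaned vector ends up as a Union of Missing and Int64, similar to the Int64? columns in the example output.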

ADaM.creatinine_clearance — Method

creatinine_clearance(age::Number, weight::Number, creat::Number, sex; kwargs...)
creatinine_clearance(age::Quantity, weight::Quantity, creat::Quantity, sex)
creatinine_clearance(df::DataFrame; kwargs...)

Calculates creatinine clearance from age, weight, creatinine and sex using the Cockcroft–Gault formula. Inputs can be provided as Quantity values, Scalars or Vectors via DataFrame. Cockcroft–Gault formula Wikipedia

age_unit, weight_unit and creat_unit can be passed explicitly as unit values or as a Vector. Units follow DynamicQuantities.jl.

Default values for keyword arguments:

age = :AGE
weight = :WTBL
creat = :CREATBL
sex = :SEX
age_unit = :AGEU
weight_unit = :WTBLU
creat_unit = :CREATBLU
col = :CRCLBL

Example

julia> creatinine_clearance(53, 85, 90, "M", creat_unit = "umol/L")
100.88434438681963 min⁻¹ mL

julia> creatinine_clearance(53, 85, 1, "M", creat_unit = "mg/dL")
102.70833333333333 min⁻¹ mL

julia> creatinine_clearance(53us"yr", 85us"kg", 90us"umol/L", "M")
100.88434438681963 min⁻¹ mL

julia> creatinine_clearance(53u"yr", 85u"kg", 1us"mg/dL", "M")
102.70833333333333 min⁻¹ mL

julia> df = DataFrame(
        AGE = [20, 30, 40, 53],
        WTBL = [50, 60, 70, 85],
        CREATBL = [60, 70, 80, 90],
        SEX = ["M", "M", "F", "F"],
        AGEU = :yr,
        WTBLU = :kg,
        CREATBLU = "umol/L",
    )
4×7 DataFrame
 Row │ AGE    WTBL   CREATBL  SEX     AGEU    WTBLU   CREATBLU 
     │ Int64  Int64  Int64    String  Symbol  Symbol  String   
─────┼─────────────────────────────────────────────────────────
   1 │    20     50       60  M       yr      kg      umol/L
   2 │    30     60       70  M       yr      kg      umol/L
   3 │    40     70       80  F       yr      kg      umol/L
   4 │    53     85       90  F       yr      kg      umol/L

julia> creatinine_clearance(df)
4×9 DataFrame
 Row │ AGE    WTBL   CREATBL  SEX     AGEU    WTBLU   CREATBLU  CRCLBL    CRCLBLU   
     │ Int64  Int64  Int64    String  Symbol  Symbol  String    Float64   Symbolic… 
─────┼──────────────────────────────────────────────────────────────────────────────
   1 │    20     50       60  M       yr      kg      umol/L    122.78    min⁻¹ mL
   2 │    30     60       70  M       yr      kg      umol/L    115.764   min⁻¹ mL
   3 │    40     70       80  F       yr      kg      umol/L     91.3177  min⁻¹ mL
   4 │    53     85       90  F       yr      kg      umol/L     85.7517  min⁻¹ mL

julia> df = DataFrame(
        AGEYRS = [20, 30, 40, 53],
        WEIGHT = [50, 60, 70, 85],
        CREAT = [60, 70, 80, 90],
        GENDER = ["M", "M", "F", "F"],
        AGEUNI = :yr,
        WTUNI = :kg,
        CREATUNI = "umol/L",
    )
4×7 DataFrame
 Row │ AGEYRS  WEIGHT  CREAT  GENDER  AGEUNI  WTUNI   CREATUNI 
     │ Int64   Int64   Int64  String  Symbol  Symbol  String   
─────┼─────────────────────────────────────────────────────────
   1 │     20      50     60  M       yr      kg      umol/L
   2 │     30      60     70  M       yr      kg      umol/L
   3 │     40      70     80  F       yr      kg      umol/L
   4 │     53      85     90  F       yr      kg      umol/L

julia> creatinine_clearance(
        df;
        age = :AGEYRS,
        weight = :WEIGHT,
        creat = :CREAT,
        sex = :GENDER,
        age_unit = :AGEUNI,
        weight_unit = :WTUNI,
        creat_unit = :CREATUNI,
        col = :CREATCL,
    )
4×9 DataFrame
 Row │ AGEYRS  WEIGHT  CREAT  GENDER  AGEUNI  WTUNI   CREATUNI  CREATCL   CREATCLU  
     │ Int64   Int64   Int64  String  Symbol  Symbol  String    Float64   Symbolic… 
─────┼──────────────────────────────────────────────────────────────────────────────
   1 │     20      50     60  M       yr      kg      umol/L    122.78    min⁻¹ mL
   2 │     30      60     70  M       yr      kg      umol/L    115.764   min⁻¹ mL
   3 │     40      70     80  F       yr      kg      umol/L     91.3177  min⁻¹ mL
   4 │     53      85     90  F       yr      kg      umol/L     85.7517  min⁻¹ mL
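
The Cockcroft–Gault arithmetic behind these values can be sketched without unit handling. The helper names are hypothetical and this is not ADaM's implementation; the umol/L conversion assumes a creatinine molar mass of about 113.12 g/mol, and the accepted sex labels are an assumption.

```julia
# Hypothetical helpers, not ADaM's implementation.

# Convert serum creatinine from umol/L to mg/dL, assuming a molar mass
# of about 113.12 g/mol for creatinine.
creat_umol_to_mgdl(creat_umol) = creat_umol * 113.12 / 10_000

# Cockcroft-Gault: CrCl (mL/min) = (140 - age) * weight / (72 * creat),
# multiplied by 0.85 for females. Age in years, weight in kg, creat in mg/dL.
function crcl_cockcroft_gault(age_yr, weight_kg, creat_mgdl, sex)
    crcl = (140 - age_yr) * weight_kg / (72 * creat_mgdl)
    return uppercase(String(sex)) in ("F", "FEMALE") ? 0.85 * crcl : crcl
end

male_mgdl = crcl_cockcroft_gault(53, 85, 1, "M")
male_umol = crcl_cockcroft_gault(53, 85, creat_umol_to_mgdl(90), "M")
female_umol = crcl_cockcroft_gault(40, 70, creat_umol_to_mgdl(80), "F")
```

These three calls correspond to the second scalar example, the first scalar example, and row 3 of the DataFrame example, and agree with them to the displayed precision.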

ADaM.definition_table — Method

definition_table(table)

Creates a Table that gives a definition overview of the columns of table, intended to give a quick intuition of the dataset. Categoric and Numeric columns (NUM, CD, N) are automatically mapped to each other in the Summary column. Custom comments can be passed for each column as additional information.

Keyword arguments

max_categories = 10: Limit the number of categories listed individually for categorical columns, the rest will be lumped together.
label_metadata_key = "label": Key to look up column label metadata with.
map_dict: Helps map unique values of 2 columns.
comment_dict: Helps display custom comments for columns.

ADaM.derive_body_covariates — Method

derive_body_covariates(df; kwargs...)

Derive standard body size (BMI, BSA) and renal function (CRCL, EGFR) covariates from a DataFrame.

This is a convenience function that internally calls:

body_mass_index
body_surface_area
creatinine_clearance
est_glomerular_filtration_rate

Positional Arguments

df: Input DataFrame containing subject data.

Keyword Arguments

bmi: Output column for Body Mass Index (default: :BMIBL)
bsa: Output column for Body Surface Area (default: :BSABL)
crcl: Output column for Creatinine Clearance (default: :CRCLBL)
egfr: Output column for estimated GFR (default: :EGFRBL)
weight: Input column for weight (default: :WTBL)
height: Input column for height (default: :HTBL)
weight_unit: Input column for weight units (default: :WTBLU)
height_unit: Input column for height units (default: :HTBLU)
age: Input column for age (default: :AGE)
age_unit: Input column for age units (default: :AGEU)
sex: Input column for sex/gender (default: :SEX)
creat: Input column for serum creatinine (default: :CREATBL)
creat_unit: Input column for creatinine units (default: :CREATBLU)
bsa_formula: Formula for BSA calculation (default: :mosteller)
egfr_formula: Formula for eGFR calculation (default: "ckd-epi-creat-2021")

Returns

A new DataFrame with additional columns for derived covariates, including BMI, BSA, CRCL, and EGFR column values and units.

Examples

julia> df = DataFrame(
        HTBL = [150, 160, 170, 180],
        WTBL = [50, 60, 70, 80],
        HTBLU = "cm",
        WTBLU = "kg",
    )
4×4 DataFrame
 Row │ HTBL   WTBL   HTBLU   WTBLU  
     │ Int64  Int64  String  String 
─────┼──────────────────────────────
   1 │   150     50  cm      kg
   2 │   160     60  cm      kg
   3 │   170     70  cm      kg
   4 │   180     80  cm      kg

julia> derive_body_covariates(df)
4×8 DataFrame
 Row │ HTBL   WTBL   HTBLU   WTBLU   BMIBL    BMIBLU     BSABL    BSABLU    
     │ Int64  Int64  String  String  Float64  Symbolic…  Float64  Symbolic… 
─────┼──────────────────────────────────────────────────────────────────────
   1 │   150     50  cm      kg      22.2222  m⁻² kg     1.44338  m²
   2 │   160     60  cm      kg      23.4375  m⁻² kg     1.63299  m²
   3 │   170     70  cm      kg      24.2215  m⁻² kg     1.81812  m²
   4 │   180     80  cm      kg      24.6914  m⁻² kg     2.0      m²

julia> df = DataFrame(
        HTBL = [150, 160, 170, 180],
        WTBL = [50, 60, 70, 80],
        HTBLU = "cm",
        WTBLU = "kg",
        AGE = [20, 30, 40, 53],
        CREATBL = [60, 70, 80, 90],
        SEX = ["M", "M", "F", "F"],
        AGEU = :yr,
        CREATBLU = "umol/L",
    )
4×9 DataFrame
 Row │ HTBL   WTBL   HTBLU   WTBLU   AGE    CREATBL  SEX     AGEU    CREATBLU 
     │ Int64  Int64  String  String  Int64  Int64    String  Symbol  String   
─────┼────────────────────────────────────────────────────────────────────────
   1 │   150     50  cm      kg         20       60  M       yr      umol/L
   2 │   160     60  cm      kg         30       70  M       yr      umol/L
   3 │   170     70  cm      kg         40       80  F       yr      umol/L
   4 │   180     80  cm      kg         53       90  F       yr      umol/L

julia> derive_body_covariates(df)
4×17 DataFrame
 Row │ HTBL   WTBL   HTBLU   WTBLU   AGE    CREATBL  SEX     AGEU    CREATBLU  BMIBL    BMIBLU     BSABL    BSABLU     CRCLBL    CRCLBLU    EGFRBL    EGFRBLU         
     │ Int64  Int64  String  String  Int64  Int64    String  Symbol  String    Float64  Symbolic…  Float64  Symbolic…  Float64   Symbolic…  Float64   String          
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │   150     50  cm      kg         20       60  M       yr      umol/L    22.2222  m⁻² kg     1.44338  m²         122.78    min⁻¹ mL   136.546   min⁻¹ mL/1.73m²
   2 │   160     60  cm      kg         30       70  M       yr      umol/L    23.4375  m⁻² kg     1.63299  m²         115.764   min⁻¹ mL   122.476   min⁻¹ mL/1.73m²
   3 │   170     70  cm      kg         40       80  F       yr      umol/L    24.2215  m⁻² kg     1.81812  m²          91.3177  min⁻¹ mL    82.3362  min⁻¹ mL/1.73m²
   4 │   180     80  cm      kg         53       90  F       yr      umol/L    24.6914  m⁻² kg     2.0      m²          80.7075  min⁻¹ mL    65.9318  min⁻¹ mL/1.73m²

ADaM.dose_interval_sequence — Method

dose_interval_sequence(df::DataFrame; group, order)

This function creates a DINTSEQ column from either a dose (EX) dataset or a combined (EX and PC) dataset.

The DINTSEQ column contains a running count of dose/washout records (EVID in [1, 4]), starting from missing (predose sample) or 1 (initial dose).

Group and order variables (Vectors or Scalars) can be passed to customise the sequence.

Required columns for dose count: EVID

Example

julia> df = DataFrame([
    (1, 1, 40),
    (2, 1, 90),
    (1, 0, 10),
    (2, 0, 60),
    (1, 1, 20),
    (2, 1, 70),
    (1, 1, 30),
    (2, 1, 80),
    (1, 0, 50),
    (2, 0, 100)
], [:ID, :EVID, :AFRLT]) # unordered and ungrouped dataset
10×3 DataFrame
 Row │ ID     EVID   AFRLT 
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      1     40
   2 │     2      1     90
   3 │     1      0     10
   4 │     2      0     60
   5 │     1      1     20
   6 │     2      1     70
   7 │     1      1     30
   8 │     2      1     80
   9 │     1      0     50
  10 │     2      0    100

julia> dose_interval_sequence(df) # count without groupby
10×4 DataFrame
 Row │ ID     EVID   AFRLT  DINTSEQ 
     │ Int64  Int64  Int64  Int64   
─────┼──────────────────────────────
   1 │     1      1     40        1
   2 │     2      1     90        2
   3 │     1      0     10        2
   4 │     2      0     60        2
   5 │     1      1     20        3
   6 │     2      1     70        4
   7 │     1      1     30        5
   8 │     2      1     80        6
   9 │     1      0     50        6
  10 │     2      0    100        6

julia> dose_interval_sequence(df, group = ["ID"])
10×4 DataFrame
 Row │ ID     EVID   AFRLT  DINTSEQ 
     │ Int64  Int64  Int64  Int64   
─────┼──────────────────────────────
   1 │     1      1     40        1
   2 │     1      0     10        1
   3 │     1      1     20        2
   4 │     1      1     30        3
   5 │     1      0     50        3
   6 │     2      1     90        1
   7 │     2      0     60        1
   8 │     2      1     70        2
   9 │     2      1     80        3
  10 │     2      0    100        3

julia> dose_interval_sequence(df, group = :ID, order = :AFRLT)
10×4 DataFrame
 Row │ ID     EVID   AFRLT  DINTSEQ 
     │ Int64  Int64  Int64  Int64?  
─────┼──────────────────────────────
   1 │     1      0     10  missing 
   2 │     1      1     20        1
   3 │     1      1     30        2
   4 │     1      1     40        3
   5 │     1      0     50        3
   6 │     2      0     60  missing 
   7 │     2      1     70        1
   8 │     2      1     80        2
   9 │     2      1     90        3
  10 │     2      0    100        3
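
The counting rule can be sketched as a cumulative sum over the EVID column. The helper dose_sequence is hypothetical and not ADaM's implementation; it ignores grouping and ordering and treats a count of zero as a predose record.

```julia
# Hypothetical helper, not ADaM's implementation: running count of dose or
# washout records (EVID 1 or 4); rows before the first dose become missing.
function dose_sequence(evid::AbstractVector{<:Integer})
    counts = cumsum(in.(evid, Ref((1, 4))))
    return [c == 0 ? missing : c for c in counts]
end

ungrouped = dose_sequence([1, 1, 0, 0, 1, 1, 1, 1, 0, 0])
one_subject = dose_sequence([0, 1, 1, 1, 0])
```

The first call matches the ungrouped example, and the second matches the rows of a single subject in the grouped and ordered example.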

ADaM.est_glomerular_filtration_rate — Method

est_glomerular_filtration_rate(age::Number, sex::Union{Symbol,AbstractString}; kwargs...)
est_glomerular_filtration_rate(age::Quantity, sex::Union{Symbol,AbstractString}; kwargs...)
est_glomerular_filtration_rate(df::DataFrame; kwargs...)

Calculates eGFR from age, sex, and creatinine or cystatin C, which can be provided as Quantity values, Scalars or Vectors via DataFrame. Supports various formulas:

CKD-EPI Creatinine 2021 Formula - ckd-epi-creat-2021 (default)
CKD-EPI Cystatin C 2012 Formula - ckd-epi-cyst-2012
MDRD Formula - mdrd (requires race parameter)

age_unit, creat_unit, and cyst_unit can be passed explicitly as unit values or as a Vector. Units follow DynamicQuantities.jl.

Arguments

age: Age value.
sex: Sex/gender (valid values: "M", "MALE", "F", "FEMALE", case-insensitive).
creat: Serum creatinine value (default: 0).
cyst: Cystatin C value (default: 0).
race: Race (default: "U"). Required for MDRD formula.
age_unit: Age unit (default: "yr").
creat_unit: Creatinine unit (default: "mg/dL").
cyst_unit: Cystatin C unit (default: "mg/L").
formula: eGFR calculation formula (default: "ckd-epi-creat-2021").

Default Columns

The following default column names are used for DataFrame input:

age = :AGE
sex = :SEX
creat = :CREATBL
cyst = :CYSTBL
race = :RACE
age_unit = :AGEU
creat_unit = :CREATBLU
cyst_unit = :CYSTBLU
col = :EGFRBL

Output

eGFR value in mL/min/1.73m² as a scalar or DataFrame column with unit column.

Notes

Age must be between 1 and 140 years.

Examples

julia> est_glomerular_filtration_rate(53, "M", creat = 90, creat_unit = "umol/L")
88.08210123133497

julia> est_glomerular_filtration_rate(53, "M", creat = 1, creat_unit = "mg/dL")
89.99656911635697

julia> est_glomerular_filtration_rate(53us"yr", "M", creat =  90us"umol/L", formula="ckd-epi-creat-2021")
88.08210123133497

julia> est_glomerular_filtration_rate(53u"yr", "M", creat = 1us"mg/dL")
89.99656911635697

julia> est_glomerular_filtration_rate(53u"yr", "M", creat = 1us"mg/dL", race="BLACK OR AFRICAN AMERICAN", formula=:mdrd)
94.7354815425664

julia> est_glomerular_filtration_rate(53u"yr", "M", cyst = 0.6us"mg/L", formula="ckd-epi-cyst-2012")
124.1483657437901

julia> df = DataFrame(
        AGE = [20, 30, 40, 53],
        CREATBL = [60, 70, 80, 90],
        SEX = ["M", "M", "F", "F"],
        AGEU = :yr,
        CREATBLU = "umol/L",
    )
4×5 DataFrame
 Row │ AGE    CREATBL  SEX     AGEU    CREATBLU 
     │ Int64  Int64    String  Symbol  String   
─────┼──────────────────────────────────────────
   1 │    20       60  M       yr      umol/L
   2 │    30       70  M       yr      umol/L
   3 │    40       80  F       yr      umol/L
   4 │    53       90  F       yr      umol/L

julia> est_glomerular_filtration_rate(df)
4×7 DataFrame
 Row │ AGE    CREATBL  SEX     AGEU    CREATBLU  EGFRBL    EGFRBLU         
     │ Int64  Int64    String  Symbol  String    Float64   String          
─────┼─────────────────────────────────────────────────────────────────────
   1 │    20       60  M       yr      umol/L    136.546   min⁻¹ mL/1.73m²
   2 │    30       70  M       yr      umol/L    122.476   min⁻¹ mL/1.73m²
   3 │    40       80  F       yr      umol/L     82.3362  min⁻¹ mL/1.73m²
   4 │    53       90  F       yr      umol/L     65.9318  min⁻¹ mL/1.73m²

julia> df = DataFrame(
        AGEYRS = [20, 30, 40, 53],
        CREAT = [60, 70, 80, 90],
        GENDER = ["M", "M", "F", "F"],
        AGEUNI = :yr,
        RACEC = "BLACK OR AFRICAN AMERICAN",
        CREATUNI = "umol/L"
    )
4×6 DataFrame
 Row │ AGEYRS  CREAT  GENDER  AGEUNI  RACEC                      CREATUNI 
     │ Int64   Int64  String  Symbol  String                     String   
─────┼────────────────────────────────────────────────────────────────────
   1 │     20     60  M       yr      BLACK OR AFRICAN AMERICAN  umol/L
   2 │     30     70  M       yr      BLACK OR AFRICAN AMERICAN  umol/L
   3 │     40     80  F       yr      BLACK OR AFRICAN AMERICAN  umol/L
   4 │     53     90  F       yr      BLACK OR AFRICAN AMERICAN  umol/L

julia> est_glomerular_filtration_rate(
        df;
        age = :AGEYRS,
        creat = :CREAT,
        sex = :GENDER,
        race = :RACEC,
        age_unit = :AGEUNI,
        creat_unit = :CREATUNI,
        col = :EGFR,
        formula=:mdrd
    )
4×8 DataFrame
 Row │ AGEYRS  CREAT  GENDER  AGEUNI  RACEC                      CREATUNI  EGFR      EGFRU           
     │ Int64   Int64  String  Symbol  String                     String    Float64   String          
─────┼───────────────────────────────────────────────────────────────────────────────────────────────
   1 │     20     60  M       yr      BLACK OR AFRICAN AMERICAN  umol/L    180.576   min⁻¹ mL/1.73m²
   2 │     30     70  M       yr      BLACK OR AFRICAN AMERICAN  umol/L    139.206   min⁻¹ mL/1.73m²
   3 │     40     80  F       yr      BLACK OR AFRICAN AMERICAN  umol/L     83.5172  min⁻¹ mL/1.73m²
   4 │     53     90  F       yr      BLACK OR AFRICAN AMERICAN  umol/L     68.8551  min⁻¹ mL/1.73m²

julia> df = DataFrame(
        AGE = [20, 30, 40, 53],
        CYSTBL = [0.6, 0.7, 0.8, 0.9],
        SEX = ["M", "M", "F", "F"],
        RACE = "ASIAN",
        AGEU = :yr,
        CYSTBLU = "mg/L",
    )

julia> est_glomerular_filtration_rate(df, formula="ckd-epi-cyst-2012")
4×8 DataFrame
 Row │ AGE    CYSTBL   SEX     RACE    AGEU    CYSTBLU  EGFRBL   EGFRBLU         
     │ Int64  Float64  String  String  Symbol  String   Float64  String          
─────┼───────────────────────────────────────────────────────────────────────────
   1 │    20      0.6  M       ASIAN   yr      mg/L     141.704  min⁻¹ mL/1.73m²
   2 │    30      0.7  M       ASIAN   yr      mg/L     126.058  min⁻¹ mL/1.73m²
   3 │    40      0.8  F       ASIAN   yr      mg/L     105.594  min⁻¹ mL/1.73m²
   4 │    53      0.9  F       ASIAN   yr      mg/L      85.72   min⁻¹ mL/1.73m²
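
The default formula can be sketched from the published CKD-EPI 2021 creatinine equation. The helper name is hypothetical and this is not ADaM's implementation; it takes creatinine in mg/dL, skips unit conversion and the age check, and the accepted sex labels are an assumption.

```julia
# Hypothetical helper, not ADaM's implementation. CKD-EPI creatinine (2021):
# eGFR = 142 * min(Scr/k, 1)^a * max(Scr/k, 1)^(-1.200) * 0.9938^age,
# multiplied by 1.012 for females; k = 0.7 (F) or 0.9 (M),
# a = -0.241 (F) or -0.302 (M). Scr in mg/dL, age in years.
function egfr_ckd_epi_2021(age_yr, creat_mgdl, sex)
    female = uppercase(String(sex)) in ("F", "FEMALE")
    kappa = female ? 0.7 : 0.9
    alpha = female ? -0.241 : -0.302
    ratio = creat_mgdl / kappa
    egfr = 142 * min(ratio, 1)^alpha * max(ratio, 1)^(-1.200) * 0.9938^age_yr
    return female ? 1.012 * egfr : egfr
end

male = egfr_ckd_epi_2021(53, 1, "M")
```

The result agrees with the scalar example that passes creat = 1 and creat_unit = "mg/dL".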

ADaM.expand_dose_events — Method

expand_dose_events(df::DataFrame; start_dtm, end_dtm, trt_start_dtm, dose_freq)

Returns an expanded dataset when the EX (dose) dataset is passed as input; all columns are retained.
Creates one row for every dose within each dosing period for a subject.
Generates NFRLT (nominal time relative to first dose), which varies from row to row according to the dosing frequency.
All other column values are duplicated on expansion.

Arguments

start_dtm: Column name containing the start datetime for the dosing period
end_dtm: Column name containing the end datetime for the dosing period
trt_start_dtm: Column name containing the first analyte dose datetime (used as reference for NFRLT calculation)
dose_freq: Column name containing the dosing frequency

Default Columns

The following default column names are used for DataFrame input:

start_dtm = :ASTDTM
end_dtm = :AENDTM
trt_start_dtm = :FANLDTM
dose_freq = :EXDOSFRQ

Notes

If the dosing frequency is 'ONCE' or the start date equals the end date, no expansion happens.
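
The expansion described above can be sketched with the Dates standard library alone. This is a minimal illustration, not the package's implementation: `sketch_expand` is a hypothetical helper, and it assumes the inter-dose interval is already known as a whole number of hours and that NFRLT is the elapsed time in hours between each generated dose and the first analyte dose.

```julia
using Dates

# Hypothetical helper (not part of ADaM): dose times from `start_dtm` to
# `end_dtm` inclusive, spaced `ii_hours` apart, with NFRLT in hours
# measured from `trt_start_dtm`.
function sketch_expand(start_dtm::DateTime, end_dtm::DateTime,
                       trt_start_dtm::DateTime, ii_hours::Integer)
    times = collect(start_dtm:Hour(ii_hours):end_dtm)
    # DateTime differences are in milliseconds; convert to hours.
    nfrlt = [Dates.value(t - trt_start_dtm) / 3_600_000 for t in times]
    return times, nfrlt
end

# An 8-hourly period from 2025-01-01 to 2025-01-03
times, nfrlt = sketch_expand(DateTime(2025, 1, 1), DateTime(2025, 1, 3),
                             DateTime(2025, 1, 1), 8)
```

When the start and end datetimes coincide, the range holds a single element, so the sketch also leaves such a record unexpanded.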

Example

julia> df = DataFrame(USUBJID = "1", 
                      EVID = 1, 
                      ASTDTM = DateTime.(["2025-01-01", "2025-02-01", "2025-03-01"]), 
                      AENDTM = DateTime.(["2025-01-31", "2025-02-28", "2025-03-31"]), 
                      EXDOSFRQ = "QD", 
                      FANLDTM = DateTime.("2025-01-01"))
3×6 DataFrame
 Row │ USUBJID  EVID   ASTDTM               AENDTM               EXDOSFRQ  FANLDTM             
     │ String   Int64  DateTime             DateTime             String    DateTime            
─────┼─────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            1  2025-01-01T00:00:00  2025-01-31T00:00:00  QD        2025-01-01T00:00:00
   2 │ 1            1  2025-02-01T00:00:00  2025-02-28T00:00:00  QD        2025-01-01T00:00:00
   3 │ 1            1  2025-03-01T00:00:00  2025-03-31T00:00:00  QD        2025-01-01T00:00:00

julia> expand_dose_events(df) # QD
90×7 DataFrame
 Row │ USUBJID  EVID   ASTDTM               AENDTM               EXDOSFRQ  FANLDTM              NFRLT   
     │ String   Int64  DateTime             DateTime             String    DateTime             Float64 
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            1  2025-01-01T00:00:00  2025-01-31T00:00:00  QD        2025-01-01T00:00:00      0.0
   2 │ 1            1  2025-01-02T00:00:00  2025-01-31T00:00:00  QD        2025-01-01T00:00:00     24.0
   3 │ 1            1  2025-01-03T00:00:00  2025-01-31T00:00:00  QD        2025-01-01T00:00:00     48.0
   4 │ 1            1  2025-01-04T00:00:00  2025-01-31T00:00:00  QD        2025-01-01T00:00:00     72.0
   5 │ 1            1  2025-01-05T00:00:00  2025-01-31T00:00:00  QD        2025-01-01T00:00:00     96.0
   6 │ 1            1  2025-01-06T00:00:00  2025-01-31T00:00:00  QD        2025-01-01T00:00:00    120.0
  ⋮  │    ⋮       ⋮             ⋮                    ⋮              ⋮               ⋮              ⋮
  86 │ 1            1  2025-03-27T00:00:00  2025-03-31T00:00:00  QD        2025-01-01T00:00:00   2040.0
  87 │ 1            1  2025-03-28T00:00:00  2025-03-31T00:00:00  QD        2025-01-01T00:00:00   2064.0
  88 │ 1            1  2025-03-29T00:00:00  2025-03-31T00:00:00  QD        2025-01-01T00:00:00   2088.0
  89 │ 1            1  2025-03-30T00:00:00  2025-03-31T00:00:00  QD        2025-01-01T00:00:00   2112.0
  90 │ 1            1  2025-03-31T00:00:00  2025-03-31T00:00:00  QD        2025-01-01T00:00:00   2136.0
                                                                                         79 rows omitted

julia> df = DataFrame(USUBJID = "1", 
                      EVID = 1, 
                      ASTDTM = DateTime.(["2025-01-01", "2025-02-01", "2025-03-01"]), 
                      AENDTM = DateTime.(["2025-01-31", "2025-02-28", "2025-03-31"]), 
                      EXDOSFRQ = "EVERY WEEK", 
                      FANLDTM = DateTime.("2025-01-01"))
3×6 DataFrame
 Row │ USUBJID  EVID   ASTDTM               AENDTM               EXDOSFRQ    FANLDTM             
     │ String   Int64  DateTime             DateTime             String      DateTime            
─────┼───────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            1  2025-01-01T00:00:00  2025-01-31T00:00:00  EVERY WEEK  2025-01-01T00:00:00
   2 │ 1            1  2025-02-01T00:00:00  2025-02-28T00:00:00  EVERY WEEK  2025-01-01T00:00:00
   3 │ 1            1  2025-03-01T00:00:00  2025-03-31T00:00:00  EVERY WEEK  2025-01-01T00:00:00

julia> expand_dose_events(df) # EVERY WEEK
14×7 DataFrame
 Row │ USUBJID  EVID   ASTDTM               AENDTM               EXDOSFRQ    FANLDTM              NFRLT   
     │ String   Int64  DateTime             DateTime             String      DateTime             Float64 
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            1  2025-01-01T00:00:00  2025-01-31T00:00:00  EVERY WEEK  2025-01-01T00:00:00      0.0
   2 │ 1            1  2025-01-08T00:00:00  2025-01-31T00:00:00  EVERY WEEK  2025-01-01T00:00:00    168.0
   3 │ 1            1  2025-01-15T00:00:00  2025-01-31T00:00:00  EVERY WEEK  2025-01-01T00:00:00    336.0
   4 │ 1            1  2025-01-22T00:00:00  2025-01-31T00:00:00  EVERY WEEK  2025-01-01T00:00:00    504.0
   5 │ 1            1  2025-01-29T00:00:00  2025-01-31T00:00:00  EVERY WEEK  2025-01-01T00:00:00    672.0
   6 │ 1            1  2025-02-01T00:00:00  2025-02-28T00:00:00  EVERY WEEK  2025-01-01T00:00:00    744.0
   7 │ 1            1  2025-02-08T00:00:00  2025-02-28T00:00:00  EVERY WEEK  2025-01-01T00:00:00    912.0
   8 │ 1            1  2025-02-15T00:00:00  2025-02-28T00:00:00  EVERY WEEK  2025-01-01T00:00:00   1080.0
   9 │ 1            1  2025-02-22T00:00:00  2025-02-28T00:00:00  EVERY WEEK  2025-01-01T00:00:00   1248.0
  10 │ 1            1  2025-03-01T00:00:00  2025-03-31T00:00:00  EVERY WEEK  2025-01-01T00:00:00   1416.0
  11 │ 1            1  2025-03-08T00:00:00  2025-03-31T00:00:00  EVERY WEEK  2025-01-01T00:00:00   1584.0
  12 │ 1            1  2025-03-15T00:00:00  2025-03-31T00:00:00  EVERY WEEK  2025-01-01T00:00:00   1752.0
  13 │ 1            1  2025-03-22T00:00:00  2025-03-31T00:00:00  EVERY WEEK  2025-01-01T00:00:00   1920.0
  14 │ 1            1  2025-03-29T00:00:00  2025-03-31T00:00:00  EVERY WEEK  2025-01-01T00:00:00   2088.0

Example with custom column names using kwargs

julia> df_custom = DataFrame(USUBJID = "1", 
                             EVID = 1, 
                             START_TIME = DateTime.(["2025-01-01", "2025-01-08"]), 
                             END_TIME = DateTime.(["2025-01-07", "2025-01-14"]), 
                             FREQUENCY = ["QD", "BID"], 
                             FIRST_DOSE = DateTime.("2025-01-01"))
2×6 DataFrame
 Row │ USUBJID  EVID   START_TIME           END_TIME             FREQUENCY  FIRST_DOSE          
     │ String   Int64  DateTime             DateTime             String     DateTime            
─────┼─────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            1  2025-01-01T00:00:00  2025-01-07T00:00:00  QD         2025-01-01T00:00:00
   2 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00

julia> expand_dose_events(df_custom, 
                          start_dtm = :START_TIME,
                          end_dtm = :END_TIME, 
                          trt_start_dtm = :FIRST_DOSE, 
                          dose_freq = :FREQUENCY)
20×7 DataFrame
 Row │ USUBJID  EVID   START_TIME           END_TIME             FREQUENCY  FIRST_DOSE           NFRLT   
     │ String   Int64  DateTime             DateTime             String     DateTime             Float64 
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            1  2025-01-01T00:00:00  2025-01-07T00:00:00  QD         2025-01-01T00:00:00      0.0
   2 │ 1            1  2025-01-01T00:00:00  2025-01-07T00:00:00  QD         2025-01-01T00:00:00     24.0
   3 │ 1            1  2025-01-01T00:00:00  2025-01-07T00:00:00  QD         2025-01-01T00:00:00     48.0
   4 │ 1            1  2025-01-01T00:00:00  2025-01-07T00:00:00  QD         2025-01-01T00:00:00     72.0
   5 │ 1            1  2025-01-01T00:00:00  2025-01-07T00:00:00  QD         2025-01-01T00:00:00     96.0
   6 │ 1            1  2025-01-01T00:00:00  2025-01-07T00:00:00  QD         2025-01-01T00:00:00    120.0
   7 │ 1            1  2025-01-01T00:00:00  2025-01-07T00:00:00  QD         2025-01-01T00:00:00    144.0
   8 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00    168.0
   9 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00    180.0
  10 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00    192.0
  11 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00    204.0
  12 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00    216.0
  13 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00    228.0
  14 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00    240.0
  15 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00    252.0
  16 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00    264.0
  17 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00    276.0
  18 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00    288.0
  19 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00    300.0
  20 │ 1            1  2025-01-08T00:00:00  2025-01-14T00:00:00  BID        2025-01-01T00:00:00    312.0

Example with ONCE dosing (no expansion)

julia> df_once = DataFrame(USUBJID = "1", 
                           EVID = 1, 
                           ASTDTM = DateTime.("2025-01-01"), 
                           AENDTM = DateTime.("2025-01-01"), 
                           EXDOSFRQ = "ONCE", 
                           FANLDTM = DateTime.("2025-01-01"))
1×6 DataFrame
 Row │ USUBJID  EVID   ASTDTM               AENDTM               EXDOSFRQ  FANLDTM             
     │ String   Int64  DateTime             DateTime             String    DateTime            
─────┼─────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            1  2025-01-01T00:00:00  2025-01-01T00:00:00  ONCE      2025-01-01T00:00:00

julia> expand_dose_events(df_once)
1×7 DataFrame
 Row │ USUBJID  EVID   ASTDTM               AENDTM               EXDOSFRQ  FANLDTM              NFRLT   
     │ String   Int64  DateTime             DateTime             String    DateTime             Float64 
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            1  2025-01-01T00:00:00  2025-01-01T00:00:00  ONCE      2025-01-01T00:00:00      0.0

Example with different dosing frequencies

julia> df_freq = DataFrame(USUBJID = ["1", "1", "1"], 
                           EVID = [1, 1, 1], 
                           ASTDTM = DateTime.(["2025-01-01", "2025-01-01", "2025-01-01"]), 
                           AENDTM = DateTime.(["2025-01-03", "2025-01-03", "2025-01-02"]),
                           EXDOSFRQ = ["TID", "Q8H", "Q12H"], 
                           FANLDTM = DateTime.(["2025-01-01", "2025-01-01", "2025-01-01"]))
3×6 DataFrame
 Row │ USUBJID  EVID   ASTDTM               AENDTM               EXDOSFRQ  FANLDTM             
     │ String   Int64  DateTime             DateTime             String    DateTime            
─────┼─────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            1  2025-01-01T00:00:00  2025-01-03T00:00:00  TID       2025-01-01T00:00:00
   2 │ 1            1  2025-01-01T00:00:00  2025-01-03T00:00:00  Q8H       2025-01-01T00:00:00
   3 │ 1            1  2025-01-01T00:00:00  2025-01-02T00:00:00  Q12H      2025-01-01T00:00:00

julia> expand_dose_events(df_freq)
17×7 DataFrame
 Row │ USUBJID  EVID   ASTDTM               AENDTM               EXDOSFRQ  FANLDTM              NFRLT   
     │ String   Int64  DateTime             DateTime             String    DateTime             Float64 
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1            1  2025-01-01T00:00:00  2025-01-03T00:00:00  TID       2025-01-01T00:00:00      0.0
   2 │ 1            1  2025-01-01T08:00:00  2025-01-03T00:00:00  TID       2025-01-01T00:00:00      8.0
   3 │ 1            1  2025-01-01T16:00:00  2025-01-03T00:00:00  TID       2025-01-01T00:00:00     16.0
   4 │ 1            1  2025-01-02T00:00:00  2025-01-03T00:00:00  TID       2025-01-01T00:00:00     24.0
   5 │ 1            1  2025-01-02T08:00:00  2025-01-03T00:00:00  TID       2025-01-01T00:00:00     32.0
   6 │ 1            1  2025-01-02T16:00:00  2025-01-03T00:00:00  TID       2025-01-01T00:00:00     40.0
   7 │ 1            1  2025-01-03T00:00:00  2025-01-03T00:00:00  TID       2025-01-01T00:00:00     48.0
   8 │ 1            1  2025-01-01T00:00:00  2025-01-03T00:00:00  Q8H       2025-01-01T00:00:00      0.0
   9 │ 1            1  2025-01-01T08:00:00  2025-01-03T00:00:00  Q8H       2025-01-01T00:00:00      8.0
  10 │ 1            1  2025-01-01T16:00:00  2025-01-03T00:00:00  Q8H       2025-01-01T00:00:00     16.0
  11 │ 1            1  2025-01-02T00:00:00  2025-01-03T00:00:00  Q8H       2025-01-01T00:00:00     24.0
  12 │ 1            1  2025-01-02T08:00:00  2025-01-03T00:00:00  Q8H       2025-01-01T00:00:00     32.0
  13 │ 1            1  2025-01-02T16:00:00  2025-01-03T00:00:00  Q8H       2025-01-01T00:00:00     40.0
  14 │ 1            1  2025-01-03T00:00:00  2025-01-03T00:00:00  Q8H       2025-01-01T00:00:00     48.0
  15 │ 1            1  2025-01-01T00:00:00  2025-01-02T00:00:00  Q12H      2025-01-01T00:00:00      0.0
  16 │ 1            1  2025-01-01T12:00:00  2025-01-02T00:00:00  Q12H      2025-01-01T00:00:00     12.0
  17 │ 1            1  2025-01-02T00:00:00  2025-01-02T00:00:00  Q12H      2025-01-01T00:00:00     24.0

ADaM.interdose_interval — Method

interdose_interval(df::DataFrame)

Returns the inter-dose interval (II, in hours) when a valid dose frequency is passed. Dose frequencies are defined using CDISC SDTM Controlled Terminology. The following assumptions are made:

A day is 24 hours
A week is 7 days
A month is 30 days
A year is 52 weeks
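
The arithmetic behind these assumptions can be written out as below. This is only a sketch: the lookup table `sketch_ii` and the function `sketch_interval` are hypothetical names, they cover just a handful of frequency codes, and the pattern used for `Q<n>MIN` and `Q<n>H` codes is an assumption rather than the package's parsing logic.

```julia
# Hours per unit, following the assumptions listed above.
hours_per_day = 24
hours_per_week = 7 * hours_per_day      # 168
hours_per_month = 30 * hours_per_day    # 720
hours_per_year = 52 * hours_per_week    # 8736

# Hypothetical lookup for a few named frequencies (not the package's table).
sketch_ii = Dict(
    "QD" => hours_per_day,
    "BID" => hours_per_day / 2,
    "TID" => hours_per_day / 3,
    "EVERY WEEK" => hours_per_week,
    "QY" => hours_per_year,
)

# Hypothetical helper: handle "Q<n>MIN" and "Q<n>H" by pattern,
# otherwise fall back to the lookup; unknown codes give `missing`.
function sketch_interval(freq::AbstractString)
    m = match(r"^Q(\d+)(MIN|H)$", freq)
    if m !== nothing
        n = parse(Int, m.captures[1])
        return m.captures[2] == "H" ? n : n / 60
    end
    return get(sketch_ii, freq, missing)
end
```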

Example

julia> interdose_interval("QD")
24

julia> interdose_interval("EVERY AFTERNOON")
24

julia> interdose_interval("Q45MIN")
0.75

julia> interdose_interval("QY")
8736

julia> df = DataFrame(USUBJID = "1", EXDOSFRQ = ["QD", "BID", "QD", "TID"])
4×2 DataFrame
 Row │ USUBJID  EXDOSFRQ 
     │ String   String   
─────┼───────────────────
   1 │ 1        QD
   2 │ 1        BID
   3 │ 1        QD
   4 │ 1        TID

julia> @rtransform! df :II = interdose_interval(:EXDOSFRQ)
4×3 DataFrame
 Row │ USUBJID  EXDOSFRQ  II   
     │ String   String    Real 
─────┼─────────────────────────
   1 │ 1        QD        24
   2 │ 1        BID       12.0
   3 │ 1        QD        24
   4 │ 1        TID        8.0

ADaM.join_columns — Method

join_columns(target, reference; on, order, rev_order, keep, filter_ref, filter_join, mode)

Add columns from a reference DataFrame to a target DataFrame with flexible matching and filtering.

Positional Arguments

target: Main DataFrame to join to
reference: DataFrame to join columns from

Keyword Arguments

on: Column(s) to match on (join keys)
order: Column(s) to sort matches by (optional)
rev_order: Boolean vector indicating descending sort for each order column (default: all ascending)
keep: Vector of source => destination pairs for column mapping (default: all reference columns)
filter_ref: Function to pre-filter reference rows (default: include all)
filter_join: Function (target_row, ref_row) to filter matches (default: include all)
mode: Take :first or :last match after sorting (default: :first)
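
How these keywords interact for a single target row can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `sketch_pick` is a hypothetical helper, NamedTuples stand in for DataFrame rows, sorting is ascending only, and a target row with no surviving match is assumed to yield `missing`.

```julia
using Dates

# Hypothetical single-row version of the matching logic.
function sketch_pick(target_row, ref_rows; on, order,
                     filter_join = (t, r) -> true, mode = :first)
    # 1. Keep reference rows whose join keys equal the target's.
    matches = [r for r in ref_rows
               if all(getproperty(r, k) == getproperty(target_row, k) for k in on)]
    # 2. Apply the pairwise (target_row, ref_row) filter.
    matches = [r for r in matches if filter_join(target_row, r)]
    isempty(matches) && return missing
    # 3. Sort by the `order` columns, then take the first or last match.
    sorted = sort(matches; by = r -> Tuple(getproperty(r, c) for c in order))
    return mode == :first ? first(sorted) : last(sorted)
end

labs = [
    (SUBJID = "001", LAB_DATE = Date(2023, 1, 1), LAB_VALUE = 100),
    (SUBJID = "001", LAB_DATE = Date(2023, 1, 10), LAB_VALUE = 95),
    (SUBJID = "001", LAB_DATE = Date(2023, 1, 18), LAB_VALUE = 88),
]
visit = (SUBJID = "001", VISIT_DATE = Date(2023, 1, 15))

# Most recent lab strictly before the visit
picked = sketch_pick(visit, labs; on = [:SUBJID], order = [:LAB_DATE],
                     filter_join = (t, r) -> r.LAB_DATE < t.VISIT_DATE,
                     mode = :last)
```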

Examples

julia> patients = DataFrame(SUBJID = ["001", "002", "003"], 
                            VISIT_DATE = [Date("2023-01-15"), Date("2023-01-20"), Date("2023-01-25")])
3×2 DataFrame
 Row │ SUBJID  VISIT_DATE 
     │ String  Date       
─────┼────────────────────
   1 │ 001     2023-01-15
   2 │ 002     2023-01-20
   3 │ 003     2023-01-25

julia> lab_data = DataFrame(SUBJID = ["001", "001", "001", "002", "002", "003"],
                            LAB_DATE = [Date("2023-01-01"), Date("2023-01-10"), Date("2023-01-18"), 
                                        Date("2023-01-05"), Date("2023-01-25"), Date("2023-01-20")],
                            LAB_VALUE = [100, 95, 88, 110, 105, 92])
6×3 DataFrame
 Row │ SUBJID  LAB_DATE    LAB_VALUE 
     │ String  Date        Int64     
─────┼───────────────────────────────
   1 │ 001     2023-01-01        100
   2 │ 001     2023-01-10         95
   3 │ 001     2023-01-18         88
   4 │ 002     2023-01-05        110
   5 │ 002     2023-01-25        105
   6 │ 003     2023-01-20         92

julia> join_columns(patients,
                    lab_data;
                    on = [:SUBJID],
                    order = [:LAB_DATE],
                    keep = [:LAB_VALUE => :PRIOR_LAB, :LAB_DATE => :PRIOR_LAB_DATE],
                    filter_join = (t, r) -> r.LAB_DATE < t.VISIT_DATE,
                    mode = :last)
3×4 DataFrame
 Row │ SUBJID  VISIT_DATE  PRIOR_LAB  PRIOR_LAB_DATE 
     │ String  Date        Int64?     Date?          
─────┼───────────────────────────────────────────────
   1 │ 001     2023-01-15         95  2023-01-10
   2 │ 002     2023-01-20        110  2023-01-05
   3 │ 003     2023-01-25         92  2023-01-20

ADaM.make_dose_interruption — Method

make_dose_interruption(df::DataFrame)

Creates dose interruption columns INTRFL (Interruption Flag) and INTRDUR (Interruption Duration) on the EX (dose) dataset where doses are recorded as intervals.
If there is a gap (of more than a day) between the end date AENDT of one dosing interval and the start date ASTDT of the next interval within a subject, the interruption is recorded as an additional row. If there is no interruption, these columns will be missing.

The EX dataset needs to be prepared with the following columns :

ASTDT (Analysis Start Date)
AENDT (Analysis End Date)
FANLDTM (First Analyte dose DateTime)
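
The gap rule can be sketched with plain Dates arithmetic. This is a hedged illustration only: `sketch_gaps` is a hypothetical helper that assumes the intervals belong to one subject and are already sorted by start date, and it reports each interruption as the days strictly between two consecutive intervals.

```julia
using Dates

# Hypothetical helper: each interval is a (start, stop) pair of Dates.
function sketch_gaps(intervals)
    gaps = NamedTuple[]
    for i in 1:length(intervals)-1
        prev_end = intervals[i][2]
        next_start = intervals[i + 1][1]
        # Days strictly between the two intervals
        gap_days = Dates.value(next_start - prev_end) - 1
        if gap_days > 0
            push!(gaps, (ASTDT = prev_end + Day(1),
                         AENDT = next_start - Day(1),
                         INTRDUR = gap_days))
        end
    end
    return gaps
end

intervals = [
    (Date(2025, 1, 1), Date(2025, 1, 31)),
    (Date(2025, 2, 1), Date(2025, 2, 28)),
    (Date(2025, 3, 3), Date(2025, 3, 31)),
]
gaps = sketch_gaps(intervals)
```

Back-to-back intervals (end date followed by a start date on the next day) produce no entry, since no dosing day is skipped.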

Example

julia> df = DataFrame(USUBJID = "1", EXDOSFRQ = "QD", EVID = 1, ASTDT = Date.(["2025-01-01", "2025-02-01", "2025-03-03"]), AENDT = Date.(["2025-01-31", "2025-02-28", "2025-03-31"]), FANLDTM = Date.("2025-01-01"))
3×6 DataFrame
 Row │ USUBJID  EXDOSFRQ  EVID   ASTDT       AENDT       FANLDTM    
     │ String   String    Int64  Date        Date        Date       
─────┼──────────────────────────────────────────────────────────────
   1 │ 1        QD            1  2025-01-01  2025-01-31  2025-01-01
   2 │ 1        QD            1  2025-02-01  2025-02-28  2025-01-01
   3 │ 1        QD            1  2025-03-03  2025-03-31  2025-01-01

julia> make_dose_interruption(df)
4×10 DataFrame
 Row │ USUBJID  EXDOSFRQ  EVID   ASTDT       AENDT       FANLDTM     ASTDY  AENDY  INTRFL   INTRDUR 
     │ String   String    Int64  Date        Date        Date        Int64  Int64  String?  Int64?  
─────┼──────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1        QD            1  2025-01-01  2025-01-31  2025-01-01      1     31  missing  missing 
   2 │ 1        QD            1  2025-02-01  2025-02-28  2025-01-01     32     59  missing  missing 
   3 │ 1        QD            1  2025-03-01  2025-03-02  2025-01-01     60     61  Y              2
   4 │ 1        QD            1  2025-03-03  2025-03-31  2025-01-01     62     90  missing  missing

ADaM.make_dtm — Method

make_dtm(df::DataFrame, col; prefix, fmt)

Derives DateTime column from DateTime-convertible String/Date column in a DataFrame.
The DateTime column, named with a custom prefix (kwarg) and the suffix 'DTM', will be of the format yyyy-mm-ddTHH:MM:SS.
Custom DateTime formats can be passed through the fmt keyword.
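
For orientation, the underlying conversions can be reproduced with the Dates standard library. This sketch only shows what a format string or `dateformat` object does when parsing; it makes no claim about how make_dtm applies them internally, and the variable names are illustrative.

```julia
using Dates

# Parse with a precompiled DateFormat object ("u" is the abbreviated month name)
fmt = dateformat"yyyy-u-ddTHH:MM"
parsed = DateTime("2024-May-21T01:10", fmt)

# The same conversion from a plain format string
parsed_from_string = DateTime("2024-May-21T01:10", "yyyy-u-ddTHH:MM")

# A Date is promoted to midnight of the same day
from_date = DateTime(Date(2024, 1, 1))
```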

Example

julia> df = DataFrame(DTC = ["2024", "2025", "2026"])
3×1 DataFrame
 Row │ DTC    
     │ String 
─────┼────────
   1 │ 2024
   2 │ 2025
   3 │ 2026

julia> make_dtm(df, "DTC")
3×2 DataFrame
 Row │ DTC     ADTM                
     │ String  DateTime            
─────┼─────────────────────────────
   1 │ 2024    2024-01-01T00:00:00
   2 │ 2025    2025-01-01T00:00:00
   3 │ 2026    2026-01-01T00:00:00

julia> df = DataFrame(DTC = Date.(["2024", "2025", "2026"]))
3×1 DataFrame
 Row │ DTC        
     │ Date       
─────┼────────────
   1 │ 2024-01-01
   2 │ 2025-01-01
   3 │ 2026-01-01

julia> make_dtm(df, "DTC", prefix = "B")
3×2 DataFrame
 Row │ DTC         BDTM                
     │ Date        DateTime            
─────┼─────────────────────────────────
   1 │ 2024-01-01  2024-01-01T00:00:00
   2 │ 2025-01-01  2025-01-01T00:00:00
   3 │ 2026-01-01  2026-01-01T00:00:00

Input col and prefix can be passed as either String or Symbol.

julia> df = DataFrame(DTC = ["2024", "2025", "2026"])
3×1 DataFrame
 Row │ DTC    
     │ String 
─────┼────────
   1 │ 2024
   2 │ 2025
   3 │ 2026

julia> make_dtm(df, :DTC, prefix = :AST)
3×2 DataFrame
 Row │ DTC     ASTDTM              
     │ String  DateTime            
─────┼─────────────────────────────
   1 │ 2024    2024-01-01T00:00:00
   2 │ 2025    2025-01-01T00:00:00
   3 │ 2026    2026-01-01T00:00:00

For a different DateTime format (any valid Julia date format):

julia> df = DataFrame(DTC = ["2024-May-21T01:10", "2024-Jun-22T02:20", "2024-Aug-23T03:30"])
3×1 DataFrame
 Row │ DTC               
     │ String            
─────┼───────────────────
   1 │ 2024-May-21T01:10
   2 │ 2024-Jun-22T02:20
   3 │ 2024-Aug-23T03:30

julia> make_dtm(df, "DTC", prefix = "C", fmt = "yyyy-u-ddTHH:MM")
3×2 DataFrame
 Row │ DTC                CDTM                
     │ String             DateTime            
─────┼────────────────────────────────────────
   1 │ 2024-May-21T01:10  2024-05-21T01:10:00
   2 │ 2024-Jun-22T02:20  2024-06-22T02:20:00
   3 │ 2024-Aug-23T03:30  2024-08-23T03:30:00

julia> make_dtm(df, :DTC, prefix = "C", fmt = dateformat"yyyy-u-ddTHH:MM") # dateformat object
3×2 DataFrame
 Row │ DTC                CDTM                
     │ String             DateTime            
─────┼────────────────────────────────────────
   1 │ 2024-May-21T01:10  2024-05-21T01:10:00
   2 │ 2024-Jun-22T02:20  2024-06-22T02:20:00
   3 │ 2024-Aug-23T03:30  2024-08-23T03:30:00

ADaM.make_dtm_to_dt — Method

make_dtm_to_dt(df::DataFrame, col; prefix)

Derives Date column from DateTime column in a DataFrame.
The Date column with suffix 'DT' will be of the format yyyy-mm-dd.
Input col and prefix can be passed as either String or Symbol.

Example

julia> df = DataFrame(ADTM = DateTime.(["2024-10-21T01:10", "2024-11-22T02:20", "2024-12-23T03:30"]))
3×1 DataFrame
 Row │ ADTM                
     │ DateTime            
─────┼─────────────────────
   1 │ 2024-10-21T01:10:00
   2 │ 2024-11-22T02:20:00
   3 │ 2024-12-23T03:30:00

julia> make_dtm_to_dt(df, "ADTM", prefix = "A")
3×2 DataFrame
 Row │ ADTM                 ADT        
     │ DateTime             Date       
─────┼─────────────────────────────────
   1 │ 2024-10-21T01:10:00  2024-10-21
   2 │ 2024-11-22T02:20:00  2024-11-22
   3 │ 2024-12-23T03:30:00  2024-12-23

julia> make_dtm_to_dt(df, "ADTM")
3×2 DataFrame
 Row │ ADTM                 ADT        
     │ DateTime             Date       
─────┼─────────────────────────────────
   1 │ 2024-10-21T01:10:00  2024-10-21
   2 │ 2024-11-22T02:20:00  2024-11-22
   3 │ 2024-12-23T03:30:00  2024-12-23

ADaM.make_dtm_to_tm — Method

make_dtm_to_tm(df::DataFrame, col; prefix)

Derives Time column from DateTime column in a DataFrame.
The Time column with suffix 'TM' will be of the format hh:mm:ss.
Input col and prefix can be passed as either String or Symbol.

Example

julia> df = DataFrame(ADTM = DateTime.(["2024-10-21T01:10", "2024-11-22T02:20", "2024-12-23T03:30"]))
3×1 DataFrame
 Row │ ADTM                
     │ DateTime            
─────┼─────────────────────
   1 │ 2024-10-21T01:10:00
   2 │ 2024-11-22T02:20:00
   3 │ 2024-12-23T03:30:00

julia> make_dtm_to_tm(df, "ADTM", prefix = "A")
3×2 DataFrame
 Row │ ADTM                 ATM      
     │ DateTime             Time     
─────┼───────────────────────────────
   1 │ 2024-10-21T01:10:00  01:10:00
   2 │ 2024-11-22T02:20:00  02:20:00
   3 │ 2024-12-23T03:30:00  03:30:00

julia> make_dtm_to_tm(df, "ADTM")
3×2 DataFrame
 Row │ ADTM                 ATM      
     │ DateTime             Time     
─────┼───────────────────────────────
   1 │ 2024-10-21T01:10:00  01:10:00
   2 │ 2024-11-22T02:20:00  02:20:00
   3 │ 2024-12-23T03:30:00  03:30:00

ADaM.make_duration — Method

make_duration(; start_dtm, end_dtm, output_unit)
make_duration(df, col; start_dtm, end_dtm, output_unit)

Computes the duration between two dates or datetimes and returns the result in the specified unit. Dates can be passed as scalars, or as vectors via DataFrame columns.

Arguments:

start_dtm: The starting date or datetime.
end_dtm: The ending date or datetime.
output_unit: The unit for the output duration (e.g., :h for hours, :min for minutes). Default is :h.

Valid time units are those supported by DynamicQuantities.jl.

Returns:

The DataFrame with new columns (col) for the duration, its unit, and the stripped value.
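
The numeric part of the calculation can be sketched with Dates alone. This is an assumption-laden illustration: `duration_hours` is a hypothetical helper that returns a plain Float64 rather than a unit-carrying quantity, and it treats a Date as midnight of that day.

```julia
using Dates

# Hypothetical helper: elapsed hours between two Date or DateTime values.
# DateTime differences are in milliseconds; 3_600_000 ms make one hour.
duration_hours(start_dtm, end_dtm) =
    Dates.value(DateTime(end_dtm) - DateTime(start_dtm)) / 3_600_000

two_hours = duration_hours(DateTime(2024, 1, 1, 8), DateTime(2024, 1, 1, 10))
one_day = duration_hours(Date(2024, 1, 1), Date(2024, 1, 2))

# Converting to days is a division by 24
sixty_days = duration_hours(Date(2024, 1, 1), Date(2024, 3, 1)) / 24
```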

Examples

Scalar Dates

julia> make_duration(start_dtm=DateTime(2024, 1, 1, 8), end_dtm=DateTime(2024, 1, 1, 10), output_unit=:h)
2.0 h

julia> make_duration(start_dtm=Date(2024, 1, 1), end_dtm=Date(2024, 1, 2), output_unit=:h)
24.0 h

Date columns

julia> df = DataFrame(ASTDT = [Date(2024,1,1), Date(2024,2,1)], 
                      AENDT = [Date(2024,3,1), Date(2024,4,1)])
2×2 DataFrame
 Row │ ASTDT       AENDT      
     │ Date        Date       
─────┼────────────────────────
   1 │ 2024-01-01  2024-03-01
   2 │ 2024-02-01  2024-04-01

julia> make_duration(df, :ADUR; start_dtm=:ASTDT, end_dtm=:AENDT, output_unit=:day)
2×4 DataFrame
 Row │ ASTDT       AENDT       ADUR     ADURU     
     │ Date        Date        Float64  Symbolic… 
─────┼────────────────────────────────────────────
   1 │ 2024-01-01  2024-03-01     60.0  day
   2 │ 2024-02-01  2024-04-01     60.0  day

DateTime columns

julia> df = DataFrame(ASTDTM = [DateTime(2024,1,1,8), DateTime(2024,1,1,9)], 
                      AENDTM = [DateTime(2024,1,1,10), DateTime(2024,1,1,12)])
2×2 DataFrame
 Row │ ASTDTM               AENDTM              
     │ DateTime             DateTime            
─────┼──────────────────────────────────────────
   1 │ 2024-01-01T08:00:00  2024-01-01T10:00:00
   2 │ 2024-01-01T09:00:00  2024-01-01T12:00:00

julia> make_duration(df, :ADUR; start_dtm=:ASTDTM, end_dtm=:AENDTM, output_unit=:h)
2×4 DataFrame
 Row │ ASTDTM               AENDTM               ADUR     ADURU     
     │ DateTime             DateTime             Float64  Symbolic… 
─────┼──────────────────────────────────────────────────────────────
   1 │ 2024-01-01T08:00:00  2024-01-01T10:00:00      2.0  h
   2 │ 2024-01-01T09:00:00  2024-01-01T12:00:00      3.0  h
   
julia> make_duration(df, :ADUR; start_dtm=:ASTDTM, end_dtm=:AENDTM, output_unit=:day)
2×4 DataFrame
 Row │ ASTDTM               AENDTM               ADUR       ADURU     
     │ DateTime             DateTime             Float64    Symbolic… 
─────┼────────────────────────────────────────────────────────────────
   1 │ 2024-01-01T08:00:00  2024-01-01T10:00:00  0.0833333  day
   2 │ 2024-01-01T09:00:00  2024-01-01T12:00:00  0.125      day

ADaM.make_nominal_time — Method

make_nominal_time(pc, ex, domain; visitdy_col, atptn_col)

Derives the nominal time column NFRLT for PC and EX SDTM datasets.

Arguments

pc: PC DataFrame (must contain ATPTN)
ex: EX DataFrame
domain: The dataset to return ("PC" or "EX")
visitdy_col: Column name for VISITDY (Symbol or String, default = :VISITDY)
atptn_col: Column name for ATPTN (Symbol or String, default = :ATPTN)

Output

Returns a DataFrame (PC or EX) with a new NFRLT column representing nominal time in hours.

Notes

For PC: NFRLT = (VISITDY - 1) * 24 + ATPTN
For EX: NFRLT = (VISITDY - 1) * 24
If VISITDY is missing in either dataset, it is joined from the other dataset using common keys (excluding ATPTN for PC).
Throws an error if required columns are missing or if join keys cannot be determined.
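
The two formulas above can be checked by hand with a pair of scalar helpers. The names `nfrlt_pc` and `nfrlt_ex` are hypothetical and exist only for this worked example; the joining and error handling described in the notes are not reproduced.

```julia
# Hypothetical scalar helpers mirroring the formulas in the notes.
nfrlt_pc(visitdy, atptn) = (visitdy - 1) * 24 + atptn   # PC records
nfrlt_ex(visitdy) = (visitdy - 1) * 24                  # EX records

# Visit day 3 with a 24 h nominal time point
pc_value = nfrlt_pc(3, 24)
ex_value = nfrlt_ex(3)
```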

Examples

# Example PC and EX DataFrames
julia> pc = DataFrame(DOMAIN = ["PC", "PC"], USUBJID = ["01", "01"], VISITDY = [2, 3], ATPTN = [0, 24])
2×4 DataFrame
 Row │ DOMAIN  USUBJID  VISITDY  ATPTN 
     │ String  String   Int64    Int64 
─────┼─────────────────────────────────
   1 │ PC      01             2      0
   2 │ PC      01             3     24

julia> ex = DataFrame(DOMAIN = ["EX", "EX"], USUBJID = ["01", "01"], VISITDY = [2, 3])
2×3 DataFrame
 Row │ DOMAIN  USUBJID  VISITDY 
     │ String  String   Int64   
─────┼──────────────────────────
   1 │ EX      01             2
   2 │ EX      01             3

# Example 1: Standard usage for PC
julia> make_nominal_time(pc, ex, "PC")
2×5 DataFrame
 Row │ DOMAIN  USUBJID  VISITDY  ATPTN  NFRLT 
     │ String  String   Int64    Int64  Int64 
─────┼────────────────────────────────────────
   1 │ PC      01             2      0     24
   2 │ PC      01             3     24     72

# Example 1: Standard usage for EX
julia> make_nominal_time(pc, ex, "EX")
2×4 DataFrame
 Row │ DOMAIN  USUBJID  VISITDY  NFRLT 
     │ String  String   Int64    Int64 
─────┼─────────────────────────────────
   1 │ EX      01             2     24
   2 │ EX      01             3     48

# Example 2: Custom column names
julia> rename!(pc, :VISITDY => :VISDY, :ATPTN => :TIMEPT)
2×4 DataFrame
 Row │ DOMAIN  USUBJID  VISDY  TIMEPT 
     │ String  String   Int64  Int64  
─────┼────────────────────────────────
   1 │ PC      01           2       0
   2 │ PC      01           3      24

julia> make_nominal_time(pc, ex, "PC"; visitdy_col=:VISDY, atptn_col=:TIMEPT)
2×5 DataFrame
 Row │ DOMAIN  USUBJID  VISDY  TIMEPT  NFRLT 
     │ String  String   Int64  Int64   Int64 
─────┼───────────────────────────────────────
   1 │ PC      01           2       0     24
   2 │ PC      01           3      24     72

ADaM.make_time_fl — Method

make_time_fl(df::DataFrame, col; group, order)

This function creates flag columns, FN (Flag Numeric) and FC (Flag Comment), that flag time values based on the following criteria:

Missing Times
Negative Times
Non-ascending Times

The data-validation points considered are described at this link: https://www.lexjansen.com/phuse-us/2020/as/AS06.pdf#page=3

order and group variables can be used while flagging.
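
The three criteria can be sketched for a single group, taken in its given order, as below. This is a guess at the rule rather than the package's code: `sketch_time_flags` is a hypothetical helper, and it assumes that a missing value is checked first, then a negative value, and that a value counts as non-ascending when it is strictly smaller than the last non-missing value before it.

```julia
# Hypothetical helper: returns the numeric flags and flag comments.
function sketch_time_flags(times)
    fn = Int[]
    fc = String[]
    prev = missing   # last non-missing time seen so far
    for t in times
        if ismissing(t)
            push!(fn, 1); push!(fc, "Missing Time")
            continue   # a missing value does not update `prev`
        elseif t < 0
            push!(fn, 1); push!(fc, "Negative Time")
        elseif !ismissing(prev) && t < prev
            push!(fn, 1); push!(fc, "Non-ascending Time")
        else
            push!(fn, 0); push!(fc, "")
        end
        prev = t
    end
    return fn, fc
end

fn, fc = sketch_time_flags([0, -1, -2, 3, missing, 5, 4])
```

Grouping would amount to running the same pass separately within each group.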

Example

julia> df = DataFrame(USUBJID = 1, AFRLT = [0, -1, -2, 3, missing, 5, 4])
7×2 DataFrame
 Row │ USUBJID  AFRLT   
     │ Int64    Int64?  
─────┼──────────────────
   1 │       1        0
   2 │       1       -1
   3 │       1       -2
   4 │       1        3
   5 │       1  missing 
   6 │       1        5
   7 │       1        4

julia> make_time_fl(df, :AFRLT, group = [], order = []) # group variables can be kept empty for dataframe without group.
7×4 DataFrame
 Row │ USUBJID  AFRLT    AFRLTFN  AFRLTFC            
     │ Int64    Int64?   Int64    String             
─────┼───────────────────────────────────────────────
   1 │       1        0        0
   2 │       1       -1        1  Negative Time
   3 │       1       -2        1  Negative Time
   4 │       1        3        0
   5 │       1  missing        1  Missing Time
   6 │       1        5        0
   7 │       1        4        1  Non-ascending Time

julia> df1 = DataFrame(USUBJID = [1,1,1,2,2,2], NFRLT = [missing, 2, 1, 0, 1, -1])
6×2 DataFrame
 Row │ USUBJID  NFRLT   
     │ Int64    Int64?  
─────┼──────────────────
   1 │       1  missing 
   2 │       1        2
   3 │       1        1
   4 │       2        0
   5 │       2        1
   6 │       2       -1

julia> make_time_fl(df1, "NFRLT", group=["USUBJID"])
6×4 DataFrame
 Row │ USUBJID  NFRLT    NFRLTFN  NFRLTFC            
     │ Int64    Int64?   Int64    String             
─────┼───────────────────────────────────────────────
   1 │       1  missing        1  Missing Time
   2 │       1        2        0
   3 │       1        1        1  Non-ascending Time
   4 │       2        0        0
   5 │       2        1        0
   6 │       2       -1        1  Negative Time
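
The three criteria can be sketched in plain Julia for a single vector of times. This is only an illustration and not the package's implementation: flag_times is a hypothetical helper, and the precedence (missing, then negative, then non-ascending relative to the previous non-missing value, using a strict comparison) is an assumption chosen because it reproduces the outputs shown above.

```julia
# Hypothetical helper, not part of ADaM: flags one vector of times.
function flag_times(times)
    flags = Int[]
    comments = String[]
    prev = nothing                       # last non-missing time seen so far
    for t in times
        if ismissing(t)
            push!(flags, 1)
            push!(comments, "Missing Time")
            continue                     # a missing value does not update prev
        end
        if t < 0
            push!(flags, 1)
            push!(comments, "Negative Time")
        elseif prev !== nothing && t < prev   # assumption: ties are not flagged
            push!(flags, 1)
            push!(comments, "Non-ascending Time")
        else
            push!(flags, 0)
            push!(comments, "")
        end
        prev = t
    end
    return flags, comments
end

flags, comments = flag_times([0, -1, -2, 3, missing, 5, 4])
# flags == [0, 1, 1, 0, 1, 0, 1]
```

With grouping, the same logic would be applied to each group's times separately, for example flag_times([missing, 2, 1]) and flag_times([0, 1, -1]) for the two subjects in df1.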

ADaM.merge_columns — Method

merge_columns(df1, df2; filter_func, keep, kwargs...)

Left-joins new column(s) from a secondary dataset onto the primary dataset.

Positional Arguments

df1: Main DataFrame to join to
df2: DataFrame to leftjoin columns from

Keyword Arguments

keep: Vector of source => destination pairs used to select and rename columns from df2 (default: all columns of df2)
filter_func: Function used to filter rows of the secondary dataset before joining (default: include all rows)
kwargs...: Keyword arguments of the leftjoin function can be passed, e.g. on, makeunique.

Example

julia> dm = DataFrame(USUBJID = ["01-701-1015", "01-701-1023", "01-701-1028"], SEX = ["M", "F", "F"], COUNTRY = ["USA", "USA", "NOR"])
3×3 DataFrame
 Row │ USUBJID      SEX     COUNTRY 
     │ String       String  String  
─────┼──────────────────────────────
   1 │ 01-701-1015  M       USA
   2 │ 01-701-1023  F       USA
   3 │ 01-701-1028  F       NOR

julia> vs = DataFrame(
    [
        ("01-701-1015", "WEIGHT", 50, "kg"),
        ("01-701-1015", "HEIGHT", 150, "cm"),
        ("01-701-1023", "WEIGHT", 60, "kg"),
        ("01-701-1023", "HEIGHT", 160, "cm"),
        ("01-701-1028", "WEIGHT", 70, "kg"),
        ("01-701-1028", "HEIGHT", 170, "cm"),
    ],
    [:USUBJID, :VSTESTCD, :VSSTRESN, :VSSTRESU],
)
6×4 DataFrame
 Row │ USUBJID      VSTESTCD  VSSTRESN  VSSTRESU 
     │ String       String    Int64     String   
─────┼───────────────────────────────────────────
   1 │ 01-701-1015  WEIGHT          50  kg
   2 │ 01-701-1015  HEIGHT         150  cm
   3 │ 01-701-1023  WEIGHT          60  kg
   4 │ 01-701-1023  HEIGHT         160  cm
   5 │ 01-701-1028  WEIGHT          70  kg
   6 │ 01-701-1028  HEIGHT         170  cm

julia> merge_ht = merge_columns(
            dm, 
            vs, 
            filter_func = r -> r.VSTESTCD == "HEIGHT", 
            keep = [:VSSTRESN => :HTBL, :VSSTRESU => :HTBLU], 
            on = [:USUBJID]
        )
3×5 DataFrame
 Row │ USUBJID      SEX     COUNTRY  HTBL    HTBLU   
     │ String       String  String   Int64?  String? 
─────┼───────────────────────────────────────────────
   1 │ 01-701-1015  M       USA         150  cm
   2 │ 01-701-1023  F       USA         160  cm
   3 │ 01-701-1028  F       NOR         170  cm

julia> merge_ht_wt = merge_columns(
            merge_ht, 
            vs, 
            filter_func = r -> r.VSTESTCD == "WEIGHT", 
            keep = ["VSSTRESN" => "WTBL", "VSSTRESU" => "WTBLU"], 
            on = ["USUBJID"]
        )
3×7 DataFrame
 Row │ USUBJID      SEX     COUNTRY  HTBL    HTBLU    WTBL    WTBLU   
     │ String       String  String   Int64?  String?  Int64?  String? 
─────┼────────────────────────────────────────────────────────────────
   1 │ 01-701-1015  M       USA         150  cm           50  kg
   2 │ 01-701-1023  F       USA         160  cm           60  kg
   3 │ 01-701-1028  F       NOR         170  cm           70  kg

julia> merge_columns(
            dm, 
            vs, 
            filter_func = r -> r.VSTESTCD == "HEIGHT", 
            on = [:USUBJID]
        ) # without specifying keep
3×6 DataFrame
 Row │ USUBJID      SEX     COUNTRY  VSTESTCD  VSSTRESN  VSSTRESU 
     │ String       String  String   String?   Int64?    String?  
─────┼────────────────────────────────────────────────────────────
   1 │ 01-701-1015  M       USA      HEIGHT         150  cm
   2 │ 01-701-1023  F       USA      HEIGHT         160  cm
   3 │ 01-701-1028  F       NOR      HEIGHT         170  cm

julia> merge_columns(
            dm, 
            vs, 
            on = [:USUBJID]
        ) # without specifying keep and filter_func
6×6 DataFrame
 Row │ USUBJID      SEX     COUNTRY  VSTESTCD  VSSTRESN  VSSTRESU 
     │ String       String  String   String?   Int64?    String?  
─────┼────────────────────────────────────────────────────────────
   1 │ 01-701-1015  M       USA      WEIGHT          50  kg
   2 │ 01-701-1015  M       USA      HEIGHT         150  cm
   3 │ 01-701-1023  F       USA      WEIGHT          60  kg
   4 │ 01-701-1023  F       USA      HEIGHT         160  cm
   5 │ 01-701-1028  F       NOR      WEIGHT          70  kg
   6 │ 01-701-1028  F       NOR      HEIGHT         170  cm
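
For orientation, the HEIGHT example can be reproduced by hand with DataFrames.jl alone. This is a sketch under the assumption that merge_columns amounts to filter, then select with renaming, then leftjoin; the actual implementation may differ. The intermediate name heights is illustrative.

```julia
using DataFrames

dm = DataFrame(USUBJID = ["01-701-1015", "01-701-1023"], SEX = ["M", "F"])
vs = DataFrame(
    USUBJID  = ["01-701-1015", "01-701-1015", "01-701-1023", "01-701-1023"],
    VSTESTCD = ["WEIGHT", "HEIGHT", "WEIGHT", "HEIGHT"],
    VSSTRESN = [50, 150, 60, 160],
    VSSTRESU = ["kg", "cm", "kg", "cm"],
)

# 1. filter_func: keep only the rows of interest from the secondary dataset
heights = filter(r -> r.VSTESTCD == "HEIGHT", vs)

# 2. keep: select the join key plus the renamed source => destination columns
heights = select(heights, :USUBJID, :VSSTRESN => :HTBL, :VSSTRESU => :HTBLU)

# 3. kwargs: everything else is handed to leftjoin (here only `on`)
merged = leftjoin(dm, heights; on = :USUBJID)
```

The joined columns become Int64? and String? because a left join must be able to hold missing for subjects without a matching row.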

ADaM.merge_covariates — Method

merge_covariates(df1, df2; domain, covariates, filter_func, baseline=true, kwargs...)

Iteratively merges covariate columns from a secondary DataFrame (df2) into a primary DataFrame (df1) using ADaM.merge_columns for each covariate in the covariates vector.
This is designed for SDTM-style domains (e.g., LB, VS) where a test code column (e.g., LBTESTCD) identifies the covariate.

Positional Arguments

df1: Main DataFrame to join to (primary dataset)
df2: DataFrame to leftjoin columns from (secondary dataset)

Keyword Arguments

domain: domain prefix (String or Symbol) used to construct test code and value column names (e.g., LBTESTCD, LBSTRESN, LBSTRESU)
covariates: Vector of covariate names to merge iteratively
filter_func: Function to filter rows from the secondary dataset for each covariate.
baseline: If true, appends "BL"/"BLU" to output column names (default: true)
kwargs...: Keyword arguments of the leftjoin function can be passed, e.g. on, makeunique.

Examples

julia> dm = DataFrame(USUBJID = ["01-701-1015"], SEX = ["M"], COUNTRY = ["USA"])
1×3 DataFrame
 Row │ USUBJID      SEX     COUNTRY 
     │ String       String  String  
─────┼──────────────────────────────
   1 │ 01-701-1015  M       USA

julia> lb = DataFrame(USUBJID = fill("01-701-1015", 16),
                    LBTESTCD = repeat(["AST", "ALT", "CREAT", "BILI"], inner = 4),
                    LBSTRESN = [35, 30, 28, 33, 40, 32, 29, 35, 1, 0, 1, 1, 0, 1, 1, 0],
                    LBSTRESU = vcat(fill("U/L", 8), fill("mg/dL", 8)),
                    LBBLFL = repeat(["Y", "", "", ""], 4),
                )
16×5 DataFrame
 Row │ USUBJID      LBTESTCD  LBSTRESN  LBSTRESU  LBBLFL 
     │ String       String    Int64     String    String 
─────┼───────────────────────────────────────────────────
   1 │ 01-701-1015  AST             35  U/L       Y
   2 │ 01-701-1015  AST             30  U/L
   3 │ 01-701-1015  AST             28  U/L
   4 │ 01-701-1015  AST             33  U/L
  ⋮  │      ⋮          ⋮         ⋮         ⋮        ⋮
  13 │ 01-701-1015  BILI             0  mg/dL     Y
  14 │ 01-701-1015  BILI             1  mg/dL
  15 │ 01-701-1015  BILI             1  mg/dL
  16 │ 01-701-1015  BILI             0  mg/dL
                                           8 rows omitted

julia> merge_covariates(
            dm,
            lb;
            domain = "LB",
            covariates = ["AST", "ALT", "CREAT", "BILI"],
            filter_func = r -> coalesce(r.LBBLFL, "") == "Y",
            on = [:USUBJID],
        )
1×11 DataFrame
 Row │ USUBJID      SEX     COUNTRY  ASTBL   ASTBLU   ALTBL   ALTBLU   CREATBL  CREATBLU  BILIBL  BILIBLU 
     │ String       String  String   Int64?  String?  Int64?  String?  Int64?   String?   Int64?  String? 
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 01-701-1015  M       USA          35  U/L          40  U/L            1  mg/dL          0  mg/dL

If filter_func is omitted, the baseline filter is determined automatically when the appropriate columns are present (LBBLFL == "Y", LBLOBXFL == "Y", VISIT == "Screening").

julia> merge_covariates(
            dm,
            lb;
            domain = "LB",
            covariates = ["AST", "ALT", "CREAT", "BILI"],
            on = [:USUBJID],
        )
1×11 DataFrame
 Row │ USUBJID      SEX     COUNTRY  ASTBL   ASTBLU   ALTBL   ALTBLU   CREATBL  CREATBLU  BILIBL  BILIBLU 
     │ String       String  String   Int64?  String?  Int64?  String?  Int64?   String?   Int64?  String? 
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 01-701-1015  M       USA          35  U/L          40  U/L            1  mg/dL          0  mg/dL
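
The column naming described under Keyword Arguments can be sketched as follows. covariate_names is a hypothetical helper, not part of ADaM, and the behaviour for baseline = false (no suffix on the value column, "U" on the unit column) is an assumption that the text above does not confirm.

```julia
# Hypothetical helper, not part of ADaM: derives source and output column names.
function covariate_names(domain, covariate; baseline = true)
    d = String(domain)                  # accepts "LB" or :LB
    suffix = baseline ? "BL" : ""       # assumption: no suffix when baseline = false
    return (
        testcd = d * "TESTCD",                               # e.g. LBTESTCD
        value  = d * "STRESN" => covariate * suffix,         # e.g. LBSTRESN => ASTBL
        unit   = d * "STRESU" => covariate * suffix * "U",   # e.g. LBSTRESU => ASTBLU
    )
end

cols = covariate_names("LB", "AST")
```

Each covariate would then correspond to one merge_columns call that filters on the testcd column and passes the value and unit pairs as keep.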

ADaM.nci_liver_score — Method

nci_liver_score(bili::Number, ast::Number, uln_bili::Number, uln_ast::Number; kwargs...)
nci_liver_score(bili::Quantity, ast::Number, uln_bili::Quantity, uln_ast::Quantity; kwargs...)
nci_liver_score(df::DataFrame; kwargs...)

Computes the NCI (National Cancer Institute) hepatic impairment score (numeric and categorical) from bilirubin and AST values, which can be provided as Quantities, scalars, or columns of a DataFrame. See the NCI criteria.

Arguments

bili: Bilirubin value.
ast: AST value.
uln_bili: Upper limit of normal for bilirubin.
uln_ast: Upper limit of normal for AST.
bili_unit: Bilirubin unit (default: "umol/L").
ast_unit: AST unit (default: "U/L").

Default Columns

The following default column names are used for DataFrame input:

bili = :BILIBL
ast = :ASTBL
uln_bili = :ULN_BILIBL
uln_ast = :ULN_ASTBL
bili_unit = :BILIBLU
ast_unit = :ASTBLU
col = :NCI

Output

Numeric NCI liver score (1=Normal, 2=Mild, 3=Moderate, 4=Severe, or missing)
Categorical NCI liver score ("Normal", "Mild", "Moderate", "Severe", or missing)

The result is returned as two DataFrame columns for DataFrame input, or as a 2-element Tuple otherwise.

Notes

The function implements the NCI hepatic impairment criteria:
- Severe: bili > 3 * uln_bili
- Moderate: 1.5 * uln_bili < bili <= 3 * uln_bili
- Mild: (bili > uln_bili && bili <= 1.5 * uln_bili) || (ast > uln_ast)
- Normal: bili <= uln_bili && ast <= uln_ast
Returns missing for both outputs if any argument is missing or does not match a category.
Output columns can be customized via the col keyword argument.
The units of bili and uln_bili must match; otherwise, an error is thrown.
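
The criteria above can be written as a small plain-Julia function. This is a sketch only: nci_category is a hypothetical name, units are ignored, and checking the categories from Severe down to Normal is an assumption about precedence (the listed Mild rule also matches a raised AST with bilirubin above 1.5 times the upper limit).

```julia
# Hypothetical helper, not part of ADaM: classifies one set of values.
function nci_category(bili, ast, uln_bili, uln_ast)
    # Any missing input yields missing for both outputs
    if any(ismissing, (bili, ast, uln_bili, uln_ast))
        return (missing, missing)
    end
    if bili > 3 * uln_bili
        return (4, "Severe")
    elseif bili > 1.5 * uln_bili
        return (3, "Moderate")
    elseif bili > uln_bili || ast > uln_ast
        return (2, "Mild")
    else
        return (1, "Normal")
    end
end

nci_category(7.0, 20.0, 2.0, 30.0)   # (4, "Severe")
nci_category(2.2, 20.0, 2.0, 15.0)   # (2, "Mild")
```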

Examples

julia> nci_liver_score(7.0, 20.0, 2.0, 30.0, bili_unit="mg/dL", ast_unit="U/L")
(4, "Severe")

julia> nci_liver_score(4.0, 20.0, 2.0, 30.0)
(3, "Moderate")

julia> nci_liver_score(2.2, 20.0, 2.0, 15.0)
(2, "Mild")

julia> nci_liver_score(1.0, 10.0, 2.0, 15.0)
(1, "Normal")

julia> nci_liver_score(7us"mg/dL", 20, 2us"mg/dL", 30)
(4, "Severe")

julia> nci_liver_score(4us"mg/dL", 20, 2us"mg/dL", 30)
(3, "Moderate")

julia> nci_liver_score(2.2us"mg/dL", 20, 2us"mg/dL", 15)
(2, "Mild")

julia> nci_liver_score(1us"mg/dL", 10, 2us"mg/dL", 15)
(1, "Normal")

julia> df = DataFrame(BILIBL = [7.0, 4.0, 2.2, 1.0, missing],
                    ASTBL = [20.0, 20.0, 20.0, 10.0, 10.0],
                    ULN_BILIBL = [2.0, 2.0, 2.0, 2.0, 2.0],
                    ULN_ASTBL = [30.0, 30.0, 15.0, 15.0, 15.0],
                    BILIBLU = "mg/dL",
                    ASTBLU = "U/L")
5×6 DataFrame
 Row │ BILIBL     ASTBL    ULN_BILIBL  ULN_ASTBL  BILIBLU  ASTBLU 
     │ Float64?   Float64  Float64     Float64    String   String 
─────┼────────────────────────────────────────────────────────────
   1 │       7.0     20.0         2.0       30.0  mg/dL    U/L
   2 │       4.0     20.0         2.0       30.0  mg/dL    U/L
   3 │       2.2     20.0         2.0       15.0  mg/dL    U/L
   4 │       1.0     10.0         2.0       15.0  mg/dL    U/L
   5 │ missing       10.0         2.0       15.0  mg/dL    U/L

julia> nci_liver_score(df)
5×8 DataFrame
 Row │ BILIBL     ASTBL    ULN_BILIBL  ULN_ASTBL  BILIBLU  ASTBLU  NCIN     NCIC     
     │ Float64?   Float64  Float64     Float64    String   String  Int64?   String?  
─────┼───────────────────────────────────────────────────────────────────────────────
   1 │       7.0     20.0         2.0       30.0  mg/dL    U/L           4  Severe
   2 │       4.0     20.0         2.0       30.0  mg/dL    U/L           3  Moderate
   3 │       2.2     20.0         2.0       15.0  mg/dL    U/L           2  Mild
   4 │       1.0     10.0         2.0       15.0  mg/dL    U/L           1  Normal
   5 │ missing       10.0         2.0       15.0  mg/dL    U/L     missing  missing

julia> nci_liver_score(df, col=:MYNCI)
5×8 DataFrame
 Row │ BILIBL     ASTBL    ULN_BILIBL  ULN_ASTBL  BILIBLU  ASTBLU  MYNCIN   MYNCIC   
     │ Float64?   Float64  Float64     Float64    String   String  Int64?   String?  
─────┼───────────────────────────────────────────────────────────────────────────────
   1 │       7.0     20.0         2.0       30.0  mg/dL    U/L           4  Severe
   2 │       4.0     20.0         2.0       30.0  mg/dL    U/L           3  Moderate
   3 │       2.2     20.0         2.0       15.0  mg/dL    U/L           2  Mild
   4 │       1.0     10.0         2.0       15.0  mg/dL    U/L           1  Normal
   5 │ missing       10.0         2.0       15.0  mg/dL    U/L     missing  missing

ADaM.round_columns — Method

round_columns(df::DataFrame, digit::Int64)

Rounds the values in Float columns to the specified number of digits.

Example

julia> df = DataFrame(Col1 = [1, 1.0023], Col2 = [3.14159, 2], Col3 = [3, 1.7], Col4 = [1, 4])
2×4 DataFrame
 Row │ Col1     Col2     Col3     Col4  
     │ Float64  Float64  Float64  Int64 
─────┼──────────────────────────────────
   1 │  1.0     3.14159      3.0      1
   2 │  1.0023  2.0          1.7      4


julia> round_columns(df, 2)
2×4 DataFrame
 Row │ Col1     Col2     Col3     Col4    
     │ Float64  Float64  Float64  Float64 
─────┼────────────────────────────────────
   1 │     1.0     3.14      3.0      1.0
   2 │     1.0     2.0       1.7      4.0

ADaM.set_exclusion — Method

set_exclusion(df, comment; excl_func, group, order)

Flags DataFrame rows as excluded based on excl_func, evaluated per group of the group variables.
The exclusion comment can be passed as a String, or taken from a DataFrame column by passing the column name as a Symbol.

Keyword Arguments

excl_func: Predicate function applied to each group; rows of groups for which it returns true are flagged as excluded.
group: Grouping variable(s) for exclusion logic.
order: Ordering variable(s) for sorting within groups.

Example

julia> df = DataFrame(
        ID = [1,1,1,2,2,2,3,3,3], 
        SEQ = [1,2,3,1,2,3,1,2,3],
        EVID = [1,1,1,0,0,0,0,1,0],  
        STAT = ["NA","NA", "NA", missing, missing, missing, missing, "NA", missing],
        CONC = [missing,missing,missing,20,40,60,30,missing,70],
    )
9×5 DataFrame
 Row │ ID     SEQ    EVID   STAT     CONC    
     │ Int64  Int64  Int64  String?  Int64?  
─────┼───────────────────────────────────────
   1 │     1      1      1  NA       missing 
   2 │     1      2      1  NA       missing 
   3 │     1      3      1  NA       missing 
   4 │     2      1      0  missing       20
   5 │     2      2      0  missing       40
   6 │     2      3      0  missing       60
   7 │     3      1      0  missing       30
   8 │     3      2      1  NA       missing 
   9 │     3      3      0  missing       70

# Exclusion 1: Subjects with missing conc. data
julia> set_exclusion(df, "All missing conc subs", excl_func = group -> all(ismissing, group.CONC), group = :ID)
9×7 DataFrame
 Row │ ID     SEQ    EVID   STAT     CONC     EXCLFCOM               EXCLF 
     │ Int64  Int64  Int64  String?  Int64?   String?                Int64 
─────┼─────────────────────────────────────────────────────────────────────
   1 │     1      1      1  NA       missing  All missing conc subs      1
   2 │     1      2      1  NA       missing  All missing conc subs      1
   3 │     1      3      1  NA       missing  All missing conc subs      1
   4 │     2      1      0  missing       20  missing                    0
   5 │     2      2      0  missing       40  missing                    0
   6 │     2      3      0  missing       60  missing                    0
   7 │     3      1      0  missing       30  missing                    0
   8 │     3      2      1  NA       missing  missing                    0
   9 │     3      3      0  missing       70  missing                    0

# Exclusion 2: Subjects with no dosing data
julia> set_exclusion(df, "No dose subs", excl_func = group -> all(iszero, group.EVID), group = "ID")
9×7 DataFrame
 Row │ ID     SEQ    EVID   STAT     CONC     EXCLFCOM      EXCLF 
     │ Int64  Int64  Int64  String?  Int64?   String?       Int64 
─────┼────────────────────────────────────────────────────────────
   1 │     1      1      1  NA       missing  missing           0
   2 │     1      2      1  NA       missing  missing           0
   3 │     1      3      1  NA       missing  missing           0
   4 │     2      1      0  missing       20  No dose subs      1
   5 │     2      2      0  missing       40  No dose subs      1
   6 │     2      3      0  missing       60  No dose subs      1
   7 │     3      1      0  missing       30  missing           0
   8 │     3      2      1  NA       missing  missing           0
   9 │     3      3      0  missing       70  missing           0

# Exclusion 3: Subjects with no conc. data
julia> set_exclusion(df, "No conc subs", excl_func = group -> all(isone, group.EVID), group = [:ID])
9×7 DataFrame
 Row │ ID     SEQ    EVID   STAT     CONC     EXCLFCOM      EXCLF 
     │ Int64  Int64  Int64  String?  Int64?   String?       Int64 
─────┼────────────────────────────────────────────────────────────
   1 │     1      1      1  NA       missing  No conc subs      1
   2 │     1      2      1  NA       missing  No conc subs      1
   3 │     1      3      1  NA       missing  No conc subs      1
   4 │     2      1      0  missing       20  missing           0
   5 │     2      2      0  missing       40  missing           0
   6 │     2      3      0  missing       60  missing           0
   7 │     3      1      0  missing       30  missing           0
   8 │     3      2      1  NA       missing  missing           0
   9 │     3      3      0  missing       70  missing           0

julia> set_exclusion(df, :STAT, excl_func = group -> all(ismissing, group.CONC), group = [:ID])
9×7 DataFrame
 Row │ ID     SEQ    EVID   STAT     CONC     EXCLFCOM  EXCLF 
     │ Int64  Int64  Int64  String?  Int64?   String?   Int64 
─────┼────────────────────────────────────────────────────────
   1 │     1      1      1  NA       missing  NA            1
   2 │     1      2      1  NA       missing  NA            1
   3 │     1      3      1  NA       missing  NA            1
   4 │     2      1      0  missing       20  missing       0
   5 │     2      2      0  missing       40  missing       0
   6 │     2      3      0  missing       60  missing       0
   7 │     3      1      0  missing       30  missing       0
   8 │     3      2      1  NA       missing  missing       0
   9 │     3      3      0  missing       70  missing       0

julia> set_exclusion(df, :STAT, excl_func = group -> all(ismissing, group.CONC), group = [:ID, :SEQ])
9×7 DataFrame
 Row │ ID     SEQ    EVID   STAT     CONC     EXCLFCOM  EXCLF 
     │ Int64  Int64  Int64  String?  Int64?   String?   Int64 
─────┼────────────────────────────────────────────────────────
   1 │     1      1      1  NA       missing  NA            1
   2 │     1      2      1  NA       missing  NA            1
   3 │     1      3      1  NA       missing  NA            1
   4 │     2      1      0  missing       20  missing       0
   5 │     2      2      0  missing       40  missing       0
   6 │     2      3      0  missing       60  missing       0
   7 │     3      1      0  missing       30  missing       0
   8 │     3      2      1  NA       missing  NA            1
   9 │     3      3      0  missing       70  missing       0
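
The group-wise flagging in Exclusion 1 can be sketched in plain Julia on two vectors. flag_groups is a hypothetical helper, not part of ADaM; it only illustrates that the predicate is evaluated once per group and its result is applied to every row of that group.

```julia
# Hypothetical helper, not part of ADaM: 1 for every row of a group where pred is true.
function flag_groups(pred, group_ids, vals)
    flags = zeros(Int, length(group_ids))
    for id in unique(group_ids)
        rows = findall(==(id), group_ids)   # row indices belonging to this group
        if pred(vals[rows])
            flags[rows] .= 1
        end
    end
    return flags
end

ids  = [1, 1, 1, 2, 2, 2, 3, 3, 3]
conc = [missing, missing, missing, 20, 40, 60, 30, missing, 70]

exclf = flag_groups(v -> all(ismissing, v), ids, conc)
# exclf == [1, 1, 1, 0, 0, 0, 0, 0, 0]
```

Subject 3 is not flagged because only one of its concentrations is missing, which matches row 8 in the first set_exclusion output.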

ADaM.subject_trt_count — Method

subject_trt_count(df::DataFrame)

Displays the subject count per treatment. Outputs a DataFrame with DRUG, DRUGCD (drug code) and count.
Valid datasets are: PC, EX, DM, ADPC, ADEX, ADSL.

Example

julia> ex = PharmaDatasets.dataset("SDTM/CDISCPILOT01/ex")
591×17 DataFrame
 Row │ STUDYID       DOMAIN   USUBJID      EXSEQ    EXTRT       EXDOSE   EXDOSU   EXDOSFRM  EXDOSFRQ  EXROUTE      VISITNUM  V ⋯
     │ String15      String3  String15     Float64  String15    Float64  String3  String7   String3   String15     Float64   S ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ CDISCPILOT01  EX       01-701-1015      1.0  PLACEBO         0.0  mg       PATCH     QD        TRANSDERMAL       3.0  B ⋯
   2 │ CDISCPILOT01  EX       01-701-1015      2.0  PLACEBO         0.0  mg       PATCH     QD        TRANSDERMAL       4.0  W
   3 │ CDISCPILOT01  EX       01-701-1015      3.0  PLACEBO         0.0  mg       PATCH     QD        TRANSDERMAL      12.0  W
   4 │ CDISCPILOT01  EX       01-701-1023      1.0  PLACEBO         0.0  mg       PATCH     QD        TRANSDERMAL       3.0  B
   5 │ CDISCPILOT01  EX       01-701-1023      2.0  PLACEBO         0.0  mg       PATCH     QD        TRANSDERMAL       4.0  W ⋯
   6 │ CDISCPILOT01  EX       01-701-1028      1.0  XANOMELINE     54.0  mg       PATCH     QD        TRANSDERMAL       3.0  B
   7 │ CDISCPILOT01  EX       01-701-1028      2.0  XANOMELINE     81.0  mg       PATCH     QD        TRANSDERMAL       4.0  W
   8 │ CDISCPILOT01  EX       01-701-1028      3.0  XANOMELINE     54.0  mg       PATCH     QD        TRANSDERMAL      12.0  W
  ⋮  │      ⋮           ⋮          ⋮          ⋮         ⋮          ⋮        ⋮        ⋮         ⋮           ⋮          ⋮        ⋱
 585 │ CDISCPILOT01  EX       01-718-1355      1.0  PLACEBO         0.0  mg       PATCH     QD        TRANSDERMAL       3.0  B ⋯
 586 │ CDISCPILOT01  EX       01-718-1355      2.0  PLACEBO         0.0  mg       PATCH     QD        TRANSDERMAL       4.0  W
 587 │ CDISCPILOT01  EX       01-718-1355      3.0  PLACEBO         0.0  mg       PATCH     QD        TRANSDERMAL      12.0  W
 588 │ CDISCPILOT01  EX       01-718-1371      1.0  XANOMELINE     54.0  mg       PATCH     QD        TRANSDERMAL       3.0  B
 589 │ CDISCPILOT01  EX       01-718-1371      2.0  XANOMELINE     81.0  mg       PATCH     QD        TRANSDERMAL       4.0  W ⋯
 590 │ CDISCPILOT01  EX       01-718-1427      1.0  XANOMELINE     54.0  mg       PATCH     QD        TRANSDERMAL       3.0  B
 591 │ CDISCPILOT01  EX       01-718-1427      2.0  XANOMELINE     81.0  mg       PATCH     QD        TRANSDERMAL       4.0  W
                                                                                                  6 columns and 576 rows omitted

julia> subject_trt_count(ex)
2×3 DataFrame
 Row │ DRUG        DRUGCD  count 
     │ String      String  Int64 
─────┼───────────────────────────
   1 │ PLACEBO     NA         86
   2 │ XANOMELINE  NA        168

ADaM.validate_column_structure — Method

validate_column_structure(df::DataFrame; ignore, silence_errors)

Validates a DataFrame's column structure according to ADaM data conventions (https://sastricks.com/cdisc/ADaMIG_v1.3.pdf#page=14).

Returns a DataFrame with all validation checks. Each row represents a validation check with columns:

check: Description of the validation check
tag: Check tag identifier (Symbol)
column: The column name (if applicable)
status: Check result (pass, fail, or missing for ignored checks)
level: Severity level: error for failures, ignored for ignored checks
rows: Affected row numbers (if applicable)
details: Additional details about the finding

Arguments

df: The DataFrame to validate
ignore: Check tags to ignore. Can be provided as Symbols or Strings. Available tags (validation checks):
- name_start - Column name must start with a letter
- name_format - Column name can only contain letters, underscores, and numerals
- name_length - Column name length must not exceed 8 characters
- char_length - Character column value length must not exceed 200 characters
- label_required - Column labels must be present
- label_length - Column label length must not exceed 40 characters
silence_errors: If true, errors are not thrown even if validation fails; instead, the findings DataFrame is returned with all errors and passes. Default is false, which throws an error if any validation fails.

Returns

A DataFrame with all validation checks and their pass/fail status. An error is thrown on failed checks unless silence_errors is true.

Examples

# Basic validation
julia> df = DataFrame(ID = [1, 2], USUBJID = ["S001", "S002"], DV = [10.5, 20.3])
2×3 DataFrame
 Row │ ID     USUBJID  DV      
     │ Int64  String   Float64 
─────┼─────────────────────────
   1 │     1  S001        10.5
   2 │     2  S002        20.3

julia> validate_column_structure(df)
ERROR: Column structure validation failed with 3 error(s):
1. "ID": Column label is missing (label_required)
2. "USUBJID": Column label is missing (label_required)
3. "DV": Column label is missing (label_required)

julia> validate_column_structure(df, silence_errors=true)
7×7 DataFrame
 Row │ check                              tag             column   status   level    rows     details                       
     │ String                             Symbol?         String?  String?  String?  Array…?  String?                       
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Column label is missing            label_required  ID       fail     error    missing  All columns must have a label
   2 │ Column label is missing            label_required  USUBJID  fail     error    missing  All columns must have a label
   3 │ Column label is missing            label_required  DV       fail     error    missing  All columns must have a label
   4 │ All column names start with a le…  name_start      missing  pass     missing  missing  Checked: 3/3 column(s)
   5 │ All column names are valid format  name_format     missing  pass     missing  missing  Checked: 3/3 column(s)
   6 │ All column name lengths are valid  name_length     missing  pass     missing  missing  Checked: 3/3 column(s)
   7 │ All character column lengths are…  char_length     missing  pass     missing  missing  Checked: 1/3 column(s)

julia> validate_column_structure(df, ignore = [:label_required])
5×7 DataFrame
 Row │ check                              tag             column   status   level    rows     details                
     │ String                             Symbol?         String?  String?  String?  Array…?  String?                
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Column labels must be present      label_required  missing  missing  ignored  missing  Check ignored by user
   2 │ All column names start with a le…  name_start      missing  pass     missing  missing  Checked: 3/3 column(s)
   3 │ All column names are valid format  name_format     missing  pass     missing  missing  Checked: 3/3 column(s)
   4 │ All column name lengths are valid  name_length     missing  pass     missing  missing  Checked: 3/3 column(s)
   5 │ All character column lengths are…  char_length     missing  pass     missing  missing  Checked: 1/3 column(s)
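
The three column-name checks can be approximated with regular expressions in plain Julia. name_checks is a hypothetical helper, not part of ADaM, and the exact patterns the package uses are an assumption based on the tag descriptions above (ASCII letters, digits and underscores; at most 8 characters).

```julia
# Hypothetical helper, not part of ADaM: true means the check passes.
function name_checks(name::AbstractString)
    return (
        name_start  = occursin(r"^[A-Za-z]", name),         # starts with a letter
        name_format = occursin(r"^[A-Za-z0-9_]+$", name),   # letters, numerals, underscores
        name_length = length(name) <= 8,                    # at most 8 characters
    )
end

name_checks("USUBJID")      # all three checks pass
name_checks("1ST_VISIT")    # fails name_start and name_length
```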
