---
title: "Clinical ADaM Derivations with sasif"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Clinical ADaM Derivations with sasif}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

Clinical programmers working in R often face a common challenge when migrating from SAS: in SAS, a single `IF ... THEN DO` block can assign **multiple variables** at once under one condition. In R, traditional approaches like `case_when()` or `fifelse()` force you to **repeat the same condition for every variable**, which increases QC risk and reduces readability.

`sasif` solves this by bringing SAS-style `IF / ELSE IF / ELSE` control flow into R's `data.table` ecosystem. One condition governs all assignments in a block, just like SAS.

This vignette walks through seven scenarios built on three ADaM datasets:

1. **ADSL**: population flags and treatment variables
2. **ADLB**: laboratory value categorisation
3. **ADAE**: treatment-emergent adverse event flags

---

## Setup

```{r setup}
library(sasif)
library(data.table)
```

---

## Scenario 1: ADSL Population Flags

### The Problem

In a typical ADSL derivation, when a subject is in the treatment arm, multiple variables need to be assigned simultaneously: population flags, treatment labels, numeric codes, and treatment dates.
In traditional R, every variable requires its own repeated condition:

```{r adsl_problem, eval=FALSE}
# ❌ Traditional R (dplyr): condition repeated for every variable
library(dplyr)

adsl <- adsl %>%
  mutate(
    SAFFL   = case_when(ACTARMCD == "TRTA" ~ "Y"),
    SAFFLN  = case_when(ACTARMCD == "TRTA" ~ 1),
    TRT01A  = case_when(ACTARMCD == "TRTA" ~ "Treatment A"),
    TRT01AN = case_when(ACTARMCD == "TRTA" ~ 1),
    ITTFL   = case_when(ACTARMCD == "TRTA" ~ "Y"),
    FASFL   = case_when(ACTARMCD == "TRTA" ~ "Y"),
    RANDFL  = case_when(ACTARMCD == "TRTA" ~ "Y"),
    PPFL    = case_when(ACTARMCD == "TRTA" ~ "Y")
    # Same condition written 8 times: high QC risk
  )
```

If the condition ever changes, you must update it in 8 places. Miss one and the derivation silently diverges, which is a real risk in regulated environments.

### The sasif Solution

```{r adsl_solution}
# Create sample ADSL data
adsl <- data.table(
  USUBJID  = c("S01", "S02", "S03", "S04"),
  ACTARMCD = c("TRTA", "TRTA", "SCRNFAIL", "TRTA"),
  RFSTDTC  = c("2024-01-10", "2024-01-15", NA, "2024-01-20"),
  RFENDTC  = c("2024-06-10", "2024-06-15", NA, "2024-06-20")
)

# ✅ sasif: condition written ONCE, governs all assignments
ADSL <- data_step(adsl,
  if_do(ACTARMCD == "TRTA",
    SAFFL   = "Y",
    SAFFLN  = 1,
    TRT01A  = "Treatment A",
    TRT01AN = 1,
    TRTSDT  = as.Date(RFSTDTC, format = "%Y-%m-%d"),
    TRTEDT  = as.Date(RFENDTC, format = "%Y-%m-%d"),
    ITTFL   = "Y",
    FASFL   = "Y",
    RANDFL  = "Y",
    PPFL    = "Y"
  )
)

print(ADSL[, .(USUBJID, ACTARMCD, SAFFL, TRT01A, TRT01AN, ITTFL, FASFL)])
```

All 10 variables are derived from a single condition block. The result is clean, readable, and audit-friendly, exactly like SAS `IF ... THEN DO`.

---

## Scenario 2: ADSL Multi-Arm Treatment Assignment (IF / ELSE IF / ELSE)

When a study has multiple treatment arms, use the full IF / ELSE IF / ELSE chain.
The first matching condition wins, and all later branches are skipped for that row:

```{r adsl_multiarm}
adsl2 <- data.table(
  USUBJID  = c("S01", "S02", "S03", "S04", "S05"),
  ACTARMCD = c("TRTA", "TRTB", "TRTC", "TRTA", "TRTB"),
  AGE      = c(35, 52, 67, 44, 58)
)

ADSL2 <- data_step(adsl2,
  if_do(ACTARMCD == "TRTA",
    TRT01A  = "Treatment A",
    TRT01AN = 1
  ),
  else_if_do(ACTARMCD == "TRTB",
    TRT01A  = "Treatment B",
    TRT01AN = 2
  ),
  # Every remaining arm (here TRTC) falls through to the ELSE branch
  else_do(
    TRT01A  = "Placebo",
    TRT01AN = 99
  )
)

print(ADSL2[, .(USUBJID, ACTARMCD, TRT01A, TRT01AN)])
```

Notice that both `TRT01A` (character label) and `TRT01AN` (numeric code) are derived together under each condition, with no repetition needed.

---

## Scenario 3: ADSL Age Categorisation

Derive both the age category label and its numeric code in one chain:

```{r adsl_agecat}
adsl3 <- data.table(
  USUBJID = c("S01", "S02", "S03", "S04", "S05"),
  AGE     = c(32, 45, 58, 71, 80)
)

ADSL3 <- data_step(adsl3,
  if_do(AGE <= 45,
    AGECAT  = "YOUNG",
    AGECATN = 1
  ),
  else_if_do(AGE <= 70,
    AGECAT  = "MIDDLE",
    AGECATN = 2
  ),
  else_do(
    AGECAT  = "OLD",
    AGECATN = 3
  )
)

print(ADSL3[, .(USUBJID, AGE, AGECAT, AGECATN)])
```

---

## Scenario 4: ADLB Laboratory Value Categorisation

A common ADaM derivation is to categorise lab values as LOW, NORMAL, or HIGH based on reference ranges, deriving the character and numeric category together:

```{r adlb_example}
adlb <- data.table(
  USUBJID  = c("S01", "S01", "S02", "S02", "S03"),
  LBTESTCD = c("ALB", "ALB", "ALB", "ALB", "ALB"),
  AVAL     = c(2.8, 4.2, 5.6, 3.5, 1.9),
  ANRLO    = c(3.5, 3.5, 3.5, 3.5, 3.5),
  ANRHI    = c(5.0, 5.0, 5.0, 5.0, 5.0)
)

ADLB <- data_step(adlb,
  if_do(LBTESTCD == "ALB" & AVAL < ANRLO,
    ALBCAT  = "LOW",
    ALBCATN = 1
  ),
  else_if_do(LBTESTCD == "ALB" & AVAL > ANRHI,
    ALBCAT  = "HIGH",
    ALBCATN = 2
  ),
  else_do(
    ALBCAT  = "NORMAL",
    ALBCATN = 3
  )
)

print(ADLB[, .(USUBJID, LBTESTCD, AVAL, ANRLO, ANRHI, ALBCAT, ALBCATN)])
```

`ALBCAT` and `ALBCATN` are always consistent: they are assigned in the same branch under the same condition, so they cannot diverge. Because `else_do()` catches every row that did not match an earlier branch, a dataset containing other test codes should be subset to ALB first; otherwise those rows would also be labelled NORMAL.

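
For independent QC, the same first-match-wins logic can be sketched in plain `data.table`. This is a cross-check under stated assumptions, not a description of how `sasif` works internally: the names `adlb_qc` and `branch` are made up for illustration, and the sample values are copied from the ADLB example above. The idea is to resolve each row to a single branch number with `fcase()`, then derive both variables from that one number.

```{r adlb_crosscheck}
library(data.table)

# Illustrative copy of the ADLB sample data (adlb_qc is a hypothetical name)
adlb_qc <- data.table(
  USUBJID  = c("S01", "S01", "S02", "S02", "S03"),
  LBTESTCD = "ALB",
  AVAL     = c(2.8, 4.2, 5.6, 3.5, 1.9),
  ANRLO    = 3.5,
  ANRHI    = 5.0
)

# fcase() returns the value paired with the first TRUE condition per row;
# rows where no condition is TRUE (including NA comparisons) get `default`
adlb_qc[, branch := fcase(
  LBTESTCD == "ALB" & AVAL < ANRLO, 1L,
  LBTESTCD == "ALB" & AVAL > ANRHI, 2L,
  default = 3L
)]

# Both variables come from the single branch number, so they stay in step
adlb_qc[, `:=`(
  ALBCAT  = c("LOW", "HIGH", "NORMAL")[branch],
  ALBCATN = branch
)]

print(adlb_qc[, .(USUBJID, AVAL, ALBCAT, ALBCATN)])
```

Comparing `ALBCAT` and `ALBCATN` from this sketch against the `data_step()` output is one way to double-program the derivation. In this sketch a missing `AVAL` lands in the default branch; whether that is the intended category is a study-level decision.
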

---

## Scenario 5: ADAE Treatment-Emergent Flag (TRTEMFL)

Flag adverse events that started on or after the treatment start date and on or before the treatment end date:

```{r adae_example}
adae <- data.table(
  USUBJID = c("S01", "S01", "S02", "S02", "S03"),
  AEDECOD = c("Headache", "Nausea", "Fatigue", "Dizziness", "Rash"),
  ASTDT   = as.Date(c("2024-01-15", "2023-12-01", "2024-01-20",
                      "2024-02-10", "2024-01-25")),
  TRTSDT  = as.Date(c("2024-01-10", "2024-01-10", "2024-01-15",
                      "2024-01-15", "2024-01-20")),
  TRTEDT  = as.Date(c("2024-06-10", "2024-06-10", "2024-06-15",
                      "2024-06-15", "2024-06-20"))
)

# The Nausea record for S01 starts before TRTSDT, so its condition is FALSE
ADAE <- data_step(adae,
  if_do(ASTDT >= TRTSDT & ASTDT <= TRTEDT,
    TRTEMFL = "Y",
    TRTEMA  = AEDECOD
  )
)

print(ADAE[, .(USUBJID, AEDECOD, ASTDT, TRTSDT, TRTEMFL)])
```

---

## Scenario 6: Removing Unwanted Records (DELETE)

Use `delete_if()` to remove rows explicitly. It mirrors the SAS `DELETE` statement and makes the intent clear in the code:

```{r delete_example}
adlb2 <- data.table(
  USUBJID  = c("S01", "S02", "S03", "S04", "S05"),
  LBTESTCD = c("ALB", NA, "ALB", "ALB", NA),
  VISIT    = c("WEEK 1", "WEEK 1", "UNSCHEDULED", "WEEK 2", "WEEK 4"),
  AVAL     = c(4.2, 3.8, 5.1, 4.0, 3.5)
)

ADLB2 <- data_step(adlb2,
  delete_if(is.na(LBTESTCD)),
  delete_if(VISIT == "UNSCHEDULED")
)

print(ADLB2)
```

Only records with valid test codes and scheduled visits are retained.

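
The two deletions above can be cross-checked with an ordinary `data.table` row filter. This is a sketch for QC purposes, assuming the deletes behave like SAS `DELETE` (a row is dropped as soon as any delete condition is TRUE); `adlb2_qc` and `kept` are hypothetical names, and the sample data is copied from the example above.

```{r delete_crosscheck}
library(data.table)

# Illustrative copy of the sample data (adlb2_qc is a hypothetical name)
adlb2_qc <- data.table(
  USUBJID  = c("S01", "S02", "S03", "S04", "S05"),
  LBTESTCD = c("ALB", NA, "ALB", "ALB", NA),
  VISIT    = c("WEEK 1", "WEEK 1", "UNSCHEDULED", "WEEK 2", "WEEK 4"),
  AVAL     = c(4.2, 3.8, 5.1, 4.0, 3.5)
)

# Keep a row only if neither delete condition holds.
# %in% is used instead of == so a missing VISIT yields FALSE rather than NA.
kept <- adlb2_qc[!(is.na(LBTESTCD) | VISIT %in% "UNSCHEDULED")]

print(kept)
```

`kept` should contain the same subjects as `ADLB2`; a mismatch would point to a difference in how missing values are treated.
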

---

## Scenario 7: Independent Flags (if_independent)

Use `if_independent()` when conditions are **not** mutually exclusive. Each condition is evaluated on its own, so multiple flags can apply to the same row:

```{r if_independent_example}
adsl4 <- data.table(
  USUBJID  = c("S01", "S02", "S03", "S04"),
  AGE      = c(30, 68, 45, 72),
  WEIGHTKG = c(48, 72, 55, 43),
  DIABFL   = c("N", "Y", "N", "Y")
)

ADSL4 <- data_step(adsl4,
  if_independent(AGE > 65,      SENIORFL = "Y"),
  if_independent(WEIGHTKG < 50, LOWWTFL  = "Y"),
  if_independent(DIABFL == "Y", COMORBFL = "Y")
)

print(ADSL4)
```

Subject S04 (age 72, weight 43, diabetic) receives all three flags because all three conditions are TRUE for that row.

---

## Key Principle: When to Use Which Function

| Situation | Use |
|-----------|-----|
| First matching condition should win | `if_do()` + `else_if_do()` + `else_do()` |
| Multiple conditions can apply to same row | `if_independent()` |
| Remove rows from dataset | `delete_if()` |

> **Important:** Do not mix `if_do()` chains with `if_independent()` on the
> same variable. `if_independent()` runs **after** the chain and will overwrite
> earlier assignments. Use one approach consistently per variable.

---

## Summary

`sasif` brings three key benefits to clinical R programming:

- **One condition, multiple assignments**: no repeated logic, so conditions cannot drift apart between variables
- **Familiar SAS syntax**: `IF / ELSE IF / ELSE` control flow that clinical programmers already know
- **data.table performance**: fully vectorized, no row loops, scales to millions of rows

For more information, see the [package documentation](https://chandrt23-lang.github.io/sasif/).