---
title: "Tabulation"
date: "2025-06-17"
output:
    rmarkdown::html_document:
        theme: "spacelab"
        highlight: "kate"
        toc: true
        toc_float: true
vignette: >
  %\VignetteIndexEntry{Tabulation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
editor_options:
  markdown:
      wrap: 72
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>"
)
```


## `junco` Tabulation

The `junco` R package provides functions to create analyses for clinical trials in `R`.
It is considered as an add-on/alternative to the `tern` package, which is the first package available in the `NEST` framework for analysis functions with main focus for clinical trials.
The core functionality for tabulation is built on the more general purpose `rtables` package.
New users should first begin by reading the ["Introduction to `tern`"](https://insightsengineering.github.io/tern/latest-tag/articles/tern.html) and ["Introduction to `rtables`"](https://insightsengineering.github.io/rtables/latest-release/articles/rtables.html) vignettes.

The packages used in this vignette are:

```{r load_packages, message=FALSE}
library(rtables)
library(junco)
library(tern)
library(dplyr)
```

The datasets used in this vignette are:

```{r load_datasets, message=FALSE}
adsl <- ex_adsl
adae <- ex_adae
advs <- ex_advs
```

Some common data manipulation on/from these datasets, such as adding extra variables and defining some tabulation settings, like treatment variable, control group for the table.

```{r data_manipulation, message=FALSE}
trtvar <- "ARM"
ctrl_grp <- "B: Placebo"

non_ctrl_grp <- setdiff(levels(adsl[[trtvar]]), ctrl_grp)

adsl$colspan_trt <- factor(ifelse(adsl[[trtvar]] == ctrl_grp, " ", "Active Study Agent"),
  levels = c("Active Study Agent", " ")
)

adsl$rrisk_header <- "Risk Difference (%) (95% CI)"
adsl$rrisk_label <- paste(adsl[[trtvar]], paste("vs", ctrl_grp))
adsl$rrisk_header_vs <- "Difference in Mean Change (95% CI)"

# define colspan_trt_map
colspan_trt_map <- create_colspan_map(adsl,
  non_active_grp = ctrl_grp,
  non_active_grp_span_lbl = " ",
  active_grp_span_lbl = "Active Study Agent",
  colspan_var = "colspan_trt",
  trt_var = trtvar
)

# define reference group specification
ref_path <- c("colspan_trt", " ", trtvar, ctrl_grp)

adae[["TRTEMFL"]] <- "Y"

# add adsl variables to adae
adae <- adae %>% left_join(., adsl)
advs <- advs %>% left_join(., adsl)

advs[advs[["ABLFL"]] == "Y", "CHG"] <- NA
```

## `junco` Analysis Functions 

The `junco` analysis functions are used in combination with the `rtables` layout functions, `rtables::analyze` and `rtables::summarize_row_groups`, in the pipeline which creates the `rtables` table.
They apply some statistical logic to the layout of the `rtables` table.
The table layout is materialized with the `rtables::build_table` function and the data.

The `junco` analysis functions are functions that can be applied as an `afun` in either `rtables::analyze` or as a `cfun` in  `rtables::summarize_row_groups` function. This is a slightly different approach to `tern`, where the table layout is constructed using `analyze` functions, which are wrappers around `rtables::analyze`.

Just like `tern` analyze functions, the `junco` analysis functions offer various methods useful from the perspective of clinical trials and other statistical projects.

Examples of the `junco` analysis functions are `a_freq_j`, `a_freq_subcol_j`, `a_summarize_aval_chg_diff_j` or  `a_summarize_ex_j`.

A complete list of analysis functions can be found in [the junco website functions reference](../reference/index.html).

## Tabulation Examples using `a_freq_j`

We present an example usage of `a_freq_j` for the very common AE table in clinical trials.

The standard table of adverse events is a summary by system organ class and preferred term. For frequency counts by preferred term, if there are multiple occurrences of the same AE in an individual we count them only once.

With `junco` package, this table can be created with several calls to the same `a_freq_j` function in a tabulation pipeline using `rtables::analyze` and `rtables::summarize_row_groups`.

In next paragraph, we'll describe the difference versus using `tern` package, for similar AE table. Differences in how to define the layout as well as differences in the resulting table.


```{r extra_args_definition}
## extra args for a_freq_j controlling specification of
## reference group, denominator used to calculate percentages,
## and other details
extra_args_rr <- list(
  denom = "n_altdf",
  riskdiff = TRUE,
  ref_path = ref_path,
  method = "wald",
  .stats = c("count_unique_fraction")
)
```


Using `junco` analysis function `a_freq_j` we define the layout, using 2 `analyze` calls (for overall summary - TRTEMFL and preferred term - AEDECOD), and 1 `summarize_row_groups` call (for system organ class - AEBODSYS)

Note that the same specifications, `extra_args_rr` can be re-used in the first and second `analyze` call, as well as in the in-between call to `summarize_row_groups`.

For the overall summary, we add as extra specification to restrict to TRTEMFL = "Y" values and the label to show in the row.

```{r layout_definition, eval = TRUE}
lyt <- basic_table(
  top_level_section_div = " ",
  colcount_format = "N=xx"
) %>%
  ## main columns
  split_cols_by("colspan_trt", split_fun = trim_levels_to_map(map = colspan_trt_map)) %>%
  split_cols_by(trtvar, show_colcounts = TRUE) %>%
  ## risk diff columns, note nested = FALSE
  split_cols_by("rrisk_header", nested = FALSE) %>%
  split_cols_by(trtvar,
    labels_var = "rrisk_label", split_fun = remove_split_levels(ctrl_grp),
    show_colcounts = FALSE
  ) %>%
  analyze("TRTEMFL",
    afun = a_freq_j,
    extra_args = append(extra_args_rr, list(val = "Y", label = "Number of subjects with AE"))
  ) %>%
  split_rows_by("AEBODSYS",
    split_label = "System Organ Class",
    split_fun = trim_levels_in_group("AEDECOD"),
    label_pos = "topleft",
    section_div = c(" "),
    nested = FALSE
  ) %>%
  summarize_row_groups("AEBODSYS",
    cfun = a_freq_j,
    extra_args = extra_args_rr
  ) %>%
  analyze("AEDECOD",
    afun = a_freq_j,
    extra_args = extra_args_rr
  )
```


Now we just build and view the resulting table.
```{r build_table, eval = TRUE}
tbl <- build_table(lyt, adae, alt_counts_df = adsl)
head(tbl, 10)
```

### Minor differences between current table and a similar table using `tern` analyze functions.

Here is a similar table generated with the tern functions (`summarize_occurrences` and `count_occurrences`), as close as possible to our target AE table we produced before in `tbl`.
```{r tern_layout_and_table}
lyt_tern <- basic_table(
  top_level_section_div = " ",
  colcount_format = "N=xx"
) %>%
  ## main columns
  split_cols_by(trtvar,
    show_colcounts = TRUE,
    split_fun = add_riskdiff(arm_x = ctrl_grp, arm_y = non_ctrl_grp)
  ) %>%
  analyze_num_patients(
    vars = "USUBJID",
    .stats = c("unique"),
    .labels = c(
      unique = "Total number of patients with at least one adverse event"
    ),
    riskdiff = TRUE
  ) %>%
  split_rows_by("AEBODSYS",
    split_label = "System Organ Class",
    split_fun = trim_levels_in_group("AEDECOD"),
    label_pos = "topleft",
    section_div = c(" "),
    nested = FALSE
  ) %>%
  summarize_occurrences(
    var = "AEBODSYS",
    denom = "N_col",
    riskdiff = TRUE,
    .stats = c("count_fraction")
  ) %>%
  count_occurrences(
    vars = "AEDECOD",
    denom = "N_col",
    riskdiff = TRUE,
    .stats = c("count_fraction")
  )


tbl_tern <- build_table(lyt_tern, adae, alt_counts_df = adsl)
head(tbl_tern, 10)
```
The main differences in the resulting table are 

* the comparison against the reference group is reversed (ie B: Placebo vs. A: Drug X	 instead of A: Drug X vs. B: Placebo).   
* the extra column spanner for active treatment group is not present in the `tern` version, and cannot be added.  


### a_freq_j supports various methods for the risk difference column.
One of the statistical options to control with the usage of `a_freq_j` is the method for the risk difference calculation for the risk difference columns. All methods available in ´tern::estimate_proportion_diff´ can be supported. Eg, switching to `waldcc`, or others can be done.
There is no option to switch methods using tern `count_occurrences` layouts, only `wald` method is available.

```{r method_switching, eval = TRUE} 
extra_args_rr[["method"]] <- "waldcc"

tbl2 <- basic_table(
  top_level_section_div = " ",
  colcount_format = "N=xx"
) %>%
  ## main columns
  split_cols_by("colspan_trt", split_fun = trim_levels_to_map(map = colspan_trt_map)) %>%
  split_cols_by(trtvar, show_colcounts = TRUE) %>%
  ## risk diff columns, note nested = FALSE
  split_cols_by("rrisk_header", nested = FALSE) %>%
  split_cols_by(trtvar,
    labels_var = "rrisk_label", split_fun = remove_split_levels(ctrl_grp),
    show_colcounts = FALSE
  ) %>%
  split_rows_by("AEBODSYS",
    split_label = "System Organ Class",
    split_fun = trim_levels_in_group("AEDECOD"),
    label_pos = "topleft",
    section_div = c(" "),
    nested = FALSE
  ) %>%
  summarize_row_groups("AEBODSYS",
    cfun = a_freq_j,
    extra_args = extra_args_rr
  ) %>%
  analyze("AEDECOD",
    afun = a_freq_j,
    extra_args = extra_args_rr
  ) %>%
  build_table(adae, alt_counts_df = adsl)

head(tbl2, 10)
```

### Creation of Subgroup tables with `a_freq_j`
The junco function `a_freq_j` also supports the creation of subgroup tables, which is demonstrated in below table. 

```{r subgroup_tables, eval = TRUE} 
extra_args_rr_common <- list(
  denom = "n_altdf",
  denom_by = "SEX"
)

extra_args_rr <- append(
  extra_args_rr_common,
  list(
    riskdiff = FALSE,
    extrablankline = TRUE,
    .stats = c("n_altdf"),
    label_fstr = "Gender: %s"
  )
)

extra_args_rr2 <- append(
  extra_args_rr_common,
  list(
    riskdiff = TRUE,
    ref_path = ref_path,
    method = "wald",
    .stats = c("count_unique_denom_fraction"),
    na_str = rep("NA", 3)
  )
)

tbl <- basic_table(
  top_level_section_div = " ",
  colcount_format = "N=xx"
) %>%
  ## main columns
  split_cols_by("colspan_trt", split_fun = trim_levels_to_map(map = colspan_trt_map)) %>%
  split_cols_by(trtvar, show_colcounts = TRUE) %>%
  ## risk diff columns, note nested = FALSE
  split_cols_by("rrisk_header", nested = FALSE) %>%
  split_cols_by(trtvar,
    labels_var = "rrisk_label", split_fun = remove_split_levels(ctrl_grp),
    show_colcounts = FALSE
  ) %>%
  split_rows_by("SEX", split_fun = drop_split_levels) %>%
  summarize_row_groups("SEX",
    cfun = a_freq_j,
    extra_args = extra_args_rr
  ) %>%
  split_rows_by("TRTEMFL",
    split_fun = keep_split_levels("Y"),
    indent_mod = -1L,
    section_div = c(" ")
  ) %>%
  summarize_row_groups("TRTEMFL",
    cfun = a_freq_j,
    extra_args = append(extra_args_rr2, list(label = "Subjects with >=1 AE", extrablankline = TRUE))
  ) %>%
  split_rows_by("AEBODSYS",
    split_label = "System Organ Class",
    split_fun = trim_levels_in_group("AEDECOD"),
    label_pos = "topleft",
    section_div = c(" "),
    nested = TRUE
  ) %>%
  summarize_row_groups("AEBODSYS",
    cfun = a_freq_j,
    extra_args = extra_args_rr2
  ) %>%
  analyze("AEDECOD",
    afun = a_freq_j,
    extra_args = extra_args_rr2
  ) %>%
  build_table(adae, alt_counts_df = adsl)

head(tbl, 30)
```


This table cannot be generated using the tern `count_occurrences` methods.

## Other junco features : Extra Statistics have been added to some tern statistical functions 
```{r extra_statistics, eval = TRUE} 
tern::get_stats("summarize_ancova")
tern::get_stats("analyze_vars_numeric")
```

In junco, we have added extra stats for

* ancova: se for lsmean, combined lsmean-CI (3d stat), combined lsmean_diff-CI (3d stat)
* analyze_vars_numeric: combined version of mean-CI, median-CI, geom_mean-CI (available in tern >= 0.9.6)
* similar extra stats for Kaplan-Meier/survival and other methods

## Tabulation Examples using `a_summarize_aval_chg_diff_j`

```{r summarize_aval_chg_diff, eval = TRUE} 
multivars <- c("AVAL", "AVAL", "CHG")

extra_args_3col <- list(
  format_na_str = rep(NA, 3),
  ref_path = ref_path,
  ancova = FALSE,
  comp_btw_group = TRUE,
  multivars = multivars
)


lyt_vs_p1 <- basic_table(
  show_colcounts = FALSE,
  colcount_format = "N=xx"
) %>%
  ### first columns
  split_cols_by("colspan_trt",
    split_fun = trim_levels_to_map(map = colspan_trt_map),
    show_colcounts = FALSE
  ) %>%
  split_cols_by(trtvar,
    show_colcounts = TRUE, colcount_format = "N=xx"
  ) %>%
  ## set up a 3 column split
  split_cols_by_multivar(multivars,
    varlabels = c("n/N (%)", "Mean (95% CI)", "Mean Change From Baseline (95% CI)")
  ) %>%
  split_rows_by("PARAM",
    label_pos = "topleft",
    split_label = "Parameter",
    section_div = " ",
    split_fun = drop_split_levels
  ) %>%
  ## note the child_labels = hidden for AVISIT, these labels will be taken care off by
  ## applying function summarize_aval_chg_diff further in the layout
  split_rows_by("AVISIT",
    label_pos = "topleft",
    split_label = "Study Visit",
    split_fun = drop_split_levels,
    child_labels = "hidden"
  )


lyt_vs <- lyt_vs_p1 %>%
  ### restart for the rrisk_header columns - note the nested = FALSE option
  ### also note the child_labels = "hidden" in both PARAM and AVISIT
  split_cols_by("rrisk_header_vs", nested = FALSE) %>%
  split_cols_by(trtvar,
    split_fun = remove_split_levels(ctrl_grp),
    labels_var = "rrisk_label",
    show_colcounts = TRUE,
    colcount_format = "N=xx"
  ) %>%
  ### difference columns : just 1 column & analysis needs to be done on change
  split_cols_by_multivar(multivars[3],
    varlabels = c(" ")
  ) %>%
  ### the variable passed here in analyze is not used (STUDYID), it is a dummy var passing,
  ### the function a_summarize_aval_chg_diff_j grabs the required vars from cols_by_multivar calls
  analyze("STUDYID",
    afun = a_summarize_aval_chg_diff_j,
    extra_args = extra_args_3col
  )

result_vs <- build_table(lyt_vs, advs, alt_counts_df = adsl)
```

This is the resulting table.
```{r display_result_vs, eval = TRUE} 
head(result_vs, 15)
```

The columns for the comparison between reference group is optional. The same table without these extra columns can be produced by specifying the argument `comp_btw_group` = `FALSE` and leaving out the 3 `split_cols_by` calls

* split_cols_by("rrisk_header_vs") 
* split_cols_by(trtvar) 
* split_cols_by_multivar(multivars[3]) 


```{r alternative_table, eval = TRUE}
multivars <- c("AVAL", "AVAL", "CHG")

extra_args_3col <- list(
  format_na_str = rep(NA, 3),
  ancova = FALSE,
  comp_btw_group = FALSE,
  multivars = multivars
)

lyt_vs2 <- lyt_vs_p1 %>%
  ### the variable passed here in analyze is not used (STUDYID), it is a dummy var passing,
  ### the function a_summarize_aval_chg_diff_j grabs the required vars from cols_by_multivar calls
  analyze("STUDYID",
    afun = a_summarize_aval_chg_diff_j,
    extra_args = extra_args_3col
  )

result_vs2 <- build_table(lyt_vs2, advs, alt_counts_df = adsl)
```

This is the resulting table without the difference between treatment group columns.
```{r display_alternative_table, eval = TRUE}
head(result_vs2, 15)
```