---
title: "From rbmi Analysis to Regulatory Tables"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
vignette: >
  %\VignetteIndexEntry{From rbmi Analysis to Regulatory Tables}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
```

# Introduction

Clinical trial analyses that use reference-based multiple imputation typically involve several steps: fitting an imputation model, generating imputed datasets, applying an analysis to each dataset, and pooling results using Rubin's rules. The [rbmi](https://cran.r-project.org/package=rbmi) package implements this pipeline with its `draws()`, `impute()`, `analyse()`, and `pool()` functions. However, turning the pooled results into publication-ready outputs -- formatted tables and forest plots suitable for regulatory submissions -- requires additional work.

The **rbmiUtils** package bridges this gap. It provides utilities that sit on top of rbmi, transforming pooled analysis objects into tidy data frames, regulatory-style efficacy tables, and three-panel forest plots. It also includes data preparation helpers that catch common issues before the computationally expensive imputation step.

This vignette walks through the complete pipeline from raw clinical trial data to two publication-ready outputs: an efficacy summary table and a forest plot. By the end, you will have seen every major step in a typical rbmi + rbmiUtils workflow. For a more focused reference on the analysis functions, see the [analyse2 vignette](analyse2.html). For detailed guidance on data preparation, see the [data preparation vignette](data-preparation.html).

# Setup and Data

We begin by loading the packages we need. The core pipeline requires **rbmi** and **rbmiUtils**, with **dplyr** for data manipulation.
The reporting outputs use **gt** (for tables) and **ggplot2** + **patchwork** (for plots), which are optional dependencies.

```{r libraries, message = FALSE, warning = FALSE}
library(rbmi)
library(rbmiUtils)
library(dplyr)
```

The `ADEFF` dataset bundled with rbmiUtils is a simulated ADaM-style efficacy dataset from a two-arm clinical trial. It contains continuous change-from-baseline outcomes (`CHG`) measured at two visits (Week 24 and Week 48), with realistic missing data patterns across 500 subjects.

```{r load-data}
data("ADEFF", package = "rbmiUtils")
```

```{r inspect-data}
str(ADEFF)
```

Before entering the rbmi pipeline, we convert the key grouping columns to factors. Factor levels control the ordering of treatment arms and visits throughout the analysis, so it is important to set them explicitly.

```{r factor-prep}
ADEFF <- ADEFF |>
  mutate(
    TRT = factor(TRT01P, levels = c("Placebo", "Drug A")),
    USUBJID = factor(USUBJID),
    AVISIT = factor(AVISIT, levels = c("Week 24", "Week 48"))
  )
```

# Data Preparation

Before running the imputation model, it is worth investing a moment to validate the data and understand the missing data patterns. These checks can save considerable time by catching issues before the computationally intensive `draws()` step.

## Validation

The `validate_data()` function performs a comprehensive set of pre-flight checks: it verifies that all required columns exist, that factors are properly typed, that the outcome is numeric, that covariates have no missing values, and that there are no duplicate subject-visit rows.

```{r define-vars}
vars <- set_vars(
  subjid = "USUBJID",
  visit = "AVISIT",
  group = "TRT",
  outcome = "CHG",
  covariates = c("BASE", "STRATA", "REGION")
)
```

```{r validate}
validate_data(ADEFF, vars)
```

A successful validation returns `TRUE` silently. If issues are found, all problems are reported together in a single error message so you can fix them in one pass.
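To see what a failure looks like, you can deliberately break a copy of the data before calling `validate_data()`. This is a sketch only: it assumes the packages and `vars` defined above, and the data frame `bad` is purely illustrative.

```r
# Illustrative only: corrupt a copy of ADEFF so two checks fail
bad <- ADEFF
bad$CHG <- as.character(bad$CHG)  # outcome should be numeric
bad$BASE[1] <- NA                 # covariates should have no missing values

# try() lets the vignette continue; the combined error message is printed
try(validate_data(bad, vars))
```

Because all problems are collected into one message, a single round of fixes is usually enough before re-running the check.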
## Missingness Summary

The `summarise_missingness()` function characterises the missing data patterns in your dataset, classifying each subject as complete, monotone (dropout), or intermittent.

```{r missingness}
miss <- summarise_missingness(ADEFF, vars)
print(miss$summary)
```

This summary helps you decide on an appropriate imputation strategy. For datasets that require intercurrent event (ICE) handling, rbmiUtils also provides `prepare_data_ice()` to build the `data_ice` data frame from flag columns -- see the [data preparation vignette](data-preparation.html) for details.

# rbmi Analysis Pipeline

With the data validated, we now run the core rbmi pipeline. This consists of four steps: specifying the imputation method, fitting the imputation model, generating imputed datasets, and analysing each one.

## Specify the Imputation Method

The `method_bayes()` function configures Bayesian multiple imputation using MCMC sampling. Here we use a small number of samples and a short warmup period to keep the vignette build time manageable. In a real analysis, you would typically use more samples (e.g., `n_samples = 500` or more) for better precision. For details on the statistical methodology, see the [rbmi quickstart vignette](https://CRAN.R-project.org/package=rbmi/vignettes/quickstart.html).

```{r method}
set.seed(1974)
method <- method_bayes(
  n_samples = 100,
  control = control_bayes(warmup = 200, thin = 2)
)
```

## Fit the Imputation Model

The `draws()` function fits the Bayesian imputation model. This is the most computationally intensive step in the pipeline. We pass the dataset with only the columns needed for the model.

```{r draws, message = FALSE, warning = FALSE}
dat <- ADEFF |>
  select(USUBJID, STRATA, REGION, TRT, BASE, CHG, AVISIT)

draws_obj <- draws(data = dat, vars = vars, method = method)
```

## Generate Imputed Datasets

The `impute()` function generates complete datasets under the specified reference-based assumption.
Here we use a jump-to-reference approach where both arms are imputed under the reference (Placebo) distribution.

```{r impute}
impute_obj <- impute(
  draws_obj,
  references = c("Placebo" = "Placebo", "Drug A" = "Placebo")
)
```

## Analyse Each Imputed Dataset

Rather than calling `rbmi::analyse()` directly, rbmiUtils provides `analyse_mi_data()`, which wraps the analyse step to work with the stacked imputed data format. It applies an analysis function -- here, the built-in `ancova` function -- to each imputed dataset and stores the results for pooling.

First, we extract the imputed data into a stacked data frame using `get_imputed_data()`:

```{r get-imputed}
ADMI <- get_imputed_data(impute_obj)
```

Then we analyse:

```{r analyse}
ana_obj <- analyse_mi_data(
  data = ADMI,
  vars = vars,
  method = method,
  fun = ancova
)
```

## Pool Results

Finally, `pool()` combines the per-imputation results using [Rubin's rules](https://cran.r-project.org/package=rbmi) to produce a single set of estimates, standard errors, confidence intervals, and p-values.

```{r pool}
pool_obj <- pool(ana_obj)
print(pool_obj)
```

# Tidying Results

The pool object contains all the information we need, but its structure is not immediately convenient for reporting. The `tidy_pool_obj()` function converts it into a tidy tibble with clearly labelled columns.

```{r tidy}
tidy_df <- tidy_pool_obj(pool_obj)
print(tidy_df)
```

Each row represents one parameter at one visit. The key columns are:

- **parameter**: the raw parameter name from the pool object
- **parameter_type**: `"trt"` for treatment differences, `"lsm"` for least squares means
- **lsm_type**: `"ref"` or `"alt"` for LS mean rows
- **visit**: the visit name
- **est**, **se**, **lci**, **uci**, **pval**: the numeric results

This tidy format is the foundation for all downstream reporting. You can filter, reshape, or format it as needed for your specific tables.
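As a quick sketch of such downstream use, the treatment-difference rows can be pulled out with dplyr. This assumes the column names listed above (`parameter_type`, `est`, `se`, `lci`, `uci`, `pval`); `trt_rows` is an illustrative name.

```r
# Keep only the treatment-difference rows and round for display
trt_rows <- tidy_df |>
  filter(parameter_type == "trt") |>
  mutate(across(c(est, se, lci, uci), ~ round(.x, 2))) |>
  select(visit, est, lci, uci, pval)

trt_rows
```

The same pattern applies to the LS mean rows: filter on `parameter_type == "lsm"` and use `lsm_type` to distinguish the reference and alternative arms.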
# Efficacy Table

The `efficacy_table()` function takes the pool object and produces a regulatory-style gt table in the format commonly seen in ICH/CDISC Table 14.2.x submissions. It displays LS means by arm, treatment differences, confidence intervals, and p-values, organised by visit.

```{r efficacy-table-default, eval = requireNamespace("gt", quietly = TRUE)}
tbl <- efficacy_table(pool_obj)
tbl
```

You can customise the table with descriptive titles and treatment arm labels that match your study protocol:

```{r efficacy-table-custom, eval = requireNamespace("gt", quietly = TRUE)}
tbl_custom <- efficacy_table(
  pool_obj,
  title = "Table 14.2.1: ANCOVA of Change from Baseline",
  subtitle = "Reference-Based Multiple Imputation (Jump to Reference)",
  arm_labels = c(ref = "Placebo", alt = "Drug A")
)
tbl_custom
```

The returned object is a standard gt table, so you can apply any gt customisation on top. For example, you could add footnotes, adjust column widths, or change the table styling using `gt::tab_options()`.

# Forest Plot

The `plot_forest()` function creates a three-panel forest plot: a text panel with visit labels and formatted estimates, a graphical panel with point estimates and confidence interval whiskers, and a p-value panel.

## Treatment Difference Mode

The default `display = "trt"` mode shows treatment differences across visits, with a vertical reference line at zero. Filled circles indicate visits where the confidence interval excludes zero (statistically significant); open circles indicate non-significant results.

```{r forest-trt, fig.width = 10, fig.height = 3, eval = requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("patchwork", quietly = TRUE)}
p <- plot_forest(
  pool_obj,
  title = "Treatment Effect: Change from Baseline (Drug A vs Placebo)"
)
p
```

## LS Mean Display Mode

The `display = "lsm"` mode shows the LS mean estimates for each treatment arm, colour-coded using the Okabe-Ito colourblind-friendly palette.
```{r forest-lsm, fig.width = 10, fig.height = 4, eval = requireNamespace("ggplot2", quietly = TRUE) && requireNamespace("patchwork", quietly = TRUE)}
p_lsm <- plot_forest(
  pool_obj,
  display = "lsm",
  arm_labels = c(ref = "Placebo", alt = "Drug A"),
  title = "LS Mean Estimates by Visit"
)
p_lsm
```

Both plot modes return a patchwork object that you can further customise using the `&` operator, which applies a theme to every panel. For example, to increase the text size across all panels:

```r
plot_forest(pool_obj) &
  ggplot2::theme(text = ggplot2::element_text(size = 14))
```

# Binary/Responder Analysis

For binary responder endpoints using g-computation, see the [deriving endpoints vignette](deriving-endpoints.html), which demonstrates how to define responder thresholds and analyse them using the same reporting functions (`efficacy_table()`, `plot_forest()`).