---
title: "MI Diagnostics and Pipeline Inspection"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{MI Diagnostics and Pipeline Inspection}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

The rbmi multiple imputation pipeline produces several intermediate objects --
draws, imputation, analysis, and pool -- each containing useful metadata about
the imputation model and results. **rbmiUtils** v0.3.0 adds tools to inspect
these objects and extract diagnostic statistics, making it easier to verify
that the MI pipeline behaved as expected.

This vignette covers three features:

- `describe_draws()`: structured metadata from draws objects (method, formula,
  samples, MCMC convergence)
- `describe_imputation()`: structured metadata from imputation objects (method,
  M, references, missingness breakdown)
- `pool_to_ard()` MI diagnostic enrichment: fraction of missing information
  (FMI), lambda, relative increase in variance (RIV), and other Rubin's rules
  diagnostics embedded in the ARD output

# Setup

We load the required packages and prepare data for the `pool_to_ard()`
diagnostic enrichment examples in Section 5. This setup uses `analyse_mi_data()`
which works directly with the pre-imputed `ADMI` dataset (containing an `IMPID`
column), so there is no need for `draws()` or `impute()`.

```{r setup-pipeline, message = FALSE, warning = FALSE}
library(rbmiUtils)
library(rbmi)
library(dplyr)

data("ADMI", package = "rbmiUtils")
ADMI <- ADMI |>
  mutate(
    TRT = factor(TRT, levels = c("Placebo", "Drug A")),
    USUBJID = factor(USUBJID),
    AVISIT = factor(AVISIT)
  )

vars <- set_vars(
  subjid = "USUBJID", visit = "AVISIT", group = "TRT",
  outcome = "CHG", covariates = c("BASE", "STRATA", "REGION")
)
method <- method_bayes(
  n_samples = 100,
  control = control_bayes(warmup = 200, thin = 5)
)

ana_obj <- analyse_mi_data(ADMI, vars, method, fun = ancova)
pool_obj <- pool(ana_obj)
```

# Inspecting Draws with `describe_draws()`

The `describe_draws()` function extracts structured metadata from an rbmi
`draws` object, providing a quick summary of the imputation model configuration
and, for Bayesian methods, MCMC convergence diagnostics.

The code below uses `ADEFF` data and defines its own `vars` and `method`
objects. We use `eval = FALSE` because `draws()` runs MCMC sampling, which
is too slow for vignette builds.

```{r describe-draws-code, eval = FALSE}
data("ADEFF", package = "rbmiUtils")
ADEFF <- ADEFF |>
  mutate(
    TRT = factor(TRT01P, levels = c("Placebo", "Drug A")),
    USUBJID = factor(USUBJID),
    AVISIT = factor(AVISIT, levels = c("Week 24", "Week 48"))
  )

vars <- set_vars(
  subjid = "USUBJID", visit = "AVISIT", group = "TRT",
  outcome = "CHG", covariates = c("BASE", "STRATA", "REGION")
)
method <- method_bayes(
  n_samples = 100,
  control = control_bayes(warmup = 200, thin = 2)
)

dat <- ADEFF |> select(USUBJID, STRATA, REGION, TRT, BASE, CHG, AVISIT)
draws_obj <- draws(data = dat, vars = vars, method = method)
desc <- describe_draws(draws_obj)
print(desc)
```

Example output from `describe_draws()`:

```
-- Draws Summary --
Method: Bayesian (MCMC via Stan)
Formula: CHG ~ 1 + BASE + STRATA + REGION + TRT + AVISIT + TRT:AVISIT
Samples: 100
Failures: 0
Covariance: us
Same covariance across groups: Yes
--
-- MCMC Convergence --
v All Rhat < 1.1 (42 parameters)
Max Rhat: 1.003
Min ESS: 245.2
```

The returned object is a list with programmatic access to all fields:

- `$method` -- human-readable method name (e.g., "Bayesian (MCMC via Stan)")
- `$method_class` -- raw class: `"bayes"`, `"approxbayes"`, or `"condmean"`
- `$formula` -- the deparsed model formula string
- `$n_samples` -- total number of samples drawn
- `$n_failures` -- number of failed samples
- `$mcmc` -- (Bayesian only) list with `rhat`, `ess`, `max_rhat`, `min_ess`,
  `n_params`, `converged`

# Inspecting Imputations with `describe_imputation()`

The `describe_imputation()` function extracts metadata from an rbmi
`imputation` object, including the method, number of imputations (M),
reference arm mappings, and a missingness breakdown by visit and treatment arm.

This section continues from the `draws_obj` created in the code above
(Section 3). Again, we use `eval = FALSE` because the pipeline requires MCMC.

```{r describe-imputation-code, eval = FALSE}
impute_obj <- impute(
  draws_obj,
  references = c("Placebo" = "Placebo", "Drug A" = "Placebo")
)
desc <- describe_imputation(impute_obj)
print(desc)
```

Example output from `describe_imputation()`:

```
-- Imputation Summary --
Method: Bayesian (MCMC via Stan)
Imputations (M): 100
Subjects: 200
--
-- References --
Placebo -> Placebo
Drug A -> Placebo
-- Missingness by Visit and Arm --
    visit   group n_total n_miss pct_miss
  Week 24 Placebo     100      8      8.0
  Week 24  Drug A     100     10     10.0
  Week 48 Placebo     100     15     15.0
  Week 48  Drug A     100     18     18.0
```

The returned object provides programmatic access to:

- `$method` -- human-readable method name
- `$n_imputations` -- number of imputations (M)
- `$n_subjects` -- total number of unique subjects
- `$references` -- named character vector of reference arm mappings (or `NULL`)
- `$missingness` -- a `data.frame` with columns `visit`, `group`, `n_total`,
  `n_miss`, `pct_miss`

# MI Diagnostic Statistics in ARD

The `pool_to_ard()` function converts a pool object to the pharmaverse Analysis
Results Dataset (ARD) format. When you also pass the `analysis_obj`, it enriches
the ARD with MI diagnostic statistics computed from Rubin's rules.

```{r ard-base-vs-enriched, eval = requireNamespace("cards", quietly = TRUE)}
# Base ARD (no diagnostics)
ard <- pool_to_ard(pool_obj)

# Enriched ARD with MI diagnostics
ard_enriched <- pool_to_ard(pool_obj, analysis_obj = ana_obj)
```

The enriched ARD includes additional rows for each parameter with diagnostic
statistics. We can filter and display them:

```{r ard-diagnostics, eval = requireNamespace("cards", quietly = TRUE)}
ard_enriched |>
  dplyr::filter(stat_name %in% c("fmi", "lambda", "riv", "df.adjusted", "re")) |>
  dplyr::select(group1_level, variable_level, stat_name, stat)
```

Each diagnostic statistic has a specific interpretation:

- **FMI** (fraction of missing information) -- the adjusted proportion of total
  sampling variance attributable to missing data, following the mice convention:
  `(riv + 2/(df + 3)) / (1 + riv)`
- **lambda** -- the proportion of total variance due to between-imputation
  variance (missingness)
- **RIV** (relative increase in variance) -- the ratio of between-imputation
  variance to within-imputation variance, scaled by `(1 + 1/M)`
- **df.adjusted** -- Barnard-Rubin adjusted degrees of freedom, which accounts
  for finite complete-data degrees of freedom
- **re** (relative efficiency) -- `1 / (1 + fmi/M)`, the efficiency of the MI
  estimator relative to an estimator with infinite imputations

# When Diagnostics Are Not Available

Non-Rubin pooling methods (e.g., conditional mean with jackknife) do not produce
MI diagnostic statistics because the variance decomposition does not apply.
When `pool_to_ard()` is called with an `analysis_obj` from a non-Rubin method,
it emits an informative message and omits diagnostic rows from the ARD.

The `describe_draws()` and `describe_imputation()` functions work with all
method types (Bayesian, approximate Bayesian, and conditional mean).

# Learn More

- [From rbmi Analysis to Regulatory Tables](pipeline.html) -- the full
  end-to-end pipeline vignette
- [`pool_to_ard()`](../reference/pool_to_ard.html) -- function documentation
  with ARD format details
- [`describe_draws()`](../reference/describe_draws.html) and
  [`describe_imputation()`](../reference/describe_imputation.html) -- function
  documentation with full field descriptions