--- title: "STC and Naive Benchmarks" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{STC and Naive Benchmarks} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE, purl = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE, message = FALSE, warning = FALSE ) ``` ## Overview In addition to ML-UMR, mlumr provides two frequentist methods: - **STC** (Simulated Treatment Comparison): adjusts for cross-trial covariate differences using parametric G-computation - **Naive**: unadjusted comparison of crude outcome summaries These serve as important benchmarks alongside the Bayesian ML-UMR approach. ## Naive estimate For binary outcomes, the naive method compares crude event rates without any covariate adjustment: $$ \text{LOR}_{\text{naive}} = \text{logit}(\hat{p}_A) - \text{logit}(\hat{p}_B) $$ where $\hat{p}_A = \bar{Y}_{\text{IPD}}$ and $\hat{p}_B = r_B / n_B$ from the AgD. The standard error is computed via the delta method: $$ \text{SE} = \sqrt{\frac{1}{n_A \hat{p}_A(1-\hat{p}_A)} + \frac{1}{n_B \hat{p}_B(1-\hat{p}_B)}} $$ ### Usage The chunks below operate on a small toy `dat`. Run this setup once first so `naive()`, `stc()`, and the comparison block all have data to work with. ```{r, eval = TRUE, purl = FALSE} library(mlumr) set.seed(2026) trial_a_data <- data.frame( trt = "Drug_A", response = rbinom(300, 1, 0.55), age_cat = rbinom(300, 1, 0.40), sex = rbinom(300, 1, 0.55) ) trial_b_data <- data.frame( trt = "Drug_B", n_total = 400, n_events = 160, age_cat_mean = 0.35, sex_mean = 0.50 ) ipd <- set_ipd(trial_a_data, treatment = "trt", outcome = "response", covariates = c("age_cat", "sex")) agd <- set_agd(trial_b_data, treatment = "trt", outcome_n = "n_total", outcome_r = "n_events", cov_means = c("age_cat_mean", "sex_mean"), cov_types = c("binary", "binary")) ``` ```{r, eval = TRUE, purl = FALSE} # Prepare data (no integration points needed) dat <- combine_data(ipd, agd) result <- naive(dat) print(result) ``` ### Interpreting the naive estimate The naive LOR is **biased** when covariate distributions differ between the IPD and AgD populations. It provides a useful reference point: - If the naive and adjusted estimates agree, covariate imbalance has little impact. - If they disagree substantially, adjustment is important and the naive estimate should not be used for inference. ### Extreme rates The function includes continuity-correction-style boundary guards for extreme proportions (0% or 100% event rates), ensuring finite log odds ratios even at the boundaries. Event-probability confidence intervals use Wald standard errors and are bounded to `[0, 1]`. ## Simulated Treatment Comparison (STC) STC uses parametric G-computation to adjust for covariate differences: 1. **Fit an outcome model** on IPD: $\text{logit}(P(Y_i = 1)) = \alpha + \mathbf{x}_i^T \boldsymbol{\beta}$ 2. **Predict** counterfactual outcomes for the comparator population 3. **Marginalize**: $\hat{p}_{A|B} = \frac{1}{N} \sum_j \hat{P}(Y = 1 | \mathbf{x}_j)$ 4. **Compute LOR**: $\text{LOR} = \text{logit}(\hat{p}_{A|B}) - \text{logit}(\hat{p}_B)$ The standard error uses the delta method, propagating parameter uncertainty through the logit-of-mean transformation. ### Usage ```{r, eval = TRUE, purl = FALSE} # Without integration points (uses AgD means) result_stc <- stc(dat) # With integration points (better marginalization) dat <- add_integration(dat, n_int = 64, age_cat = distr(qbern, prob = age_cat_mean), sex = distr(qbern, prob = sex_mean)) result_stc <- stc(dat) print(result_stc) ``` ### With vs without integration points - **Without integration points**: STC predicts only at the AgD covariate **means** (a single point). This can introduce bias for non-linear models (Jensen's inequality). - **With integration points**: STC marginalizes over the full covariate distribution, giving a more accurate estimate. We recommend always using integration points. For binary outcomes, event-probability confidence intervals are bounded to `[0, 1]`. For Poisson outcomes, STC uses a 0.5 continuity correction for the comparator log rate when the AgD event count is zero; the reported comparator rate remains the observed rate. ### Accessing the GLM fit The underlying logistic regression model is stored in the result: ```{r, eval = TRUE, purl = FALSE} # Coefficients coef(result_stc$glm_fit) # Full GLM summary summary(result_stc$glm_fit) # Predicted probabilities fitted(result_stc$glm_fit) ``` ## Delta method variance The delta method SE for STC accounts for uncertainty in: 1. **Outcome model parameters** (via the GLM's variance-covariance matrix) 2. **AgD event rate** (binomial sampling variance) The gradient of $\text{logit}(\text{mean}(\sigma(\mathbf{X}\boldsymbol{\beta})))$ with respect to $\boldsymbol{\beta}$ is computed analytically: $$ \frac{\partial}{\partial \boldsymbol{\beta}} \text{logit}\left(\frac{1}{N}\sum_j \sigma(\mathbf{x}_j^T\boldsymbol{\beta})\right) = \frac{1}{\bar{p}(1-\bar{p})} \cdot \frac{1}{N} \sum_j \sigma'(\eta_j) \mathbf{x}_j $$ where $\sigma(\eta) = 1/(1 + e^{-\eta})$ and $\sigma'(\eta) = \sigma(\eta)(1 - \sigma(\eta))$. ## Continuous and count outcomes The same `naive()` and `stc()` functions support normal and Poisson outcomes. For normal outcomes, `naive()` compares the IPD mean against the inverse- variance weighted AgD mean, while `stc()` fits a Gaussian GLM and G-computes the index-treatment mean in the comparator population. For Poisson outcomes, both methods report log rate ratios; zero observed event counts use a 0.5 continuity correction on the log-rate scale so estimates remain finite. ```{r, eval = TRUE, purl = FALSE} # Normal-family benchmark ipd_normal <- set_ipd( data.frame( trt = "Drug_A", score = rnorm(120, mean = 3.0, sd = 1.0), age_cat = rbinom(120, 1, 0.40) ), treatment = "trt", outcome = "score", covariates = "age_cat", family = "normal" ) agd_normal <- set_agd( data.frame(trt = "Drug_B", y_mean = 2.7, se = 0.12, age_cat_mean = 0.35), treatment = "trt", family = "normal", outcome_mean = "y_mean", outcome_se = "se", cov_means = "age_cat_mean", cov_types = "binary" ) dat_normal <- combine_data(ipd_normal, agd_normal) naive(dat_normal) stc(dat_normal) ``` ```{r, eval = TRUE, purl = FALSE} # Poisson-family benchmark exposure <- runif(120, 0.5, 2.0) ipd_poisson <- set_ipd( data.frame( trt = "Drug_A", events = rpois(120, exp(0.2) * exposure), person_years = exposure, age_cat = rbinom(120, 1, 0.40) ), treatment = "trt", outcome = "events", covariates = "age_cat", family = "poisson", exposure = "person_years" ) agd_poisson <- set_agd( data.frame(trt = "Drug_B", n_events = 40, person_years = 180, age_cat_mean = 0.35), treatment = "trt", family = "poisson", outcome_r = "n_events", outcome_E = "person_years", cov_means = "age_cat_mean", cov_types = "binary" ) dat_poisson <- combine_data(ipd_poisson, agd_poisson) naive(dat_poisson) stc(dat_poisson) ``` ## Comparing all methods ```{r, eval = TRUE, purl = FALSE} # Fit all three methods naive_result <- naive(dat) stc_result <- stc(dat) mlumr_result <- mlumr( dat, model = "spfa", chains = 2, iter = 500, warmup = 250, seed = 42, refresh = 0, verbose = FALSE ) # Extract LORs for comparison le_naive <- naive_result$link_effect le_stc <- stc_result$link_effect lor_mlumr <- marginal_effects(mlumr_result, effect = "lor", population = "comparator") cat("Method comparison (LOR in comparator population):\n") cat(sprintf(" Naive: %.3f [%.3f, %.3f]\n", naive_result$link_effect, naive_result$ci_lower, naive_result$ci_upper)) cat(sprintf(" STC: %.3f [%.3f, %.3f]\n", stc_result$link_effect, stc_result$ci_lower, stc_result$ci_upper)) cat(sprintf(" ML-UMR: %.3f [%.3f, %.3f]\n", lor_mlumr$mean, lor_mlumr$q2.5, lor_mlumr$q97.5)) ``` ## Confidence level Both `naive()` and `stc()` accept a `conf_level` parameter: ```{r, eval = TRUE, purl = FALSE} # 90% confidence intervals naive_90 <- naive(dat, conf_level = 0.90) stc_90 <- stc(dat, conf_level = 0.90) print(naive_90) print(stc_90) ``` ## Key differences from ML-UMR | Feature | STC | Naive | ML-UMR | |---------|-----|-------|--------| | Covariate adjustment | Outcome model | None | Joint model | | Population weighting | G-computation | None | QMC integration | | Uncertainty | Delta method | Delta method | Posterior | | Effect modification | Not captured | N/A | Relaxed model | | Speed | Instant | Instant | Minutes | | No. parameters | p+1 | 0 | 2+p (SPFA) or 2+2p (Relaxed) | STC is faster but makes stronger modeling assumptions. ML-UMR jointly models both data sources and naturally propagates all sources of uncertainty through the posterior.