---
title: "OnlineSurr: Fitting marginal/conditional models and computing LPTE/CPTE"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
bibliography: '`r system.file("REFERENCES.bib", package="OnlineSurr")`'
csl: '`r system.file("apalike.csl", package="OnlineSurr")`'
vignette: >
  %\VignetteIndexEntry{OnlineSurr: Fitting marginal/conditional models and computing LPTE/CPTE}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE
)
```

This vignette demonstrates the main workflow of the `OnlineSurr` package:

1. Prepare a longitudinal dataset with equally-spaced measurement times.
2. Fit the marginal and conditional models with `fit.surr()`.
3. Summarize results with `summary()`, visualize with `plot()`.
4. Test time-homogeneity with `time_homo_test()`.

The package returns a fitted object of class `fitted_onlinesurr` that stores point estimates and bootstrap draws for treatment-effect trajectories and PTE-based summaries.

# Data requirements and conventions

`fit.surr()` expects data in **long format** with one row per subject-time measurement.

Key requirements enforced by the code:

- `id` identifies subjects; there must be **at most one observation per subject-time** combination.
- `treat` indicates treatment assignment; it is coerced to a factor and is intended to represent **two treatment levels**.
- `time` must be **numeric and equally spaced** across observed time points. If `time` is omitted, the function creates a within-subject index `Time` assuming the data are already ordered and equally spaced.
- The surrogate design must not make treatment a linear combination of surrogate terms; otherwise the conditional model is not identifiable.

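
The requirements above can be checked before calling `fit.surr()`. The following is a minimal base-R sketch, not part of the package: the data frame `toy` and its values are made up for illustration, and the checks only approximate what `fit.surr()` enforces internally.

```{r eval=FALSE}
# Hypothetical toy data in long format. The column names mirror the example
# data used in this vignette (id, trt, time, y, s); the values are made up.
toy <- data.frame(
  id   = rep(1:3, each = 4),
  trt  = rep(c(0, 1, 0), each = 4),
  time = rep(c(0, 2, 4, 6), times = 3),
  y    = seq_len(12) / 10,
  s    = seq_len(12) / 20
)

# Check 1: at most one row per subject-time combination.
has_duplicates <- anyDuplicated(toy[, c("id", "time")]) > 0

# Check 2: time is numeric and the observed time points are equally spaced.
time_points <- sort(unique(toy$time))
gaps <- diff(time_points)
equally_spaced <- is.numeric(toy$time) &&
  (length(gaps) < 2 || all(abs(gaps - gaps[1]) < 1e-8))

# Check 3: the treatment variable has exactly two levels once made a factor.
two_arms <- nlevels(factor(toy$trt)) == 2

# Stop early with an error if any requirement is violated.
stopifnot(!has_duplicates, equally_spaced, two_arms)
```

If any check fails, `stopifnot()` raises an error naming the failing condition, which is usually easier to act on than an error raised deep inside model fitting.
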
# Package functions used in this vignette

- `fit.surr()` fits two models and stores bootstrap draws for the corresponding fixed-effect parameters:
    - a *marginal* model producing total treatment effects $\Delta(t)$
    - a *conditional* model (given the surrogate) producing residual treatment effects $\Delta_R(t)$
- `plot.fitted_onlinesurr()` plots:
    - Local PTE: $\text{LPTE}(t) = 1 - \Delta_R(t)/\Delta(t)$
    - Cumulative PTE: $\text{CPTE}(t) = 1 - \sum_{h\le t} \Delta_R(h) / \sum_{h\le t} \Delta(h)$
    - Treatment effects $\Delta(t)$ and $\Delta_R(t)$
- `time_homo_test()` tests the hypothesis that the PTE is constant over time (implemented via a max-type statistic and a Monte Carlo approximation of the null distribution).

## Fitting the models with `fit.surr()`

The main arguments of `fit.surr()` are:

- `formula`: outcome mean model. The function will internally add treatment-by-time fixed effects.
- `id`: subject identifier (unquoted).
- `treat`: treatment variable (unquoted).
- `surrogate`: surrogate structure (as a formula or a string).
- `data`: the long-format data frame containing the variables above.
- `time`: numeric time variable (unquoted).

```{r eval=TRUE}
library(OnlineSurr)

head(sim_onlinesurr)

fit <- fit.surr(
  formula = y ~ 1, # baseline fixed effects; trt*time terms added internally
  id = id,
  surrogate = ~s, # surrogate structure
  treat = trt,
  data = sim_onlinesurr,
  time = time,
  N.boots = 2000, # bootstrap draws stored in the fitted object
  verbose = 0 # hide progress
)
```

The formulas for the fixed effects and the surrogate structure accept any temporal structure available in the `kDGLM` package (see its vignette for details). Functions that transform the data are also supported. In particular, we provide the `lagged` function, which computes lagged values of its arguments and can be included in a model formula to account for delayed or lingering effects of a predictor over time.

We also provide the `s` function, which generates a spline basis for a numeric variable and can be used to model smooth, potentially non-linear effects without having to specify the basis expansion manually.

```{r eval=FALSE}
library(OnlineSurr)

fit <- fit.surr(
  formula = y ~ 1, # baseline fixed effects; trt*time terms added internally
  id = id,
  surrogate = ~ s(s) + s(lagged(s, 1)) + s(lagged(s, 2)), # surrogate structure
  treat = trt,
  data = sim_onlinesurr,
  time = time,
  verbose = 0 # hide progress
)
```

### What `fit.surr()` stores

The returned object has class `fitted_onlinesurr` and is a list with (at least):

- `fit$T`: number of time points
- `fit$N`: number of subjects
- `fit$n.fixed`: number of fixed-effect coefficients per subject design (reference size)
- `fit$Marginal$point`: point estimates (vector) from the marginal model
- `fit$Marginal$smp`: bootstrap draws (matrix) from the marginal model
- `fit$Conditional$point`: point estimates (vector) from the conditional model
- `fit$Conditional$smp`: bootstrap draws (matrix) from the conditional model

The first `T` (in practice, the first `n.fixed`) elements used by plotting/testing methods correspond to the time-indexed treatment-effect parameters.

# Summaries and inference

## Printing a summary

The package provides an S3 summary method `summary.fitted_onlinesurr()`.

- `t` selects the time index.
- `cumulative=TRUE` reports cumulative effects up to time `t` (when implemented by the method).
- `cumulative=FALSE` reports time-specific quantities at time `t` only.

```{r eval=TRUE}
summary(fit, t = 6, cumulative = TRUE)
```

## Plotting LPTE, CPTE, and treatment effects

`plot()` dispatches to `plot.fitted_onlinesurr()`.

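
As a rough illustration of the quantities being plotted, the LPTE and CPTE formulas can be evaluated by hand. The sketch below uses made-up effect trajectories (`delta` and `delta_r` are hypothetical vectors, not package output) and base R only; it shows the arithmetic of the definitions, not how the package computes its estimates or intervals.

```{r eval=FALSE}
# Made-up treatment-effect trajectories over 4 time points. These are NOT
# package output, only numbers chosen to make the arithmetic easy to follow.
delta   <- c(2.0, 2.5, 3.0, 2.5) # total effects Delta(t)
delta_r <- c(1.0, 1.0, 0.9, 0.5) # residual effects Delta_R(t)

# Local PTE: 1 - Delta_R(t) / Delta(t), computed separately at each time.
lpte <- 1 - delta_r / delta

# Cumulative PTE: 1 - sum_{h <= t} Delta_R(h) / sum_{h <= t} Delta(h).
cpte <- 1 - cumsum(delta_r) / cumsum(delta)

# Side-by-side view of both summaries, rounded for readability.
round(data.frame(t = seq_along(delta), LPTE = lpte, CPTE = cpte), 3)
```

With these numbers the LPTE rises from 0.5 to 0.8, while the CPTE rises more slowly, from 0.5 to 0.66, because it averages over all earlier time points.
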
```{r eval=TRUE}
plot(fit, type = "LPTE") # Local PTE over time
plot(fit, type = "CPTE") # Cumulative PTE over time
plot(fit, type = "Delta") # Delta and Delta_R over time
```

Interpretation notes:

- LPTE measures, at each time, the proportion of the total treatment effect explained by the surrogate, using the ratio $1 - \Delta_R(t)/\Delta(t)$.
- CPTE aggregates effects up to time $t$, using cumulative sums.

## Testing time-homogeneity

`time_homo_test()` provides a max-type test, using a Monte Carlo approximation of the null distribution.

```{r eval=TRUE}
test <- time_homo_test(fit, signif.level = 0.05, N.boots = 50000)
test
```

Returned components:

- `T`: observed test statistic
- `T.crit`: critical value at the requested significance level
- `p.value`: Monte Carlo p-value

# Practical tips and common pitfalls

1. **Time index must be numeric and equally spaced.** If you have missing measurements, include the missing time points with `NA` outcomes rather than dropping those times, so spacing remains consistent.
2. **One row per subject-time.** If you have duplicates, aggregate first (e.g., average within a time window) or decide which measurement to keep.
3. **Bootstrap size tradeoff.** `fit.surr(N.boots=...)` controls the stored bootstrap draws used for the confidence intervals of the treatment effects, LPTE, and CPTE; `time_homo_test(N.boots=...)` controls the Monte Carlo draws for the null distribution of the time-homogeneity test. Larger values reduce Monte Carlo error at the cost of longer run time.

See @santos2026causalframeworkevaluatingjointly for details on the theory behind the package.

# Session info

```{r eval=TRUE}
sessionInfo()
```