---
title: "OnlineSurr: Fitting marginal/conditional models and computing LPTE/CPTE"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
bibliography: '`r system.file("REFERENCES.bib", package="OnlineSurr")`'
csl: '`r system.file("apalike.csl", package="OnlineSurr")`'
vignette: >
  %\VignetteIndexEntry{OnlineSurr: Fitting marginal/conditional models and computing LPTE/CPTE}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE
)
```

This vignette demonstrates the main workflow of the `OnlineSurr` package:

1. Prepare a longitudinal dataset in long format with equally spaced measurement times.
2. Fit the marginal and conditional models with `fit.surr()`.
3. Summarize the results with `summary()` and visualize them with `plot()`.
4. Test the time-homogeneity of the PTE with `time_homo_test()`.

The package returns a fitted object of class `fitted_onlinesurr` that stores point estimates and bootstrap draws for the treatment-effect trajectories. The PTE-based summaries (LPTE and CPTE) are computed from these stored quantities.


# Data requirements and conventions

`fit.surr()` expects data in **long format**, with one row per subject-time measurement. Key requirements enforced by the code:

- `id` identifies subjects; there must be **at most one observation per subject-time** combination.
- `treat` indicates treatment assignment; it is coerced to a factor and is intended to represent **two treatment levels**.
- `time` must be **numeric and equally spaced** across observed time points. If `time` is omitted, the function creates a within-subject index `Time`, assuming the data are already ordered and equally spaced.
- The surrogate design must not make treatment a linear combination of surrogate terms; otherwise, the conditional model is not identifiable.
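
The subject-time uniqueness and equal-spacing requirements can be checked before calling `fit.surr()`. The chunk below is a minimal sketch in base R. `check_long_format()` is a hypothetical helper written for this illustration and is not exported by `OnlineSurr`. The package's own checks may differ in detail, for example in the numerical tolerance used for equal spacing.

```{r eval=TRUE}
# Hypothetical helper (not exported by OnlineSurr): checks two of the
# long-format conventions listed above.
check_long_format <- function(data, id, time) {
  ids   <- data[[id]]
  times <- data[[time]]

  # At most one observation per subject-time combination
  one_per_cell <- anyDuplicated(data.frame(ids, times)) == 0

  # Numeric time whose distinct observed values share a single spacing
  equally_spaced <- FALSE
  if (is.numeric(times)) {
    gaps <- diff(sort(unique(times)))
    equally_spaced <- length(gaps) == 0 ||
      isTRUE(all.equal(gaps, rep(gaps[1], length(gaps))))
  }

  c(one_per_cell = one_per_cell, equally_spaced = equally_spaced)
}

toy <- data.frame(id = rep(1:2, each = 3), time = rep(c(1, 2, 3), times = 2))
check_long_format(toy, "id", "time")
```
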

# Package functions used in this vignette

- `fit.surr()`:
    - fits a *marginal* model producing the total treatment effects $\Delta(t)$;
    - fits a *conditional* model (given the surrogate) producing the residual treatment effects $\Delta_R(t)$;
    - stores bootstrap draws for the corresponding fixed-effect parameters.
- `plot.fitted_onlinesurr()` plots:
    - the local PTE, $\text{LPTE}(t) = 1 - \Delta_R(t)/\Delta(t)$;
    - the cumulative PTE, $\text{CPTE}(t) = 1 - \frac{\sum_{h \le t} \Delta_R(h)}{\sum_{h \le t} \Delta(h)}$;
    - the treatment effects $\Delta(t)$ and $\Delta_R(t)$.
- `time_homo_test()` tests the hypothesis that the PTE is constant over time, using a max-type statistic and a Monte Carlo approximation of its null distribution.
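
The following worked example shows the two PTE definitions in action. It evaluates them on made-up trajectories `Delta` and `Delta_R`. These are illustrative numbers, not estimates from `fit.surr()`. `cumsum()` implements the sums over $h \le t$.

```{r eval=TRUE}
# Hypothetical treatment-effect trajectories over 4 time points
# (illustrative numbers only, not estimates from fit.surr()).
Delta   <- c(2.0, 4.0, 5.0, 5.0)  # total effect, Delta(t)
Delta_R <- c(1.0, 1.0, 2.0, 1.0)  # residual effect given the surrogate, Delta_R(t)

# Local PTE: 1 - Delta_R(t) / Delta(t), evaluated separately at each t
lpte <- 1 - Delta_R / Delta

# Cumulative PTE: ratio of running sums over h <= t
cpte <- 1 - cumsum(Delta_R) / cumsum(Delta)

round(rbind(LPTE = lpte, CPTE = cpte), 3)
```

At the first time point, the two measures coincide because both sums contain a single term.
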
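
## Fitting the models with `fit.surr()`

`fit.surr()` requires:

- `formula`: the outcome mean model. The function internally adds treatment-by-time fixed effects.
- `id`: the subject identifier (unquoted).
- `treat`: the treatment variable (unquoted).
- `surrogate`: the surrogate structure (as a formula or a string).
- `time`: the numeric time variable (unquoted).
- `data`: the long-format data frame containing these variables.

The example below also sets `N.boots`, the number of bootstrap draws stored in the fitted object, and `verbose = 0` to hide progress output.
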
```{r eval=TRUE}
library(OnlineSurr)
head(sim_onlinesurr)
fit <- fit.surr(
  formula = y ~ 1,       # baseline fixed effects; trt*time terms added internally
  id = id,
  surrogate = ~s,        # surrogate structure
  treat = trt,
  data = sim_onlinesurr,
  time = time,
  N.boots = 2000,        # bootstrap draws stored in the fitted object
  verbose = 0            # hide progress
)
```

The formulas for the fixed effects and for the surrogate structure accept any temporal structure available in the `kDGLM` package (see its vignette for details). Functions that transform the data are also supported.

In particular, the package provides the `lagged()` function. It computes lagged values of its arguments and can be included in a model formula to account for delayed or lingering effects of a predictor over time. The package also provides the `s()` function, which generates a spline basis for a numeric variable. This makes it possible to model smooth, potentially non-linear effects without specifying the basis expansion manually. In the example below, `s(s)` applies this spline function to the surrogate variable, which happens to be named `s`.

```{r eval=FALSE}
library(OnlineSurr)
fit <- fit.surr(
  formula = y ~ 1, # baseline fixed effects; trt*time terms added internally
  id = id,
  surrogate = ~ s(s) + s(lagged(s, 1)) + s(lagged(s, 2)), # surrogate structure
  treat = trt,
  data = sim_onlinesurr,
  time = time,
  verbose = 0 # hide progress
)
```
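
The next chunk illustrates what a lag means in long-format data. It shifts a variable by `k` time points within each subject and pads the first `k` entries with `NA`. `lag_within_subject()` is a hypothetical helper that only mimics the idea. It is not the implementation of `lagged()`, whose exact behavior (for example, at the first time points of each subject) may differ. The sketch also assumes the rows are already sorted by time within each subject.

```{r eval=TRUE}
# Hypothetical helper illustrating a within-subject lag of order k.
# Assumes rows are sorted by time within each subject.
lag_within_subject <- function(x, id, k = 1) {
  ave(x, id, FUN = function(v) {
    n <- length(v)
    if (k >= n) return(rep(NA_real_, n))  # not enough history: all NA
    c(rep(NA_real_, k), v[seq_len(n - k)])  # shift forward by k, pad with NA
  })
}

toy_s <- data.frame(id = rep(1:2, each = 3), time = rep(1:3, times = 2),
                    s  = c(10, 11, 12, 20, 21, 22))
toy_s$s_lag1 <- lag_within_subject(toy_s$s, toy_s$id, k = 1)
toy_s$s_lag2 <- lag_within_subject(toy_s$s, toy_s$id, k = 2)
toy_s
```

The lag never crosses subject boundaries. The first time point of subject 2 receives `NA` rather than the last value of subject 1.
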

### What `fit.surr()` stores

The returned object has class `fitted_onlinesurr` and is a list with (at least) the following components:

- `fit$T`: number of time points
- `fit$N`: number of subjects
- `fit$n.fixed`: number of fixed-effect coefficients per subject design (reference size)
- `fit$Marginal$point`: point estimates (vector) from the marginal model
- `fit$Marginal$smp`: bootstrap draws (matrix) from the marginal model
- `fit$Conditional$point`: point estimates (vector) from the conditional model
- `fit$Conditional$smp`: bootstrap draws (matrix) from the conditional model

The plotting and testing methods use the leading elements of these estimates, which correspond to the time-indexed treatment-effect parameters. Conceptually, these are the first `T` elements; in practice, the code uses the first `n.fixed`.
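
Because both models store bootstrap draws, LPTE and CPTE can be evaluated draw by draw and summarized as intervals. The chunk below shows one common approach, percentile intervals. It runs on simulated stand-in matrices `delta_smp` and `delta_R_smp`, with one row per time point and one column per draw. Both this matrix layout and the choice of percentile intervals are assumptions made for illustration, not details taken from the package. Check `dim(fit$Marginal$smp)` and the package documentation before adapting the idea to a real fit.

```{r eval=TRUE}
set.seed(1)
n_time  <- 4     # number of time points (stand-in for fit$T)
n_draws <- 2000  # number of bootstrap draws

# Simulated stand-ins for bootstrap draws of Delta(t) and Delta_R(t):
# one row per time point, one column per draw (assumed layout).
delta_smp   <- matrix(rnorm(n_time * n_draws, mean = c(2, 4, 5, 5), sd = 0.3),
                      nrow = n_time)
delta_R_smp <- matrix(rnorm(n_time * n_draws, mean = c(1, 1, 2, 1), sd = 0.3),
                      nrow = n_time)

# LPTE for every draw: 1 - Delta_R(t) / Delta(t), element-wise
lpte_smp <- 1 - delta_R_smp / delta_smp

# CPTE for every draw: running sums over time within each column (draw)
cpte_smp <- 1 - apply(delta_R_smp, 2, cumsum) / apply(delta_smp, 2, cumsum)

# 95% percentile intervals, one row per time point
lpte_ci <- t(apply(lpte_smp, 1, quantile, probs = c(0.025, 0.975)))
cpte_ci <- t(apply(cpte_smp, 1, quantile, probs = c(0.025, 0.975)))
lpte_ci
cpte_ci
```
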

# Summaries and inference

## Printing a summary

The package provides an S3 summary method, `summary.fitted_onlinesurr()`, with the following arguments:

- `t` selects the time index.
- `cumulative = TRUE` reports cumulative effects up to time `t` (when implemented by the method).
- `cumulative = FALSE` reports time-specific quantities at time `t` only.

```{r eval=TRUE}
summary(fit, t = 6, cumulative = TRUE)
```

## Plotting LPTE, CPTE, and treatment effects

`plot()` dispatches to `plot.fitted_onlinesurr()`; the `type` argument selects the quantity to plot.

```{r eval=TRUE}
plot(fit, type = "LPTE") # Local PTE over time
plot(fit, type = "CPTE") # Cumulative PTE over time
plot(fit, type = "Delta") # Delta and Delta_R over time
```

Interpretation notes:

- LPTE measures, at each time $t$, the proportion of the total treatment effect explained by the surrogate, using the ratio $1 - \Delta_R(t)/\Delta(t)$.
- CPTE aggregates the effects up to time $t$ using cumulative sums of $\Delta_R$ and $\Delta$.

## Testing time-homogeneity

`time_homo_test()` provides a max-type test of the hypothesis that the PTE is constant over time, using a Monte Carlo approximation of the null distribution.

```{r eval=TRUE}
test <- time_homo_test(fit, signif.level = 0.05, N.boots = 50000)
test
```

Returned components:

- `T`: observed test statistic
- `T.crit`: critical value at the requested significance level
- `p.value`: Monte Carlo p-value
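
The chunk below is a generic sketch of a max-type test with a Monte Carlo null. It is not the statistic implemented in `time_homo_test()`. It assumes a vector of LPTE estimates `theta` that is approximately normal with covariance `Sigma`; both values here are made up for illustration. The sketch then proceeds in four steps:

1. Center each estimate at the average over time.
2. Standardize each deviation.
3. Take the largest absolute value as the statistic.
4. Approximate the statistic's null distribution by passing Gaussian draws with the same covariance through the same calculation.

```{r eval=TRUE}
set.seed(2)
n_time <- 5
theta  <- c(0.52, 0.55, 0.50, 0.58, 0.54)  # hypothetical LPTE estimates
Sigma  <- 0.002 * (diag(n_time) + 0.3)     # hypothetical covariance (positive definite)

# Centering matrix: deviation of each LPTE(t) from the average over t.
# Under H0 (constant PTE) these deviations have mean zero.
C_mat  <- diag(n_time) - matrix(1 / n_time, n_time, n_time)
dev    <- drop(C_mat %*% theta)
sd_dev <- sqrt(diag(C_mat %*% Sigma %*% t(C_mat)))

# Observed max-type statistic: largest absolute standardized deviation
T_obs <- max(abs(dev / sd_dev))

# Monte Carlo null: draws from N(0, Sigma), passed through the same statistic
n_mc     <- 20000
R_chol   <- chol(Sigma)                                   # Sigma = t(R_chol) %*% R_chol
draws    <- matrix(rnorm(n_mc * n_time), n_mc) %*% R_chol  # rows ~ N(0, Sigma)
dev_null <- sweep(draws %*% t(C_mat), 2, sd_dev, "/")
T_null   <- apply(abs(dev_null), 1, max)

T_crit  <- quantile(T_null, 0.95, names = FALSE)  # 5% significance level
p_value <- mean(T_null >= T_obs)
c(T = T_obs, T.crit = T_crit, p.value = p_value)
```

Taking the maximum over time points controls the overall error rate with a single critical value. The Monte Carlo draws account for the correlation between time points, which a per-time-point correction would ignore.
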

# Practical tips and common pitfalls

1. **Time index must be numeric and equally spaced.**
   If you have missing measurements, include the missing time points with `NA` outcomes rather than dropping those times, so that the spacing remains consistent.
2. **One row per subject-time.**
   If you have duplicates, aggregate them first (e.g., average within a time window) or decide which measurement to keep.
3. **Bootstrap size tradeoff.**
   `fit.surr(N.boots = ...)` controls the number of stored bootstrap draws used for the confidence intervals of the treatment effects, LPTE, and CPTE. `time_homo_test(N.boots = ...)` controls the number of Monte Carlo draws for the null distribution of the time-homogeneity test. Larger values give more stable intervals and p-values, at the cost of longer run times and more memory.
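
Tips 1 and 2 can be applied with base R before calling `fit.surr()`. In the sketch below, the data frame `raw` is hypothetical, and averaging duplicates is only one possible choice. The complete subject-by-time grid is built from the time points observed in the data.

```{r eval=TRUE}
# Hypothetical long-format data: subject 2 has no measurement at time 2,
# and subject 1 has two measurements at time 3.
raw <- data.frame(id   = c(1, 1, 1, 1, 2, 2),
                  time = c(1, 2, 3, 3, 1, 3),
                  y    = c(5.0, 5.5, 6.0, 6.4, 4.0, 4.8))

# Tip 2: collapse duplicates to one row per subject-time (here, by averaging)
dedup <- aggregate(y ~ id + time, data = raw, FUN = mean)

# Tip 1: build the full subject-by-time grid and merge, so missing
# time points appear as rows with NA outcomes instead of being dropped
full_grid <- expand.grid(id = unique(dedup$id), time = sort(unique(dedup$time)))
padded    <- merge(full_grid, dedup, by = c("id", "time"), all.x = TRUE)
padded    <- padded[order(padded$id, padded$time), ]
padded
```

If some time point is missing for every subject, build the grid from the intended design times instead, for example with `seq()`. Subject-level variables, such as the treatment indicator, will also be `NA` on the added rows and should be filled in from each subject's other rows.
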

See @santos2026causalframeworkevaluatingjointly for details about the theoretical aspects of the package.

# Session info

```{r eval=TRUE}
sessionInfo()
```