---
title: "The idionomics Pipeline: keeping individual-level information to find functionally relevant principles"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{The idionomics Pipeline: keeping individual-level information to find functionally relevant principles}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
message = TRUE,
warning = FALSE
)
```
## The idionomic science principle
Classical longitudinal methods (e.g., multilevel models) estimate one set of parameters shared — or partially shared — across all units of an ensemble. If the focus of interest is the trajectory of individuals, this is sensible only under hard-to-meet assumptions such as exchangeability and/or ergodicity. When these assumptions fail, ensemble averages may systematically obscure individual differences: an average positive effect may coexist with a substantial subset of individuals for whom the effect is negative, zero, or nonsignificant.
Idionomic science inverts the order of operations:
1. **Unit first.** Fit a model to each person's time series independently, capturing that person's unique dynamics, residual structure, and effect sizes.
2. **Group later.** Aggregate the individual estimates with meta-analytic methods that explicitly represent and quantify heterogeneity across people.
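The two-stage logic can be sketched with base R and `metafor` (the pooling engine `iarimax()` uses). The toy data and per-person `lm()` fits below are illustrative stand-ins for the package's ARIMAX machinery, not its actual internals:

```{r idionomic-sketch}
# Two-stage sketch: per-person regression, then random-effects pooling
set.seed(1)
toy <- do.call(rbind, lapply(1:5, function(id) {
  x <- rnorm(30)
  slope <- 0.5 + rnorm(1, sd = 0.3)   # person-specific true effect
  data.frame(id = id, x = x, y = slope * x + rnorm(30))
}))
# Stage 1 ("unit first"): one model per person
fits <- lapply(split(toy, toy$id),
               function(d) summary(lm(y ~ x, data = d))$coefficients["x", ])
est <- sapply(fits, `[[`, "Estimate")
se  <- sapply(fits, `[[`, "Std. Error")
# Stage 2 ("group later"): random-effects meta-analysis of the slopes
pooled <- metafor::rma(yi = est, sei = se, method = "REML")
c(pooled_b = as.numeric(pooled$b), tau2 = pooled$tau2)
```

The heterogeneity estimate `tau2` is the quantity the ensemble-average approach discards: it measures how much the person-specific slopes genuinely differ.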
The recommended pipeline is:
```
i_screener() → pmstandardize() → i_detrender() → iarimax()/looping_machine() → i_pval() / sden_test()
```
---
## Synthetic data
This vignette walks through each step with a synthetic panel of 12 subjects, each contributing 50 time points. The data includes four variables (`a`, `b`, `c`, `y`) with known relationships, plus three manually created subjects that demonstrate specific pipeline behaviors.
```{r load}
library(idionomics)
set.seed(42)
panel <- do.call(rbind, lapply(1:9, function(id) {
a <- rnorm(50)
b <- 0.4 * a + rnorm(50)
c <- 0.4 * b + rnorm(50)
data.frame(id = as.character(id), time = seq_len(50),
a = a, b = b, c = c,
y = 0.5 * a + rnorm(50),
stringsAsFactors = FALSE)
}))
# Subject 10: near-constant "a" — will be caught by i_screener(min_sd = 0.5)
s10 <- data.frame(id = "10", time = seq_len(50),
a = rep(3, 50), b = rnorm(50), c = rnorm(50),
y = rnorm(50), stringsAsFactors = FALSE)
# Subject 11: full positive loop (a -> b -> c -> a)
a11 <- rnorm(50)
b11 <- 0.6 * a11 + rnorm(50, sd = 0.5)
c11 <- 0.6 * b11 + rnorm(50, sd = 0.5)
s11 <- data.frame(id = "11", time = seq_len(50),
a = a11 + 0.4 * c11, b = b11, c = c11,
y = 0.5 * a11 + rnorm(50), stringsAsFactors = FALSE)
# Subject 12: negative a -> y effect
a12 <- rnorm(50)
s12 <- data.frame(id = "12", time = seq_len(50),
a = a12, b = 0.4 * a12 + rnorm(50),
c = rnorm(50), y = -0.5 * a12 + rnorm(50),
stringsAsFactors = FALSE)
panel <- rbind(panel, s10, s11, s12)
```
The data-generating process:
- Subjects 1--9: `a` is independent noise, `b = 0.4*a + noise`, `c = 0.4*b + noise`, `y = 0.5*a + noise`. The chain `a -> b -> c` provides signal for `looping_machine()`, and `a -> y` provides signal for `iarimax()`. The `c -> a` leg has no true signal — the loop should not close for most subjects.
- Subject 10: `a` is constant (`rep(3, 50)`), so it has zero within-person variance. `i_screener()` will catch this.
- Subject 11: all three loop legs have true positive signal, including `c -> a`. This subject should show a positive directed loop.
- Subject 12: the `a -> y` effect is negative (`-0.5`), creating heterogeneity in the meta-analysis.
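As a quick sanity check on the generating process, the implied population correlation between `a` and `y` for subjects 1--9 can be recovered from a large simulated sample (illustrative seed and sample size, distinct from the panel above):

```{r dgp-check}
# With y = 0.5*a + noise, where a and the noise both have unit variance, the
# implied population correlation is 0.5 / sqrt(0.5^2 + 1), about 0.45
set.seed(7)
a <- rnorm(5000)
y <- 0.5 * a + rnorm(5000)
cor(a, y)
```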
---
## Step 1: Data quality screening — `i_screener()`
Before standardizing, screen subjects on their raw data. After `pmstandardize()`, all non-constant series have within-person variance = 1 by construction, making variance-based filters ineffective. `i_screener()` catches low-quality subjects at the right stage.
```{r screener}
panel_clean <- i_screener(panel, cols = c("a", "b", "c", "y"), id_var = "id",
min_n_subject = 20, min_sd = 0.5, verbose = TRUE)
```
Subject 10 is removed because `a` has SD = 0, which falls below `min_sd = 0.5`.
Three criteria are available (all optional except `min_n_subject`):
| Criterion | What it catches |
|---|---|
| `min_n_subject` | Too few observations |
| `min_sd` | Near-constant series (floor/ceiling responders) |
| `max_mode_pct` | "Stuck" responders (e.g. >= 80% of responses identical) |
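To make the `max_mode_pct` criterion concrete, here is a hypothetical helper that computes the share of the most frequent value; the package's internal implementation may differ:

```{r mode-pct-sketch}
# Illustrative "stuck responder" metric: proportion of non-missing
# observations equal to the most frequent value
mode_pct <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  max(table(x)) / length(x)
}
mode_pct(c(5, 5, 5, 5, 4, 5, 5, 5, 5, 5))   # 0.9 -> flagged at max_mode_pct = 0.80
mode_pct(rnorm(50))                          # tiny for continuous responses
```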
Use `mode = "report"` to inspect quality metrics without committing to exclusion:
```{r screener-report}
report <- i_screener(panel, cols = c("a", "b", "c", "y"), id_var = "id",
min_n_subject = 20, min_sd = 0.5, max_mode_pct = 0.80,
mode = "report", verbose = TRUE)
print(report)
```
---
## Step 2: Within-person standardization — `pmstandardize()`
Person-mean standardization computes `(x - person_mean) / person_sd` for each subject and column. This removes between-person differences in level and scale, making coefficients comparable across subjects. Output columns are suffixed with `_psd` (e.g., `a_psd`).
```{r pmstandardize}
panel_std <- pmstandardize(panel_clean, cols = c("a", "b", "c", "y"), id_var = "id",
verbose = TRUE)
head(panel_std[, c("id", "time", "a_psd", "b_psd", "c_psd", "y_psd")])
```
Edge cases are handled automatically: all-NA series remain NA; single-observation series return 0; constant series (zero SD) return 0 and are filtered downstream by `iarimax()`'s `minvar` guard.
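For intuition, the core transformation can be reproduced by hand with base R's `ave()`; this minimal sketch skips the edge-case handling described above:

```{r pmstd-sketch}
# Person-mean standardization by hand: each subject ends up with mean 0, SD 1,
# even though subject 2 starts on a very different level
d <- data.frame(id = rep(c("1", "2"), each = 4),
                x  = c(10, 12, 14, 16, 100, 101, 102, 103))
d$x_psd <- ave(d$x, d$id, FUN = function(v) (v - mean(v)) / sd(v))
tapply(d$x_psd, d$id, function(v) c(mean = mean(v), sd = sd(v)))
```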
---
## Step 3: Linear detrending — `i_detrender()`
`i_detrender()` removes the within-person linear time trend via `lm(col ~ time)` and appends the residuals as new columns with a `_dt` suffix. This reduces the integration order `auto.arima()` selects, fostering comparability between models.
Filtering is per-column and independent: a subject can produce valid detrended residuals for one variable while receiving `NA` in another if that column fails the variance or observation-count thresholds.
```{r detrender}
panel_dt <- i_detrender(panel_std, cols = c("a_psd", "b_psd", "c_psd", "y_psd"),
id_var = "id", timevar = "time", verbose = TRUE)
head(panel_dt[, c("id", "time", "a_psd_dt", "b_psd_dt", "c_psd_dt", "y_psd_dt")])
```
> **Pipeline order matters.** Detrending after standardization is preferred. Reversing the order (`i_detrender()` then `pmstandardize()`) can amplify near-zero residuals to unit variance, bypassing `iarimax()`'s `minvar` filter.
---
## Step 4a: Per-subject ARIMAX and meta-analysis — `iarimax()`
`iarimax()` fits one `forecast::auto.arima()` model per subject, extracts coefficients via `broom::tidy()`, and pools the focal predictor's coefficients with `metafor::rma()` (REML).
```{r iarimax}
result <- iarimax(panel_dt,
y_series = "y_psd_dt",
x_series = "a_psd_dt",
id_var = "id",
timevar = "time",
verbose = TRUE)
```
By default, `auto.arima()` selects the differencing order `d` independently per subject. When `d` varies across subjects, some coefficients describe effects on levels (`d = 0`) while others describe effects on changes (`d = 1`), making them less comparable in the meta-analysis. Use `fixed_d` to impose a common differencing order:
```{r iarimax-fixed-d, eval = FALSE}
# Fix d = 0 for all subjects (coefficients describe effects on levels)
result_d0 <- iarimax(panel_dt, y_series = "y_psd_dt", x_series = "a_psd_dt",
id_var = "id", timevar = "time", fixed_d = 0)
```
### Summary
```{r iarimax-summary}
summary(result)
```
### Caterpillar plot
```{r iarimax-plot, fig.width = 6, fig.height = 5}
plot(result,
y_series_name = "Outcome (y)",
x_series_name = "Predictor (a)")
```
Each bar is a per-subject confidence interval, colored green (significantly positive), red (significantly negative), or black (crosses zero). The blue band is the 95% CI of the pooled effect from the random-effects meta-analysis (REMA). Subject 12 (negative `a -> y` effect) should appear on the negative side.
### Per-subject results
The full individual-level output is in `$results_df`:
```{r results-df}
head(result$results_df[, c("id", "nAR", "nI", "nMA",
"estimate_a_psd_dt", "std.error_a_psd_dt",
"n_valid", "n_params", "raw_cor")])
```
---
## Step 4b: Directed loop detection — `looping_machine()`
`looping_machine()` tests whether three variables form a directed positive loop (a -> b -> c -> a). It fits three I-ARIMAX legs, applies `i_pval()` to each, and computes `Loop_positive_directed`: a 0/1 indicator that is 1 only when all three focal betas are positive *and* significant at `alpha`.
```{r looping-machine}
loop_result <- looping_machine(panel_dt,
a_series = "a_psd_dt",
b_series = "b_psd_dt",
c_series = "c_psd_dt",
id_var = "id",
timevar = "time",
verbose = TRUE)
```
```{r looping-inspect}
# Proportion of subjects with a detected positive directed loop
mean(loop_result$loop_df$Loop_positive_directed, na.rm = TRUE)
# Per-leg I-ARIMAX results are also accessible
summary(loop_result$iarimax_a_to_b)
```
Because the positive directed loop is a conjunction of three one-sided tests, it is very conservative: under the global null and independence, the false positive rate is approximately $(\alpha/2)^3$.
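Assuming `alpha = 0.05`, the implied per-subject false-positive rate works out to:

```{r loop-fpr}
# Chance of all three one-sided tests passing under the global null,
# assuming independence across the three legs
alpha <- 0.05
(alpha / 2)^3   # 1.5625e-05
```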
Optional arguments include `covariates` (shared extra predictors for all three legs), `include_third_as_covariate` (adds the third loop variable as a covariate in each leg), and `fixed_d` (fixes the differencing order across all legs for comparability).
---
## Step 5: Per-subject p-values — `i_pval()`
`i_pval()` attaches a `pval_` column to `results_df` using the two-tailed t-distribution with ML-based degrees of freedom (`n_valid - n_params`).
```{r i-pval}
result_pval <- i_pval(result)
result_pval$results_df[, c("id", "estimate_a_psd_dt", "pval_a_psd_dt")]
```
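The formula in prose translates directly to base R; the numbers below are hypothetical, not drawn from the results above:

```{r pval-sketch}
# The two-tailed t-distribution p-value i_pval() describes
beta     <- 0.45   # focal coefficient estimate (hypothetical)
se       <- 0.15   # its standard error (hypothetical)
n_valid  <- 48     # usable observations
n_params <- 4      # fitted parameters
2 * pt(-abs(beta / se), df = n_valid - n_params)
```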
---
## Step 6: Sign Divergence / Equisyncratic Null test — `sden_test()`
`sden_test()` runs a binomial test on the count of individually significant effects. Two variants are available and selected automatically based on the REMA pooled effect:
- **ENT** (Equisyncratic Null Test): tests whether any-direction significant effects exceed the rate expected by chance (`alpha_arimax`). Used when the pooled effect is non-significant.
- **SDT** (Sign Divergence Test): tests whether effects in the counter-pooled direction exceed chance (`alpha_arimax / 2`). Used when the pooled effect is significant.
The auto-selection pivot is fixed at p = 0.05, independent of `alpha_arimax`.
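Both variants reduce to a one-sided binomial test. An ENT-style sketch with hypothetical counts (not taken from the results above):

```{r sden-sketch}
# ENT logic: did more subjects reach individual significance than the
# alpha_arimax = 0.05 chance rate predicts?
n_subjects <- 11
k_sig      <- 4    # hypothetical count of individually significant effects
binom.test(k_sig, n_subjects, p = 0.05, alternative = "greater")$p.value
```

For the SDT variant, the same test would be applied to the count of significant effects whose sign opposes the pooled effect, with `p = alpha_arimax / 2`.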
```{r sden}
sden <- sden_test(result_pval)
summary(sden)
```
You can also force a specific test:
```{r sden-force}
sden_ent <- sden_test(result, test = "ENT")
summary(sden_ent)
```