---
title: "State-space modeling in tidyILD with KFAS"
author: "Alex Litovchenko"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{State-space modeling in tidyILD with KFAS}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 4
)
has_kfas <- requireNamespace("KFAS", quietly = TRUE)
```

## What is a state-space model?

In a **state-space** (or **dynamic linear**) model, you observe a sequence \(y_1,\ldots,y_T\) and posit **latent** states \(\alpha_t\) that evolve over time and drive the observations. A minimal Gaussian **local level** model is:

- **State:** \(\alpha_t = \alpha_{t-1} + \eta_t\) with \(\eta_t \sim \mathcal{N}(0, Q)\) (random walk).
- **Observation:** \(y_t = \alpha_t + \varepsilon_t\) with \(\varepsilon_t \sim \mathcal{N}(0, H)\).

So the "level" of the process drifts slowly; the data are noisy measurements of that level.

In **tidyILD**, `ild_kfas(..., state_spec = "local_level")` fits this structure (via **KFAS**) for a **single** time series per call (one distinct `.ild_id` after `ild_prepare()`).

## When to use this instead of mixed-model residual correlation?

**Multilevel models** (`ild_lme()`, `ild_brms()`) are the right tool when you want **population** inference: fixed and random effects across many persons, within/between decomposition, and **residual** dynamics (e.g. AR1 or CAR1 on the **within-person** residuals) as a **nuisance** correlation structure.

**State-space** models in this package focus on **explicit latent dynamics** for **one** series at a time: estimating a **time-varying level** (or, in future specs, trend or AR) in **state space**, with diagnostics built on **one-step-ahead innovations** from the Kalman filter.

Conceptual contrast:

- **AR1 on residuals (nlme / lme):** correlation among **errors** around a smooth mean structure.
- **Local level (KFAS):** the **mean** itself is a **random walk** and is **smoothed**; the "residual" is often summarized as **standardized prediction errors** (innovations).

Neither replaces the other in general; choose based on whether your primary goal is **hierarchical population inference** or **structured univariate latent dynamics** for one person (or one series).

## Filtered vs smoothed states

In **ILD** terms:

- **Filtered** (“online” / **nowcast**): your best estimate of the latent level **at occasion *t* using only measurements up through *t***, i.e. what the model would have said about the state **at that moment** as data arrived. Useful when you think about **sequential** self-report or real-time summaries.
- **Smoothed** (“offline” / **full-history**): your best estimate of the level **at each occasion using the entire series**, i.e. what you report after seeing **all** waves, including revising earlier time points. This is usually what you want for **scientific summaries** of a completed diary or EMA study.

Formally, after fitting, **KFAS** runs [`KFS()`](https://CRAN.R-project.org/package=KFAS) to obtain:

- **Filtered** state: \(E(\alpha_t \mid y_1,\ldots,y_t)\). In KFAS output this is often `att`.
- **Smoothed** state: \(E(\alpha_t \mid y_1,\ldots,y_T)\). In KFAS output this is often `alphahat`.

In **tidyILD**, `ild_kfas(..., smoother = TRUE)` requests smoothing in `KFS()`; when `FALSE`, smoothed states may be unavailable or less central. Use `ild_plot_filtered_vs_smoothed()` to compare the first latent state over time.

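
To see the two estimates outside the wrapper, the sketch below fits the same local level structure by calling **KFAS** directly on simulated data. It is an illustration only: the simulated series, the starting values, and the variable names are assumptions made for this example, and it is not a description of how `ild_kfas()` builds its model internally.

```{r kfas-direct-sketch, eval = has_kfas}
library(KFAS)

# Simulate a random-walk level observed with noise (illustrative values).
set.seed(1)
n <- 60
level <- cumsum(rnorm(n, sd = 0.3))
y <- level + rnorm(n, sd = 1)

# Local level model; NA marks the variances Q and H as unknown.
model <- SSModel(y ~ SSMtrend(degree = 1, Q = list(matrix(NA))), H = matrix(NA))

# Estimate the two variances by maximum likelihood (inits are on the log scale).
fit <- fitSSM(model, inits = c(0, 0), method = "BFGS")

# Run the Kalman filter and smoother for the states.
out <- KFS(fit$model, filtering = "state", smoothing = "state")

filtered <- as.numeric(out$att[, 1])       # E(alpha_t | y_1, ..., y_t)
smoothed <- as.numeric(out$alphahat[, 1])  # E(alpha_t | y_1, ..., y_T)

head(cbind(y, filtered, smoothed))
```

At the final occasion the filtered and smoothed values coincide, because both condition on the full series; at earlier occasions they differ because the smoother also uses later observations.
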

## Minimal example

If the **KFAS** package is installed, you can run:

```{r example, eval = has_kfas}
library(tidyILD)
set.seed(1)

# Simulate a single series (one id, 60 occasions) and prepare it.
d <- ild_simulate(n_id = 1, n_obs_per = 60, seed = 42)
x <- ild_prepare(d, id = "id", time = "time")
x <- ild_center(x, y)

# Fit the local level model to that one series.
fit <- suppressWarnings(
  ild_kfas(x, outcome = "y", state_spec = "local_level", time_units = "sim_steps")
)

# Diagnose the fit and plot the residual ACF.
b <- ild_diagnose(fit)
class(b)
ild_autoplot(b, section = "residual", type = "acf")
```

If **KFAS** is not installed, install it with `install.packages("KFAS")` and load **tidyILD**; the same code then runs end-to-end.

## What the backend does not yet do

Read this section before relying on **KFAS** in a paper or preregistration. The normative scope document `inst/dev/KFAS_V1_BACKEND.md` in the package source has full detail; the points below are the **trust** boundaries for **v1**.

**What `ild_kfas()` is:**

- **Discrete-time state-space** modeling: the latent state advances **one step per observation** (row order after `ild_prepare()` for that series). This is standard dynamic linear modeling on an **index**, not a continuous-time differential equation.

**What it is not (today):**

- **Not** **ctsem**-style (or similar) **continuous-time** latent dynamics with unequal physical intervals baked into the transition model. Those workflows are a **later tier**; this backend does not replace them for that use case.
- **Not** a **multilevel latent-state model**: there is **no** pooled latent trajectory across persons in v1. Fitting one series per call is the supported semantics; **pooling mode** across IDs is **limited** (see the backend doc). Stacking independent per-person fits is explicit via **`fit_context`** and guardrails; it is not hierarchical partial pooling of a shared state.
- **`local_level` only** in v1; other `state_spec` labels (`local_trend`, `ar1_state`, `regression_local_level`, …) are **reserved** for later releases.
- **Optional short-horizon forecasts** and richer **uncertainty** quantification are **planned**; see `?ild_plot_forecast` and `NEWS.md`.

**Irregular timing:** see `vignette("kfas-irregular-timing-spacing", package = "tidyILD")`. tidyILD **diagnoses** spacing; the KFAS wrapper fits under **discrete-time** choices and does not, by itself, “solve” irregular measurement in a continuous-time sense.

## See also

- `vignette("kfas-irregular-timing-spacing", package = "tidyILD")`: irregular measurement and spacing diagnostics.
- `vignette("kfas-choosing-backend", package = "tidyILD")`: **lme/nlme**, **brms**, and **KFAS**.
- `vignette("ild-decomposition-and-spacing", package = "tidyILD")`: within/between and **spacing** tools.