---
title: "Missingness in ILD: diagnostics and sensitivity routes"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Missingness in ILD: diagnostics and sensitivity routes}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Why missingness matters in intensive longitudinal data

In EMA and diary studies, missing responses are often **non-ignorable in substance** even when analysts assume **missing at random (MAR)** for estimation: burden, symptom severity, context, or device issues can co-determine both **whether** a prompt is answered and **the outcome**. tidyILD does not replace dedicated missing-data software; it gives **structured diagnostics**, **person-level adherence views**, **time-oriented summaries**, and **hooks** to IPW-based sensitivity workflows already in the package.

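**MNAR** (missing not at random) means missingness depends on unobserved values or latent states, even after conditioning on what was observed. No routine plot or test can distinguish MAR from MNAR, because the distinction rests on values that were never recorded. Use multiple **sensitivity routes** and transparent reporting.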

## Types of missingness (useful labels)

- **Unit non-response / attrition:** a person stops contributing (often modeled as **monotone** dropout on the outcome).
- **Intermittent missingness:** gaps followed by observed values later on; common in EMA.
- **Item missingness:** some variables missing while others are observed on the same prompt.

The **ordinal occasion index** `.ild_seq` (from `ild_prepare()`) is the default backbone for “wave” summaries; it is **not** the same as equal calendar spacing—see `vignette("ild-decomposition-and-spacing", package = "tidyILD")` when timing is irregular.
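
To see why an ordinal index is not a clock, here is a minimal base R sketch on toy data. It is not how `ild_prepare()` is implemented, and the column names `seq_idx` and `gap_hours` are made up for illustration. It numbers each person's prompts in time order and then inspects the gaps between consecutive prompts:

```{r seq-vs-spacing-sketch}
# Toy data: two people with irregularly timed prompts (hours since study start).
toy <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2, 2),
  time = c(0, 2, 3, 27, 0, 24, 48)
)

# Sort by person and time, then number occasions within person.
toy <- toy[order(toy$id, toy$time), ]
toy$seq_idx <- ave(toy$time, toy$id, FUN = seq_along)

# Gap to the previous prompt within person (NA for each person's first prompt).
toy$gap_hours <- ave(toy$time, toy$id, FUN = function(t) c(NA, diff(t)))

toy
```

For person 1, occasions 2 and 3 are one hour apart while occasions 3 and 4 are 24 hours apart, yet both pairs are adjacent on the ordinal index.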

## Descriptive profiling: `ild_missing_pattern()` and heatmaps

`ild_missing_pattern()` tabulates NA rates by variable and by person, and builds a **person × occasion** heatmap (sequence index on the x-axis). Pass `outcome` to **enrich** `by_id` with compliance metrics from `ild_missing_compliance()` (see below).

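```{r pattern}
library(tidyILD)
set.seed(11)
# Simulate 25 people with 12 occasions each, then add a covariate and an outcome copy.
d <- ild_simulate(n_id = 25, n_obs_per = 12, seed = 11)
d$stress <- rnorm(nrow(d))
d$mood <- d$y
# Blank out 45 randomly chosen outcome values (missing completely at random).
miss_i <- sample(nrow(d), 45)
d$mood[miss_i] <- NA
x <- ild_prepare(d, id = "id", time = "time")
# Tabulate missingness; `outcome` adds compliance columns to `by_id`.
mp <- ild_missing_pattern(x, vars = c("mood", "stress"), outcome = "mood")
mp$summary
head(mp$by_id, 3)
```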

Plot the same view with `ild_plot(x, type = "missingness", var = "mood")` (see `?ild_plot`).

## Person-level compliance: `ild_missing_compliance()`

`tidyILD::ild_missing_compliance()` returns, per `.ild_id`:

- **`pct_nonmissing_outcome`**, **`longest_run_observed`** (longest streak of observed values in time order),
- **`monotone_missing`**: `TRUE` if, after the first missing outcome, all later values are missing (`NA` if there is no missingness for that person),
- optional **`expected_occasions`** for rough **adherence vs planned N** (`pct_of_expected`, `meets_expected_rows`).

```{r compliance}
cm <- ild_missing_compliance(x, outcome = "mood", expected_occasions = 12L)
summary(cm$pct_nonmissing_outcome)
```
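
For intuition, the metrics listed above can be sketched in base R for a single person's outcome vector. This only illustrates the definitions and is not the package's implementation. The helper `compliance_sketch()` is hypothetical, and reporting the non-missing share on a 0-100 scale is an assumption.

```{r compliance-sketch}
# Hypothetical helper: per-person summaries for an outcome vector `y`
# that is already sorted in time order.
compliance_sketch <- function(y) {
  obs  <- !is.na(y)
  runs <- rle(obs)
  # Longest streak of consecutive observed values (0 if nothing is observed).
  longest <- if (any(obs)) max(runs$lengths[runs$values]) else 0L
  # Monotone: once the first NA occurs, every later value is NA too.
  # NA when the person has no missing values at all.
  first_na <- match(TRUE, !obs)
  monotone <- if (is.na(first_na)) NA else all(!obs[first_na:length(y)])
  data.frame(
    pct_nonmissing_outcome = 100 * mean(obs),  # assumed 0-100 scale
    longest_run_observed   = longest,
    monotone_missing       = monotone
  )
}

compliance_sketch(c(3, 4, NA, 5, 6, 7))    # intermittent gap
compliance_sketch(c(3, 4, 5, NA, NA, NA))  # monotone dropout
compliance_sketch(c(3, 4, 5, 6, 7, 8))     # fully observed
```

The three calls give `monotone_missing` values of `FALSE`, `TRUE`, and `NA` respectively, matching the three cases in the definition.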

## When to use `ild_missing_model()` and `ild_missing_bias()`

- **`ild_missing_model()`** fits a **logistic** model for `is.na(outcome)` ~ predictors (a pooled `glm`, or `glmer` when `random = TRUE`). Use it as a **diagnostic** for whether observed covariates predict missingness, not as proof of MAR.
- **`ild_missing_bias()`** is a shortcut for **one numeric predictor** vs missingness (teaching / quick screening).

If predictors are associated with missingness, **complete-case summaries** of the outcome can be **biased** even when a mixed model uses all rows, because the **composition** of who contributes at each occasion may shift. Treat comparisons of descriptive means across missingness patterns as exploratory, not causal.
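
A small base R simulation can make this mechanism concrete. All object names and parameter values below are illustrative assumptions, and a pooled `glm()` stands in for whatever `ild_missing_model()` fits internally. Missingness is generated to depend on an observed covariate that also predicts the outcome:

```{r missing-model-sketch}
set.seed(101)
sim_n      <- 2000
sim_stress <- rnorm(sim_n)
sim_mood   <- 0.8 * sim_stress + rnorm(sim_n)        # outcome depends on stress
# MAR by construction: missingness depends on observed stress only.
sim_miss     <- rbinom(sim_n, 1, plogis(-1 + 1.2 * sim_stress)) == 1
sim_mood_obs <- ifelse(sim_miss, NA, sim_mood)

# Diagnostic: does the observed covariate predict missingness?
sim_fit <- glm(is.na(sim_mood_obs) ~ sim_stress, family = binomial())
coef(summary(sim_fit))

# Full-data mean (known only because this is a simulation) vs observed-only mean.
c(full = mean(sim_mood), observed_only = mean(sim_mood_obs, na.rm = TRUE))
```

In this setup the slope on `sim_stress` should be clearly positive, and the observed-only mean should sit below the full-data mean, because high-stress occasions (which carry higher outcome values here) are under-represented among the observed rows.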

## Complete-case vs mixed models (careful wording)

A **linear mixed model** fitted to all available rows uses the **likelihood contribution from observed outcomes** conditional on random effects. Under **MAR** *and* correct **mean and covariance** specification, inference for the **outcome model** can be appropriate while **ignoring** the missingness mechanism (likelihood-based inference). That statement has **scope limits**:

- It concerns **outcome missingness** in the modeled response, not necessarily intermittent **predictor** missingness handled ad hoc.
- **MNAR** breaks the interpretation; **informative dropout** requires sensitivity analysis.
- **Complete-case analysis** (drop any person-occasion with missing outcome) changes the estimand when missingness is not MCAR.

tidyILD encourages comparing **descriptives** and **fits** on full vs complete-case data as a **coarse** sensitivity check, not a formal test.

## Cohort-level and hazard summaries

- **`ild_missing_cohort()`**: fraction of non-missing outcomes at each `.ild_seq` plus an optional line plot.
- **`ild_missing_hazard_first()`**: discrete **hazard** of being missing on the current row among rows **at risk** (previous occasion observed, or first occasion). Under **intermittent** missingness this is a rough **first-event** summary; under **monotone dropout** it aligns better with a discrete-time dropout hazard.

```{r cohort-hazard}
coh <- ild_missing_cohort(x, outcome = "mood", plot = FALSE)
head(coh$by_occasion)
head(ild_missing_hazard_first(x, outcome = "mood"))
```
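
The two summaries can also be written out by hand on toy data. The sketch below follows the verbal definitions above literally (at risk means first occasion or previous occasion observed). Whether the package handles edge cases the same way is not confirmed here, and the column names are hypothetical.

```{r cohort-hazard-sketch}
# Toy long-format data: 3 people, 5 occasions each, sorted by id and occasion.
sk_long <- data.frame(
  id   = rep(1:3, each = 5),
  seq  = rep(1:5, times = 3),
  mood = c(1, 2, NA, 4, 5,     # person 1: intermittent gap
           1, 2, 3, NA, NA,    # person 2: monotone dropout
           1, 2, 3, 4, 5)      # person 3: fully observed
)
sk_long$miss <- is.na(sk_long$mood)

# Cohort view: fraction of non-missing outcomes at each occasion.
sk_cohort <- aggregate(miss ~ seq, data = sk_long, FUN = function(m) mean(!m))
names(sk_cohort)[2] <- "frac_nonmissing"

# At risk: first occasion, or previous occasion observed (within person).
sk_long$at_risk <- ave(sk_long$miss, sk_long$id,
                       FUN = function(m) c(TRUE, !m[-length(m)]))

# Discrete hazard: share of at-risk rows that are missing, per occasion.
sk_hazard <- aggregate(miss ~ seq, data = sk_long[sk_long$at_risk, ], FUN = mean)
names(sk_hazard)[2] <- "hazard"

sk_cohort
sk_hazard
```

Person 2's second missing row (occasion 5) is not at risk because the previous occasion was already missing, so it does not enter the hazard at occasion 5. Person 1 re-enters the risk set at occasion 5 after the gap, which is why the summary is only a rough first-event view under intermittent missingness.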

## One entry point: `ild_missingness_report()`

`ild_missingness_report()` bundles compliance, `ild_missing_pattern()` (with `outcome` enrichment), cohort and hazard tables, optional `ild_missing_model()`, the same **late-dropout** heuristic used in guardrails (`GR_DROPOUT_LATE_CONCENTRATION`), and short **`snippets`** for methods text.

```{r report}
rpt <- ild_missingness_report(
  x,
  outcome = "mood",
  predictors = "stress",
  fit_missing_model = TRUE,
  random = FALSE,
  cohort_plot = FALSE
)
names(rpt)
rpt$snippets["overview"]
```

## MNAR as sensitivity (no single fix)

tidyILD does **not** fit selection models, pattern-mixture models, or joint models for MNAR. Consider external packages and pre-specified sensitivity analyses. The snippets in `ild_missingness_report()` remind readers that logistic missingness models are **diagnostic / sensitivity**, not proof of MAR.

## IPW and causal tools as **one** sensitivity route

If you fit `ild_missing_model()`, you can feed predicted probabilities into **`ild_ipw_weights()`** and **`ild_ipw_refit()`** for inverse-probability weighting (see `?ild_ipw_weights` and causal vignettes). This addresses **observed** confounding of missingness under a **MAR-like** weighting story; it is **not** a blanket MNAR solution.

```{r ipw-template, eval = FALSE}
mm <- ild_missing_model(x, outcome = "mood", predictors = c("stress"), random = TRUE)
x_w <- ild_ipw_weights(x, mm, stabilize = TRUE)
fit_w <- ild_ipw_refit(mood ~ stress + (1 | id), data = x_w, weights = ".ipw")
```
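
To show what the weighting does, here is a minimal base R sketch of stabilized inverse-probability-of-observation weights for a simple mean. It is an illustration under assumed simulation settings, not the implementation behind `ild_ipw_weights()` or `ild_ipw_refit()`, and every `ipw_*` name is hypothetical.

```{r ipw-sketch}
set.seed(202)
ipw_n   <- 2000
ipw_x   <- rnorm(ipw_n)
ipw_y   <- 0.8 * ipw_x + rnorm(ipw_n)
# Observation indicator depends on the observed covariate only (MAR-like).
ipw_obs <- rbinom(ipw_n, 1, plogis(1 - 1.2 * ipw_x)) == 1

# Model the probability of being observed given the covariate.
ipw_fit <- glm(ipw_obs ~ ipw_x, family = binomial())
ipw_p   <- fitted(ipw_fit)

# Stabilized weights: marginal P(observed) / conditional P(observed | x).
ipw_w <- mean(ipw_obs) / ipw_p

# Full-data mean (simulation only) vs unweighted and weighted observed means.
ipw_means <- c(
  full       = mean(ipw_y),
  unweighted = mean(ipw_y[ipw_obs]),
  ipw        = weighted.mean(ipw_y[ipw_obs], ipw_w[ipw_obs])
)
ipw_means
```

Because missingness here is driven entirely by the modeled covariate, the weighted mean should land closer to the full-data mean than the unweighted one. If missingness also depended on the unobserved outcome itself, the same weights would not remove the bias.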

## Other templates (not evaluated here)

**Compare complete-case vs full mixed model** (same formula):

```{r cc-vs-full, eval = FALSE}
x_cc <- dplyr::filter(x, !is.na(mood))
fit_full <- ild_lme(mood ~ stress + (1 | id), data = x, warn_uncentered = FALSE)
fit_cc <- ild_lme(mood ~ stress + (1 | id), data = x_cc, warn_uncentered = FALSE)
```

**Multiple imputation outside tidyILD:** impute with a dedicated package, run `ild_prepare()` on each imputed dataset, fit the model, and pool with **mice** / **mitools** / **brms**. Keep the imputation model and substantive model aligned with your estimand.

## What tidyILD does **not** do (and where to look)

- **Full MI pipelines** — **mice**, **Amelia**, **jomo**, etc.
- **MNAR selection / pattern-mixture** — specialized books and packages; consult a statistician.
- **Continuous-time event models** for dropout — **survival** / **joint longitudinal-survival** packages.
- **Replacing** domain knowledge about **why** prompts are missed.

## See also

`vignette("tidyILD-workflow", package = "tidyILD")`, `vignette("msm-identification-and-recovery", package = "tidyILD")`, `?ild_diagnose`, `?ild_missing_pattern`, `?ild_missingness_report`.