---
title: "Introduction to smriti: Structural Variance Preservation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to smriti}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## The Imputation Uncertainty Principle
Modern machine-learning imputation algorithms (such as `missForest`) excel at minimizing point-wise prediction error (RMSE). However, because an RMSE-optimal prediction approximates a conditional mean, point-wise optimization shrinks the variance of the imputed values, causing **structural variance collapse**. In longitudinal Growth Curve Models (GCM), this collapse attenuates the latent slope variance ($\sigma^2_S$) and reduces the statistical power needed to detect differences in patient trajectories over time.
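
The shrinkage is easy to reproduce on simulated data. The sketch below is illustrative only: it uses ordinary regression imputation from base R as a stand-in for a point-wise optimizer (it does not call `missForest` or `smriti`), and every variable name and simulation setting is an assumption made for the demonstration.

```{r, eval=FALSE}
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)   # population variance of y is 0.36 + 0.64 = 1

# Remove 40% of y completely at random
miss <- sample(n, size = 0.4 * n)
y_obs <- y
y_obs[miss] <- NA

# Conditional-mean (regression) imputation: predict the missing y from x
fit <- lm(y_obs ~ x)                # rows with NA are dropped by default
y_imp <- y_obs
y_imp[miss] <- predict(fit, newdata = data.frame(x = x[miss]))

var_true_cells    <- var(y[miss])      # spread of the values that were removed
var_imputed_cells <- var(y_imp[miss])  # spread of their imputed replacements
var_true          <- var(y)
var_completed     <- var(y_imp)        # variance of the completed column
```

Under these settings the imputed cells should show markedly less spread than the values they replace, because predictions on the regression line carry none of the residual noise. The completed column inherits that deficit.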

The `smriti` package addresses this by decoupling prediction from structural geometry. It uses a two-stage architecture:

1. **Initialization:** Non-parametric imputation fills the missing cells to produce a dense matrix.
2. **Lagrangian Projection:** A C++ gradient descent layer moves the imputed values onto a target covariance manifold, constrained by a Lagrangian multiplier ($\lambda$).
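
The second stage can be pictured with a small pure-R sketch. This is not the package's C++ routine, and the objective is an assumption made for illustration: the hypothetical helper `project_to_covariance()` runs gradient descent on $\tfrac{1}{2}\lVert X - X_0\rVert_F^2 + \tfrac{\lambda n}{2}\lVert \operatorname{cov}(X) - \Sigma^\ast\rVert_F^2$, updating only the imputed cells. The column-mean fill standing in for stage 1, the step size, and the iteration count are likewise assumptions.

```{r, eval=FALSE}
# Hypothetical helper for illustration; NOT the smriti implementation.
project_to_covariance <- function(X0, mask, S_target,
                                  lambda = 0.5, step = 0.1, n_iter = 300) {
  n <- nrow(X0)
  X <- X0
  for (i in seq_len(n_iter)) {
    Xc <- sweep(X, 2, colMeans(X))             # centre the columns
    D  <- crossprod(Xc) / (n - 1) - S_target   # covariance mismatch
    # Gradient of the assumed objective with respect to X
    grad <- (X - X0) + lambda * (2 * n / (n - 1)) * (Xc %*% D)
    X[mask] <- X[mask] - step * grad[mask]     # observed cells never move
  }
  X
}

set.seed(42)
n <- 300
Sigma <- matrix(c(1, .5, .5,  .5, 1, .5,  .5, .5, 1), 3, 3)
X_full <- matrix(rnorm(n * 3), n, 3) %*% chol(Sigma)
mask <- matrix(runif(n * 3) < 0.3, n, 3)       # TRUE marks a missing cell
X_na <- X_full
X_na[mask] <- NA

# Stage 1 stand-in: column-mean fill (deliberately variance-shrinking)
X0 <- X_na
for (j in 1:3) X0[mask[, j], j] <- mean(X_na[, j], na.rm = TRUE)

# Stage 2: pull the imputed cells toward the pairwise-complete covariance
S_target <- cov(X_na, use = "pairwise.complete.obs")
X_proj <- project_to_covariance(X0, mask, S_target)

frob <- function(A) sqrt(sum(A^2))
dist_before <- frob(cov(X0) - S_target)
dist_after  <- frob(cov(X_proj) - S_target)
```

In this sketch a larger `lambda` weights the covariance mismatch more heavily relative to fidelity to the initial imputation. It would also call for a smaller `step` to keep plain gradient descent stable.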

## The Robustness-Efficiency Tradeoff
Real-world clinical data often contain heavy-tailed skew or corrupted sensor artifacts. The `smriti_impute()` function handles both cases through its `robust` argument, which selects the estimator used for the target covariance.

*   `robust = FALSE`: Uses the standard pairwise-complete covariance. Ideal for Normal data or naturally heavy-tailed biological distributions (e.g., Lognormal structural neuroimaging), where extreme values are genuine signal rather than contamination.
*   `robust = TRUE`: Uses the Minimum Covariance Determinant (MCD) estimator, which finds the subset of observations whose sample covariance has the smallest determinant. Estimating from this dense core yields a target manifold that is highly resistant to severe clinical outliers (e.g., broken EHR sensors).
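
The difference between the two estimators can be seen on simulated data. The sketch below assumes nothing about how `smriti` computes its MCD internally: it uses `MASS::cov.rob(method = "mcd")`, which ships with standard R installations, as a stand-in, and the contamination scheme (5% of rows shifted by 25 units) is invented for the demonstration.

```{r, eval=FALSE}
library(MASS)  # provides cov.rob(), used here as a stand-in MCD estimator

set.seed(7)
n <- 200
clean <- cbind(T1 = rnorm(n), T2 = rnorm(n))
clean[, "T2"] <- 0.7 * clean[, "T1"] + sqrt(1 - 0.49) * clean[, "T2"]

# Corrupt 5% of rows with artifact-like spikes in T2
bad <- sample(n, 10)
dirty <- clean
dirty[bad, "T2"] <- dirty[bad, "T2"] + 25

S_ref     <- cov(clean)                                   # uncontaminated reference
S_classic <- cov(dirty, use = "pairwise.complete.obs")    # analogue of robust = FALSE
S_mcd     <- MASS::cov.rob(dirty, method = "mcd")$cov     # analogue of robust = TRUE

# Frobenius distance of each estimate from the uncontaminated covariance
err <- function(S) sqrt(sum((S - S_ref)^2))
err_classic <- err(S_classic)
err_mcd     <- err(S_mcd)
```

With artifacts of this kind the classical estimate of the `T2` variance is inflated by the spikes, while the MCD estimate stays close to the reference. On clean but genuinely heavy-tailed data the MCD would instead discard real tail observations, which is the efficiency side of the tradeoff.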

## Core Implementation: Handling Gradient Explosion
To prevent gradient explosion in the C++ backend when projecting high-magnitude clinical markers (e.g., hippocampal volumes $\approx 7000$), `smriti` enforces internal Z-score standardization. Each column is scaled to $\mu=0, \sigma^2=1$ before Lagrangian optimization and transformed back to its original units upon convergence, which keeps gradient magnitudes on a comparable scale across columns regardless of measurement units.
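
A minimal base-R sketch of the standardize and back-transform round trip follows. It mirrors the idea described above rather than the package internals; the simulated columns and their standard deviations are assumptions chosen only to show the scale mismatch.

```{r, eval=FALSE}
set.seed(3)
# Simulated markers on very different scales (values are illustrative)
X <- cbind(hippocampus = rnorm(50, mean = 7000, sd = 800),
           score       = rnorm(50, mean = 25,   sd = 3))

# Column-wise z-scores: subtract the mean, divide by the standard deviation
mu  <- colMeans(X)
sdv <- apply(X, 2, sd)
Z   <- sweep(sweep(X, 2, mu, "-"), 2, sdv, "/")

# Covariance entries before and after scaling
range_raw    <- range(cov(X))   # spans several orders of magnitude
range_scaled <- range(cov(Z))   # bounded by -1 and 1 (a correlation matrix)

# ... the optimization would operate on Z here ...

# Back-transform to the original units
X_back <- sweep(sweep(Z, 2, sdv, "*"), 2, mu, "+")
```

Because the covariance of standardized columns is a correlation matrix, every entry of the mismatch term is bounded, so a single step size can serve all columns.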

## Example: Shielding Against Corrupted EHR Data
```{r, eval=FALSE}
library(smriti)
library(missForest)

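# Load clinical data with structural missingness and sensor artifacts
# (named `clinical` to avoid masking the base function `data()`)
clinical <- read.csv("clinical_proxy.csv")

# Robust refinement: the target covariance is estimated with the MCD
clean_data <- smriti_impute(
  data = clinical,
  time_cols = c("T1", "T2", "T3", "T4"),  # repeated-measure columns
  robust = TRUE,                          # MCD-based target covariance
  lambda = 0.5                            # Lagrangian multiplier
)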
```