---
title: "trafficCAR Model Diagnostics and Checking"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{trafficCAR Model Diagnostics and Checking}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette explains how to interpret the diagnostic tools provided by **trafficCAR**. These diagnostics are designed to answer three questions:

1. Are there systematic discrepancies between the model and the observed data?
2. Is there remaining spatial structure that the model has failed to explain?
3. Are the fitted models capable of reproducing key features of the data?

The diagnostics are intentionally simple and global. They are meant to flag problems early, not to replace detailed model criticism.

---

## Residual diagnostics

The `residuals()` method for a `traffic_fit` object provides three types of residuals, where \(\hat{\mu}_i\) is the fitted mean of segment \(i\) from the mean model and \(\hat{x}_i\) is the estimated CAR spatial effect:

- **Raw residuals**:
  \[
  r_i = y_i - \hat{\mu}_i
  \]
- **Structured residuals** (the spatial effect):
  \[
  r_i^{(s)} = \hat{x}_i
  \]
- **Unstructured residuals**:
  \[
  r_i^{(u)} = y_i - (\hat{\mu}_i + \hat{x}_i)
  \]

Raw residuals reflect overall lack of fit of the mean model. Unstructured residuals are particularly important: they represent the portion of the data that should be approximately independent if the spatial model is adequate.

Typical usage:

```r
r_raw <- residuals(fit, type = "raw")
r_un  <- residuals(fit, type = "unstructured")

summary(r_raw)
summary(r_un)
```

Interpretation guidelines:

- A mean far from zero, or marked skewness, in the raw residuals suggests systematic bias.
- Heavy tails indicate that the model underestimates variability, for example when a few segments have unusually large errors.
- Unstructured residuals should have smaller variance than raw residuals if the spatial component is contributing meaningfully.

---

## Moran’s I on residuals

Spatial autocorrelation in residuals is assessed using Moran’s I via `moran_residuals()`.
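
To build intuition for what this statistic measures, the base-R sketch below computes global Moran’s I, \(I = \frac{n}{S_0}\frac{\sum_{i,j} w_{ij} z_i z_j}{\sum_i z_i^2}\), where \(z_i\) are the centred residuals and \(S_0 = \sum_{i,j} w_{ij}\), together with a permutation p-value. It is not the `moran_residuals()` implementation: the helper names `moran_i()` and `moran_perm_test()`, the row-standardised weights, and the toy chain of road segments are all assumptions made for illustration.

```r
# Hypothetical helper (not part of trafficCAR): global Moran's I for a
# residual vector r and an n x n spatial weights matrix W with zero diagonal.
moran_i <- function(r, W) {
  z <- r - mean(r)                  # centre the residuals
  denom <- sum(z^2)
  if (denom == 0) return(NA_real_)  # undefined when residual variance is zero
  n <- length(r)
  (n / sum(W)) * as.numeric(crossprod(z, W %*% z)) / denom
}

# Hypothetical helper: one-sided permutation p-value for positive
# autocorrelation, obtained by shuffling residuals across segments.
moran_perm_test <- function(r, W, n_perm = 999) {
  obs <- moran_i(r, W)
  if (is.na(obs)) return(list(I = NA_real_, p_value = NA_real_))
  perm <- replicate(n_perm, moran_i(sample(r), W))
  list(I = obs, p_value = (1 + sum(perm >= obs)) / (n_perm + 1))
}

# Assumed toy network: 20 road segments in a chain, each adjacent to the
# segments immediately before and after it.
n <- 20
A <- matrix(0, n, n)
A[cbind(1:(n - 1), 2:n)] <- 1
A[cbind(2:n, 1:(n - 1))] <- 1
W <- A / rowSums(A)                 # row-standardised weights

set.seed(1)
r_smooth <- cumsum(rnorm(n))        # spatially smooth residuals
r_indep  <- rnorm(n)                # independent residuals

moran_perm_test(r_smooth, W)
moran_perm_test(r_indep, W)
moran_perm_test(rep(0, n), W)       # zero variance: both entries are NA
```

The smooth residuals should give a large positive I with a small p-value, while the independent residuals typically give a larger p-value; constant residuals return `NA`, matching the zero-variance behaviour described in this section.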

```r
moran_residuals(fit, type = "unstructured", method = "permutation")
```

Interpretation depends on the residual type:

- **Raw residuals**: a significant Moran’s I indicates spatial structure not captured by the mean model.
- **Unstructured residuals**: a significant Moran’s I indicates spatial dependence that remains after accounting for the CAR component.
- **Structured residuals**: a positive Moran’s I is expected and reflects the spatial smoothing imposed by the CAR component.

Permutation-based p-values should be interpreted as global diagnostics: they summarize autocorrelation over the whole network, not at individual segments. A small p-value for unstructured residuals is a strong indication of model misspecification (e.g., missing covariates or an inappropriate neighborhood structure).

If the residual variance is zero, Moran’s I is undefined and is returned as `NA`. This typically occurs in saturated or near-saturated models.

---

## Posterior predictive checks

Posterior predictive checks (PPCs) compare observed summary statistics with their distribution under replicated data generated from the fitted model.

```r
ppc <- ppc_summary(fit, stats = c("mean", "var", "tail"))
print(ppc)
```

The following statistics are reported:

- **Mean**: checks overall location.
- **Variance**: checks dispersion.
- **Tail probabilities**: check distributional shape.

Each statistic is accompanied by a posterior predictive p-value:

\[
\text{p-value} = P\left(T(y^{\text{rep}}) \ge T(y) \mid y\right)
\]

where \(T\) is the summary statistic, \(y\) is the observed data, and \(y^{\text{rep}}\) is a dataset replicated from the fitted model.

Interpretation guidelines:

- Values near 0 or 1 indicate lack of fit: the observed statistic is extreme relative to the replicated data.
- Values near 0.5 indicate that the observed statistic is typical of the replicated data.
- Systematic failures across multiple statistics suggest model inadequacy.

PPCs are not formal hypothesis tests. They are descriptive tools intended to highlight discrepancies between the model and the data.

---

## Practical workflow

A recommended diagnostic workflow is:

1. Inspect raw and unstructured residual summaries.
2. Compute Moran’s I on unstructured residuals.
3. Run posterior predictive checks on means and variances.
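
Step 3 relies on the posterior predictive p-value defined in the previous section. As a rough illustration of how such a p-value is obtained, the base-R sketch below estimates it from replicated datasets. It is not the `ppc_summary()` implementation: the `ppp_value()` helper, the faked posterior draws, and the deliberately under-dispersed model are assumptions chosen so that the mean check passes while the variance check fails.

```r
# Hypothetical helper (not part of trafficCAR): posterior predictive p-value
# P(T(y_rep) >= T(y) | y), estimated as the share of replicated datasets
# whose statistic is at least as large as the observed one.
ppp_value <- function(y, y_rep, stat) {
  t_obs <- stat(y)
  t_rep <- apply(y_rep, 1, stat)    # one statistic per replicated dataset (row)
  mean(t_rep >= t_obs)
}

# Assumed toy setting: the observed data are more dispersed (sd = 3) than the
# "fitted" model, whose posterior draws are faked below with sd fixed at 1.
set.seed(42)
n <- 50
y <- rnorm(n, mean = 2, sd = 3)

n_draws <- 1000
mu_draws    <- rnorm(n_draws, mean = mean(y), sd = 0.1)
sigma_draws <- rep(1, n_draws)

# One replicated dataset per posterior draw: rows are draws, columns are segments.
y_rep <- t(sapply(seq_len(n_draws), function(s) {
  rnorm(n, mean = mu_draws[s], sd = sigma_draws[s])
}))

ppp_value(y, y_rep, mean)  # location is captured: typically near 0.5
ppp_value(y, y_rep, var)   # dispersion is underestimated: near 0
```

Read against the interpretation guidelines, this pattern (a mean p-value near 0.5 alongside a variance p-value near 0) points to a dispersion problem rather than a location problem.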

Consistent signals across these diagnostics provide stronger evidence for or against model adequacy than any single check.

---

## Limitations

The diagnostics provided here are intentionally limited in scope:

- They are global rather than local.
- They do not identify specific problematic road segments.
- They assume that the adjacency structure is correctly specified.

These tools are best viewed as a first line of model checking rather than a complete diagnostic framework.