--- title: "Regression-based control charts" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Regression-based control charts} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4.2 ) ``` ```{r setup, message = FALSE} library(shewhartr) library(ggplot2) library(dplyr) ``` Classical Shewhart charts assume the process mean is *stationary* — constant in time, apart from chance variation. Many real processes violate this assumption: a curing oven drifts, a microbial culture grows, an epidemic curve rises and falls, sensor calibration shifts. Applying a classical chart to a trended process produces systematically wrong limits and a flood of false alarms (or, equivalently, false reassurances). The right approach is the one Mandel (1969) proposed in the *Journal of Quality Technology*'s very first issue: fit a model for the trend and place control limits around the *fitted curve*, not around a constant centre line. This is what `shewhart_regression()` does. ## A simple example: linear drift `temperature_drift` is 200 minutes of sensor readings on a curing oven. The truth is a slow linear drift superimposed on a periodic component plus noise. ```{r} fit <- shewhart_regression( temperature_drift, value = temp_c, index = minute, model = "linear" ) broom::glance(fit) ``` ```{r} autoplot(fit) ``` The chart's centre line follows the fitted line, and limits are $\pm 3\hat{\sigma}_R$ where $\hat{\sigma}_R$ is estimated from the moving range of the residuals. ## Sigma from residuals A subtlety: the residuals from a regression fit are *correlated* under ordinary least squares, even when the model is correctly specified (adjacent residuals share the influence of the same fitted slope). The classical $\overline{\mathrm{MR}}/d_2$ estimator partially absorbs this, but for short series or autocorrelated noise we recommend checking the residuals first via `shewhart_diagnostics()`: ```{r, eval = FALSE} shewhart_diagnostics(fit) # residuals~fitted, Q-Q, ACF, MR, histogram ``` If the ACF panel shows non-trivial autocorrelation, consider modelling it explicitly (a wider topic; see Box, Jenkins & Reinsel 2008) before relying on the chart. ## Phase detection `shewhart_regression()` can split the series into phases automatically: when a runs rule fires (default Nelson 2 — nine consecutive points on the same side of the fitted curve), a new phase begins and the model is re-fit. This generalises the original v0.1.x behaviour with a cleaner, configurable rule: ```{r} set.seed(1) trended <- tibble::tibble( t = 1:120, y = c( 1:60 * 0.5 + rnorm(60, sd = 0.5), # phase 1 30 + 1:60 * 0.1 + rnorm(60, sd = 0.5)) # phase 2: shift + slowdown ) fit <- shewhart_regression(trended, value = y, index = t, model = "linear", phase_rule = "nelson_2_nine_same") broom::glance(fit) length(fit$fits) # number of phases detected ``` The `phase_rule` argument accepts any rule from `shewhart_rules_available()`. The legacy 7-points rule from `v0.1.x` is still available as `"we_seven_same"`: ```{r} fit_legacy <- shewhart_regression(trended, value = y, index = t, model = "linear", phase_rule = "we_seven_same") length(fit_legacy$fits) ``` The trade-off is straightforward. With Nelson 2 (9 same side), the in-control ARL is about 256 — false phase changes are rare. With the WE 7-same rule, ARL_0 is about 64 — phase changes are detected faster but at a higher false-alarm cost. See the `arl-simulation` vignette for a quantitative comparison. ```{r} autoplot(fit) # Nelson 2 — usually 1–2 phases autoplot(fit_legacy) # WE 7 — typically more phases on the same data ``` ## A worked example with many phases and visible violations The COVID-19 mortality series for Recife is a textbook trended process: a long, irregular climb, a peak, a slow decline. With a single regression line and the legacy `we_seven_same` rule (the one the original SBPO 2020 paper used), the chart partitions the series into phases, fits a local trend in each, and flags the days when the observed value departs sharply from the local trend. ```{r, fig.height = 4.6} fit_recife <- shewhart_regression( cvd_recife, value = new_deaths, index = .t, model = "loglog", phase_rule = "we_seven_same", rules = c("nelson_1_beyond_3s", "we_seven_same"), lower_bound = 0 # death counts can't go negative ) length(fit_recife$fits) # phases detected nrow(fit_recife$violations) # individual flagged observations autoplot(fit_recife) ``` Each shaded band is a phase. The dashed lines are that phase's 3-sigma limits, the solid line is the regression centre line for that phase. Points highlighted in firebrick are the days flagged by the rule set as departing from the local trend — these are the days that motivated investigation in the original analysis. The `violations` table tells you exactly which days fired which rule: ```{r} head(fit_recife$violations, 8) ``` ## The model menu | `model = ...` | Functional form | |----------------|----------------------------------------------------| | `"linear"` | $y = \beta_0 + \beta_1 N$ | | `"log"` | $\log(y + 1) = \beta_0 + \beta_1 N$ | | `"loglog"` | $\log(\log(y/\alpha + 1) + 1) = \beta_0 + \beta_1 N$ | | `"gompertz"` | Gompertz cumulative growth (via `nls`) | | `"logistic"` | Logistic cumulative growth (via `nls`) | | `"auto"` | Box-Cox guidance to choose between linear/log/log-log | The `"auto"` setting calls `shewhart_box_cox()` internally and selects based on the maximum-likelihood lambda. It is a good first try when you don't have strong prior knowledge of the functional form. For full control, supply your own formula: ```{r, eval = FALSE} shewhart_regression(temperature_drift, value = temp_c, index = minute, formula = temp_c ~ poly(minute, 3)) ``` ## A growth-curve example `bacterial_growth` is a 24-hour OD time series whose true mean is a Gompertz curve. ```{r} fit_gomp <- shewhart_regression( bacterial_growth, value = od, index = hour, model = "gompertz" ) broom::glance(fit_gomp) ``` ```{r} autoplot(fit_gomp) ``` The Gompertz parameterisation we use comes from Zwietering et al. (1990) and is in `?Gompertz`. ## Interpreting violations In a regression chart, a violation means an observation departs from the *trend*, not from a constant baseline. This is exactly what we want for trended processes: the question becomes "is the *deviation from the expected trajectory* unusual?", not "is the value high or low compared to a fixed reference?". Phase changes are themselves interpreted as suspected shifts in the underlying process — a re-tuned controller, a new operator, a new batch of raw material. `shewhart_regression()` highlights them in `autoplot()` by colouring each phase distinctly. ## References * Mandel, B. J. (1969). The Regression Control Chart. *Journal of Quality Technology*, 1(1), 1-9. * Perla, R. J., Provost, S. M., Parry, G. J., Little, K., & Provost, L. (2020). Understanding variation in reported COVID-19 deaths with a novel Shewhart chart application. *International Journal for Quality in Health Care*, 32(S1), 49-55. — the three-phase hybrid C/I chart that motivated the multi-phase regression-chart design used by `shewhart_regression()`. * Ferraz, C., Petenate, A. J., Wanderley, A. L., Ospina, R., Torres, J., & Moreira, A. P. (2020). COVID-19: Monitoramento por gráficos de Shewhart. *Revista Brasileira de Estatística*. — the Brazilian adaptation; source of the legacy `we_seven_same` phase rule and the original analysis settings reused in the Recife example above. * Hawkins, D. M. (1991). Multivariate Quality Control Based on Regression-Adjusted Variables. *Technometrics*, 33(1), 61-75. * Wheeler, D. J., & Chambers, D. S. (1992). *Understanding Statistical Process Control* (2nd ed.). SPC Press. * Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (2008). *Time Series Analysis: Forecasting and Control* (4th ed.). Wiley. * Zwietering, M. H. et al. (1990). Modeling of the Bacterial Growth Curve. *Applied and Environmental Microbiology*, 56(6), 1875-1881. * Box, G. E. P., & Cox, D. R. (1964). An Analysis of Transformations. *Journal of the Royal Statistical Society B*, 26(2), 211-252.