---
title: "Autoscaling in lme4"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Autoscaling in lme4}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: autoscale_ref.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Scaling predictors in a linear model (or an extended linear model, such as an LMM or GLMM) so that all of the predictors are on a similar scale (and thus, in general, so that the estimated parameters have a similar scale) can improve the behaviour of various computational methods (see e.g. the "Remove eccentricity by scaling" section of @bolker2013strategies). `lmer()` and `glmer()` issue warnings suggesting that users consider re-scaling predictors when the predictor variables are on very different scales. Re-scaling predictors can improve both computation and interpretation [@schielzethSimple2010] of (extended) linear models. However, when researchers want to scale predictors for computational stability but report parameter estimates on the scale of the original (unscaled) predictors, back-transforming the coefficients, covariance matrix, etc. is tedious. `lme4` can now do this scaling automatically, behind the scenes.

An example of a warning message due to disparate predictor scales:

```{r setup}
library(lme4)
set.seed(1)
sleepstudy$var1 <- runif(nrow(sleepstudy), 1e6, 1e7)
fit1 <- lmer(Reaction ~ var1 + Days + (Days | Subject), sleepstudy)
```

Instead of re-fitting with scaled predictors, we can use a new argument to `(g)lmerControl` (as of `lme4` version 1.1.37) called `autoscale`: `scale()` is applied automatically to the continuous covariates, but the coefficient estimates and their covariances are back-transformed before being returned to the user. This preserves the original interpretation and minimizes user intervention. Hence:

```{r fit}
fit2 <- lmer(Reaction ~ var1 + Days + (Days | Subject),
             control = lmerControl(autoscale = TRUE), sleepstudy)
all.equal(fixef(fit1), fixef(fit2))
all.equal(vcov(fit1), vcov(fit2))
```

```{r echo = FALSE}
## https://stackoverflow.com/a/67456510/190277
get_val <- function(x) as.numeric(gsub("^.*?(\\d+\\.?\\d+e?[+-]?\\d+).*$", "\\1", x))
get_tol <- function(x, y, dig = 1) signif(get_val(all.equal(x, y, tolerance = 0)), dig)
ftol <- get_tol(fixef(fit1), fixef(fit2))
vtol <- get_tol(vcov(fit1), vcov(fit2))
```

The parameters are in fact slightly different (relative difference of $`r ftol`$ in the fixed effects and $`r vtol`$ in the covariance matrices); these differences will be larger for less numerically stable models (i.e., models that are more complicated or more poorly constrained by the data).

## Autoscale Mechanism

The base function `scale()` is applied to the model matrix `X` constructed by `lFormula()` or `glFormula()` (in `R/Modular.R`). To back-transform, we use the `scaled:center` and `scaled:scale` attributes to reverse the changes in `fixef.merMod()`, `model.matrix.merMod()`, and `vcov.merMod()`, so that the `summary()` output shows the coefficients on the original scale. Reversing the changes to `fixef.merMod()`, the $\beta$ coefficients, is not too difficult; reversing them for the variance-covariance matrix is a bit more involved. The exact computation can be found in the function `scale_vcov()` (in `R/utilities.R`), and the derivation is explained below.
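Before the derivation, here is a minimal sketch of the coefficient back-transformation. This is not `lme4`'s actual internal code, and `back_transform_beta()` is a hypothetical helper; the check uses `lm()` so the example is self-contained.

```{r backtransform-sketch}
## A minimal sketch (not lme4's internals) of the coefficient
## back-transformation; `back_transform_beta()` is a hypothetical helper.
back_transform_beta <- function(beta, center, scale) {
    b0 <- beta[1]    ## intercept, scaled-predictor parameterization
    b  <- beta[-1]   ## slopes, scaled-predictor parameterization
    c(b0 - sum(b * center / scale),  ## intercept on the original scale
      b / scale)                     ## slopes on the original scale
}

## check against a model fit directly on the unscaled predictors
X  <- scale(as.matrix(mtcars[, c("disp", "hp")]))
m1 <- lm(mtcars$mpg ~ X)              ## fit on scaled predictors
m2 <- lm(mpg ~ disp + hp, mtcars)     ## fit on original predictors
all.equal(unname(back_transform_beta(coef(m1),
                                     attr(X, "scaled:center"),
                                     attr(X, "scaled:scale"))),
          unname(coef(m2)))
```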
Consider a linear model fitted to scaled predictors:
\[
\widehat{Y} = \widehat{\beta}_{0} + \sum_{i=1}^{p} \widehat{\beta}_{i} x_{i}^{*},
\]
where $x_{i}^{*}$ represents the scaled version of predictor $x_{i}$. That is, if we let $C = (c_{1}, \ldots, c_{p})$ be the vector of centering values and $S = (s_{1}, \ldots, s_{p})$ the vector of scaling values, so that $x_{i}^{*} = (x_{i} - c_{i})/s_{i}$, we have:
\[
\widehat{Y} = \widehat{\beta}_{0} + \sum_{i=1}^{p} \widehat{\beta}_{i} \left( \frac{x_{i} - c_{i}}{s_{i}} \right)
            = \widehat{\beta}_{0} - \sum_{i=1}^{p} \frac{\widehat{\beta}_{i} c_{i}}{s_{i}} + \sum_{i=1}^{p} \frac{\widehat{\beta}_{i} x_{i}}{s_{i}}.
\]
From the above, it is clear that the intercept on the original scale can be represented as:
\[
\widehat{\beta}_{0}' = \widehat{\beta}_{0} - \sum_{i=1}^{p} \frac{\widehat{\beta}_{i} c_{i}}{s_{i}}.
\]
Similarly, the coefficients on the original scale are represented as:
\[
\widehat{\beta}_{i}' = \frac{\widehat{\beta}_{i}}{s_{i}}.
\]
Then, writing $\sigma_{ij}^{2}$ for $\cov(\widehat{\beta}_{i}, \widehat{\beta}_{j})$, the variance-covariance matrix on the original scale can be derived using the following:
\newcommand{\cov}{\textrm{Cov}}
\[
\cov\left( \frac{\widehat{\beta}_{i}}{s_{i}}, \frac{\widehat{\beta}_{j}}{s_{j}} \right) = \frac{1}{s_{i} s_{j}} \cov(\widehat{\beta}_{i}, \widehat{\beta}_{j}) = \frac{\sigma_{ij}^{2}}{s_{i} s_{j}}
\]
\begin{equation}
\begin{split}
\cov\left( \widehat{\beta}_{0} - \sum_{i=1}^{p} \frac{\widehat{\beta}_{i} c_{i}}{s_{i}}, \widehat{\beta}_{0} - \sum_{j=1}^{p} \frac{\widehat{\beta}_{j} c_{j}}{s_{j}} \right)
&= \cov(\widehat{\beta}_{0}, \widehat{\beta}_{0}) - 2 \sum_{i=1}^{p} \frac{c_{i}}{s_{i}} \cov(\widehat{\beta}_{0}, \widehat{\beta}_{i}) + \sum_{i=1}^{p} \sum_{j=1}^{p} \frac{c_{i} c_{j}}{s_{i} s_{j}} \cov(\widehat{\beta}_{i}, \widehat{\beta}_{j}) \\
&= \sigma_{0}^{2} - 2 \sum_{i=1}^{p} \frac{c_{i}}{s_{i}} \sigma_{0i}^{2} + \sum_{i=1}^{p} \sum_{j=1}^{p} \frac{c_{i} c_{j}}{s_{i} s_{j}} \sigma_{ij}^{2}
\end{split}
\end{equation}
\begin{equation}
\begin{split}
\cov\left( \widehat{\beta}_{0} - \sum_{i=1}^{p} \frac{\widehat{\beta}_{i} c_{i}}{s_{i}}, \frac{\widehat{\beta}_{j}}{s_{j}} \right)
&= \cov\left( \widehat{\beta}_{0}, \frac{\widehat{\beta}_{j}}{s_{j}} \right) - \sum_{i=1}^{p} \frac{c_{i}}{s_{i} s_{j}} \cov\left( \widehat{\beta}_{i}, \widehat{\beta}_{j} \right) \\
&= \frac{\sigma_{0j}^{2}}{s_{j}} - \sum_{i=1}^{p} \frac{c_{i}}{s_{i} s_{j}} \sigma_{ij}^{2},
\end{split}
\end{equation}
for $j = 1, 2, \ldots, p$.
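To make the derivation concrete: the three formulas above are the elementwise form of $\cov(T\widehat{\beta}) = T \, \cov(\widehat{\beta}) \, T^{\top}$, where $T$ maps the scaled-predictor coefficients to the original-scale coefficients. The sketch below checks this numerically with `lm()`; it is not `lme4`'s actual `scale_vcov()`, and `make_T()` is a hypothetical helper.

```{r vcov-sketch}
## A minimal sketch (not lme4's scale_vcov()) of the variance-covariance
## back-transformation, as Cov(T beta) = T V T'.
make_T <- function(center, scale) {
    Tmat <- diag(c(1, 1 / scale))     ## 1, then 1/s_i on the diagonal
    Tmat[1, -1] <- -center / scale    ## -c_i/s_i in the intercept row
    Tmat
}

X    <- scale(as.matrix(mtcars[, c("disp", "hp")]))
m1   <- lm(mtcars$mpg ~ X)            ## fit on scaled predictors
m2   <- lm(mpg ~ disp + hp, mtcars)   ## fit on original predictors
Tmat <- make_T(attr(X, "scaled:center"), attr(X, "scaled:scale"))
all.equal(unname(Tmat %*% vcov(m1) %*% t(Tmat)), unname(vcov(m2)))
```

## References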