--- title: "The MLE Ecosystem" author: "Alexander Towell" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 vignette: > %\VignetteIndexEntry{The MLE Ecosystem} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4, eval = FALSE ) ``` ## Introduction `compositional.mle` is part of an ecosystem of R packages that share a single design principle from *Structure and Interpretation of Computer Programs* (SICP): **closure**---the result of combining things is itself the same kind of thing, ready to be combined again. This principle appears at three levels: | Level | Package | Closure property | |-------|---------|------------------| | **Solvers** | `compositional.mle` | Composing solvers yields a solver | | **Estimates** | `algebraic.mle` | Combining MLEs yields an MLE | | **Tests** | `hypothesize` | Combining hypothesis tests yields a test | This vignette walks through that story. All packages referenced below are available on CRAN. ## Composing Solvers The core idea of `compositional.mle` is that solvers are first-class functions that compose via operators: - `%>>%` (sequential chaining) --- pipe the result of one solver into the next - `%|%` (parallel racing) --- run solvers simultaneously, keep the best - `with_restarts()` --- retry a solver from multiple starting points Because each operator returns a solver, you can compose freely: ```{r} library(compositional.mle) # Generate data set.seed(42) y <- rnorm(200, mean = 5, sd = 2) # Define the log-likelihood (numDeriv handles derivatives by default) ll <- function(theta) { mu <- theta[1]; sigma <- theta[2] -length(y) * log(sigma) - 0.5 * sum((y - mu)^2) / sigma^2 } problem <- mle_problem(loglike = ll) # Coarse-to-fine: grid search for a starting region, then L-BFGS-B strategy <- grid_search(lower = c(0, 0.5), upper = c(10, 5), n = 5) %>>% lbfgsb(lower = c(-Inf, 1e-4), upper = c(Inf, Inf)) result <- strategy(problem, theta0 = c(0, 1)) result ``` ``` #> MLE Solver Result #> ----------------- #> Method: grid -> lbfgsb #> Converged: TRUE #> Iterations: 10 #> Log-likelihood: -424.84 #> #> Estimate: #> [1] 5.0609 1.9498 ``` You can race different approaches and let the best log-likelihood win: ```{r} race <- bfgs() %|% nelder_mead() %|% gradient_ascent() result_race <- race(problem, theta0 = c(0, 1)) ``` Or combine restarts with chaining: ```{r} robust <- with_restarts( gradient_ascent() %>>% bfgs(), n = 10, sampler = uniform_sampler(lower = c(-10, 0.1), upper = c(10, 10)) ) result_robust <- robust(problem, theta0 = c(0, 1)) ``` See `vignette("strategy-design")` for a thorough treatment of solver composition. ## The `mle` Result Object Every solver returns an `mle` object (defined by `algebraic.mle`). This is the shared currency of the ecosystem---any package that produces or consumes `mle` objects works with every other package. The `mle` class provides a uniform interface: ```{r} library(algebraic.mle) # Extract estimates and uncertainty params(result) # parameter estimates (theta-hat) se(result) # standard errors vcov(result) # variance-covariance matrix loglik_val(result) # log-likelihood at the MLE confint(result) # 95% confidence intervals ``` ``` #> params(result) #> [1] 5.0609 1.9498 #> #> se(result) #> [1] 0.1379 0.0975 #> #> vcov(result) #> [,1] [,2] #> [1,] 0.019009 -0.000007 #> [2,] -0.000007 0.009506 #> #> loglik_val(result) #> [1] -424.84 #> #> confint(result) #> 2.5 % 97.5 % #> [1,] 4.7906 5.3312 #> [2,] 1.7587 2.1410 ``` The key point: **results from any solver have the same interface**. Whether you used Newton-Raphson, a grid search, or a composed pipeline, `params()`, `se()`, and `confint()` work the same way. ## Combining Independent MLEs Suppose three independent labs each estimate the mean of a Normal$(\mu, \sigma)$ population from their own data. Each produces an `mle` object. `algebraic.mle` provides `mle_weighted()` to combine them via **inverse-variance weighting** using Fisher information: ```{r} # Three independent datasets from the same population set.seed(123) y1 <- rnorm(50, mean = 10, sd = 3) y2 <- rnorm(100, mean = 10, sd = 3) y3 <- rnorm(75, mean = 10, sd = 3) # Fit each independently make_problem <- function(data) { mle_problem(loglike = function(theta) { mu <- theta[1]; sigma <- theta[2] -length(data) * log(sigma) - 0.5 * sum((data - mu)^2) / sigma^2 }) } solver <- lbfgsb(lower = c(-Inf, 1e-4), upper = c(Inf, Inf)) fit1 <- solver(make_problem(y1), theta0 = c(0, 1)) fit2 <- solver(make_problem(y2), theta0 = c(0, 1)) fit3 <- solver(make_problem(y3), theta0 = c(0, 1)) # Combine: inverse-variance weighting via Fisher information combined <- mle_weighted(list(fit1, fit2, fit3)) ``` ```{r} # Individual SEs vs combined data.frame( Source = c("Lab 1 (n=50)", "Lab 2 (n=100)", "Lab 3 (n=75)", "Combined"), Estimate = c(params(fit1)[1], params(fit2)[1], params(fit3)[1], params(combined)[1]), SE = c(se(fit1)[1], se(fit2)[1], se(fit3)[1], se(combined)[1]) ) ``` ``` #> Source Estimate SE #> 1 Lab 1 (n=50) 10.207 0.4243 #> 2 Lab 2 (n=100) 9.869 0.2998 #> 3 Lab 3 (n=75) 9.753 0.3464 #> 4 Combined 9.934 0.1995 ``` The combined estimate has a smaller standard error than any individual estimate---this is the power of information pooling. And because `mle_weighted()` returns an `mle` object, you can pass it to any function that accepts MLEs: `confint()`, `se()`, further weighting, or hypothesis testing. **Combining MLEs yields an MLE** (closure). ## Hypothesis Testing with `hypothesize` The [`hypothesize`](https://CRAN.R-project.org/package=hypothesize) package provides composable hypothesis tests. Like solvers and estimates, **combining tests yields a test**. ### Creating tests from MLE results `wald_test()` tests whether a single parameter equals a hypothesised value. `lrt()` compares nested models via the likelihood ratio: ```{r} library(hypothesize) # Wald test: is mu = 10? w <- wald_test(estimate = params(combined)[1], se = se(combined)[1], null_value = 10) w ``` ``` #> Hypothesis test ( wald_test ) #> ----------------------------- #> Test statistic: 0.1094 #> P-value: 0.7409 #> Degrees of freedom: 1 #> Significant at 5% level: FALSE ``` ```{r} # Likelihood ratio test: does sigma differ from 3? # Compare full model (mu, sigma free) vs restricted (mu free, sigma = 3) ll_full <- loglik_val(fit2) # full model log-likelihood ll_null <- sum(dnorm(y2, mean = params(fit2)[1], sd = 3, log = TRUE)) lr <- lrt(null_loglik = ll_null, alt_loglik = ll_full, dof = 1) lr ``` ``` #> Hypothesis test ( likelihood_ratio_test ) #> ------------------------------------------ #> Test statistic: 0.2851 #> P-value: 0.5934 #> Degrees of freedom: 1 #> Significant at 5% level: FALSE ``` ### Combining independent tests When you have independent tests (e.g., each lab tests $H_0: \mu = 10$), `fisher_combine()` merges them using Fisher's method: ```{r} # Each lab tests H0: mu = 10 w1 <- wald_test(params(fit1)[1], se(fit1)[1], null_value = 10) w2 <- wald_test(params(fit2)[1], se(fit2)[1], null_value = 10) w3 <- wald_test(params(fit3)[1], se(fit3)[1], null_value = 10) # Combine independent tests combined_test <- fisher_combine(w1, w2, w3) combined_test ``` ``` #> Hypothesis test ( fisher_combined_test ) #> ----------------------------------------- #> Test statistic: 3.4287 #> P-value: 0.7534 #> Degrees of freedom: 6 #> Significant at 5% level: FALSE ``` ```{r} # Multiple testing correction adjusted <- adjust_pval(list(w1, w2, w3), method = "BH") ``` The combined test is itself a `hypothesis_test` object---you can extract `pval()`, check `is_significant_at()`, or combine it further. This is the closure property at work. ### The testing interface All `hypothesis_test` objects share a uniform interface, mirroring how all `mle` objects share `params()` / `se()`: ```{r} pval(w) # extract p-value test_stat(w) # extract test statistic dof(w) # degrees of freedom is_significant_at(w, 0.05) # check significance at alpha = 0.05 confint(w) # confidence interval (for Wald tests) ``` ## Automatic Differentiation Derivatives are pluggable in `mle_problem()`. The solver never knows (or cares) where they come from: ```{r} # Strategy 1: numDeriv (default) --- zero effort p1 <- mle_problem(loglike = ll) # Strategy 2: hand-coded analytic derivatives p2 <- mle_problem( loglike = ll, score = function(theta) { mu <- theta[1]; sigma <- theta[2]; n <- length(y) c(sum(y - mu) / sigma^2, -n / sigma + sum((y - mu)^2) / sigma^3) }, fisher = function(theta) { n <- length(y); sigma <- theta[2] matrix(c(n / sigma^2, 0, 0, 2 * n / sigma^2), 2, 2) } ) # Strategy 3: automatic differentiation # Any AD package that provides score(f, theta) and hessian(f, theta) # can be plugged in here, e.g.: # p3 <- mle_problem( # loglike = ll, # score = function(theta) ad_pkg::score(ll, theta), # fisher = function(theta) ad_pkg::hessian(ll, theta) # ) ``` Each strategy has trade-offs: | Approach | Accuracy | Effort | Speed | |----------|----------|--------|-------| | Numerical (`numDeriv`) | $O(h^2)$ truncation error | None (default) | Slow (extra evaluations) | | Hand-coded analytic | Exact | High (error-prone) | Fast | | Automatic differentiation (AD package) | Exact (machine precision) | Low | Moderate | The general recommendation: **start with the default** (no explicit derivatives) during model development, then **switch to an AD package** or hand-coded derivatives when you need accuracy or performance. For derivative-free problems, use `nelder_mead()`. | Situation | Recommended | Reason | |-----------|-------------|--------| | Quick prototyping | `numDeriv` (default) | Zero effort, just write the log-likelihood | | Production / accuracy-critical | AD package or hand-coded | Exact derivatives | | Complex models (mixtures, hierarchical) | AD package | Hand-coding is error-prone and tedious | | Non-differentiable objective | `nelder_mead()` | Derivative-free simplex method | ## The Ecosystem at a Glance The packages form a pipeline: **define** a model, **solve** for the MLE, **combine** estimates, and **test** hypotheses---with closure at every level. | Package | CRAN | Purpose | |---------|------|---------| | [`algebraic.mle`](https://CRAN.R-project.org/package=algebraic.mle) | Yes | MLE algebra: `params()`, `se()`, `confint()`, `mle_weighted()` | | [`algebraic.dist`](https://CRAN.R-project.org/package=algebraic.dist) | Yes | Distribution algebra: convolution, mixture, truncation | | [`compositional.mle`](https://github.com/queelius/compositional.mle) | --- | Solver composition: `%>>%`, `%|%`, `with_restarts()` | | [`hypothesize`](https://CRAN.R-project.org/package=hypothesize) | Yes | Test composition: `wald_test()`, `lrt()`, `fisher_combine()` | | [`dfr.dist`](https://github.com/queelius/dfr.dist) | --- | Distribution-free reliability methods | The typical workflow: 1. **Define** your model as a log-likelihood and pass it to `mle_problem()` 2. **Solve** with a composed solver strategy from `compositional.mle` --- get an `mle` object 3. **Combine** independent estimates with `mle_weighted()` from `algebraic.mle` --- get a (better) `mle` object 4. **Test** hypotheses with `hypothesize` --- get a `hypothesis_test` object 5. **Combine** tests with `fisher_combine()` --- get a (stronger) `hypothesis_test` object At every step, the result is the same type as the input. That is the SICP closure property, and it is what makes the ecosystem composable.