--- title: "Exponential Lifetime Model" output: rmarkdown::html_vignette: toc: true vignette: > %\VignetteIndexEntry{Exponential Lifetime Model} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4 ) old_opts <- options(digits = 4) ``` ## Introduction The `exponential_lifetime` model is a reference implementation demonstrating how to build a specialized likelihood model with: - **Closed-form MLE**: The `fit()` method computes the MLE directly as $\hat\lambda = d/T$, bypassing `optim` entirely. - **Analytical derivatives**: Score, Hessian, and Fisher information matrix in closed form. - **Right-censoring**: Natural handling via the sufficient statistic $(d, T)$. - **`rdata()` method**: For Monte Carlo validation and bootstrap studies. The log-likelihood for Exponential($\lambda$) data with right-censoring is: $$\ell(\lambda) = d \log \lambda - \lambda T$$ where $d$ is the number of exact (uncensored) observations and $T$ is the total observation time (sum of all times, whether censored or not). ```{r load, message=FALSE, warning=FALSE} library(likelihood.model) ``` ## Uncensored Data The simplest case: all observations are exact. ```{r uncensored} set.seed(42) true_lambda <- 2.5 df <- data.frame(t = rexp(200, rate = true_lambda)) model <- exponential_lifetime("t") assumptions(model) ``` ### Closed-Form MLE Unlike most models that require `optim`, the exponential model computes the MLE directly. No initial guess is needed: ```{r fit-uncensored} mle <- fit(model)(df) cat("MLE:", coef(mle), "(true:", true_lambda, ")\n") cat("SE:", se(mle), "\n") cat("95% CI:", confint(mle)[1, ], "\n") cat("Score at MLE:", score_val(mle), "(exactly zero by construction)\n") ``` ### How the Closed-Form MLE Works The MLE is $\hat\lambda = n / \sum x_i$ for uncensored data. We can verify this against the manual calculation: ```{r verify-mle} n <- nrow(df) total_time <- sum(df$t) manual_mle <- n / total_time cat("fit() result: ", coef(mle), "\n") cat("Manual n/T: ", manual_mle, "\n") cat("Match:", all.equal(unname(coef(mle)), manual_mle), "\n") ``` ## Right-Censored Data In reliability and survival analysis, observations are often right-censored: we know the item survived past a certain time, but not when it actually failed. ```{r censored-data} set.seed(42) true_lambda <- 2.0 censor_time <- 0.5 # Generate latent failure times x <- rexp(300, rate = true_lambda) censored <- x > censor_time df_cens <- data.frame( t = pmin(x, censor_time), status = ifelse(censored, "right", "exact") ) cat("Sample size:", nrow(df_cens), "\n") cat("Censoring rate:", round(mean(censored) * 100, 1), "%\n") cat("Exact observations (d):", sum(!censored), "\n") cat("Total time (T):", round(sum(df_cens$t), 2), "\n") ``` ### Fitting with Censoring ```{r fit-censored} model_cens <- exponential_lifetime("t", censor_col = "status") assumptions(model_cens) mle_cens <- fit(model_cens)(df_cens) cat("\nMLE:", coef(mle_cens), "(true:", true_lambda, ")\n") cat("SE:", se(mle_cens), "\n") cat("95% CI:", confint(mle_cens)[1, ], "\n") ``` ### Ignoring Censoring Leads to Bias A common mistake is to treat censored times as if they were exact observations. 
### Ignoring Censoring Leads to Bias

A common mistake is to treat censored times as if they were exact observations. This biases the rate estimate upward (equivalently, the mean lifetime downward):

```{r censoring-bias}
# Wrong: treat all observations as exact
model_wrong <- exponential_lifetime("t")
mle_wrong <- fit(model_wrong)(df_cens)

cat("Comparison:\n")
cat("  True lambda:          ", true_lambda, "\n")
cat("  MLE (correct):        ", coef(mle_cens), "\n")
cat("  MLE (ignoring censor):", coef(mle_wrong), "\n")
cat("\nIgnoring censoring overestimates the failure rate!\n")
```

## Analytical Derivatives

The model provides the score and Hessian in closed form, which we can verify against numerical differentiation:

```{r derivatives}
set.seed(99)
df_small <- data.frame(t = rexp(50, rate = 3))
model_small <- exponential_lifetime("t")
lambda_test <- 2.5

# Analytical score
score_fn <- score(model_small)
analytical_score <- score_fn(df_small, lambda_test)

# Numerical score (via numDeriv)
ll_fn <- loglik(model_small)
numerical_score <- numDeriv::grad(
  function(p) ll_fn(df_small, p), lambda_test
)

cat("Score at lambda =", lambda_test, ":\n")
cat("  Analytical:", analytical_score, "\n")
cat("  Numerical: ", numerical_score, "\n")
cat("  Match:", all.equal(unname(analytical_score), numerical_score), "\n")

# Analytical Hessian
hess_fn <- hess_loglik(model_small)
analytical_hess <- hess_fn(df_small, lambda_test)

numerical_hess <- numDeriv::hessian(
  function(p) ll_fn(df_small, p), lambda_test
)

cat("\nHessian at lambda =", lambda_test, ":\n")
cat("  Analytical:", analytical_hess[1, 1], "\n")
cat("  Numerical: ", numerical_hess[1, 1], "\n")
cat("  Match:", all.equal(analytical_hess[1, 1], numerical_hess[1, 1]), "\n")
```

## Fisher Information Matrix

The expected Fisher information for a sample of $n$ Exponential($\lambda$) observations is $I(\lambda) = n / \lambda^2$. The model provides this analytically:

```{r fim}
fim_fn <- fim(model_small)
n_obs <- nrow(df_small)
lambda_hat <- coef(fit(model_small)(df_small))

fim_analytical <- fim_fn(lambda_hat, n_obs)

cat("FIM at MLE (lambda =", lambda_hat, "):\n")
cat("  Analytical n/lambda^2:", n_obs / lambda_hat^2, "\n")
cat("  fim() result:         ", fim_analytical[1, 1], "\n")
cat("  Match:", all.equal(n_obs / unname(lambda_hat)^2, fim_analytical[1, 1]), "\n")
```

## Fisherian Likelihood Inference

The package supports pure likelihood-based inference. Instead of confidence intervals (whose justification rests on repeated-sampling coverage), likelihood intervals identify the set of parameter values that are "well supported" by the data.

```{r fisherian}
set.seed(42)
df_fish <- data.frame(t = rexp(100, rate = 2.0))
model_fish <- exponential_lifetime("t")
mle_fish <- fit(model_fish)(df_fish)

# Support function: S(lambda) = logL(lambda) - logL(lambda_hat)
# Support at MLE is always 0
s_mle <- support(mle_fish, coef(mle_fish), df_fish, model_fish)
cat("Support at MLE:", s_mle, "\n")

# At a different value, support is negative
s_alt <- support(mle_fish, c(lambda = 1.5), df_fish, model_fish)
cat("Support at lambda=1.5:", s_alt, "\n")

# Relative likelihood: R(lambda) = L(lambda)/L(lambda_hat) = exp(S)
rl_alt <- relative_likelihood(mle_fish, c(lambda = 1.5), df_fish, model_fish)
cat("Relative likelihood at lambda=1.5:", rl_alt, "\n")
```
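As a quick consistency check, the relative likelihood should equal the exponentiated support, per the identity $R(\lambda) = \exp(S(\lambda))$ noted in the comment above. This is only a sketch using the `s_alt` and `rl_alt` values already computed, assuming both are returned as plain numeric values:

```{r support-vs-relative}
# By definition R(lambda) = exp(S(lambda)), so these two should agree
cat("exp(S) at lambda = 1.5:", exp(s_alt), "\n")
cat("R      at lambda = 1.5:", rl_alt, "\n")
cat("Match:", all.equal(unname(exp(s_alt)), unname(rl_alt)), "\n")
```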
### Likelihood Intervals

A 1/k likelihood interval contains all $\lambda$ where $R(\lambda) \geq 1/k$. The 1/8 interval is roughly analogous to a 95% confidence interval, since its cutoff on the deviance scale, $2 \log 8 \approx 4.16$, sits just above the $\chi^2_1$ 95% critical value of 3.84:

```{r likelihood-interval}
li <- likelihood_interval(mle_fish, df_fish, model_fish, k = 8)
print(li)

cat("\nWald 95% CI for comparison:\n")
print(confint(mle_fish))
```

### Profile Log-Likelihood

```{r profile, fig.width=6, fig.height=4}
prof <- profile_loglik(mle_fish, df_fish, model_fish, param = 1, n_grid = 100)

plot(prof$lambda, prof$support, type = "l", lwd = 2,
     xlab = expression(lambda), ylab = "Support",
     main = "Profile Support Function")
abline(h = -log(8), lty = 2, col = "red")
abline(v = coef(mle_fish), lty = 3, col = "blue")
legend("topright",
       legend = c("Support S(lambda)", "1/8 cutoff", "MLE"),
       lty = c(1, 2, 3), col = c("black", "red", "blue"), lwd = c(2, 1, 1))
```

## Monte Carlo Validation with `rdata()`

The `rdata()` method generates synthetic data from the model, enabling Monte Carlo studies. Here we verify that the MLE is approximately unbiased and that the asymptotic standard error matches the empirical standard deviation:

```{r monte-carlo, cache=TRUE}
set.seed(42)
true_lambda <- 3.0
n_obs <- 100
n_sim <- 1000

model_mc <- exponential_lifetime("t")
gen <- rdata(model_mc)

# Simulate n_sim datasets and fit each
mle_vals <- replicate(n_sim, {
  sim_df <- gen(true_lambda, n_obs)
  coef(fit(model_mc)(sim_df))
})

cat("Monte Carlo results (", n_sim, "simulations, n=", n_obs, "):\n")
cat("  True lambda:    ", true_lambda, "\n")
cat("  Mean of MLEs:   ", mean(mle_vals), "\n")
cat("  Bias:           ", mean(mle_vals) - true_lambda, "\n")
cat("  Empirical SE:   ", sd(mle_vals), "\n")
cat("  Theoretical SE: ", true_lambda / sqrt(n_obs), "\n")
cat("  (SE = lambda/sqrt(n) from Fisher information)\n")
```

## Cross-Validation with `likelihood_name("exp")`

The analytical model should produce the same log-likelihood as the generic distribution-wrapping approach:

```{r crossval}
set.seed(42)
df_cv <- data.frame(t = rexp(100, rate = 2.0))

model_analytical <- exponential_lifetime("t")
model_generic <- likelihood_name("exp", ob_col = "t", censor_col = "censor")

# Need a censor column for likelihood_name
df_cv_generic <- data.frame(t = df_cv$t, censor = rep("exact", 100))

lambda_test <- 2.3
ll_analytical <- loglik(model_analytical)(df_cv, lambda_test)
ll_generic <- loglik(model_generic)(df_cv_generic, lambda_test)

cat("Log-likelihood at lambda =", lambda_test, ":\n")
cat("  Analytical model:", ll_analytical, "\n")
cat("  Generic model:   ", ll_generic, "\n")
cat("  Match:", all.equal(ll_analytical, ll_generic), "\n")
```

## Summary

The `exponential_lifetime` model demonstrates several design patterns available in the `likelihood.model` framework:

| Feature | How it's implemented |
|---|---|
| Closed-form MLE | Override `fit()` to return `fisher_mle` directly |
| Analytical derivatives | Implement `score()`, `hess_loglik()`, `fim()` |
| Right-censoring | Sufficient statistics $(d, T)$ handle censoring naturally |
| Data generation | `rdata()` method for Monte Carlo validation |
| Fisherian inference | Works out of the box via `support()`, `likelihood_interval()`, etc. |

These patterns apply to any distribution where you want tighter integration than `likelihood_name()` provides. See `weibull_uncensored` for another example with analytical derivatives (but without a closed-form MLE).

## Session Info

```{r session}
sessionInfo()
```

```{r cleanup, include=FALSE}
options(old_opts)
```