---
title: "Dynamic Failure Rate Distributions"
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Dynamic Failure Rate Distributions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

The `flexhaz` package provides a flexible framework for working with survival distributions defined through their **hazard (failure rate) functions**. Instead of choosing from a fixed catalog of distributions, you specify the hazard function directly, which gives you complete flexibility to model systems with complex, time-varying failure patterns.

### Why hazard-based parameterization?

Traditional approaches use parametric families (Weibull, exponential, log-normal) that impose strong assumptions on failure rate behavior. The hazard function $h(t)$ provides a more intuitive parameterization for reliability work:

- **Constant hazard**: Exponential distribution (memoryless failures)
- **Increasing hazard**: Wear-out phenomena
- **Decreasing hazard**: Burn-in or infant mortality
- **Bathtub curve**: Common in electronics and human mortality

### Mathematical background

For a lifetime random variable $T$, the instantaneous failure rate (hazard) is

$$
h(t) = \lim_{\Delta t \to 0} \frac{\Pr\{T \leq t + \Delta t \mid T > t\}}{\Delta t} = \frac{f(t)}{S(t)}.
$$

The cumulative hazard integrates the instantaneous rate:

$$
H(t) = \int_0^t h(u) \, du.
$$

From this, all other quantities follow:

- **Survival function**: $S(t) = \exp(-H(t))$
- **CDF**: $F(t) = 1 - S(t)$
- **PDF**: $f(t) = h(t) \cdot S(t)$

## Getting Started

```{r setup}
library(flexhaz)
```

### Using Built-in Distributions

The package provides convenient constructors for classic survival distributions:

```{r}
# Exponential: constant hazard h(t) = lambda
exp_dist <- dfr_exponential(lambda = 0.5)

# Weibull: power-law hazard h(t) = (k/sigma) * (t/sigma)^(k-1)
weib_dist <- dfr_weibull(shape = 2, scale = 3)

# Gompertz: exponentially increasing hazard h(t) = a * exp(b*t)
gomp_dist <- dfr_gompertz(a = 0.01, b = 0.1)

# Log-logistic: non-monotonic hazard (increases then decreases, for beta > 1)
ll_dist <- dfr_loglogistic(alpha = 10, beta = 2)

print(exp_dist)
is_dfr_dist(exp_dist)
```

### Creating Custom Distributions

For non-standard hazard patterns, use `dfr_dist()` directly:

```{r}
# Custom: linearly increasing hazard h(t) = a + b*t
linear_dist <- dfr_dist(
  rate = function(t, par, ...) {
    a <- par[1]
    b <- par[2]
    a + b * t
  },
  par = c(a = 0.1, b = 0.01)
)
```

The `rate` function must accept:

- `t`: time (scalar or vector)
- `par`: parameter vector
- `...`: additional arguments
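To make this contract concrete, here is a sketch of another custom distribution: a piecewise-constant hazard with a single change point. The parameter names (`r1`, `r2`, `tau`) are purely illustrative, not part of the package.

```{r}
# Sketch: piecewise-constant hazard, rate r1 before tau and r2 afterwards.
# All parameter names here are illustrative.
piecewise_dist <- dfr_dist(
  rate = function(t, par, ...) {
    r1  <- par[1]  # early-life failure rate
    r2  <- par[2]  # late-life failure rate
    tau <- par[3]  # change point
    ifelse(t < tau, r1, r2)  # vectorized in t
  },
  par = c(r1 = 0.5, r2 = 0.1, tau = 2)
)
```

Because the cumulative hazard is computed by numerical integration, the jump in the rate at `tau` poses no definitional problem: $H(t)$ remains continuous, so the resulting survival function is still well defined.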
See `vignette("custom_distributions")` for detailed guidance on creating optimized custom distributions.

## Distribution Methods

All standard distribution functions are available:

### Hazard and cumulative hazard

```{r}
h <- hazard(exp_dist)
H <- cum_haz(exp_dist)

# Evaluate at specific times
h(1)           # Hazard at t=1
h(c(1, 2, 3))  # Vectorized
H(2)           # Cumulative hazard at t=2
```

### Survival and CDF

```{r}
S <- surv(exp_dist)
cdf_fn <- cdf(exp_dist)  # avoid naming this `F`, which masks the logical shorthand

# Verify S(t) + F(t) = 1
t <- 2
c(survival = S(t), cdf = cdf_fn(t), sum = S(t) + cdf_fn(t))
```

### PDF (density)

```{r}
# Use stats::density, which flexhaz implements as density.dfr_dist
pdf_fn <- density(exp_dist)

# For the exponential: f(t) = lambda * exp(-lambda * t)
t <- 1
lambda <- 0.5
c(computed = pdf_fn(t), analytical = lambda * exp(-lambda * t))
```

### Quantile function (inverse CDF)

```{r}
Q <- inv_cdf(exp_dist)

# The median of the exponential is log(2)/lambda
median_computed <- Q(0.5)
median_analytical <- log(2) / 0.5
c(computed = median_computed, analytical = median_analytical)
```

### Sampling

```{r}
samp <- sampler(exp_dist)
set.seed(42)
samples <- samp(1000)

# Compare the sample mean to the theoretical mean (1/lambda = 2)
c(sample_mean = mean(samples), theoretical = 1 / 0.5)
```

### Overriding parameters

All methods accept a `par` argument that overrides the default parameters:

```{r}
h <- hazard(exp_dist)

# Use the default lambda = 0.5
h(1)

# Override with lambda = 2
h(1, par = c(2))
```

## Likelihood Model Interface

The `dfr_dist` class implements the [likelihood.model](https://CRAN.R-project.org/package=likelihood.model) interface, enabling maximum likelihood estimation with survival data.

### Log-likelihood for survival data

For exact observations (failures at known times):

$$
\log L_i = \log h(t_i) - H(t_i)
$$

For right-censored observations (survived past time $t_i$):

$$
\log L_i = -H(t_i)
$$

### Creating test data

```{r}
# Simulate exact failure times from an exponential(lambda = 1)
set.seed(123)
true_lambda <- 1
n <- 50
times <- rexp(n, rate = true_lambda)

# Create a data frame in standard survival format:
# delta = 1 means an exact observation, delta = 0 means censored
df_exact <- data.frame(t = times, delta = rep(1, n))
head(df_exact)
```

### Computing log-likelihood

```{r}
dist <- dfr_dist(
  rate = function(t, par, ...) rep(par[1], length(t)),
  par = NULL  # No default - must be supplied
)

ll <- loglik(dist)

# Evaluate at different parameter values
ll(df_exact, par = c(0.5))  # lambda = 0.5
ll(df_exact, par = c(1.0))  # lambda = 1.0 (true value)
ll(df_exact, par = c(2.0))  # lambda = 2.0
```

The log-likelihood should be highest near the true parameter value.

### Score function (gradient)

The score function is the gradient of the log-likelihood with respect to the parameters. It is computed numerically by default:

```{r}
s <- score(dist)
s(df_exact, par = c(1.0))  # Should be close to 0 near the MLE
```

### Hessian of log-likelihood

```{r}
H_ll <- hess_loglik(dist)
hess <- H_ll(df_exact, par = c(1.0))
hess  # Should be negative (the log-likelihood is concave at the maximum)
```

## Maximum Likelihood Estimation

The `fit()` function provides MLE estimation:

```{r, warning=FALSE}
solver <- fit(dist)

# Find the MLE starting from an initial guess
result <- solver(df_exact, par = c(0.5), method = "BFGS")

# Extract fitted parameters (the fisher_mle class from likelihood.model uses coef())
coef(result)

# Compare to the analytical MLE: lambda_hat = n / sum(t)
analytical_mle <- n / sum(times)
c(fitted = coef(result), analytical = analytical_mle, true = true_lambda)
```
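The closed form used in this comparison follows directly from the exact-observation log-likelihood given earlier. With a constant hazard $h(t) = \lambda$ we have $H(t) = \lambda t$, so

$$
\ell(\lambda) = \sum_{i=1}^{n} \left( \log \lambda - \lambda t_i \right)
             = n \log \lambda - \lambda \sum_{i=1}^{n} t_i,
$$

and setting $\ell'(\lambda) = n/\lambda - \sum_{i=1}^{n} t_i = 0$ yields $\hat{\lambda} = n / \sum_{i=1}^{n} t_i$.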
### Working with censored data

Real survival data often includes censoring. Here is an example with mixed data:

```{r, warning=FALSE}
# Some observations are censored (e.g., the patient is still alive at study end)
df_mixed <- data.frame(
  t = c(1, 2, 3, 4, 5, 6, 7, 8),
  delta = c(1, 1, 1, 0, 0, 1, 1, 0)  # 0 = censored
)

ll <- loglik(dist)

# Fit with censored data
solver <- fit(dist)
result <- solver(df_mixed, par = c(0.5), method = "BFGS")
coef(result)
```

## Example: Weibull MLE

```{r, warning=FALSE}
# Create a Weibull DFR
weibull <- dfr_dist(
  rate = function(t, par, ...) {
    k <- par[1]
    sigma <- par[2]
    (k / sigma) * (t / sigma)^(k - 1)
  }
)

# Simulate Weibull data (shape = 2, scale = 3)
set.seed(456)
true_shape <- 2
true_scale <- 3
n <- 100

# Inverse CDF sampling: S(t) = exp(-(t/sigma)^k), so t = sigma * (-log u)^(1/k)
u <- runif(n)
weibull_times <- true_scale * (-log(u))^(1 / true_shape)
df_weibull <- data.frame(t = weibull_times, delta = rep(1, n))

# Fit
solver <- fit(weibull)
result <- solver(df_weibull, par = c(1.5, 2.5), method = "BFGS")
c(fitted_shape = coef(result)[1], true_shape = true_shape)
c(fitted_scale = coef(result)[2], true_scale = true_scale)
```

## Custom Hazard Functions

The real power of `flexhaz` is modeling complex, non-standard hazard patterns.

### Bathtub hazard

A classic bathtub curve has three phases:

1. **Infant mortality**: High initial failure rate that decreases
2. **Useful life**: Low, relatively constant failure rate
3. **Wear-out**: Increasing failure rate as components age

```{r, fig.alt="Bathtub hazard curve showing three phases: high infant mortality at t=0 that decreases, then a constant useful life period, followed by increasing wear-out hazard."}
# h(t) = a * exp(-b*t) + c + d * t^k
# Three components: infant mortality + baseline + wear-out
bathtub <- dfr_dist(
  rate = function(t, par, ...) {
    a <- par[1]  # infant mortality magnitude
    b <- par[2]  # infant mortality decay rate
    c <- par[3]  # baseline hazard (useful life)
    d <- par[4]  # wear-out coefficient
    k <- par[5]  # wear-out exponent
    a * exp(-b * t) + c + d * t^k
  },
  par = c(a = 1, b = 2, c = 0.02, d = 0.001, k = 2)
)

# Plot the hazard function
t_seq <- seq(0.01, 15, length.out = 200)
h <- hazard(bathtub)
plot(t_seq, sapply(t_seq, h), type = "l",
     xlab = "Time", ylab = "Hazard rate",
     main = "Bathtub hazard curve")
```

### Time-covariate interaction

Hazards can depend on covariates that modify the time effect:

```{r}
# Proportional hazards with covariate x:
#   h(t, x) = h0(t) * exp(beta * x)
# where h0(t) is a Weibull baseline
ph_model <- dfr_dist(
  rate = function(t, par, x = 0, ...) {
    k <- par[1]
    sigma <- par[2]
    beta <- par[3]
    baseline <- (k / sigma) * (t / sigma)^(k - 1)
    baseline * exp(beta * x)
  },
  par = c(shape = 2, scale = 3, beta = 0.5)
)

h <- hazard(ph_model)

# Hazard for different covariate values; their ratio is exp(beta) at every t
h(2, x = 0)  # Baseline
h(2, x = 1)  # Higher-risk group
```

## Integration with algebraic.dist

The `dfr_dist` class inherits from [algebraic.dist](https://CRAN.R-project.org/package=algebraic.dist) classes, providing access to additional functionality:

```{r}
# The support is (0, Inf) for all DFR distributions
support <- sup(exp_dist)
print(support)

# Parameters
params(exp_dist)
```

## Model Diagnostics

After fitting a model, it is essential to verify goodness of fit. The package provides diagnostic tools based on residual analysis.

### Cox-Snell Residuals

Cox-Snell residuals are defined as $r_i = H(t_i)$, the fitted cumulative hazard evaluated at each observation time. If the model is correctly specified, these residuals follow an $\operatorname{Exp}(1)$ distribution.
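This is the probability integral transform in disguise: for a continuous lifetime $T$ with survival function $S$, the random variable $S(T)$ is uniform on $(0, 1)$, so

$$
H(T) = -\log S(T) \sim \operatorname{Exp}(1),
$$

because $-\log U \sim \operatorname{Exp}(1)$ when $U \sim \operatorname{Uniform}(0, 1)$. In practice the residuals use the *fitted* cumulative hazard rather than the true one, so they are only approximately $\operatorname{Exp}(1)$ in finite samples.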
```{r, warning=FALSE, fig.alt="Cox-Snell residuals Q-Q plot showing model fit assessment."}
# Fit a model and check residuals
set.seed(99)
test_times <- rexp(80, rate = 0.3)
test_df <- data.frame(t = test_times, delta = 1)

# Fit an exponential model
exp_fitted <- dfr_exponential()
solver <- fit(exp_fitted)
fit_result <- solver(test_df, par = c(0.5))
lambda_hat <- coef(fit_result)

# Create a fitted distribution with the estimated parameters
exp_final <- dfr_exponential(lambda = lambda_hat)

# Q-Q plot of Cox-Snell residuals
qqplot_residuals(exp_final, test_df)
```

Points falling along the diagonal indicate a good fit. Systematic departures suggest model misspecification.

### Martingale Residuals

Martingale residuals, $M_i = \delta_i - H(t_i)$, are useful for identifying individual observations that are poorly fit:

```{r}
mart_resid <- residuals(exp_final, test_df, type = "martingale")
summary(mart_resid)

# Large positive values: failed earlier than expected
# Large negative values: survived longer than expected
```

For more detailed diagnostic workflows, see `vignette("reliability_engineering")`.

## Summary

The `flexhaz` package provides:

1. **Flexible specification**: Define distributions through hazard functions
2. **Complete distribution interface**: hazard, survival, CDF, PDF, quantiles, sampling
3. **Likelihood model support**: Log-likelihood, score, and Hessian for MLE
4. **Censoring support**: Handle exact, right-censored, and left-censored survival data
5. **Numerical integration**: Automatic computation of the cumulative hazard
6. **Model diagnostics**: Residuals and Q-Q plots for fit assessment

This makes it ideal for:

- Custom reliability models
- Survival analysis with non-standard hazard patterns
- Maximum likelihood estimation of hazard function parameters
- Simulation studies with flexible failure distributions

## Next Steps

- **`vignette("flexhaz-package")`** - Package introduction and quick start
- **`vignette("reliability_engineering")`** - Five real-world case studies
- **`vignette("custom_distributions")`** - The three-level optimization paradigm
- **`vignette("custom_derivatives")`** - Supplying analytical score and Hessian functions