
# TrustworthyMLR: Stability and Robustness Evaluation for Machine Learning Models

TrustworthyMLR is an R package designed to help data scientists, machine learning engineers, and researchers evaluate the trustworthiness of their predictive models. In production environments and academic research alike, it is critical to understand not only how well a model performs, but how reliably it performs under varying conditions.
This package provides the following core diagnostics:
| Metric | Purpose | Output |
|---|---|---|
| Stability Index | Measures consistency of predictions across multiple training runs or resamples | 0–1 (1 = perfectly stable) |
| Classification Stability | Measures consistency of predicted class labels, adjusted for chance agreement | 0–1 (1 = perfect agreement) |
| Robustness Score | Measures resilience of predictions under small input perturbations | 0–1 (1 = perfectly robust) |
| Visualizations | Decay curves and stability plots for deep diagnostic insights | plots |
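For intuition, the chance-adjusted agreement behind Classification Stability can be illustrated with a few lines of base R in the spirit of Cohen's kappa. This is a conceptual sketch with made-up labels; it is not TrustworthyMLR's implementation, and the package's exact scaling may differ.

```r
# Conceptual sketch of chance-adjusted label agreement (Cohen's kappa style).
# NOT the package's internal code; illustrative labels only.
labels_run1 <- c("A", "A", "B", "B", "A", "B", "A", "A", "B", "B")
labels_run2 <- c("A", "A", "B", "A", "A", "B", "A", "B", "B", "B")

observed <- mean(labels_run1 == labels_run2)               # raw agreement
tab      <- table(labels_run1, labels_run2)
expected <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # agreement expected by chance
(observed - expected) / (1 - expected)                     # chance-adjusted agreement
```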
Modern ML pipelines often focus exclusively on accuracy metrics (RMSE, AUC, F1). However, a model that achieves high accuracy on one training run but produces substantially different predictions on another is not reliable for deployment. Similarly, a model whose predictions change dramatically with tiny input perturbations is not robust enough for real-world use.
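As a quick simulated illustration of that point, two training runs can look equally accurate against the ground truth while still disagreeing noticeably with each other. The data below are made up for demonstration only.

```r
# Simulated example: similar accuracy per run, imperfect agreement between runs.
set.seed(7)
truth <- rnorm(100)
run1  <- truth + rnorm(100, sd = 0.5)   # predictions from training run 1
run2  <- truth + rnorm(100, sd = 0.5)   # predictions from training run 2

sqrt(mean((truth - run1)^2))  # RMSE of run 1
sqrt(mean((truth - run2)^2))  # RMSE of run 2 (about the same)
cor(run1, run2)               # yet the two runs agree only partially
```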
TrustworthyMLR addresses this gap by providing principled, easy-to-use diagnostics that complement traditional performance metrics, both for validating models before production deployment and for assessing the reliability of results in research.
### Installation

Install the development version from GitHub:
```r
# install.packages("devtools")
devtools::install_github("your-username/TrustworthyMLR")
```

### Stability Index

Evaluate how consistent a model's predictions are across multiple runs:

```r
library(TrustworthyMLR)
# Simulate predictions from 5 independent model runs
set.seed(42)
base_predictions <- rnorm(100)
prediction_matrix <- matrix(
  rep(base_predictions, 5) + rnorm(500, sd = 0.1),
  ncol = 5
)
# Compute stability (1 = perfectly consistent)
stability_index(prediction_matrix)
#> [1] 0.9950...
```

### Robustness Score

Evaluate how sensitive a model's predictions are to small input noise:

```r
# Define a prediction function (e.g., wrapping a trained model)
predict_fn <- function(X) X %*% c(1.5, -0.8, 2.3)
# Generate sample input data
set.seed(42)
X <- matrix(rnorm(300), ncol = 3)
# Compute robustness under 5% Gaussian noise
robustness_score(predict_fn, X, noise_level = 0.05, n_rep = 20)
#> [1] 0.9975...
```
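For intuition only, the sketch below shows one common way such a perturbation-based check can be framed: compare predictions on clean inputs with predictions on slightly noised inputs. It reuses `predict_fn` and `X` from the block above and is an assumption about the general idea, not the formula `robustness_score()` actually uses.

```r
# Conceptual sketch only; robustness_score() may define the score differently.
set.seed(42)
clean <- as.numeric(predict_fn(X))
agreement <- replicate(20, {
  X_noisy <- X + matrix(rnorm(length(X), sd = 0.05), nrow = nrow(X))
  cor(clean, as.numeric(predict_fn(X_noisy)))  # 1 = predictions barely move
})
mean(agreement)  # values near 1 indicate robustness to small perturbations
```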
### Visual Diagnostics
Visualize how model performance decays as noise increases:
```r
plot_robustness(predict_fn, X, main = "Robustness Decay Curve")
```

Visualize prediction variance across observations:

```r
plot_stability(prediction_matrix, main = "Model Prediction Stability")
```
### Real-World Workflow Example
```r
library(TrustworthyMLR)
# Step 1: Train multiple models on bootstrap resamples
set.seed(1)
n <- 200
p <- 5
X <- matrix(rnorm(n * p), ncol = p)
y <- X %*% rnorm(p) + rnorm(n, sd = 0.5)
# Collect predictions from 10 bootstrap resamples
dat <- data.frame(y = as.numeric(y), X)
predictions <- replicate(10, {
  idx <- sample(n, replace = TRUE)
  fit <- lm(y ~ ., data = dat[idx, ])
  predict(fit, newdata = dat)   # predict on the full data for comparability
})
# Step 2: Assess stability
cat("Stability Index:", stability_index(predictions), "\n")
# Step 3: Assess robustness
model <- lm(y ~ X)
pred_fn <- function(newX) {
  as.numeric(cbind(1, newX) %*% coef(model))
}
cat("Robustness Score:", robustness_score(pred_fn, X, noise_level = 0.05), "\n")
### Function Reference

| Function | Description |
|---|---|
| `stability_index()` | Compute the stability of predictions across multiple runs |
| `robustness_score()` | Compute robustness of a model under input perturbations |
### Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

1. Create a feature branch (`git checkout -b feature/new-metric`)
2. Commit your changes (`git commit -m "Add new metric"`)
3. Push to the branch (`git push origin feature/new-metric`)
4. Open a pull request

### License

MIT © Ali Hamza