bioLeak is a leakage-aware modeling toolkit for biomedical and machine-learning analyses. Its purpose is to prevent and diagnose information leakage across resampling workflows where training and evaluation data are not truly independent because samples share subjects, batches, studies, or time.
Standard workflows are often insufficient. Random, row-wise cross-validation assumes samples are independent. Global preprocessing (imputation, scaling, feature selection) done before resampling lets test-fold information shape the training process. These choices inflate performance and can lead to incorrect biomarker discovery, misleading clinical signals, or models that fail in deployment.
Data leakage means any direct or indirect use of evaluation data in training or feature engineering. In biomedical data, leakage commonly appears as repeated subjects crossing train/test folds, batch or study effects confounded with the outcome, temporal lookahead, and near-duplicate samples shared across folds.
bioLeak addresses these issues with leakage-aware splitting, guarded preprocessing that is fit only on training data, and audit diagnostics that surface overlaps, confounding, and duplicates.
The sections below walk through a leakage-aware workflow from data setup to audits. Each step includes a leaky failure mode and a corrected alternative.
library(bioLeak)
set.seed(123)
n <- 160
subject <- rep(seq_len(40), each = 4)
batch <- sample(paste0("B", 1:6), n, replace = TRUE)
study <- sample(paste0("S", 1:4), n, replace = TRUE)
time <- seq_len(n)
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
linpred <- 0.7 * x1 - 0.4 * x2 + 0.2 * x3 + rnorm(n, sd = 0.5)
p <- stats::plogis(linpred)
outcome <- factor(ifelse(runif(n) < p, "case", "control"),
levels = c("control", "case"))
df <- data.frame(
subject = subject,
batch = batch,
study = study,
time = time,
outcome = outcome,
x1 = x1,
x2 = x2,
x3 = x3
)
df_leaky <- within(df, {
leak_subject <- ave(as.numeric(outcome == "case"), subject, FUN = mean)
leak_batch <- ave(as.numeric(outcome == "case"), batch, FUN = mean)
leak_global <- mean(as.numeric(outcome == "case"))
})
df_time <- df
df_time$leak_future <- c(as.numeric(df_time$outcome == "case")[-1], 0)
predictors <- c("x1", "x2", "x3")
# Example data (first 6 rows)
head(df)
#> subject batch study time outcome x1 x2 x3
#> 1 1 B3 S4 1 control 0.21444531 1.0149432 -0.09736927
#> 2 1 B6 S3 2 case -0.32468591 -1.9927485 0.21615254
#> 3 1 B3 S1 3 case 0.09458353 -0.4272793 0.88246516
#> 4 1 B2 S3 4 control -0.89536336 0.1166373 0.20559750
#> 5 2 B2 S4 5 control -1.31080153 -0.8932076 -0.61643584
#> 6 2 B6 S2 6 case 1.99721338 0.3339029 -0.73479925
# Outcome class counts
as.data.frame(table(df$outcome))
#> Var1 Freq
#> 1 control 91
#> 2 case 69
The table preview displays the metadata columns (subject, batch, study, time), the binary outcome, and three numeric predictors (x1, x2, x3).
The class count table shows the baseline prevalence. Stratified splits try to preserve these proportions (for grouped modes, stratification is applied at the group level).
make_split_plan()
make_split_plan() is the foundation. It returns a LeakSplits object with explicit train/test indices (or compact fold assignments) and metadata. For data.frame inputs, a unique row_id column is added automatically, so group = "row_id" is the explicit way to request sample-wise CV. It assumes that the grouping columns you provide are complete and that samples sharing a group must not cross folds. Grouped stratification uses the majority class per group and is ignored for study_loocv, time_series, and survival outcomes.
Misuse to avoid:
- using group = "row_id" (sample-wise CV) when subjects repeat
- assuming time_series will always return exactly v folds
Leaky example: row-wise CV when subjects repeat
leaky_splits <- make_split_plan(
df,
outcome = "outcome",
mode = "subject_grouped",
group = "row_id",
v = 5,
repeats = 1,
stratify = TRUE
)
cat("Leaky splits summary (sample-wise CV):\n")
#> Leaky splits summary (sample-wise CV):
leaky_splits
#> LeakSplits object (mode = subject_grouped, v = 5, repeats = 1)
#> Outcome: outcome | Stratified: TRUE | Nested: FALSE
#> ------------------------------------------------------
#> fold repeat_id train_n test_n
#> 1 1 1 127 33
#> 2 2 1 128 32
#> 3 3 1 128 32
#> 4 4 1 128 32
#> 5 5 1 129 31
#> ------------------------------------------------------
#> Total folds: 5 | Hash: b1c98ff810948d48352dc8426a350983
The printed LeakSplits summary reports the split mode, number of folds, and fold-level training and test set sizes. Because group = "row_id", each sample is treated as its own group. As a result, repeated samples from the same subject may appear in both the training and test sets, which introduces information leakage.
Leakage-safe alternative: group by subject
safe_splits <- make_split_plan(
df,
outcome = "outcome",
mode = "subject_grouped",
group = "subject",
v = 5,
repeats = 1,
stratify = TRUE,
seed = 10
)
cat("Leakage-safe splits summary (subject-grouped CV):\n")
#> Leakage-safe splits summary (subject-grouped CV):
safe_splits
#> LeakSplits object (mode = subject_grouped, v = 5, repeats = 1)
#> Outcome: outcome | Stratified: TRUE | Nested: FALSE
#> ------------------------------------------------------
#> fold repeat_id train_n test_n
#> 1 1 1 124 36
#> 2 2 1 128 32
#> 3 3 1 128 32
#> 4 4 1 128 32
#> 5 5 1 132 28
#> ------------------------------------------------------
#> Total folds: 5 | Hash: 0b297788da70728ebdae0ff2f83ee0d9
Here, each subject is confined to a single fold. Test set sizes are multiples of four, the number of samples per subject, confirming that subjects were not split across folds.
The fold sizes remain comparable because stratify = TRUE
balances outcome proportions across folds while respecting subject-level
boundaries.
Other leakage-aware modes
batch_splits <- make_split_plan(
df,
outcome = "outcome",
mode = "batch_blocked",
batch = "batch",
v = 4,
stratify = TRUE
)
cat("Batch-blocked splits summary:\n")
#> Batch-blocked splits summary:
batch_splits
#> LeakSplits object (mode = batch_blocked, v = 4, repeats = 1)
#> Outcome: outcome | Stratified: TRUE | Nested: FALSE
#> ------------------------------------------------------
#> fold repeat_id train_n test_n
#> 1 1 1 73 87
#> 2 2 1 138 22
#> 3 3 1 132 28
#> 4 4 1 137 23
#> ------------------------------------------------------
#> Total folds: 4 | Hash: eeacae466015baafa3b81a09cf6150fd
study_splits <- make_split_plan(
df,
outcome = "outcome",
mode = "study_loocv",
study = "study"
)
cat("Study leave-one-out splits summary:\n")
#> Study leave-one-out splits summary:
study_splits
#> LeakSplits object (mode = study_loocv, v = 5, repeats = 1)
#> Outcome: outcome | Stratified: FALSE | Nested: FALSE
#> ------------------------------------------------------
#> fold repeat_id train_n test_n
#> 1 1 1 96 64
#> 2 2 1 121 39
#> 3 3 1 131 29
#> 4 4 1 132 28
#> ------------------------------------------------------
#> Total folds: 4 | Hash: 08feeefcf0e5ceb839fe3f4f198dfab6
time_splits <- make_split_plan(
df,
outcome = "outcome",
mode = "time_series",
time = "time",
v = 4,
horizon = 2
)
cat("Time-series splits summary:\n")
#> Time-series splits summary:
time_splits
#> LeakSplits object (mode = time_series, v = 4, repeats = 1)
#> Outcome: outcome | Stratified: FALSE | Nested: FALSE
#> ------------------------------------------------------
#> fold repeat_id train_n test_n
#> 1 2 1 39 40
#> 2 3 1 79 40
#> 3 4 1 119 40
#> ------------------------------------------------------
#> Total folds: 3 | Hash: a73baf6d3a817601acb7e15d8028b441
nested_splits <- make_split_plan(
df,
outcome = "outcome",
mode = "subject_grouped",
group = "subject",
v = 3,
nested = TRUE,
stratify = TRUE
)
cat("Nested CV splits summary:\n")
#> Nested CV splits summary:
nested_splits
#> LeakSplits object (mode = subject_grouped, v = 3, repeats = 1)
#> Outcome: outcome | Stratified: TRUE | Nested: TRUE
#> ------------------------------------------------------
#> fold repeat_id train_n test_n
#> 1 1 1 104 56
#> 2 2 1 104 56
#> 3 3 1 112 48
#> ------------------------------------------------------
#> Total folds: 3 | Hash: 97e5fbd9cdcc675432bef43384a02337
Each summary reports the number of folds and the fold sizes for the corresponding split strategy.
Batch & Study: The test set sizes vary
because batches/studies are not evenly sized. study_loocv
uses one study per fold, so v and repeats are
ignored.
Time series: The training set size increases
across folds while the test set size follows the block size. Early
blocks with no prior training data (or too-small test windows) are
skipped, so you may see fewer than v folds.
Assumptions and intent by mode:
- subject_grouped: keeps all samples from a subject together
- batch_blocked: keeps batches/centers together (leave-one-group-out when v is at least the number of batches; otherwise multiple batches per fold)
- study_loocv: holds out each study in turn for external validation
- time_series: rolling-origin splits with an optional horizon to prevent lookahead
Use repeats for repeated CV in grouped/batch modes (it is ignored for study_loocv and time_series), stratify = TRUE to balance outcome proportions at the group level when applicable, and nested = TRUE to attach inner folds (one repeat). For large datasets, progress = TRUE reports progress; storing explicit indices can be memory intensive. The sketch below combines these options.
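A minimal sketch of repeated, stratified, subject-grouped CV with progress reporting (argument names as documented above; exact defaults are an assumption):
rep_splits <- make_split_plan(
  df,
  outcome = "outcome",
  mode = "subject_grouped",
  group = "subject",
  v = 5,
  repeats = 3,      # repeated CV; ignored for study_loocv / time_series
  stratify = TRUE,  # group-level stratification
  progress = TRUE   # progress reporting for large inputs
)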
Handling Large Datasets (Compact Mode)
For large datasets (e.g., N > 50,000) with many repeats, storing explicit integer indices for
every fold can consume gigabytes of memory. Use
compact = TRUE to store a lightweight vector of fold
assignments instead. fit_resample() will automatically
reconstruct the indices on the fly. Compact mode is not supported when
nested = TRUE (it falls back to full indices). For
time_series compact splits, the time column must be present
in the stored split metadata so folds can be reconstructed.
# Efficient storage for large N
big_splits <- make_split_plan(
df,
outcome = "outcome",
mode = "subject_grouped",
group = "subject",
v = 5,
compact = TRUE # <--- Saves memory
)
cat("Compact-mode splits summary:\n")
#> Compact-mode splits summary:
big_splits
#> LeakSplits object (mode = subject_grouped, v = 5, repeats = 1)
#> Outcome: outcome | Stratified: FALSE | Nested: FALSE
#> ------------------------------------------------------
#> fold repeat_id train_n test_n
#> 1 1 1 128 32
#> 2 2 1 128 32
#> 3 3 1 128 32
#> 4 4 1 128 32
#> 5 5 1 128 32
#> ------------------------------------------------------
#> Total folds: 5 | Hash: 0c2f23413d312968be7b81c660011fc9
cat(sprintf("Compact storage enabled: %s\n", big_splits@info$compact))
#> Compact storage enabled: TRUEThe summary is identical to a regular split, but the underlying
storage is a compact fold-assignment vector. Use the
compact flag to confirm the memory- saving mode is
active.
bioLeak uses guarded preprocessing to prevent leakage
from global imputation, scaling, and feature selection. The low-level
API is:
- .guard_fit() to fit preprocessing on training data only
- predict_guard() to apply the trained guard to new data
- .guard_ensure_levels() to align factor levels across train/test
- impute_guarded() as a convenience wrapper for leakage-safe imputation
.guard_fit() fits a pipeline that can winsorize, impute, normalize, filter, and select features. It one-hot encodes non-numeric columns and carries factor levels forward to new data. Assumptions and misuse to avoid:
- fs = "ttest" requires a binary outcome with enough samples per class
- fs = "lasso" requires glmnet
- impute = "mice" is not supported for guarded prediction
- impute = "none" with missing values will trigger a median fallback and missingness indicators to avoid errors
Supported preprocessing options include imputation
(median, knn, missForest,
none), normalization (zscore,
robust, none), filtering by variance or IQR
(optionally min_keep), and feature selection
(ttest, lasso, pca).
Winsorization is controlled via impute$winsor and
impute$winsor_k in guarded steps; in
impute_guarded(), use winsor and
winsor_thresh. impute_guarded() returns a
LeakImpute object with guarded data, diagnostics, and the
fitted guard state.
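As a hedged sketch of the two winsorization interfaces (winsor_k = 5 mirrors the k=5 reported in the GuardFit summary below; treat the exact default as an assumption):
# Guarded steps: winsorization is configured inside the impute step
steps_w <- list(
  impute = list(method = "median", winsor = TRUE, winsor_k = 5),
  normalize = list(method = "zscore"),
  filter = list(var_thresh = 0, iqr_thresh = 0),
  fs = list(method = "none")
)
# The impute_guarded() wrapper takes winsor / winsor_thresh instead, e.g.:
# impute_guarded(train = train_df, test = test_df, method = "median",
#                winsor = TRUE, winsor_thresh = 5)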
Leaky example: global scaling before CV
df_leaky_scaled <- df
df_leaky_scaled[predictors] <- scale(df_leaky_scaled[predictors])
scaled_summary <- data.frame(
feature = predictors,
mean = colMeans(df_leaky_scaled[predictors]),
sd = apply(df_leaky_scaled[predictors], 2, stats::sd)
)
scaled_summary$mean <- round(scaled_summary$mean, 3)
scaled_summary$sd <- round(scaled_summary$sd, 3)
# Leaky global scaling: means ~0 and SDs ~1 computed on all samples
scaled_summary
#> feature mean sd
#> x1 x1 0 1
#> x2 x2 0 1
#> x3 x3 0 1
The summary shows that scaling used the full dataset, so test-fold statistics influenced the training transformation. This violates the train/test separation and can bias performance estimates.
Leakage-safe alternative: fit preprocessing on training only
fold1 <- safe_splits@indices[[1]]
train_x <- df[fold1$train, predictors]
test_x <- df[fold1$test, predictors]
guard <- .guard_fit(
X = train_x,
y = df$outcome[fold1$train],
steps = list(
impute = list(method = "median", winsor = TRUE),
normalize = list(method = "zscore"),
filter = list(var_thresh = 0, iqr_thresh = 0),
fs = list(method = "none")
),
task = "binomial"
)
train_x_guarded <- predict_guard(guard, train_x)
test_x_guarded <- predict_guard(guard, test_x)
cat("GuardFit object:\n")
#> GuardFit object:
guard
#> Guarded preprocessing pipeline
#> - Imputation: median
#> - Normalization: zscore
#> - Filter: var>=0, iqr>=0
#> - Feature selection: none
#> - Output features: 3
cat("\nGuardFit summary:\n")
#>
#> GuardFit summary:
summary(guard)
#> GuardFit summary
#> n_train=124, p_out=3, removed_by_filter=0
#> Steps:
#> step action
#> 1 winsor: k=5
#> 2 impute: none (no NA)
#> 3 normalize: zscore
#> 4 filter: var>=0, iqr>=0
#> 5 fs: none
#> hash: bacf825a215c48894b993150135d3249
# Guarded training data (first 6 rows)
head(train_x_guarded)
#> x1 x2 x3
#> 5 -1.4287254 -0.846746034 -0.58336326
#> 6 1.9282279 0.352168157 -0.70450573
#> 7 0.5110636 0.427913734 -0.08735148
#> 8 -1.3683145 -0.006339485 0.36484181
#> 9 -0.7187394 -2.383299658 -1.01654516
#> 10 -1.3015499 2.538309224 -0.14109035
# Guarded test data (first 6 rows)
head(test_x_guarded)
#> x1 x2 x3
#> 1 0.119085787 1.0175596 -0.05210949
#> 2 -0.428021276 -1.9210219 0.26877352
#> 3 -0.002549215 -0.3915238 0.95073054
#> 4 -1.007141255 0.1398948 0.25797066
#> 25 -1.105722181 0.5243484 2.03908390
#> 26 1.601954989 -0.5499985 0.86726497
The GuardFit object records the preprocessing steps and the number of features retained after filtering. The summary reports how many features were removed and the preprocessing audit trail. The guarded train/test previews show that all transformations (winsorization, imputation where needed, scaling) use training-only statistics; the test data never influence these values.
Factor level guard (advanced)
raw_levels <- data.frame(
site = c("A", "B", "B"),
status = c("yes", "no", "yes"),
stringsAsFactors = FALSE
)
level_state <- .guard_ensure_levels(raw_levels)
# Aligned factor data with consistent levels
level_state$data
#> site status
#> 1 A yes
#> 2 B no
#> 3 B yes
# Levels map
level_state$levels
#> $site
#> [1] "A" "B"
#>
#> $status
#> [1] "no" "yes"The returned data keeps factor levels consistent across
folds, while the levels list records the training-time
levels (including any dummy levels added to preserve one-hot columns).
Use these when transforming new data to avoid misaligned model
matrices.
Leaky example: imputation using train and test together
train <- data.frame(a = c(1, 2, NA, 4), b = c(NA, 1, 1, 0))
test <- data.frame(a = c(NA, 5), b = c(1, NA))
all_median <- vapply(rbind(train, test),
function(col) median(col, na.rm = TRUE),
numeric(1))
train_leaky <- as.data.frame(Map(function(col, m) { col[is.na(col)] <- m; col },
train, all_median))
test_leaky <- as.data.frame(Map(function(col, m) { col[is.na(col)] <- m; col },
test, all_median))
# Leaky medians computed on train + test
data.frame(feature = names(all_median), median = all_median)
#> feature median
#> a a 3
#> b b 1
# Leaky-imputed training data
train_leaky
#> a b
#> 1 1 1
#> 2 2 1
#> 3 3 1
#> 4 4 0
# Leaky-imputed test data
test_leaky
#> a b
#> 1 3 1
#> 2 5 1
The medians above were computed using both train and test data, so the test set influences the imputation values. This is a classic leakage pathway because test information directly alters the training features.
Leakage-safe alternative: impute_guarded()
imp <- impute_guarded(
train = train,
test = test,
method = "median",
winsor = FALSE
)
# Guarded-imputed training data
imp$train
#> a b
#> 1 1 1
#> 2 2 1
#> 3 2 1
#> 4 4 0
# Guarded-imputed test data
imp$test
#> a b
#> 1 2 1
#> 2 5 1
Here, the imputation statistics are computed from the training set only. The missing value in the test set (column a) is replaced by 2, the training median; in the leaky example above, it was replaced by 3, the global median.
This confirms that the test set is transformed using fixed values
learned from the training data, preserving a clean separation between
training and evaluation. The LeakImpute object also
contains missingness diagnostics in
imp$summary$diagnostics, and guarded outputs use the same
encoding as .guard_fit() (categorical variables are one-hot
encoded). Use vars to impute only a subset of columns when
needed.
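For example, a minimal sketch restricting imputation to column a via vars:
imp_sub <- impute_guarded(
  train = train,
  test = test,
  method = "median",
  winsor = FALSE,
  vars = "a"   # impute only column "a"
)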
fit_resample()
fit_resample() combines leakage-aware splits with guarded preprocessing and model fitting. It supports:
- built-in learners referenced by name (e.g., "glmnet", "ranger")
- parsnip model specifications or workflows::workflow objects (recommended)
- recipes::recipe preprocessing (prepped on training folds only)
- rsample resamples (rset/rsplit) via splits, with split_cols to map metadata
- custom_learners (legacy/advanced)
Assumptions and misuse to avoid:
- survival outcomes require a Surv column or separate time/event columns
- positive_class must be a single value that exists in the outcome levels
- class_weights only applies to binomial/multiclass tasks; workflows must handle weights internally
Use learner_args to pass model-specific arguments. For
custom learners (advanced), you can supply separate fit and
predict argument lists. Set parallel = TRUE to
use future.apply for fold-level parallelism when available. When
learner is a parsnip specification,
learner_args and custom_learners are ignored.
When learner is a workflow, preprocess and
learner_args are ignored. When a recipe or workflow is
used, the built-in guarded preprocessing list is bypassed, so ensure the
recipe/workflow itself is leakage-safe.
Metrics used in this vignette include auc, accuracy, macro_f1, log_loss, and cindex.
Always report the mean and variability across folds, not a single
fold value. When using yardstick metrics, the positive class is the
second factor level; set positive_class to control
this.
Parsnip model specification (recommended)
This spec uses base R glm under the hood and does not
require extra model packages. Use it in the examples below; custom
learners are covered in the Advanced section.
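For reference, a minimal definition consistent with the "logistic_reg/glm" learner shown in the summaries (the same spec is constructed again in the tidymodels section below):
spec <- parsnip::logistic_reg() |>
  parsnip::set_engine("glm")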
Leaky example: leaky features and row-wise splits
fit_leaky <- fit_resample(
df_leaky,
outcome = "outcome",
splits = leaky_splits,
learner = spec,
metrics = c("auc", "accuracy"),
preprocess = list(
impute = list(method = "median"),
normalize = list(method = "zscore"),
filter = list(var_thresh = 0),
fs = list(method = "none")
)
)
cat("Leaky fit summary:\n")
#> Leaky fit summary:
summary(fit_leaky)
#>
#> ===========================
#> bioLeak Model Fit Summary
#> ===========================
#>
#> Task: binomial
#> Outcome: outcome
#> Positive class: case
#> Learners: logistic_reg/glm
#> Total folds: 5
#> Refit performed: Yes
#> Hash: ha16bdf5a
#>
#> Cross-validated metrics (mean ± SD):
#> learner auc_mean auc_sd accuracy_mean accuracy_sd
#> 1 logistic_reg/glm 0.75 0.059 0.687 0.054
#>
#> Audit overview:
#> fold n_train n_test learner features_final
#> 1 127 33 logistic_reg/glm 17
#> 2 128 32 logistic_reg/glm 17
#> 3 128 32 logistic_reg/glm 17
#> 4 128 32 logistic_reg/glm 17
#> 5 129 31 logistic_reg/glm 17
metrics_leaky <- as.data.frame(fit_leaky@metric_summary)
num_cols <- vapply(metrics_leaky, is.numeric, logical(1))
metrics_leaky[num_cols] <- lapply(metrics_leaky[num_cols], round, digits = 3)
# Leaky fit: mean and SD of metrics across folds
metrics_leaky
#> learner auc_mean auc_sd accuracy_mean accuracy_sd
#> 1 logistic_reg/glm 0.75 0.059 0.687 0.054
The summary reports cross-validated performance. AUC ranges from 0.5 (random) to 1.0 (perfect), while accuracy is the fraction of correct predictions. Because the splits are leaky, these metrics can be artificially high and should not be used for reporting.
Leakage-safe alternative: grouped splits and clean predictors
fit_safe <- fit_resample(
df,
outcome = "outcome",
splits = safe_splits,
learner = spec,
metrics = c("auc", "accuracy"),
preprocess = list(
impute = list(method = "median"),
normalize = list(method = "zscore"),
filter = list(var_thresh = 0),
fs = list(method = "none")
),
positive_class = "case",
class_weights = c(control = 1, case = 1),
refit = TRUE
)
cat("Leakage-safe fit summary:\n")
#> Leakage-safe fit summary:
summary(fit_safe)
#>
#> ===========================
#> bioLeak Model Fit Summary
#> ===========================
#>
#> Task: binomial
#> Outcome: outcome
#> Positive class: case
#> Learners: logistic_reg/glm
#> Total folds: 5
#> Refit performed: Yes
#> Hash: hba29ffa5
#>
#> Cross-validated metrics (mean ± SD):
#> learner auc_mean auc_sd accuracy_mean accuracy_sd
#> 1 logistic_reg/glm 0.713 0.103 0.619 0.126
#>
#> Audit overview:
#> fold n_train n_test learner features_final
#> 1 124 36 logistic_reg/glm 13
#> 2 128 32 logistic_reg/glm 13
#> 3 128 32 logistic_reg/glm 13
#> 4 128 32 logistic_reg/glm 13
#> 5 132 28 logistic_reg/glm 13
metrics_safe <- as.data.frame(fit_safe@metric_summary)
num_cols <- vapply(metrics_safe, is.numeric, logical(1))
metrics_safe[num_cols] <- lapply(metrics_safe[num_cols], round, digits = 3)
# Leakage-safe fit: mean and SD of metrics across folds
metrics_safe
#> learner auc_mean auc_sd accuracy_mean accuracy_sd
#> 1 logistic_reg/glm 0.713 0.103 0.619 0.126
# Per-fold metrics (first 6 rows)
head(fit_safe@metrics)
#> fold learner auc accuracy
#> 1 1 logistic_reg/glm 0.7956656 0.6666667
#> 2 2 logistic_reg/glm 0.8178138 0.7500000
#> 3 3 logistic_reg/glm 0.6078431 0.5625000
#> 4 4 logistic_reg/glm 0.7460317 0.6875000
#> 5 5 logistic_reg/glm 0.6000000 0.4285714
fit_resample() returns a LeakFit object. Use summary(fit_safe) for a compact report and inspect fit_safe@metrics and fit_safe@predictions for details. When refit = TRUE, the final model and preprocessing state are stored in fit_safe@info$final_model and fit_safe@info$final_preprocess.
The mean ± SD table is the primary performance summary to report, while the per-fold metrics reveal variability and potential instability across folds.
By default, fit_resample() stores refit inputs in
fit_safe@info$perm_refit_spec to enable
audit_leakage(perm_refit = "auto"). Set
store_refit_data = FALSE to reduce memory and provide
perm_refit_spec manually when you want refit-based
permutations. When multiple learners are passed,
refit = TRUE refits only the first learner; set
refit = FALSE if you do not need a final model.
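A hedged sketch of a memory-lean configuration that skips both the final model and the stored refit inputs:
fit_lean <- fit_resample(
  df,
  outcome = "outcome",
  splits = safe_splits,
  learner = spec,
  metrics = "auc",
  refit = FALSE,             # no final model refit
  store_refit_data = FALSE   # do not store perm_refit_spec inputs
)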
The examples above use a parsnip model specification. To swap in
other models, replace spec with another parsnip spec (see
the gradient boosting example below).
Multiclass classification (optional)
if (requireNamespace("ranger", quietly = TRUE)) {
set.seed(11)
df_multi <- df
df_multi$outcome3 <- factor(sample(c("A", "B", "C"),
nrow(df_multi), replace = TRUE))
multi_splits <- make_split_plan(
df_multi,
outcome = "outcome3",
mode = "subject_grouped",
group = "subject",
v = 4,
stratify = TRUE,
seed = 11
)
fit_multi <- fit_resample(
df_multi,
outcome = "outcome3",
splits = multi_splits,
learner = "ranger",
metrics = c("accuracy", "macro_f1", "log_loss"),
refit = FALSE
)
cat("Multiclass fit summary:\n")
summary(fit_multi)
} else {
cat("ranger not installed; skipping multiclass example.\n")
}
#> Multiclass fit summary:
#>
#> ===========================
#> bioLeak Model Fit Summary
#> ===========================
#>
#> Task: multiclass
#> Outcome: outcome3
#> Learners: ranger
#> Total folds: 4
#> Refit performed: No
#> Hash: hc88c6471
#>
#> Cross-validated metrics (mean ± SD):
#> learner accuracy_mean accuracy_sd macro_f1_mean macro_f1_sd log_loss_mean
#> 1 ranger 0.288 0.075 0.279 0.069 1.11
#> log_loss_sd
#> 1 0.013
#>
#> Audit overview:
#> fold n_train n_test learner features_final
#> 1 116 44 ranger 14
#> 2 120 40 ranger 14
#> 3 120 40 ranger 14
#> 4 124 36 ranger 14
Multiclass fits compute accuracy, macro-F1, and log loss when class probabilities are available. positive_class is ignored for multiclass tasks.
Survival outcomes (optional)
if (requireNamespace("survival", quietly = TRUE) &&
requireNamespace("glmnet", quietly = TRUE)) {
set.seed(12)
df_surv <- df
df_surv$time_to_event <- rexp(nrow(df_surv), rate = 0.1)
df_surv$event <- rbinom(nrow(df_surv), 1, 0.7)
surv_splits <- make_split_plan(
df_surv,
outcome = c("time_to_event", "event"),
mode = "subject_grouped",
group = "subject",
v = 4,
stratify = FALSE,
seed = 12
)
fit_surv <- fit_resample(
df_surv,
outcome = c("time_to_event", "event"),
splits = surv_splits,
learner = "glmnet",
metrics = "cindex",
refit = FALSE
)
cat("Survival fit summary:\n")
summary(fit_surv)
} else {
cat("survival or glmnet not installed; skipping survival example.\n")
}
#> Survival fit summary:
#>
#> ===========================
#> bioLeak Model Fit Summary
#> ===========================
#>
#> Task: survival
#> Outcome: time_to_event
#> Outcome: event
#> Learners: glmnet
#> Total folds: 4
#> Refit performed: No
#> Hash: h041adef3
#>
#> Cross-validated metrics (mean ± SD):
#> learner cindex_mean cindex_sd
#> 1 glmnet 0.5 0
#>
#> Audit overview:
#> fold n_train n_test learner features_final
#> 1 120 40 glmnet 14
#> 2 120 40 glmnet 14
#> 3 120 40 glmnet 14
#> 4 120 40 glmnet 14
For survival tasks, supply outcome = c(time_col, event_col) or a Surv column; stratify is ignored for time/event outcomes and class_weights are not used.
Passing learner-specific arguments (optional)
if (requireNamespace("glmnet", quietly = TRUE)) {
fit_glmnet <- fit_resample(
df,
outcome = "outcome",
splits = safe_splits,
learner = "glmnet",
metrics = "auc",
learner_args = list(glmnet = list(alpha = 0.5)),
preprocess = list(
impute = list(method = "median"),
normalize = list(method = "zscore"),
filter = list(var_thresh = 0),
fs = list(method = "none")
)
)
cat("GLMNET summary with learner-specific arguments:\n")
summary(fit_glmnet)
} else {
cat("glmnet not installed; skipping learner_args example.\n")
}
#> GLMNET summary with learner-specific arguments:
#>
#> ===========================
#> bioLeak Model Fit Summary
#> ===========================
#>
#> Task: binomial
#> Outcome: outcome
#> Positive class: case
#> Learners: glmnet
#> Total folds: 5
#> Refit performed: Yes
#> Hash: hba29ffa5
#>
#> Cross-validated metrics (mean ± SD):
#> learner auc_mean auc_sd
#> 1 glmnet 0.713 0.088
#>
#> Audit overview:
#> fold n_train n_test learner features_final
#> 1 124 36 glmnet 13
#> 2 128 32 glmnet 13
#> 3 128 32 glmnet 13
#> 4 128 32 glmnet 13
#> 5 132 28 glmnet 13
This summary reflects the same guarded preprocessing but a different model configuration (here, alpha = 0.5). Use the mean ± SD metrics to compare learner settings under identical splits.
SummarizedExperiment inputs (optional)
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
se <- SummarizedExperiment::SummarizedExperiment(
assays = list(counts = t(as.matrix(df[, predictors]))),
colData = df[, c("subject", "batch", "study", "time", "outcome"), drop = FALSE]
)
se_splits <- make_split_plan(
se,
outcome = "outcome",
mode = "subject_grouped",
group = "subject",
v = 3
)
se_fit <- fit_resample(
se,
outcome = "outcome",
splits = se_splits,
learner = spec,
metrics = "auc"
)
cat("SummarizedExperiment fit summary:\n")
summary(se_fit)
} else {
cat("SummarizedExperiment not installed; skipping SE example.\n")
}
#> SummarizedExperiment fit summary:
#>
#> ===========================
#> bioLeak Model Fit Summary
#> ===========================
#>
#> Task: binomial
#> Outcome: outcome
#> Positive class: case
#> Learners: logistic_reg/glm
#> Total folds: 3
#> Refit performed: Yes
#> Hash: h3f97c442
#>
#> Cross-validated metrics (mean ± SD):
#> learner auc_mean auc_sd
#> 1 logistic_reg/glm 0.673 0.062
#>
#> Audit overview:
#> fold n_train n_test learner features_final
#> 1 104 56 logistic_reg/glm 3
#> 2 108 52 logistic_reg/glm 3
#> 3 108 52 logistic_reg/glm 3
The summary is identical in structure to the data.frame case because fit_resample() extracts predictors and metadata from the SummarizedExperiment object. Note that features_final here reflects only the assay predictors (x1, x2, x3), because the assay was constructed without metadata columns.
bioLeak integrates with rsample, recipes, workflows, and yardstick:
- splits can be an rsample rset/rsplit; use split_cols (default "auto") or bioLeak_mode / bioLeak_perm_mode attributes to map group/batch/study/time metadata
- preprocess can be a recipes::recipe (prepped on training folds only)
- learner can be a workflows::workflow
- metrics can be a yardstick::metric_set
- as_rsample() converts LeakSplits to an rsample object
library(bioLeak)
library(parsnip)
library(recipes)
library(yardstick)
set.seed(123)
N <- 60
df <- data.frame(
subject = factor(rep(paste0("S", 1:20), length.out = N)),
outcome = factor(rep(c("ClassA", "ClassB"), length.out = N)),
x1 = rnorm(N),
x2 = rnorm(N),
x3 = rnorm(N)
)
spec <- logistic_reg() |> set_engine("glm")
# Use bioLeak's native split planner to avoid conversion errors
set.seed(13)
# Use make_split_plan instead of rsample::group_vfold_cv
# This creates a subject-grouped CV directly compatible with fit_resample
splits <- make_split_plan(
df,
outcome = "outcome",
mode = "subject_grouped",
group = "subject",
v = 3
)
rec <- recipes::recipe(outcome ~ x1 + x2 + x3, data = df) |>
recipes::step_impute_median(recipes::all_numeric_predictors()) |>
recipes::step_normalize(recipes::all_numeric_predictors())
metrics_set <- yardstick::metric_set(yardstick::roc_auc, yardstick::accuracy)
fit_rs <- fit_resample(
df,
outcome = "outcome",
splits = splits,
learner = spec,
preprocess = rec,
metrics = metrics_set,
refit = FALSE
)
if (exists("as_rsample", where = asNamespace("bioLeak"), mode = "function")) {
rs_export <- as_rsample(fit_rs@splits, data = df)
print(rs_export)
}
#> # Manual resampling
#> # A tibble: 3 × 2
#> splits id
#> <list> <chr>
#> 1 <split [39/21]> Fold1
#> 2 <split [39/21]> Fold2
#> 3 <split [42/18]> Fold3Use split_cols to ensure split-defining metadata are
dropped from predictors when using rsample inputs.
if (requireNamespace("workflows", quietly = TRUE)) {
wf <- workflows::workflow() |>
workflows::add_model(spec) |>
workflows::add_formula(outcome ~ x1 + x2 + x3)
fit_wf <- fit_resample(
df,
outcome = "outcome",
splits = safe_splits,
learner = wf,
metrics = "auc",
refit = FALSE
)
cat("Workflow fit summary:\n")
summary(fit_wf)
} else {
cat("workflows not installed; skipping workflow example.\n")
}
#> Workflow fit summary:
#>
#> ===========================
#> bioLeak Model Fit Summary
#> ===========================
#>
#> Task: binomial
#> Outcome: outcome
#> Positive class: ClassB
#> Learners: workflow_1
#> Total folds: 5
#> Refit performed: No
#> Hash: hba29ffa5
#>
#> Cross-validated metrics (mean ± SD):
#> learner auc_mean auc_sd
#> 1 workflow_1 0.696 0.055
#>
#> Audit overview:
#> fold n_train n_test learner features_final
#> 1 124 36 workflow_1 3
#> 2 128 32 workflow_1 3
#> 3 128 32 workflow_1 3
#> 4 128 32 workflow_1 3
#> 5 132 28 workflow_1 3
When auditing rsample-based fits, pass perm_mode = "subject_grouped" (or set attr(rs, "bioLeak_perm_mode") <- "subject_grouped") so restricted permutations respect the intended split design, as sketched below.
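For example, tagging the rs_export object created above before passing it to an audit:
# Record the intended permutation design on the rsample object
attr(rs_export, "bioLeak_perm_mode") <- "subject_grouped"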
tune_resample()
tune_resample() runs leakage-aware nested tuning with tidymodels. It expects outer splits that include inner folds (use nested = TRUE in make_split_plan()), and a parsnip model spec or workflow. Survival tasks are not yet supported. Use inner_v, inner_repeats, and inner_seed to control inner resampling when inner folds are not precomputed.
library(bioLeak)
library(parsnip)
library(recipes)
library(tune)
library(glmnet)
# --- 1. Create Data ---
set.seed(123)
N <- 60
df <- data.frame(
subject = factor(rep(paste0("S", 1:20), length.out = N)),
# Use sample() to randomize the outcome and avoid perfect subject-confounding
outcome = factor(sample(c("ClassA", "ClassB"), N, replace = TRUE)),
x1 = rnorm(N),
x2 = rnorm(N),
x3 = rnorm(N)
)
# --- 2. Generate Nested Splits ---
set.seed(1)
nested_splits <- make_split_plan(
df,
outcome = "outcome",
mode = "subject_grouped",
group = "subject",
v = 3,
nested = TRUE
)
# --- 3. Define Recipe & Model ---
rec <- recipes::recipe(outcome ~ x1 + x2 + x3, data = df) |>
recipes::step_impute_median(recipes::all_numeric_predictors()) |>
recipes::step_normalize(recipes::all_numeric_predictors())
spec_tune <- parsnip::logistic_reg(
penalty = tune::tune(),
mixture = 1,
mode = "classification"
) |>
parsnip::set_engine("glmnet")
# --- 4. Run Tuning ---
tuned <- tune_resample(
df,
outcome = "outcome",
splits = nested_splits,
learner = spec_tune,
preprocess = rec,
inner_v = 2,
grid = 3,
metrics = c("auc", "accuracy"),
seed = 14
)
summary(tuned)
#>
#> ============================
#> bioLeak Tuning Summary
#> ============================
#>
#> Task: binomial
#> Outcome: outcome
#> Positive class: ClassB
#> Tuning Grid: 3
#> Selection Rule: best (Metric: roc_auc)
#> Outer Folds: 3
#>
#> Outer Loop Metrics (mean ± SD):
#> learner accuracy_mean accuracy_sd roc_auc_mean roc_auc_sd
#> 1 logistic_reg/glmnet 0.585 0.053 0.546 0.173
#>
#> Best Parameters (First 5 Folds):
#> penalty fold
#> 0 1
#> 0 2
#> 0 3
bioLeak natively supports tidymodels specifications. You can pass a parsnip model specification directly to fit_resample(). This allows you to use state-of-the-art algorithms like XGBoost, LightGBM, or SVMs while ensuring all preprocessing remains guarded.
if (requireNamespace("parsnip", quietly = TRUE) &&
requireNamespace("xgboost", quietly = TRUE) &&
requireNamespace("recipes", quietly = TRUE)) {
# 1. Define the model spec
xgb_spec <- parsnip::boost_tree(
mode = "classification",
trees = 100,
tree_depth = 6,
learn_rate = 0.01
) |>
parsnip::set_engine("xgboost")
# 2. Define the recipe on data WITHOUT 'subject'.
# This matches the data bioLeak passes (which has subject removed).
df_for_rec <- df[, !names(df) %in% "subject"]
rec_xgb <- recipes::recipe(outcome ~ ., data = df_for_rec) |>
recipes::step_dummy(recipes::all_nominal_predictors()) |>
recipes::step_impute_median(recipes::all_numeric_predictors())
# Note: No need for step_rm(subject) because it's already gone!
# 3. Fit
fit_xgb <- fit_resample(
df,
outcome = "outcome",
splits = nested_splits,
learner = xgb_spec,
metrics = "auc",
preprocess = rec_xgb
)
cat("XGBoost parsnip fit summary:\n")
print(summary(fit_xgb))
} else {
cat("parsnip/xgboost/recipes not installed.\n")
}
#> XGBoost parsnip fit summary:
#>
#> ===========================
#> bioLeak Model Fit Summary
#> ===========================
#>
#> Task: binomial
#> Outcome: outcome
#> Positive class: ClassB
#> Learners: boost_tree/xgboost
#> Total folds: 3
#> Refit performed: Yes
#> Hash: h825b7ff0
#>
#> Cross-validated metrics (mean ± SD):
#> learner auc_mean auc_sd
#> 1 boost_tree/xgboost 0.616 0.058
#>
#> Audit overview:
#> fold n_train n_test learner features_final
#> 1 39 21 boost_tree/xgboost 3
#> 2 39 21 boost_tree/xgboost 3
#> 3 42 18 boost_tree/xgboost 3
#>
#> learner auc_mean auc_sd
#> 1 boost_tree/xgboost 0.6157407 0.05800909
The summary reports cross-validated AUC for a non-linear gradient boosting model. Use the mean ± SD table to compare against baseline models, and confirm that any gains do not coincide with leakage signals in the audit diagnostics.
Custom learners are used when a model is not available as a parsnip specification or when a lightweight wrapper around base R is needed.
Each custom learner must define fit and
predict components. The fit function must
accept x, y, task, and
weights as inputs, while the predict function
must accept object and newdata.
custom_learners <- list(
glm = list(
fit = function(x, y, task, weights, ...) {
df_fit <- data.frame(y = y, x, check.names = FALSE)
stats::glm(y ~ ., data = df_fit,
family = stats::binomial(), weights = weights)
},
predict = function(object, newdata, task, ...) {
as.numeric(stats::predict(object, newdata = as.data.frame(newdata),
type = "response"))
}
)
)
cat("Custom learner names:\n")
#> Custom learner names:
names(custom_learners)
#> [1] "glm"
cat("Custom learner components (fit/predict):\n")
#> Custom learner components (fit/predict):
lapply(custom_learners, names)
#> $glm
#> [1] "fit" "predict"This output confirms that each custom learner provides both a
fit and a predict function.
Custom learners can be used with fit_resample() as
follows:
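A minimal sketch, assuming the custom entry is selected by its list name via learner (the exact pairing of learner and custom_learners is an assumption):
fit_custom <- fit_resample(
  df,
  outcome = "outcome",
  splits = nested_splits,
  learner = "glm",                   # assumed to name the custom_learners entry
  custom_learners = custom_learners,
  metrics = "auc",
  refit = FALSE
)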
bioLeak includes plotting helpers for quick diagnostic checks:
- plot_fold_balance() shows class balance per fold
- plot_overlap_checks() highlights train/test overlap for a metadata column
- plot_time_acf() checks autocorrelation in time-series predictions
- plot_calibration() shows reliability of binomial probabilities
- plot_confounder_sensitivity() summarizes performance by batch/study strata
Tabular helpers are also available:
- calibration_summary() returns calibration curve data and ECE/MCE/Brier metrics
- confounder_sensitivity() returns per-stratum performance summaries
Misuse to avoid:
- calling plot_overlap_checks() when the column is not present in the split metadata
- calling plot_time_acf() without a time column or for non-temporal data
- calling plot_calibration() outside binomial tasks
- calling plot_confounder_sensitivity() with a metric unsupported by the task
For classification, plot_fold_balance() uses the positive class recorded in fit@info$positive_class, or defaults to the second factor level if this is not set. For multiclass tasks, it shows per-class counts without a proportion line.
if (requireNamespace("ggplot2", quietly = TRUE)) {
plot_fold_balance(fit_safe)
} else {
cat("ggplot2 not installed; skipping fold balance plot.\n")
}
The bar chart shows the counts of positives (blue) and negatives (tan) in each fold. The dashed blue line represents the proportion of the positive class.
In a well-stratified fit, this line remains relatively stable across folds. Large fluctuations indicate poor stratification, which can lead to unstable fold-level performance estimates.
if (requireNamespace("ggplot2", quietly = TRUE)) {
plot_calibration(fit_safe, bins = 10)
} else {
cat("ggplot2 not installed; skipping calibration plot.\n")
}
Calibration curves compare predicted probabilities to observed event rates. Large deviations from the diagonal indicate miscalibration. Use min_bin_n to suppress tiny bins and the learner argument to select a model when multiple learners are stored in the fit, as in the sketch below.
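For instance (the min_bin_n value is illustrative):
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plot_calibration(fit_safe, bins = 10, min_bin_n = 5)
}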
if (requireNamespace("ggplot2", quietly = TRUE)) {
plot_confounder_sensitivity(fit_safe, confounders = c("batch", "study"),
metric = "auc", min_n = 3)
} else {
cat("ggplot2 not installed; skipping confounder sensitivity plot.\n")
}
Confounder sensitivity plots highlight whether performance varies substantially across batch or study strata.
cal <- calibration_summary(fit_safe, bins = 10, min_bin_n = 5)
conf_tbl <- confounder_sensitivity(fit_safe,
confounders = c("batch", "study"),
metric = "auc",
min_n = 3)
# Calibration metrics
cal$metrics
#> n bins min_bin_n ece mce brier
#> 1 160 10 5 0.06014931 0.6653969 0.2335112
# Confounder sensitivity table (first 6 rows)
head(conf_tbl)
#> confounder level metric direction value n positive_rate
#> 1 batch B1 auc higher 0.7569444 26 0.3076923
#> 2 batch B2 auc higher 0.7348485 23 0.4782609
#> 3 batch B3 auc higher 0.5538462 28 0.4642857
#> 4 batch B4 auc higher 0.7152778 24 0.5000000
#> 5 batch B5 auc higher 0.5446429 22 0.3636364
#> 6 batch B6 auc higher 0.6764706 37 0.4594595
if (requireNamespace("ggplot2", quietly = TRUE)) {
plot_overlap_checks(fit_leaky, column = "subject")
plot_overlap_checks(fit_safe, column = "subject")
} else {
cat("ggplot2 not installed; skipping overlap plots.\n")
}
The overlap plots compare the number of unique subjects appearing in the training and test sets for each fold.
Top plot (fit_leaky)
The red line shows high overlap counts, accompanied by the explicit
“WARNING: Overlaps detected!” annotation. This confirms that
the same subjects appear in both the training and test sets, indicating
information leakage.
Bottom plot (fit_safe)
The overlap line remains flat at zero across all folds. This confirms
that subject_grouped splitting successfully keeps subjects
isolated, ensuring that no subject appears in both sets
simultaneously.
audit_leakage()
audit_leakage() combines four diagnostics in one object: a label-permutation association test, batch/study association checks, a target leakage scan, and duplicate detection.
Assumptions and misuse to avoid:
- coldata (or stored split metadata) must align with prediction IDs; rownames or a row_id column help alignment
- X_ref must align to prediction IDs (rownames, row_id, or matching order)
- plot_perm_distribution() requires return_perm = TRUE
- target_threshold should be set high enough to flag only strong proxies
- the multivariate scan is configured through its own arguments (target_scan_multivariate, *_components, *_interactions, *_B); proxies outside X_ref can still pass undetected
- for rsample-based fits, set perm_mode (or bioLeak_perm_mode) so restricted permutations respect the intended design
- for time-series fits, adjust time_block / block_len as needed
Interpretation Note: The label-permutation test
refits models by default when refit inputs are available
(perm_refit = "auto" with
store_refit_data = TRUE) and
B <= perm_refit_auto_max. Otherwise it keeps predictions
fixed, so the p-value quantifies prediction-label association rather
than a full refit null. A large gap indicates a non-random association.
To determine if that signal is real or leaked, check
the Batch Association (confounding), Target
Leakage Scan (proxy features), and Duplicate
Detection (memorization) tables. Use
perm_refit = FALSE to force fixed-prediction shuffles or
perm_refit = TRUE with perm_refit_spec to
always refit.
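For example, a sketch of a fixed-prediction audit:
# Shuffle labels without refitting per permutation
audit_fixed <- audit_leakage(
  fit_safe,
  metric = "auc",
  B = 20,
  perm_refit = FALSE
)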
Use feature_space = "rank" to compare samples by rank
profiles, and sim_method (cosine or
pearson) to control similarity. For large n,
nn_k and max_pairs limit duplicate searches.
Use duplicate_scope = "all" to include within-fold
duplicates (data-quality checks). Use ci_method = "if"
(influence-function) or ci_method = "bootstrap" with
boot_B to obtain a confidence interval for the permutation
gap; include_z controls whether a z-score is reported. Set
perm_stratify = "auto" to stratify permutations only when
outcomes support it.
Leakage-aware audit
X_ref <- df[, predictors]
X_ref[c(1, 5), ] <- X_ref[1, ]
audit <- audit_leakage(
fit_safe,
metric = "auc",
B = 20,
perm_stratify = TRUE,
batch_cols = c("batch", "study"),
X_ref = X_ref,
sim_method = "cosine",
sim_threshold = 0.995,
return_perm = TRUE
)
cat("Leakage audit summary:\n")
#> Leakage audit summary:
summary(audit)
#>
#> ==============================
#> bioLeak Leakage Audit Summary
#> ==============================
#>
#> Task: binomial | Outcome: outcome | Splitting mode: subject_grouped | Positive class: case
#> Hash: hba29ffa5 | Folds: 5 | Repeats: 1
#>
#> Label-Permutation Association Test:
#> Method: refit per permutation (auto)
#> Observed metric: 0.656
#> Permuted mean ± SD: 0.544 ± 0.032
#> Gap: 0.112 (larger gap = stronger non-random signal)
#> This test does NOT diagnose information leakage. Use the Batch Association,
#> Target Leakage Scan, and Duplicate Detection sections to check for leakage.
#>
#> Batch / Study Association:
#> batch (repeat 1): χ² = 17.043 (df = 20.000), p = 0.650
#> study (repeat 1): χ² = 9.487 (df = 12.000), p = 0.661
#>
#> Target Leakage Scan: not available.
#>
#> Multivariate Target Scan: not available.
#>
#> No near-duplicates detected.
#>
#> Interpretation:
#> ✓ Strong non-random signal.
if (!is.null(audit@permutation_gap) && nrow(audit@permutation_gap) > 0) {
# Permutation significance results
audit@permutation_gap
}
#> metric_obs perm_mean perm_sd gap z p_value n_perm
#> 1 0.656474 0.54449 0.031651 0.111984 2.611722 0.047619 20
if (!is.null(audit@batch_assoc) && nrow(audit@batch_assoc) > 0) {
# Batch/study association with folds (Cramer's V)
audit@batch_assoc
} else {
cat("No batch or study associations detected.\n")
}
#> batch_col repeat_id stat df pval cramer_v
#> batch batch 1 17.043088 20 0.650174 0.1631865
#> study study 1 9.487022 12 0.660865 0.1405867
if (!is.null(audit@target_assoc) && nrow(audit@target_assoc) > 0) {
# Top features by target association score
head(audit@target_assoc)
} else {
cat("No target leakage scan results available.\n")
}
#> No target leakage scan results available.
if (!is.null(audit@duplicates) && nrow(audit@duplicates) > 0) {
# Top duplicate/near-duplicate pairs by similarity
head(audit@duplicates)
} else {
cat("No near-duplicates detected.\n")
}
#> No near-duplicates detected.
The permutation table reports the observed metric, the mean under random label permutation, the gap (difference), and a permutation p-value. For metrics where higher values indicate better performance, larger gaps reflect stronger non-random signal.
The batch association table reports chi-square statistics and Cramér's V. Large p-values and small V values indicate that folds are not aligned with batch or study labels (which is the desired outcome when these should be independent).
The target scan table lists features with the strongest associations with the outcome. For numeric features, the score is |AUC - 0.5| × 2 for classification or the absolute correlation for regression. For categorical features, the score is Cramér's V or eta-squared. Scores closer to 1 indicate stronger outcome association.
The duplicate table lists pairs of samples with near-identical profiles that cross train/test folds (by default). The artificially duplicated pair (X_ref[c(1, 5), ] <- X_ref[1, ]) should appear near the top of the list when the duplicated samples land in different folds; in this run, no cross-fold duplicates were flagged. Use duplicate_scope = "all" to include within-fold duplicates, as sketched below.
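A sketch extending the audit to within-fold duplicates:
audit_all <- audit_leakage(
  fit_safe,
  metric = "auc",
  B = 20,
  X_ref = X_ref,
  duplicate_scope = "all"   # also report duplicate pairs within a fold
)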
if (requireNamespace("ggplot2", quietly = TRUE)) {
plot_perm_distribution(audit)
} else {
cat("ggplot2 not installed; skipping permutation plot.\n")
}
The histogram shows the null distribution (gray bars) of the performance metric under random label permutation.
The blue dashed line represents the average performance of a random model (the permuted mean). The red solid line represents the observed performance of the fitted model.
A genuine signal is indicated when the red line lies well to the right of the gray distribution; overlap suggests weak or unstable signal.
Audit per learner
if (requireNamespace("ranger", quietly = TRUE) &&
requireNamespace("parsnip", quietly = TRUE)) {
# 1. Define specs
# Standard GLM (no tuning)
spec_glm <- parsnip::logistic_reg() |>
parsnip::set_engine("glm")
# Random Forest
spec_rf <- parsnip::rand_forest(
mode = "classification",
trees = 100
) |>
parsnip::set_engine("ranger")
# 2. Fit using the subject-grouped nested splits
fit_multi <- fit_resample(
df,
outcome = "outcome",
splits = nested_splits,
learner = list(glm = spec_glm, rf = spec_rf),
metrics = "auc"
)
# 3. Run the audit
audits <- audit_leakage_by_learner(fit_multi, metric = "auc", B = 20)
cat("Per-learner audit summary:\n")
print(audits)
} else {
cat("ranger/parsnip not installed.\n")
}
#> Per-learner audit summary:
#> LeakAuditList with 2 learners
#> learner metric_obs perm_mean gap p_value z n_perm batch_p_min
#> glm 0.537 0.609 -0.071 0.857 -0.800 20 NA
#> rf 0.559 0.586 -0.027 0.667 -0.308 20 NA
#> batch_col_min_p batch_v_max batch_col_max_v dup_pairs dup_threshold
#> <NA> NA <NA> NA 0.995
#> <NA> NA <NA> NA 0.995
#> dup_max_sim
#> NA
#> NA
This example uses ranger for the random forest specification; if it is not installed, the code chunk is skipped.
Use parallel_learners = TRUE to audit learners
concurrently when future.apply is available.
The printed table summarizes each learner’s observed metric,
permutation gap, p-value, and key batch/duplicate summaries. Use it to
compare signal strength across models while checking for leakage risks.
Pass learners = to audit a subset, or
mc.cores = to control parallel workers.
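For example, to audit only the random forest with two workers (learner names as printed in the fit; a sketch, output not shown):
# Audit a single learner and throttle the permutation workers.
audit_rf <- audit_leakage_by_learner(fit_multi, metric = "auc", B = 20,
                                     learners = "rf", mc.cores = 2)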
HTML audit report
audit_report() accepts either a LeakAudit
or a LeakFit object. When a LeakFit is
provided, it first runs audit_leakage() and then forwards
any additional arguments to the audit step. If multiple learners were
fit, pass learner = via ... to select one. Use
open = TRUE to open the report in a browser after
rendering.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
report_path <- audit_report(audit, output_dir = ".")
cat("HTML report written to:\n", report_path, "\n")
} else {
cat("rmarkdown or pandoc not available; skipping audit report rendering.\n")
}
The report path points to a standalone HTML file containing the same audit tables and plots, suitable for sharing with collaborators or archiving as a quality-control record.
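Starting from a LeakFit with several learners, the one-step pattern described above might look like this (a sketch; rendering still requires rmarkdown and pandoc):
# Audit and render in one call, selecting a learner via `...`.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  audit_report(fit_multi, learner = "glm", open = TRUE, output_dir = ".")
}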
Time-series data require special handling. Random splits can leak
information from the future into the past. Use
mode = "time_series" with a prediction
horizon, and audit with block permutations. Choose
time_block = "circular" or "stationary"; when
block_len is NULL, the audit uses a default block length
(~10% of the test block size, minimum 5).
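For intuition, a circular block permutation reshuffles labels in contiguous wrap-around blocks so that short-range dependence survives the shuffle. The helper below is a conceptual sketch, not bioLeak's implementation.
# Conceptual circular block permutation (illustrative only).
circular_block_perm <- function(y, block_len) {
  n <- length(y)
  starts <- sample.int(n, ceiling(n / block_len), replace = TRUE) # random block starts
  idx <- unlist(lapply(starts, function(s) ((s + seq_len(block_len) - 2) %% n) + 1))
  y[idx[seq_len(n)]] # wrap around the series, keeping blocks intact
}
# Default block length described above, for a 40-sample test block:
max(5, ceiling(0.10 * 40))
#> [1] 5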
Leaky example: lookahead feature
time_splits <- make_split_plan(
df_time,
outcome = "outcome",
mode = "time_series",
time = "time",
v = 4,
horizon = 1
)
cat("Time-series splits summary:\n")
#> Time-series splits summary:
time_splits
#> LeakSplits object (mode = time_series, v = 4, repeats = 1)
#> Outcome: outcome | Stratified: FALSE | Nested: FALSE
#> ------------------------------------------------------
#> fold repeat_id train_n test_n
#> 1 2 1 40 40
#> 2 3 1 80 40
#> 3 4 1 120 40
#> ------------------------------------------------------
#> Total folds: 3 | Hash: 412182f8014315f1879c89587fd76f1e
fit_time_leaky <- fit_resample(
df_time,
outcome = "outcome",
splits = time_splits,
learner = spec,
metrics = "auc"
)
cat("Time-series leaky fit summary:\n")
#> Time-series leaky fit summary:
summary(fit_time_leaky)
#>
#> ===========================
#> bioLeak Model Fit Summary
#> ===========================
#>
#> Task: binomial
#> Outcome: outcome
#> Positive class: case
#> Learners: logistic_reg/glm
#> Total folds: 3
#> Refit performed: Yes
#> Hash: h3b18276c
#>
#> Cross-validated metrics (mean ± SD):
#> learner auc_mean auc_sd
#> 1 logistic_reg/glm 0.661 0.021
#>
#> Audit overview:
#> fold n_train n_test learner features_final
#> 1 40 40 logistic_reg/glm 14
#> 2 80 40 logistic_reg/glm 14
#> 3 120 40 logistic_reg/glm 14
Time-series splitting trains on growing windows, so performance can
differ from standard CV simply because early folds have smaller training
sets. Regardless of the score, the presence of the
leak_future feature makes this estimate methodologically
invalid.
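You can confirm the lookahead directly: by construction, leak_future at time t equals the case indicator at time t + 1.
# leak_future is the next sample's outcome shifted back one step (see data setup).
n <- nrow(df_time)
all(df_time$leak_future[-n] == as.numeric(df_time$outcome == "case")[-1])
#> [1] TRUE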
Leakage-safe alternative: remove lookahead and audit with blocks
df_time_safe <- df_time
df_time_safe$leak_future <- NULL
fit_time_safe <- fit_resample(
df_time_safe,
outcome = "outcome",
splits = time_splits,
learner = spec,
metrics = "auc"
)
cat("Time-series safe fit summary:\n")
#> Time-series safe fit summary:
summary(fit_time_safe)
#>
#> ===========================
#> bioLeak Model Fit Summary
#> ===========================
#>
#> Task: binomial
#> Outcome: outcome
#> Positive class: case
#> Learners: logistic_reg/glm
#> Total folds: 3
#> Refit performed: Yes
#> Hash: h3b18276c
#>
#> Cross-validated metrics (mean ± SD):
#> learner auc_mean auc_sd
#> 1 logistic_reg/glm 0.661 0.021
#>
#> Audit overview:
#> fold n_train n_test learner features_final
#> 1 40 40 logistic_reg/glm 13
#> 2 80 40 logistic_reg/glm 13
#> 3 120 40 logistic_reg/glm 13
audit_time <- audit_leakage(
fit_time_safe,
metric = "auc",
B = 20,
time_block = "stationary",
block_len = 5
)
cat("Time-series leakage audit summary:\n")
#> Time-series leakage audit summary:
summary(audit_time)
#>
#> ==============================
#> bioLeak Leakage Audit Summary
#> ==============================
#>
#> Task: binomial | Outcome: outcome | Splitting mode: time_series | Positive class: case
#> Hash: h3b18276c | Folds: 3 | Repeats: 1
#>
#> Label-Permutation Association Test:
#> Method: refit per permutation (auto)
#> Observed metric: 0.645
#> Permuted mean ± SD: 0.529 ± 0.032
#> Gap: 0.116 (larger gap = stronger non-random signal)
#> This test does NOT diagnose information leakage. Use the Batch Association,
#> Target Leakage Scan, and Duplicate Detection sections to check for leakage.
#>
#> Batch / Study Association:
#> batch (repeat 1): χ² = 8.237 (df = 10.000), p = 0.606
#> study (repeat 1): χ² = 2.383 (df = 6.000), p = 0.881
#>
#> Target Leakage Scan: not available.
#>
#> Multivariate Target Scan: not available.
#>
#> No near-duplicates detected.
#>
#> Interpretation:
#> ✓ Strong non-random signal.
if (!is.null(audit_time@permutation_gap) && nrow(audit_time@permutation_gap) > 0) {
# Time-series permutation significance results
audit_time@permutation_gap
}
#> metric_obs perm_mean perm_sd gap z p_value n_perm
#> 1 0.645362 0.529196 0.03175 0.116166 2.22862 0.047619 20
if (requireNamespace("ggplot2", quietly = TRUE)) {
plot_time_acf(fit_time_safe, lag.max = 20)
} else {
cat("ggplot2 not installed; skipping ACF plot.\n")
}
The safe fit summary provides the leakage-resistant performance
estimate. Compare leaky vs. safe fits to gauge inflation risk;
features_final should drop when the lookahead feature is
removed.
The ACF plot shows the autocorrelation of out-of-fold predictions by
lag. The dashed red lines represent the 95% confidence interval for
white noise; large bars outside the bands indicate residual temporal
dependence. plot_time_acf() requires numeric predictions
and time metadata aligned to the fit.
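The bands follow the usual large-sample white-noise approximation, \(\pm 1.96/\sqrt{n}\), assuming bioLeak adopts the standard convention:
# Approximate 95% white-noise bounds for n out-of-fold predictions.
n_pred <- nrow(df_time_safe)
c(lower = -1.96, upper = 1.96) / sqrt(n_pred)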
bioLeak uses the future framework for
parallelism (Windows, macOS, Linux). fit_resample(),
audit_leakage(), audit_leakage_by_learner(),
and simulate_leakage_suite() honor the active plan when
parallel = TRUE (or
parallel_learners = TRUE).
library(future)
# Use multiple cores (works on all OS)
plan(multisession, workers = 4)
# Run a heavy simulation
# sim <- simulate_leakage_suite(..., parallel = TRUE)
# Parallel folds or audits
# fit_resample(..., parallel = TRUE)
# audit_leakage(..., parallel = TRUE)
# audit_leakage_by_learner(..., parallel_learners = TRUE)
# Return to sequential processing
plan(sequential)
This chunk only configures the parallel plan, so it does not produce a result object. Use it as a template before running compute-heavy functions.
simulate_leakage_suite() runs Monte Carlo simulations
that inject specific leakage mechanisms and then evaluates detection
with the audit pipeline. It is useful for validating your leakage checks
before applying them to real data. Available leakage types include
none, subject_overlap,
batch_confounded, peek_norm, and
lookahead. Use signal_strength,
prevalence, rho, K,
repeats, horizon, and preprocess
to control the simulation. The output is a LeakSimResults
data frame with one row per seed. Metrics are selected automatically
based on outcome type (AUC for binary classification).
if (requireNamespace("glmnet", quietly = TRUE)) {
sim <- simulate_leakage_suite(
n = 80,
p = 6,
mode = "subject_grouped",
learner = "glmnet",
leakage = "subject_overlap",
seeds = 1:2,
B = 20,
parallel = FALSE,
signal_strength = 1
)
# Simulation results (first 6 rows)
head(sim)
} else {
cat("glmnet not installed; skipping simulation suite example.\n")
}
#> seed metric_obs gap p_value leakage mode
#> 1 1 0.828914 0.238567 0.047619 subject_overlap subject_grouped
#> 2 2 0.825397 0.210306 0.047619 subject_overlap subject_grouped
Each row corresponds to one simulation seed.
metric_obs is the observed performance (AUC for this
simulation). Higher values suggest stronger apparent signal, which can
be inflated by leakage.
gap and p_value show whether the injected signal is
statistically distinguishable from random noise; here both seeds reach
p < 0.05.
Use metric_obs to gauge the magnitude of the leakage
effect, and gap to assess detection sensitivity.
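If the simulation chunk above ran, a quick aggregate over seeds summarizes detection (column names as printed above):
# Mean observed metric and permutation gap per injected leakage type.
aggregate(cbind(metric_obs, gap) ~ leakage, data = sim, FUN = mean)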
bioLeak uses S4 classes and list-like results to capture provenance and diagnostics:
LeakSplits: returned by make_split_plan(), printed with show()
LeakFit: returned by fit_resample(), summarized with summary()
LeakAudit: returned by audit_leakage(), summarized with summary()
LeakAuditList: returned by audit_leakage_by_learner(), printed directly
GuardFit: returned by .guard_fit(), printed and summarized
LeakImpute: returned by impute_guarded() (guarded data + diagnostics)
LeakTune: returned by tune_resample() (nested tuning summaries)
LeakSimResults: returned by simulate_leakage_suite() (per-seed results)
These objects store hashes and metadata that help reproduce and audit the workflow. Inspect their slots (@metrics, @predictions, @permutation_gap, @duplicates) to drill into specific leakage signals.
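For example, to see what an audit object carries (slot names as listed above):
# Enumerate available slots, then pull one diagnostic directly.
slotNames(audit_time)
audit_time@permutation_gap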