Title: | Machine Learning Model Explainer |
Version: | 1.0.2 |
Description: | It enables detailed interpretation of complex classification and regression models through Shapley analysis, including data-driven characterization of subgroups of individuals. It also facilitates multi-measure model evaluation, fairness analysis, and decision curve analysis, and offers enhanced visualizations with interactive elements. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
URL: | https://persimune.github.io/explainer/, https://github.com/PERSIMUNE/explainer |
BugReports: | https://github.com/PERSIMUNE/explainer/issues |
RoxygenNote: | 7.2.1 |
Imports: | cvms, data.table, dplyr, egg, ggplot2, ggpmisc, ggpubr, magrittr, plotly, tibble, tidyr, writexl, gridExtra, scales |
Suggests: | cowplot, mlr3, mlr3learners, knitr, broom, iml, forcats, mlr3viz, plotROC, psych, reshape2, remotes, mlbench, ranger, precrec, rmarkdown |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2024-09-23 11:47:47 UTC; rzar0002 |
Author: | Ramtin Zargari Marandi |
Maintainer: | Ramtin Zargari Marandi <ramtin.zargari.marandi@regionh.dk> |
Repository: | CRAN |
Date/Publication: | 2024-09-30 17:30:02 UTC |
explainer: Machine Learning Model Explainer
Description
It enables detailed interpretation of complex classification and regression models through Shapley analysis, including data-driven characterization of subgroups of individuals. It also facilitates multi-measure model evaluation, fairness analysis, and decision curve analysis, and offers enhanced visualizations with interactive elements.
Author(s)
Maintainer: Ramtin Zargari Marandi <ramtin.zargari.marandi@regionh.dk>
See Also
Useful links:
https://persimune.github.io/explainer/
https://github.com/PERSIMUNE/explainer
Report bugs at https://github.com/PERSIMUNE/explainer/issues
SHAP clustering
Description
SHAP values are used to cluster data samples using the k-means method to identify subgroups of individuals with specific patterns of feature contributions.
Usage
SHAPclust(
task,
trained_model,
splits,
shap_Mean_wide,
shap_Mean_long,
num_of_clusters = 4,
seed = 246,
subset = 1,
algorithm = "Hartigan-Wong",
iter.max = 1000
)
Arguments
task |
an mlr3 task for binary classification |
trained_model |
an mlr3 trained learner object |
splits |
an mlr3 object defining data splits for train and test sets |
shap_Mean_wide |
the data frame of SHAP values in wide format, as returned by eSHAP_plot |
shap_Mean_long |
the data frame of SHAP values in long format, as returned by eSHAP_plot |
num_of_clusters |
number of clusters to form based on the SHAP values (default: 4) |
seed |
an integer seed for reproducibility (default: 246) |
subset |
proportion of instances to use, from 0 to 1, where 1 means all |
algorithm |
character, the k-means algorithm to use: "Hartigan-Wong", "Lloyd", "Forgy", or "MacQueen" |
iter.max |
maximum number of iterations allowed |
Value
A list containing five elements:
shap_plot_onerow |
An interactive plot displaying the SHAP values for each feature, clustered by the specified number of clusters. Each cluster is shown in a facet. |
combined_plot |
A ggplot2 figure combining confusion matrices for each cluster, providing insights into the model's performance within each identified subgroup. |
kmeans_fvals_desc |
A summary table containing statistical descriptions of the clusters based on feature values. |
shap_Mean_wide_kmeans |
A data frame containing clustered SHAP values along with predictions and ground truth information. |
kmeans_info |
Information about the k-means clustering process, including cluster centers and assignment details. |
References
Zargari Marandi, R., 2024. ExplaineR: an R package to explain machine learning models. Bioinformatics Advances, 4(1), vbae049, https://doi.org/10.1093/bioadv/vbae049.
See Also
Other functions to visualize and interpret machine learning models: eSHAP_plot().
Examples
library("explainer")
seed <- 246
set.seed(seed)
# Load necessary packages
if (!requireNamespace("mlbench", quietly = TRUE)) stop("mlbench not installed.")
if (!requireNamespace("mlr3learners", quietly = TRUE)) stop("mlr3learners not installed.")
if (!requireNamespace("ranger", quietly = TRUE)) stop("ranger not installed.")
# Load BreastCancer dataset
utils::data("BreastCancer", package = "mlbench")
target_col <- "Class"
positive_class <- "malignant"
mydata <- BreastCancer[, -1]
mydata <- na.omit(mydata)
sex <- sample(
c("Male", "Female"),
size = nrow(mydata),
replace = TRUE
)
mydata$age <- as.numeric(sample(
seq(18, 60),
size = nrow(mydata),
replace = TRUE
))
mydata$sex <- factor(
sex,
levels = c("Male", "Female"),
labels = c(1, 0)
)
maintask <- mlr3::TaskClassif$new(
id = "my_classification_task",
backend = mydata,
target = target_col,
positive = positive_class
)
splits <- mlr3::partition(maintask)
mylrn <- mlr3::lrn(
"classif.ranger",
predict_type = "prob"
)
mylrn$train(maintask, splits$train)
SHAP_output <- eSHAP_plot(
task = maintask,
trained_model = mylrn,
splits = splits,
sample.size = 2, # also 30 or more
seed = seed,
subset = 0.02 # up to 1
)
shap_Mean_wide <- SHAP_output[[2]]
shap_Mean_long <- SHAP_output[[3]]
SHAP_plot_clusters <- SHAPclust(
task = maintask,
trained_model = mylrn,
splits = splits,
shap_Mean_wide = shap_Mean_wide,
shap_Mean_long = shap_Mean_long,
num_of_clusters = 3, # your choice
seed = seed,
subset = 0.02, # match with eSHAP_plot
algorithm = "Hartigan-Wong",
iter.max = 10
)
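The returned object can then be unpacked; a minimal sketch, assuming the list elements follow the order given under Value:
# Hedged sketch: element order assumed to match the Value section above
shap_plot_onerow <- SHAP_plot_clusters[[1]] # interactive SHAP plot faceted by cluster
combined_plot <- SHAP_plot_clusters[[2]] # combined confusion matrices per cluster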
SHAP Values versus Feature Values
Description
Plots SHAP values in association with the corresponding feature values.
Usage
ShapFeaturePlot(shap_Mean_long)
Arguments
shap_Mean_long |
the data frame containing SHAP values in long format |
Value
an interactive plot of SHAP values in association with feature values
Examples
library("explainer")
seed <- 246
set.seed(seed)
# Load necessary packages
if (!requireNamespace("mlbench", quietly = TRUE)) stop("mlbench not installed.")
if (!requireNamespace("mlr3learners", quietly = TRUE)) stop("mlr3learners not installed.")
if (!requireNamespace("ranger", quietly = TRUE)) stop("ranger not installed.")
# Load BreastCancer dataset
utils::data("BreastCancer", package = "mlbench")
target_col <- "Class"
positive_class <- "malignant"
mydata <- BreastCancer[, -1]
mydata <- na.omit(mydata)
sex <- sample(
c("Male", "Female"),
size = nrow(mydata),
replace = TRUE
)
mydata$age <- as.numeric(sample(
seq(18, 60),
size = nrow(mydata),
replace = TRUE
))
mydata$sex <- factor(
sex,
levels = c("Male", "Female"),
labels = c(1, 0)
)
maintask <- mlr3::TaskClassif$new(
id = "my_classification_task",
backend = mydata,
target = target_col,
positive = positive_class
)
splits <- mlr3::partition(maintask)
mylrn <- mlr3::lrn(
"classif.ranger",
predict_type = "prob"
)
mylrn$train(maintask, splits$train)
SHAP_output <- eSHAP_plot(
task = maintask,
trained_model = mylrn,
splits = splits,
sample.size = 2, # also 30 or more
seed = seed,
subset = 0.02 # up to 1
)
shap_Mean_long <- SHAP_output[[3]]
myplot <- ShapFeaturePlot(shap_Mean_long)
SHAP Partial Plot
Description
Generates an interactive partial dependence plot based on SHAP values, visualizing the marginal effect of one or two features on the predicted outcome of a machine learning model.
Usage
ShapPartialPlot(shap_Mean_long)
Arguments
shap_Mean_long |
data frame containing SHAP values in long format |
Value
an interactive partial dependence plot
Examples
library("explainer")
seed <- 246
set.seed(seed)
# Load necessary packages
if (!requireNamespace("mlbench", quietly = TRUE)) stop("mlbench not installed.")
if (!requireNamespace("mlr3learners", quietly = TRUE)) stop("mlr3learners not installed.")
if (!requireNamespace("ranger", quietly = TRUE)) stop("ranger not installed.")
# Load BreastCancer dataset
utils::data("BreastCancer", package = "mlbench")
target_col <- "Class"
positive_class <- "malignant"
mydata <- BreastCancer[, -1]
mydata <- na.omit(mydata)
sex <- sample(
c("Male", "Female"),
size = nrow(mydata),
replace = TRUE
)
mydata$age <- as.numeric(sample(
seq(18, 60),
size = nrow(mydata),
replace = TRUE
))
mydata$sex <- factor(
sex,
levels = c("Male", "Female"),
labels = c(1, 0)
)
maintask <- mlr3::TaskClassif$new(
id = "my_classification_task",
backend = mydata,
target = target_col,
positive = positive_class
)
splits <- mlr3::partition(maintask)
mylrn <- mlr3::lrn(
"classif.ranger",
predict_type = "prob"
)
mylrn$train(maintask, splits$train)
SHAP_output <- eSHAP_plot(
task = maintask,
trained_model = mylrn,
splits = splits,
sample.size = 2, # also 30 or more
seed = seed,
subset = 0.02 # up to 1
)
shap_Mean_long <- SHAP_output[[3]]
myplot <- ShapPartialPlot(shap_Mean_long)
Enhanced Confusion Matrix Plot
Description
This function generates an enhanced confusion matrix plot using the cvms package. The plot includes visualizations of sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
Usage
eCM_plot(task, trained_model, splits, add_sums = TRUE, palette = "Green")
Arguments
task |
mlr3 task object specifying the task details |
trained_model |
mlr3 trained learner (model) object obtained after training |
splits |
mlr3 object defining data splits for train and test sets |
add_sums |
logical, indicating whether total numbers should be displayed in the plot (default: TRUE) |
palette |
character, the color palette for the confusion matrix (default: "Green") |
Value
A confusion matrix plot visualizing sensitivity, specificity, PPV, and NPV
Examples
library("explainer")
seed <- 246
set.seed(seed)
# Load necessary packages
if (!requireNamespace("mlbench", quietly = TRUE)) stop("mlbench not installed.")
if (!requireNamespace("mlr3learners", quietly = TRUE)) stop("mlr3learners not installed.")
if (!requireNamespace("ranger", quietly = TRUE)) stop("ranger not installed.")
# Load BreastCancer dataset
utils::data("BreastCancer", package = "mlbench")
target_col <- "Class"
positive_class <- "malignant"
mydata <- BreastCancer[, -1]
mydata <- na.omit(mydata)
sex <- sample(
c("Male", "Female"),
size = nrow(mydata),
replace = TRUE
)
mydata$age <- as.numeric(sample(
seq(18, 60),
size = nrow(mydata),
replace = TRUE
))
mydata$sex <- factor(
sex,
levels = c("Male", "Female"),
labels = c(1, 0)
)
maintask <- mlr3::TaskClassif$new(
id = "my_classification_task",
backend = mydata,
target = target_col,
positive = positive_class
)
splits <- mlr3::partition(maintask)
mylrn <- mlr3::lrn(
"classif.ranger",
predict_type = "prob"
)
mylrn$train(maintask, splits$train)
myplot <- eCM_plot(
task = maintask,
trained_model = mylrn,
splits = splits
)
Decision Curve Plot
Description
Decision curve analysis is a statistical method used in medical research to evaluate and compare the clinical utility of different diagnostic or predictive models. It assesses the net benefit of a model across a range of decision thresholds, aiding in the selection of the most informative and practical approach for guiding clinical decisions.
Usage
eDecisionCurve(task, trained_model, splits, seed = 246)
Arguments
task |
mlr3 task object specifying the task details |
trained_model |
mlr3 trained learner (model) object obtained after training |
splits |
mlr3 object defining data splits for train and test sets |
seed |
numeric, seed for reproducibility (default: 246) |
Value
An interactive decision curve plot
Examples
library("explainer")
seed <- 246
set.seed(seed)
# Load necessary packages
if (!requireNamespace("mlbench", quietly = TRUE)) stop("mlbench not installed.")
if (!requireNamespace("mlr3learners", quietly = TRUE)) stop("mlr3learners not installed.")
if (!requireNamespace("ranger", quietly = TRUE)) stop("ranger not installed.")
# Load BreastCancer dataset
utils::data("BreastCancer", package = "mlbench")
target_col <- "Class"
positive_class <- "malignant"
mydata <- BreastCancer[, -1]
mydata <- na.omit(mydata)
sex <- sample(
c("Male", "Female"),
size = nrow(mydata),
replace = TRUE
)
mydata$age <- as.numeric(sample(
seq(18, 60),
size = nrow(mydata),
replace = TRUE
))
mydata$sex <- factor(
sex,
levels = c("Male", "Female"),
labels = c(1, 0)
)
maintask <- mlr3::TaskClassif$new(
id = "my_classification_task",
backend = mydata,
target = target_col,
positive = positive_class
)
splits <- mlr3::partition(maintask)
mylrn <- mlr3::lrn(
"classif.ranger",
predict_type = "prob"
)
mylrn$train(maintask, splits$train)
myplot <- eDecisionCurve(
task = maintask,
trained_model = mylrn,
splits = splits,
seed = seed
)
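For intuition, the net benefit of a model at a probability threshold pt is TP/n - FP/n * pt/(1 - pt); a minimal sketch of this calculation on the test split (illustrative only, not the package's internal code):
# Illustrative net-benefit calculation at a single, arbitrarily chosen threshold
pred <- mylrn$predict(maintask, splits$test)
pt <- 0.2 # example threshold
pred_pos <- pred$prob[, positive_class] >= pt
truth_pos <- pred$truth == positive_class
n <- length(truth_pos)
net_benefit <- sum(pred_pos & truth_pos) / n -
  sum(pred_pos & !truth_pos) / n * pt / (1 - pt)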
Enhanced Fairness Analysis
Description
This function generates Precision-Recall and ROC curves for sample subgroups, facilitating fairness analysis of a binary classification model.
Usage
eFairness(task, trained_model, splits, target_variable, var_levels)
Arguments
task |
mlr3 binary classification task object specifying the task details |
trained_model |
mlr3 trained learner (model) object obtained after training |
splits |
mlr3 object defining data splits for train and test sets |
target_variable |
character, the dataset variable whose subgroups are used to assess the model's performance |
var_levels |
the levels of the specified variable that define the subgroups to compare (e.g., c("Male", "Female")) |
Value
Model performance metrics for user-specified subgroups using Precision-Recall and ROC curves
Examples
library("explainer")
seed <- 246
set.seed(seed)
# Load necessary packages
if (!requireNamespace("mlbench", quietly = TRUE)) stop("mlbench not installed.")
if (!requireNamespace("mlr3learners", quietly = TRUE)) stop("mlr3learners not installed.")
if (!requireNamespace("ranger", quietly = TRUE)) stop("ranger not installed.")
# Load BreastCancer dataset
utils::data("BreastCancer", package = "mlbench")
target_col <- "Class"
positive_class <- "malignant"
mydata <- BreastCancer[, -1]
mydata <- na.omit(mydata)
sex <- sample(
c("Male", "Female"),
size = nrow(mydata),
replace = TRUE
)
mydata$age <- as.numeric(sample(
seq(18, 60),
size = nrow(mydata),
replace = TRUE
))
mydata$sex <- factor(
sex,
levels = c("Male", "Female"),
labels = c(1, 0)
)
maintask <- mlr3::TaskClassif$new(
id = "my_classification_task",
backend = mydata,
target = target_col,
positive = positive_class
)
splits <- mlr3::partition(maintask)
mylrn <- mlr3::lrn(
"classif.ranger",
predict_type = "prob"
)
mylrn$train(maintask, splits$train)
# sex is chosen for fairness analysis
Fairness_results <- eFairness(
task = maintask,
trained_model = mylrn,
splits = splits,
target_variable = "sex",
var_levels = c("Male", "Female")
)
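The subgroup curves can be cross-checked by computing subgroup AUCs directly; a hedged sketch (assumes the precrec package is available and that task row ids correspond to rows of mydata, which holds for a data.frame backend):
if (!requireNamespace("precrec", quietly = TRUE)) stop("precrec not installed.")
# Hedged sketch: manual ROC and PR AUCs per sex subgroup on the test split
pred <- mylrn$predict(maintask, splits$test)
test_sex <- mydata$sex[splits$test]
for (g in levels(test_sex)) {
  idx <- which(test_sex == g)
  ev <- precrec::evalmod(
    scores = pred$prob[idx, positive_class],
    labels = as.integer(pred$truth[idx] == positive_class)
  )
  print(precrec::auc(ev)) # ROC and PR AUCs for subgroup g
}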
Enhanced Performance Evaluation
Description
This function generates Precision-Recall and ROC curves, including threshold information for binary classification models.
Usage
ePerformance(task, trained_model, splits)
Arguments
task |
mlr3 binary classification task object specifying the task details |
trained_model |
mlr3 trained learner (model) object obtained after training |
splits |
mlr3 object defining data splits for train and test sets |
Value
ROC and Precision-Recall curves with threshold information
Examples
# Set environment variables for reproducibility
Sys.setenv(LANG = "en") # set the R session language to English
RNGkind("L'Ecuyer-CMRG") # use the L'Ecuyer-CMRG RNG instead of the default Mersenne-Twister
# Load required libraries
library("explainer")
# Set seed for reproducibility
seed <- 246
set.seed(seed)
# Load necessary packages
if (!requireNamespace("mlbench", quietly = TRUE)) stop("mlbench not installed.")
if (!requireNamespace("mlr3learners", quietly = TRUE)) stop("mlr3learners not installed.")
if (!requireNamespace("ranger", quietly = TRUE)) stop("ranger not installed.")
# Load BreastCancer dataset
utils::data("BreastCancer", package = "mlbench")
# Keep the target column as "Class"
target_col <- "Class"
# Change the positive class to "malignant"
positive_class <- "malignant"
# Keep only the predictor variables and outcome
mydata <- BreastCancer[, -1] # 1 is ID
# Remove rows with missing values
mydata <- na.omit(mydata)
# Create a vector of sex categories
sex <- sample(c("Male", "Female"), size = nrow(mydata), replace = TRUE)
# Add a simulated numeric age column
mydata$age <- as.numeric(sample(seq(18, 60), size = nrow(mydata), replace = TRUE))
# Add a sex column to the mydata data frame (for fairness analysis)
mydata$sex <- factor(sex, levels = c("Male", "Female"), labels = c(1, 0))
# Create a classification task
maintask <- mlr3::TaskClassif$new(
id = "my_classification_task",
backend = mydata,
target = target_col,
positive = positive_class
)
# Create a train-test split
set.seed(seed)
splits <- mlr3::partition(maintask)
# Add a learner (machine learning model base)
# Here we use random forest for example (you can use any other available model)
mylrn <- mlr3::lrn("classif.ranger", predict_type = "prob")
# Train the model
mylrn$train(maintask, splits$train)
# Make predictions on new data
mylrn$predict(maintask, splits$test)
ePerformance(task = maintask, trained_model = mylrn, splits = splits)
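Numeric summary measures can complement the plots; a minimal sketch using mlr3's built-in measures (assumes the standard measure ids "classif.auc" and "classif.acc" are available):
# Hedged sketch: score the test-set prediction with standard mlr3 measures
pred <- mylrn$predict(maintask, splits$test)
pred$score(mlr3::msrs(c("classif.auc", "classif.acc")))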
Enhanced ROC and Precision-Recall Plots
Description
This function generates Precision-Recall and ROC curves for binary classification models.
Usage
eROC_plot(task, trained_model, splits)
Arguments
task |
mlr3 binary classification task object specifying the task details |
trained_model |
mlr3 trained learner (model) object obtained after training |
splits |
mlr3 object defining data splits for train and test sets |
Value
ROC and Precision-Recall curves
Examples
library("explainer")
seed <- 246
set.seed(seed)
# Load necessary packages
if (!requireNamespace("mlbench", quietly = TRUE)) stop("mlbench not installed.")
if (!requireNamespace("mlr3learners", quietly = TRUE)) stop("mlr3learners not installed.")
if (!requireNamespace("ranger", quietly = TRUE)) stop("ranger not installed.")
# Load BreastCancer dataset
utils::data("BreastCancer", package = "mlbench")
target_col <- "Class"
positive_class <- "malignant"
mydata <- BreastCancer[, -1]
mydata <- na.omit(mydata)
sex <- sample(
c("Male", "Female"),
size = nrow(mydata),
replace = TRUE
)
mydata$age <- as.numeric(sample(
seq(18, 60),
size = nrow(mydata),
replace = TRUE
))
mydata$sex <- factor(
sex,
levels = c("Male", "Female"),
labels = c(1, 0)
)
maintask <- mlr3::TaskClassif$new(
id = "my_classification_task",
backend = mydata,
target = target_col,
positive = positive_class
)
splits <- mlr3::partition(maintask)
mylrn <- mlr3::lrn(
"classif.ranger",
predict_type = "prob"
)
mylrn$train(maintask, splits$train)
myplot <- eROC_plot(
task = maintask,
trained_model = mylrn,
splits = splits
)
Enhanced SHAP Analysis for Binary Classification Models
Description
The SHAP plot for classification models is a visualization tool that uses the Shapley value, an approach from cooperative game theory, to compute feature contributions for single predictions. The Shapley value fairly distributes the difference between the instance's prediction and the dataset's average prediction among the features. This method is available from the iml package.
Usage
eSHAP_plot(
task,
trained_model,
splits,
sample.size = 30,
seed = 246,
subset = 1
)
Arguments
task |
mlr3 task object for binary classification |
trained_model |
mlr3 trained learner object |
splits |
mlr3 object defining data splits for train and test sets |
sample.size |
numeric, default: 30. Larger values yield slower but more accurate estimates of the SHAP values |
seed |
an integer seed for reproducibility (default: 246) |
subset |
numeric, proportion of instances to use, from 0 to 1, where 1 means all |
Value
A list containing:
shap_plot |
An enhanced SHAP plot with user interactive elements. |
shap_Mean_wide |
A matrix of SHAP values. |
shap_Mean |
A data.table with aggregated SHAP values. |
shap |
Raw SHAP values. |
shap_pred_plot |
A plot depicting SHAP values versus predicted probabilities. |
References
Zargari Marandi, R., 2024. ExplaineR: an R package to explain machine learning models. Bioinformatics Advances, 4(1), vbae049, https://doi.org/10.1093/bioadv/vbae049.
Molnar, C., Casalicchio, G. and Bischl, B., 2018. iml: An R package for interpretable machine learning. Journal of Open Source Software, 3(26), 786.
See Also
Other classification:
eSHAP_plot_multiclass()
Other SHAP:
eSHAP_plot_multiclass()
Examples
library("explainer")
seed <- 246
set.seed(seed)
# Load necessary packages
if (!requireNamespace("mlbench", quietly = TRUE)) stop("mlbench not installed.")
if (!requireNamespace("mlr3learners", quietly = TRUE)) stop("mlr3learners not installed.")
if (!requireNamespace("ranger", quietly = TRUE)) stop("ranger not installed.")
# Load BreastCancer dataset
utils::data("BreastCancer", package = "mlbench")
target_col <- "Class"
positive_class <- "malignant"
mydata <- BreastCancer[, -1]
mydata <- na.omit(mydata)
sex <- sample(c("Male", "Female"), size = nrow(mydata), replace = TRUE)
mydata$age <- as.numeric(sample(seq(18, 60), size = nrow(mydata), replace = TRUE))
mydata$sex <- factor(sex, levels = c("Male", "Female"), labels = c(1, 0))
maintask <- mlr3::TaskClassif$new(
id = "my_classification_task",
backend = mydata,
target = target_col,
positive = positive_class
)
splits <- mlr3::partition(maintask)
mylrn <- mlr3::lrn("classif.ranger", predict_type = "prob")
mylrn$train(maintask, splits$train)
SHAP_output <- eSHAP_plot(
task = maintask,
trained_model = mylrn,
splits = splits,
sample.size = 2, # also 30 or more
seed = seed,
subset = 0.02 # up to 1
)
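The returned list can then be unpacked; ordering follows the Value section above (as also used in the SHAPclust example):
shap_plot <- SHAP_output[[1]] # interactive SHAP summary plot
shap_Mean_wide <- SHAP_output[[2]] # SHAP values in wide format
shap_Mean_long <- SHAP_output[[3]] # SHAP values in long format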
Enhanced SHAP Analysis for Multi-Class Classification Models
Description
The SHAP plot for multi-class classification models is a visualization tool that uses the Shapley value to compute feature contributions for single predictions across multiple classes.
Usage
eSHAP_plot_multiclass(
task,
trained_model,
splits,
sample.size = 30,
seed = 246,
subset = 1
)
Arguments
task |
mlr3 task object for multi-class classification |
trained_model |
mlr3 trained learner object |
splits |
mlr3 object defining data splits for train and test sets |
sample.size |
numeric, default: 30. Larger values yield slower but more accurate estimates of the SHAP values |
seed |
an integer seed for reproducibility (default: 246) |
subset |
numeric, proportion of instances to use, from 0 to 1, where 1 means all |
Value
A list containing:
combined_plots |
A SHAP plot depicting the SHAP values for each class. |
shap_data |
A matrix of SHAP values for each class. |
combined_all_classes |
An overall SHAP plot depicting the SHAP values for all classes in a single figure. |
See Also
Other classification:
eSHAP_plot()
Other SHAP:
eSHAP_plot()
Examples
library("explainer")
seed <- 246
set.seed(seed)
# Load necessary packages and data...
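A hedged sketch completing the example with a standard multi-class dataset (iris); the dataset choice and settings below are illustrative and not taken from the package documentation:
if (!requireNamespace("mlr3learners", quietly = TRUE)) stop("mlr3learners not installed.")
if (!requireNamespace("ranger", quietly = TRUE)) stop("ranger not installed.")
maintask <- mlr3::TaskClassif$new(
  id = "my_multiclass_task",
  backend = iris,
  target = "Species"
)
splits <- mlr3::partition(maintask)
mylrn <- mlr3::lrn("classif.ranger", predict_type = "prob")
mylrn$train(maintask, splits$train)
SHAP_multi <- eSHAP_plot_multiclass(
  task = maintask,
  trained_model = mylrn,
  splits = splits,
  sample.size = 2, # use 30 or more for stabler estimates
  seed = seed,
  subset = 0.02 # up to 1
)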
Enhanced SHAP Analysis for Regression Models
Description
The SHAP plot for regression models is a visualization tool that uses the Shapley value, an approach from cooperative game theory, to compute feature contributions for single predictions. The Shapley value fairly distributes the difference between the instance's prediction and the dataset's average prediction among the features. This method is available from the iml package.
Usage
eSHAP_plot_reg(
task,
trained_model,
splits,
sample.size = 30,
seed = 246,
subset = 1
)
Arguments
task |
mlr3 regression task object specifying the task details |
trained_model |
mlr3 trained learner (model) object obtained after training |
splits |
mlr3 object defining data splits for train and test sets |
sample.size |
numeric, number of samples to calculate SHAP values (default: 30) |
seed |
numeric, seed for reproducibility (default: 246) |
subset |
numeric, proportion of the test set to use for visualization (default: 1) |
Value
A list of two objects: an enhanced SHAP plot with user interactive elements, and a matrix of SHAP values.
Examples
library("explainer")
seed <- 246
set.seed(seed)
# Load necessary packages
if (!requireNamespace("mlbench", quietly = TRUE)) stop("mlbench not installed.")
if (!requireNamespace("mlr3learners", quietly = TRUE)) stop("mlr3learners not installed.")
if (!requireNamespace("ranger", quietly = TRUE)) stop("ranger not installed.")
# Load BreastCancer dataset
utils::data("BreastCancer", package = "mlbench")
mydata <- BreastCancer[, -1]
mydata <- na.omit(mydata)
sex <- sample(c("Male", "Female"), size = nrow(mydata), replace = TRUE)
mydata$age <- sample(seq(18, 60), size = nrow(mydata), replace = TRUE)
mydata$sex <- factor(sex, levels = c("Male", "Female"), labels = c(1, 0))
mydata$Class <- NULL
mydata$Cl.thickness <- as.numeric(mydata$Cl.thickness)
target_col <- "Cl.thickness"
maintask <- mlr3::TaskRegr$new(
id = "my_regression_task",
backend = mydata,
target = target_col
)
splits <- mlr3::partition(maintask)
mylrn <- mlr3::lrn("regr.ranger", predict_type = "response")
mylrn$train(maintask, splits$train)
reg_model_outputs <- mylrn$predict(maintask, splits$test)
SHAP_output <- eSHAP_plot_reg(
task = maintask,
trained_model = mylrn,
splits = splits,
sample.size = 2, # also 30 or more
seed = seed,
subset = 0.02 # up to 1
)
myplot <- SHAP_output[[1]]
Scale Data to the Range 0 to 1
Description
Scales the data to the range 0 to 1. A Hampel filter is first applied to adjust outliers, followed by min-max normalization.
Usage
range01(x)
Arguments
x |
Vector or array of numbers to be normalized |
Value
Normalized vector
References
Pearson, R. K. (1999). “Data cleaning for dynamic modeling and control”. European Control Conference, ETH Zurich, Switzerland.
Examples
normalized_vector <- range01(seq(-10, 1000))
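For intuition, a minimal conceptual sketch of this approach (Hampel-style outlier adjustment followed by min-max scaling); the package's exact implementation may differ:
# Conceptual sketch only: replace Hampel-flagged outliers (beyond median +/- 3 * MAD)
# with the median, then apply min-max normalization to [0, 1]
range01_sketch <- function(x) {
  med <- stats::median(x, na.rm = TRUE)
  dev <- stats::mad(x, na.rm = TRUE) # MAD scaled to be consistent with the sd
  x[abs(x - med) > 3 * dev] <- med
  (x - min(x)) / (max(x) - min(x))
}
range01_sketch(c(-1000, 1:10, 1000)) # extreme values are tamed before scaling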
Regression Model Evaluation
Description
Computes evaluation measures for regression models.
Usage
regressmdl_eval(task, trained_model, splits)
Arguments
task |
mlr3 regression task object |
trained_model |
mlr3 trained learner (model) object |
splits |
mlr3 object defining data splits for train and test sets |
Value
Data frame containing regression evaluation measures
References
Lang M, Binder M, Richter J, Schratz P, Pfisterer F, Coors S, Au Q, Casalicchio G, Kotthoff L, Bischl B. mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software. 2019 Dec 11;4(44):1903.
Examples
library("explainer")
seed <- 246
set.seed(seed)
# Load necessary packages
if (!requireNamespace("mlbench", quietly = TRUE)) stop("mlbench not installed.")
if (!requireNamespace("mlr3learners", quietly = TRUE)) stop("mlr3learners not installed.")
if (!requireNamespace("ranger", quietly = TRUE)) stop("ranger not installed.")
# Load BreastCancer dataset
utils::data("BreastCancer", package = "mlbench")
mydata <- BreastCancer[, -1]
mydata <- na.omit(mydata)
sex <- sample(
c("Male", "Female"),
size = nrow(mydata),
replace = TRUE
)
mydata$age <- sample(
seq(18, 60),
size = nrow(mydata),
replace = TRUE
)
mydata$sex <- factor(
sex,
levels = c("Male", "Female"),
labels = c(1, 0)
)
mydata$Class <- NULL
mydata$Cl.thickness <- as.numeric(mydata$Cl.thickness)
target_col <- "Cl.thickness"
maintask <- mlr3::TaskRegr$new(
id = "my_regression_task",
backend = mydata,
target = target_col
)
splits <- mlr3::partition(maintask)
mylrn <- mlr3::lrn(
"regr.ranger",
predict_type = "response"
)
mylrn$train(maintask, splits$train)
regressmdl_eval_results <- regressmdl_eval(
task = maintask,
trained_model = mylrn,
splits = splits
)
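The returned measures can be cross-checked by computing common regression metrics directly from the test-set predictions; an illustrative sketch (not the package's internal code, and the package's set of measures may differ):
# Illustrative cross-check of typical regression measures
pred <- mylrn$predict(maintask, splits$test)
err <- pred$truth - pred$response
data.frame(
  MSE = mean(err^2),
  RMSE = sqrt(mean(err^2)),
  MAE = mean(abs(err)),
  R2 = 1 - sum(err^2) / sum((pred$truth - mean(pred$truth))^2)
)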