Title: | Interactive Assessments of Models |
Version: | 0.1.1 |
Description: | Launch a 'shiny' application for 'tidymodels' results. For classification or regression models, the app can be used to determine if there is lack of fit or poorly predicted points. |
License: | MIT + file LICENSE |
URL: | https://shinymodels.tidymodels.org, https://github.com/tidymodels/shinymodels |
BugReports: | https://github.com/tidymodels/shinymodels/issues |
Depends: | ggplot2, R (≥ 2.10) |
Imports: | dplyr, DT, generics (≥ 0.1.0), glue, htmltools, magrittr, parsnip, plotly, purrr, rlang, scales, shiny, shinydashboard, stats, tidyr, tidyselect, tune, yardstick |
Suggests: | covr, finetune, knitr, markdown, modeldata, rmarkdown, shinytest, spelling, testthat (≥ 3.0.0), vdiffr, withr |
Config/Needs/website: | tidyverse/tidytemplate |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
Language: | en-US |
LazyData: | true |
RoxygenNote: | 7.3.0 |
NeedsCompilation: | no |
Packaged: | 2024-01-31 14:25:03 UTC; simoncouch |
Author: | Max Kuhn |
Maintainer: | Simon Couch <simon.couch@posit.co> |
Repository: | CRAN |
Date/Publication: | 2024-01-31 15:10:05 UTC |
shinymodels: Interactive Assessments of Models
Description
Launch a 'shiny' application for 'tidymodels' results. For classification or regression models, the app can be used to determine if there is lack of fit or poorly predicted points.
Author(s)
Maintainer: Simon Couch simon.couch@posit.co (ORCID)
Authors:
Max Kuhn max@posit.co (ORCID)
Shisham Adhikari shadhikari@ucdavis.edu
Julia Silge julia.silge@posit.co (ORCID)
Other contributors:
Posit Software, PBC [copyright holder, funder]
See Also
Useful links:
Report bugs at https://github.com/tidymodels/shinymodels/issues
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
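As a minimal illustration (assuming the magrittr package is installed), the pipe passes the left-hand value as the first argument of the right-hand call, so a piped chain and its nested equivalent give the same result:

```r
library(magrittr)

# x %>% f() is evaluated as f(x); chains read left to right
piped  <- c(1, 4, 9) %>% sqrt() %>% sum()

# The same computation written as nested calls
nested <- sum(sqrt(c(1, 4, 9)))

identical(piped, nested)
```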
Iterative optimization of neural network
Description
This object has the results when a neural network was tuned using Bayesian optimization and a validation set.
Details
The code used to produce this object:
data(ames)

ames <-
  ames %>%
  select(Sale_Price, Neighborhood, Longitude, Latitude, Year_Built) %>%
  mutate(Sale_Price = log10(ames$Sale_Price))

set.seed(1)
ames_rs <- validation_split(ames)

ames_rec <-
  recipe(Sale_Price ~ ., data = ames) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_predictors())

mlp_spec <-
  mlp(hidden_units = tune(), penalty = tune(), epochs = tune()) %>%
  set_mode("regression")

set.seed(1)
ames_mlp_itr <-
  mlp_spec %>%
  tune_bayes(
    ames_rec,
    resamples = ames_rs,
    initial = 5,
    iter = 4,
    control = control_bayes(save_pred = TRUE)
  )
Value
An object with primary class iteration_results.
Resampled bagged tree results
Description
This object has the results when a bagged regression tree was resampled using 10-fold cross-validation.
Details
The code used to produce this object:
library(tidymodels)
library(baguette)
tidymodels_prefer()

# ------------------------------------------------------------------------------

ctrl_rs <- control_resamples(save_pred = TRUE)

# ------------------------------------------------------------------------------

set.seed(1)
cars_rs <- vfold_cv(mtcars)

cars_bag_vfld <-
  bag_tree() %>%
  set_engine("rpart", times = 5) %>%
  set_mode("regression") %>%
  fit_resamples(
    mpg ~ .,
    resamples = cars_rs,
    control = ctrl_rs
  )
Value
An object with primary class resample_results.
A CART classification tree tuned via racing
Description
This object has the results when a CART classification tree model was tuned over the cost-complexity parameter using racing.
Details
To reduce the object size, a smaller subset of the data was used.
The code used to produce this object:
library(tidymodels)
library(finetune)
tidymodels_prefer()

ctrl_rc <- control_race(save_pred = TRUE)

# ------------------------------------------------------------------------------

data(cells)

set.seed(1)
cells <-
  cells %>%
  select(-case) %>%
  sample_n(200)

# ------------------------------------------------------------------------------

set.seed(2)
cell_rs <- vfold_cv(cells)

# ------------------------------------------------------------------------------

set.seed(3)
cell_race <-
  decision_tree(cost_complexity = tune()) %>%
  set_mode("classification") %>%
  tune_race_anova(
    class ~ .,
    resamples = cell_rs,
    grid = tibble(cost_complexity = 10^seq(-2, -1, by = 0.2)),
    control = ctrl_rc
  )
Value
An object with primary class tune_race.
Gets the config and translates it into a sentence with the parameter values
Description
This function takes the result of organize_data(), the predictions across all models, and the names of the tuning parameters, and returns a sentence with the default parameter values.
Usage
display_selected(x, performance, predictions, tuning_param, input)
Arguments
x |
The |
performance |
The dataframe with performance metrics for each candidate model. |
predictions |
The dataframe with predictions across all models. |
tuning_param |
The names of the tuning parameters. |
input |
The DT::datatable object. |
Value
A sentence.
Explore model results
Description
explore() launches a Shiny application to interact with results from some tidymodels functions.
To investigate model fit(s), explore() can be used on objects produced by
The application starts in a new window and allows users to see how predicted values align with the true, observed data. There are 2-3 tabs in the application (depending on the object):
- Tuning Parameters enables users to choose a specific set of tuning parameters. These results are shown in the Plots tab. The default configuration is based on the optimal value of the first performance metric used during the creation of the object.
- Plots shows various panels that can visualize how well the model fits. Specific points can be highlighted by clicking on them (as long as the hover_only = FALSE option was used). To reset the highlighted points, double-click on the graph background.
- About gives information on the application as well as links to get help or file bug reports/feature requests.
To quit the Shiny application, use the Esc key.
Usage
## Default S3 method:
explore(x, ...)
## S3 method for class 'tune_results'
explore(x, hover_cols = NULL, hover_only = FALSE, ...)
Arguments
x |
An object with class |
... |
Other parameters not currently used. |
hover_cols |
The columns to display while hovering in the Shiny app. This argument can be:
|
hover_only |
A logical: if FALSE (the default), interactive highlighting of points is enabled; if TRUE, it is disabled, which can be helpful for very large data sets. |
Details
For resampling methods that produce more than one hold-out prediction per row (e.g. the bootstrap, repeated V-fold cross-validation), the predicted values shown in the plots are averages of the predictions for that specific row.
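The averaging described above can be sketched in base R. The data frame, its values, and the .row column name are assumptions made up for this illustration; this is not the package's own code.

```r
# Hypothetical hold-out predictions: rows 1 and 2 were each predicted twice
# (e.g. in two repeats of cross-validation); row 3 was predicted once.
preds <- data.frame(
  .row  = c(1, 1, 2, 2, 3),
  .pred = c(10, 12, 20, 24, 30)
)

# Average the predictions within each original row
avg_preds <- aggregate(.pred ~ .row, data = preds, FUN = mean)
avg_preds
```

Each original row ends up with exactly one value, which is what a plot of observed versus predicted values needs.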
The ggplot2 theme used in the Shiny application corresponds to the current theme in the R session. Run ggplot2::theme_set() to change the theme for the plots in the Shiny application.
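A short sketch of switching the session theme before launching the app, assuming ggplot2 is attached; theme_minimal() is only an example choice:

```r
library(ggplot2)

# theme_set() returns the previous theme invisibly, so it can be restored later
old_theme <- theme_set(theme_minimal())

# ... call explore() here; the plots in the app use the current session theme ...

# Restore the earlier theme when finished
theme_set(old_theme)
```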
For classification models, there is a toggle on the bottom left of the application to choose between "Unscaled (i.e. linear)" and "Logit scaled" probability scaling. The first option plots the raw probabilities while the logit scaling uses scales::logit_trans() to rescale the axis. This can be helpful when a model with a linear predictor is used (e.g. logistic or multinomial regression) since it can show linear effects from a feature more easily.
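To see why the logit scale helps, here is a small base R sketch. It uses qlogis() as a stand-in for the logit transformation (the app itself uses scales::logit_trans()), and the linear predictor is a made-up example:

```r
# Hypothetical linear predictor from a logistic regression: eta = -2 + 0.5 * x
x   <- seq(0, 8, by = 2)
eta <- -2 + 0.5 * x

# Predicted probabilities are a non-linear (sigmoid) function of x ...
prob <- plogis(eta)

# ... but on the logit scale they are linear in x again
logit_prob <- qlogis(prob)

# Constant differences on the logit scale indicate a straight line
diff(logit_prob)
```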
When using the application, there may be warnings printed in the console about "event tied a source ID ... not registered". These can be ignored.
When racing results are explored, the shiny application will only allow tuning parameter combinations that were fully resampled. As a result, parameter combinations that were discarded during the race cannot be selected.
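The idea of keeping only fully resampled configurations can be sketched in base R. The data frame, column names, and configuration labels below are hypothetical; the package may identify complete configurations differently.

```r
# Hypothetical racing metrics: one row per configuration per resample.
# Configuration "c3" was dropped after three of the five resamples.
metrics <- data.frame(
  .config = c(rep("c1", 5), rep("c2", 5), rep("c3", 3)),
  id      = c(paste0("Fold", 1:5), paste0("Fold", 1:5), paste0("Fold", 1:3)),
  stringsAsFactors = FALSE
)

# Count resamples per configuration and keep those evaluated on every resample
n_resamples <- length(unique(metrics$id))
counts      <- table(metrics$.config)
complete    <- names(counts)[as.vector(counts) == n_resamples]
complete
```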
Value
A shiny application.
Examples
data(ames_mlp_itr)
if (interactive()) {
  explore(ames_mlp_itr, hover_cols = dplyr::contains("tude"))
}
Returns the name of the predictions column for the first level variable
Description
This function takes prediction data, the event level, and the outcome name as arguments and returns the predictions column for the first level variable.
Usage
first_class_prob_name(dat, event_level, y_name)
Arguments
dat |
The predictions data frame in the |
event_level |
A single character value for the level corresponding to the event. |
y_name |
The y/response variable for the model. |
Value
A symbol.
Returns the first level of a classification model
Description
This function takes data, event_level, and y_name as arguments and returns the first level in classification data.
Usage
first_level(dat, event_level = c("first", "second"), y_name)
Arguments
dat |
The predictions data frame in the |
event_level |
A single character value for the level corresponding to the event. |
y_name |
The y/response variable for the model. |
Value
A string.
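A minimal sketch of how an event level can be picked from a factor outcome. The helper pick_event_level() is hypothetical and written only for this illustration; it is not the package's implementation.

```r
# Hypothetical helper: return the factor level that is treated as the event
pick_event_level <- function(dat, event_level = c("first", "second"), y_name) {
  event_level <- match.arg(event_level)
  lvls <- levels(dat[[y_name]])
  if (event_level == "first") lvls[1] else lvls[2]
}

dat <- data.frame(
  Class = factor(c("yes", "no", "yes"), levels = c("yes", "no"))
)

first_lvl  <- pick_event_level(dat, "first", "Class")
second_lvl <- pick_event_level(dat, "second", "Class")
c(first_lvl, second_lvl)
```

The order of the factor levels, not the order of appearance in the data, decides which class counts as "first".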
Returns the hover columns to be displayed in interactive plots
Description
This function takes the .hover argument and returns output that can be used as a text aesthetic in a ggplot2::ggplot() object to customize the tooltip.
Usage
format_hover(x, ...)
Arguments
x |
A data frame with columns to be displayed in the hover. |
... |
Arguments passed to |
Value
A character vector.
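One way to build such tooltip strings is sketched below in base R. The helper make_hover_text() and the "name: value" layout joined by "<br>" are assumptions for illustration, not the package's actual formatting.

```r
# Hypothetical helper: collapse the columns of a data frame into one
# "name: value" tooltip string per row, separated by HTML line breaks.
make_hover_text <- function(x) {
  pieces <- lapply(names(x), function(nm) paste0(nm, ": ", format(x[[nm]])))
  do.call(paste, c(pieces, sep = "<br>"))
}

hover <- make_hover_text(
  data.frame(Longitude = c(-93.6, -93.7), Year_Built = c(1960, 2005))
)
hover
```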
Extract data from objects to use in a shiny app
Description
This function joins the result of tune::fit_resamples() to the original dataset to give a list that can be an input for the Shiny app.
Usage
organize_data(x, hover_cols = NULL, ...)
## Default S3 method:
organize_data(x, hover_cols = NULL, ...)
## S3 method for class 'tune_results'
organize_data(x, hover_cols = NULL, ...)
Arguments
x |
The |
hover_cols |
The columns to display while hovering. |
... |
Other parameters not currently used. |
Details
The default configuration is based on the optimal value of the first metric.
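Choosing a default configuration from the first metric could look like the following base R sketch. The performance table and its values are made up, and the sketch assumes the first metric is one where smaller is better (such as RMSE).

```r
# Hypothetical performance table: one row per candidate configuration
performance <- data.frame(
  .config = c("Preprocessor1_Model1", "Preprocessor1_Model2", "Preprocessor1_Model3"),
  rmse    = c(0.31, 0.27, 0.35),
  rsq     = c(0.80, 0.84, 0.78),
  stringsAsFactors = FALSE
)

# The first metric column decides; for rmse, the smallest value is optimal
first_metric <- names(performance)[2]
best <- performance$.config[which.min(performance[[first_metric]])]
best
```

For a metric that should be maximized, such as accuracy, which.max() would take the place of which.min().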
Value
A list with elements data frame and character vectors. The data frame includes an outcome variable .outcome, a prediction variable .pred, model configuration variable .config, and hovering columns .hover.
This function takes the result of organize_data to calculate and reformat performance metrics for each candidate model.
Description
This function takes the result of organize_data to calculate and reformat performance metrics for each candidate model.
Usage
performance_object(x)
Arguments
x |
The |
Value
A dataframe.
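As a rough sketch of a per-candidate performance table, the base R code below computes RMSE for each configuration. The data and the hand-written rmse() helper are assumptions for illustration and do not reflect how the package computes its metrics.

```r
# Hypothetical predictions with the .config, .outcome, and .pred columns
preds <- data.frame(
  .config  = rep(c("Model1", "Model2"), each = 3),
  .outcome = c(1, 2, 3, 1, 2, 3),
  .pred    = c(1, 2, 5, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Root mean squared error for one configuration's predictions
rmse <- function(d) sqrt(mean((d$.outcome - d$.pred)^2))

# One row per configuration
by_config <- split(preds, preds$.config)
perf <- data.frame(
  .config   = names(by_config),
  rmse      = vapply(by_config, rmse, numeric(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
perf
```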
Visualizing the confusion matrix for a classification model
Description
This function plots the confusion matrix for a classification model.
Usage
plot_multiclass_conf_mat(dat)
Arguments
dat |
The predictions data frame in the |
Value
A plotly::ggplotly() object.
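The counts behind a multi-class confusion matrix can be sketched with base R's table(). The class labels and predictions are made-up values; the package's plot is built from its own data structures.

```r
truth <- factor(c("a", "b", "c", "a", "b", "c"), levels = c("a", "b", "c"))
pred  <- factor(c("a", "b", "b", "a", "c", "c"), levels = c("a", "b", "c"))

# Rows are predictions, columns are the observed classes
conf_mat <- table(Prediction = pred, Truth = truth)
conf_mat

# Overall accuracy is the share of counts on the diagonal
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```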
Visualizing predicted probability vs. true class for a multi-class classification model
Description
This function plots the predicted probabilities against the observed class based on tidymodels results for a multi-class classification model.
Usage
plot_multiclass_obs_pred(dat, y_name, prob_bins = 0.05)
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
prob_bins |
The desired |
Value
A plotly::ggplotly() object.
Visualizing the PR curve for a classification model
Description
This function plots the full precision-recall curve.
Usage
plot_multiclass_pr(dat, y_name)
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
Value
A plotly::ggplotly() object.
Visualizing the predicted probabilities vs. a factor variable for a classification model
Description
This function plots the predicted probabilities against a factor column based on tidymodels results for a multi-class classification model.
Usage
plot_multiclass_pred_factorcol(
dat,
y_name,
factorcol,
alpha = 1,
size = 1,
prob_scaling = FALSE,
prob_eps = 0.001,
source = NULL
)
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
factorcol |
The factor column to plot against the predicted probabilities. |
alpha |
The opacity for the geom points. |
size |
The size for the geom points. |
prob_scaling |
The boolean to turn on or off the logit scale for probability. |
prob_eps |
A small numerical constant to prevent division by zero. |
source |
A character string of length 1 that matches the source argument in event_data(). |
Value
A plotly::ggplotly() object.
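The role of a small constant such as prob_eps can be illustrated in base R: the logit of exactly 0 or 1 is infinite, so probabilities are pulled slightly away from the boundaries first. The clamp_prob() helper is hypothetical, and the package may apply the constant differently.

```r
# Hypothetical helper: keep probabilities inside [prob_eps, 1 - prob_eps]
clamp_prob <- function(p, prob_eps = 0.001) {
  pmin(pmax(p, prob_eps), 1 - prob_eps)
}

p <- c(0, 0.25, 1)

# Without clamping, the boundary values map to -Inf and Inf
raw_logit <- qlogis(p)

# With clamping, every value on the logit scale is finite
safe_logit <- qlogis(clamp_prob(p))

rbind(raw_logit, safe_logit)
```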
Visualizing the predicted probabilities vs. a numeric column for a classification model
Description
This function plots the predicted probabilities against a numeric column based on tidymodels results for a multi-class classification model.
Usage
plot_multiclass_pred_numcol(
dat,
y_name,
numcol,
alpha = 1,
size = 1,
prob_scaling = FALSE,
prob_eps = 0.001,
source = NULL
)
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
numcol |
The numerical column to plot against the predicted probabilities. |
alpha |
The opacity for the geom points. |
size |
The size for the geom points. |
prob_scaling |
The boolean to turn on or off the logit scale for probability. |
prob_eps |
A small numerical constant to prevent division by zero. |
source |
A character string of length 1 that matches the source argument in event_data(). |
Value
A plotly::ggplotly() object.
Visualizing the ROC curve for a classification model
Description
This function plots the ROC curve for a classification model.
Usage
plot_multiclass_roc(dat, y_name)
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
Value
A plotly::ggplotly() object.
Visualizing observed vs. predicted values for a regression model
Description
This function plots the predicted values against the observed values based on tidymodels results for a regression model.
Usage
plot_numeric_obs_pred(dat, y_name, alpha = 1, size = 1, source = NULL)
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
alpha |
The opacity for the geom points. |
size |
The size for the geom points. |
source |
A character string of length 1 that matches the source argument in event_data(). |
Value
A plotly::ggplotly() object.
Visualizing residuals vs. a factor column for a regression model
Description
This function plots the residuals against a factor column based on tidymodels results for a regression model.
Usage
plot_numeric_res_factorcol(
dat,
y_name,
factorcol,
alpha = 1,
size = 1,
source = NULL
)
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
factorcol |
The factor column to plot against the residuals. |
alpha |
The opacity for the geom points. |
size |
The size for the geom points. |
source |
A character string of length 1 that matches the source argument in event_data(). |
Value
A plotly::ggplotly() object.
Visualizing residuals vs. a numeric column for a regression model
Description
This function plots the residuals against a numeric column based on tidymodels results for a regression model.
Usage
plot_numeric_res_numcol(
dat,
y_name,
numcol,
alpha = 1,
size = 1,
source = NULL
)
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
numcol |
The numerical column to plot against the residuals. |
alpha |
The opacity for the geom points. |
size |
The size for the geom points. |
source |
A character string of length 1 that matches the source argument in event_data(). |
Value
A plotly::ggplotly() object.
Visualizing residuals vs. predicted values for a regression model
Description
This function plots the predicted values against the residuals based on tidymodels results for a regression model.
Usage
plot_numeric_res_pred(dat, y_name, size = 1, source = NULL)
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
size |
The size for the geom points. |
source |
A character string of length 1 that matches the source argument in event_data(). |
Value
A plotly::ggplotly() object.
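The quantities on the two axes of such a plot can be sketched in base R. The sign convention (observed minus predicted) is an assumption here, as are the example values.

```r
observed  <- c(3.0, 5.0, 7.5)
predicted <- c(2.5, 5.5, 7.5)

# Residual = observed minus predicted (assumed convention);
# positive values mean the model under-predicted
residual <- observed - predicted

res_df <- data.frame(predicted, residual)
res_df
```

A well-fitting model shows residuals scattered around zero with no trend across the predicted values.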
Visualizing the confusion matrix for a classification model
Description
This function plots the confusion matrix for a classification model.
Usage
plot_twoclass_conf_mat(dat)
Arguments
dat |
The predictions data frame in the |
Value
A plotly::ggplotly() object.
Visualizing predicted probability vs. true class for a two-class classification model
Description
This function plots the predicted probabilities against the observed class based on tidymodels results for a two-class classification model.
Usage
plot_twoclass_obs_pred(dat, y_name, event_level = "first", prob_bins = 0.05)
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
event_level |
A single character value for the level corresponding to the event. |
prob_bins |
The desired |
Value
A plotly::ggplotly() object.
Visualizing the PR curve for a classification model
Description
This function plots the full precision-recall curve.
Usage
plot_twoclass_pr(dat, y_name, event_level = "first")
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
event_level |
A single character value for the level corresponding to the event. |
Value
A plotly::ggplotly() object.
Visualizing the predicted probabilities vs. a factor variable for a classification model
Description
This function plots the predicted probabilities against a factor column based on tidymodels results for a two-class classification model.
Usage
plot_twoclass_pred_factorcol(
dat,
y_name,
factorcol,
alpha = 1,
size = 1,
prob_scaling = FALSE,
event_level = "first",
prob_eps = 0.001,
source = NULL
)
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
factorcol |
The factor column to plot against the predicted probabilities. |
alpha |
The opacity for the geom points. |
size |
The size for the geom points. |
prob_scaling |
The boolean to turn on or off the logit scale for probability. |
event_level |
A single character value for the level corresponding to the event. |
prob_eps |
A small numerical constant to prevent division by zero. |
source |
A character string of length 1 that matches the source argument in event_data(). |
Value
A plotly::ggplotly() object.
Visualizing the predicted probabilities vs. a numeric column for a classification model
Description
This function plots the predicted probabilities against a numeric column based on tidymodels results for a two-class classification model.
Usage
plot_twoclass_pred_numcol(
dat,
y_name,
numcol,
alpha = 1,
size = 1,
prob_scaling = FALSE,
event_level = "first",
prob_eps = 0.001,
source = NULL
)
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
numcol |
The numerical column to plot against the predicted probabilities. |
alpha |
The opacity for the geom points. |
size |
The size for the geom points. |
prob_scaling |
The boolean to turn on or off the logit scale for probability. |
event_level |
A single character value for the level corresponding to the event. |
prob_eps |
A small numerical constant to prevent division by zero. |
source |
A character string of length 1 that matches the source argument in event_data(). |
Value
A plotly::ggplotly() object.
Visualizing the ROC curve for a classification model
Description
This function plots the ROC curve for a classification model.
Usage
plot_twoclass_roc(dat, y_name, event_level = "first")
Arguments
dat |
The predictions data frame in the |
y_name |
The y/response variable for the model. |
event_level |
A single character value for the level corresponding to the event. |
Value
A plotly::ggplotly() object.
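The points of a two-class ROC curve can be computed by hand, as in this base R sketch. The data are made up, and the event is taken to be the first factor level (event_level = "first"); the package may compute the curve differently.

```r
# Hypothetical data: observed class and predicted probability of the event
truth <- factor(c("yes", "yes", "no", "yes", "no", "no"), levels = c("yes", "no"))
prob  <- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.1)

# With event_level = "first", the first factor level is the event
is_event <- truth == levels(truth)[1]

# Sensitivity and false positive rate (1 - specificity) at each threshold
thresholds <- sort(unique(c(Inf, prob, -Inf)), decreasing = TRUE)
roc <- data.frame(
  threshold   = thresholds,
  sensitivity = vapply(thresholds, function(thr) mean(prob[is_event] >= thr), numeric(1)),
  fpr         = vapply(thresholds, function(thr) mean(prob[!is_event] >= thr), numeric(1))
)
roc
```

The curve starts at (0, 0) for the strictest threshold and ends at (1, 1) for the most lenient one.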
Returns the class, app type, y name, and the number of rows of an object of shiny_data class
Description
This is a print method for the shiny_data class.
Usage
## S3 method for class 'shiny_data'
print(x, ...)
Arguments
x |
an object of class shiny_data |
... |
Other parameters not currently used. |
Value
x invisibly.
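Returning the object invisibly is the usual convention for S3 print methods. The sketch below shows the pattern with a hypothetical class demo_data, which is not part of the package:

```r
# Hypothetical class used only for this illustration
print.demo_data <- function(x, ...) {
  cat("<demo_data> with", nrow(x$data), "rows\n")
  invisible(x)
}

obj <- structure(list(data = mtcars), class = "demo_data")

# print() writes a one-line summary and hands the object back unchanged
out <- print(obj)
identical(out, obj)
```

Because the value is invisible, calling print() at the console shows the summary once rather than printing the object a second time.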
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- generics
Tuned flexible discriminant analysis results
Description
This object has the results when a flexible discriminant analysis model was tuned over the interaction degree parameters.
Details
To reduce the object size, five bootstraps were used for resampling and missing data were removed.
The code used to produce this object:
library(tidymodels)
library(discrim)
tidymodels_prefer()

# ------------------------------------------------------------------------------

ctrl_gr <- control_grid(save_pred = TRUE)

# ------------------------------------------------------------------------------

data(scat)
scat <- scat[complete.cases(scat), ]

# ------------------------------------------------------------------------------

set.seed(1)
scat_rs <- bootstraps(scat, times = 5)

scat_fda_bt <-
  discrim_flexible(prod_degree = tune()) %>%
  tune_grid(
    Species ~ .,
    resamples = scat_rs,
    control = ctrl_gr
  )
Value
An object with primary class tune_results.
Internal function to run a Shiny application on an object of shiny_data class
Description
This function takes the organize_data() result and launches a Shiny app.
Usage
shiny_models(x, hover_cols = NULL, hover_only = NULL, ...)
## Default S3 method:
shiny_models(x, hover_cols = NULL, hover_only = NULL, ...)
## S3 method for class 'multi_cls_shiny_data'
shiny_models(x, hover_cols = NULL, hover_only = FALSE, ...)
## S3 method for class 'reg_shiny_data'
shiny_models(x, hover_cols = NULL, hover_only = FALSE, ...)
## S3 method for class 'two_cls_shiny_data'
shiny_models(x, hover_cols = NULL, hover_only = FALSE, ...)
Arguments
x |
The |
hover_cols |
The columns to display while hovering in the Shiny app. This argument can be:
|
hover_only |
A logical: if FALSE (the default), interactive highlighting of points is enabled; if TRUE, it is disabled, which can be helpful for very large data sets. |
... |
Other parameters not currently used. |
Value
A shiny application.
Test set results for logistic regression
Description
This object has the results when a logistic regression model is fit to the training set and is evaluated on the test set.
Details
The code used to produce this object:
library(tidymodels)
tidymodels_prefer()

# ------------------------------------------------------------------------------

set.seed(1)
data(two_class_dat)

# ------------------------------------------------------------------------------

two_class_split <- initial_split(two_class_dat)

# ------------------------------------------------------------------------------

glm_spec <- logistic_reg()

two_class_final <-
  glm_spec %>%
  last_fit(
    Class ~ .,
    split = two_class_split
  )
Value
An object with primary class last_fit.