The ggRandomForests package extracts tidy data objects from either randomForestSRC or randomForest fits and feeds them into familiar ggplot2 workflows. This vignette highlights the most common objects (gg_error, gg_variable, and gg_vimp) along with a small helper for building balanced conditioning intervals.
Error trajectories with gg_error()
```r
library(randomForest)
set.seed(42)
rf_iris <- randomForest(Species ~ ., data = iris, ntree = 200, keep.forest = TRUE)
err_df <- ggRandomForests::gg_error(rf_iris, training = TRUE)
head(err_df)
```
The gg_error() object stores the cumulative OOB error rate for each outcome column plus the ntree counter. When training = TRUE, the function reconstructs the original model frame and appends the in-bag error trajectory (train). Plotting overlays both curves by default:
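A minimal sketch of that default plot, continuing from the rf_iris fit above but refitting so the chunk runs on its own (the axis labels are an illustrative ggplot2 tweak, not part of the package defaults):

```r
# Sketch: plot the cumulative error curves from a gg_error object.
library(randomForest)
library(ggRandomForests)
library(ggplot2)

set.seed(42)
rf_iris <- randomForest(Species ~ ., data = iris, ntree = 200, keep.forest = TRUE)
err_df  <- gg_error(rf_iris, training = TRUE)

# plot() on a gg_error object returns a ggplot, so layers can be added as usual
p <- plot(err_df) + labs(x = "Number of trees", y = "Error rate")
p
```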
Classes 'gg_variable', 'regression' and 'data.frame': 506 obs. of 2 variables:
$ lstat: num 4.98 9.14 4.03 2.94 5.33 ...
$ yhat : num 29.2 22.5 35.1 36.4 33.4 ...
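The structure above is consistent with a regression forest on the Boston housing data (506 rows, lstat among the predictors). A sketch of how such an object could be built; the MASS::Boston fit and the medv outcome are assumptions, since the original fitting code is not shown:

```r
# Hypothetical reconstruction: regression forest on MASS::Boston,
# then a tidy gg_variable object pairing each predictor with the fitted yhat.
library(randomForest)
library(ggRandomForests)
data(Boston, package = "MASS")

set.seed(42)
rf_boston <- randomForest(medv ~ ., data = Boston, ntree = 200, keep.forest = TRUE)
var_df <- gg_variable(rf_boston)

# Inspect just the two columns shown above; the full object keeps every predictor
str(var_df[, c("lstat", "yhat")])
```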
Because the original training data are recovered from the model call, gg_variable() works even when the forest was trained within helper functions or against a subset() expression. The output keeps the raw predictors plus either a continuous yhat column (regression) or per-class probabilities (yhat.<class> for classification). Plotting a single variable is straightforward:
```r
plot(var_df, xvar = "lstat")
```
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
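For a classification forest, the same call returns one probability column per class, following the yhat.&lt;class&gt; pattern described above. A sketch using the rf_iris fit from the first chunk (Petal.Width as the plotted variable is an illustrative choice):

```r
# Sketch: gg_variable on a classification forest yields per-class probabilities.
library(randomForest)
library(ggRandomForests)

set.seed(42)
rf_iris <- randomForest(Species ~ ., data = iris, ntree = 200, keep.forest = TRUE)
cls_df <- gg_variable(rf_iris)

# Raw predictors plus one yhat.<class> column per species
names(cls_df)
plot(cls_df, xvar = "Petal.Width")
```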
Survival forests can request multiple time horizons through the time argument; in-bag (rather than OOB) predictions are available by setting oob = FALSE.
If a randomForest object lacks stored importance scores, gg_vimp() tries to compute them on the fly. When the forest truly cannot provide the information (for example when importance = FALSE and the predictors are no longer accessible), the function emits a warning and returns NA placeholders so plots still render.
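To avoid the NA fallback entirely, store importance scores at fit time. A minimal sketch with the iris forest refit using importance = TRUE:

```r
# Sketch: fit with importance = TRUE so gg_vimp() finds stored scores.
library(randomForest)
library(ggRandomForests)

set.seed(42)
rf_imp <- randomForest(Species ~ ., data = iris, ntree = 200, importance = TRUE)

vimp_df <- gg_vimp(rf_imp)  # importance is stored, so no NA placeholders
plot(vimp_df)
```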