--- title: "Case study: Using debrief for iterative R performance optimization" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Case study: Using debrief for iterative R performance optimization} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE ) ``` ## About this case study This case study was conducted by Claude (an AI assistant) to optimize tidypredict's random forest parsing. Two important notes about the debrief package usage: 1. **No prior knowledge**: Claude had no prior knowledge of the debrief package before this session. The package was not part of Claude's training data. 2. **No documentation lookup**: Claude did not read debrief's documentation, README, or help files. The only information provided was the user's prompt showing the basic usage pattern (`pv_print_debrief(p)`). All interpretation of the output was done from first principles based on the section names and content. 3. **Test-driven verification**: After each optimization, Claude ran the package's existing test suite (`devtools::test()`) to verify the changes didn't break functionality. All 38 relevant tests (ranger: 18, randomForest: 13, partykit: 7) passed after the optimizations. This demonstrates that debrief's output is self-explanatory enough to guide optimization work without requiring prior familiarity with the package. The section names (TOP FUNCTIONS BY SELF-TIME, HOT LINES, MEMORY ALLOCATION) clearly communicate what each part shows, and the "Next steps" suggestions in the output hinted at additional functions available. ## Overview This case study demonstrates how the **debrief** package can guide iterative performance optimization in R. We use tidypredict's random forest parsing as a real-world example, achieving a **331x speedup** through systematic profiling and targeted fixes. ## Problem statement [tidymodels/orbital#83](https://github.com/tidymodels/orbital/issues/83) reported that parsing random forest models in tidypredict takes 45+ seconds for a single tree. We rolled back to the problematic version (commit `0dde432`) to investigate. ### User prompt The user provided the following instructions to guide the profiling session: ````md profile the tidypredict_fit call in the following code. library(tidymodels) library(orbital) rec_spec <- recipe(Sale_Price ~ ., data = ames) |> step_normalize(all_numeric_predictors()) |> step_dummy(all_nominal_predictors()) mod_spec <- rand_forest("regression", trees = 1) wf_spec <- workflow(rec_spec, mod_spec) wf_fit <- fit(wf_spec, ames) rf_tree <- extract_fit_engine(wf_fit) tmp <- tidypredict_fit(rf_tree) I want you to use the profvis and debrief package to iteratively make improvements to the function library(profvis) library(debrief) # Profile some code p <- profvis({ # your code here }) pv_print_debrief(p) ```` This prompt established the workflow: profile with profvis, analyze with `pv_print_debrief()`, optimize, and repeat. ## Iteration 1: Initial profiling ### Running debrief ```{r} p <- profvis({ tmp <- tidypredict_fit(rf_tree) }) pv_print_debrief(p) ``` ### Debrief output (key sections) ``` ## PROFILING SUMMARY Total time: 47340 ms (4734 samples @ 10 ms interval) ### TOP FUNCTIONS BY SELF-TIME 15180 ms ( 32.1%) [.data.frame 5280 ms ( 11.2%) 4220 ms ( 8.9%) sys.call 3950 ms ( 8.3%) [[.data.frame ### HOT LINES (by self-time) 12710 ms ( 26.8%) /R/model-ranger.R:6 row <- tree[tree$nodeID == j, ] 2880 ms ( 6.1%) /R/model-ranger.R:7 lc <- row["leftChild"][[1]] == find 2580 ms ( 5.4%) /R/model-ranger.R:8 lr <- row["rightChild"][[1]] == find ### MEMORY ALLOCATION (by line) 15655.00 MB /R/model-ranger.R:6 row <- tree[tree$nodeID == j, ] 3450.47 MB /R/model-ranger.R:7 lc <- row["leftChild"][[1]] == find ``` ### How I interpreted this output 1. **TOP FUNCTIONS BY SELF-TIME** showed `[.data.frame` at 32.1% - this is the data.frame subsetting operator, suggesting repeated data.frame operations are the culprit. 2. **HOT LINES** pinpointed the exact line: `row <- tree[tree$nodeID == j, ]` consuming 26.8% of total time. This single line was the primary bottleneck. 3. **MEMORY ALLOCATION** revealed this same line was allocating 15.6 GB - explaining the high GC time (11.2%). Each subsetting operation creates a new data.frame copy. 4. **The pattern**: Line 6 is inside a loop (indicated by the variable `j`). Filtering a data.frame with `tree[tree$nodeID == j, ]` is O(n) per call. In a loop, this becomes O(n^2). ### Optimization applied Pre-extract columns as vectors and use direct indexing: ```{r} # Before (O(n) per lookup): row <- tree[tree$nodeID == j, ] lc <- row["leftChild"][[1]] == find # After (O(1) per lookup): idx <- j + 1L lc <- left_child[idx] == find ``` ### Result | Metric | Before | After | |:--------|----------:|----------:| | Time | 47,340 ms | 280 ms | | Memory | ~18 GB | ~20 MB | | Speedup | - | **169x** | ## Iteration 2: Finding the next bottleneck ### Debrief output after first optimization ``` ## PROFILING SUMMARY Total time: 290 ms (29 samples @ 10 ms interval) ### TOP FUNCTIONS BY SELF-TIME 60 ms ( 20.7%) is.na 20 ms ( 6.9%) .f 20 ms ( 6.9%) if (lc || lr) { 20 ms ( 6.9%) lc <- left_child[idx] == find ### HOT LINES (by self-time) 60 ms ( 20.7%) /R/model-ranger.R:16 if (is.na(lc)) { 20 ms ( 6.9%) /R/model-ranger.R:14 lc <- left_child[idx] == find 20 ms ( 6.9%) /R/model-ranger.R:22 if (lc || lr) { ``` ### How I interpreted this output 1. **TOP FUNCTIONS BY SELF-TIME** now showed `is.na` as the top consumer at 20.7%. The bottleneck shifted from data.frame operations to NA checking. 2. **HOT LINES** showed the `is.na` checks on lines 16 and 19 were being called repeatedly in the loop. 3. **The insight**: The code was searching for parent nodes by iterating from `node_id` down to 0, checking each potential parent. This linear search could be replaced with a direct lookup. ### Optimization applied Pre-compute parent relationships once, then traverse directly: ```{r} # Before: Linear search for parent for (j in node_id:0) { idx <- j + 1L lc <- left_child[idx] == find lr <- right_child[idx] == find if (is.na(lc)) lc <- FALSE if (is.na(lr)) lr <- FALSE if (lc || lr) { ... } } # After: Direct traversal using parent lookup while (!is.na(parent[current + 1L])) { current <- parent[current + 1L] path <- c(path, current) } ``` ### Result | Metric | Before | After | |:--------|-------:|:---------------| | Time | 290 ms | 160 ms | | Speedup | - | **45% faster** | ## Iteration 3: Diminishing returns ### Debrief output (using 10 trees for better sampling) ``` ## PROFILING SUMMARY Total time: 1650 ms (165 samples @ 10 ms interval) ### TOP FUNCTIONS BY SELF-TIME 280 ms ( 17.0%) 200 ms ( 12.1%) enexpr 110 ms ( 6.7%) reduce_impl ### HOT LINES (by self-time) 60 ms ( 3.6%) /R/model-rf.R:80 if (.x$op == "less") i <- expr(!!sym(.x$col) < !!.x$val) 30 ms ( 1.8%) /R/model-rf.R:78 if (.x$op == "more") i <- expr(!!sym(.x$col) > !!.x$val) ### TOP FUNCTIONS BY TOTAL TIME 1020 ms ( 61.8%) path_formulas 500 ms ( 30.3%) expr ``` ### How I interpreted this output 1. **TOP FUNCTIONS BY TOTAL TIME** showed `path_formulas` at 61.8% - the bottleneck shifted to formula building. 2. **HOT LINES** showed multiple if-statements checking `.x$op` repeatedly (lines 78-81). 3. **TOP FUNCTIONS BY SELF-TIME** showed `enexpr` (expression building) and `reduce_impl` as top consumers - these are doing useful work. 4. **The insight**: The if-chain could be replaced with `switch()`, but we're now optimizing code that's doing inherent work. Further gains will be marginal. ### Optimization applied Replace if-chain with switch statement: ```{r} # Before: if (.x$op == "more") i <- expr(!!sym(.x$col) > !!.x$val) if (.x$op == "more-equal") i <- expr(!!sym(.x$col) >= !!.x$val) if (.x$op == "less") i <- expr(!!sym(.x$col) < !!.x$val) if (.x$op == "less-equal") i <- expr(!!sym(.x$col) <= !!.x$val) # After: switch( .x$op, "more" = expr(!!col_sym > !!val), "more-equal" = expr(!!col_sym >= !!val), "less" = expr(!!col_sym < !!val), "less-equal" = expr(!!col_sym <= !!val) ) ``` ### Result Minor improvement (~4%). The remaining time is in expression building which is inherent to the task. ## Why I only used `pv_print_debrief()` The debrief package offers several functions, and `pv_print_debrief()` suggests others at the end of its output: ``` ### Next steps pv_source_context(p, "/R/model-ranger.R") pv_suggestions(p) pv_focus(p, ".f") pv_help() ``` ### Functions I didn't use | Function | Purpose | Why I skipped it | |:----------------------|:-------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------| | `pv_source_context()` | Shows more code context around hot lines | I was already reading full source files directly. Once HOT LINES said "line 6 in model-ranger.R", I just read that file. | | `pv_suggestions()` | Provides automated optimization suggestions| The bottlenecks were obvious enough (32% in `[.data.frame`, 15.6 GB on one line) that I could diagnose the issue myself. | | `pv_focus()` | Filters analysis to a specific function | The initial output was already clear about where problems were. No need to drill down. | ### When `pv_print_debrief()` alone is sufficient The single function was enough because: 1. **HOT LINES gave exact locations** - File path and line number meant I could go straight to the problem code 2. **MEMORY ALLOCATION correlated with time** - The same lines appeared in both sections, confirming the diagnosis 3. **Patterns were recognizable** - Data.frame subsetting in a loop is a known anti-pattern ### When to use the other functions I would reach for additional functions if: - **Diffuse profile** - Time spread across many functions without a clear culprit - **Complex call paths** - Need to understand how a bottleneck is reached from different entry points - **Unfamiliar codebase** - `pv_source_context()` would help see surrounding code without switching tools - **Stuck on diagnosis** - `pv_suggestions()` might catch patterns I missed ## Final results | Stage | Time | Speedup | Key debrief insight | |:---------------------|-----------:|---------:|:-----------------------| | Original | 47,340 ms | 1x | `[.data.frame` at 32% | | After vectorization | 280 ms | 169x | `is.na` now at 20% | | After parent lookup | 160 ms | 296x | `path_formulas` at 60% | | After switch | **143 ms** | **331x** | Diminishing returns |