library(BenchHub)

1 Overview

The BenchmarkStudy object encapsulates all the components of a benchmarking study, including the associated data and functions. It provides a unified structure for benchmark developers to share their work and for method developers to interact with an existing benchmark study.

This vignette provides a guide for both use cases under the current BenchHub submission workflow.

2 For Benchmark Developer

This section demonstrates how to create a BenchmarkStudy object from a benchmarking study.

2.1 Initialising the Study

We begin by creating an empty BenchmarkStudy object and downloading an existing Trio from the submission database.

study <- BenchmarkStudy$new()
# Download an existing Trio from the submission database
example_trio <- downloadSubmissionTrio("D001", cachePath = tempdir())

example_trio

2.2 Define mapping function and protocol function

A mapping function is a helper function that converts method output into a format that can be compared with the supporting evidence stored in the reference Trio. There are four ways to contribute a mapping function:

  1. Leave blank: if you don’t want to contribute now
  2. Use an existing GitHub repository: if your mapping function has already been uploaded to the GitHub repository accompanying the published paper
  3. Upload mapping functions stored in the study object
  4. Upload a local mapping function to a gist: define the mapping function locally

In this toy spatial transcriptomics example, the Trio contains the following supporting evidence:

  • annotated_domain
  • celltype_proportions

We therefore define two mapping functions that extract those objects from a method result.

Example 1: extract predicted spatial domains.

# Define the mapping function 
extract_domains <- function(result) {
  if (is.data.frame(result) && "annotated_domain" %in% colnames(result)) {
    return(result$annotated_domain)
  }
  if (is.list(result) && "annotated_domain" %in% names(result)) {
    return(result$annotated_domain)
  }
  stop("Could not find 'annotated_domain' in the method output.")
}

# Add the mapping function
study$addMappingFunction(
  name = "annotated_domain",
  func = extract_domains,
  inputDescription = "Method output containing one predicted spatial domain label per spot.",
  outputDescription = "A vector of predicted spatial domain labels aligned to spots.",
  exampleUsage = paste(
    "## Minimal example",
    "#result <- list(annotated_domain = c('domain_1', 'domain_1', 'domain_2', 'domain_2'))",
    "#res <- study$runMapping('annotated_domain', result)",
    "#head(res)",
    sep = "\n"
  )
)
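Before registering a mapping function, it can help to call it directly on each input shape it is meant to support. The sketch below repeats extract_domains so that it runs on its own, and uses made-up toy inputs; toy_df, toy_list, and bad_input are illustrative names and not part of BenchHub.

```r
# Copy of extract_domains from above, repeated so this snippet runs standalone.
extract_domains <- function(result) {
  if (is.data.frame(result) && "annotated_domain" %in% colnames(result)) {
    return(result$annotated_domain)
  }
  if (is.list(result) && "annotated_domain" %in% names(result)) {
    return(result$annotated_domain)
  }
  stop("Could not find 'annotated_domain' in the method output.")
}

# Toy inputs (hypothetical): the same labels stored in a data frame and in a list.
labels <- c("domain_1", "domain_1", "domain_2", "domain_2")
toy_df <- data.frame(
  spot = paste0("spot_", 1:4),
  annotated_domain = labels,
  stringsAsFactors = FALSE
)
toy_list <- list(annotated_domain = labels)

from_df <- extract_domains(toy_df)
from_list <- extract_domains(toy_list)

# Both input shapes should yield the same label vector.
identical(from_df, from_list)

# An input without the expected element should fail with the informative error.
bad_input <- list(clusters = labels)
err <- tryCatch(
  extract_domains(bad_input),
  error = function(e) conditionMessage(e)
)
err
```

Checking the failure path as well as the success path makes it clear to later users what the mapping function does when a method returns an unexpected structure.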

Example 2: extract predicted cell type proportions.

# Define the mapping function 
extract_celltype_props <- function(result) {
  if (is.data.frame(result) && "celltype_proportions" %in% names(result)) {
    return(result$celltype_proportions)
  }
  if (is.list(result) && "celltype_proportions" %in% names(result)) {
    return(result$celltype_proportions)
  }
  if (is.matrix(result) || is.data.frame(result)) {
    mat <- as.matrix(result)
    rs <- rowSums(mat)
    rs[rs == 0] <- 1
    return(mat / rs)
  }
  stop("Could not extract cell type proportions from the method output.")
}

# Add the mapping function; adding example usage is optional but recommended
study$addMappingFunction(
  name = "celltype_proportions",
  func = extract_celltype_props,
  inputDescription = "Method output containing cell type proportions per spot.",
  outputDescription = "A matrix or data frame of cell type proportions aligned to spots.",
  exampleUsage = paste(
    "## Minimal example",
    "#props <- matrix(c(0.9, 0.1, 0.8, 0.2, 0.2, 0.8, 0.1, 0.9), ncol = 2, byrow = TRUE)",
    "#study$runMapping('celltype_proportions', props)",
    sep = "\n"
  )
)
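The fallback branch of extract_celltype_props rescales each row so that it sums to one, leaving all-zero rows untouched. The following base R sketch isolates that step on made-up scores; normalise_rows is a hypothetical helper name used only for illustration.

```r
# Hypothetical helper mirroring the fallback branch of extract_celltype_props.
normalise_rows <- function(x) {
  mat <- as.matrix(x)
  rs <- rowSums(mat)
  rs[rs == 0] <- 1  # avoid division by zero; all-zero rows stay all zero
  mat / rs          # rs is recycled down the columns, so row i is divided by rs[i]
}

# Toy raw scores (made up): three spots, two cell types, one empty spot.
raw_scores <- matrix(
  c(9, 1,
    2, 8,
    0, 0),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("spot_", 1:3), c("celltype_A", "celltype_B"))
)

props <- normalise_rows(raw_scores)
props
rowSums(props)
```

Because the divisor is recycled in column-major order, this only works when the divisor has one entry per row, which rowSums guarantees.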

The protocol function captures the full workflow of the benchmarking study. It is contributed in much the same way as a mapping function, with three options:

  1. Leave blank: if you don’t want to contribute now
  2. Use an existing protocol gist URL: if your protocol function has already been uploaded to the GitHub repository accompanying the published paper
  3. Upload a local protocol file to a gist
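The interface BenchHub expects of a protocol function is not described in this vignette, so the following is only a plain R sketch of what a "full workflow" could look like: run a method on the data, then pass its output through each mapping function. All names here (run_protocol, toy_method, toy_mappers, toy_data) are hypothetical.

```r
# Hypothetical protocol: run a method, then apply every mapping function to its output.
run_protocol <- function(data, method, mappers) {
  raw_output <- method(data)
  lapply(mappers, function(map_fn) map_fn(raw_output))
}

# Toy method (made up): assigns a domain by thresholding one score per spot.
toy_method <- function(data) {
  list(annotated_domain = ifelse(data$score > 0.5, "domain_1", "domain_2"))
}

# Toy mapping functions, keyed by the supporting evidence they produce.
toy_mappers <- list(
  annotated_domain = function(result) result$annotated_domain
)

toy_data <- data.frame(score = c(0.9, 0.7, 0.3, 0.1))
mapped <- run_protocol(toy_data, toy_method, toy_mappers)
mapped$annotated_domain
```

Keying the mappers by supporting evidence name means the returned list already has the shape of a named list of predictions, one entry per type of evidence.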

2.3 Upload Study

Once the BenchmarkStudy object includes:

  1. A study name and description
  2. One or more Trio objects already represented in the submission database
  3. Mapping functions [optional]
  4. Protocol functions [optional]

the recommended next step is to run the interactive console workflow via interactivePrepareStudySubmission(study).

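# Set name and description manually.
# Note: BenchmarkStudy$new() returns a fresh object, so assigning it to `study`
# replaces the object built above, including its mapping functions.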
study <- BenchmarkStudy$new(name = "ST toy study")
study$description <- "Toy spatial transcriptomics study."

interactivePrepareStudySubmission(study)

In that interactive workflow, BenchHub will guide you through:

  • selecting or confirming dataset IDs to link to the study,
  • entering the study description,
  • optionally providing a protocol gist,
  • optionally providing a mapping-functions gist or uploading mapping functions already stored in the study object,
  • reviewing the submission bundle, and
  • optionally submitting the Study immediately.

3 For Method Developer

This section illustrates how a method developer can use the benchmark study object created by another user, apply their method, and evaluate its performance.

3.1 Loading the Study

A BenchmarkStudy object can be downloaded from the submission database through its studyID.

loaded_study <- downloadSubmissionStudy(studyID = "ST005", cachePath = tempdir())

This returns a populated BenchmarkStudy object. For example:

loaded_study
loaded_study$name
loaded_study$description
loaded_study$version
length(loaded_study$trios)

Next, inspect the available Trios and mapping functions.

Each entry of loaded_study$trios is a Trio object with supporting evidence that can be used for evaluation.

length(loaded_study$trios)

loaded_study$trios[[1]]

This study provides mapping functions to process method outputs into a format that can be used for evaluation.

Each mapping function has documentation.

# list the names of the mapping functions
loaded_study$listMappingFunctions()

# choose one to print the documentation 
loaded_study$printMappingFunctionDocumentation("annotated_domain")

3.2 Preparing for evaluation

This benchmark study assesses predicted spatial domains and cell type proportions.

Suppose the method developer has run a method and obtained predicted domain labels and cell type proportions for each spot.

method_output <- list(
  annotated_domain = c("domain_1", "domain_1", "domain_2", "domain_2"),
  celltype_proportions = data.frame(
    celltype_A = c(0.9, 0.8, 0.2, 0.1),
    celltype_B = c(0.1, 0.2, 0.8, 0.9)
  )
)

The method developer can apply the mapping functions to the method output to generate the objects required for evaluation.

domain_pred <- loaded_study$runMapping("annotated_domain", method_output)
prop_pred <- loaded_study$runMapping("celltype_proportions", method_output)
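Before evaluating, it can be worth confirming that the two predictions describe the same set of spots. The check below is a base R sketch that rebuilds the toy predictions directly rather than calling runMapping, so it runs without a downloaded study; check_predictions is a hypothetical helper, not a BenchHub function.

```r
# Toy predictions matching method_output above, rebuilt so the snippet is standalone.
domain_pred <- c("domain_1", "domain_1", "domain_2", "domain_2")
prop_pred <- data.frame(
  celltype_A = c(0.9, 0.8, 0.2, 0.1),
  celltype_B = c(0.1, 0.2, 0.8, 0.9)
)

# Hypothetical helper: returns TRUE when the predictions are mutually consistent.
check_predictions <- function(domains, props, tol = 1e-8) {
  mat <- as.matrix(props)
  same_spots <- length(domains) == nrow(mat)          # one label per spot
  no_missing <- !anyNA(domains) && !anyNA(mat)        # no NA in either prediction
  rows_sum_to_one <- all(abs(rowSums(mat) - 1) < tol) # proportions are normalised
  same_spots && no_missing && rows_sum_to_one
}

check_predictions(domain_pred, prop_pred)
```

A check of this kind catches misaligned or unnormalised output early, before it is compared with the supporting evidence.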

3.3 Evaluate

Now we can compare the mapped method output against the supporting evidence stored in the reference Trio using the evaluate function.

The call takes the form study$evaluate(trio_name, list(supporting_evidence_name = output_to_compare)).

In the call below, the names in the list correspond to the supporting evidence stored in the reference Trio.

result <- loaded_study$evaluate(loaded_study$trios[[1]]$name,  # name of the Trio to compare with
  list(
    "annotated_domain" = domain_pred,
    "celltype_proportions" = prop_pred
  ))

result
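The metrics that evaluate applies are defined by the Trio and are not listed in this vignette. To give a feel for the kind of comparison involved, the base R sketch below scores toy predictions against made-up reference values using two simple measures, exact label agreement and root mean squared error. These measures and the reference values (domain_truth, prop_truth) are assumptions chosen for illustration, not BenchHub's own.

```r
# Toy predictions and toy reference values (all made up for illustration).
domain_pred <- c("domain_1", "domain_1", "domain_2", "domain_2")
domain_truth <- c("domain_1", "domain_1", "domain_2", "domain_1")

prop_pred <- matrix(
  c(0.9, 0.1, 0.8, 0.2, 0.2, 0.8, 0.1, 0.9),
  ncol = 2, byrow = TRUE
)
prop_truth <- matrix(
  c(1.0, 0.0, 0.8, 0.2, 0.2, 0.8, 0.0, 1.0),
  ncol = 2, byrow = TRUE
)

# Fraction of spots whose predicted label equals the reference label.
label_agreement <- mean(domain_pred == domain_truth)

# Root mean squared error over all proportion entries.
prop_rmse <- sqrt(mean((prop_pred - prop_truth)^2))

c(label_agreement = label_agreement, prop_rmse = prop_rmse)
```

Exact label agreement assumes that predicted and reference labels share a naming scheme; when cluster labels are arbitrary, a permutation-invariant measure is needed instead.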

4 Summary

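This vignette demonstrated two ways that users can interact with the BenchmarkStudy framework:

  • as a benchmark developer, creating a BenchmarkStudy object, adding mapping and protocol functions, and preparing it for submission, and
  • as a method developer, downloading an existing study, applying its mapping functions to method output, and evaluating the result against the reference Trio.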

5 Session Info

sessionInfo()
## R version 4.6.0 RC (2026-04-17 r89917)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] scuttle_1.21.6              SingleCellExperiment_1.33.2
##  [3] SummarizedExperiment_1.41.1 Biobase_2.71.0             
##  [5] GenomicRanges_1.63.2        Seqinfo_1.1.0              
##  [7] IRanges_2.45.0              S4Vectors_0.49.2           
##  [9] BiocGenerics_0.57.1         generics_0.1.4             
## [11] MatrixGenerics_1.23.0       matrixStats_1.5.0          
## [13] R6_2.6.1                    glmnet_4.1-10              
## [15] Matrix_1.7-5                lubridate_1.9.5            
## [17] forcats_1.0.1               stringr_1.6.0              
## [19] dplyr_1.2.1                 purrr_1.2.2                
## [21] readr_2.2.0                 tidyr_1.3.2                
## [23] tibble_3.3.1                ggplot2_4.0.2              
## [25] tidyverse_2.0.0             BenchHub_0.99.14           
## [27] BiocStyle_2.39.0           
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3     rstudioapi_0.18.0      jsonlite_2.0.0        
##   [4] shape_1.4.6.1          datawizard_1.3.0       magrittr_2.0.5        
##   [7] TH.data_1.1-5          estimability_1.5.1     ggstance_0.3.7        
##  [10] magick_2.9.1           farver_2.1.2           rmarkdown_2.31        
##  [13] fs_2.1.0               vctrs_0.7.3            base64enc_0.1-6       
##  [16] tinytex_0.59           S4Arrays_1.11.1        htmltools_0.5.9       
##  [19] curl_7.0.0             broom_1.0.12           cellranger_1.1.0      
##  [22] SparseArray_1.11.13    Formula_1.2-5          funkyheatmap_0.5.2    
##  [25] googlesheets4_1.1.2    sass_0.4.10            bslib_0.10.0          
##  [28] htmlwidgets_1.6.4      plyr_1.8.9             sandwich_3.1-1        
##  [31] httr2_1.2.2            emmeans_2.0.3          zoo_1.8-15            
##  [34] cachem_1.1.0           lifecycle_1.0.5        iterators_1.0.14      
##  [37] pkgconfig_2.0.3        fastmap_1.2.0          rbibutils_2.4.1       
##  [40] digest_0.6.39          colorspace_2.1-2       patchwork_1.3.2       
##  [43] Hmisc_5.2-5            beachmat_2.27.5        labeling_0.4.3        
##  [46] timechange_0.4.0       abind_1.4-8            httr_1.4.8            
##  [49] polyclip_1.10-7        compiler_4.6.0         gargle_1.6.1          
##  [52] bit64_4.8.0            withr_3.0.2            htmlTable_2.4.3       
##  [55] S7_0.2.1-1             backports_1.5.1        BiocParallel_1.45.0   
##  [58] ggcorrplot_0.1.4.1     performance_0.16.0     ggforce_0.5.0         
##  [61] MASS_7.3-65            DelayedArray_0.37.1    rappdirs_0.3.4        
##  [64] ggsci_5.0.0            tools_4.6.0            foreign_0.8-91        
##  [67] otel_0.2.0             googledrive_2.1.2      nnet_7.3-20           
##  [70] glue_1.8.1             grid_4.6.0             checkmate_2.3.4       
##  [73] cluster_2.1.8.2        reshape2_1.4.5         gtable_0.3.6          
##  [76] tzdb_0.5.0             data.table_1.18.2.1    hms_1.1.4             
##  [79] XVector_0.51.0         utf8_1.2.6             ggrepel_0.9.8         
##  [82] foreach_1.5.2          pillar_1.11.1          vroom_1.7.1           
##  [85] splines_4.6.0          tweenr_2.0.3           splitTools_1.0.1      
##  [88] lattice_0.22-9         survival_3.8-6         bit_4.6.0             
##  [91] dotwhisker_0.8.4       tidyselect_1.2.1       knitr_1.51            
##  [94] gridExtra_2.3          bookdown_0.46          xfun_0.57             
##  [97] stringi_1.8.7          yaml_2.3.12            evaluate_1.0.5        
## [100] codetools_0.2-20       BiocManager_1.30.27    cli_3.6.6             
## [103] rpart_4.1.27           xtable_1.8-8           parameters_0.28.3     
## [106] Rdpack_2.6.6           jquerylib_0.1.4        dichromat_2.0-0.1     
## [109] Rcpp_1.1.1-1           coda_0.19-4.1          survAUC_1.4-0         
## [112] parallel_4.6.0         assertthat_0.2.1       bayestestR_0.17.0     
## [115] marginaleffects_0.32.0 mvtnorm_1.3-7          scales_1.4.0          
## [118] insight_1.5.0          crayon_1.5.3           rlang_1.2.0           
## [121] cowplot_1.2.0          multcomp_1.4-30