library(BenchHub)
The BenchmarkStudy object is designed to encapsulate all necessary components of a benchmarking study, including the associated data and functions. It provides a unified structure for benchmark developers to share their work and for method developers to interact with an existing benchmark study.
This vignette provides a guide for both use cases under the current BenchHub submission workflow.
This section demonstrates how to create a BenchmarkStudy object from a benchmarking study.
We begin by creating an empty BenchmarkStudy object.
study <- BenchmarkStudy$new()
# Download an existing Trio from the submission database
example_trio <- downloadSubmissionTrio("D001", cachePath = tempdir())
example_trio
A mapping function is a helper function that converts method output into a format that can be compared with the supporting evidence stored in the reference Trio. There are three ways to contribute a mapping function:
In this toy spatial transcriptomics example, the Trio contains the following supporting evidence:
- annotated_domain
- celltype_proportions

We therefore define two mapping functions that extract those objects from a method result.
Example 1: extract predicted spatial domains.
# Define the mapping function
extract_domains <- function(result) {
  if (is.data.frame(result) && "annotated_domain" %in% colnames(result)) {
    return(result$annotated_domain)
  }
  if (is.list(result) && "annotated_domain" %in% names(result)) {
    return(result$annotated_domain)
  }
  stop("Could not find 'annotated_domain' in the method output.")
}
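Before registering the function, it can help to sanity-check it on a toy method result. The input below is illustrative only and is not part of the study:

```r
# Toy method result for a quick sanity check (illustrative only)
toy_result <- list(annotated_domain = c("domain_1", "domain_1", "domain_2"))
extract_domains(toy_result)
## [1] "domain_1" "domain_1" "domain_2"
```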
# Add the mapping function
study$addMappingFunction(
  name = "annotated_domain",
  func = extract_domains,
  inputDescription = "Method output containing one predicted spatial domain label per spot.",
  outputDescription = "A vector of predicted spatial domain labels aligned to spots.",
  exampleUsage = paste(
    "## Minimal example",
    "#result <- list(annotated_domain = c('domain_1', 'domain_1', 'domain_2', 'domain_2'))",
    "#res <- study$runMapping('annotated_domain', result)",
    "#head(res)",
    sep = "\n"
  )
)
Example 2: extract predicted cell type proportions.
# Define the mapping function
extract_celltype_props <- function(result) {
  if (is.data.frame(result) && "celltype_proportions" %in% names(result)) {
    return(result$celltype_proportions)
  }
  if (is.list(result) && "celltype_proportions" %in% names(result)) {
    return(result$celltype_proportions)
  }
  if (is.matrix(result) || is.data.frame(result)) {
    # Fall back to row-normalising a raw matrix into proportions
    mat <- as.matrix(result)
    rs <- rowSums(mat)
    rs[rs == 0] <- 1 # avoid division by zero for empty spots
    return(mat / rs)
  }
  stop("Could not extract cell type proportions from the method output.")
}
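As a quick check of the fallback branch, an unnamed matrix of raw values should be row-normalised so that each spot's proportions sum to one. The toy matrix below is illustrative only:

```r
# Toy matrix: two spots x two cell types, raw counts rather than proportions
toy_counts <- matrix(c(9, 1, 2, 8), ncol = 2, byrow = TRUE)
extract_celltype_props(toy_counts)
## Each row of the returned matrix sums to 1 (e.g. 0.9/0.1 and 0.2/0.8)
```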
# Add the mapping function (providing example usage is optional but recommended)
study$addMappingFunction(
  name = "celltype_proportions",
  func = extract_celltype_props,
  inputDescription = "Method output containing cell type proportions per spot.",
  outputDescription = "A matrix or data frame of cell type proportions aligned to spots.",
  exampleUsage = paste(
    "## Minimal example",
    "#props <- matrix(c(0.9, 0.1, 0.8, 0.2, 0.2, 0.8, 0.1, 0.9), ncol = 2, byrow = TRUE)",
    "#study$runMapping('celltype_proportions', props)",
    sep = "\n"
  )
)
Similar to mapping functions, a protocol function captures the full workflow of a benchmarking study. There are three ways to contribute a protocol function:
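Whichever route is used, a protocol function can be thought of as a plain R function that runs the method end-to-end and returns output in the structure the mapping functions expect. The sketch below is illustrative only; the function name and signature are hypothetical, not the BenchHub registration API:

```r
# Hypothetical protocol sketch -- illustrative names, not the BenchHub API
run_toy_protocol <- function(input_data) {
  # 1. Run the method of interest on the input data (omitted in this toy)
  # 2. Return output in the structure the mapping functions above expect
  list(
    annotated_domain = rep(c("domain_1", "domain_2"), each = 2),
    celltype_proportions = data.frame(
      celltype_A = c(0.9, 0.8, 0.2, 0.1),
      celltype_B = c(0.1, 0.2, 0.8, 0.9)
    )
  )
}
```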
Once the BenchmarkStudy object includes its components (such as Trio objects, mapping functions, and protocol functions), the recommended next step is to launch the interactive console workflow via interactivePrepareStudySubmission(study).
# Set name and description manually
study <- BenchmarkStudy$new(name = "ST toy study")
study$description <- "Toy spatial transcriptomics study."
interactivePrepareStudySubmission(study)
In that interactive workflow, BenchHub will guide you through preparing and submitting the study object.

This section illustrates how a method developer can use a benchmark study object created by another user, apply their method, and evaluate its performance.
A BenchmarkStudy object can be downloaded from the submission database using its studyID.
loaded_study <- downloadSubmissionStudy(studyID = "ST005", cachePath = tempdir())
This returns a populated BenchmarkStudy object. For example:
loaded_study
loaded_study$name
loaded_study$description
loaded_study$version
length(loaded_study$trios)
Next, inspect the available trios and mapping functions.
Each entry of loaded_study$trios is a Trio object with supporting evidence that can be used for evaluation.
length(loaded_study$trios)
loaded_study$trios[[1]]
This study provides mapping functions to process method outputs into a format that can be used for evaluation.
Each mapping function has documentation.
# List the names of the available mapping functions
loaded_study$listMappingFunctions()
# Choose one to print its documentation
loaded_study$printMappingFunctionDocumentation("annotated_domain")
This benchmark study assesses predicted spatial domains and cell type proportions.
Suppose the method developer has run a method and obtained predicted domain labels and cell type proportions for each spot.
method_output <- list(
annotated_domain = c("domain_1", "domain_1", "domain_2", "domain_2"),
celltype_proportions = data.frame(
celltype_A = c(0.9, 0.8, 0.2, 0.1),
celltype_B = c(0.1, 0.2, 0.8, 0.9)
)
)
The method developer can apply the mapping functions to the method output to generate the objects required for evaluation.
domain_pred <- loaded_study$runMapping("annotated_domain", method_output)
prop_pred <- loaded_study$runMapping("celltype_proportions", method_output)
Now we can compare the method output against the supporting evidence stored in the reference Trio using the evaluate function.
The evaluate function takes the form study$evaluate(trio_name, list(supporting_evidence_name = output_to_compare)).
In the function below, the names in the list correspond to supporting evidence stored in the reference Trio.
result <- loaded_study$evaluate(
  loaded_study$trios[[1]]$name, # name of the Trio to compare with
  list(
    "annotated_domain" = domain_pred,
    "celltype_proportions" = prop_pred
  )
)
result
This vignette demonstrated two ways that users can interact with the BenchmarkStudy framework:
- Benchmark developers: create or update a BenchmarkStudy by adding Trio objects and optional mapping functions with clear documentation, then prepare and submit the study through the current Study submission workflow.
- Method developers: load an existing BenchmarkStudy from the submission database, use the Trio objects to execute benchmarking methods of interest, use the mapping functions to convert method outputs where needed, and evaluate those outputs against the study’s supporting evidence using the evaluate() function.
sessionInfo()
## R version 4.6.0 RC (2026-04-17 r89917)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scuttle_1.21.6 SingleCellExperiment_1.33.2
## [3] SummarizedExperiment_1.41.1 Biobase_2.71.0
## [5] GenomicRanges_1.63.2 Seqinfo_1.1.0
## [7] IRanges_2.45.0 S4Vectors_0.49.2
## [9] BiocGenerics_0.57.1 generics_0.1.4
## [11] MatrixGenerics_1.23.0 matrixStats_1.5.0
## [13] R6_2.6.1 glmnet_4.1-10
## [15] Matrix_1.7-5 lubridate_1.9.5
## [17] forcats_1.0.1 stringr_1.6.0
## [19] dplyr_1.2.1 purrr_1.2.2
## [21] readr_2.2.0 tidyr_1.3.2
## [23] tibble_3.3.1 ggplot2_4.0.2
## [25] tidyverse_2.0.0 BenchHub_0.99.13
## [27] BiocStyle_2.39.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.18.0 jsonlite_2.0.0
## [4] shape_1.4.6.1 datawizard_1.3.0 magrittr_2.0.5
## [7] TH.data_1.1-5 estimability_1.5.1 ggstance_0.3.7
## [10] magick_2.9.1 farver_2.1.2 rmarkdown_2.31
## [13] fs_2.1.0 vctrs_0.7.3 base64enc_0.1-6
## [16] tinytex_0.59 S4Arrays_1.11.1 htmltools_0.5.9
## [19] curl_7.0.0 broom_1.0.12 cellranger_1.1.0
## [22] SparseArray_1.11.13 Formula_1.2-5 funkyheatmap_0.5.2
## [25] googlesheets4_1.1.2 sass_0.4.10 bslib_0.10.0
## [28] htmlwidgets_1.6.4 plyr_1.8.9 sandwich_3.1-1
## [31] httr2_1.2.2 emmeans_2.0.3 zoo_1.8-15
## [34] cachem_1.1.0 lifecycle_1.0.5 iterators_1.0.14
## [37] pkgconfig_2.0.3 fastmap_1.2.0 rbibutils_2.4.1
## [40] digest_0.6.39 colorspace_2.1-2 patchwork_1.3.2
## [43] Hmisc_5.2-5 beachmat_2.27.5 labeling_0.4.3
## [46] timechange_0.4.0 abind_1.4-8 httr_1.4.8
## [49] polyclip_1.10-7 compiler_4.6.0 gargle_1.6.1
## [52] bit64_4.8.0 withr_3.0.2 htmlTable_2.4.3
## [55] S7_0.2.1-1 backports_1.5.1 BiocParallel_1.45.0
## [58] ggcorrplot_0.1.4.1 performance_0.16.0 ggforce_0.5.0
## [61] MASS_7.3-65 DelayedArray_0.37.1 rappdirs_0.3.4
## [64] ggsci_5.0.0 tools_4.6.0 foreign_0.8-91
## [67] otel_0.2.0 googledrive_2.1.2 nnet_7.3-20
## [70] glue_1.8.1 grid_4.6.0 checkmate_2.3.4
## [73] cluster_2.1.8.2 reshape2_1.4.5 gtable_0.3.6
## [76] tzdb_0.5.0 data.table_1.18.2.1 hms_1.1.4
## [79] XVector_0.51.0 utf8_1.2.6 ggrepel_0.9.8
## [82] foreach_1.5.2 pillar_1.11.1 vroom_1.7.1
## [85] splines_4.6.0 tweenr_2.0.3 splitTools_1.0.1
## [88] lattice_0.22-9 survival_3.8-6 bit_4.6.0
## [91] dotwhisker_0.8.4 tidyselect_1.2.1 knitr_1.51
## [94] gridExtra_2.3 bookdown_0.46 xfun_0.57
## [97] stringi_1.8.7 yaml_2.3.12 evaluate_1.0.5
## [100] codetools_0.2-20 BiocManager_1.30.27 cli_3.6.6
## [103] rpart_4.1.27 xtable_1.8-8 parameters_0.28.3
## [106] Rdpack_2.6.6 jquerylib_0.1.4 dichromat_2.0-0.1
## [109] Rcpp_1.1.1-1 coda_0.19-4.1 survAUC_1.4-0
## [112] parallel_4.6.0 assertthat_0.2.1 bayestestR_0.17.0
## [115] marginaleffects_0.32.0 mvtnorm_1.3-7 scales_1.4.0
## [118] insight_1.5.0 crayon_1.5.3 rlang_1.2.0
## [121] cowplot_1.2.0 multcomp_1.4-30