---
title: "bettr"
author: "Charlotte Soneson & Federico Marini"
date: "`r Sys.Date()`"
output: 
    BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{bettr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

# Introduction

Method benchmarking is a core part of computational biology research, with an 
intrinsic power to establish best practices in method selection and 
application, as well as help identifying gaps and possibilities for 
improvement. A typical benchmark evaluates a set of methods using multiple 
different metrics, intended to capture different aspects of their performance. 
The best method to choose in any given situation can then be found, e.g., 
by averaging the different performance metrics, possibly putting more emphasis 
on those that are more important to the specific situation. 

Inspired by the 
[OECD 'Better Life Index'](https://www.oecdbetterlifeindex.org/#/11111111111),
the `bettr` package was developed to provide support for this last step. It 
allows users to easily create performance summaries emphasizing the aspects 
that are most important to them. `bettr` can be used interactively, via a 
R/Shiny application, or programmatically by calling the underlying functions. 
In this vignette, we illustrate both alternatives, using example data 
provided with the package. 

Given the abundance of methods available for 
computational analysis of biological data, both within and beyond Bioconductor,
and the importance of careful, adaptive benchmarking, we believe that 
`bettr` will be a useful complement to currently available Bioconductor 
infrastructure related to benchmarking and performance estimation. 
Other packages (e.g., `r Biocpkg("pipeComp")`) provide frameworks for _executing_
benchmarks by applying and recording pre-defined workflows to data. Packages 
such as `r Biocpkg("iCOBRA")` and `r CRANpkg("ROCR")` instead provide 
functionality for calculating well-established evaluation metrics. In contrast, 
`bettr` focuses on _visual exploration_ of benchmark results, represented by 
the values of several evaluation metrics. 

# Installation

`bettr` can be installed from Bioconductor (from release 3.19 onwards): 

```{r, eval=FALSE}
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("bettr")
```

# Usage

```{r}
suppressPackageStartupMessages({
    library("bettr")
    library("SummarizedExperiment")
    library("tibble")
    library("dplyr")
})
```

The main input to `bettr` is a `data.frame` containing values of several 
metrics for several methods. In addition, the user can provide additional 
annotations and characteristics for the methods and metrics, which can be 
used to group and filter them in the interactive application. 

```{r}
## Data for two metrics (metric1, metric2) for three methods (M1, M2, M3)
df <- data.frame(Method = c("M1", "M2", "M3"), 
                 metric1 = c(1, 2, 3),
                 metric2 = c(3, 1, 2))

## More information for metrics
metricInfo <- data.frame(Metric = c("metric1", "metric2", "metric3"),
                         Group = c("G1", "G2", "G2"))

## More information for methods ('IDs')
idInfo <- data.frame(Method = c("M1", "M2", "M3"), 
                     Type = c("T1", "T1", "T2"))
```

To simplify handling and sharing, the data can be combined into a 
`SummarizedExperiment` object (with methods as rows and metrics as columns) as 
follows: 

```{r}
se <- assembleSE(df = df, idCol = "Method", metricInfo = metricInfo, 
                 idInfo = idInfo)
se
```

The interactive application to explore the rankings can then be launched by
calling the `bettr()` function. The input can be either the assembled 
`SummarizedExperiment` object or the individual components. 

```{r}
#| eval: false

## Alternative 1
bettr(bettrSE = se)

## Alternative 2
bettr(df = df, idCol = "Method", metricInfo = metricInfo, idInfo = idInfo)
```

# Example - single-cell RNA-seq clustering benchmark

Next, we show a more elaborate example, visualizing data from the benchmark of
single-cell clustering methods performed by 
[Duo et al (2018)](https://f1000research.com/articles/7-1141). The values 
for a set of evaluation metrics applied to results obtained by several 
clustering methods are provided in a `.csv` file in the package:

```{r}
res <- read.csv(system.file("extdata", "duo2018_results.csv", 
                            package = "bettr"))
dim(res)
tibble(res)
```

As we can see, we have 14 methods (rows) and 48 different metrics (columns). 
The first column provides the name of the clustering method.
More precisely, the columns correspond to four different metrics, each of 
which was applied to clustering output from 12 data sets. We encode this 
"grouping" of metrics in a data frame, in such a way that we can later 
collapse performance across data sets in `bettr`: 

```{r}
metricInfo <- tibble(Metric = colnames(res)[-1]) |>
    mutate(Class = sub("_.*", "", Metric))
head(metricInfo)
table(metricInfo$Class)
```

In order to make different metrics comparable, we next define the 
transformation that should be applied to each of them within `bettr`. First, 
we need to make sure that the metric are consistent in terms of whether large 
values indicate "good" or "bad" performance. In our case, for both the `elapsed`
(elapsed run time), `nclust.vs.true` (difference between estimated and true 
number of clusters) and `s.norm.vs.true` (difference between estimated and 
true normalized Shannon entropy for a clustering), a small value indicates 
"better" performance, while for the `ARI` (adjusted Rand index), larger 
values are better. Hence, we will flip the sign of the first three before 
doing additional analyses. Moreover, the different metrics clearly live in 
different numeric ranges - the maximal value of the `ARI` is 1, while the 
other metrics can have much larger values. As an example, here we therefore 
scale the three other metrics linearly to the interval `[0, 1]` to make them 
more comparable to the `ARI` values. We record these transformations in a list, 
that will be passed to `bettr`:

```{r}
## Initialize list
initialTransforms <- lapply(res[, grep("elapsed|nclust.vs.true|s.norm.vs.true", 
                                       colnames(res), value = TRUE)], 
                            function(i) {
                                list(flip = TRUE, transform = '[0,1]')
                            })

length(initialTransforms)
names(initialTransforms)
head(initialTransforms)
```

We can specify four different aspects of the desired transform, which will 
be applied in the following order:

* `flip` (`TRUE` or `FALSE`, whether to flip the sign of the values). 
The default is `FALSE`.
* `offset` (a numeric value to add to the observed values, possibly after 
applying the sign flip). The default is 0.
* `transform` (one of `None`, `[0,1]`, `[-1,1]`, `z-score`, or `Rank`).
The default is `None`.
* `cuts` (a numeric vector of cuts that will be used to turn a numeric 
variable into a categorical one). The default is `NULL`. 

Only values that deviate from the defaults need to be specified.

Finally, we can define a set of colors that we would like to use for 
visualizing the methods and metrics in `bettr`. 

```{r}
metricColors <- list(
    Class = c(ARI = "purple", elapsed = "forestgreen", 
              nclust.vs.true = "blue", 
              s.norm.vs.true = "orange"
    ))
idColors <- list(
    method = c(
        CIDR = "#332288", FlowSOM = "#6699CC", PCAHC = "#88CCEE", 
        PCAKmeans = "#44AA99", pcaReduce = "#117733",
        RtsneKmeans = "#999933", Seurat = "#DDCC77", SC3svm = "#661100", 
        SC3 = "#CC6677", TSCAN = "grey34", ascend = "orange", SAFE = "black",
        monocle = "red", RaceID2 = "blue"
    ))
```

All the information defined so far can be combined in a `SummarizedExperiment`
object, as shown above for the small example data: 

```{r}
duo2018 <- assembleSE(df = res, idCol = "method", metricInfo = metricInfo, 
                      initialTransforms = initialTransforms,
                      metricColors = metricColors, idColors = idColors)
duo2018
```

The `assay` of the `SummarizedExperiment` object contains the values for 
the 48 performance measures for the 14 clustering methods. 
The `metricInfo` is stored in the `colData`, and the lists of colors and the 
initial transforms in the `metadata`:

```{r}
## Display the whole performance table
tibble(assay(duo2018, "values"))

## Showing the first metric, evaluated on all datasets
head(colData(duo2018), 12)

## These are the color definitions (can mix character and hex values)
metadata(duo2018)$bettrInfo$idColors
metadata(duo2018)$bettrInfo$metricColors

names(metadata(duo2018)$bettrInfo$initialTransforms)

## An example of a transformation - elapsed time for the Koh dataset
metadata(duo2018)$bettrInfo$initialTransforms$elapsed_Koh
```

Now, we can launch the app for this data set:

```{r}
#| eval: false

bettr(bettrSE = duo2018, bstheme = "sandstone")
```

The screenshot below illustrates the default view of the interactive 
interface. The methods (rows) are sorted by the aggregated scores, which are 
shown as bars to the right of the heatmap (recall that in this case, large 
scores are better). If we have many methods, we can choose to only display the 
top ones by ticking the `Show only top IDs` box in the left-hand sidebar. 
If additional annotations are provided for the methods (similarly as for the 
metrics above), we can further display only the top methods in each group. This 
can be useful e.g. in situations where each tool is run with many different 
parameter settings, and we want to limit the plot to the best parameter 
settings for each tool. 

<img src="bettr-screenshot-heatmap.png" width="100%">

We can choose to collapse the metric values to have a single 
value for each metric class, to reduce the redundancy (via `Metrics` - 
`Grouping of metrics` in the left-hand sidebar). We can now also 
freely decide how to weight the respective metrics by means of the sliders in 
the left side bar. The bars on top of the heatmap show the current weight 
assignment. 

<img src="bettr-screenshot-heatmap-collapsed.png" width="100%">

`bettr` also provides alternative visualizations, e.g. a polar plot:

<img src="bettr-screenshot-polar.png" width="100%">

At any point, methods and metrics can be filtered out (either individually or
by excluding all methods and metrics with a specified annotation) via the 
`Filter methods/metrics` tab. The metrics can be transformed interactively 
in the `Transform metrics` tab (see above for the different transformation 
options). Finally, the current transformed values for all metrics as well as 
the summary scores can be viewed in the `Data table` tab. 

# Programmatic interface

The interactive application showcased above is the main entry point to using
`bettr`. However, we also provide a wrapper function to prepare the input data 
for plotting (replicating the steps that are performed in the app), as well as 
access to the plotting functions themselves. The following code replicates the 
results for the example above.

```{r}
#| fig.width: 7
#| fig.height: 7

## Assign a higher weight to one of the collapsed metric classes
metadata(duo2018)$bettrInfo$initialWeights["Class_ARI"] <- 0.55

prepData <- bettrGetReady(
    bettrSE = duo2018, idCol = "method", 
    scoreMethod = "weighted mean", metricGrouping = "Class", 
    metricCollapseGroup = TRUE)

## This object is fairly verbose and detailed, 
## but has the whole set of info needed
prepData

## Call the plotting routines specifying one single parameter
makeHeatmap(bettrList = prepData)
makePolarPlot(bettrList = prepData)
```

# Exporting data from the app

It is possible to export the data used internally by the interactive 
application, in the same format as the output from `bettrGetReady()`.
To enable such export, first generate the `app` object using the `bettr()`
function, and then assign the call to `shiny::runApp()` to a variable to 
capture the output. For example: 

```{r}
if (interactive()) {
    app <- bettr(bettrSE = duo2018, bstheme = "sandstone")
    out <- shiny::runApp(app)
}
```

To activate the export, make sure to click the button 'Close app' (in the 
bottom of the left-hand side bar) in order to 
close the application (don't just close the window). This will take you back to 
your R session, where the variable `out` will be populated with the data used 
in the app (in the same format as the output from `bettrGetReady()`).
This list can be directly provided as the input to e.g. `makeHeatmap()` and 
the other plotting functions via the `bettrList` argument, as shown above.

# Additional examples

`bettr` can also be adapted to represent other types of collections of 
metric scores than the results of benchmarking studies in computational 
biology. An 
[example](https://github.com/federicomarini/bettr/blob/devel/inst/scripts/visualize_oecd_data.R), 
which is also included in the `inst/scripts` folder of 
this package, presents the OECD Better Life Index 
(https://stats.oecd.org/index.aspx?DataSetCode=BLI), spanning over 11 topics, 
each represented by one to three indicators. These indicators are measures 
of the concepts of well-being, and well suited to display some
comparison across countries.

<!-- screenshot of that? -->

Additional examples can be added to the codebase upon interest, and we 
encourage users to contribute to that via a Pull Request to 
https://github.com/federicomarini/bettr.


# Session info {-}

```{r}
sessionInfo()
```