This R package supports the processing and quality control of highly multiplexed MALDI-imaging data. gutenTAG is aimed at processing of highly-multiplexed targeted MALDI-IHC data in particular. It is characterised by a novel ‘metapeak’ generation step to account for mass shift and the presence of isotopic peaks. The data handling capabilities build heavily on the Bioconductor package Cardinal, a tool for general processing and analysis of Mass Spectrometry Imaging data. The data can easily be converted into MSImagingExperiment, SpatialExperiment objects and AnnData objects through inbuilt conversion functions, facilitating further downstream analysis. Additionally, the package supports image visualisation from cytomapper (and therefore cytoviewer, an RShiny extension of cytomapper) through the construction of a CytoImageList object.
gutenTAG 0.99.14
This vignette introduces the gutenTAG package for processing
highly-multiplexed MALDI-IHC images. gutenTAG is built heavily on
Cardinal Bioconductor package (Bemis et al. 2015), a
go-to resource for processing and analysis of Mass Spectrometry Imaging
data. gutenTAG uses relevant functionality from Cardinal as well as
a novel ‘metapeak’ generation step to facilitate accurate peak picking
in the presence of technical mass shift and isotopic peaks. `gutenTAG`
is designed specifically for cases such as MALDI-IHC with photocleavable
mass tags (PCMTs), where we are searching for known features that are
spiked into our samples.
The development of highly-multiplexed imaging technologies has enabled the exploration and characterisation of tissues on a spatial level hitherto unimaginable. Advances in these technologies have enabled analysis of the complex systems at play in diseases such as cancer in a manner that respects the spatial nature of their progression. Spatial transcriptomic methods have dominated the spatial biology landscape in recent years. This is in large part due to the massive feature spaces that emerge from these technologies, opening the door to the application of high-dimensional statistics and machine learning techniques on molecular biology datasets. While spatial transcriptomics has captured the limelight, there is a large and ever-growing interest in highly-multiplexed proteomics imaging due to the foundational role that proteins play in cellular processes and in disease (Moffitt, Lundberg, and Heyn 2022). This field has seen a huge amount of innovation in recent years, resulting in the emergence of myriad different approaches to achieve highly-multiplexed targeted proteomic imaging. These can be broadly characterised into cyclic immunofluorescence optical approaches, such as CyCIF (Lin et al. 2018), and one-step mass tag methods such as IMC (Giesen et al. 2014). However, the high-plexities achieved by methods in spatial transcriptomics has eluded the field of spatial proteomics so far.
Mass Spectrometry Imaging (MSI), a powerful imaging technology with many applications in metabolomics, lipidomics, glycomics, etc, has largely been excluded from the proteomics party. Proteins are typically too large for direct analysis by mass spectrometry imaging. MSI methods have been developing at a rapid rate in recent years. One such method is the introduction of Photocleavable Mass Tags (PCMTs) (Gargey Yagnik and Lim 2021), which enable targeted tagging of proteins of interest via specific antibodies. This technique of applying PCMT-tagged antibodies with MALDI-imaging has been termed MALDI-IHC. The use of peptide tags as the mass reporter system of choice lifts the lid on the existing plexity ceiling in current one-shot imaging proteomic technologies, as it is no longer capped by the abundance of heavy metals as in IMC.
MALDI-IHC data shows some persistent features which must be taken into
consideration for analysis. Firstly, it is quite large. Depending
on the size of the tissue in question and the pixel size chosen for
acquisition, it is not uncommon to see files in excess of 50GB. This will only increase as the field pushes for smaller pixel sizes and larger acquisition areas. This makes running even computationally efficient tasks quite a burden. MALDI-imaging data processing is hampered by a few acquisitional features, resulting in mass tag signal not always being in its expected location. Mass shift, where there is a systematic shift of a peak from it’s expected location, can vary from sample to sample, as well as vary within the same sample (can be due to height effects in the tissue, laser defocusing, etc.). A further feature of MALDI-imaging data is the presence of isotopic peaks, which split the signal locally across the spectrum. These are the primary features of MALDI-imaging data which can be accounted for computationally with gutenTAG.
gutenTAG is an R package which forms a pipeline that enables fast,
robust and accurate processing of targeted MALDI-imaging data. It is
initiated by reading .imzML data into R using the
Cardinal readImzML or readMSIData functions. It is then
recommended to pre-process the data. This is followed
by peak picking to identify peaks filter out noise in
our spectra. Following this is the novel
metapeak step, which identifies the m/z
location of our peaks based on watershed-like algorithm performed on the
counts histogram of peaks. These locations on the m/z axis are called
metapeaks since they reflect the prominent location of our peaks across
all spectra in our image. Finally, these metapeaks are then mapped onto
our reference list of expected mass tags (our panel) and our metapeaks
are assigned a label from our panel, completing
the pipeline. This data is provided in a simple list format rather than
the typical object-oriented framework you may be familiar with from many
packages. Functions for easily generating many of the common object
classes in the spatial biology field are provided. The object classes
supported by gutenTAG include the MSImagingExperiment object from
Cardinal, the CytoImageList object for cytomapper and cytoviewer
and the popular SpatialExperiment (Righelli et al. 2022) and AnnData objects.
This is a design choice, since the MALDI-IHC project bridges people from
many communities who may be familiar with different frameworks.
The gutenTAG package exports functions and example data to perform the
following analyses:
To highlight the usability of these function, the gutenTAG package
also exports a number of test data files.
The raw imaging data can be read into R using the readMSIData function
from Cardinal.
imzML_path <- system.file("extdata/Example_data.imzML", package = "gutenTAG")
rawFile <- Cardinal::readMSIData(imzML_path)
The readPanel function can be used to read a .csv file into R.
containing the information for the mass of the peptide tag and the
molecular target into R.
panel_path <- system.file("extdata/ref_list.csv", package = "gutenTAG")
panel <- readPanel(panel_path)
If you want to see how the package functions on test data, or do not
have access to the required data yet, a good place to start is by
loading in the example data from the github repository. The gutenTAG
package contains example data generated by Bruker’s RapiFlex imaging
system. The following section gives an overview of these files.
These files are accessible via:
path <- system.file("extdata", package = "gutenTAG")
list.files(path, recursive = TRUE)
## [1] "Example_data.ibd" "Example_data.imzML"
## [3] "Example_processed.Rdata" "ref_list.csv"
By default, the imaging data is read into an
MSImagingExperiment object. Here, the
extracted channel-specific intensities are stored in the imageData
slot, accessible via the iData getter function from Cardinal.
When acquiring the raw data, the resulting spectra can often be noisy
and messy. To correct for this, some pre-processing steps are required.
This can be done simply using the preProcess function from this
package, which can be implemented as follows.
pre <- preProcess(rawFile)
This function is a wrapper for the Cardinal functions normalize,
smoothSignal and reduceBaseline, using common parameters and methods
for each function. For more flexibility in your pre-processing, it is
recommended to directly use the Cardinal functions. Here, we use total
ion current (TIC) as the normalisation method of choice, Gaussian
smoothing for spectral smoothing and local minimum baseline reduction.
pre <- rawFile |>
Cardinal::normalize(method = "tic") |>
Cardinal::smooth(method = "gaussian") |>
reduceBaseline(method = "locmin") |>
process(BPPARAM = BiocParallel::bpparam())
Full processing of the pre-processed MALDI-imaging data can be performed
using the peakDetection, generateMetapeaks and assignMetapeaks
functions in series.
The first step of processing is to identify peaks in the pre-processed
spectra. This function is essentially a wrapper for
Cardinal::peakPick, and uses the median adaptive deviation (MAD) peak
detection algorithm for robust noise evaluation.
detected <- peakDetection(pre, snr = 3)
Once peakDetection has been performed, the detected peaks per pixel
can be used to generate metapeaks. Here, metapeaks are defined as single
peaks that correspond to clusters of peaks that contain information on
the same molecular species. The function used to generate these
metapeaks is generateMetapeaks. This step is motivated by the loss of
signal through technical mass shifts as well as the presence of isotopic
peaks in our spectra. The following steps are required to be construct
metapeaks.
threshold parameter.It is important to note that metapeaks are not restricted to targeted peaks. Many of the metapeaks generated will be formed by signal that is from untargeted peaks.
metapeaks <- generateMetapeaks(detected, hist_smooth_factor = 1, fixed.limits = 0.5, sparsity = 1)
The final step of processing is to assign metapeaks to the reference
list of masses that was loaded in using the readPanel function. This
is performed using the assignMetapeaks function. The output of this
function is a list of lists, containing primarily IntensityDF, the
channel-specific intensities, SpatialCoords, the spatial coordinates
and CorrespondenceMatrix, a data frame describing the assignment of
metapeaks to reference peaks. This assignment can be controlled via the
mz_threshold parameter, which has a default of 1m/z.
You may notice that some of the expected peaks in the reference list
will not appear in the output. This is likely due to either a lack of
signal, resulting in the metapeak not passing the filter threshold
during generateMetapeaks, or the metapeak center being out of range
for assignment to a reference peak during assignMetapeaks. These
parameters can be adjusted to improve results.
processed <- assignMetapeaks(metapeaks, refList = panel, pre = pre, mz_threshold = 1)
gutenTAG uses the BiocParallel framework for parallel processing. Both
preProcess and peakDetection accept a BPPARAM argument that accepts
any BiocParallelParam object. By default, both functions use
BiocParallel::bpparam(), which returns the session-wide registered backend.
The recommended approach is to register a backend once at the top of your script, before calling any gutenTAG functions:
# Linux / macOS — fork-based, lowest overhead
BiocParallel::register(BiocParallel::MulticoreParam(workers = 4))
# Windows — socket-based (also works on Linux/macOS)
BiocParallel::register(BiocParallel::SnowParam(workers = 4))
# Single-core
BiocParallel::register(BiocParallel::SerialParam())
After registering, all subsequent calls to preProcess and peakDetection
will use the registered backend automatically:
pre <- preProcess(rawFile)
detected <- peakDetection(pre, snr = 3)
You can also pass a backend directly to override the registered default for a single call:
pre <- preProcess(rawFile, BPPARAM = BiocParallel::MulticoreParam(workers = 4))
detected <- peakDetection(pre, snr = 3, BPPARAM = BiocParallel::SnowParam(workers = 4))
| Platform | Recommended backend | Notes |
|---|---|---|
| Linux / macOS | MulticoreParam(n) |
Fork-based, minimal overhead |
| Windows | SnowParam(n) |
Socket-based, works on all systems |
| Any (serial) | SerialParam() |
Default; no parallelisation |
SnowParam also works on Linux/macOS and is the safest choice if you are
writing code intended to run on multiple platforms.
BiocParallel::bpparam()
This package supports visualisation of MALDI images through the
cytomapper (Eling et al. 2020) framework developed by Eling et al
(reference). In order to use the plotPixels function, the data must
first be coerced to a CytoImageList object using the asCytoImageList
function from this package.
library(cytomapper)
cil <- asCytoImageList(processed)
Now the channels can be visualised using the plotPixels function.
Single channel and multichannel imaging is supported using this format.
cytomapper::plotPixels(image = cil,
colour_by = "HLA.ABC",
scale_bar = list(length = 5, lwidth = 0.5, margin = c(1,1)),
interpolate = FALSE,
image_title = list(margin = c(1, 1)))
plotPixels(cil, colour_by = c("HLA.ABC", "CD98", "beta.actin"),
legend = list(colour_by.title.cex = 0.7),
scale_bar = list(length = 5, lwidth = 0.5, margin = c(1,1)),
image_title = list(margin = c(1, 1)),
interpolate = FALSE)
Additionally, the images can be visualised using the cytoviewer R
Shiny app developed by Meyer et al 2023 (Meyer, Eling, and Bodenmiller 2023). cytoviewer
allows interactive visualistion of multi-channel images, supporting
visualisation of up to 6 channels. It should be noted that the
cell-level functionality of cytomapper and cytoviewer are not
applicable for our data, since single-cell resolution is not yet
possible. Interactive visualisation of our images can’t be performed
through a vignette, but the call is very simple: cytoviewer(cil).
gutenTAG provides functions for easy conversion to some of the popular
frameworks in the spatial omics ecosystem. These functions include:
asMSImagingExperiment() for MSImagingExperiment class from
Cardinal for those who are familiar with typical MSI analysis and
prefer to return to the Cardinal framework
asSpatialExperiment() for SpatialExperiment class, a spatial
extension of the SingleCellExperiment class.
asCytoImageList() for CytoImageList class for image
visualisation using the cytomapper and cytoviewer packages.
asAnnData() for AnnData class for access to the scverse
ecosystem.
These functions are written to be called on the output of the
assignMetapeaks() function.
gutenTAG provides many quality control functions. These can be useful for assessing data quality, filtering low performance peaks and assessing the overall quality of gutenTAG’s segmentation. A full QC report can be found in inst/qc_report.Rmd.
John Abbey wrote the package.
Pierre Bost conceptualised the metapeak generation step for data processing and was instrumental in
developing the initial architecture of the pipeline. Leonardo Schwarz contributed enormously to the package codebase. Thank you to Nils Eling and Lasse Meyer for assisting in the development of the package, as well as helping with integrating the package with cytomapper and cytoviewer. We also thank Christian Panse and Witold Wolski of the Functional Genomics Center Zurich (FGCZ) for all of their support.
## R version 4.6.0 RC (2026-04-17 r89917)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] cytomapper_1.23.0 SingleCellExperiment_1.33.2
## [3] SummarizedExperiment_1.41.1 Biobase_2.71.0
## [5] GenomicRanges_1.63.2 Seqinfo_1.1.0
## [7] IRanges_2.45.0 MatrixGenerics_1.23.0
## [9] matrixStats_1.5.0 EBImage_4.53.0
## [11] gutenTAG_0.99.14 Cardinal_3.13.0
## [13] S4Vectors_0.49.2 ProtGenerics_1.43.0
## [15] BiocGenerics_0.57.1 generics_0.1.4
## [17] BiocParallel_1.45.0 BiocStyle_2.39.0
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-9 gridExtra_2.3 rlang_1.2.0
## [4] magrittr_2.0.5 svgPanZoom_0.3.4 shinydashboard_0.7.3
## [7] otel_0.2.0 compiler_4.6.0 flexmix_2.3-20
## [10] systemfonts_1.3.2 png_0.1-9 fftwtools_0.9-11
## [13] vctrs_0.7.3 pkgconfig_2.0.3 SpatialExperiment_1.21.0
## [16] fastmap_1.2.0 magick_2.9.1 XVector_0.51.0
## [19] promises_1.5.0 rmarkdown_2.31 ggbeeswarm_0.7.3
## [22] modeltools_0.2-24 xfun_0.57 cachem_1.1.0
## [25] splus2R_1.3-5 jsonlite_2.0.0 later_1.4.8
## [28] rhdf5filters_1.23.3 DelayedArray_0.37.1 Rhdf5lib_1.99.6
## [31] jpeg_0.1-11 tiff_0.1-12 terra_1.9-11
## [34] irlba_2.3.7 parallel_4.6.0 R6_2.6.1
## [37] bslib_0.10.0 RColorBrewer_1.1-3 jquerylib_0.1.4
## [40] Rcpp_1.1.1-1 bookdown_0.46 knitr_1.51
## [43] nnet_7.3-20 httpuv_1.6.17 Matrix_1.7-5
## [46] nnls_1.6 tidyselect_1.2.1 viridis_0.6.5
## [49] dichromat_2.0-0.1 abind_1.4-8 yaml_2.3.12
## [52] codetools_0.2-20 lattice_0.22-9 tibble_3.3.1
## [55] withr_3.0.2 shiny_1.13.0 S7_0.2.2
## [58] evaluate_1.0.5 ontologyIndex_2.12 mclust_6.1.2
## [61] pillar_1.11.1 BiocManager_1.30.27 matter_2.13.5
## [64] sp_2.2-1 RCurl_1.98-1.18 ggplot2_4.0.3
## [67] scales_1.4.0 xtable_1.8-8 glue_1.8.1
## [70] tools_4.6.0 locfit_1.5-9.12 rhdf5_2.55.16
## [73] grid_4.6.0 nlme_3.1-169 raster_3.6-32
## [76] CardinalIO_1.9.0 beeswarm_0.4.0 HDF5Array_1.39.1
## [79] vipor_0.4.7 cli_3.6.6 textshaping_1.0.5
## [82] viridisLite_0.4.3 S4Arrays_1.11.1 svglite_2.2.2
## [85] dplyr_1.2.1 gtable_0.3.6 sass_0.4.10
## [88] digest_0.6.39 SparseArray_1.11.13 rjson_0.2.23
## [91] htmlwidgets_1.6.4 farver_2.1.2 htmltools_0.5.9
## [94] lifecycle_1.0.5 h5mread_1.3.3 mime_0.13
Bemis, Kyle D., April Harry, Livia S. Eberlin, Christina Ferreira, Stephanie M. van de Ven, Parag Mallick, Mark Stolowitz, and Olga Vitek. 2015. “Cardinal: An R Package for Statistical Analysis of Mass Spectrometry-Based Imaging Experiments.” Bioinformatics, 1883–6. https://doi.org/doi:10.1093/bioinformatics/btv146.
Eling, Nils, Nicolas Damond, Tobias Hoch, and Bernd Bodenmiller. 2020. “Cytomapper: An R/Bioconductor Package for Visualization of Highly Multiplexed Imaging Data.” Bioinformatics 36 (24): 5706–8. https://doi.org/10.1093/bioinformatics/btaa1061.
Gargey Yagnik, Kenneth J. Rothschild, Ziying Liu, and Mark J. Lim. 2021. “Highly Multiplexed Immunohistochemical Maldi-Ms Imaging of Biomarkers in Tissues.” Journal of the American Society for Mass Spectrometry 32: 977–88. https://doi.org/doi: 10.1021/jasms.0c00473.
Giesen, Charlotte, Hao A. O. Wang, Denis Schapiro, Nevena Zivanovic, Andrea Jacobs, Bodo Hattendorf, Peter J. Schüffler, et al. 2014. “Highly Multiplexed Imaging of Tumor Tissues with Subcellular Resolution by Mass Cytometry.” Nature Methods 11 (4): 417–22.
Lin, Jia-Ren, Benjamin Izar, Shu Wang, Clarence Yapp, Shaolin Mei, Parin M. Shah, Sandro Santagata, and Peter K. Sorger. 2018. “Highly Multiplexed Immunofluorescence Imaging of Human Tissues and Tumors Using T-Cycif and Conventional Optical Microscopes.” eLife 7: 1–46.
Meyer, Lasse, Nils Eling, and Bernd Bodenmiller. 2023. “Cytoviewer: An R/Bioconductor Package for Interactive Visualization and Exploration of Highly Multiplexed Imaging Data.” https://doi.org/10.1101/2023.05.24.542115.
Moffitt, Jeffrey R., Emma Lundberg, and Holger Heyn. 2022. “The Emerging Landscape of Spatial Profiling Technologies.” Nature Reviews Genetics 23: 741–59.
Righelli, Dario, Lukas M. Weber, Helena L. Crowell, Brenda Pardo, Leonardo Collado-Torres, Shila Ghazanfar, Aaron T. L. Lun, Stephanie C. Hicks, and Davide Risso. 2022. “SpatialExperiment: Infrastructure for Spatially-Resolved Transcriptomics Data in R Using Bioconductor.” Bioinformatics 38 (11): –3. https://doi.org/10.1093/bioinformatics/btac299.