1 Introduction

This vignette introduces the gutenTAG package for processing highly-multiplexed MALDI-IHC images. gutenTAG is built heavily on Cardinal Bioconductor package (Bemis et al. 2015), a go-to resource for processing and analysis of Mass Spectrometry Imaging data. gutenTAG uses relevant functionality from Cardinal as well as a novel ‘metapeak’ generation step to facilitate accurate peak picking in the presence of technical mass shift and isotopic peaks. `gutenTAG` is designed specifically for cases such as MALDI-IHC with photocleavable mass tags (PCMTs), where we are searching for known features that are spiked into our samples.

1.1 Highly-multiplexed imaging

The development of highly-multiplexed imaging technologies has enabled the exploration and characterisation of tissues on a spatial level hitherto unimaginable. Advances in these technologies have enabled analysis of the complex systems at play in diseases such as cancer in a manner that respects the spatial nature of their progression. Spatial transcriptomic methods have dominated the spatial biology landscape in recent years. This is in large part due to the massive feature spaces that emerge from these technologies, opening the door to the application of high-dimensional statistics and machine learning techniques on molecular biology datasets. While spatial transcriptomics has captured the limelight, there is a large and ever-growing interest in highly-multiplexed proteomics imaging due to the foundational role that proteins play in cellular processes and in disease (Moffitt, Lundberg, and Heyn 2022). This field has seen a huge amount of innovation in recent years, resulting in the emergence of myriad different approaches to achieve highly-multiplexed targeted proteomic imaging. These can be broadly characterised into cyclic immunofluorescence optical approaches, such as CyCIF (Lin et al. 2018), and one-step mass tag methods such as IMC (Giesen et al. 2014). However, the high-plexities achieved by methods in spatial transcriptomics has eluded the field of spatial proteomics so far.

1.2 Mass Spectrometry Imaging

Mass Spectrometry Imaging (MSI), a powerful imaging technology with many applications in metabolomics, lipidomics, glycomics, etc, has largely been excluded from the proteomics party. Proteins are typically too large for direct analysis by mass spectrometry imaging. MSI methods have been developing at a rapid rate in recent years. One such method is the introduction of Photocleavable Mass Tags (PCMTs) (Gargey Yagnik and Lim 2021), which enable targeted tagging of proteins of interest via specific antibodies. This technique of applying PCMT-tagged antibodies with MALDI-imaging has been termed MALDI-IHC. The use of peptide tags as the mass reporter system of choice lifts the lid on the existing plexity ceiling in current one-shot imaging proteomic technologies, as it is no longer capped by the abundance of heavy metals as in IMC.

1.3 MALDI-IHC

MALDI-IHC data shows some persistent features which must be taken into consideration for analysis. Firstly, it is quite large. Depending on the size of the tissue in question and the pixel size chosen for acquisition, it is not uncommon to see files in excess of 50GB. This will only increase as the field pushes for smaller pixel sizes and larger acquisition areas. This makes running even computationally efficient tasks quite a burden. MALDI-imaging data processing is hampered by a few acquisitional features, resulting in mass tag signal not always being in its expected location. Mass shift, where there is a systematic shift of a peak from it’s expected location, can vary from sample to sample, as well as vary within the same sample (can be due to height effects in the tissue, laser defocusing, etc.). A further feature of MALDI-imaging data is the presence of isotopic peaks, which split the signal locally across the spectrum. These are the primary features of MALDI-imaging data which can be accounted for computationally with gutenTAG.

1.4 gutenTAG overview

gutenTAG is an R package which forms a pipeline that enables fast, robust and accurate processing of targeted MALDI-imaging data. It is initiated by reading .imzML data into R using the Cardinal readImzML or readMSIData functions. It is then recommended to pre-process the data. This is followed by peak picking to identify peaks filter out noise in our spectra. Following this is the novel metapeak step, which identifies the m/z location of our peaks based on watershed-like algorithm performed on the counts histogram of peaks. These locations on the m/z axis are called metapeaks since they reflect the prominent location of our peaks across all spectra in our image. Finally, these metapeaks are then mapped onto our reference list of expected mass tags (our panel) and our metapeaks are assigned a label from our panel, completing the pipeline. This data is provided in a simple list format rather than the typical object-oriented framework you may be familiar with from many packages. Functions for easily generating many of the common object classes in the spatial biology field are provided. The object classes supported by gutenTAG include the MSImagingExperiment object from Cardinal, the CytoImageList object for cytomapper and cytoviewer and the popular SpatialExperiment (Righelli et al. 2022) and AnnData objects. This is a design choice, since the MALDI-IHC project bridges people from many communities who may be familiar with different frameworks.

2 gutenTAG pipeline

The gutenTAG package exports functions and example data to perform the following analyses:

  1. Read in raw datadata
  2. Perform data pre-processing
  3. Data processing
  4. Visualisation of processed targeted images
  5. Interoperability with object classes
  6. Quality Control

To highlight the usability of these function, the gutenTAG package also exports a number of test data files.

2.1 Read in raw data and panel

The raw imaging data can be read into R using the readMSIData function from Cardinal.

imzML_path <- system.file("extdata/Example_data.imzML", package = "gutenTAG")
rawFile <- Cardinal::readMSIData(imzML_path)

The readPanel function can be used to read a .csv file into R. containing the information for the mass of the peptide tag and the molecular target into R.

panel_path <- system.file("extdata/ref_list.csv", package = "gutenTAG")
panel <- readPanel(panel_path)

2.1.1 Example data

If you want to see how the package functions on test data, or do not have access to the required data yet, a good place to start is by loading in the example data from the github repository. The gutenTAG package contains example data generated by Bruker’s RapiFlex imaging system. The following section gives an overview of these files.

These files are accessible via:

path <- system.file("extdata", package = "gutenTAG")
list.files(path, recursive = TRUE)
## [1] "Example_data.ibd"        "Example_data.imzML"     
## [3] "Example_processed.Rdata" "ref_list.csv"

By default, the imaging data is read into an MSImagingExperiment object. Here, the extracted channel-specific intensities are stored in the imageData slot, accessible via the iData getter function from Cardinal.

2.2 Pre-processing

When acquiring the raw data, the resulting spectra can often be noisy and messy. To correct for this, some pre-processing steps are required. This can be done simply using the preProcess function from this package, which can be implemented as follows.

pre <- preProcess(rawFile)

This function is a wrapper for the Cardinal functions normalize, smoothSignal and reduceBaseline, using common parameters and methods for each function. For more flexibility in your pre-processing, it is recommended to directly use the Cardinal functions. Here, we use total ion current (TIC) as the normalisation method of choice, Gaussian smoothing for spectral smoothing and local minimum baseline reduction.

pre <- rawFile |>
    Cardinal::normalize(method = "tic") |>
    Cardinal::smooth(method = "gaussian") |>
    reduceBaseline(method = "locmin") |>
    process(BPPARAM = BiocParallel::bpparam())

2.3 Processing

Full processing of the pre-processed MALDI-imaging data can be performed using the peakDetection, generateMetapeaks and assignMetapeaks functions in series.

2.3.1 Peak Detection

The first step of processing is to identify peaks in the pre-processed spectra. This function is essentially a wrapper for Cardinal::peakPick, and uses the median adaptive deviation (MAD) peak detection algorithm for robust noise evaluation.

detected <- peakDetection(pre, snr = 3)

2.3.2 Metapeak generation

Once peakDetection has been performed, the detected peaks per pixel can be used to generate metapeaks. Here, metapeaks are defined as single peaks that correspond to clusters of peaks that contain information on the same molecular species. The function used to generate these metapeaks is generateMetapeaks. This step is motivated by the loss of signal through technical mass shifts as well as the presence of isotopic peaks in our spectra. The following steps are required to be construct metapeaks.

  1. Obtain the number of pixels in which each detected peak is present (counts histogram).
  2. Gaussian smoothing over the counts histogram.
  3. Watershed algorithm to cluster detected peaks
  4. Filter peaks present in less than a given threshold of pixels, which can be controlled using the threshold parameter.

It is important to note that metapeaks are not restricted to targeted peaks. Many of the metapeaks generated will be formed by signal that is from untargeted peaks.

metapeaks <- generateMetapeaks(detected, hist_smooth_factor = 1, fixed.limits = 0.5, sparsity = 1)

2.3.3 Assigning metapeaks

The final step of processing is to assign metapeaks to the reference list of masses that was loaded in using the readPanel function. This is performed using the assignMetapeaks function. The output of this function is a list of lists, containing primarily IntensityDF, the channel-specific intensities, SpatialCoords, the spatial coordinates and CorrespondenceMatrix, a data frame describing the assignment of metapeaks to reference peaks. This assignment can be controlled via the mz_threshold parameter, which has a default of 1m/z.

You may notice that some of the expected peaks in the reference list will not appear in the output. This is likely due to either a lack of signal, resulting in the metapeak not passing the filter threshold during generateMetapeaks, or the metapeak center being out of range for assignment to a reference peak during assignMetapeaks. These parameters can be adjusted to improve results.

processed <- assignMetapeaks(metapeaks, refList = panel, pre = pre, mz_threshold = 1)

2.4 Parallelisation

gutenTAG uses the BiocParallel framework for parallel processing. Both preProcess and peakDetection accept a BPPARAM argument that accepts any BiocParallelParam object. By default, both functions use BiocParallel::bpparam(), which returns the session-wide registered backend.

2.4.1 Setting a backend for the session

The recommended approach is to register a backend once at the top of your script, before calling any gutenTAG functions:

# Linux / macOS — fork-based, lowest overhead
BiocParallel::register(BiocParallel::MulticoreParam(workers = 4))

# Windows — socket-based (also works on Linux/macOS)
BiocParallel::register(BiocParallel::SnowParam(workers = 4))

# Single-core
BiocParallel::register(BiocParallel::SerialParam())

After registering, all subsequent calls to preProcess and peakDetection will use the registered backend automatically:

pre      <- preProcess(rawFile)
detected <- peakDetection(pre, snr = 3)

2.4.2 Passing a backend per call

You can also pass a backend directly to override the registered default for a single call:

pre      <- preProcess(rawFile, BPPARAM = BiocParallel::MulticoreParam(workers = 4))
detected <- peakDetection(pre, snr = 3, BPPARAM = BiocParallel::SnowParam(workers = 4))

2.4.3 Choosing the right backend

Platform Recommended backend Notes
Linux / macOS MulticoreParam(n) Fork-based, minimal overhead
Windows SnowParam(n) Socket-based, works on all systems
Any (serial) SerialParam() Default; no parallelisation

SnowParam also works on Linux/macOS and is the safest choice if you are writing code intended to run on multiple platforms.

2.4.4 Checking what backend is currently registered

BiocParallel::bpparam()

2.5 Visualisation of processed data

2.5.1 Cytomapper

This package supports visualisation of MALDI images through the cytomapper (Eling et al. 2020) framework developed by Eling et al (reference). In order to use the plotPixels function, the data must first be coerced to a CytoImageList object using the asCytoImageList function from this package.

library(cytomapper)
cil <- asCytoImageList(processed)

Now the channels can be visualised using the plotPixels function. Single channel and multichannel imaging is supported using this format.

cytomapper::plotPixels(image = cil, 
                       colour_by = "HLA.ABC", 
                       scale_bar = list(length = 5, lwidth = 0.5, margin = c(1,1)),
                       interpolate = FALSE, 
                       image_title = list(margin = c(1, 1)))

plotPixels(cil, colour_by = c("HLA.ABC", "CD98", "beta.actin"),
           legend = list(colour_by.title.cex = 0.7), 
           scale_bar = list(length = 5, lwidth = 0.5, margin = c(1,1)),
           image_title = list(margin = c(1, 1)),
           interpolate = FALSE)

2.5.2 Cytoviewer

Additionally, the images can be visualised using the cytoviewer R Shiny app developed by Meyer et al 2023 (Meyer, Eling, and Bodenmiller 2023). cytoviewer allows interactive visualistion of multi-channel images, supporting visualisation of up to 6 channels. It should be noted that the cell-level functionality of cytomapper and cytoviewer are not applicable for our data, since single-cell resolution is not yet possible. Interactive visualisation of our images can’t be performed through a vignette, but the call is very simple: cytoviewer(cil).

2.6 Interoperability

gutenTAG provides functions for easy conversion to some of the popular frameworks in the spatial omics ecosystem. These functions include:

  • asMSImagingExperiment() for MSImagingExperiment class from Cardinal for those who are familiar with typical MSI analysis and prefer to return to the Cardinal framework

  • asSpatialExperiment() for SpatialExperiment class, a spatial extension of the SingleCellExperiment class.

  • asCytoImageList() for CytoImageList class for image visualisation using the cytomapper and cytoviewer packages.

  • asAnnData() for AnnData class for access to the scverse ecosystem.

These functions are written to be called on the output of the assignMetapeaks() function.

2.7 Quality Control

gutenTAG provides many quality control functions. These can be useful for assessing data quality, filtering low performance peaks and assessing the overall quality of gutenTAG’s segmentation. A full QC report can be found in inst/qc_report.Rmd.

3 Contributions

John Abbey wrote the package. Pierre Bost conceptualised the metapeak generation step for data processing and was instrumental in developing the initial architecture of the pipeline. Leonardo Schwarz contributed enormously to the package codebase. Thank you to Nils Eling and Lasse Meyer for assisting in the development of the package, as well as helping with integrating the package with cytomapper and cytoviewer. We also thank Christian Panse and Witold Wolski of the Functional Genomics Center Zurich (FGCZ) for all of their support.

Session info

## R version 4.6.0 RC (2026-04-17 r89917)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] cytomapper_1.23.0           SingleCellExperiment_1.33.2
##  [3] SummarizedExperiment_1.41.1 Biobase_2.71.0             
##  [5] GenomicRanges_1.63.2        Seqinfo_1.1.0              
##  [7] IRanges_2.45.0              MatrixGenerics_1.23.0      
##  [9] matrixStats_1.5.0           EBImage_4.53.0             
## [11] gutenTAG_0.99.14            Cardinal_3.13.0            
## [13] S4Vectors_0.49.2            ProtGenerics_1.43.0        
## [15] BiocGenerics_0.57.1         generics_0.1.4             
## [17] BiocParallel_1.45.0         BiocStyle_2.39.0           
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-9             gridExtra_2.3            rlang_1.2.0             
##  [4] magrittr_2.0.5           svgPanZoom_0.3.4         shinydashboard_0.7.3    
##  [7] otel_0.2.0               compiler_4.6.0           flexmix_2.3-20          
## [10] systemfonts_1.3.2        png_0.1-9                fftwtools_0.9-11        
## [13] vctrs_0.7.3              pkgconfig_2.0.3          SpatialExperiment_1.21.0
## [16] fastmap_1.2.0            magick_2.9.1             XVector_0.51.0          
## [19] promises_1.5.0           rmarkdown_2.31           ggbeeswarm_0.7.3        
## [22] modeltools_0.2-24        xfun_0.57                cachem_1.1.0            
## [25] splus2R_1.3-5            jsonlite_2.0.0           later_1.4.8             
## [28] rhdf5filters_1.23.3      DelayedArray_0.37.1      Rhdf5lib_1.99.6         
## [31] jpeg_0.1-11              tiff_0.1-12              terra_1.9-11            
## [34] irlba_2.3.7              parallel_4.6.0           R6_2.6.1                
## [37] bslib_0.10.0             RColorBrewer_1.1-3       jquerylib_0.1.4         
## [40] Rcpp_1.1.1-1             bookdown_0.46            knitr_1.51              
## [43] nnet_7.3-20              httpuv_1.6.17            Matrix_1.7-5            
## [46] nnls_1.6                 tidyselect_1.2.1         viridis_0.6.5           
## [49] dichromat_2.0-0.1        abind_1.4-8              yaml_2.3.12             
## [52] codetools_0.2-20         lattice_0.22-9           tibble_3.3.1            
## [55] withr_3.0.2              shiny_1.13.0             S7_0.2.2                
## [58] evaluate_1.0.5           ontologyIndex_2.12       mclust_6.1.2            
## [61] pillar_1.11.1            BiocManager_1.30.27      matter_2.13.5           
## [64] sp_2.2-1                 RCurl_1.98-1.18          ggplot2_4.0.3           
## [67] scales_1.4.0             xtable_1.8-8             glue_1.8.1              
## [70] tools_4.6.0              locfit_1.5-9.12          rhdf5_2.55.16           
## [73] grid_4.6.0               nlme_3.1-169             raster_3.6-32           
## [76] CardinalIO_1.9.0         beeswarm_0.4.0           HDF5Array_1.39.1        
## [79] vipor_0.4.7              cli_3.6.6                textshaping_1.0.5       
## [82] viridisLite_0.4.3        S4Arrays_1.11.1          svglite_2.2.2           
## [85] dplyr_1.2.1              gtable_0.3.6             sass_0.4.10             
## [88] digest_0.6.39            SparseArray_1.11.13      rjson_0.2.23            
## [91] htmlwidgets_1.6.4        farver_2.1.2             htmltools_0.5.9         
## [94] lifecycle_1.0.5          h5mread_1.3.3            mime_0.13

References

Bemis, Kyle D., April Harry, Livia S. Eberlin, Christina Ferreira, Stephanie M. van de Ven, Parag Mallick, Mark Stolowitz, and Olga Vitek. 2015. “Cardinal: An R Package for Statistical Analysis of Mass Spectrometry-Based Imaging Experiments.” Bioinformatics, 1883–6. https://doi.org/doi:10.1093/bioinformatics/btv146.

Eling, Nils, Nicolas Damond, Tobias Hoch, and Bernd Bodenmiller. 2020. “Cytomapper: An R/Bioconductor Package for Visualization of Highly Multiplexed Imaging Data.” Bioinformatics 36 (24): 5706–8. https://doi.org/10.1093/bioinformatics/btaa1061.

Gargey Yagnik, Kenneth J. Rothschild, Ziying Liu, and Mark J. Lim. 2021. “Highly Multiplexed Immunohistochemical Maldi-Ms Imaging of Biomarkers in Tissues.” Journal of the American Society for Mass Spectrometry 32: 977–88. https://doi.org/doi: 10.1021/jasms.0c00473.

Giesen, Charlotte, Hao A. O. Wang, Denis Schapiro, Nevena Zivanovic, Andrea Jacobs, Bodo Hattendorf, Peter J. Schüffler, et al. 2014. “Highly Multiplexed Imaging of Tumor Tissues with Subcellular Resolution by Mass Cytometry.” Nature Methods 11 (4): 417–22.

Lin, Jia-Ren, Benjamin Izar, Shu Wang, Clarence Yapp, Shaolin Mei, Parin M. Shah, Sandro Santagata, and Peter K. Sorger. 2018. “Highly Multiplexed Immunofluorescence Imaging of Human Tissues and Tumors Using T-Cycif and Conventional Optical Microscopes.” eLife 7: 1–46.

Meyer, Lasse, Nils Eling, and Bernd Bodenmiller. 2023. “Cytoviewer: An R/Bioconductor Package for Interactive Visualization and Exploration of Highly Multiplexed Imaging Data.” https://doi.org/10.1101/2023.05.24.542115.

Moffitt, Jeffrey R., Emma Lundberg, and Holger Heyn. 2022. “The Emerging Landscape of Spatial Profiling Technologies.” Nature Reviews Genetics 23: 741–59.

Righelli, Dario, Lukas M. Weber, Helena L. Crowell, Brenda Pardo, Leonardo Collado-Torres, Shila Ghazanfar, Aaron T. L. Lun, Stephanie C. Hicks, and Davide Risso. 2022. “SpatialExperiment: Infrastructure for Spatially-Resolved Transcriptomics Data in R Using Bioconductor.” Bioinformatics 38 (11): –3. https://doi.org/10.1093/bioinformatics/btac299.