The VectraPolarisData ExperimentHub package provides two large multiplex immunofluorescence datasets collected on the Akoya Biosciences Vectra 3 and Vectra Polaris platforms. Image preprocessing (cell segmentation and phenotyping) was performed using Inform software. Data are formatted into objects of class SpatialExperiment.
VectraPolarisData 1.12.0
To retrieve a dataset, we can use its corresponding named function <id>(), where <id> should correspond to a valid dataset identifier (see ?VectraPolarisData). Below, both the lung and ovarian cancer datasets are loaded this way.
library(VectraPolarisData)
spe_lung <- HumanLungCancerV3()
spe_ovarian <- HumanOvarianCancerVP()
Alternatively, data can be loaded directly from Bioconductor’s ExperimentHub as follows. First, we initialize a hub instance and store the complete list of records in a variable eh. Using query(), we then identify any records made available by the VectraPolarisData package, as well as their accession IDs (EH7311 for the lung cancer data). Finally, we can load the data into R via eh[[id]], where id is the identifier of the data entry we’d like to load. E.g.:
library(ExperimentHub)
eh <- ExperimentHub()        # initialize hub instance
q <- query(eh, "VectraPolarisData") # retrieve 'VectraPolarisData' records
id <- q$ah_id[1]             # specify dataset ID to load
spe <- eh[[id]]              # load specified dataset
Both the HumanLungCancerV3() and HumanOvarianCancerVP() datasets are stored as SpatialExperiment objects. This allows users of our data to work with methods built for the SingleCellExperiment, SummarizedExperiment, and SpatialExperiment classes in Bioconductor. See this ebook for more details on SpatialExperiment. To obtain cell-level tabular data that can be stored in this format, raw multiplex .tiff images were preprocessed, segmented, and cell phenotyped using Inform software from Akoya Biosciences.
The SpatialExperiment class was originally built for spatial transcriptomics data and follows the structure depicted in the schematic below (Righelli et al. 2021):
To adapt this class structure for multiplex imaging data we use slots in the following way:
assays slot: intensities, nucleus_intensities, membrane_intensities
sample_id slot: contains the image identifier. For the HumanOvarianCancerVP data this also identifies the subject, because there is one image per subject
colData slot: other cell-level characteristics of the marker intensities, cell phenotypes, and cell shape characteristics
spatialCoordsNames slot: the x- and y-coordinates describing the location of the center point of each cell in the image
metadata slot: a data frame of subject-level patient clinical characteristics
The following code shows how to transform the SpatialExperiment class object into a data.frame class object, if that is preferred for analysis. The example below uses the HumanOvarianCancerVP dataset.
library(dplyr)
## Assays slots
assays_slot <- assays(spe_ovarian)
intensities_df <- assays_slot$intensities
rownames(intensities_df) <- paste0("total_", rownames(intensities_df))
nucleus_intensities_df <- assays_slot$nucleus_intensities
rownames(nucleus_intensities_df) <- paste0("nucleus_", rownames(nucleus_intensities_df))
membrane_intensities_df <- assays_slot$membrane_intensities
rownames(membrane_intensities_df) <- paste0("membrane_", rownames(membrane_intensities_df))
# colData and spatialCoords
colData_df <- colData(spe_ovarian)
spatialCoords_df <- spatialCoords(spe_ovarian)
# clinical data
patient_level_df <- metadata(spe_ovarian)$clinical_data
cell_level_df <- as.data.frame(cbind(colData_df, 
                                     spatialCoords_df,
                                     t(intensities_df),
                                     t(nucleus_intensities_df),
                                     t(membrane_intensities_df))
                               )
ovarian_df <- full_join(patient_level_df, cell_level_df, by = "sample_id")
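The transposition step above is easy to overlook: assay matrices store markers in rows and cells in columns, while colData and the spatial coordinates store one row per cell. The following is a minimal base R sketch of the same reshaping on toy data. All object names and values here are hypothetical, plain matrices and data frames stand in for the SpatialExperiment slots, and merge(..., all = TRUE) is used as a base R analogue of full_join().

```r
# Toy assay (hypothetical values): markers in rows, cells in columns.
intensities <- matrix(
  c(1.2, 0.4,
    0.3, 2.1,
    0.8, 0.9),
  nrow = 2,
  dimnames = list(c("cd3", "cd8"), c("cell1", "cell2", "cell3"))
)
rownames(intensities) <- paste0("total_", rownames(intensities))

# Toy cell-level annotations: one row per cell, as in colData.
cell_info <- data.frame(
  sample_id = c("img1", "img1", "img2"),
  cell_id   = c(1, 2, 1),
  row.names = colnames(intensities)
)

# Transposing puts cells in rows so the assay lines up with cell_info.
cell_level <- cbind(cell_info, t(intensities))

# Toy subject-level clinical data: one row per image/subject.
clinical <- data.frame(
  sample_id        = c("img1", "img2", "img3"),
  age_at_diagnosis = c(61, 58, 70)
)

# all = TRUE keeps unmatched rows from both tables, like a full join.
toy_df <- merge(clinical, cell_level, by = "sample_id", all = TRUE)
print(toy_df)
```

In this sketch, img3 has clinical data but no cells, so it appears as a single row with missing cell-level values. Rows like this are worth checking for before computing per-cell summaries on a joined table.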
The objects provided in this package are rich data sources we encourage others to use in their own analyses. If you do include them in your peer-reviewed work, we ask that you cite our package and the original studies.
To cite the VectraPolarisData package, use:
@Manual{VectraPolarisData,
    title = {VectraPolarisData: Vectra Polaris and Vectra 3 multiplex single-cell imaging data},
    author = {Wrobel, J and Ghosh, T},
    year = {2022},
    note = {Bioconductor R package version 1.0},
  }
To cite the HumanLungCancerV3() data in BibTeX format, use:
@article{johnson2021cancer,
  title={Cancer cell-specific MHCII expression as a determinant of the immune infiltrate organization and function in the non-small cell lung cancer tumor microenvironment.},
  author={Johnson, AM and Boland, JM and Wrobel, J and Klezcko, EK and Weiser-Evans, M and Hopp, K and Heasley, L and Clambey, ET and Jordan, K and Nemenoff, RA and others},
  journal={Journal of Thoracic Oncology: Official Publication of the International Association for the Study of Lung Cancer},
  year={2021}
}
To cite the HumanOvarianCancerVP() data, use:
@article{steinhart2021spatial,
  title={The spatial context of tumor-infiltrating immune cells associates with improved ovarian cancer survival},
  author={Steinhart, Benjamin and Jordan, Kimberly R and Bapat, Jaidev and Post, Miriam D and Brubaker, Lindsay W and Bitler, Benjamin G and Wrobel, Julia},
  journal={Molecular Cancer Research},
  volume={19},
  number={12},
  pages={1973--1979},
  year={2021},
  publisher={AACR}
}
Detailed tables representing what is provided in each dataset are listed here.
In the table below, note the following shorthand:
[marker] represents one of: cd3, cd8, cd14, cd19, cd68, ck, dapi, hladr
[cell region] represents one of: entire_cell, membrane, nucleus
Table 1: data dictionary for HumanLungCancerV3
| Variable | Slot | Description | Variable coding |
|---|---|---|---|
| [marker] | assays: intensities | mean total cell intensity for [marker] | |
| [marker] | assays: nucleus_intensities | mean nucleus intensity for [marker] | |
| [marker] | assays: membrane_intensities | mean membrane intensity for [marker] | |
| sample_id | | image identifier, also subject id for the ovarian data | |
| cell_id | colData | cell identifier | |
| slide_id | colData | slide identifier, also the patient id for the lung data | |
| tissue category | colData | type of tissue (indicates a region of the image) | Stroma or Tumor |
| [cell region]_[marker]_min | colData | min [cell region] intensity for [marker] | |
| [cell region]_[marker]_max | colData | max [cell region] intensity for [marker] | |
| [cell region]_[marker]_std_dev | colData | [cell region] std dev of intensity for [marker] | |
| [cell region]_[marker]_total | colData | total [cell region] intensity for [marker] | |
| [cell region]_area_square_microns | colData | [cell region] area in square microns | |
| [cell region]_compactness | colData | [cell region] compactness | |
| [cell region]_minor_axis | colData | [cell region] length of minor axis | |
| [cell region]_major_axis | colData | [cell region] length of major axis | |
| [cell region]_axis_ratio | colData | [cell region] ratio of major and minor axis | |
| phenotype_[marker] | colData | cell phenotype label as determined by Inform software | |
| cell_x_position | spatialCoordsNames | cell x coordinate | |
| cell_y_position | spatialCoordsNames | cell y coordinate | |
| gender | metadata | gender | “M”, “F” |
| mhcII_status | metadata | MHCII status, from Johnson et al. 2021 | “low”, “high” |
| age_at_diagnosis | metadata | age at diagnosis | |
| stage_at_diagnosis | metadata | stage of the cancer when image was collected | |
| stage_numeric | metadata | numeric version of stage variable | |
| pack_years | metadata | pack-years of cigarette smoking | |
| survival_days | metadata | time in days from date of diagnosis to date of death or censoring event | |
| survival_status | metadata | did the participant pass away? | 0 = no, 1 = yes |
| cause_of_death | metadata | cause of death | |
| recurrence_or_lung_ca_death | metadata | did the participant have a recurrence or death event? | 0 = no, 1 = yes |
| time_to_recurrence_days | metadata | time in days from date of diagnosis to first recurrent event | |
| adjuvant_therapy | metadata | whether or not the participant received adjuvant therapy | “No”, “Yes” |
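The [cell region] and [marker] shorthand in Table 1 stands for a family of column names. As a rough sketch, the snippet below expands that shorthand into candidate names using base R. It assumes the real column names follow the pattern literally (for example entire_cell_cd3_min), which should be checked against the loaded object; the variable names markers, regions, stats, and intensity_cols are introduced here for illustration only.

```r
# Shorthand values as listed for the HumanLungCancerV3 data dictionary.
markers <- c("cd3", "cd8", "cd14", "cd19", "cd68", "ck", "dapi", "hladr")
regions <- c("entire_cell", "membrane", "nucleus")
stats   <- c("min", "max", "std_dev", "total")

# All region x marker x statistic combinations (first argument varies fastest).
grid <- expand.grid(stat = stats, marker = markers, region = regions,
                    stringsAsFactors = FALSE)

# Candidate names of the form [cell region]_[marker]_[statistic].
intensity_cols <- paste(grid$region, grid$marker, grid$stat, sep = "_")

# Candidate names of the form phenotype_[marker].
phenotype_cols <- paste0("phenotype_", markers)

length(intensity_cols)
head(intensity_cols, 4)
```

With the lung data loaded, comparing these candidates against the actual columns, for instance with intersect(intensity_cols, colnames(colData(spe_lung))), would show which of the assumed names are really present.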
In the table below, note the following shorthand:
[marker] represents one of: cd3, cd8, cd19, cd68, ck, dapi, ier3, ki67, pstat3
[cell region] represents one of: cytoplasm, membrane, nucleus
Table 2: data dictionary for HumanOvarianCancerVP
| Variable | Slot | Description | Variable coding |
|---|---|---|---|
| [marker] | assays: intensities | mean total cell intensity for [marker] | |
| [marker] | assays: nucleus_intensities | mean nucleus intensity for [marker] | |
| [marker] | assays: membrane_intensities | mean membrane intensity for [marker] | |
| sample_id | | image identifier, also subject id for the ovarian data | |
| cell_id | colData | cell identifier | |
| slide_id | colData | slide identifier | |
| tissue category | colData | type of tissue (indicates a region of the image) | Stroma or Tumor |
| [cell region]_[marker]_min | colData | min [cell region] intensity for [marker] | |
| [cell region]_[marker]_max | colData | max [cell region] intensity for [marker] | |
| [cell region]_[marker]_std_dev | colData | [cell region] std dev of intensity for [marker] | |
| [cell region]_[marker]_total | colData | total [cell region] intensity for [marker] | |
| [cell region]_area_square_microns | colData | [cell region] area in square microns | |
| [cell region]_compactness | colData | [cell region] compactness | |
| [cell region]_minor_axis | colData | [cell region] length of minor axis | |
| [cell region]_major_axis | colData | [cell region] length of major axis | |
| [cell region]_axis_ratio | colData | [cell region] ratio of major and minor axis | |
| cell_x_position | spatialCoordsNames | cell x coordinate | |
| cell_y_position | spatialCoordsNames | cell y coordinate | |
| diagnosis | metadata | | |
| primary | metadata | primary tumor from initial diagnosis? | 0 = no, 1 = yes |
| recurrent | metadata | tumor from a recurrent event (not initial diagnosis tumor)? | 0 = no, 1 = yes |
| treatment_effect | metadata | was tumor treated with chemo prior to imaging? | 0 = no, 1 = yes |
| stage | metadata | stage of the cancer when image was collected | I, II, III, IV |
| grade | metadata | grade of cancer severity (nearly all 3) | |
| survival_time | metadata | time in months from date of diagnosis to date of death or censoring event | |
| death | metadata | did the participant pass away? | 0 = no, 1 = yes |
| BRCA_mutation | metadata | does the participant have a BRCA mutation? | 0 = no, 1 = yes |
| age_at_diagnosis | metadata | age at diagnosis | |
| time_to_recurrence | metadata | time in months from date of diagnosis to first recurrent event | |
| parpi_inhibitor | metadata | whether or not the participant received a PARP inhibitor (PARPi) | N = no, Y = yes |
| debulking | metadata | subjective rating of how the tumor removal process went | optimal, suboptimal, interval |
Note: the debulking variable is coded as optimal if the surgeon believes the tumor area was reduced to 1 cm or below; suboptimal if the surgeon was unable to remove a significant amount of tumor for various reasons; and interval if tumor removal came after three cycles of chemo.
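Before modeling, it can help to turn the codings in Table 2 into labelled factors. The sketch below does this on a small, made-up data frame rather than the real clinical data. The rows are hypothetical, and it assumes that stage is stored as the Roman numerals I to IV and that death is stored as 0/1, as the table suggests.

```r
# Hypothetical clinical rows using the codings listed in Table 2.
clinical <- data.frame(
  sample_id = c("img1", "img2", "img3"),
  death     = c(0, 1, 0),
  stage     = c("III", "I", "IV"),
  debulking = c("optimal", "interval", "suboptimal"),
  stringsAsFactors = FALSE
)

# 0/1 indicators become labelled factors.
clinical$death <- factor(clinical$death, levels = c(0, 1),
                         labels = c("no", "yes"))

# Stage has a natural order, so an ordered factor allows comparisons.
clinical$stage <- factor(clinical$stage,
                         levels = c("I", "II", "III", "IV"),
                         ordered = TRUE)

# Debulking categories are unordered labels.
clinical$debulking <- factor(clinical$debulking,
                             levels = c("optimal", "suboptimal", "interval"))

table(clinical$death)

# Samples at stage III or later.
late_stage <- clinical$sample_id[clinical$stage >= "III"]
late_stage
```

Declaring the levels explicitly keeps categories that happen to be absent from a subset (such as stage II here) visible in later tabulations.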
sessionInfo()
#> R version 4.5.0 RC (2025-04-04 r88126)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] dplyr_1.1.4                 VectraPolarisData_1.12.0   
#>  [3] SpatialExperiment_1.18.0    SingleCellExperiment_1.30.0
#>  [5] SummarizedExperiment_1.38.0 Biobase_2.68.0             
#>  [7] GenomicRanges_1.60.0        GenomeInfoDb_1.44.0        
#>  [9] IRanges_2.42.0              S4Vectors_0.46.0           
#> [11] MatrixGenerics_1.20.0       matrixStats_1.5.0          
#> [13] ExperimentHub_2.16.0        AnnotationHub_3.16.0       
#> [15] BiocFileCache_2.16.0        dbplyr_2.5.0               
#> [17] BiocGenerics_0.54.0         generics_0.1.3             
#> [19] BiocStyle_2.36.0           
#> 
#> loaded via a namespace (and not attached):
#>  [1] KEGGREST_1.48.0         rjson_0.2.23            xfun_0.52              
#>  [4] bslib_0.9.0             lattice_0.22-7          vctrs_0.6.5            
#>  [7] tools_4.5.0             curl_6.2.2              tibble_3.2.1           
#> [10] AnnotationDbi_1.70.0    RSQLite_2.3.9           blob_1.2.4             
#> [13] pkgconfig_2.0.3         Matrix_1.7-3            lifecycle_1.0.4        
#> [16] GenomeInfoDbData_1.2.14 compiler_4.5.0          Biostrings_2.76.0      
#> [19] htmltools_0.5.8.1       sass_0.4.10             yaml_2.3.10            
#> [22] pillar_1.10.2           crayon_1.5.3            jquerylib_0.1.4        
#> [25] DelayedArray_0.34.0     cachem_1.1.0            magick_2.8.6           
#> [28] abind_1.4-8             mime_0.13               tidyselect_1.2.1       
#> [31] digest_0.6.37           purrr_1.0.4             bookdown_0.43          
#> [34] BiocVersion_3.21.1      grid_4.5.0              fastmap_1.2.0          
#> [37] SparseArray_1.8.0       cli_3.6.4               magrittr_2.0.3         
#> [40] S4Arrays_1.8.0          withr_3.0.2             filelock_1.0.3         
#> [43] UCSC.utils_1.4.0        rappdirs_0.3.3          bit64_4.6.0-1          
#> [46] rmarkdown_2.29          XVector_0.48.0          httr_1.4.7             
#> [49] bit_4.6.0               png_0.1-8               memoise_2.0.1          
#> [52] evaluate_1.0.3          knitr_1.50              rlang_1.1.6            
#> [55] Rcpp_1.0.14             glue_1.8.0              DBI_1.2.3              
#> [58] BiocManager_1.30.25     jsonlite_2.0.0          R6_2.6.1