1 Introduction

SPICEY is an R package for quantifying cell-type specificity from single-cell omics data. It supports inputs from common analysis frameworks such as Seurat (for single-cell RNA-seq) and Signac (for single-cell ATAC-seq), and provides a unified interface for computing cell-type-specificity metrics at both the regulatory and transcriptional levels.

1.1 Preamble

SPICEY is designed to work with the results of differential analyses performed on single-cell ATAC-seq (to assess chromatin accessibility) or single-cell RNA-seq (to measure gene expression). These analyses typically compare features—either regulatory elements (e.g., peaks) or genes—across cell types to identify cell-type-specific activity or expression.

Rather than performing differential testing itself, SPICEY leverages pre-computed statistics from such analyses, specifically the log2 fold-change and associated adjusted p-values, which reflect both the magnitude and significance of cell-type enrichment. Using these inputs, SPICEY computes composite specificity scores that capture both the intensity and selectivity of differential signal across cell types.

In this tutorial, we illustrate the application of SPICEY on single-cell datasets derived from non-diabetic human pancreatic islets obtained through the Human Pancreas Analysis Program (HPAP) @ref. We analyzed paired single-nucleus ATAC-seq and RNA-seq data from three donors with similar total cell counts and comparable distributions of major pancreatic islet cell types. To associate distal regulatory elements with their putative target genes, we used co-accessibility links inferred via the Cicero package.

Differential expression and accessibility analyses were conducted using Seurat (FindMarkers()) function on both RNA and ATAC assays. All genes and regions were considered (logfc.threshold = 0), and the Wilcoxon Rank Sum test was used (test.use = "wilcox"), following Seurat default parameters.

1.2 Cell-Type Specificity Indices: RETSI and GETSI

SPICEY defines two complementary metrics to quantify cell-type specificity:

RETSI (Regulatory Element Tissue Specificity Index): Evaluates the specificity of regulatory elements (e.g., ATAC-seq peaks) across different cell types.
GETSI (Gene Expression Tissue Specificity Index): Evaluates the specificity of gene expression patterns across cell types.

Both RETSI and GETSI are calculated using a weighted specificity score for each feature (regulatory region or gene), reflecting the level of enrichment in each cell type. These specificity indices are computed as follows:

\[ TSI_{i,x} = \frac{FC_{i,x}}{\max(FC)_x} \cdot w_{i} \]

\[ w_{i} = \text{rescale}\left( -\log_{10}(padj_{i}),\ [0, 1] \right) \]

where:

\(i\) is a specific cell type
\(x \in \{r, g\}\), with \(r\) for regulatory region (RETSI) and \(g\) for gene (GETSI)
\(\text{FC}_{i,x}\) is the log fold-change of feature \(x\) in cell type \(i\)
\(w_{i}\) is the weight term, defined as the rescaled value of \(-\log_{10}(\text{adjusted } p\text{-value})\) between 0 and 1

SPICEY offers two strategies for linking regulatory elements to their putative target genes:

Nearest-gene assignment: based on proximity to the closest transcription start site (TSS).
Co-accessibility-based linking: leveraging correlations in chromatin accessibility (e.g., via Cicero).

These linking methods enable the integration of regulatory specificity (RETSI) and gene expression specificity (GETSI), facilitating the construction of detailed cell-type-specific regulatory networks from single-cell data.

Note: This vignette illustrates the use of SPICEY with preprocessed GRanges inputs, which must be prepared externally as they are not provided within the package.

1.3 Entropy-based Specificity Indices

In addition to fold change–based metrics, SPICEY incorporates an entropy-based approach to quantify the distributional concentration or distributional skewness of feature activity across cell types using Shannon entropy.

For each feature \(x\) (e.g., gene or regulatory region), the activity or expression scores across \(N\) cell types are normalized into a probability distribution. This allows measuring how specific or broadly distributed a feature’s activity is across cell types. Lower entropy values indicate higher cell-type specificity, while higher entropy reflects more ubiquitous expression or accessibility. These entropy-based indices are computed as follows:

\[ H(x) = - \sum_{i=1}^{N} p_{i,x} \log_2(p_{i,x}) \]

\[ H_{\text{norm}}(x) = 1 - e^{-H(x)} \] where:

\[ p_{i,x} = \frac{s_{i,x}}{\sum_{j=1}^{N} s_{j,x}} \]

being:

\(s_{i,x}\) is the specificity score (e.g., RETSI or GETSI) of feature \(x\) in cell type \(i\)
\(p_{i,x}\) is the normalized score, interpreted as the probability of feature \(x\)’s activity in cell type \(i\)
\(H(x)\) is the Shannon entropy, quantifying signal dispersion across cell types
\(H_{\text{norm}}(x)\) is the normalized entropy, bounded between 0 and 1:
- Values close to 0 indicate strong specificity
- Values close to 1 indicate broad distribution

This entropy framework provides a robust and interpretable measure of cell-type specificity that is complementary to fold change–based approaches.

2 Installation

Install the latest release of SPICEY from Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SPICEY")

3 Input Requirements

library(SPICEY)

SPICEY requires preprocessed input files generated from differential analyses of single-cell omics data, specifically chromatin accessibility (ATAC-seq) and/or gene expression (RNA-seq). These inputs are not included in the package and must be prepared externally. Below is a detailed overview of the required input formats and metadata.

3.1 Single-cell ATAC-seq Data

Accepted Formats:
- A named list of GRanges objects or data.frames (convertible to GRanges), where each element contains differential accessibility results for a specific cell type.
- A GRangesList with each element representing a distinct cell type.
All inputs must be named with cell type identifiers.
Required Columns (per element):
- region_id — Unique peak identifier matching the format in your ATAC data (e.g., chr1-5000-5800 or chr1:5000-5800).
- avg_log2FC — Average log2 fold-change of accessibility in the given cell type.
- p_val_adj — Adjusted p-value from the differential test.
- cell_type — Cell type or cluster label for each measurement.

Each peak should have multiple entries, one per cell type, allowing SPICEY to assess cell-type-specific accessibility patterns.

data("atac")

3.2 Single-cell RNA-seq Data

Accepted Formats:
- A named list of GRanges objects or data.frames (convertible to GRanges), with each element containing differential expression results for a specific cell type.
- A GRangesList, where each element corresponds to a distinct cell type.
The list or GRangesList must be named with cell type identifiers.
Required Columns (per element):
- gene_id — Gene identifier column, consistent across all entries, using official gene symbols (e.g., GAPDH).
- avg_log2FC — Average log2 fold-change in expression for the gene within the specific cell type.
- p_val_adj — Adjusted p-value from the differential test.
- cell_type — Cell type or cluster label associated with the measurement.

Each gene should have multiple entries—one per cell type—enabling SPICEY to evaluate expression specificity across cell populations.

data("rna")

3.3 Co-accessibility Links (Optional for Region-to-Gene Linking)

Purpose:
Link regulatory elements to putative target genes based on co-accessibility, typically derived from tools like Cicero. This input is required only when using the annotation parameter for RETSI/GETSI integration.
Format:
A data.frame containing co-accessibility pairs and their scores.
Required Columns:
- Peaks1 — Genomic coordinate or ID of the first peak in the pair.
- Peaks2 — Genomic coordinate or ID of the second peak.
- coaccess — Co-accessibility score or correlation value between the paired peaks.

Note: Peak identifiers in Peaks1 and Peaks2 must exactly match those in the ATAC input, including coordinate formatting (e.g., chr1:5000-6000 vs. chr1-5000-6000). Inconsistent formats will disrupt accurate annotation.

data("cicero_links")
head(cicero_links)
#>                      Peak1                    Peak2  coaccess
#> 1 chr1-110209621-110211746 chr1-110211947-110212475 0.6000726
#> 2 chr1-110211947-110212475 chr1-110209621-110211746 0.6000726
#> 3 chr1-119541413-119543141 chr1-119552715-119554580 0.6251863
#> 4 chr1-119552715-119554580 chr1-119541413-119543141 0.6251863
#> 5 chr1-167654711-167655798 chr1-167683611-167684550 0.8876609
#> 6 chr1-167661501-167662210 chr1-167683611-167684550 0.6807298

4 Quick example

SPICEY offers a comprehensive main function, SPICEY(), which streamlines the entire workflow into a single, reproducible process. It also includes optional helper functions for region-to-gene annotation using different strategies: annotate_with_coaccessibility() and annotate_with_nearest().

When supplied with differential chromatin accessibility (ATAC) and/or gene expression (RNA) data from single-cell experiments, along with a chosen region-to-gene mapping method (“nearest” or “coaccessibility”), the SPICEY() function performs the following:

Computes tissue specificity scores from single-cell data:
- RETSI (Regulatory Element Tissue Specificity Index) for ATAC data
- GETSI (Gene Expression Tissue Specificity Index) for RNA data
(Optional) Annotates regulatory elements to target genes using either:
- Nearest gene transcription start site (TSS) (e.g., +/- 2000bp around the TSS)
- Co-accessibility links (e.g., derived from Cicero)
Builds tissue-specific regulatory networks by integrating RETSI and GETSI scores via the provided region-to-gene annotation.

The output is a structured list containing all intermediate and final results, facilitating downstream analysis and reuse.

# Load necessary packages
library(dplyr)
library(GenomicRanges)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)
library(SPICEY)

data(atac)
data(rna)
data(cicero_links)

# Annotate peaks to genes with coaccessibility
peaks <- unique(unlist(atac)[, c("region_id")])
annotation_coacc <- annotate_with_coaccessibility(
  peaks = peaks,
  txdb = TxDb.Hsapiens.UCSC.hg38.knownGene,
  links_df = cicero_links,
  annot_dbi = org.Hs.eg.db,
  protein_coding_only = TRUE,
  verbose = TRUE,
  add_tss_annotation = FALSE,
  upstream = 2000,
  downstream = 2000
)

# Calculate SPICEY measures and link them with coaccessibility
spicey_coacc <- SPICEY(
  rna = rna,
  atac = atac,
  annotation = annotation_coacc
)

# Plot results
spicey_heatmap(spicey_coacc$linked,
  spicey_measure = "SPICEY",
  combined_score = TRUE
)

5 Example: Step-by-Step SPICEY Workflow

This section outlines how to execute SPICEY using preprocessed input files, as detailed previously.

5.1 Computing RETSI

To calculate the Regulatory Element Tissue Specificity Index (RETSI), the following inputs are required:

atac: A named list of GRanges objects or data.frames (convertible to GRanges), each representing differential accessibility results for a distinct cell type.

SPICEY returns a data.frame combining the input differential accessibility data with RETSI scores for each regulatory region.

Note: Confirm that your inputs comply with the format outlined in the Input Requirements section prior to execution.

retsi <- SPICEY(atac = atac)
head(retsi)
#>   cell_type     p_val_adj         p_val avg_log2FC                 region_id
#> 1    Acinar 3.353304e-302  0.000000e+00   6.025791  chr7-142748901-142749963
#> 2    Acinar 3.353304e-302  0.000000e+00   6.160286    chr2-79119834-79120951
#> 3    Acinar 3.353304e-302  0.000000e+00   6.299025 chr10-116587488-116588562
#> 4    Acinar 3.353304e-302  0.000000e+00   6.849885 chr10-116553223-116553959
#> 5    Acinar 4.272632e-232 9.160344e-238   6.694460 chr10-116589603-116590250
#> 6    Acinar 4.123107e-214 8.839769e-220   5.184217   chr16-75217280-75217880
#>       RETSI RETSI_entropy
#> 1 1.0000000  0.0524689042
#> 2 1.0000000  0.0020261572
#> 3 1.0000000  0.0005612349
#> 4 1.0000000  0.0002746290
#> 5 0.7674589  0.0001781440
#> 6 0.7078037  0.0013891926

5.2 Computing GETSI

To calculate the Gene Expression Tissue Specificity Index (GETSI), the following inputs are required:

rna: A named list of GRanges objects or data.frames (convertible to GRanges), each containing differential gene expression results for a specific cell type.

SPICEY outputs a data.frame that integrates the input differential expression data with computed GETSI scores for each gene.

Note: Verify that your inputs adhere to the format described in the Input Requirements section before execution.

getsi <- SPICEY(rna = rna)
head(getsi)
#>   gene_id         p_val avg_log2FC pct.1 pct.2     p_val_adj cell_type
#> 1   REG1A  0.000000e+00   6.446878 0.392 0.012 1.053548e-303    Acinar
#> 2   PRSS1 3.793351e-218   5.111544 0.608 0.067 7.442554e-214    Acinar
#> 3   PNLIP 5.687535e-208   5.156786 0.471 0.038 1.115894e-203    Acinar
#> 4   PRSS2 1.241892e-185   4.545974 0.568 0.066 2.436593e-181    Acinar
#> 5   REG1B 9.599031e-101   4.519686 0.176 0.010  1.883330e-96    Acinar
#> 6   PLCB4  5.936709e-84  -2.027707 0.458 0.933  1.164782e-79    Acinar
#>        GETSI GETSI_entropy
#> 1 1.00000000   0.001229388
#> 2 0.70344625   0.104333983
#> 3 0.66985991   0.006319652
#> 4 0.59612780   0.432573280
#> 5 0.31594796   0.000000000
#> 6 0.05136069   0.700016953

5.3 Computing SPICEY

SPICEY offers an integrated approach to quantify tissue specificity at both the epigenetic and transcriptional levels by simultaneously computing Regulatory Element Tissue Specificity Indices (RETSI) and Gene Expression Tissue Specificity Indices (GETSI) from single-cell data.

By supplying both atac and rna inputs together to the SPICEY() function, the tool calculates RETSI and GETSI scores and returns a list with each specificity measure provided separately.

spicey <- SPICEY(
  atac = atac,
  rna = rna
)
lapply(spicey, head)
#> $RETSI
#>   cell_type     p_val_adj         p_val avg_log2FC                 region_id
#> 1    Acinar 3.353304e-302  0.000000e+00   6.025791  chr7-142748901-142749963
#> 2    Acinar 3.353304e-302  0.000000e+00   6.160286    chr2-79119834-79120951
#> 3    Acinar 3.353304e-302  0.000000e+00   6.299025 chr10-116587488-116588562
#> 4    Acinar 3.353304e-302  0.000000e+00   6.849885 chr10-116553223-116553959
#> 5    Acinar 4.272632e-232 9.160344e-238   6.694460 chr10-116589603-116590250
#> 6    Acinar 4.123107e-214 8.839769e-220   5.184217   chr16-75217280-75217880
#>       RETSI RETSI_entropy
#> 1 1.0000000  0.0524689042
#> 2 1.0000000  0.0020261572
#> 3 1.0000000  0.0005612349
#> 4 1.0000000  0.0002746290
#> 5 0.7674589  0.0001781440
#> 6 0.7078037  0.0013891926
#> 
#> $GETSI
#>   gene_id         p_val avg_log2FC pct.1 pct.2     p_val_adj cell_type
#> 1   REG1A  0.000000e+00   6.446878 0.392 0.012 1.053548e-303    Acinar
#> 2   PRSS1 3.793351e-218   5.111544 0.608 0.067 7.442554e-214    Acinar
#> 3   PNLIP 5.687535e-208   5.156786 0.471 0.038 1.115894e-203    Acinar
#> 4   PRSS2 1.241892e-185   4.545974 0.568 0.066 2.436593e-181    Acinar
#> 5   REG1B 9.599031e-101   4.519686 0.176 0.010  1.883330e-96    Acinar
#> 6   PLCB4  5.936709e-84  -2.027707 0.458 0.933  1.164782e-79    Acinar
#>        GETSI GETSI_entropy
#> 1 1.00000000   0.001229388
#> 2 0.70344625   0.104333983
#> 3 0.66985991   0.006319652
#> 4 0.59612780   0.432573280
#> 5 0.31594796   0.000000000
#> 6 0.05136069   0.700016953

5.4 Building Tissue-Specific Regulatory Networks

SPICEY facilitates the construction of tissue-specific regulatory networks by integrating RETSI and GETSI scores through the association of regulatory elements with their putative target genes. This linkage connects each regulatory region’s cell type-specific RETSI score to the corresponding gene’s GETSI score, enabling comprehensive downstream network analyses.

To establish these links, regulatory regions must be annotated to target genes. SPICEY supports two annotation strategies:

Proximity-based annotation — assigning regions to the nearest transcription start site (TSS).
Co-accessibility-based annotation — leveraging promoter–enhancer co-accessibility relationships derived from snATAC-seq data (e.g., via Cicero).

5.5 Annotating Regions to Putative Target Genes (Optional)

5.5.0.1 Nearest Gene Annotation

This method links each regulatory region to its closest gene TSS.

5.5.0.1.1 Required inputs:

peaks: A GRanges object or data.frame containing peaks to annotate, including these columns:
- seqnames: Chromosome name (e.g., chr1)
- start: Peak start coordinate
- end: Peak end coordinate
- region_id: Unique region identifier in genomic coordinate format (e.g., chr1-5000-5800 or chr1:5000-5800)
txdb: A TxDb object for genome annotation (e.g., TxDb.Hsapiens.UCSC.hg38.knownGene).
annot_dbi: An AnnotationDbi object for gene metadata (e.g., org.Hs.eg.db).
upstream: Number of bases upstream from the TSS to consider (default: 2000).
downstream: Number of bases downstream from the TSS to consider (default: 2000).

5.5.0.1.2 Optional parameters:

protein_coding_only: Limit annotations to protein-coding genes (default: TRUE).
add_tss_annotation: Annotate regulatory elements overlapping the TSS (default: FALSE). If enabled, uses ±1 bp around TSS instead of ±2000 bp.
verbose: Display progress messages (default: TRUE).

peaks <- unique(unlist(atac)[, c("region_id")])

annotation_near <- annotate_with_nearest(
  peaks = peaks,
  txdb = TxDb.Hsapiens.UCSC.hg38.knownGene,
  annot_dbi = org.Hs.eg.db,
  protein_coding_only = TRUE,
  verbose = TRUE,
  add_tss_annotation = FALSE,
  upstream = 2000,
  downstream = 2000
)

head(annotation_near)
#>                                           region_id distanceToTSS gene_id
#> chr7-142748901-142749963   chr7-142748901-142749963      25424083    WNT2
#> chr2-79119834-79120951       chr2-79119834-79120951             0   REG1A
#> chr10-116587488-116588562 chr10-116587488-116588562      15524492 KAZALD1
#> chr10-116553223-116553959 chr10-116553223-116553959      15490227 KAZALD1
#> chr10-116589603-116590250 chr10-116589603-116590250      15526607 KAZALD1
#> chr16-75217280-75217880     chr16-75217280-75217880       8850526    CDH5
#>                           annotation
#> chr7-142748901-142749963      Distal
#> chr2-79119834-79120951      Promoter
#> chr10-116587488-116588562     Distal
#> chr10-116553223-116553959     Distal
#> chr10-116589603-116590250     Distal
#> chr16-75217280-75217880       Distal

This function returns a data frame linking each regulatory region to its nearest gene based on genomic proximity.

5.5.0.2 Annotate to co-accessible Genes

This method connects regulatory regions to genes through co-accessibility interactions, which must be computed externally (e.g., using Cicero).

5.5.0.2.1 Required inputs:

peaks: A GRanges object or data.frame of peaks to annotate, including the following columns:
- seqnames: Chromosome name (e.g., chr1)
- start: Peak start coordinate
- end: Peak end coordinate
- region_id: Unique region identifier in genomic coordinate format (e.g., chr1-5000-5800 or chr1:5000-5800)
txdb: A TxDb object for genome annotation (e.g., TxDb.Hsapiens.UCSC.hg38.knownGene).
annot_dbi: An AnnotationDbi object for gene metadata (e.g., org.Hs.eg.db).
links_df: A data.frame containing co-accessibility links or scores (e.g., from Cicero), with these required columns:
- Peaks1: Genomic coordinate or ID of the first peak in the pair
- Peaks2: Genomic coordinate or ID of the second peak linked by co-accessibility
- coaccess: Co-accessibility score or correlation value between peaks
upstream: Number of bases upstream from the TSS to consider (default: 2000).
downstream: Number of bases downstream from the TSS to consider (default: 2000).

5.5.0.2.2 Optional parameters:

protein_coding_only: Limit annotations to protein-coding genes (default: TRUE).
add_tss_annotation: Annotate regulatory elements overlapping the TSS (default: FALSE). If enabled, uses ±1 bp around TSS instead of ±2000 bp.
verbose: Display progress messages (default: TRUE).

peaks <- unique(unlist(atac)[, c("region_id")])

annotation_coacc <- annotate_with_coaccessibility(
  peaks = peaks,
  txdb = TxDb.Hsapiens.UCSC.hg38.knownGene,
  links_df = cicero_links,
  annot_dbi = org.Hs.eg.db,
  protein_coding_only = TRUE,
  verbose = TRUE,
  add_tss_annotation = FALSE,
  upstream = 2000,
  downstream = 2000
)

head(annotation_coacc)
#>                   region_id gene_id
#> 1  chr1-110209621-110211746   KCNC4
#> 2    chr1-20290295-20292065  UBXN10
#> 3    chr1-20290295-20292065 PLA2G2C
#> 4 chr11-102177735-102178851    YAP1
#> 5 chr11-102551894-102553101   BIRC3
#> 6 chr11-102551894-102553101    MMP7

This function returns a data frame linking each regulatory region to its associated gene based on co-accessibility relationships.

5.5.0.3 Using custom region-to-gene annotation

The annotation step(s) are optional. Users can either generate these annotations using SPICEY’s built-in functions or supply a pre-computed data.frame of gene–region associations. When using pre-computed data to construct tissue-specific regulatory networks, the data.frame must include at minimum the following columns:

region_id: Unique identifier for each region, ideally following standard genomic coordinate formats such as chr-start-end (e.g., chr1-5000-5800) or chr:start-end (e.g., chr1:5000-5800). It should be the same being provided to SPICEY as region_id.
gene_id: Unique gene identifier. It should be the same being provided to SPICEY as gene_id.

5.5.1 Integrating SPICEY measures to build tissue-specific networks

With the annotation linking regulatory elements to their target genes (via TSS proximity or co-accessibility), SPICEY enables integration of RETSI and GETSI scores. This links each regulatory region’s cell type-specific RETSI score with its target gene’s corresponding GETSI score, facilitating the construction of detailed tissue-specific regulatory networks.

This approach supports comprehensive analysis of regulatory mechanisms at both epigenetic and transcriptional levels.

# Using nearest gene annotation data
spicey_near <- SPICEY(
  rna = rna,
  atac = atac,
  annotation = annotation_near
)


# Using co-accessibility gene annotation data
spicey_coacc <- SPICEY(
  rna = rna,
  atac = atac,
  annotation = annotation_coacc
)

This function returns a list containing separately computed RETSI and GETSI scores, along with an additional element that links these measures based on the selected region-to-gene annotation method.

6 Output Description

The primary output is a data.frame or a list of data.frames representing genomic regions (e.g., enhancers or promoters), each annotated with rich metadata derived from chromatin accessibility and gene expression data. Key columns include:

6.0.1 SPICEY Measures

Column	Description
`cell_type`	Cell type exhibiting significant accessibility or expression (e.g., `"Acinar"`).
`p_val_adj`	Adjusted p-value from differential testing, controlling for multiple comparisons.
`avg_log2FC`	Average log2 fold-change of accessibility/expression in the specified cell type vs others.
`region_id`	Genomic interval identifier (e.g., `"chr7:142748901-142749963"`).
`RETSI`	Regulatory Element Tissue Specificity Index; values near 1 indicate high cell-type specificity.
`RETSI_entropy`	Entropy-based specificity metric; lower values indicate greater cell-type specificity.
`gene_id` or `nearestGeneSymbol`	Gene identifier (official gene symbol).
`GETSI`	Gene Expression Tissue Specificity Index for linked genes, indicating expression specificity.
`GETSI_entropy`	Entropy-based specificity metric for gene expression, analogous to `RETSI_entropy`.

6.0.2 Linked SPICEY Measures

Column	Description
`cell_type`	Cell type showing significant accessibility (e.g., `"Acinar"`).
`p_val_adj_ATAC`	Adjusted p-value from differential accessibility testing.
`avg_log2FC_ATAC`	Average log2 fold-change in accessibility for the cell type versus others.
`region_id`	Genomic interval identifier (e.g., `"chr7:142748901-142749963"`).
`RETSI`	Regulatory Element Tissue Specificity Index.
`RETSI_entropy`	Entropy-based specificity measure for accessibility.
`gene_id` or `nearestGeneSymbol`	Gene identifier (official gene symbol).
`p_val_adj_RNA`	Adjusted p-value from differential expression testing.
`avg_log2FC_RNA`	Average log2 fold-change in gene expression for the cell type versus others.
`pct.1`	Percentage of cells expressing the gene in the first group of the differential analysis.
`pct.2`	Percentage of cells expressing the gene in the second group.
`GETSI`	Gene Expression Tissue Specificity Index for linked genes.
`GETSI_entropy`	Entropy-based specificity metric for gene expression.

This annotated data.frame serves as a comprehensive resource combining genomic coordinates, cell type-specific regulatory activity, gene associations, and expression specificity—enabling advanced functional genomics analyses.

7 Visualizing cell-type specificity

The SPICEY package provides a flexible function for visualizing gene-regulatory-level specificity across cell types using the spicey_heatmap() function. This function supports visualizing:

RETSI
GETSI
SPICEY, which is the combined score of both RETSI and GETSI.

These visualizations help uncover transcriptional and regulatory programs that are specific to individual cell types and can assist in identifying marker genes or highly specific regulatory programs.

The spicey_heatmap() function performs the following steps:

Filters and scales the input: Extracts the SPICEY scores for the selected specificity index (RETSI, GETSI, or a combined SPICEY score).
Selects top genes: For each cell type, identifies the top n genes based on their SPICEY score ranks.
Generates a heatmap: Creates a gene-by-cell-type heatmap. Genes are ordered by their highest-scoring cell type and their SPICEY score magnitude.

Depending on the spicey_measure and combined_score arguments, the function adapts the output:

Single-modality visualization:
- spicey_measure = "RETSI": Plots scored RETSI values, where RETSI reflects cell-type-specific accessibility of regulatory regions linked to genes.
- spicey_measure = "GETSI": Plots scored GETSI values based on gene expression.
Dual-modality (SPICEY) visualization:
- spicey_measure = "SPICEY" and combined_score = FALSE: Displays two separate heatmaps -one for RETSI, one for GETSI - enabling side-by-side comparison.
- spicey_measure = "SPICEY" and combined_score = TRUE: RETSI and GETSI are independently scored and averaged per gene-cell-type pair. A single unified heatmap is generated to highlight shared specificity across both data types.

7.0.1 Required inputs:

To run spicey_heatmap(), the input must be a data.frame with the following columns:

gene_id:
- For RETSI: The gene annotated to the given regulatory region (e.g., from co-accessibility or distance to the nearest TSS).
- For GETSI: The expressed gene for which expression specificity is measured.
cell_type: Cell type or cluster label (e.g., "Acinar").
GETSI and/or RETSI score.

7.0.2 Arguments

df: A data.frame containing the columns described above.
top_n: Number of top-ranked genes to retain per cell type (default: 5).
spicey_measure: Score type to visualize. Must be one of "RETSI", "GETSI", or "SPICEY".
combined_score: Logical. If TRUE and spicey_measure = "SPICEY", computes a combined mean score across RETSI and GETSI (default: FALSE).

# RETSI and/or GETSI separately
spicey_heatmap(spicey_coacc$GETSI,
  spicey_measure = "GETSI"
)


# If only RETSI is calculated (only ATAC data available), join with coaccessibility or nearest gene annotation to show gene_id in the plots
retsi <- spicey_coacc$RETSI
retsi <- retsi |> left_join(annotation_near, by = c("region_id"))
spicey_heatmap(retsi,
  spicey_measure = "RETSI"
)

spicey_heatmap(spicey_coacc$linked,
  spicey_measure = "SPICEY",
  combined_score = FALSE
)

# SPICEY: combined average RETSI and GETSI scores
spicey_heatmap(spicey_coacc$linked,
  spicey_measure = "SPICEY",
  combined_score = TRUE
)

8 References

Appendix

A SessionInfo

sessionInfo()
#> R Under development (unstable) (2025-10-20 r88955)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] org.Hs.eg.db_3.22.0                     
#>  [2] TxDb.Hsapiens.UCSC.hg38.knownGene_3.22.0
#>  [3] GenomicFeatures_1.63.1                  
#>  [4] AnnotationDbi_1.73.0                    
#>  [5] Biobase_2.71.0                          
#>  [6] GenomicRanges_1.63.0                    
#>  [7] Seqinfo_1.1.0                           
#>  [8] IRanges_2.45.0                          
#>  [9] S4Vectors_0.49.0                        
#> [10] BiocGenerics_0.57.0                     
#> [11] generics_0.1.4                          
#> [12] dplyr_1.1.4                             
#> [13] SPICEY_1.1.0                            
#> [14] BiocStyle_2.39.0                        
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.2.1            farver_2.1.2               
#>  [3] blob_1.2.4                  Biostrings_2.79.1          
#>  [5] S7_0.2.0                    bitops_1.0-9               
#>  [7] fastmap_1.2.0               RCurl_1.98-1.17            
#>  [9] GenomicAlignments_1.47.0    XML_3.99-0.19              
#> [11] digest_0.6.37               lifecycle_1.0.4            
#> [13] KEGGREST_1.51.0             RSQLite_2.4.3              
#> [15] magrittr_2.0.4              compiler_4.6.0             
#> [17] rlang_1.1.6                 sass_0.4.10                
#> [19] tools_4.6.0                 yaml_2.3.10                
#> [21] rtracklayer_1.71.0          knitr_1.50                 
#> [23] S4Arrays_1.11.0             labeling_0.4.3             
#> [25] bit_4.6.0                   curl_7.0.0                 
#> [27] DelayedArray_0.37.0         RColorBrewer_1.1-3         
#> [29] abind_1.4-8                 BiocParallel_1.45.0        
#> [31] withr_3.0.2                 purrr_1.2.0                
#> [33] grid_4.6.0                  ggplot2_4.0.0              
#> [35] scales_1.4.0                tinytex_0.57               
#> [37] dichromat_2.0-0.1           SummarizedExperiment_1.41.0
#> [39] cli_3.6.5                   rmarkdown_2.30             
#> [41] crayon_1.5.3                httr_1.4.7                 
#> [43] rjson_0.2.23                DBI_1.2.3                  
#> [45] cachem_1.1.0                parallel_4.6.0             
#> [47] BiocManager_1.30.26         XVector_0.51.0             
#> [49] restfulr_0.0.16             matrixStats_1.5.0          
#> [51] vctrs_0.6.5                 Matrix_1.7-4               
#> [53] jsonlite_2.0.0              bookdown_0.45              
#> [55] bit64_4.6.0-1               magick_2.9.0               
#> [57] jquerylib_0.1.4             tidyr_1.3.1                
#> [59] glue_1.8.0                  codetools_0.2-20           
#> [61] cowplot_1.2.0               gtable_0.3.6               
#> [63] GenomeInfoDb_1.47.0         BiocIO_1.21.0              
#> [65] UCSC.utils_1.7.0            tibble_3.3.0               
#> [67] pillar_1.11.1               htmltools_0.5.8.1          
#> [69] R6_2.6.1                    evaluate_1.0.5             
#> [71] lattice_0.22-7              png_0.1-8                  
#> [73] Rsamtools_2.27.0            cigarillo_1.1.0            
#> [75] memoise_2.0.1               bslib_0.9.0                
#> [77] Rcpp_1.1.0                  SparseArray_1.11.1         
#> [79] xfun_0.54                   MatrixGenerics_1.23.0      
#> [81] pkgconfig_2.0.3

Measuring tissue specificity from single cell data with SPICEY

4 November 2025

Contents