---
title: "Getting Started with SpNeigh"
author:
  - name: Jinming Cheng
    affiliation:
    - &duk_nus Centre for Biomedical Data Science,
               Duke-NUS Medical School,
               Singapore, 169857, 
               Singapore
    # email: jinming.cheng@outlook.com
    

date: "`r format(Sys.time(), '%d %B, %Y')`"
vignette: >
  %\VignetteIndexEntry{Getting Started with SpNeigh}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}

output:
  BiocStyle::html_document:
    toc_float: false
bibliography-style: plain
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    eval = TRUE,
    collapse = TRUE,
    comment = "#>",
    out.width = "100%",
    dev = "png",
    dpi = 50,
    fig.height = 4.2,
    fig.width = 5.6
)
```


# Introduction

Recent spatial transcriptomics technologies such as **10x Xenium**, 
**MERFISH**, and **Visium HD** enable high-resolution profiling of 
gene expression while preserving tissue architecture. 
These platforms have opened new avenues for studying tissue organization, 
cell–cell interactions, and spatial gene regulation.

Most existing methods focus on identifying globally spatially variable genes 
or enhancing spatial visualization, with limited support for 
modeling **local spatial context**, such as region boundaries, 
cell neighborhoods, or gradients across tissue structures.

**SpNeigh** is an R package that introduces a 
**boundary-aware and region-informed framework** for spatial 
neighborhood analysis and spatial differential expression modeling. 
Key features include:

- Boundary detection and neighborhood ring construction  
- Spatial weighting using distances to centroids or boundaries  
- Spline-based modeling of gene expression along spatial gradients  
- A spatial enrichment index (SEI) to quantify boundary- or 
region-enriched genes  

Unlike existing Bioconductor tools such as **spatialDE**,
which emphasize global spatial variability, 
SpNeigh enables **local, interpretable**, and **geometry-aware** modeling 
of spatial expression patterns.

SpNeigh supports inputs from both **SpatialExperiment** and **Seurat** objects 
for coordinate extraction and visualization, and emphasizes transparent 
modeling using user-supplied matrices and weights—ensuring reproducibility 
and compatibility with Bioconductor workflows.

By integrating spatial geometry with transcriptomic data, SpNeigh reveals 
biologically meaningful patterns such as **intermediate cell states**, 
**microenvironmental differences**, and **spatial gene expression gradients**, 
which are critical for understanding development, tissue function, 
and disease progression.


# Overview of this vignette

This vignette demonstrates the key functionalities of the **SpNeigh** 
package using a real  
[10x Xenium Fresh Frozen Mouse Brain Tiny Subset dataset](https://www.10xgenomics.com/datasets/fresh-frozen-mouse-brain-for-xenium-explorer-demo-1-standard).

Specifically, we show how to:

- Identify spatial boundaries and construct neighboring ring regions  
- Explore spatial interaction patterns between cell clusters  
- Perform differential expression analysis between spatially 
defined cell populations  
- Compute spatial weights based on boundary or centroid proximity  
- Model spatial expression gradients using spline-based differential 
expression  
- Quantify spatial enrichment along spatial gradients  

This workflow highlights the analytical flexibility of **SpNeigh** and 
illustrates how neighborhood-aware modeling can reveal spatial gene expression 
patterns that are not readily captured by existing approaches.


# Installation

The **SpNeigh** package can be installed from GitHub by using:

```{r, eval=FALSE}
devtools::install_github("jinming-cheng/SpNeigh")
```


<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<!-- %                                                 % -->

# Load packages

```{r loadPackage}
library(SpNeigh)
library(ggplot2)
```

# Load data

## Input: Coordinate data frame and Normalized expression matrix

Coordinates of mouse brain dataset
```{r}
coords <- readRDS(system.file("extdata", "MouseBrainCoords.rds",
    package = "SpNeigh"
))
head(coords)
```


Ensure the rownames of `coords` are the same with the cell names
```{r}
rownames(coords) <- coords$cell
```


Log normalized expression data generated by `NormalizeData` 
function in *Seurat* package 
```{r}
logNorm_expr <- readRDS(system.file("extdata", "LogNormExpr.rds",
    package = "SpNeigh"
))
class(logNorm_expr)
```


Annotate clusters based on anatomical brain regions
```{r}
new_cell_types <- c(
    "0" = "Meninges",
    "1" = "Cerebral_cortex",
    "2" = "White_matter",
    "3" = "Cerebral_cortex",
    "4" = "Cerebral_cortex",
    "5" = "Thalamus",
    "6" = "Cerebral_cortex",
    "7" = "Hippocampus",
    "8" = "Hippocampus",
    "9" = "Hippocampus",
    "10" = "White_matter",
    "11" = "Cerebral_cortex"
)
```


In this vignette, we focus on cluster-level rather than 
cell type–level analyses.

<!-- %                                                 % -->
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->


<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<!-- %                                                 % -->

## Alternative input: SpatialExperiment object

Coordinates of cells in clusters 0 and 2
```{r}
coords_sub <- subset(coords, cluster %in% c("0", "2"))
coords_sub <- as.matrix(coords_sub[, c("x", "y")])
```

metadata of cells in clusters 0 and 2
```{r}
metadata_sub <- subset(coords[, c("cell", "cluster")], cluster %in% c("0", "2"))
```


Create SpatialExperiment
```{r}
spe <- SpatialExperiment::SpatialExperiment(
    assay = list("logcounts" = logNorm_expr),
    colData = metadata_sub,
    spatialCoords = coords_sub
)
spe
```

Extract coordinates from SpatialExperiment object
```{r}
spe_coords <- extractCoords(spe)
```

Extract log-normalized expression matrix from SpatialExperiment object
```{r}
spe_logcounts <- SingleCellExperiment::logcounts(spe)
```

<!-- %                                                 % -->
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->


<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<!-- %                                                 % -->

## Alternative input: Seurat object

Load Seurat pacakge
```{r}
library(Seurat)
```

Create Seurat object (Note: a normalized expression matrix is used here 
for tutorial purposes only. In practice, use a raw count matrix.)
```{r}
seu_sp <- CreateSeuratObject(
    assay = "Spatial",
    counts = logNorm_expr,
    meta.data = metadata_sub
)
seu_sp
``` 

Insert normalized data into the `data` layer (Seurat v5)
```{r}
LayerData(seu_sp, assay = "Spatial", layer = "data") <- logNorm_expr
```


Add FOV
```{r}
cents <- CreateCentroids(coords_sub[, c("x", "y")])

fov <- CreateFOV(
    coords = list("centroids" = cents),
    type = c("centroids"),
    assay = "Spatial"
)

seu_sp[["fov"]] <- fov
```

Add `seurat_clusters` to the metadata
```{r}
seu_sp$seurat_clusters <- seu_sp$cluster
```

Extract coordinates from Seurat object
```{r}
seu_sp_coords <- extractCoords(seu_sp)
```

Extract log-normalized expression matrix from Seurat object
```{r}
seu_sp_log_expr <- GetAssayData(seu_sp)
```

<!-- %                                                 % -->
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->


<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<!-- %                                                 % -->

# Neighborhood analysis for cluster 2 cells

## Quick look at the boundaries of one cluster

Quick look at the boundaries of cluster 2 using default parameter settings. 
The input is a SpatialExperiment object.
```{r PlotBoundary_spe, fig.width=6.6, fig.height=4.2}
plotBoundary(spe, one_cluster = "2")
```

Similarly, we can use a Seurat object as input and adjust parameters
```{r PlotBoundary_seu, fig.width=6.6, fig.height=4.2}
plotBoundary(seu_sp, one_cluster = "2", eps = 120)
```

## Detect spatial boundaries

Extract boundaries of cluster 2. Adjust the `eps` and `minPts` parameters 
to refine the number of subregions detected by the default 
dbscan clustering method. 

Note: The `eps` parameter controls the clustering tolerance for boundary 
detection based on spatial distance. The default value is `eps = 80`, 
which is suitable for coordinates in the range of hundreds or thousands 
(e.g., 10x Xenium or Visium HD data). 
For datasets with smaller coordinate ranges (e.g., 1 to 5), 
such as manually scaled or normalized inputs, 
a smaller value such as `eps = 0.1` may be more appropriate.
```{r}
bon_points_c2 <- getBoundary(
    data = coords,
    one_cluster = 2,
    eps = 120,
    minPts = 10
)
table(bon_points_c2$region_id)
```


Add boundaries of cluster 2 to the spatial plot
```{r PlotWithBoundary_C2, fig.width=6.6, fig.height=4.2}
plotBoundary(coords, boundary = bon_points_c2)
```

Alternatively, add boundaries of cluster 2 using addBoundary() function
```{r}
# plotBoundary(coords) + addBoundary(boundary = bon_points_c2)
```

Plot boundary regions
```{r PlotRegion_C2,fig.height = 2.8,  fig.width = 5.6}
bon_polys_c2 <- buildBoundaryPoly(bon_points_c2)
plotRegion(bon_polys_c2)
```


## Obtain neighborhood ring regions

Get spatial ring-shaped regions by subtracting the original boundary polygons 
from their corresponding outer buffered polygons. 
The `outer_boundary` is automatically computed when it is not supplied.
```{r}
ring_regions <- getRingRegion(boundary = bon_points_c2)
```


Plot ring regions
```{r PlotRing_C2,fig.height = 2.8,  fig.width = 5.6}
plotRegion(ring_regions)
```


## Statistics of cells inside rings

Get cells inside rings for cluster 2. (`getCellsInside()` takes a few seconds
to run on this example, but may require several minutes or longer 
for larger datasets.)
```{r}
cells_ring <- getCellsInside(data = coords, boundary = ring_regions)
cells_ring
```


Plot cells inside rings
```{r PlotCellsInsideRing_C2,fig.height = 3.6,  fig.width = 5.6}
plotCellsInside(cells_ring, point_size = 0.2)
```


Obtain statistics of cells inside rings for cluster 2
```{r}
stats_ring <- statsCellsInside(cells_ring)
```


Plot proportion of cells in different clusters for each sub region 
using bar plot. 
<!-- The `stat_column` is used to plot "proportion" or "count". -->
```{r PlotStatsBar_Proportion_C2_Ring, fig.height = 4.2,  fig.width = 6.6}
plotStatsBar(stats_ring, stat_column = "proportion")
```


Plot proportion of cells in different clusters for each sub region 
using donut chart. If `plot_donut = FALSE`, pie chart is used.
```{r PlotStats_Donut_C2_Ring, fig.height = 3.5,  fig.width = 5.6}
plotStatsPie(stats_ring, plot_donut = TRUE)
```


## Neighborhood interaction of clusters inside rings

Compute neighborhood interaction matrix using K-nearest neighbors for 
cells inside rings
```{r}
interaction_matrix <- computeSpatialInteractionMatrix(
    subset(coords, cell %in% cells_ring$cell)
)
interaction_matrix
```


Heatmap of a row-scaled interaction matrix for cells inside rings 
for cluster 2. 
The heatmap reveals that clusters 2, 3, and 4 are spatially adjacent 
to one another. 
```{r Heatmap_InteractionMatrix_C2_Ring, fig.height = 4.2,  fig.width = 5.6}
plotInteractionMatrix(interaction_matrix)
```

The plot of cells inside rings split by clusters further confirms the 
co-occurrence of clusters 2, 3, 4 in ring region 1.
```{r PlotCellsInside_SplitByCluster_C2_Ring, fig.width=8.4, fig.height=4.2}
plotCellsInside(cells_ring) +
    facet_wrap(~cluster) +
    Seurat::RotatedAxis() +
    addBoundaryPoly(ring_regions, linewidth_boundary = 0.1)
```

<!-- %                                                 % -->
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->


<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<!-- %                                                 % -->

# DE analysis of cluster 2 cells inside and outside boundaries

Most cells in cluster 2 are located in sub region 1. Hence, we focus on the 
DE analysis between cluster 2 cells inside boundary of sub region 1 and 
cells in the outside neighboring ring region 1.


Get cells inside the boundaries of cluster 2
```{r}
cells_inside <- getCellsInside(data = coords, boundary = bon_points_c2)
cells_inside
```


Obtain cluster 2 cells inside and outside boundary of sub region
```{r}
cells_in <- subset(cells_inside, region_id == 1 & cluster == 2)[["cell"]]
cells_out <- subset(cells_ring, region_id == 1 & cluster == 2)[["cell"]]
```


Perform DE analysis between cells inside boundary and outside boundary 
using the `limma` framewrok (`lmFit` + `eBayes`)
```{r}
tab <- runLimmaDE(
    exp_mat = logNorm_expr,
    min.pct = 0.25,
    cells_reference = cells_in,
    cells_target = cells_out
)
```


Top DE genes between cells outside boundary and cells inside boundary. 
The DE genes are ordered by the abstract value of `logFC` by default.
```{r}
head(tab[, c("gene", "logFC", "adj.P.Val", "pct.reference", "pct.target")])
```


Set a random seed to make plotExpression results reproducible
```{r}
set.seed(123)
```


Expression of top DE genes in cluster 2 cells outside boundary. 
The cells are plotted randomly using the given seed when `shuffle = TRUE`. 
If `shuffle = FALSE`, cells with higher expression values 
are plotted last (on the top).
```{r Expression_Slc17a7_c2, fig.width = 8, fig.height = 2.5}
plotExpression(
    data = coords[colnames(logNorm_expr), ],
    exp_mat = logNorm_expr,
    genes = tab$gene[1:2],
    sub_plot = TRUE,
    one_cluster = 2,
    shuffle = TRUE,
    point_size = 0.1,
    angle_x_label = 45
)
```


Expression of *Slc17a7* for cluster 2 cells inside and outside boundary 
of sub region 1. Cluster 2 cells outside the boundary show much 
higher expression of Slc17a7.
```{r Expression_Slc17a7_InOutBoundary,fig.height = 2.8, fig.width = 5.6}
plotExpression(
    data = coords[colnames(logNorm_expr), ],
    exp_mat = logNorm_expr,
    genes = "Slc17a7",
    sub_plot = TRUE,
    shuffle = TRUE,
    sub_cells = c(cells_in, cells_out),
    point_size = 0.3
) +
    addBoundaryPoly(bon_polys_c2[1, ], linewidth_boundary = 0.5)
``` 


Expression of a general marker gene *Sox10* for Oligodendrocytes 
(cluster 2 cells). *Sox10* is expressed in both cluster 2 cells 
inside and outside the boundary. This suggests that cluster 2 cells 
located near the boundary may represent intermediate cell states.
```{r PlotExpression_Sox10,fig.height = 2.8,  fig.width = 5.6}
plotExpression(
    data = coords[colnames(logNorm_expr), ],
    exp_mat = logNorm_expr,
    sub_cells = c(cells_in, cells_out),
    sub_plot = TRUE,
    genes = "Sox10",
    shuffle = TRUE,
    point_size = 0.3
) +
    addBoundaryPoly(bon_polys_c2[1, ], linewidth_boundary = 0.5)
```


<!-- %                                                 % -->
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->


<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->
<!-- %                                                 % -->

# Spatial DE analysis of cells in cluster 0 along spatial weights

## Detect spatial boundaries

Obtain cells in cluster 0
```{r}
cells_c0 <- subset(coords, cluster == 0)[, "cell"]
```


Plot cluster 0 cells without boundary
```{r Plot_NoBoundary_C0, fig.width=8*0.7+1, fig.height=6*0.7}
plotBoundary(coords[cells_c0, ])
```


Get boundaries of cluster 0
```{r}
bon_points_c0 <- getBoundary(data = coords, one_cluster = 0)
bon_polys_c0 <- buildBoundaryPoly(bon_points_c0)
```


Plot boundary edges for cluster 0
```{r PlotBoundary_C0, fig.width=6.6, fig.height=4.2}
plotEdge(boundary_poly = bon_polys_c0, linewidth_boundary = 0.6)
```


## Compute and plot spatial weights

Compute spatial weights based on the distance to boundaries
```{r}
weights_bon <- computeBoundaryWeights(
    data = coords,
    cell_ids = cells_c0,
    boundary = bon_points_c0
)
```


Spatial weights are a named vector, where names correspond to cell IDs
```{r}
weights_bon[1:3]
```


Alternatively, compute spatial weights based on the distance to centroids
```{r}
weights_cen <- computeCentroidWeights(data = coords, cell_ids = cells_c0)
```


Plot boundary weights
```{r, fig.width=6.6, fig.height=4.2}
plotWeights(data = coords, weights = weights_bon, point_size = 0.8) +
    labs(title = "Boundary weights")
```


## Perform spatial differential analysis along boundary weights

Run spatial DE analysis for cluster 0 cells along boundary weights using splines
```{r}
tab_sp <- runSpatialDE(
    exp_mat = logNorm_expr[, cells_c0],
    spatial_distance = weights_bon,
    cell_ids = cells_c0
)
```


Top DE genes along boundary weights. 
The first spline coefficient (`Z1`) captures the main expression trend 
along the spatial distance. A positive value indicates increasing expression 
with distance, while a negative value indicates decreasing expression.
```{r}
head(tab_sp)
```


Expression of top DE genes along boundary weights
```{r Expression_top_genes_boundary, fig.width = 8, fig.height = 2.5}
plotExpression(
    data = coords[colnames(logNorm_expr), ],
    exp_mat = logNorm_expr,
    genes = tab_sp$gene[1:2],
    sub_plot = TRUE,
    one_cluster = 0,
    shuffle = TRUE,
    point_size = 0.1,
    angle_x_label = 45
)
```

Plot scaled average expression along boundary weight bins for 
top 10 spatial DE genes. The scaled average expression ranges 
from 0 to 1 by using min-max normalization method.
```{r Expression_along_boundary_weights_c0, fig.width=6.4, fig.height=4}
plotSpatialExpression(
    exp_mat = logNorm_expr[, cells_c0],
    spatial_distance = weights_bon,
    scale_method = "minmax",
    genes = tab_sp$gene[1:10],
    label_x = "Boundary weights"
)
```


# Spatial enrichment analysis for each gene

Compute spatial enrichment index (SEI) for each gene based on boundary weights. 
The SEI reflects the extent to which gene expression is enriched in 
spatially weighted regions. 
The result is sorted in descending order by `normalized_SEI`.
```{r}
SEI_bon <- computeSpatialEnrichmentIndex(
    exp_mat = logNorm_expr[, cells_c0],
    weights = weights_bon
)
head(SEI_bon)
```

Expression of top genes enriched near boundaries
```{r Expression_top_genes_SEI_boundary, fig.width = 8, fig.height = 2.5}
plotExpression(
    data = coords[colnames(logNorm_expr), ],
    exp_mat = logNorm_expr,
    genes = SEI_bon$gene[1:2],
    sub_plot = TRUE,
    one_cluster = 0,
    shuffle = TRUE,
    point_size = 0.1,
    angle_x_label = 45
)
```

<!-- %                                                 % -->
<!-- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -->

# Session Info

```{r}
sessionInfo()
```

\pagebreak