--- title: "Getting Started with SpNeigh" author: - name: Jinming Cheng affiliation: - &duk_nus Centre for Biomedical Data Science, Duke-NUS Medical School, Singapore, 169857, Singapore # email: jinming.cheng@outlook.com date: "`r format(Sys.time(), '%d %B, %Y')`" vignette: > %\VignetteIndexEntry{Getting Started with SpNeigh} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} output: BiocStyle::html_document: toc_float: false bibliography-style: plain --- ```{r, include = FALSE} knitr::opts_chunk$set( eval = TRUE, collapse = TRUE, comment = "#>", out.width = "100%", dev = "png", dpi = 50, fig.height = 4.2, fig.width = 5.6 ) ``` # Introduction Recent spatial transcriptomics technologies such as **10x Xenium**, **MERFISH**, and **Visium HD** enable high-resolution profiling of gene expression while preserving tissue architecture. These platforms have opened new avenues for studying tissue organization, cell–cell interactions, and spatial gene regulation. Most existing methods focus on identifying globally spatially variable genes or enhancing spatial visualization, with limited support for modeling **local spatial context**, such as region boundaries, cell neighborhoods, or gradients across tissue structures. **SpNeigh** is an R package that introduces a **boundary-aware and region-informed framework** for spatial neighborhood analysis and spatial differential expression modeling. Key features include: - Boundary detection and neighborhood ring construction - Spatial weighting using distances to centroids or boundaries - Spline-based modeling of gene expression along spatial gradients - A spatial enrichment index (SEI) to quantify boundary- or region-enriched genes Unlike existing Bioconductor tools such as **spatialDE**, which emphasize global spatial variability, SpNeigh enables **local, interpretable**, and **geometry-aware** modeling of spatial expression patterns. SpNeigh supports inputs from both **SpatialExperiment** and **Seurat** objects for coordinate extraction and visualization, and emphasizes transparent modeling using user-supplied matrices and weights—ensuring reproducibility and compatibility with Bioconductor workflows. By integrating spatial geometry with transcriptomic data, SpNeigh reveals biologically meaningful patterns such as **intermediate cell states**, **microenvironmental differences**, and **spatial gene expression gradients**, which are critical for understanding development, tissue function, and disease progression. # Overview of this vignette This vignette demonstrates the key functionalities of the **SpNeigh** package using a real [10x Xenium Fresh Frozen Mouse Brain Tiny Subset dataset](https://www.10xgenomics.com/datasets/fresh-frozen-mouse-brain-for-xenium-explorer-demo-1-standard). Specifically, we show how to: - Identify spatial boundaries and construct neighboring ring regions - Explore spatial interaction patterns between cell clusters - Perform differential expression analysis between spatially defined cell populations - Compute spatial weights based on boundary or centroid proximity - Model spatial expression gradients using spline-based differential expression - Quantify spatial enrichment along spatial gradients This workflow highlights the analytical flexibility of **SpNeigh** and illustrates how neighborhood-aware modeling can reveal spatial gene expression patterns that are not readily captured by existing approaches. # Installation The **SpNeigh** package can be installed from GitHub by using: ```{r, eval=FALSE} devtools::install_github("jinming-cheng/SpNeigh") ``` # Load packages ```{r loadPackage} library(SpNeigh) library(ggplot2) ``` # Load data ## Input: Coordinate data frame and Normalized expression matrix Coordinates of mouse brain dataset ```{r} coords <- readRDS(system.file("extdata", "MouseBrainCoords.rds", package = "SpNeigh" )) head(coords) ``` Ensure the rownames of `coords` are the same with the cell names ```{r} rownames(coords) <- coords$cell ``` Log normalized expression data generated by `NormalizeData` function in *Seurat* package ```{r} logNorm_expr <- readRDS(system.file("extdata", "LogNormExpr.rds", package = "SpNeigh" )) class(logNorm_expr) ``` Annotate clusters based on anatomical brain regions ```{r} new_cell_types <- c( "0" = "Meninges", "1" = "Cerebral_cortex", "2" = "White_matter", "3" = "Cerebral_cortex", "4" = "Cerebral_cortex", "5" = "Thalamus", "6" = "Cerebral_cortex", "7" = "Hippocampus", "8" = "Hippocampus", "9" = "Hippocampus", "10" = "White_matter", "11" = "Cerebral_cortex" ) ``` In this vignette, we focus on cluster-level rather than cell type–level analyses. ## Alternative input: SpatialExperiment object Coordinates of cells in clusters 0 and 2 ```{r} coords_sub <- subset(coords, cluster %in% c("0", "2")) coords_sub <- as.matrix(coords_sub[, c("x", "y")]) ``` metadata of cells in clusters 0 and 2 ```{r} metadata_sub <- subset(coords[, c("cell", "cluster")], cluster %in% c("0", "2")) ``` Create SpatialExperiment ```{r} spe <- SpatialExperiment::SpatialExperiment( assay = list("logcounts" = logNorm_expr), colData = metadata_sub, spatialCoords = coords_sub ) spe ``` Extract coordinates from SpatialExperiment object ```{r} spe_coords <- extractCoords(spe) ``` Extract log-normalized expression matrix from SpatialExperiment object ```{r} spe_logcounts <- SingleCellExperiment::logcounts(spe) ``` ## Alternative input: Seurat object Load Seurat pacakge ```{r} library(Seurat) ``` Create Seurat object (Note: a normalized expression matrix is used here for tutorial purposes only. In practice, use a raw count matrix.) ```{r} seu_sp <- CreateSeuratObject( assay = "Spatial", counts = logNorm_expr, meta.data = metadata_sub ) seu_sp ``` Insert normalized data into the `data` layer (Seurat v5) ```{r} LayerData(seu_sp, assay = "Spatial", layer = "data") <- logNorm_expr ``` Add FOV ```{r} cents <- CreateCentroids(coords_sub[, c("x", "y")]) fov <- CreateFOV( coords = list("centroids" = cents), type = c("centroids"), assay = "Spatial" ) seu_sp[["fov"]] <- fov ``` Add `seurat_clusters` to the metadata ```{r} seu_sp$seurat_clusters <- seu_sp$cluster ``` Extract coordinates from Seurat object ```{r} seu_sp_coords <- extractCoords(seu_sp) ``` Extract log-normalized expression matrix from Seurat object ```{r} seu_sp_log_expr <- GetAssayData(seu_sp) ``` # Neighborhood analysis for cluster 2 cells ## Quick look at the boundaries of one cluster Quick look at the boundaries of cluster 2 using default parameter settings. The input is a SpatialExperiment object. ```{r PlotBoundary_spe, fig.width=6.6, fig.height=4.2} plotBoundary(spe, one_cluster = "2") ``` Similarly, we can use a Seurat object as input and adjust parameters ```{r PlotBoundary_seu, fig.width=6.6, fig.height=4.2} plotBoundary(seu_sp, one_cluster = "2", eps = 120) ``` ## Detect spatial boundaries Extract boundaries of cluster 2. Adjust the `eps` and `minPts` parameters to refine the number of subregions detected by the default dbscan clustering method. Note: The `eps` parameter controls the clustering tolerance for boundary detection based on spatial distance. The default value is `eps = 80`, which is suitable for coordinates in the range of hundreds or thousands (e.g., 10x Xenium or Visium HD data). For datasets with smaller coordinate ranges (e.g., 1 to 5), such as manually scaled or normalized inputs, a smaller value such as `eps = 0.1` may be more appropriate. ```{r} bon_points_c2 <- getBoundary( data = coords, one_cluster = 2, eps = 120, minPts = 10 ) table(bon_points_c2$region_id) ``` Add boundaries of cluster 2 to the spatial plot ```{r PlotWithBoundary_C2, fig.width=6.6, fig.height=4.2} plotBoundary(coords, boundary = bon_points_c2) ``` Alternatively, add boundaries of cluster 2 using addBoundary() function ```{r} # plotBoundary(coords) + addBoundary(boundary = bon_points_c2) ``` Plot boundary regions ```{r PlotRegion_C2,fig.height = 2.8, fig.width = 5.6} bon_polys_c2 <- buildBoundaryPoly(bon_points_c2) plotRegion(bon_polys_c2) ``` ## Obtain neighborhood ring regions Get spatial ring-shaped regions by subtracting the original boundary polygons from their corresponding outer buffered polygons. The `outer_boundary` is automatically computed when it is not supplied. ```{r} ring_regions <- getRingRegion(boundary = bon_points_c2) ``` Plot ring regions ```{r PlotRing_C2,fig.height = 2.8, fig.width = 5.6} plotRegion(ring_regions) ``` ## Statistics of cells inside rings Get cells inside rings for cluster 2. (`getCellsInside()` takes a few seconds to run on this example, but may require several minutes or longer for larger datasets.) ```{r} cells_ring <- getCellsInside(data = coords, boundary = ring_regions) cells_ring ``` Plot cells inside rings ```{r PlotCellsInsideRing_C2,fig.height = 3.6, fig.width = 5.6} plotCellsInside(cells_ring, point_size = 0.2) ``` Obtain statistics of cells inside rings for cluster 2 ```{r} stats_ring <- statsCellsInside(cells_ring) ``` Plot proportion of cells in different clusters for each sub region using bar plot. ```{r PlotStatsBar_Proportion_C2_Ring, fig.height = 4.2, fig.width = 6.6} plotStatsBar(stats_ring, stat_column = "proportion") ``` Plot proportion of cells in different clusters for each sub region using donut chart. If `plot_donut = FALSE`, pie chart is used. ```{r PlotStats_Donut_C2_Ring, fig.height = 3.5, fig.width = 5.6} plotStatsPie(stats_ring, plot_donut = TRUE) ``` ## Neighborhood interaction of clusters inside rings Compute neighborhood interaction matrix using K-nearest neighbors for cells inside rings ```{r} interaction_matrix <- computeSpatialInteractionMatrix( subset(coords, cell %in% cells_ring$cell) ) interaction_matrix ``` Heatmap of a row-scaled interaction matrix for cells inside rings for cluster 2. The heatmap reveals that clusters 2, 3, and 4 are spatially adjacent to one another. ```{r Heatmap_InteractionMatrix_C2_Ring, fig.height = 4.2, fig.width = 5.6} plotInteractionMatrix(interaction_matrix) ``` The plot of cells inside rings split by clusters further confirms the co-occurrence of clusters 2, 3, 4 in ring region 1. ```{r PlotCellsInside_SplitByCluster_C2_Ring, fig.width=8.4, fig.height=4.2} plotCellsInside(cells_ring) + facet_wrap(~cluster) + Seurat::RotatedAxis() + addBoundaryPoly(ring_regions, linewidth_boundary = 0.1) ``` # DE analysis of cluster 2 cells inside and outside boundaries Most cells in cluster 2 are located in sub region 1. Hence, we focus on the DE analysis between cluster 2 cells inside boundary of sub region 1 and cells in the outside neighboring ring region 1. Get cells inside the boundaries of cluster 2 ```{r} cells_inside <- getCellsInside(data = coords, boundary = bon_points_c2) cells_inside ``` Obtain cluster 2 cells inside and outside boundary of sub region ```{r} cells_in <- subset(cells_inside, region_id == 1 & cluster == 2)[["cell"]] cells_out <- subset(cells_ring, region_id == 1 & cluster == 2)[["cell"]] ``` Perform DE analysis between cells inside boundary and outside boundary using the `limma` framewrok (`lmFit` + `eBayes`) ```{r} tab <- runLimmaDE( exp_mat = logNorm_expr, min.pct = 0.25, cells_reference = cells_in, cells_target = cells_out ) ``` Top DE genes between cells outside boundary and cells inside boundary. The DE genes are ordered by the abstract value of `logFC` by default. ```{r} head(tab[, c("gene", "logFC", "adj.P.Val", "pct.reference", "pct.target")]) ``` Set a random seed to make plotExpression results reproducible ```{r} set.seed(123) ``` Expression of top DE genes in cluster 2 cells outside boundary. The cells are plotted randomly using the given seed when `shuffle = TRUE`. If `shuffle = FALSE`, cells with higher expression values are plotted last (on the top). ```{r Expression_Slc17a7_c2, fig.width = 8, fig.height = 2.5} plotExpression( data = coords[colnames(logNorm_expr), ], exp_mat = logNorm_expr, genes = tab$gene[1:2], sub_plot = TRUE, one_cluster = 2, shuffle = TRUE, point_size = 0.1, angle_x_label = 45 ) ``` Expression of *Slc17a7* for cluster 2 cells inside and outside boundary of sub region 1. Cluster 2 cells outside the boundary show much higher expression of Slc17a7. ```{r Expression_Slc17a7_InOutBoundary,fig.height = 2.8, fig.width = 5.6} plotExpression( data = coords[colnames(logNorm_expr), ], exp_mat = logNorm_expr, genes = "Slc17a7", sub_plot = TRUE, shuffle = TRUE, sub_cells = c(cells_in, cells_out), point_size = 0.3 ) + addBoundaryPoly(bon_polys_c2[1, ], linewidth_boundary = 0.5) ``` Expression of a general marker gene *Sox10* for Oligodendrocytes (cluster 2 cells). *Sox10* is expressed in both cluster 2 cells inside and outside the boundary. This suggests that cluster 2 cells located near the boundary may represent intermediate cell states. ```{r PlotExpression_Sox10,fig.height = 2.8, fig.width = 5.6} plotExpression( data = coords[colnames(logNorm_expr), ], exp_mat = logNorm_expr, sub_cells = c(cells_in, cells_out), sub_plot = TRUE, genes = "Sox10", shuffle = TRUE, point_size = 0.3 ) + addBoundaryPoly(bon_polys_c2[1, ], linewidth_boundary = 0.5) ``` # Spatial DE analysis of cells in cluster 0 along spatial weights ## Detect spatial boundaries Obtain cells in cluster 0 ```{r} cells_c0 <- subset(coords, cluster == 0)[, "cell"] ``` Plot cluster 0 cells without boundary ```{r Plot_NoBoundary_C0, fig.width=8*0.7+1, fig.height=6*0.7} plotBoundary(coords[cells_c0, ]) ``` Get boundaries of cluster 0 ```{r} bon_points_c0 <- getBoundary(data = coords, one_cluster = 0) bon_polys_c0 <- buildBoundaryPoly(bon_points_c0) ``` Plot boundary edges for cluster 0 ```{r PlotBoundary_C0, fig.width=6.6, fig.height=4.2} plotEdge(boundary_poly = bon_polys_c0, linewidth_boundary = 0.6) ``` ## Compute and plot spatial weights Compute spatial weights based on the distance to boundaries ```{r} weights_bon <- computeBoundaryWeights( data = coords, cell_ids = cells_c0, boundary = bon_points_c0 ) ``` Spatial weights are a named vector, where names correspond to cell IDs ```{r} weights_bon[1:3] ``` Alternatively, compute spatial weights based on the distance to centroids ```{r} weights_cen <- computeCentroidWeights(data = coords, cell_ids = cells_c0) ``` Plot boundary weights ```{r, fig.width=6.6, fig.height=4.2} plotWeights(data = coords, weights = weights_bon, point_size = 0.8) + labs(title = "Boundary weights") ``` ## Perform spatial differential analysis along boundary weights Run spatial DE analysis for cluster 0 cells along boundary weights using splines ```{r} tab_sp <- runSpatialDE( exp_mat = logNorm_expr[, cells_c0], spatial_distance = weights_bon, cell_ids = cells_c0 ) ``` Top DE genes along boundary weights. The first spline coefficient (`Z1`) captures the main expression trend along the spatial distance. A positive value indicates increasing expression with distance, while a negative value indicates decreasing expression. ```{r} head(tab_sp) ``` Expression of top DE genes along boundary weights ```{r Expression_top_genes_boundary, fig.width = 8, fig.height = 2.5} plotExpression( data = coords[colnames(logNorm_expr), ], exp_mat = logNorm_expr, genes = tab_sp$gene[1:2], sub_plot = TRUE, one_cluster = 0, shuffle = TRUE, point_size = 0.1, angle_x_label = 45 ) ``` Plot scaled average expression along boundary weight bins for top 10 spatial DE genes. The scaled average expression ranges from 0 to 1 by using min-max normalization method. ```{r Expression_along_boundary_weights_c0, fig.width=6.4, fig.height=4} plotSpatialExpression( exp_mat = logNorm_expr[, cells_c0], spatial_distance = weights_bon, scale_method = "minmax", genes = tab_sp$gene[1:10], label_x = "Boundary weights" ) ``` # Spatial enrichment analysis for each gene Compute spatial enrichment index (SEI) for each gene based on boundary weights. The SEI reflects the extent to which gene expression is enriched in spatially weighted regions. The result is sorted in descending order by `normalized_SEI`. ```{r} SEI_bon <- computeSpatialEnrichmentIndex( exp_mat = logNorm_expr[, cells_c0], weights = weights_bon ) head(SEI_bon) ``` Expression of top genes enriched near boundaries ```{r Expression_top_genes_SEI_boundary, fig.width = 8, fig.height = 2.5} plotExpression( data = coords[colnames(logNorm_expr), ], exp_mat = logNorm_expr, genes = SEI_bon$gene[1:2], sub_plot = TRUE, one_cluster = 0, shuffle = TRUE, point_size = 0.1, angle_x_label = 45 ) ``` # Session Info ```{r} sessionInfo() ``` \pagebreak