---
title: "Integrative clustering analysis of single cell multi-omics data"
author:
name: "Qianxing Mo"
affiliation: "Department of Biostatistics and Bioinformatics
H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612"
output:
html_document:
theme: united
df_print: kable
date: 'Compiled: `r format(Sys.Date(), "%B %d, %Y")`'
---
```{r setup, include=FALSE}
all_times <- list() # store the time for each chunk
knitr::knit_hooks$set(time_it = local({
now <- NULL
function(before, options) {
if (before) {
now <<- Sys.time()
} else {
res <- difftime(Sys.time(), now)
all_times[[options$label]] <<- res
}
}
}))
knitr::opts_chunk$set(
message = FALSE,
warning = FALSE,
time_it = TRUE,
error = TRUE
)
```
This tutorial demonstrates how to perform vertical integrative clustering (iCluster) analysis of single cell multi-omics data. The iCluster2b function in the iClusterPlus package will be used as an example. The iCluster2b implements an updated iCluster method originally reported by [Shen, Wang and Mo. 2023](https://pubmed.ncbi.nlm.nih.gov/24587839/). The CITE-seq dataset ([Stuart, Butler et al., Cell 2019](https://www.cell.com/cell/fulltext/S0092-8674(19)30559-8)) consisting of scRNA-seq and ADT (antibody-derived tag) data of 30,672 cells from human bone marrow is used for the demonstration. The dataset can be obtained by installing the R package [SeuratData](https://github.com/satijalab/seurat-data).
We use the R scripts of the Seurat's tutorial [Weighted Nearest Neighbor Analysis](https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis) to process the scRNA-seq and ADT data. The iCluster analysis of single cell multi-omics data can be carried out in 4 steps as described in the following.
```{r packages, cache=FALSE}
library(iClusterPlus)
library(lattice)
library(gplots)
library(Seurat)
library(SeuratData)
library(cowplot)
library(dplyr)
library(Rphenograph)
library(umap)
library(mclust)
library(knitr)
```
# Step 1: Processing data using the Seurat package
The scRNA-seq and ADT data are processed separately, which includes normalization, feature selection, scaling and dimension reduction.
```{r data procession}
InstallData("bmcite")
bm <- LoadData(ds = "bmcite")
DefaultAssay(bm) <- 'RNA'
bm <- NormalizeData(bm) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA()
# process ADT data; all the 25 features are used for dimensional reduction
DefaultAssay(bm) <- 'ADT'
VariableFeatures(bm) <- rownames(bm[["ADT"]])
bm <- NormalizeData(bm, normalization.method = 'CLR', margin = 2) %>% ScaleData() %>% RunPCA(reduction.name = 'apca')
```
# Step 2: Perform iCluster modeling
The iCluster modeling generates a latent factor matrix (meanZ), which capture the major sources of variation of the multi-omics data (scRNA-seq and ADT). As a result, the latent factor matrix can be used for sample clustering and visualization.
```{r PCA variance, fig.height=6, fig.width = 10}
# input data for iCluster2b
BMlist <- list(ADT=t(bm[['ADT']]$scale.data),RNA=t(bm[['RNA']]$scale.data))
# PCA variance plot
pcaVarPlot(BMlist,K=50)
```
From the variance plot, we can see that the explained variance starts to level off around the 10th PC. For iCluster modeling of single cell multi-omics data, we suggest selecting the PC at which the explained variance levels off. As an example, we select K=20 as the number of latent factors for iCluster modeling. It can be seen that the clustering results are quite consistent for a range of numbers (e.g., K = 20 - 30).
```{r iCluster analysis}
# since features have been selected, we set lambda to a small value to minimize feature selection
date()
bm.ic = iCluster2b(xList=BMlist,K=20,lambda=list(0.1,0.1),method=c("lasso","lasso"))
date()
```
# Step 3: Clustering of cells using the latent factor matrix
We use the Louvain method implemented in the Rphenograph package to perform cell clustering analysis. Then, we project the cells to a 2-dimensional space using UMAP for visualization.
```{r clustering and UMAP}
## clustering of cells using Louvain method and the latent factor matrix generated by iCluster
ic.graph <- Rphenograph(bm.ic$meanZ, k=30)
tabclst <- table(membership(ic.graph[[2]]))
## order the clusters based on cluster size
clstlevels <- as.numeric(names(tabclst)[order(tabclst,decreasing = TRUE)])
bm.iClusters <- factor(membership(ic.graph[[2]]),levels=clstlevels)
names(bm.iClusters) <- rownames(BMlist$ADT)
# Projecting cells to 2-dimensional space using UMAP
ic.umap <- umap(bm.ic$meanZ)
rownames(ic.umap$layout) <- rownames(BMlist$ADT)
colnames(ic.umap$layout) <- c("umap1","umap2")
## put umap layout and cluster IDs into the Seurat object bm
bm[["ic.umap"]] <- CreateDimReducObject(embeddings = ic.umap$layout, key = "ic_umap")
bm$iCluster <- bm.iClusters
```
For comparison, we perform unsupervised clustering analysis using Seurat's WNN method. For more details of WNN analysis, we refer users to the Seurat's tutorial ([Weighted Nearest Neighbor Analysis](https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis)).
```{r wnn analysis}
# Generate WNN graph
bm <- FindMultiModalNeighbors(bm, reduction.list = list("pca", "apca"),dims.list <- list(1:30, 1:18), k.nn = 20,
knn.graph.name = "wknn",snn.graph.name = "wsnn",modality.weight.name = "RNA.weight")
# multimodal neighbor matrix used for UMAP can be accessed using bm[['weighted.nn']]
# WNN graph can be accessed using bm[["wknn"]]
# SNN graph used for clustering can be accessed at bm[["wsnn"]]
bm <- RunUMAP(bm, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_")
bm <- FindClusters(bm, graph.name = "wsnn", algorithm = 3, resolution = 2, verbose = FALSE)
```
# Step 4: Visualization of cell clusters
We use Seurat's function DimPlot to visualize the cells on a 2-dimensional UMAP space.
1. Visualization of cell clusters generated by iCluster analysis, alongside cell annotations
```{r iCluster.plot, fig.width = 10}
p1 <- DimPlot(bm, reduction = "ic.umap", group.by = "iCluster", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("iCluster")
p2 <- DimPlot(bm, reduction = "ic.umap", group.by = "celltype.l2", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("celltype")
p1 + p2
```
2. Heatmap of cell clusters generated by iCluster analysis
```{r iCluster heatmap, fig.height=6, fig.width=11}
col.scheme <- alist()
col.scheme[[1]] <- bluered(256)
col.scheme[[2]] <- bluered(256)
bm.ic$clusters <- bm.iClusters
ic.hm <- plotHeatmap(fit=bm.ic,xList=BMlist,type=c("gaussian","gaussian"),
threshold=rep(0.25, 2),feature.order=c(T,T),feature.scale=c(F,F),
dist.method = "euclidean", hclust.method="ward.D",col.scheme = col.scheme,width=10,
sparse=c(F,T),cap=c(0.99,0.99))
```
3. Visualization of cell clusters generated by Seurat WNN analysis, alongside cell annotations
```{r seurat clusters plot, fig.width=10}
p3 <- DimPlot(bm, reduction = "wnn.umap", group.by = "seurat_clusters", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("Seurat")
p4 <- DimPlot(bm, reduction = "wnn.umap", group.by = "celltype.l2", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("celltype")
p3 + p4
```
4. Heatmap of cell clusters generated by Seurat WNN analysis
```{r Seurat heatmap, fig.height=6, fig.width=11}
bm.ic$clusters <- bm$seurat_clusters
wnn.hm <- plotHeatmap(fit=bm.ic,xList=BMlist,type=c("gaussian","gaussian"),
threshold=rep(0.25, 2),feature.order=c(T,T),feature.scale=c(F,F),
dist.method = "euclidean", hclust.method="ward.D",col.scheme = col.scheme,width=10,
sparse=c(F,T),cap=c(0.99,0.99))
```
Finally, we compare the integrative clustering results of the iCluster and Seurat WNN using adjusted rand index. In general, the higher of the index, the better the results.
```{r iClusters vs. cell annoations}
# iCluster vs. cell type annotation
adjustedRandIndex(bm$iCluster, bm$celltype.l2)
# seurat WNN clusters vs. cell type annotation
adjustedRandIndex(bm$seurat_clusters, bm$celltype.l2)
# iClusters vs. seurat WNN clusters
adjustedRandIndex(bm$iCluster, bm$seurat_clusters)
# table of annotated cell types and iClusters
icTable <- table(bm$celltype.l2,bm.iClusters)
knitr::kable(icTable)
```
**Session Info**
```{r}
sessionInfo()
```