--- title: "Integrative clustering analysis of single cell multi-omics data" author: name: "Qianxing Mo" affiliation: "Department of Biostatistics and Bioinformatics
H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612" output: html_document: theme: united df_print: kable date: 'Compiled: `r format(Sys.Date(), "%B %d, %Y")`' --- ```{r setup, include=FALSE} all_times <- list() # store the time for each chunk knitr::knit_hooks$set(time_it = local({ now <- NULL function(before, options) { if (before) { now <<- Sys.time() } else { res <- difftime(Sys.time(), now) all_times[[options$label]] <<- res } } })) knitr::opts_chunk$set( message = FALSE, warning = FALSE, time_it = TRUE, error = TRUE ) ``` This tutorial demonstrates how to perform vertical integrative clustering (iCluster) analysis of single cell multi-omics data. The iCluster2b function in the iClusterPlus package will be used as an example. The iCluster2b implements an updated iCluster method originally reported by [Shen, Wang and Mo. 2023](https://pubmed.ncbi.nlm.nih.gov/24587839/). The CITE-seq dataset ([Stuart, Butler et al., Cell 2019](https://www.cell.com/cell/fulltext/S0092-8674(19)30559-8)) consisting of scRNA-seq and ADT (antibody-derived tag) data of 30,672 cells from human bone marrow is used for the demonstration. The dataset can be obtained by installing the R package [SeuratData](https://github.com/satijalab/seurat-data). We use the R scripts of the Seurat's tutorial [Weighted Nearest Neighbor Analysis](https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis) to process the scRNA-seq and ADT data. The iCluster analysis of single cell multi-omics data can be carried out in 4 steps as described in the following. ```{r packages, cache=FALSE} library(iClusterPlus) library(lattice) library(gplots) library(Seurat) library(SeuratData) library(cowplot) library(dplyr) library(Rphenograph) library(umap) library(mclust) library(knitr) ``` # Step 1: Processing data using the Seurat package The scRNA-seq and ADT data are processed separately, which includes normalization, feature selection, scaling and dimension reduction. ```{r data procession} InstallData("bmcite") bm <- LoadData(ds = "bmcite") DefaultAssay(bm) <- 'RNA' bm <- NormalizeData(bm) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA() # process ADT data; all the 25 features are used for dimensional reduction DefaultAssay(bm) <- 'ADT' VariableFeatures(bm) <- rownames(bm[["ADT"]]) bm <- NormalizeData(bm, normalization.method = 'CLR', margin = 2) %>% ScaleData() %>% RunPCA(reduction.name = 'apca') ``` # Step 2: Perform iCluster modeling The iCluster modeling generates a latent factor matrix (meanZ), which capture the major sources of variation of the multi-omics data (scRNA-seq and ADT). As a result, the latent factor matrix can be used for sample clustering and visualization. ```{r PCA variance, fig.height=6, fig.width = 10} # input data for iCluster2b BMlist <- list(ADT=t(bm[['ADT']]$scale.data),RNA=t(bm[['RNA']]$scale.data)) # PCA variance plot pcaVarPlot(BMlist,K=50) ``` From the variance plot, we can see that the explained variance starts to level off around the 10th PC. For iCluster modeling of single cell multi-omics data, we suggest selecting the PC at which the explained variance levels off. As an example, we select K=20 as the number of latent factors for iCluster modeling. It can be seen that the clustering results are quite consistent for a range of numbers (e.g., K = 20 - 30). ```{r iCluster analysis} # since features have been selected, we set lambda to a small value to minimize feature selection date() bm.ic = iCluster2b(xList=BMlist,K=20,lambda=list(0.1,0.1),method=c("lasso","lasso")) date() ``` # Step 3: Clustering of cells using the latent factor matrix We use the Louvain method implemented in the Rphenograph package to perform cell clustering analysis. Then, we project the cells to a 2-dimensional space using UMAP for visualization. ```{r clustering and UMAP} ## clustering of cells using Louvain method and the latent factor matrix generated by iCluster ic.graph <- Rphenograph(bm.ic$meanZ, k=30) tabclst <- table(membership(ic.graph[[2]])) ## order the clusters based on cluster size clstlevels <- as.numeric(names(tabclst)[order(tabclst,decreasing = TRUE)]) bm.iClusters <- factor(membership(ic.graph[[2]]),levels=clstlevels) names(bm.iClusters) <- rownames(BMlist$ADT) # Projecting cells to 2-dimensional space using UMAP ic.umap <- umap(bm.ic$meanZ) rownames(ic.umap$layout) <- rownames(BMlist$ADT) colnames(ic.umap$layout) <- c("umap1","umap2") ## put umap layout and cluster IDs into the Seurat object bm bm[["ic.umap"]] <- CreateDimReducObject(embeddings = ic.umap$layout, key = "ic_umap") bm$iCluster <- bm.iClusters ``` For comparison, we perform unsupervised clustering analysis using Seurat's WNN method. For more details of WNN analysis, we refer users to the Seurat's tutorial ([Weighted Nearest Neighbor Analysis](https://satijalab.org/seurat/articles/weighted_nearest_neighbor_analysis)). ```{r wnn analysis} # Generate WNN graph bm <- FindMultiModalNeighbors(bm, reduction.list = list("pca", "apca"),dims.list <- list(1:30, 1:18), k.nn = 20, knn.graph.name = "wknn",snn.graph.name = "wsnn",modality.weight.name = "RNA.weight") # multimodal neighbor matrix used for UMAP can be accessed using bm[['weighted.nn']] # WNN graph can be accessed using bm[["wknn"]] # SNN graph used for clustering can be accessed at bm[["wsnn"]] bm <- RunUMAP(bm, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_") bm <- FindClusters(bm, graph.name = "wsnn", algorithm = 3, resolution = 2, verbose = FALSE) ``` # Step 4: Visualization of cell clusters We use Seurat's function DimPlot to visualize the cells on a 2-dimensional UMAP space. 1. Visualization of cell clusters generated by iCluster analysis, alongside cell annotations ```{r iCluster.plot, fig.width = 10} p1 <- DimPlot(bm, reduction = "ic.umap", group.by = "iCluster", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("iCluster") p2 <- DimPlot(bm, reduction = "ic.umap", group.by = "celltype.l2", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("celltype") p1 + p2 ``` 2. Heatmap of cell clusters generated by iCluster analysis ```{r iCluster heatmap, fig.height=6, fig.width=11} col.scheme <- alist() col.scheme[[1]] <- bluered(256) col.scheme[[2]] <- bluered(256) bm.ic$clusters <- bm.iClusters ic.hm <- plotHeatmap(fit=bm.ic,xList=BMlist,type=c("gaussian","gaussian"), threshold=rep(0.25, 2),feature.order=c(T,T),feature.scale=c(F,F), dist.method = "euclidean", hclust.method="ward.D",col.scheme = col.scheme,width=10, sparse=c(F,T),cap=c(0.99,0.99)) ``` 3. Visualization of cell clusters generated by Seurat WNN analysis, alongside cell annotations ```{r seurat clusters plot, fig.width=10} p3 <- DimPlot(bm, reduction = "wnn.umap", group.by = "seurat_clusters", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("Seurat") p4 <- DimPlot(bm, reduction = "wnn.umap", group.by = "celltype.l2", label = TRUE, label.size = 2.5, repel = TRUE) + NoLegend() + ggtitle("celltype") p3 + p4 ``` 4. Heatmap of cell clusters generated by Seurat WNN analysis ```{r Seurat heatmap, fig.height=6, fig.width=11} bm.ic$clusters <- bm$seurat_clusters wnn.hm <- plotHeatmap(fit=bm.ic,xList=BMlist,type=c("gaussian","gaussian"), threshold=rep(0.25, 2),feature.order=c(T,T),feature.scale=c(F,F), dist.method = "euclidean", hclust.method="ward.D",col.scheme = col.scheme,width=10, sparse=c(F,T),cap=c(0.99,0.99)) ``` Finally, we compare the integrative clustering results of the iCluster and Seurat WNN using adjusted rand index. In general, the higher of the index, the better the results. ```{r iClusters vs. cell annoations} # iCluster vs. cell type annotation adjustedRandIndex(bm$iCluster, bm$celltype.l2) # seurat WNN clusters vs. cell type annotation adjustedRandIndex(bm$seurat_clusters, bm$celltype.l2) # iClusters vs. seurat WNN clusters adjustedRandIndex(bm$iCluster, bm$seurat_clusters) # table of annotated cell types and iClusters icTable <- table(bm$celltype.l2,bm.iClusters) knitr::kable(icTable) ```
**Session Info** ```{r} sessionInfo() ```