--- title: "scCATCH tutorial" author: "Xin Shao" date: "`r Sys.Date()`" output: prettydoc::html_pretty: theme: cayman highlight: github vignette: > %\VignetteIndexEntry{scCATCH tutorial} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This package is a single cell Cluster-based auto-Annotation Toolkit for Cellular Heterogeneity (scCATCH) from cluster potential marker genes identification to cluster annotation based on evidence-based score by matching the potential marker genes with known cell markers in tissue-specific cell taxonomy reference database (CellMatch). The scCATCH mainly includes two function `findmarkergene` and `findcelltype` to realize the automatic annotation for each identified cluster. scCATCH can be used to annotate scRNA-seq data from tissue with cancer and without cancer. ### General usage [1] For scRNA-seq data, we suggest to revise the gene symbols with `rev_gene()`. `geneinfo` is the system data.frame containing the information of human and mouse from NCBI gene(updated in June. 19, 2022). To use your own `geneinfo` data.frame, please refer to `demo_geneinfo` to build a new one, e.g., rat, zebrafish, Drosophila, C. elegans, etc. ```{r setup, echo=TRUE} library(scCATCH) load(paste0(system.file(package = "scCATCH"), "/extdata/mouse_kidney_203.rda")) # demo_geneinfo demo_geneinfo() # revise gene symbols mouse_kidney_203 <- rev_gene(data = mouse_kidney_203, data_type = "data", species = "Mouse", geneinfo = geneinfo) ``` [2] create scCATCH object with `createscCATCH()`. Users need to provide the normalized data and the cluster for each cell. ```{r createscCATCH, echo=TRUE} obj <- createscCATCH(data = mouse_kidney_203, cluster = mouse_kidney_203_cluster) ``` [3] find highly expressed genes with `findmarkergene()`. Users need to provided the speices, tissue, or cancer information. `cellmatch` is the system data.frame containing the known markers of human and mouse. To use your own marker data.frame, please refer to `demo_marker` to build a new one, e.g., rat, zebrafish, Drosophila, C. elegans, etc. ```{r findmarkergene, echo=TRUE} # demo_geneinfo demo_marker() # find highly expressed genes obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney") ``` [4] Evidence-based score and annotation for each cluster with `findcelltype()` ```{r findcelltype, echo=TRUE} obj <- findcelltype(object = obj) # Results is stored in obj obj@celltype ``` Note: There two methods to find marker genes. Set `use_method` `1` to compare with every other cluster and `2` to compare with other clusters together like the strategy in `Seurat`. Besides, when setting `use_method` `1`, users can set `comp_cluster`, it represent the number of clusters to compare. Default is to compare all other cluster for each cluster. Set it between 1 and length of unique clusters. More marker genes will be obtained for smaller `comp_cluster`. ```{r findcelltype_new, echo=TRUE} # The most strict condition to identify marker genes obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch,tissue = "Kidney", use_method = "1") # The most loose condition to identify marker genes obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch, tissue = "Kidney", use_method = "2") # Other conditions to identify marker genes obj <- findmarkergene(object = obj,species = "Mouse", marker = cellmatch, tissue = "Kidney", use_method = "1", comp_cluster = 1) ``` Moreover, users can adjust the `cell_min_pct`, `logfc`, and `pvalue` to identify the different marker genes. ### Custom usage Users are allowed to use the custom `cellmatch` for cell type prediction when [1] users want to select different combination of tissues or cancers for annotation; [2] users want to add more marker genes to `cellmatch` for annotation; [3] users want to use markers from different species other than human and mouse. In this way, please set `if_use_custom_marker` `TRUE` in `findmarkergene()` function and do not need to set `species`,`tissue`, and `cancer` [1] Different combination of tissues or cancers ```{r findcelltype1, echo=TRUE} # Example cellmatch_new <- cellmatch[cellmatch$species == "Mouse" & cellmatch$tissue %in% c("Kidney", "Liver", "Lung", "Brain"), ] obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new) obj <- findcelltype(obj) # Example cellmatch_new <- cellmatch[cellmatch$species == "Mouse" & cellmatch$cancer %in% c("Lung Cancer", "Lymph node", "Renal Cell Carcinoma", "Prostate Cancer"), ] obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new) obj <- findcelltype(obj) # Example cellmatch_new <- cellmatch[cellmatch$species == "Mouse", ] cellmatch_new <- cellmatch[cellmatch$cancer %in% c("Lung Cancer", "Lymph node", "Renal Cell Carcinoma", "Prostate Cancer") | cellmatch$tissue %in% c("Kidney", "Liver", "Lung", "Brain"), ] obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new) obj <- findcelltype(obj) ``` [2] Add more marker genes to `cellmatch` for annotation ```{r findcelltype2, echo=TRUE} # Example # cellmatch_new is provided by users # cellmatch_new <- rbind(cellmatch, cellmatch_new) # Then use the new cellmatch # a. define the species, tissue, and cancer obj <- findmarkergene(object = obj, species = "Mouse", marker = cellmatch_new, tissue = "Kidney") obj <- findcelltype(obj) # b. directly use custom cellmatch obj <- findmarkergene(object = obj, if_use_custom_marker = TRUE, marker = cellmatch_new) obj <- findcelltype(obj) ``` [3] Use markers from different species ```{r findcelltype3, echo=TRUE} # Please refer to demo_marker to build a marker data.frame (new_cellmatch) for another species, e.g., rat # Then use the new marker obj <- findmarkergene(object = obj, species = "Rat", if_use_custom_marker = TRUE, marker = cellmatch_new, tissue = "Kidney") obj <- findcelltype(obj) ``` ### About Please refer to the [scCATCH](https://github.com/ZJUFanLab/scCATCH) on GitHub for more information. Available tissues and cancers see the [wiki page](https://github.com/ZJUFanLab/scCATCH/wiki) ### Cite Shao et al., scCATCH:Automatic Annotation on Cell Types of Clusters from Single-Cell RNA Sequencing Data, iScience, Volume 23, Issue 3, 27 March 2020. [doi: 10.1016/j.isci.2020.100882](https://www.sciencedirect.com/science/article/pii/S2589004220300663). [PMID:32062421](https://pubmed.ncbi.nlm.nih.gov/32062421/)