--- title: "TabulaMurisSenisData" author: "Charlotte Soneson, Dania Machlab, Federico Marini" date: "`r BiocStyle::doc_date()`" package: "`r BiocStyle::pkg_ver('TabulaMurisSenisData')`" output: BiocStyle::html_document: toc_float: true vignette: > %\VignetteIndexEntry{TabulaMurisSenisData} %\VignetteEncoding{UTF-8} %\VignettePackage{TabulaMurisSenisData} %\VignetteKeywords{GeneExpression, RNASeq, SingleCell} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console bibliography: TabulaMurisSenisData.bib --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, message = FALSE} library(SingleCellExperiment) library(TabulaMurisSenisData) library(ggplot2) ``` # Introduction This package provides access to the processed bulk and single-cell RNA-seq data from the _Tabula Muris Senis_ data set [@Schaum2019-nf;@Tabula_Muris_Consortium2020-um]. The processed bulk RNA-seq data was downloaded from GEO (accession number [GSE132040](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132040)). The single-cell data (droplet and FACS) was obtained from FigShare (see below for links). All data sets are provided as `SingleCellExperiment` objects for easy access and use within the Bioconductor ecosystem. # Bulk RNA-seq data The bulk RNA-seq data can be accessed via the `TabulaMurisSenisBulk()` function. By setting the argument `infoOnly` to `TRUE`, we can get information about the size of the data set before downloading it. ```{r} tmp <- TabulaMurisSenisBulk(infoOnly = TRUE) tms_bulk <- TabulaMurisSenisBulk() tms_bulk ``` We list the available tissues. ```{r} table(colData(tms_bulk)$organ) ``` # Single-cell RNA-seq data ## Droplet data The data files for the droplet single-cell RNA-seq data were downloaded from FigShare: * the [full data (all tissues)](https://figshare.com/articles/dataset/Processed_files_to_use_with_scanpy_/8273102?file=23938934) * the [data from individual tissues](https://figshare.com/articles/dataset/Tabula_Muris_Senis_Data_Objects/12654728) We list the available tissues. ```{r} listTabulaMurisSenisTissues(dataset = "Droplet") ``` As for the bulk data, we can print the size of the data set before downloading it. ```{r} tmp <- TabulaMurisSenisDroplet(tissues = "All", infoOnly = TRUE) tms_droplet <- TabulaMurisSenisDroplet(tissues = "All") tms_droplet ``` We plot the UMAP of the entire data set and color by tissue, to re-create the plot from [here](https://tabula-muris-senis.ds.czbiohub.org). ```{r} # tissue colors tissue_cols <- c(Pancreas = "#3182bd", Thymus = "#31a354", Trachea = "#636363", Bladder = "#637939", Lung = "#7b4173", Large_Intestine = "#843c39", Fat = "#969696", Tongue = "#a1d99b", Mammary_Gland = "#ce6dbd", Limb_Muscle = "#d6616b", Marrow = "#de9ed6", Skin = "#e6550d", Liver = "#e7969c", Heart_and_Aorta = "#e7ba52", Kidney = "#e7cb94", Spleen = "#fd8d3c") # get dataset with all tissues se <- tms_droplet$All se # prepare data set for ggplot ds <- as.data.frame(reducedDim(se, "UMAP")) ds <- cbind(ds, tissue = colData(se)$tissue) head(ds) # plot ggplot(ds, aes(x = UMAP1, y = UMAP2, color = tissue)) + geom_point(size = 0.05) + scale_color_manual(values = tissue_cols) + theme_classic() + guides(colour = guide_legend(override.aes = list(size = 5))) ``` ## FACS data The data files for the FACS single-cell RNA-seq data were downloaded from FigShare: * the [full data (all tissues)](https://figshare.com/articles/dataset/Processed_files_to_use_with_scanpy_/8273102?file=23937842) * the [data from individual tissues](https://figshare.com/articles/dataset/Tabula_Muris_Senis_Data_Objects/12654728) We list the available tissues. ```{r} listTabulaMurisSenisTissues(dataset = "FACS") ``` Also here, we can print the size of the data set before downloading it. ```{r} tmp <- TabulaMurisSenisFACS(tissues = "All", infoOnly = TRUE) tms_facs <- TabulaMurisSenisFACS(tissues = "All") tms_facs ``` We plot the UMAP of the entire data set and color by tissue, to re-create the plot from [here](https://tabula-muris-senis.ds.czbiohub.org). ```{r} # tissue colors tissue_cols <- c(Skin = "#e6550d", Pancreas = "#3182bd", Limb_Muscle = "#d6616b", Heart = "#e7ba52", Spleen = "#fd8d3c", Diaphragm = "#8c6d31", Trachea = "#636363", Tongue = "#a1d99b", Thymus = "#31a354", `Brain_Non-Myeloid` = "#cedb9c", Brain_Myeloid = "#b5cf6b", Bladder = "#637939", Large_Intestine = "#843c39", BAT = "#9c9ede", GAT = "#bd9e39", MAT = "#a55194", SCAT = "#6baed6", Lung = "#7b4173", Liver = "#e7969c", Marrow = "#de9ed6", Kidney = "#e7cb94", Aorta = "#393b79", Mammary_Gland = "#ce6dbd") # get dataset with all tissues se <- tms_facs$All se # prepare data set for ggplot ds <- as.data.frame(reducedDim(se, "UMAP")) ds <- cbind(ds, tissue = colData(se)$tissue) head(ds) # plot ggplot(ds, aes(x = UMAP1, y = UMAP2, color = tissue)) + geom_point(size = 0.05) + scale_color_manual(values = tissue_cols) + theme_classic() + guides(colour = guide_legend(override.aes = list(size = 5))) ``` # Explore data with `iSEE` The _Tabula Muris Senis_ datasets are provided in the form of `SingleCellExperiment` objects. A natural companion to this data structure is the `r Biocpkg("iSEE")` package, which can be used for interactive and reproducible data exploration. Any analysis steps should be performed in advance before calling `iSEE`, and since these datasets can be quite big, the operations can be time consuming, and/or require a considerable amount of resources. ```{r} sce_all_facs <- TabulaMurisSenisFACS(tissues = "All", processedCounts = TRUE)$All sce_all_facs ``` We can see that the `sce_all_facs` contains both raw and processed counts (in the assays slots `counts` and `logcounts`), but also the PCA and UMAP embeddings as provided in the original publication. ```{r} assayNames(sce_all_facs) reducedDimNames(sce_all_facs) ``` If desired, additional processing steps can be performed on the `sce_all_facs` - e.g., storing signature scores (computed via `r Biocpkg("AUCell")`) as `colData` elements. Once these steps are completed, launching `iSEE` is as easy as ```{r include=FALSE} library(iSEE) ``` ```{r launchisee, eval=FALSE} library(iSEE) iSEE(sce_all_facs) ``` This will launch `iSEE` with a standard default set of panels. Optionally, we can configure the `initial` set to start the app in the desired configuration - below, we show how to start `iSEE` with: - a `ReducedDimensionPlot` displaying the UMAP, colored by the expression of a feature; - a `RowDataTable`, for selecting a feature to be transmitted to the other panels (showing all genes whose name contains "Col1"); - a `FeatureAssayPlot`, for showing the expression of the gene selected in the `RowDataTable`, split by tissue. ```{r} initial <- list() ################################################################################ # (Compact) Settings for Reduced dimension plot 1 ################################################################################ initial[["ReducedDimensionPlot1"]] <- new( "ReducedDimensionPlot", DataBoxOpen = TRUE, Type = "UMAP", VisualBoxOpen = TRUE, ColorBy = "Feature name", ColorByFeatureName = "Col1a1", ColorByFeatureSource = "RowDataTable1", ColorByFeatureDynamicSource = FALSE ) ################################################################################ # (Compact) Settings for Row data table 1 ################################################################################ initial[["RowDataTable1"]] <- new( "RowDataTable", Selected = "Col1a1", Search = "Col1" ) ################################################################################ # (Compact) Settings for Feature assay plot 1 ################################################################################ initial[["FeatureAssayPlot1"]] <- new( "FeatureAssayPlot", DataBoxOpen = TRUE, Assay = "logcounts", XAxis = "Column data", XAxisColumnData = "tissue", YAxisFeatureName = "Col1a1", YAxisFeatureSource = "RowDataTable1" ) ```

Click here to display the complete configuration chunk

```{r eval=FALSE} initial <- list() ################################################################################ # Settings for Reduced dimension plot 1 ################################################################################ initial[["ReducedDimensionPlot1"]] <- new("ReducedDimensionPlot", Type = "UMAP", XAxis = 1L, YAxis = 2L, FacetRowByColData = "FACS.selection", FacetColumnByColData = "FACS.selection", ColorByColumnData = "FACS.selection", ColorByFeatureNameAssay = "logcounts", ColorBySampleNameColor = "#FF0000", ShapeByColumnData = "FACS.selection", SizeByColumnData = "n_genes", FacetRowBy = "None", FacetColumnBy = "None", ColorBy = "Feature name", ColorByDefaultColor = "#000000", ColorByFeatureName = "Col1a1", ColorByFeatureSource = "RowDataTable1", ColorByFeatureDynamicSource = FALSE, ColorBySampleName = "A10_B000497_B009023_S10.mm10-plus-0-0", ColorBySampleSource = "---", ColorBySampleDynamicSource = FALSE, ShapeBy = "None", SizeBy = "None", SelectionAlpha = 0.1, ZoomData = numeric(0), BrushData = list(), VisualBoxOpen = TRUE, VisualChoices = "Color", ContourAdd = FALSE, ContourColor = "#0000FF", PointSize = 1, PointAlpha = 1, Downsample = FALSE, DownsampleResolution = 200, CustomLabels = FALSE, CustomLabelsText = "A10_B000497_B009023_S10.mm10-plus-0-0", FontSize = 1, LegendPointSize = 1, LegendPosition = "Bottom", HoverInfo = TRUE, LabelCenters = FALSE, LabelCentersBy = "FACS.selection", LabelCentersColor = "#000000", VersionInfo = list(iSEE = structure(list( c(2L, 5L, 1L)), class = c("package_version", "numeric_version" ))), PanelId = c(ReducedDimensionPlot = 1L), PanelHeight = 500L, PanelWidth = 4L, SelectionBoxOpen = FALSE, RowSelectionSource = "---", ColumnSelectionSource = "---", DataBoxOpen = FALSE, RowSelectionDynamicSource = FALSE, ColumnSelectionDynamicSource = FALSE, RowSelectionRestrict = FALSE, ColumnSelectionRestrict = FALSE, SelectionHistory = list()) ################################################################################ # Settings for Row data table 1 ################################################################################ initial[["RowDataTable1"]] <- new("RowDataTable", Selected = "Col1a1", Search = "Col1", SearchColumns = c("", "", "", "", ""), HiddenColumns = character(0), VersionInfo = list( iSEE = structure(list(c(2L, 5L, 1L)), class = c("package_version", "numeric_version"))), PanelId = c(RowDataTable = 1L), PanelHeight = 500L, PanelWidth = 4L, SelectionBoxOpen = FALSE, RowSelectionSource = "---", ColumnSelectionSource = "---", DataBoxOpen = FALSE, RowSelectionDynamicSource = FALSE, ColumnSelectionDynamicSource = FALSE, RowSelectionRestrict = FALSE, ColumnSelectionRestrict = FALSE, SelectionHistory = list()) ################################################################################ # Settings for Feature assay plot 1 ################################################################################ initial[["FeatureAssayPlot1"]] <- new("FeatureAssayPlot", Assay = "logcounts", XAxis = "Column data", XAxisColumnData = "tissue", XAxisFeatureName = "0610005C13Rik", XAxisFeatureSource = "---", XAxisFeatureDynamicSource = FALSE, YAxisFeatureName = "Col1a1", YAxisFeatureSource = "RowDataTable1", YAxisFeatureDynamicSource = FALSE, FacetRowByColData = "FACS.selection", FacetColumnByColData = "FACS.selection", ColorByColumnData = "age", ColorByFeatureNameAssay = "logcounts", ColorBySampleNameColor = "#FF0000", ShapeByColumnData = "FACS.selection", SizeByColumnData = "n_genes", FacetRowBy = "None", FacetColumnBy = "None", ColorBy = "None", ColorByDefaultColor = "#000000", ColorByFeatureName = "0610005C13Rik", ColorByFeatureSource = "---", ColorByFeatureDynamicSource = FALSE, ColorBySampleName = "A10_B000497_B009023_S10.mm10-plus-0-0", ColorBySampleSource = "---", ColorBySampleDynamicSource = FALSE, ShapeBy = "None", SizeBy = "None", SelectionAlpha = 0.1, ZoomData = numeric(0), BrushData = list(), VisualBoxOpen = FALSE, VisualChoices = "Color", ContourAdd = FALSE, ContourColor = "#0000FF", PointSize = 1, PointAlpha = 1, Downsample = FALSE, DownsampleResolution = 200, CustomLabels = FALSE, CustomLabelsText = "A10_B000497_B009023_S10.mm10-plus-0-0", FontSize = 1, LegendPointSize = 1, LegendPosition = "Bottom", HoverInfo = TRUE, LabelCenters = FALSE, LabelCentersBy = "FACS.selection", LabelCentersColor = "#000000", VersionInfo = list(iSEE = structure(list( c(2L, 5L, 1L)), class = c("package_version", "numeric_version" ))), PanelId = c(FeatureAssayPlot = 1L), PanelHeight = 500L, PanelWidth = 4L, SelectionBoxOpen = FALSE, RowSelectionSource = "---", ColumnSelectionSource = "---", DataBoxOpen = FALSE, RowSelectionDynamicSource = FALSE, ColumnSelectionDynamicSource = FALSE, RowSelectionRestrict = FALSE, ColumnSelectionRestrict = FALSE, SelectionHistory = list()) initial ```
`iSEE` can then be launched with the following command: ```{r eval=FALSE} iSEE(sce_all_facs, initial = initial) ``` ```{r echo=FALSE, fig.cap="Screenshot of the iSEE app running on the FACS single cell dataset", out.width='100%'} knitr::include_graphics("ss_iSEE_facsdataset.jpg") ``` Note that these lengthy configuration options can be readily exported by clicking on the dedicated button "Display panel settings" in `iSEE` - we can do that anytime after we are done designing the configuration using the app interface. # Session info {-} ```{r} sessionInfo() ``` # References