---
title: "Introduction to scLang"
author: "Andrei-Florian Stoica"
package: scLang
date: March 3, 2026
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{scLang}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

# Introduction

`scLang` is a suite of tools for developing scRNA-seq analysis packages. It
offers functions that can operate on both `Seurat` and `SingleCellExperiment`
objects. These functions are primarily intended to help developers build tools
compatible with both types of input.

# Installation

To install `scLang`, run the following commands in an R session:

```{r setup, eval=FALSE}
if (!require("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("scLang")
```

# Prerequisites

In addition to `scLang`, you need to install
[scater](https://bioconductor.org/packages/release/bioc/html/scater.html) and
[scRNAseq](https://bioconductor.org/packages/release/data/experiment/html/scRNAseq.html)
for this tutorial.

# Loading and preparing data

This tutorial uses an scRNA-seq human pancreas dataset. After loading the
required packages, download the dataset using the `BaronPancreasData` function
from `scRNAseq`. The dataset will be stored as a `SingleCellExperiment` object.

```{r message=FALSE, warning=FALSE, results=FALSE}
library(scLang)
library(scRNAseq)
library(scater)
library(Seurat)

sceObj <- BaronPancreasData('human')
```

Next, we will normalize and log-transform the data using the `logNormCounts`
function from `scuttle` (loaded automatically with `scater`):

```{r message=FALSE, warning=FALSE, results=FALSE}
sceObj <- logNormCounts(sceObj)
```

We will also need PCA and UMAP dimensions.
These can be computed using the `runPCA` and `runUMAP` functions from `scater`:

```{r message=FALSE, warning=FALSE, results=FALSE}
sceObj <- runPCA(sceObj)
sceObj <- runUMAP(sceObj)
```

Now we will convert the dataset to a `Seurat` object:

```{r message=FALSE, warning=FALSE, results=FALSE}
seuratObj <- as.Seurat(sceObj)
```

# Extracting the expression matrix

The `scExpMat` function extracts the expression matrix from a `Seurat` or
`SingleCellExperiment` object:

```{r message=FALSE, warning=FALSE}
mat1 <- scExpMat(sceObj)
dim(mat1)

mat2 <- scExpMat(seuratObj)
identical(mat1, mat2)
```

**Note**: By default, this function extracts normalized and log-transformed
data, looking for the `data` assay for a `Seurat` object, and for the
`logcounts` assay for a `SingleCellExperiment` object. This behavior can be
changed using the `dataType` parameter.

`scExpMat` can also take a matrix as an argument. This option is useful when
building functions that accept either a single-cell expression matrix or an
object of a dedicated class (`Seurat`, `SingleCellExperiment`) as input:

```{r}
mat2 <- scExpMat(mat1)
identical(mat1, mat2)
```

By default, `scExpMat` converts the expression matrix to a dense matrix. If
this behavior is not desired, conversion can be skipped by setting `densify`
to `FALSE`:

```{r}
# Class of the default (dense) output
is(mat1)[1]

# Class of the output when densification is skipped
mat2 <- scExpMat(sceObj, densify=FALSE)
is(mat2)[1]
```

`scExpMat` can also extract the expression data only for selected genes:

```{r}
mat2 <- scExpMat(sceObj, genes=rownames(sceObj)[seq(30, 80)])
dim(mat2)
```

# Extracting and altering metadata/coldata columns

The `scCol` function extracts a column from the metadata of a `Seurat` object
or the coldata of a `SingleCellExperiment` object:

```{r}
col1 <- scCol(seuratObj, 'label')
col2 <- scCol(sceObj, 'label')
identical(col1, col2)
head(col1)
```

It can also be used to insert a new column.
Here, we just make a modified copy of the `label` column for the `Seurat`
object:

```{r}
scCol(seuratObj, 'labelCopy') <- paste0(scCol(seuratObj, 'label'), '_copy')
head(seuratObj[['labelCopy']])
```

# Extracting and altering the metadata/coldata or its column names

The `metadataDF` function extracts the metadata/coldata data frame from a
`Seurat` or `SingleCellExperiment` object:

```{r}
df1 <- metadataDF(seuratObj)
df2 <- metadataDF(sceObj)
identical(df1, df2)
head(df1)[, c(1, 2)]
```

The `metadataNames` function extracts the column names of the metadata/coldata
data frame from a `Seurat` or `SingleCellExperiment` object:

```{r}
colNames1 <- metadataNames(seuratObj)
colNames2 <- metadataNames(sceObj)
identical(colNames1, colNames2)
head(colNames1)
```

# Creating frequency tables

The `scColCounts` and `scColPairCounts` functions are wrappers around
`dplyr::count`. They count the frequencies of elements from one or two
categorical columns in a `Seurat` or `SingleCellExperiment` object:

```{r}
freq1 <- scColCounts(sceObj, 'donor')
freq2 <- scColCounts(seuratObj, 'donor')
identical(freq1, freq2)
head(freq1)

freq1 <- scColPairCounts(sceObj, 'donor', 'label')
freq2 <- scColPairCounts(seuratObj, 'donor', 'label')
identical(freq1, freq2)
head(freq1)
```

# Visualization

`scLang` includes three visualization functions that adapt `Seurat`
visualization tools, extending their usage to `SingleCellExperiment` objects
in addition to `Seurat` objects.

The `dimPlot` function mimics the essential behavior of the `DimPlot` function
from `Seurat`:

```{r}
dimPlot(sceObj, groupBy='label')
```

The `featurePlot` function mimics the `FeaturePlot` function from `Seurat`
(though using a different color scheme):

```{r}
featurePlot(sceObj, 'SOX4')
```

The `violinPlot` function mimics the `VlnPlot` function from `Seurat`:

```{r}
violinPlot(sceObj, 'SOX4', groupBy='label')
```

# Session information {-}

```{r}
sessionInfo()
```