---
title: "Introduction to scLang"
author: "Andrei-Florian Stoica"
package: scLang
date: March 3, 2026
output: BiocStyle::html_document
vignette: >
    %\VignetteIndexEntry{scLang}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

`scLang` is a toolkit for developing scRNA-seq analysis packages. 
It provides functions that operate on both `Seurat` and 
`SingleCellExperiment` objects, helping developers build tools that 
accept either type of input.

# Installation

To install `scLang`, run the following commands in an R session:

```{r setup, eval=FALSE}
if (!require("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("scLang")
```

# Prerequisites

In addition to `scLang`, you need to install 
[scater](https://bioconductor.org/packages/release/bioc/html/scater.html) and 
[scRNAseq](https://bioconductor.org/packages/release/data/experiment/html/scRNAseq.html)
for this tutorial.

# Loading and preparing data

This tutorial uses a human pancreas scRNA-seq dataset. After loading the 
required packages, download the dataset using the `BaronPancreasData` function 
from `scRNAseq`. The dataset is returned as a `SingleCellExperiment` object. 

```{r message=FALSE, warning=FALSE, results=FALSE}
library(scLang)
library(scRNAseq)
library(scater)
library(Seurat)

sceObj <- BaronPancreasData('human')
```

Next, we will normalize and log-transform the data using the `logNormCounts` 
function from `scuttle` (loaded automatically with `scater`):

```{r message=FALSE, warning=FALSE, results=FALSE}
sceObj <- logNormCounts(sceObj)
```
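To make this step concrete, the sketch below reproduces the arithmetic on a toy count matrix in base R. It assumes the default settings of `logNormCounts` (size factors derived from library sizes and scaled to a mean of 1, followed by a log2 transformation with a pseudocount of 1). The object names are hypothetical and the code is an illustration, not the `scuttle` implementation:

```{r}
# Toy count matrix: 2 genes x 3 cells with library sizes 4, 8 and 12
toyCounts <- matrix(c(1, 3, 2, 6, 3, 9), nrow=2,
                    dimnames=list(c('geneA', 'geneB'),
                                  paste0('cell', 1:3)))

# Library size factors, scaled to have a mean of 1 across cells
libSizes <- colSums(toyCounts)
sizeFactors <- libSizes / mean(libSizes)

# Divide each cell (column) by its size factor, then log2-transform
# with a pseudocount of 1
toyNorm <- log2(sweep(toyCounts, 2, sizeFactors, '/') + 1)
toyNorm
```

The three toy cells differ only in sequencing depth, so their normalized profiles coincide.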

We will also need PCA and UMAP dimensions. These can be computed using the 
`runPCA` and `runUMAP` functions from `scater`:

```{r message=FALSE, warning=FALSE, results=FALSE}
sceObj <- runPCA(sceObj)
sceObj <- runUMAP(sceObj)
```

Now we will convert the dataset to a Seurat object:

```{r message=FALSE, warning=FALSE, results=FALSE}
seuratObj <- as.Seurat(sceObj)
```


# Extracting the expression matrix

The `scExpMat` function extracts the expression matrix from a `Seurat` or 
`SingleCellExperiment` object:

```{r message=FALSE, warning=FALSE}
mat1 <- scExpMat(sceObj)
dim(mat1)
mat2 <- scExpMat(seuratObj)
identical(mat1, mat2)
```

**Note**: By default, this function extracts normalized and log-transformed 
data, looking for the `data` assay for a `Seurat` object, and for the 
`logcounts` assay for a `SingleCellExperiment` object. This behavior 
can be changed using the `dataType` parameter.

`scExpMat` can also take a matrix as an argument. This option is useful when 
building functions allowing users to use either a single-cell expression matrix 
or an object of a dedicated class (`Seurat`, `SingleCellExperiment`) as input:

```{r}
mat2 <- scExpMat(mat1)
identical(mat1, mat2)
```
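As a sketch of this pattern, a downstream function can route its input through `scExpMat` before doing any computation. The helper `meanExpression` and the toy objects below are hypothetical (they are not part of `scLang`), and the example relies only on the default `scExpMat` behavior described above:

```{r}
library(scLang)
library(SingleCellExperiment)

# Hypothetical helper (not part of scLang): mean expression of each gene.
# Routing the input through scExpMat() lets one function body serve
# Seurat objects, SingleCellExperiment objects and plain matrices.
meanExpression <- function(scObj) {
    rowMeans(scExpMat(scObj))
}

# Toy data: 3 genes x 4 cells, stored both as a matrix and as the
# `logcounts` assay of a SingleCellExperiment object
toyMat <- matrix(as.numeric(1:12), nrow=3,
                 dimnames=list(paste0('gene', 1:3), paste0('cell', 1:4)))
toySce <- SingleCellExperiment(assays=list(logcounts=toyMat))

# The same helper is called on both input types
all.equal(meanExpression(toyMat), meanExpression(toySce))
```

The helper contains no class checks of its own; input handling is delegated entirely to `scExpMat`.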

By default, `scExpMat` converts the expression matrix to a dense matrix. 
If this behavior is not desired, conversion can be skipped by setting `densify`
to `FALSE`:

```{r}
is(mat1)[1]
mat2 <- scExpMat(sceObj, densify=FALSE)
is(mat2)[1]
```
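One reason to skip densification is memory usage, since scRNA-seq matrices consist mostly of zeros. The sketch below illustrates the difference on simulated data using only the `Matrix` package. It is independent of `scLang`, and the matrix size and density are arbitrary assumptions:

```{r}
library(Matrix)

set.seed(1)
# Simulated sparse matrix with about 5% non-zero entries
sparseMat <- rsparsematrix(1000, 500, density=0.05)
denseMat <- as.matrix(sparseMat)

# Class of each representation
is(sparseMat)[1]
is(denseMat)[1]

# Memory footprint of each representation
format(object.size(sparseMat), units='MB')
format(object.size(denseMat), units='MB')
```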

`scExpMat` can also extract the expression data only for selected genes:

```{r}
mat2 <- scExpMat(sceObj, genes=rownames(sceObj)[seq(30, 80)])
dim(mat2)
```

# Extracting and altering metadata/coldata columns

The `scCol` function extracts a column from the metadata of the `Seurat` 
object or the coldata of the `SingleCellExperiment` object:

```{r}
col1 <- scCol(seuratObj, 'label')
col2 <- scCol(sceObj, 'label')
identical(col1, col2)
head(col1)
```

It can also be used to insert a new column. Here, we just make a modified
copy of the `label` column for the `Seurat` object:

```{r}
scCol(seuratObj, 'labelCopy') <- paste0(scCol(seuratObj, 'label'), '_copy')
head(seuratObj[['labelCopy']])
```
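Because `scCol` and `scExpMat` share the same input types, they can be combined in class-agnostic helpers. The sketch below computes the mean expression of one gene per group. The helper `groupMeans` and the toy object are hypothetical (not part of `scLang`), and the example assumes `scCol` returns one value per cell, in cell order:

```{r}
library(scLang)
library(SingleCellExperiment)

# Hypothetical helper (not part of scLang): mean expression of one gene
# within each group defined by a metadata/coldata column
groupMeans <- function(scObj, gene, groupCol) {
    expr <- scExpMat(scObj)[gene, ]
    groups <- scCol(scObj, groupCol)
    tapply(expr, groups, mean)
}

# Toy SingleCellExperiment object: 3 genes x 4 cells, two cell groups
toyExp <- matrix(as.numeric(1:12), nrow=3,
                 dimnames=list(paste0('gene', 1:3), paste0('cell', 1:4)))
toyGroupedSce <- SingleCellExperiment(
    assays=list(logcounts=toyExp),
    colData=S4Vectors::DataFrame(
        label=c('alpha', 'alpha', 'beta', 'beta')))

groupMeans(toyGroupedSce, 'gene1', 'label')
```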

# Extracting and altering the metadata/coldata or its column names

The `metadataDF` function extracts the metadata/coldata data frame from a
`Seurat` or `SingleCellExperiment` object:

```{r}
df1 <- metadataDF(seuratObj)
df2 <- metadataDF(sceObj)
identical(df1, df2)
head(df1)[, c(1, 2)]
```

The `metadataNames` function extracts the column names of the metadata/coldata 
data frame from a `Seurat` or `SingleCellExperiment` object:

```{r}
colNames1 <- metadataNames(seuratObj)
colNames2 <- metadataNames(sceObj)
identical(colNames1, colNames2)
head(colNames1)
```

# Creating frequency tables

The `scColCounts` and `scColPairCounts` functions are wrappers around 
`dplyr::count`. They count the frequencies of elements from one or two 
categorical columns in a `Seurat` or `SingleCellExperiment` object:

```{r}
freq1 <- scColCounts(sceObj, 'donor')
freq2 <- scColCounts(seuratObj, 'donor')
identical(freq1, freq2)
head(freq1)
freq1 <- scColPairCounts(sceObj, 'donor', 'label')
freq2 <- scColPairCounts(seuratObj, 'donor', 'label')
identical(freq1, freq2)
head(freq1)
```
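For readers unfamiliar with `dplyr::count`, the sketch below shows the underlying operation on a plain data frame. The toy table is hypothetical, and the exact output format of the `scLang` wrappers may differ from what `dplyr::count` returns here:

```{r}
# Toy metadata table with two categorical columns
toyMeta <- data.frame(
    donor=c('D1', 'D1', 'D1', 'D2', 'D2'),
    label=c('alpha', 'alpha', 'beta', 'alpha', 'beta'))

# One column: number of cells per donor
dplyr::count(toyMeta, donor)

# Two columns: number of cells per donor-label combination
dplyr::count(toyMeta, donor, label)
```

In both cases the counts are stored in a column named `n`.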

# Visualization

`scLang` includes three visualization functions that adapt `Seurat` 
visualization tools, extending their usage to `SingleCellExperiment` objects 
in addition to `Seurat` objects.

The `dimPlot` function mimics the essential behavior of the `DimPlot` function 
from Seurat:

```{r}
dimPlot(sceObj, groupBy='label')
```

The `featurePlot` function mimics the `FeaturePlot` function from Seurat 
(though using a different color scheme):

```{r}
featurePlot(sceObj, 'SOX4')
```

The `violinPlot` function mimics the `VlnPlot` function from Seurat:

```{r}
violinPlot(sceObj, 'SOX4', groupBy='label')
```

# Session information {-}

```{r}
sessionInfo()
```