
SlimR is an R package designed for annotating single-cell and
spatial-transcriptomics (ST) datasets. It supports the creation of a
unified marker list, Markers_list, from several sources:
the package’s built-in curated species-specific cell type and marker
reference databases (e.g., ‘Cellmarker2’, ‘PanglaoDB’, ‘scIBD’,
‘TCellSI’), Seurat objects containing cell label information, or
user-provided Excel tables mapping cell types to markers.
SlimR can estimate its annotation parameters with machine learning
algorithms (e.g., ‘Random Forest’, ‘Gradient Boosting’, ‘Support Vector
Machine’, ‘Ensemble Learning’) via Parameter_Calculate(). Based on
Markers_list, Celltype_Calculate() calculates the expression of each
cell type's markers, predicts cell type annotations and computes the
corresponding AUC. Celltype_Annotation() then assigns the predictions to
the Seurat object, and Celltype_Verification() verifies them. SlimR can
also calculate per-cell-type marker expression to generate reference
plots for manual annotation (e.g., ‘Heat Map’, ‘Feature Plots’,
‘Combined Plots’).
Install the stable version of SlimR from CRAN (recommended when the CRAN version matches the GitHub version):
install.packages("SlimR")
Note: If you encounter a version mismatch during installation, try switching the CRAN mirror to Global (CDN)
or use BiocManager::install("SlimR").
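One way to switch the current R session to a CDN-backed mirror is to set the base R `repos` option before installing. The mirror URL https://cloud.r-project.org is used here as an example of such a mirror. This is a general R setting, not a SlimR feature.

```r
# Use a CDN-backed CRAN mirror for the rest of this R session
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm the active CRAN mirror, then retry the installation
getOption("repos")["CRAN"]
# install.packages("SlimR")
```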
Install the development version of SlimR from GitHub (recommended when the GitHub version is newer than the CRAN version):
devtools::install_github("Zhaoqing-wang/SlimR")
Note: If this command fails because devtools is not installed, run
install.packages('devtools') first.
Load the package in your R environment:
library(SlimR)
For Seurat objects whose assay contains multiple layers, run
SeuratObject::JoinLayers() first.
# For example, to join the layers of the 'RNA' assay in a multilayered Seurat object:
sce@assays$RNA <- SeuratObject::JoinLayers(sce@assays$RNA)
Important: To ensure annotation accuracy, make sure the input Seurat object has been processed with the standard workflow and that batch effects have been removed.
Note: We recommend using the clustree package
to determine an appropriate clustering resolution for the input Seurat
object.
SlimR requires R (≥ 3.5) and depends on the following packages:
cowplot, dplyr, ggplot2,
patchwork, pheatmap, readxl,
scales, Seurat, tidyr,
tools. The tools package ships with base R and is not installed from CRAN. If installation fails, install the missing
CRAN dependencies using:
# Install CRAN dependencies if needed ('tools' is part of base R):
install.packages(c("cowplot", "dplyr", "ggplot2", "patchwork",
"pheatmap", "readxl", "scales", "Seurat",
"tidyr"))
SlimR stores marker information in a standardized list format, Markers_list. Each list element is named after a cell type (required). Its first column contains the markers (required), and any subsequent columns contain metrics (optional).
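The following base R snippet builds a minimal Markers_list by hand to illustrate this structure. The cell types, genes and the `score` metric column are hypothetical. The first column is named `marker` to match the `Markers_list_Cellmarker2$`B cell`$marker` usage in section 3.1. Whether SlimR requires exactly that column name is an assumption.

```r
# Hypothetical toy Markers_list: names = cell types,
# first column = markers, any further columns = optional metrics
Markers_list_toy <- list(
  "T cell" = data.frame(
    marker = c("CD3E", "CD3D", "CD2"),
    score  = c(0.9, 0.8, 0.6),            # hypothetical metric column
    stringsAsFactors = FALSE
  ),
  "B cell" = data.frame(
    marker = c("MS4A1", "CD79A", "CD19"),  # metric columns omitted
    stringsAsFactors = FALSE
  )
)

# Inspect the structure: one data frame per cell type
str(Markers_list_toy)

# The unique markers of one cell type, e.g. as input to a 'features' argument
unique(Markers_list_toy[["B cell"]]$marker)
```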
Cellmarker2: A database of cell types and markers covering different species and tissue types.
Reference: Hu et al. (2023) doi:10.1093/nar/gkac947.
Cellmarker2 <- SlimR::Cellmarker2
Cellmarker2_table <- SlimR::Cellmarker2_table
View(Cellmarker2_table)
Generate Markers_list:
Markers_list_Cellmarker2 <- Markers_filter_Cellmarker2(
Cellmarker2,
species = "Human",
tissue_class = "Intestine",
tissue_type = NULL,
cancer_type = NULL,
cell_type = NULL
)
Important: Select at least the species and
tissue_class parameters to ensure annotation accuracy.
Link: The output Markers_list can be used in sections 3.1,
4.1, 4.2, 4.3 and 5.1. See section 3 for the
automated annotation workflow.
PanglaoDB: A database of cell types and markers covering different species and tissue types.
Reference: Franzén et al. (2019) doi:10.1093/database/baz046.
PanglaoDB <- SlimR::PanglaoDB
PanglaoDB_table <- SlimR::PanglaoDB_table
View(PanglaoDB_table)
Generate Markers_list:
Markers_list_panglaoDB <- Markers_filter_PanglaoDB(
PanglaoDB,
species_input = 'Human',
organ_input = 'GI tract'
)
Important: Select the species_input and
organ_input parameters to ensure annotation accuracy.
Link: The output Markers_list can be used in sections 3.1,
4.1, 4.2, 4.3 and 5.2. See section 3 for the
automated annotation workflow.
scIBD: A database of human intestine markers.
Reference: Nie et al. (2023) doi:10.1038/s43588-023-00464-9.
Markers_list_scIBD <- SlimR::Markers_list_scIBD
Important: This list is for human intestinal annotation only. Make sure the input Seurat object contains human intestinal cells to ensure labeling accuracy.
Note: Markers_list_scIBD was generated using
section 2.5.2 with the parameters sort_by = "logFC" and
gene_filter = 20.
Link: The output Markers_list can be used in sections 3.1,
4.1, 4.2, 4.3 and 5.3. See section 3 for the
automated annotation workflow.
TCellSI: A database of markers for different T cell subtypes.
Reference: Yang et al. (2024) doi:10.1002/imt2.231.
Markers_list_TCellSI <- SlimR::Markers_list_TCellSI
Important: This list is for T cell subset annotation only. Make sure the input Seurat object contains T cells to ensure annotation accuracy.
Note: Markers_list_TCellSI was generated using
section 2.6.
Link: The output Markers_list can be used in sections 3.1,
4.1, 4.2, 4.3 and 5.4. See section 3 for the
automated annotation workflow.
Generate Markers_list: A standard Markers_list can be generated with the
built-in Read_seurat_markers() function from the markers returned by
Seurat::FindAllMarkers().
seurat_markers <- Seurat::FindAllMarkers(
object = sce,
group.by = "Cell_type",
only.pos = TRUE)
Markers_list_Seurat <- Read_seurat_markers(seurat_markers,
sources = "Seurat",
sort_by = "FSS",
gene_filter = 20
)
Note: We recommend sort_by = "FSS", which ranks markers by the
‘Feature Significance Score’ (FSS, the product of
log2FC and the expression ratio). Alternatively, use
sort_by = "avg_log2FC" as the ranking
basis.
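The following base R sketch illustrates how a ranking controlled by sort_by = "FSS" and gene_filter might work, using a toy table with FindAllMarkers()-style columns (gene, cluster, avg_log2FC, pct.1, pct.2). It is not SlimR's implementation, and all values are made up. It also makes two assumptions. First, the "expression ratio" is taken to be pct.1 / pct.2, the fraction of expressing cells inside versus outside the group. Second, gene_filter is read as the number of top genes kept per cell type.

```r
# Hypothetical toy marker table with FindAllMarkers()-style columns
markers <- data.frame(
  gene       = c("CD3E", "CD2", "IL7R", "MS4A1", "CD79A", "CD19"),
  cluster    = c("T cell", "T cell", "T cell", "B cell", "B cell", "B cell"),
  avg_log2FC = c(2.0, 1.5, 1.0, 3.0, 2.5, 0.8),
  pct.1      = c(0.9, 0.8, 0.6, 0.9, 0.7, 0.5),
  pct.2      = c(0.1, 0.2, 0.3, 0.1, 0.1, 0.25),
  stringsAsFactors = FALSE
)

# Assumed FSS: log2 fold change times the in-group / out-group detection ratio
markers$FSS <- markers$avg_log2FC * (markers$pct.1 / markers$pct.2)

# Keep the top `gene_filter` genes per cell type (highest FSS first),
# with the gene column first and renamed to "marker"
gene_filter <- 2
toy_list <- lapply(split(markers, markers$cluster), function(df) {
  top <- head(df[order(-df$FSS), c("gene", "avg_log2FC", "FSS")], gene_filter)
  names(top)[1] <- "marker"
  rownames(top) <- NULL
  top
})

toy_list[["T cell"]]
```

Weighting log2FC by the detection ratio favors genes that are both strongly up-regulated and rarely expressed elsewhere, such as CD3E over IL7R in this toy table.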
Speeding up with presto (alternative): For large datasets, presto::wilcoxauc()
can be used instead of Seurat::FindAllMarkers() to speed up the computation (roughly 10x faster,
at the cost of some accuracy).
seurat_markers <- dplyr::filter(
presto::wilcoxauc(
X = sce,
group_by = "Cell_type",
seurat_assay = "RNA"
),
padj < 0.05, logFC > 0.5
)
Markers_list_Seurat <- Read_seurat_markers(seurat_markers,
sources = "presto",
sort_by = "FSS",
gene_filter = 20
)
Important: This feature depends on the presto
package. Run
devtools::install_github('immunogenomics/presto')
first.
Note: We recommend sort_by = "logFC".
Alternatively, use sort_by = "FSS" to rank by the ‘Feature
Significance Score’ (FSS, the product of log2FC and the
expression ratio).
Link: The output Markers_list can be used in sections 3.1,
4.1, 4.2, 4.3 and 5.3. See section 3 for the
automated annotation workflow.
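The filtering thresholds used with presto (padj < 0.05, logFC > 0.5) and a sort_by = "logFC" style ranking can be mimicked in base R. The toy table below borrows the presto::wilcoxauc() column names feature, group, logFC and padj, and its values are made up. Reading gene_filter as "top genes per group" is an assumption. This illustrates the idea only; it is not what Read_seurat_markers() does internally.

```r
# Hypothetical toy output with presto::wilcoxauc()-style column names
res <- data.frame(
  feature = c("LGR5", "OLFM4", "ASCL2", "FABP1", "APOA1", "ALPI"),
  group   = c("1", "1", "1", "2", "2", "2"),
  logFC   = c(1.8, 1.2, 0.4, 2.2, 1.6, 0.9),
  padj    = c(1e-10, 1e-6, 1e-3, 1e-12, 0.2, 1e-4),
  stringsAsFactors = FALSE
)

# Same thresholds as the dplyr::filter() call, written in base R
sig <- res[res$padj < 0.05 & res$logFC > 0.5, ]

# Rank by logFC within each group and keep the top `gene_filter` genes
gene_filter <- 2
top_genes <- lapply(split(sig, sig$group), function(df) {
  head(df$feature[order(-df$logFC)], gene_filter)
})

top_genes
```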
Format requirements for the Excel file:
Each sheet name = cell type (required)
First row = column headers (required)
First column = markers (required)
Subsequent columns = metrics (optional)
Markers_list_Excel <- Read_excel_markers("D:/Laboratory/Marker_load.xlsx")
Link: The output Markers_list can be used in sections 3.1,
4.1, 4.2, 4.3 and 5.4. See section 3 for the
automated annotation workflow.
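The Excel format requirements above map each sheet onto one named list element. The sketch below shows that mapping by calling readxl directly, through a hypothetical helper `read_marker_workbook()` that is not part of SlimR. It is not necessarily how Read_excel_markers() is implemented. The demo uses the example workbook bundled with readxl, whose sheets are datasets rather than cell types.

```r
# Hypothetical helper (not part of SlimR): one list element per sheet,
# named after the sheet; read_excel() uses the first row as column names
read_marker_workbook <- function(path) {
  sheets <- readxl::excel_sheets(path)
  out <- lapply(sheets, function(s) {
    as.data.frame(readxl::read_excel(path, sheet = s))
  })
  names(out) <- sheets
  out
}

# Demo on an example workbook shipped with readxl
demo_path <- readxl::readxl_example("datasets.xlsx")
demo <- read_marker_workbook(demo_path)
names(demo)
```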
SlimR integrates multiple machine learning algorithms (e.g., Random
Forest, Gradient Boosting, Support Vector Machine, Ensemble Learning) to
automatically determine optimal values for the min_expression and
specificity_weight parameters used in the cell type probability
calculation of section 3.2.
# Basic usage with a small set of marker genes
SlimR_params <- Parameter_Calculate(
seurat_obj = sce,
features = c("CD3E", "CD4", "CD8A"),
assay = "RNA",
cluster_col = "seurat_clusters",
method = "ensemble",
n_models = 3,
return_model = FALSE,
verbose = TRUE
)
# Custom usage: use the markers of a specific cell type in 'Markers_list' as input, with the Random Forest method
SlimR_params <- Parameter_Calculate(
seurat_obj = sce,
features = unique(Markers_list_Cellmarker2$`B cell`$marker),
assay = "RNA",
cluster_col = "seurat_clusters",
method = "rf",
return_model = FALSE,
verbose = TRUE
)
Important: This step is optional. You can skip directly to section 3.2 and calculate cell type probabilities with the default parameters.
Note: The method parameter of Parameter_Calculate() selects the
machine learning model (e.g., method = "rf"). Available methods: rf (Random
Forest), gbm (Gradient Boosting), svm (Support
Vector Machine), or ensemble (Ensemble Learning;
default).
Celltype_Calculate() uses Markers_list to calculate cell type probabilities and prediction
results. It can also compute the corresponding AUC and generate a heat map
and ROC plots for cell annotation.
SlimR_anno_result <- Celltype_Calculate(seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
min_expression = 0.1,
specificity_weight = 3,
threshold = 0.8,
compute_AUC = TRUE,
plot_AUC = TRUE,
AUC_correction = TRUE,
colour_low = "navy",
colour_high = "firebrick3"
)
If you have run Parameter_Calculate() in section 3.1 above, you can pass its results to Celltype_Calculate() with
min_expression = SlimR_params$min_expression and
specificity_weight = SlimR_params$specificity_weight.
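When section 3.1 may or may not have been run, a small base R pattern can use the Parameter_Calculate() output if it exists and otherwise fall back to the values in the example Celltype_Calculate() call (min_expression = 0.1, specificity_weight = 3). Treating those example values as sensible defaults is an assumption. The pattern relies only on SlimR_params exposing the min_expression and specificity_weight elements referenced above.

```r
# Values used in the example Celltype_Calculate() call (assumed fallback)
fallback_params <- list(min_expression = 0.1, specificity_weight = 3)

# Prefer the Parameter_Calculate() result from section 3.1 if it exists
params <- if (exists("SlimR_params")) SlimR_params else fallback_params

# Then pass them on, e.g.:
# Celltype_Calculate(..., min_expression = params$min_expression,
#                    specificity_weight = params$specificity_weight)
params$min_expression
params$specificity_weight
```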
Important: The parameter cluster_col in the
function Celltype_Calculate() and the function
Celltype_Annotation() must be strictly the same to avoid
false matches.
Note: Setting AUC_correction = TRUE increases computation time
(about 20% longer than setting only
plot_AUC = TRUE, and about 40% longer than setting only
compute_AUC = TRUE). It is still recommended, because
correcting the predicted cell types this way gives more
accurate predictions. The lower the
threshold parameter, the more alternative cell types are checked
by AUC, and the longer the run time.
Check the annotation probabilities between the clusters in the input
cluster_col column and the cell types in
Markers_list with the following code.
print(SlimR_anno_result$Heatmap_plot)
Note: If the heat map is not displayed properly, run
library(pheatmap) first.
The cell type predictions from SlimR can be viewed with the following code.
View(SlimR_anno_result$Prediction_results)
Furthermore, the ROC curves and AUC values for the clusters in
cluster_col and their predicted cell types can be viewed with the
following code.
print(SlimR_anno_result$AUC_plot)
Important: This feature requires the parameter
plot_AUC = TRUE.
Note: If the ROC plot is not displayed properly, run
library(ggplot2) first.
After viewing the list of predicted cell types and the corresponding AUC values, the predicted cell types can be corrected with the following code.
Example 1:
# For example, cluster '15' in 'cluster_col' corresponds to cell type 'Intestinal stem cell'.
SlimR_anno_result$Prediction_results$Predicted_cell_type[
SlimR_anno_result$Prediction_results$cluster_col == 15
] <- "Intestinal stem cell"
Example 2:
# For example, a predicted cell type with an AUC of 0.5 or less should be labeled 'Unknown'.
SlimR_anno_result$Prediction_results$Predicted_cell_type[
SlimR_anno_result$Prediction_results$AUC <= 0.5
] <- "Unknown"
After modifying the predicted cell types, view the updated prediction table with the following code.
View(SlimR_anno_result$Prediction_results)
Important: If you need to correct a cell type, it is strongly recommended to choose from the cell types listed in
SlimR_anno_result$Prediction_results$Alternative_cell_type.
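The two correction examples and the advice to pick from Alternative_cell_type can be combined in a self-contained base R sketch. It runs on a hypothetical stand-in for SlimR_anno_result$Prediction_results; the column names follow the text above, but all values are made up. Storing Alternative_cell_type as a comma-separated string is an assumption, and `relabel_cluster()` is a hypothetical helper, not a SlimR function.

```r
# Hypothetical stand-in for SlimR_anno_result$Prediction_results;
# comma-separated Alternative_cell_type strings are an assumption
pred <- data.frame(
  cluster_col           = c(0, 1, 15),
  Predicted_cell_type   = c("T cell", "Plasma cell", "Enterocyte"),
  Alternative_cell_type = c("T cell, NK cell", "Plasma cell, B cell",
                            "Enterocyte, Intestinal stem cell"),
  AUC                   = c(0.92, 0.45, 0.71),
  stringsAsFactors = FALSE
)

# Hypothetical helper: relabel one cluster, but only with a listed alternative
relabel_cluster <- function(df, cluster, new_type) {
  idx <- which(df$cluster_col == cluster)
  stopifnot(length(idx) == 1)
  alternatives <- trimws(strsplit(df$Alternative_cell_type[idx], ",")[[1]])
  if (!new_type %in% alternatives) {
    stop("'", new_type, "' is not an alternative cell type for cluster ", cluster)
  }
  df$Predicted_cell_type[idx] <- new_type
  df
}

# Example 1: cluster 15 -> "Intestinal stem cell" (a listed alternative)
pred <- relabel_cluster(pred, 15, "Intestinal stem cell")

# Example 2: low-confidence predictions (AUC <= 0.5) -> "Unknown"
pred$Predicted_cell_type[pred$AUC <= 0.5] <- "Unknown"

pred
```

Failing loudly on a cell type outside the alternatives keeps manual corrections within the candidates that SlimR already scored.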
Celltype_Annotation() assigns the SlimR-predicted cell types in
SlimR_anno_result$Prediction_results$Predicted_cell_type to
the Seurat object based on the cluster annotations. It stores the results
in the seurat_obj@meta.data column named by annotation_col.
sce <- Celltype_Annotation(seurat_obj = sce,
cluster_col = "seurat_clusters",
SlimR_anno_result = SlimR_anno_result,
plot_UMAP = TRUE,
annotation_col = "Cell_type_SlimR"
)
Important: The parameter cluster_col in the
function Celltype_Calculate() and the function
Celltype_Annotation() must be strictly the same to avoid
false matches. Likewise, the parameter annotation_col in the
function Celltype_Annotation() and the function
Celltype_Verification() must be strictly the same.
Celltype_Verification() verifies the annotation using the cell group identities in
seurat_obj@meta.data$annotation_col. It ranks markers by the ‘Feature
Significance Score’ (FSS, the product of log2FC and the
expression ratio).
Celltype_Verification(seurat_obj = sce,
SlimR_anno_result = SlimR_anno_result,
gene_number = 5,
assay = "RNA",
colour_low = "white",
colour_high = "navy",
annotation_col = "Cell_type_SlimR"
)
Important: The parameter annotation_col in the
function Celltype_Annotation() and the function
Celltype_Verification() must be strictly the same to avoid
false matches.
Note: Cell types present in
SlimR_anno_result$Prediction_results are verified using
the marker information in
SlimR_anno_result$Expression_list. Cell types not
in that list are verified using markers from the
function FindMarkers().
Generate a heat map estimating how similar each cell cluster is to each reference cell type:
Celltype_Annotation_Heatmap(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
cluster_col = "seurat_clusters",
min_expression = 0.1,
specificity_weight = 3,
colour_low = "navy",
colour_high = "firebrick3"
)
Note: This function has been incorporated into
Celltype_Calculate(), and it is recommended to use
Celltype_Calculate() instead.
Generate a per-cell-type expression dot plot with a metric heat map (when metric information exists):
Celltype_Annotation_Features(
seurat_obj = sce,
gene_list = Markers_list,
gene_list_type = "Cellmarker2",
species = "Human",
save_path = "./SlimR/Celltype_Annotation_Features/",
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Each resulting combined image consists of a dot plot showing the expression level and expression ratio of the cell type's markers. If metric information exists, a metric heat map of the same markers appears below it.
Generate per-cell-type combined expression plots:
Celltype_Annotation_Combined(
seurat_obj = sce,
gene_list = Markers_list,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = "./SlimR/Celltype_Annotation_Combined/",
colour_low = "white",
colour_high = "navy"
)
Each generated combined plot shows box plots of the expression levels of that cell type's markers, colored by the average expression level of each marker.
The functions in sections 5.1, 5.2, 5.3 and 5.4 have been incorporated into
Celltype_Annotation_Features(). It is recommended to
use Celltype_Annotation_Features() with the corresponding
parameters (for example, gene_list_type = "Cellmarker2")
instead. For more information, see section 4.2.
Celltype_annotation_Cellmarker2(
seurat_obj = sce,
gene_list = Markers_list_Cellmarker2,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = "./SlimR/Celltype_annotation_Cellmarkers2/",
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Note: To get the same output from Celltype_Annotation_Features(), set the parameter
gene_list_type = "Cellmarker2".
Celltype_annotation_PanglaoDB(
seurat_obj = sce,
gene_list = Markers_list_panglaoDB,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = "./SlimR/Celltype_annotation_PanglaoDB/",
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Note: To get the same output from Celltype_Annotation_Features(), set the parameter
gene_list_type = "PanglaoDB".
Celltype_annotation_Seurat(
seurat_obj = sce,
gene_list = Markers_list_Seurat,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = "./SlimR/Celltype_annotation_Seurat/",
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Note: To get the same output from Celltype_Annotation_Features(), set the parameter
gene_list_type = "Seurat".
Celltype_annotation_Excel(
seurat_obj = sce,
gene_list = Markers_list_Excel,
species = "Human",
cluster_col = "seurat_clusters",
assay = "RNA",
save_path = "./SlimR/Celltype_annotation_Excel/",
colour_low = "white",
colour_high = "navy",
colour_low_mertic = "white",
colour_high_mertic = "navy"
)
Note: To get the same output from Celltype_Annotation_Features(), set the parameter
gene_list_type = "Excel". This function also works with
a Markers_list that has no metric information or has metric
information generated in other ways.
Thank you for using SlimR. For questions, issues, or suggestions, please open an issue or a discussion on GitHub (preferred) or send an email (alternative):
zhaoqingwang@mail.sdu.edu.cn
Zhaoqing Wang