scran 1.39.2
Woah, hi there. Didn’t except a visitor to this vignette. Most of the functionality in scran is now deprecated or defunct, having since been superceded by the scrapper package. We just keep this package around so that users trying to call the old functions are redirected to their replacements. And also, scran got me a well-paid gig that funds my retirement, so I feel an obligation to repay the favor to an loyal old draft horse.
# Show your appreciation for this package by installing it
# to bump up its download statistics.
install.packages("BiocManager")
BiocManager::install("scran")
Anyway, this vignette will go through some of the remaining non-deprecated functions in this package. In general, these functions are no longer part of routine single-cell analyses but we keep them around for historical purposes. We’ll demonstrate on a small single-cell pancreas dataset:
library(scran)
library(scRNAseq)
sce <- GrunPancreasData()
sce
## class: SingleCellExperiment
## dim: 20064 1728
## metadata(0):
## assays(1): counts
## rownames(20064): A1BG-AS1__chr19 A1BG__chr19 ... ZZEF1__chr17
## ZZZ3__chr1
## rowData names(2): symbol chr
## colnames(1728): D2ex_1 D2ex_2 ... D17TGFB_95 D17TGFB_96
## colData names(2): donor sample
## reducedDimNames(0):
## mainExpName: endogenous
## altExpNames(1): ERCC
# Quickly analyzing it to generate bits and pieces that we can
# reuse in the rest of this vignette.
library(scrapper)
results <- analyze.se(sce, num.threads=2)
sce.ready <- results$x
When using the computeSumFactors() function from scuttle, it can be beneficial to initially compute normalization factors within clusters.
This reduces our sensitivity to violations of the assumption of a non-DE majority of genes.
The quickCluster() function generates a quick-and-dirty clustering that can be passed to clusters=:
library(scran)
clusters <- quickCluster(sce.ready)
sce.normed <- computeSumFactors(sce.ready, clusters=clusters)
summary(sizeFactors(sce.normed))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.002541 0.196259 0.753183 1.000000 1.384071 17.053692
If necessary, further subclustering can be performed conveniently using the quickSubCluster() wrapper function.
This splits the input SingleCellExperiment into several smaller objects containing cells from each cluster and performs another round of clustering within that cluster,
using a freshly identified set of HVGs to improve resolution for internal structure.
subout <- quickSubCluster(sce.ready, groups=clusters)
table(metadata(subout)$subcluster) # formatted as '<parent>.<subcluster>'
##
## 1.1 1.2 1.3 1.4 1.5 1.6 2.1 2.2 2.3 2.4 2.5 3.1 3.2 3.3 3.4 4.1 4.2 4.3 4.4 4.5
## 25 44 18 37 25 16 87 56 89 40 31 68 108 90 35 84 34 50 50 20
## 5.1 5.2 5.3 5.4 5.5 5.6 5.7 6.1 6.2 6.3 6.4 6.5 6.6 7.1 7.2 7.3 7.4 7.5
## 42 43 33 34 15 42 36 21 63 67 25 27 16 22 17 20 31 21
A common question is how many PCs to retain; more PCs will capture more biological signal at the cost of retaining more noise and requiring more computational work.
One approach to choosing the number of PCs is to use the technical component estimates to determine the proportion of variance that should be retained.
This is implemented in denoisePCANumber():
var.explained <- metadata(sce.ready)$PCA$variance.explained
stats <- rowData(sce.ready)
total.tech.var.in.hvgs <- sum(stats[stats$hvg,"fitted"])
total.var.in.hvgs <- sum(stats[stats$hvg,"variances"])
chosen <- denoisePCANumber(var.explained, total.tech.var.in.hvgs, total.var.in.hvgs)
chosen
## [1] 10
We could then use only the specified number of top PCs in our downstream analyses:
reducedDim(sce.ready, "PCA.denoised") <- reducedDim(sce.ready, "PCA")[,1:chosen]
sessionInfo()
## R version 4.6.0 alpha (2026-04-05 r89794)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scrapper_1.5.16 scRNAseq_2.25.0
## [3] BiocParallel_1.45.0 scran_1.39.2
## [5] scuttle_1.21.3 SingleCellExperiment_1.33.2
## [7] SummarizedExperiment_1.41.1 Biobase_2.71.0
## [9] GenomicRanges_1.63.2 Seqinfo_1.1.0
## [11] IRanges_2.45.0 S4Vectors_0.49.1
## [13] BiocGenerics_0.57.0 generics_0.1.4
## [15] MatrixGenerics_1.23.0 matrixStats_1.5.0
## [17] knitr_1.51 BiocStyle_2.39.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.3.0 bitops_1.0-9 httr2_1.2.2
## [4] rlang_1.2.0 magrittr_2.0.5 otel_0.2.0
## [7] gypsum_1.7.0 compiler_4.6.0 RSQLite_2.4.6
## [10] GenomicFeatures_1.63.2 png_0.1-9 vctrs_0.7.2
## [13] ProtGenerics_1.43.0 pkgconfig_2.0.3 crayon_1.5.3
## [16] fastmap_1.2.0 dbplyr_2.5.2 XVector_0.51.0
## [19] Rsamtools_2.27.1 rmarkdown_2.31 UCSC.utils_1.7.1
## [22] bit_4.6.0 xfun_0.57 bluster_1.21.1
## [25] cachem_1.1.0 beachmat_2.27.3 cigarillo_1.1.0
## [28] GenomeInfoDb_1.47.2 jsonlite_2.0.0 blob_1.3.0
## [31] rhdf5filters_1.23.3 DelayedArray_0.37.1 Rhdf5lib_1.33.6
## [34] irlba_2.3.7 parallel_4.6.0 cluster_2.1.8.2
## [37] R6_2.6.1 bslib_0.10.0 limma_3.67.0
## [40] rtracklayer_1.71.3 jquerylib_0.1.4 Rcpp_1.1.1
## [43] bookdown_0.46 Matrix_1.7-5 igraph_2.2.3
## [46] tidyselect_1.2.1 abind_1.4-8 yaml_2.3.12
## [49] codetools_0.2-20 curl_7.0.0 alabaster.sce_1.11.0
## [52] lattice_0.22-9 tibble_3.3.1 KEGGREST_1.51.1
## [55] evaluate_1.0.5 BiocFileCache_3.1.0 alabaster.schemas_1.11.0
## [58] ExperimentHub_3.1.0 Biostrings_2.79.5 pillar_1.11.1
## [61] BiocManager_1.30.27 filelock_1.0.3 RCurl_1.98-1.18
## [64] ensembldb_2.35.0 BiocVersion_3.23.1 alabaster.base_1.11.4
## [67] alabaster.ranges_1.11.0 glue_1.8.0 metapod_1.19.2
## [70] lazyeval_0.2.3 alabaster.matrix_1.11.0 tools_4.6.0
## [73] AnnotationHub_4.1.0 BiocIO_1.21.0 BiocNeighbors_2.5.4
## [76] ScaledMatrix_1.19.0 GenomicAlignments_1.47.0 locfit_1.5-9.12
## [79] XML_3.99-0.23 rhdf5_2.55.16 grid_4.6.0
## [82] AnnotationDbi_1.73.0 edgeR_4.9.4 HDF5Array_1.39.1
## [85] BiocSingular_1.27.1 restfulr_0.0.16 cli_3.6.5
## [88] rsvd_1.0.5 rappdirs_0.3.4 S4Arrays_1.11.1
## [91] dplyr_1.2.1 AnnotationFilter_1.35.0 alabaster.se_1.11.0
## [94] sass_0.4.10 digest_0.6.39 SparseArray_1.11.13
## [97] dqrng_0.4.1 rjson_0.2.23 memoise_2.0.1
## [100] htmltools_0.5.9 lifecycle_1.0.5 h5mread_1.3.3
## [103] httr_1.4.8 statmod_1.5.1 bit64_4.6.0-1