Title: | Single-Cell Interpretable Tensor Decomposition |
Version: | 1.0.4 |
Date: | 2023-09-06 |
Maintainer: | Jonathan Mitchel <jonathan.mitchel3@gmail.com> |
Description: | Single-cell Interpretable Tensor Decomposition (scITD) employs the Tucker tensor decomposition to extract multicell-type gene expression patterns that vary across donors/individuals. This tool is geared for use with single-cell RNA-sequencing datasets consisting of many source donors. The method has a wide range of potential applications, including the study of inter-individual variation at the population-level, patient sub-grouping/stratification, and the analysis of sample-level batch effects. Each "multicellular process" that is extracted consists of (A) a multi cell type gene loadings matrix and (B) a corresponding donor scores vector indicating the level at which the corresponding loadings matrix is expressed in each donor. Additional methods are implemented to aid in selecting an appropriate number of factors and to evaluate stability of the decomposition. Additional tools are provided for downstream analysis, including integration of gene set enrichment analysis and ligand-receptor analysis. Tucker, L.R. (1966) <doi:10.1007/BF02289464>. Unkel, S., Hannachi, A., Trendafilov, N. T., & Jolliffe, I. T. (2011) <doi:10.1007/s13253-011-0055-9>. Zhou, G., & Cichocki, A. (2012) <doi:10.2478/v10175-012-0051-4>. |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | R (≥ 4.0.0), Matrix |
Imports: | rTensor, ica, fgsea, circlize, reshape2, parallel, ComplexHeatmap, ggplot2, mgcv, utils, Rcpp, RColorBrewer, dplyr, edgeR, sva, stats, Rmisc, ggpubr, msigdbr, sccore, NMF |
Suggests: | methods, knitr, rmarkdown, testthat, coda.base, grid, simplifyEnrichment, WGCNA, cowplot, matrixStats, stringr, zoo, rlang, AnnotationDbi, GO.db, conos, pagoda2, betareg, slam, tm |
RoxygenNote: | 7.2.3 |
LinkingTo: | Rcpp, RcppArmadillo, RcppProgress |
NeedsCompilation: | yes |
Packaged: | 2023-09-08 15:38:51 UTC; jmitchel |
Author: | Jonathan Mitchel [cre, aut], Evan Biederstedt [aut], Peter Kharchenko [aut] |
Repository: | CRAN |
Date/Publication: | 2023-09-08 16:00:02 UTC |
Apply ComBat batch correction to pseudobulk matrices. Generally, this should be done through calling the form_tensor() wrapper function.
Description
Apply ComBat batch correction to pseudobulk matrices. Generally, this should be done through calling the form_tensor() wrapper function.
Usage
apply_combat(container, batch_var)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
batch_var |
character A batch variable from metadata to remove |
Value
The project container with the batch-corrected pseudobulked matrices.
Calculate F-statistics for the association between donor scores for each factor and donor values of shuffled gene_ctype fibers
Description
Calculate F-statistics for the association between donor scores for each factor and donor values of shuffled gene_ctype fibers
Usage
calculate_fiber_fstats(tensor_data, tucker_results, s_fibers)
Arguments
tensor_data |
list The tensor data including donor, gene, and cell type labels as well as the tensor array itself |
tucker_results |
list The results from Tucker decomposition. Includes a scores matrix as the first element and the loadings tensor unfolded as the second element. |
s_fibers |
list Gene and cell type indices for the randomly selected fibers |
Value
A numeric vector of F-statistics for associations between all shuffled fibers and donor scores.
Helper function to check whether receptor is present in target cell type
Description
Helper function to check whether receptor is present in target cell type
Usage
check_rec_pres(
container,
lig_ct_exp,
rec_elements,
target_ct,
percentile_exp_rec
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
lig_ct_exp |
numeric Scaled expression for a ligand in the source cell type |
rec_elements |
character One or more components of a receptor complex |
target_ct |
character The name of the target cell type |
percentile_exp_rec |
numeric The percentile of ligand expression above which all donors need to have at least 5 cells expressing the receptor. |
Value
A logical indicating whether receptor is present or not.
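The rule described for percentile_exp_rec can be illustrated with plain R vectors. This is a sketch under stated assumptions: the donor names, expression values, and receptor cell counts are hypothetical, "above" is read as strictly greater than the quantile() cutoff, and the helper's actual retrieval of these values from the container is not shown.

```r
# Hypothetical per-donor inputs for one ligand-receptor pair
donors <- c("d1", "d2", "d3", "d4", "d5")
lig_ct_exp <- setNames(c(-1.2, 0.3, 1.5, 2.1, 0.9), donors)  # scaled ligand expression

# Hypothetical number of target-cell-type cells expressing each receptor component
rec_cells <- cbind(IL2RB = c(0, 2, 8, 6, 7),
                   IL2RG = c(1, 3, 9, 5, 4))
rownames(rec_cells) <- donors
percentile_exp_rec <- 0.75

# Donors whose ligand expression lies above the chosen percentile
cutoff <- quantile(lig_ct_exp, percentile_exp_rec)
top_donors <- names(lig_ct_exp)[lig_ct_exp > cutoff]

# Receptor treated as present only if every top donor has at least 5
# expressing cells for every component of the receptor complex
rec_present <- all(rec_cells[top_donors, , drop = FALSE] >= 5)
```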
Clean data to remove genes only expressed in a few cells and donors with very few cells. Generally, this should be done through calling the form_tensor() wrapper function.
Description
Clean data to remove genes only expressed in a few cells and donors with very few cells. Generally, this should be done through calling the form_tensor() wrapper function.
Usage
clean_data(container, donor_min_cells = 5)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
donor_min_cells |
numeric Minimum threshold for number of cells per donor (default=5) |
Value
The project container with cleaned counts matrices in each container$scMinimal_ctype$<ctype>$count_data.
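The donor filter controlled by donor_min_cells can be sketched in base R as follows. The per-cell metadata is hypothetical, treating the threshold as inclusive (>=) is an assumption, and the gene filter is not shown because its threshold is not documented here.

```r
# Hypothetical per-cell metadata for one cell type
meta <- data.frame(donors = c(rep("d1", 6), rep("d2", 3), rep("d3", 5)))
donor_min_cells <- 5

# Count cells per donor and keep donors meeting the threshold
cells_per_donor <- table(meta$donors)
keep_donors <- names(cells_per_donor)[cells_per_donor >= donor_min_cells]
meta_kept <- meta[meta$donors %in% keep_donors, , drop = FALSE]
```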
Calculates column means and variances. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp
Description
Calculates column means and variances. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp
Usage
colMeanVars(sY, rowSel, ncores = 1L)
Arguments
sY |
sparse matrix Gene by cell matrix of counts |
rowSel |
numeric The selected rows (genes) |
ncores |
numeric The number of cores |
Value
A data.frame with columns for the mean, variance, and number of observations of each gene across samples
Examples
library(Matrix)
donor_by_gene <- rbind(c(9,2,1,5), c(3,3,1,2))
donor_by_gene <- Matrix(donor_by_gene, sparse = TRUE)
result <- colMeanVars(donor_by_gene, rowSel = NULL, ncores=1)
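For readers who want to check the idea without the compiled code, here is a minimal R sketch of column-wise means and variances on a sparse matrix. It is not the pagoda2 C++ implementation: the function name col_mean_vars_sketch and its output column names are hypothetical, and n_obs here simply reports the number of rows, which may differ from the package's definition.

```r
library(Matrix)

# Column-wise mean and sample variance of a sparse samples-by-genes matrix.
# Uses var = (sum(x^2) - n * mean^2) / (n - 1) so the matrix is never densified.
col_mean_vars_sketch <- function(sY) {
  n <- nrow(sY)
  m <- Matrix::colMeans(sY)
  sq <- Matrix::colSums(sY^2)
  data.frame(mean = m, var = (sq - n * m^2) / (n - 1), n_obs = n)
}

donor_by_gene <- Matrix(rbind(c(9, 2, 1, 5), c(3, 3, 1, 2)), sparse = TRUE)
res <- col_mean_vars_sketch(donor_by_gene)
```

The one-pass formula avoids densifying the matrix but can lose precision through cancellation when means are large relative to the spread; a two-pass computation is more stable.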
Plot a pairwise comparison of factors from two separate decompositions
Description
Plot a pairwise comparison of factors from two separate decompositions
Usage
compare_decompositions(
tucker_res1,
tucker_res2,
decomp_names,
meta_anno1 = NULL,
meta_anno2 = NULL,
use_text = TRUE
)
Arguments
tucker_res1 |
list The container$tucker_results from the first decomposition |
tucker_res2 |
list The container$tucker_results from the second decomposition |
decomp_names |
character Names of the two decompositions that will go on the axes of the heatmap |
meta_anno1 |
matrix The result of calling get_meta_associations() corresponding to the first decomposition, which is stored in container$meta_associations (default=NULL) |
meta_anno2 |
matrix The result of calling get_meta_associations() corresponding to the second decomposition, which is stored in container$meta_associations (default=NULL) |
use_text |
logical If TRUE, then displays correlation coefficients in cells (default=TRUE) |
Value
No return value, as the resulting plots are drawn.
Examples
test_container <- run_tucker_ica(test_container, ranks=c(2,4),
tucker_type='regular', rotation_type='hybrid')
tucker_res1 <- test_container$tucker_results
test_container <- run_tucker_ica(test_container, ranks=c(2,4),
tucker_type='regular', rotation_type='ica_dsc')
tucker_res2 <- test_container$tucker_results
compare_decompositions(tucker_res1,tucker_res2,c('hybrid_method','ica_method'))
Compute and plot the LR interactions for one factor
Description
Compute and plot the LR interactions for one factor
Usage
compute_LR_interact(
container,
lr_pairs,
sig_thresh = 0.05,
percentile_exp_rec = 0.75,
add_ld_fact_sig = TRUE,
ncores = container$experiment_params$ncores
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
lr_pairs |
data.frame Data of ligand-receptor pairs. First column should be ligands and second column should be one or more receptors separated by an underscore such as receptor1_receptor2 in the case that multiple receptors are required for signaling. |
sig_thresh |
numeric The p-value significance threshold to use for module-factor associations and ligand-factor associations (default=0.05) |
percentile_exp_rec |
numeric The percentile above which the top donors expressing the ligand all must be expressing the receptor (default=0.75) |
add_ld_fact_sig |
logical Set to TRUE to append a heatmap showing significance of associations between each ligand hit and each factor (default=TRUE) |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
Value
The LR analysis results heatmap as a ComplexHeatmap object. Adjusted p-values for all results are placed in container$lr_res.
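The lr_pairs format described above (receptor complexes joined by underscores) can be parsed with base R's strsplit(). The ligand-receptor pairs below are illustrative only, and the column names ligand and receptor are assumptions, since only column order is specified.

```r
# Illustrative lr_pairs: first column ligands, second column receptors,
# with multi-subunit receptors joined by "_"
lr_pairs <- data.frame(
  ligand = c("TNF", "IL15", "CCL5"),
  receptor = c("TNFRSF1A", "IL2RB_IL2RG", "CCR5"),
  stringsAsFactors = FALSE
)

# Split each receptor entry into its complex components
rec_components <- strsplit(lr_pairs[[2]], "_", fixed = TRUE)
```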
Compute associations between donor proportions and factor scores
Description
Compute associations between donor proportions and factor scores
Usage
compute_associations(donor_balances, donor_scores, stat_type)
Arguments
donor_balances |
matrix The balances computed from donor cell type proportions |
donor_scores |
data.frame The donor scores matrix from tucker results |
stat_type |
character Either "fstat" to get F-statistics, "adj_rsq" to get adjusted R-squared values, or "adj_pval" to get adjusted p-values. |
Value
A numeric vector of association statistics (one for each factor)
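The three stat_type options can be illustrated by fitting one linear model per factor, with the balances as predictors. This is a minimal sketch on simulated data; the use of lm() and of Benjamini-Hochberg for the adjusted p-values are assumptions, not confirmed details of compute_associations().

```r
set.seed(3)
# Simulated, hypothetical inputs: donors x balances and donors x factors
donor_balances <- matrix(rnorm(30 * 2), ncol = 2)
donor_scores <- matrix(rnorm(30 * 3), ncol = 3)

# One model per factor: factor scores ~ all balances
assoc_stats <- sapply(seq_len(ncol(donor_scores)), function(f) {
  s <- summary(lm(donor_scores[, f] ~ donor_balances))
  fs <- s$fstatistic
  c(fstat = unname(fs["value"]),
    adj_rsq = s$adj.r.squared,
    pval = unname(pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)))
})

# Multiple-testing correction across factors (BH chosen for this sketch)
adj_pvals <- p.adjust(assoc_stats["pval", ], method = "BH")
```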
Get donor proportions of each cell type or subtype
Description
Get donor proportions of each cell type or subtype
Usage
compute_donor_props(clusts, metadata)
Arguments
clusts |
integer Cluster assignments for each cell with names as cell barcodes |
metadata |
data.frame The $metadata field for the given scMinimal |
Value
A data.frame of cluster proportions for each donor.
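Per-donor cluster proportions amount to a row-normalized contingency table. In the minimal sketch below, the barcodes, cluster labels, and the metadata column name donors are hypothetical.

```r
# Hypothetical cluster labels named by cell barcode, plus per-cell metadata
clusts <- c(cell1 = 1L, cell2 = 2L, cell3 = 1L, cell4 = 1L, cell5 = 2L)
metadata <- data.frame(donors = c("d1", "d1", "d2", "d2", "d2"),
                       row.names = names(clusts))

donor_props_sketch <- function(clusts, metadata) {
  donors <- metadata[names(clusts), "donors"]  # align donors to cells by barcode
  counts <- table(donors, clusts)              # donors x clusters cell counts
  as.data.frame.matrix(prop.table(counts, margin = 1))  # each row sums to 1
}

props <- donor_props_sketch(clusts, metadata)
```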
Convert gene identifiers to gene symbols
Description
Convert gene identifiers to gene symbols
Usage
convert_gn(container, genes)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
genes |
character Vector of the gene identifiers to be converted to gene symbols |
Value
A character vector of gene symbols.
count_word. Taken from an older version of the simplifyEnrichment package.
Description
count_word. Taken from an older version of the simplifyEnrichment package.
Usage
count_word(term, exclude_words = NULL)
Arguments
term |
A vector of description texts. |
exclude_words |
The words that should be excluded. |
Value
A data frame with words and frequencies.
Run rank determination by SVD on the tensor unfolded along each mode
Description
Run rank determination by SVD on the tensor unfolded along each mode
Usage
determine_ranks_tucker(
container,
max_ranks_test,
shuffle_level = "cells",
shuffle_within = NULL,
num_iter = 100,
batch_var = NULL,
norm_method = "trim",
scale_factor = 10000,
scale_var = TRUE,
var_scale_power = 0.5,
seed = container$experiment_params$rand_seed
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
max_ranks_test |
numeric Vector of length 2 specifying the maximum number of donor and gene ranks to test |
shuffle_level |
character Either "cells" to shuffle cell-donor linkages or "tensor" to shuffle values within the tensor (default="cells") |
shuffle_within |
character A metadata variable to shuffle cell-donor linkages within (default=NULL) |
num_iter |
numeric Number of null iterations (default=100) |
batch_var |
character A batch variable from metadata to remove. No batch correction applied if NULL. (default=NULL) |
norm_method |
character The normalization method to use on the pseudobulked count data. Set to 'regular' to do standard normalization of dividing by library size. Set to 'trim' to use edgeR trim-mean normalization, whereby counts are divided by library size times a normalization factor. (default='trim') |
scale_factor |
numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000) |
scale_var |
logical TRUE to scale the gene expression variance across donors for each cell type. If FALSE then all genes are scaled to unit variance across donors for each cell type. (default=TRUE) |
var_scale_power |
numeric Exponent of normalized variance that is used for variance scaling. Variance for each gene is initially set to unit variance across donors (for a given cell type). Variance for each gene is then scaled by multiplying the unit scaled values by each gene's normalized variance (where the effect of the mean-variance dependence is taken into account) to the exponent specified here. If NULL, uses var_scale_power from container$experiment_params. (default=.5) |
seed |
numeric Seed passed to set.seed() (default=container$experiment_params$rand_seed) |
Value
The project container with a cowplot figure of rank determination plots in container$plots$rank_determination_plot.
Examples
test_container <- determine_ranks_tucker(test_container, max_ranks_test=c(3,5),
shuffle_level='tensor', num_iter=4, norm_method='trim', scale_factor=10000,
scale_var=TRUE, var_scale_power=.5)
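The core idea of SVD on each mode unfolding can be sketched in base R. This is an illustration under assumptions: the tensor is simulated, unfold_mode is a hypothetical helper, and the null shown corresponds only loosely to shuffle_level = "tensor". One common reading is to count the leading singular values that exceed their null counterparts, but whether determine_ranks_tucker() uses exactly this rule is not stated.

```r
set.seed(42)
# Simulated donors x genes x cell types tensor
tnsr <- array(rnorm(10 * 30 * 3), dim = c(10, 30, 3))

# Unfold (matricize) a 3-way array along one mode: rows index that mode,
# columns enumerate all combinations of the remaining modes
unfold_mode <- function(x, mode) {
  perm <- c(mode, setdiff(seq_along(dim(x)), mode))
  matrix(aperm(x, perm), nrow = dim(x)[mode])
}

# Singular values of the donor (mode 1) and gene (mode 2) unfoldings
sv_real <- lapply(1:2, function(m) svd(unfold_mode(tnsr, m))$d)

# Null: shuffle all values in the tensor, then repeat
tnsr_shuf <- array(sample(tnsr), dim = dim(tnsr))
sv_null <- lapply(1:2, function(m) svd(unfold_mode(tnsr_shuf, m))$d)
```

The squared singular values of any unfolding sum to the squared Frobenius norm of the tensor, which gives a quick sanity check.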
Form the pseudobulk tensor as preparation for running the tensor decomposition.
Description
Form the pseudobulk tensor as preparation for running the tensor decomposition.
Usage
form_tensor(
container,
donor_min_cells = 5,
norm_method = "trim",
scale_factor = 10000,
vargenes_method = "norm_var",
vargenes_thresh = 500,
batch_var = NULL,
scale_var = TRUE,
var_scale_power = 0.5,
custom_genes = NULL,
verbose = TRUE
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
donor_min_cells |
numeric Minimum threshold for number of cells per donor (default=5) |
norm_method |
character The normalization method to use on the pseudobulked count data. Set to 'regular' to do standard normalization of dividing by library size. Set to 'trim' to use edgeR trim-mean normalization, whereby counts are divided by library size times a normalization factor. (default='trim') |
scale_factor |
numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000) |
vargenes_method |
character The method by which to select highly variable genes from each cell type. Set to 'anova' to select genes by anova. Set to 'norm_var' to select the top genes by normalized variance or 'norm_var_pvals' to select genes by significance of their overdispersion (default='norm_var') |
vargenes_thresh |
numeric The threshold to use in variable gene selection. For 'anova' and 'norm_var_pvals' this should be a p-value threshold. For 'norm_var' this should be the number of most variably expressed genes to select from each cell type (default=500) |
batch_var |
character A batch variable from metadata to remove (default=NULL) |
scale_var |
logical TRUE to scale the gene expression variance across donors for each cell type. If FALSE then all genes are scaled to unit variance across donors for each cell type. (default=TRUE) |
var_scale_power |
numeric Exponent of normalized variance that is used for variance scaling. Variance for each gene is initially set to unit variance across donors (for a given cell type). Variance for each gene is then scaled by multiplying the unit scaled values by each gene's normalized variance (where the effect of the mean-variance dependence is taken into account) to the exponent specified here. If NULL, uses var_scale_power from container$experiment_params. (default=.5) |
custom_genes |
character A vector of genes to include in the tensor. Overrides the default gene selection if not NULL. (default=NULL) |
verbose |
logical Set to TRUE to print out progress (default=TRUE) |
Value
The project container with a list of tensor data added in the container$tensor_data slot.
Examples
test_container <- form_tensor(test_container, donor_min_cells=0,
norm_method='trim', scale_factor=10000, vargenes_method='norm_var', vargenes_thresh=500,
scale_var = TRUE, var_scale_power = 1.5)
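The 'regular' normalization and the variance scaling described under norm_method, scale_factor, and var_scale_power can be sketched in base R. The counts and normalized variances are made up, edgeR's 'trim' normalization and any log transform the package may apply are not shown, and the genes-by-donors orientation is chosen only for this sketch.

```r
# Hypothetical pseudobulk counts for one cell type: genes x donors
counts <- matrix(c(10, 20, 30, 40, 5,
                   12, 18, 25, 50, 9,
                    8, 22, 35, 45, 3,
                   15, 25, 20, 30, 6),
                 nrow = 5,
                 dimnames = list(paste0("g", 1:5), paste0("d", 1:4)))
scale_factor <- 10000

# 'regular' normalization: divide each donor's counts by its library size,
# then multiply by scale_factor
norm <- sweep(counts, 2, colSums(counts), "/") * scale_factor

# Variance scaling: give each gene unit variance across donors, then multiply
# by its normalized variance raised to var_scale_power
norm_var <- c(g1 = 2, g2 = 0.5, g3 = 1, g4 = 4, g5 = 1.5)  # made-up values
var_scale_power <- 0.5
scaled <- t(scale(t(norm)))  # each gene (row): mean 0, sd 1 across donors
scaled <- scaled * norm_var[rownames(scaled)]^var_scale_power
```

After this step each gene's standard deviation across donors equals its normalized variance raised to var_scale_power, so larger exponents give overdispersed genes more weight.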
Generate loadings heatmaps for all factors
Description
Generate loadings heatmaps for all factors
Usage
get_all_lds_factor_plots(
container,
use_sig_only = FALSE,
nonsig_to_zero = FALSE,
annot = "none",
pathways_list = NULL,
sim_de_donor_group = NULL,
sig_thresh = 0.05,
display_genes = FALSE,
gene_callouts = FALSE,
callout_n_gene_per_ctype = 5,
callout_ctypes = NULL,
show_var_explained = TRUE,
reset_other_factor_plots = TRUE
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
use_sig_only |
logical If TRUE, includes only significant genes from jackstraw in the heatmap. If FALSE, includes all the variable genes. (default = FALSE) |
nonsig_to_zero |
logical If TRUE, makes the loadings of all nonsignificant genes 0 (default=FALSE) |
annot |
character If set to "pathways" then creates an adjacent heatmap showing which genes are in which pathways. If set to "sig_genes" then creates an adjacent heatmap showing which genes were significant from jackstraw. If set to "none" no adjacent heatmap is plotted. (default="none") |
pathways_list |
list A list of sets of pathways for each factor. List index should be the number corresponding to the factor. (default=NULL) |
sim_de_donor_group |
numeric To plot the ground truth significant genes from a simulation next to the heatmap, provide the number of the donor group that corresponds to each factor being plotted, as a vector with one entry per factor. (default=NULL) |
sig_thresh |
numeric P-value significance threshold to use. If use_sig_only is TRUE the threshold is used as a cutoff for genes to include. If annot is "sig_genes" this value is used in the gene significance colormap as a minimum threshold. (default=0.05) |
display_genes |
logical If TRUE, displays gene names (default=FALSE) |
gene_callouts |
logical If TRUE, then adds gene callout annotations to the heatmap (default=FALSE) |
callout_n_gene_per_ctype |
numeric To use if gene_callouts is TRUE. Sets the number of largest magnitude significant genes from each cell type to include in gene callouts. (default=5) |
callout_ctypes |
list To use if gene_callouts is TRUE. Specifies which cell types to get gene callouts for. Each entry of the list should be a character vector of ctypes for the respective factor. If NULL, then gets gene callouts for largest magnitude significant genes for all cell types. (default=NULL) |
show_var_explained |
logical If TRUE, shows an annotation with the explained variance for each cell type (default=TRUE) |
reset_other_factor_plots |
logical If TRUE then removes any existing loadings plots (default=TRUE) |
Value
The project container with the list of all loadings heatmap plots placed in container$plots$all_lds_plots.
Examples
test_container <- get_all_lds_factor_plots(test_container)
Get gene callout annotations for a loadings heatmap
Description
Get gene callout annotations for a loadings heatmap
Usage
get_callouts_annot(
container,
tmp_casted_num,
factor_select,
sig_thresh,
top_n_per_ctype = 5,
ctypes = NULL
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
tmp_casted_num |
matrix The gene by cell type loadings matrix |
factor_select |
numeric The factor to investigate |
sig_thresh |
numeric P-value cutoff for significant genes |
top_n_per_ctype |
numeric The number of significant, largest magnitude genes from each cell type to generate callouts for (default=5) |
ctypes |
character The cell types for which to get the top genes to make callouts for. If NULL then uses all cell types. (default=NULL) |
Value
A HeatmapAnnotation object for the gene callouts.
Get explained variance of the reconstructed data using one cell type from one factor
Description
Get explained variance of the reconstructed data using one cell type from one factor
Usage
get_ctype_exp_var(container, factor_use, ctype)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_use |
numeric The factor to get variance explained for |
ctype |
character The cell type to get variance explained for |
Value
The explained variance numeric value for one cell type of one factor.
Compute and plot associations between donor factor scores and donor proportions of major cell types
Description
Compute and plot associations between donor factor scores and donor proportions of major cell types
Usage
get_ctype_prop_associations(container, stat_type, n_col = 2)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
stat_type |
character Either "fstat" to get F-statistics, "adj_rsq" to get adjusted R-squared values, or "adj_pval" to get adjusted p-values. |
n_col |
numeric The number of columns to organize the plots into (default=2) |
Value
The project container with a cowplot figure of results plots in container$plots$ctype_prop_factor_associations.
Compute and plot associations between donor factor scores and donor proportions of cell subtypes
Description
Compute and plot associations between donor factor scores and donor proportions of cell subtypes
Usage
get_ctype_subc_prop_associations(
container,
ctype,
res,
n_col = 2,
alt_name = NULL
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ctype |
character The cell type to get results for |
res |
numeric The clustering resolution to retrieve |
n_col |
numeric The number of columns to organize the plots into (default=2) |
alt_name |
character Alternate name for the cell type used in clustering (default=NULL) |
Value
The project container with a cowplot figure of results plots in container$plots$ctype_prop_factor_associations.
Partition the main gene-by-cell matrix into per-cell-type matrices containing only significantly variable genes. Generally, this should be done through calling the form_tensor() wrapper function.
Description
Partition the main gene-by-cell matrix into per-cell-type matrices containing only significantly variable genes. Generally, this should be done through calling the form_tensor() wrapper function.
Usage
get_ctype_vargenes(
container,
method,
thresh,
ncores = container$experiment_params$ncores,
seed = container$experiment_params$rand_seed
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
method |
character The method used to select significantly variable genes across donors within a cell type. Can be either "anova" to use basic anova with cells grouped by donor or "norm_var" to get the top overdispersed genes by normalized variance. Set to "norm_var_pvals" to use normalized variance p-values as calculated in pagoda2. |
thresh |
numeric A p-value threshold to use for gene significance when method is set to "anova" or "norm_var_pvals". For the method "norm_var", thresh is the number of top overdispersed genes from each cell type to include. |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
seed |
numeric Seed passed to set.seed() (default=container$experiment_params$rand_seed) |
Value
The project container with pseudobulk matrices limited to the selected most variable genes.
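The "anova" option (cells grouped by donor) can be illustrated with a per-gene one-way ANOVA in base R. This is a sketch on simulated data: whether the package adjusts these p-values, and what expression scale it tests on, are not stated here and are left out.

```r
set.seed(9)
# Simulated, hypothetical expression: cells x genes, with a donor label per cell
donors <- factor(rep(c("d1", "d2", "d3"), each = 20))
expr <- matrix(rnorm(60 * 4), ncol = 4, dimnames = list(NULL, paste0("g", 1:4)))
expr[, "g1"] <- expr[, "g1"] + as.integer(donors)  # g1 differs between donors

# One-way ANOVA per gene: does mean expression differ across donors?
anova_p <- apply(expr, 2, function(x) anova(lm(x ~ donors))[["Pr(>F)"]][1])
selected <- names(anova_p)[anova_p < 0.05]
```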
Get metadata matrix of dimensions donors by variables (not per cell)
Description
Get metadata matrix of dimensions donors by variables (not per cell)
Usage
get_donor_meta(container, additional_meta = NULL, only_analyzed = TRUE)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
additional_meta |
character A vector of other variables to include (default=NULL) |
only_analyzed |
logical Set to TRUE to only include donors that were included in the formed tensor, otherwise set to FALSE (default=TRUE) |
Value
The project container with metadata per donor (not per cell) in container$donor_metadata.
Examples
test_container <- get_donor_meta(test_container, additional_meta='lanes')
Get the explained variance of the reconstructed data using one factor
Description
Get the explained variance of the reconstructed data using one factor
Usage
get_factor_exp_var(container, factor_use)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_use |
numeric The factor to investigate |
Value
The explained variance numeric value for one factor.
Calculate adjusted p-values for gene_celltype fiber-donor score associations
Description
Calculate adjusted p-values for gene_celltype fiber-donor score associations
Usage
get_fstats_pvals(fstats_real, fstats_shuffled)
Arguments
fstats_real |
numeric A vector of F-Statistics for gene-cell type-factor combinations |
fstats_shuffled |
numeric A vector of null F-Statistics |
Value
A vector of adjusted p-values for associations of the unshuffled fibers with factor donor scores.
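One common way to turn observed and null F-statistics into adjusted p-values is an empirical p-value followed by a multiple-testing correction. The sketch below uses simulated statistics, a +1 pseudocount, and Benjamini-Hochberg; all three choices are assumptions, not confirmed details of get_fstats_pvals().

```r
set.seed(11)
fstats_real <- c(0.5, 3.2, 12.0, 25.0)         # hypothetical observed F-statistics
fstats_shuffled <- rf(1000, df1 = 1, df2 = 28)  # hypothetical null F-statistics

# Empirical p-value: fraction of null statistics at least as large as the
# observed one, with +1 in numerator and denominator so p is never exactly 0
emp_pvals <- sapply(fstats_real, function(f) {
  (sum(fstats_shuffled >= f) + 1) / (length(fstats_shuffled) + 1)
})
adj_pvals <- p.adjust(emp_pvals, method = "BH")
```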
Compute WGCNA gene modules for each cell type
Description
Compute WGCNA gene modules for each cell type
Usage
get_gene_modules(container, sft_thresh)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
sft_thresh |
numeric A vector indicating the soft threshold to use for each cell type. Length should be the same as container$experiment_params$ctypes_use |
Value
The project container with WGCNA gene co-expression modules added. The module eigengenes for each cell type are in container$module_eigengenes, and the module genes for each cell type are in container$module_genes.
Get logical vectors indicating which genes are in which pathways
Description
Get logical vectors indicating which genes are in which pathways
Usage
get_gene_set_vectors(container, gene_sets, tmp_casted_num)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
gene_sets |
character Vector of gene sets to extract genes for |
tmp_casted_num |
matrix The gene by cell type loadings matrix |
Value
A list of the logical vectors for each pathway.
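The logical-vector representation amounts to a membership test of the loadings matrix's genes against each gene set. In this minimal sketch, the gene names and gene sets are hypothetical.

```r
# Hypothetical genes (rows of a loadings matrix) and gene sets
genes <- c("ISG15", "IFI6", "MX1", "CD3E", "MS4A1")
gene_sets <- list(IFN_RESPONSE = c("ISG15", "MX1", "OAS1"),
                  B_CELL = c("MS4A1", "CD79A"))

# One logical vector per gene set, TRUE where the gene is in the set
set_vectors <- lapply(gene_sets, function(gs) setNames(genes %in% gs, genes))
```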
Compute subtype proportion-factor association p-values for all subclusters of a given major cell type
Description
Compute subtype proportion-factor association p-values for all subclusters of a given major cell type
Usage
get_indv_subtype_associations(container, donor_props, factor_select)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
donor_props |
matrix Donor proportions of subtypes |
factor_select |
numeric The factor to get associations for |
Value
A vector of association statistics for each cell subtype against a selected factor.
Extract the intersection of gene sets which are enriched in two or more cell types for a factor
Description
Extract the intersection of gene sets which are enriched in two or more cell types for a factor
Usage
get_intersecting_pathways(
container,
factor_select,
these_ctypes_only,
up_down,
thresh = 0.05
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to investigate |
these_ctypes_only |
character A vector of cell types for which to get gene sets that are enriched in all of these and not in any other cell types |
up_down |
character Set to "up" to get the gene sets for the positive loading genes. Set to "down" to get the gene sets for the negative loading genes. |
thresh |
numeric P-value significance threshold for selecting enriched sets (default=0.05) |
Value
A vector of the intersection of pathways that are significantly enriched in two or more cell types for a factor.
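The selection described for these_ctypes_only (enriched in all of the given cell types and in none of the others) is a set intersection followed by a set difference. In this sketch, the per-cell-type enriched sets are hypothetical and assumed to be already filtered at thresh.

```r
# Hypothetical enriched gene sets per cell type (already filtered at thresh)
enriched <- list(
  T4 = c("IFN_RESPONSE", "TNF_SIGNALING", "MYC_TARGETS"),
  cM = c("IFN_RESPONSE", "TNF_SIGNALING", "OXPHOS"),
  B  = c("IFN_RESPONSE", "HEME_METABOLISM")
)
these_ctypes_only <- c("T4", "cM")

# Sets enriched in every selected cell type...
shared <- Reduce(intersect, enriched[these_ctypes_only])
# ...and in none of the remaining cell types
others <- unlist(enriched[setdiff(names(enriched), these_ctypes_only)])
exclusive <- setdiff(shared, others)
```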
Get the leading edge genes from GSEA results
Description
Get the leading edge genes from GSEA results
Usage
get_leading_edge_genes(container, factor_select, gsets, num_genes_per = 5)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to get results for |
gsets |
character A vector of gene set names to get leading edge genes for. |
num_genes_per |
numeric The maximum number of leading edge genes to get for each gene set (default=5) |
Value
A named character vector of gene sets, with leading edge genes as the names.
Compute gene-factor associations using univariate linear models
Description
Compute gene-factor associations using univariate linear models
Usage
get_lm_pvals(container, n.cores = container$experiment_params$ncores)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
n.cores |
numeric Number of cores to use (default = container$experiment_params$ncores) |
Value
The project container with a vector of adjusted p-values for the gene-factor associations in container$gene_score_associations.
Examples
test_container <- get_lm_pvals(test_container, n.cores=1)
Compute the maximum correlation between each factor from the decomposition of the whole dataset and the factors computed from the subsampled/bootstrapped dataset
Description
Compute the maximum correlation between each factor from the decomposition of the whole dataset and the factors computed from the subsampled/bootstrapped dataset
Usage
get_max_correlations(res_full, res_sub, res_use)
Arguments
res_full |
matrix Either the donor scores or loadings matrix from the original decomposition |
res_sub |
matrix Either the donor scores or loadings matrix from the new decomposition |
res_use |
character Can either be 'loadings' or 'dscores' and should correspond with the data matrix used |
Value
A vector of the max correlations for each original factor
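The per-factor maximum correlation can be computed from a single cross-correlation matrix. In this sketch, the matrices are simulated, and taking absolute values is an assumption motivated by the arbitrary sign of factors in a decomposition; it is not confirmed as the behavior of get_max_correlations().

```r
set.seed(5)
# Simulated donor scores (donors x factors) from an original decomposition,
# and a perturbed, column-permuted version standing in for a resampled run
res_full <- matrix(rnorm(50 * 3), ncol = 3)
res_sub <- res_full[, c(3, 1, 2)] + rnorm(150, sd = 0.1)

# cor(res_full, res_sub)[i, j] correlates original factor i with new factor j;
# keep the best match for each original factor
max_cors <- apply(abs(cor(res_full, res_sub)), 1, max)
```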
Get metadata associations with factor donor scores
Description
Get metadata associations with factor donor scores
Usage
get_meta_associations(container, vars_test, stat_use = "rsq")
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
vars_test |
character The names of meta variables to get associations for |
stat_use |
character Set to either 'rsq' to get r-squared values or 'pval' to get adjusted p-values (default='rsq') |
Value
The project container with a matrix of metadata associations with each factor in container$meta_associations.
Examples
test_container <- get_meta_associations(test_container, vars_test='lanes', stat_use='pval')
Evaluate the minimum number of significant genes in any factor for a given number of factors extracted by the decomposition
Description
Evaluate the minimum number of significant genes in any factor for a given number of factors extracted by the decomposition
Usage
get_min_sig_genes(
container,
donor_rank_range,
gene_ranks,
use_lm = TRUE,
tucker_type = "regular",
rotation_type = "hybrid",
n_fibers = 100,
n_iter = 500,
n.cores = container$experiment_params$ncores,
thresh = 0.05
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses. Should have |
donor_rank_range |
numeric Range of possible number of donor factors to use. |
gene_ranks |
numeric The number of gene ranks to use in the decomposition |
use_lm |
logical Set to TRUE to use get_lm_pvals(); otherwise uses jackstraw (default=TRUE) |
tucker_type |
character Set to 'regular' to run regular tucker or to 'sparse' to run tucker with sparsity constraints (default='regular') |
rotation_type |
character Set to 'hybrid' to perform hybrid rotation on the resulting donor factor matrix and loadings. Otherwise, set to 'ica_lds' to perform ICA rotation on the loadings or 'ica_dsc' to perform ICA on the donor scores. (default='hybrid') |
n_fibers |
numeric The number of fibers to randomly shuffle in each jackstraw iteration (default=100) |
n_iter |
numeric The number of jackstraw shuffling iterations to complete (default=500) |
n.cores |
numeric Number of cores to use in get_lm_pvals() (default=container$experiment_params$ncores) |
thresh |
numeric P-value threshold used to call genes significant when counting the number of significant genes per factor (default=0.05) |
Value
The project container with a plot of the minimum significant genes for each decomposition with varying number of donor factors located in container$plots$min_sig_genes.
Examples
test_container <- get_min_sig_genes(test_container, donor_rank_range=c(2:4),
gene_ranks=4, tucker_type='regular', rotation_type='hybrid', n.cores=1)
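Once per-gene p-values are available, the plotted quantity is a count followed by a minimum. The sketch below assumes a hypothetical genes x factors matrix of adjusted p-values for one decomposition. In scITD these come from get_lm_pvals() or jackstraw and may be at the gene-cell type level. min_sig_genes_sketch() is for illustration only and is not part of the package.

```r
# Hypothetical sketch: the minimum, over factors, of the number of genes whose
# adjusted p-value falls below thresh.
# pvals: genes x factors matrix of adjusted p-values for one decomposition.
min_sig_genes_sketch <- function(pvals, thresh = 0.05) {
  n_sig_per_factor <- colSums(pvals < thresh)  # significant genes per factor
  min(n_sig_per_factor)
}

pvals <- cbind(Factor1 = c(0.01, 0.02, 0.30, 0.04),
               Factor2 = c(0.50, 0.01, 0.90, 0.70))
min_sig_genes_sketch(pvals)
```

Repeating this for each donor rank in donor_rank_range gives one point per decomposition on the plot.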
Identify gene sets that are enriched within specified gene co-regulatory modules. Uses a hypergeometric test for over-representation. Used in plot_multi_module_enr().
Description
Identify gene sets that are enriched within specified gene co-regulatory modules. Uses a hypergeometric test for over-representation. Used in plot_multi_module_enr().
Usage
get_module_enr(container, ctype, mod_select, db_use = "GO", adjust_pval = TRUE)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ctype |
character The name of cell type for the cell type module to test |
mod_select |
numeric The module number for the cell type module to test |
db_use |
character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", "BioCarta", "Hallmark", "TF", and "immuno". More than one database can be used. (default="GO") |
adjust_pval |
logical Set to TRUE to apply FDR correction (default=TRUE) |
Value
A vector of p-values for the tested gene sets.
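A one-sided hypergeometric over-representation test can be sketched in base R with stats::phyper(). Several pieces are assumptions for illustration: the function name, the input layout (a vector of module genes plus a named list of gene sets) and the choice of gene universe. get_module_enr() retrieves its gene sets from the selected databases and may define the universe differently.

```r
# Sketch of a one-sided hypergeometric over-representation test.
# module_genes: genes in the co-regulatory module
# gene_sets: named list of gene sets; universe: all genes that could be drawn
module_enr_sketch <- function(module_genes, gene_sets, universe, adjust_pval = TRUE) {
  module_genes <- intersect(module_genes, universe)
  pvals <- vapply(gene_sets, function(gs) {
    gs <- intersect(gs, universe)
    k <- length(intersect(module_genes, gs))  # observed overlap
    # P(X >= k), X ~ Hypergeometric(|gs| successes, |universe| - |gs| failures,
    # |module| draws)
    stats::phyper(k - 1, length(gs), length(universe) - length(gs),
                  length(module_genes), lower.tail = FALSE)
  }, numeric(1))
  if (adjust_pval) pvals <- stats::p.adjust(pvals, method = "fdr")
  pvals
}

universe <- paste0("g", 1:10)
module <- paste0("g", 1:5)
sets <- list(A = paste0("g", 1:5), B = paste0("g", 6:10))
module_enr_sketch(module, sets, universe, adjust_pval = FALSE)
module_enr_sketch(module, sets, universe)
```

Consider a 10-gene universe and a 5-gene module that exactly matches set A. The unadjusted p-value for A is 1/choose(10, 5) = 1/252, and the FDR adjustment across the two sets doubles it.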
Get normalized variance for each gene, taking into account mean-variance trend
Description
Get normalized variance for each gene, taking into account mean-variance trend
Usage
get_normalized_variance(container)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
Value
The project container with vectors of normalized variance values in the scMinimal objects for each cell type. Generally, this should be done through calling the form_tensor() wrapper function.
Plot factor-batch associations for increasing number of donor factors
Description
Plot factor-batch associations for increasing number of donor factors
Usage
get_num_batch_ranks(
container,
donor_ranks_test,
gene_ranks,
batch_var,
thresh = 0.5,
tucker_type = "regular",
rotation_type = "hybrid"
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
donor_ranks_test |
numeric The donor rank values (numbers of donor factors) to test |
gene_ranks |
numeric The number of gene ranks to use throughout |
batch_var |
character The name of the batch meta variable |
thresh |
numeric The threshold r-squared cutoff for considering a factor to be a batch factor. Can be a vector of multiple values to get plots at varying thresholds. (default=0.5) |
tucker_type |
character Set to 'regular' to run regular tucker or to 'sparse' to run tucker with sparsity constraints (default='regular') |
rotation_type |
character Set to 'hybrid' to optimize loadings via our hybrid method (see paper for details). Set to 'ica_dsc' to perform ICA rotation on resulting donor factor matrix. Set to 'ica_lds' to optimize loadings by the ICA rotation. (default='hybrid') |
Value
A ggpubr figure of ggplot objects showing batch-factor associations, placed in the container$plots$num_batch_factors slot
Examples
test_container <- get_num_batch_ranks(test_container, donor_ranks_test=c(2:4),
gene_ranks=10, batch_var='lanes', thresh=0.5, tucker_type='regular', rotation_type='hybrid')
Get the donor scores and loadings matrix for a single factor
Description
Get the donor scores and loadings matrix for a single factor
Usage
get_one_factor(container, factor_select)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The number corresponding to the factor to extract |
Value
A list with the first element as the donor scores and the second element as the corresponding loadings matrix for one factor.
Examples
f1_res <- get_one_factor(test_container, factor_select=1)
Get significant genes for a factor
Description
Get significant genes for a factor
Usage
get_one_factor_gene_pvals(container, factor_select)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The number corresponding to the factor to extract |
Value
A gene by cell type matrix of gene significance p-values for a factor
Collapse data from cell-level to donor-level via summing counts. Generally, this should be done through calling the form_tensor() wrapper function.
Description
Collapse data from cell-level to donor-level via summing counts. Generally, this should be done through calling the form_tensor() wrapper function.
Usage
get_pseudobulk(container, shuffle = FALSE, shuffle_within = NULL)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
shuffle |
logical Set to TRUE to shuffle cell-donor linkages (default=FALSE) |
shuffle_within |
character A metadata variable to shuffle cell-donor linkages within (default=NULL) |
Value
The project container with pseudobulked count matrices in container$scMinimal_ctype$<ctype>$pseudobulk slots for each cell type.
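The shuffle options act as a null control by breaking the cell-donor link before counts are summed. The sketch below shows a within-group permutation. It assumes shuffle_within refers to a per-cell grouping such as sequencing lane, and shuffle_donors_sketch() is a hypothetical helper. Permuting donor labels only within each group preserves the donor composition of every group, and the final table makes this visible.

```r
# Hypothetical sketch: permute cell-to-donor labels, optionally within groups.
shuffle_donors_sketch <- function(donors, within = NULL) {
  if (is.null(within)) return(sample(donors))
  shuffled <- donors
  for (idx in split(seq_along(donors), within)) {
    # idx[sample.int(...)] avoids sample()'s special case for length-1 numerics
    shuffled[idx] <- donors[idx[sample.int(length(idx))]]
  }
  shuffled
}

set.seed(42)
donors <- rep(c("d1", "d2", "d3", "d4"), times = 3)
batch <- rep(c("b1", "b2"), each = 6)
shuffled <- shuffle_donors_sketch(donors, within = batch)
table(batch, shuffled)
```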
Get F-Statistics for the real (non-shuffled) gene_ctype fibers
Description
Get F-Statistics for the real (non-shuffled) gene_ctype fibers
Usage
get_real_fstats(container, ncores)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ncores |
numeric The number of cores to use |
Value
A vector of F-statistics for each gene_celltype-factor association in the unshuffled data.
Calculate reconstruction errors using an SVD approach
Description
Calculate reconstruction errors using an SVD approach
Usage
get_reconstruct_errors_svd(tnsr, max_ranks_test, shuffle_tensor)
Arguments
tnsr |
array A 3-dimensional array with dimensions of donors, genes, and cell types in that order |
max_ranks_test |
numeric Vector of length 3 with maximum number of ranks to test for donor, gene, and cell type modes in that order |
shuffle_tensor |
logical Set to TRUE to shuffle values within the tensor |
Value
A list of reconstruction errors for each mode of the tensor.
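A standard way to get SVD-based reconstruction errors is to unfold the tensor along each mode and apply the Eckart-Young result. The squared Frobenius error of the best rank-r approximation equals the sum of the discarded squared singular values. The sketch below reports that error relative to the total. The relative normalization and the helper names are assumptions, not necessarily what scITD computes.

```r
# Mode-n unfolding of an array: rows index the chosen mode. Column order differs
# from the Kolda-Bader convention, which does not change the singular values.
unfold_mode <- function(tnsr, mode) {
  perm <- c(mode, setdiff(seq_along(dim(tnsr)), mode))
  matrix(aperm(tnsr, perm), nrow = dim(tnsr)[mode])
}

# Hypothetical sketch: relative error of the best rank-r approximation of each
# unfolding, for r = 1..max_ranks_test[mode].
reconstruct_errors_svd_sketch <- function(tnsr, max_ranks_test, shuffle_tensor = FALSE) {
  if (shuffle_tensor) tnsr[] <- sample(tnsr)  # permute values, keep dimensions
  lapply(seq_along(dim(tnsr)), function(mode) {
    d <- svd(unfold_mode(tnsr, mode), nu = 0, nv = 0)$d
    sapply(seq_len(max_ranks_test[mode]), function(r) {
      sqrt(sum(d[-seq_len(r)]^2) / sum(d^2))  # discarded energy / total energy
    })
  })
}

set.seed(7)
a <- rnorm(6); b <- rnorm(5); cc <- rnorm(3)
rank1 <- outer(outer(a, b), cc)  # donors x genes x cell types, rank 1
errs <- reconstruct_errors_svd_sketch(rank1, max_ranks_test = c(3, 3, 2))
```

For a rank-1 tensor every unfolding has rank 1, so the rank-1 error is numerically zero in each mode. After shuffling, the error is no longer zero.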
Get vectors indicating which genes are significant in which cell types for a factor of interest
Description
Get vectors indicating which genes are significant in which cell types for a factor of interest
Usage
get_significance_vectors(container, factor_select, ctypes)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to query |
ctypes |
character The cell types used in all the analysis ordered as they appear in the loadings matrix |
Value
A list of the adjusted p-values for expression of each gene in each cell type in association with a factor of interest.
Get list of cell subtype differential expression heatmaps
Description
Get list of cell subtype differential expression heatmaps
Usage
get_subclust_de_hmaps(container, all_ctypes, all_res)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
all_ctypes |
character A vector of the cell types to include |
all_res |
numeric A vector of resolutions matching the all_ctypes parameter |
Value
A list of cell subcluster DE marker gene heatmaps as grob objects.
Get scatter plot for association of a cell subtype proportion with scores for a factor
Description
Get scatter plot for association of a cell subtype proportion with scores for a factor
Usage
get_subclust_enr_dotplot(
container,
ctype,
res,
subtype,
factor_use,
ctype_cur = ctype
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ctype |
character The cell type to plot |
res |
numeric The subcluster resolution to use |
subtype |
numeric The number corresponding to the subtype of the major cell type to plot |
factor_use |
numeric The factor to plot |
ctype_cur |
character The name of the major cell type used in the main analysis |
Value
A ggplot object of each donor's cell subcluster proportions against donor scores for a selected factor.
Get a figure showing cell subtype proportion associations with each factor. Combines this plot with subtype UMAPs and differential expression heatmaps. Note that this function runs better when n.cores in the conos object stored in container$embedding is set to a relatively small value (< 10).
Description
Get a figure showing cell subtype proportion associations with each factor. Combines this plot with subtype UMAPs and differential expression heatmaps. Note that this function runs better when n.cores in the conos object stored in container$embedding is set to a relatively small value (< 10).
Usage
get_subclust_enr_fig(container, all_ctypes, all_res)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
all_ctypes |
character A vector of the cell types to include |
all_res |
numeric A vector of resolutions matching the all_ctypes parameter |
Value
A cowplot figure placed in the slot container$plots$subc_fig.
Get heatmap of subtype proportion associations for each celltype/subtype and each factor
Description
Get heatmap of subtype proportion associations for each celltype/subtype and each factor
Usage
get_subclust_enr_hmap(container, all_ctypes, all_res, all_factors)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
all_ctypes |
character A vector of the cell types to include |
all_res |
numeric A vector of resolutions matching the all_ctypes parameter |
all_factors |
numeric A vector of the factors to compute associations for |
Value
A ComplexHeatmap object in container$plots$subc_enr_hmap showing the univariate associations between cell subcluster proportions and each factor.
Get a figure to display subclusterings at multiple resolutions
Description
Get a figure to display subclusterings at multiple resolutions
Usage
get_subclust_umap(container, all_ctypes, all_res, n_col = 3)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
all_ctypes |
character A vector of the cell types to include |
all_res |
numeric A vector of resolutions matching the all_ctypes parameter |
n_col |
numeric The number of columns to organize the figure into (default=3) |
Value
The project container with a cowplot figure of all UMAP plots in container$plots$subc_umap_fig and the individual UMAP plots in container$plots$subc_umaps
Perform Leiden subclustering to get cell subtypes
Description
Perform Leiden subclustering to get cell subtypes
Usage
get_subclusters(
container,
ctype,
resolution,
min_cells_group = 50,
small_clust_action = "merge"
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ctype |
character The cell type to do subclustering for |
resolution |
numeric The Leiden resolution to use |
min_cells_group |
numeric The minimum allowable cluster size (default=50) |
small_clust_action |
character Either 'remove' to remove subclusters below the min_cells_group threshold or 'merge' to merge them into the nearest cluster above the size threshold (default='merge') |
Value
A vector of cell subclusters.
Compute and plot associations between factor scores and cell subtype composition for various clustering resolution parameters
Description
Compute and plot associations between factor scores and cell subtype composition for various clustering resolution parameters
Usage
get_subtype_prop_associations(
container,
max_res,
stat_type,
integration_var = NULL,
min_cells_group = 50,
use_existing_subc = FALSE,
alt_ct_names = NULL,
n_col = 2
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
max_res |
numeric The maximum clustering resolution to use. Minimum is 0.5. |
stat_type |
character Either "fstat" to get F-Statistics, "adj_rsq" to get adjusted R-squared values, or "adj_pval" to get adjusted p-values. |
integration_var |
character The metadata variable to use for creating the joint embedding with Conos if not already provided in container$embedding (default=NULL) |
min_cells_group |
numeric The minimum allowable size for cell subpopulations (default=50) |
use_existing_subc |
logical Set to TRUE to use existing subcluster annotations (default=FALSE) |
alt_ct_names |
character Cell type names used in clustering if different from those used in the main analysis. Should match the order of container$experiment_params$ctypes_use. (default=NULL) |
n_col |
numeric The number of columns to organize the plots into (default=2) |
Value
The project container with a cowplot figure of cell subtype proportion-factor association results plots in container$plots$subtype_prop_factor_associations.
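The three stat_type options can be illustrated for a single factor at a single resolution. This hypothetical sketch computes each donor's subtype proportions, then fits a simple linear regression of donor scores on each subtype's proportion. In a simple regression the F-statistic and adjusted R-squared do not depend on which variable is the response. Benjamini-Hochberg adjustment across subtypes is an assumption.

```r
# Hypothetical sketch: donor-level subtype proportions and their association
# with one factor's donor scores, reported as F-statistic, adjusted R-squared
# and (adjusted) p-value.
subtype_assoc_sketch <- function(cell_donor, cell_subtype, donor_scores) {
  props <- prop.table(table(cell_donor, cell_subtype), margin = 1)  # rows sum to 1
  props <- props[names(donor_scores), , drop = FALSE]               # align donors
  stats_mat <- t(apply(props, 2, function(p) {
    fit_sum <- summary(stats::lm(donor_scores ~ p))
    f <- fit_sum$fstatistic  # named: value, numdf, dendf
    c(fstat = unname(f["value"]),
      adj_rsq = fit_sum$adj.r.squared,
      pval = unname(stats::pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)))
  }))
  adj_pval <- stats::p.adjust(stats_mat[, "pval"], method = "BH")
  list(props = props, stats = cbind(stats_mat, adj_pval = adj_pval))
}

set.seed(3)
donor_ids <- paste0("d", 1:8)
scores <- setNames(seq(-1, 1, length.out = 8), donor_ids)
cell_donor <- rep(donor_ids, each = 50)
# subtype "A" becomes more frequent in donors with higher scores
p_A <- rep(plogis(3 * scores), each = 50)
cell_subtype <- ifelse(runif(length(cell_donor)) < p_A, "A", "B")
res <- subtype_assoc_sketch(cell_donor, cell_subtype, scores)
res$stats
```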
Calculates factor-stratified sums for each column. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp
Description
Calculates factor-stratified sums for each column. Adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/src/misc2.cpp
Usage
get_sums(sY, rowSel)
Arguments
sY |
sparse matrix Gene by cell matrix of counts |
rowSel |
factor The donor that each cell is from |
Value
matrix of summed counts per gene per sample
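In plain R, the same factor-stratified column sums can be sketched with base rowsum() on the transposed matrix. This dense-matrix version is for illustration only, and sum_by_donor_sketch() is a hypothetical name. The package works on the sparse matrix through C++ code adapted from pagoda2.

```r
# Hypothetical dense-matrix sketch of factor-stratified column sums:
# collapse a genes x cells count matrix to genes x donors by summing cells per donor.
sum_by_donor_sketch <- function(sY, rowSel) {
  # rowsum() sums rows within groups, so work on the cells x genes transpose
  t(rowsum(t(as.matrix(sY)), group = as.character(rowSel)))
}

counts <- matrix(c(1, 0, 2, 1, 0, 3, 4, 1), nrow = 2,
                 dimnames = list(c("g1", "g2"), paste0("cell", 1:4)))
donors <- factor(c("d1", "d2", "d1", "d2"))
sum_by_donor_sketch(counts, donors)
```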
Visualize the similarity matrix and the clustering. Adapted from simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R
Description
Visualize the similarity matrix and the clustering. Adapted from simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R
Usage
ht_clusters(
mat,
cl,
dend = NULL,
col = c("white", "red"),
draw_word_cloud = is_GO_id(rownames(mat)[1]) || !is.null(term),
term = NULL,
min_term = 5,
order_by_size = FALSE,
exclude_words = character(0),
max_words = 10,
word_cloud_grob_param = list(),
fontsize_range = c(4, 16),
column_title = NULL,
ht_list = NULL,
use_raster = TRUE,
...
)
Arguments
mat |
A similarity matrix. |
cl |
Cluster labels inferred from the similarity matrix, e.g. from 'cluster_terms' or 'binary_cut'. |
dend |
Used internally. |
col |
A vector of colors that map from 0 to the 95th percentile of the similarity values. |
draw_word_cloud |
Whether to draw the word clouds. |
term |
The full name or the description of the corresponding GO IDs. |
min_term |
Minimal number of functional terms in a cluster. All clusters with fewer than 'min_term' terms are merged into one separate cluster in the heatmap. |
order_by_size |
Whether to reorder clusters by their sizes. The cluster that is merged from small clusters (size < 'min_term') is always put at the bottom of the heatmap. |
exclude_words |
Words that are excluded in the word cloud. |
max_words |
Maximal number of words visualized in the word cloud. |
word_cloud_grob_param |
A list of graphic parameters passed to 'word_cloud_grob'. |
fontsize_range |
The range of the font size. The value should be a numeric vector of length two. The minimal font size is mapped to a word frequency of 1 and the maximal font size to the maximal word frequency. The font size interpolation is linear. |
column_title |
Column title for the heatmap. |
ht_list |
A list of additional heatmaps added to the left of the similarity heatmap. |
use_raster |
Whether to write the heatmap as a raster image. |
... |
other parameters |
Value
A list containing a 'ComplexHeatmap::HeatmapList-class' object and GO term ordering.
Extract donor sex metadata if it is not already provided
Description
Extract donor sex metadata if it is not already provided
Usage
identify_sex_metadata(container, y_gene = "RPS4Y1", x_gene = "XIST")
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
y_gene |
character Gene name to use for identifying male donors (default='RPS4Y1') |
x_gene |
character Gene name to use for identifying female donors (default='XIST') |
Value
The project container with sex metadata added to the metadata.
Initialize parameters to be used throughout scITD in various functions
Description
Initialize parameters to be used throughout scITD in various functions
Usage
initialize_params(ctypes_use, ncores = 4, rand_seed = 10)
Arguments
ctypes_use |
character Names of the cell types to use for the analysis |
ncores |
numeric Number of cores to use (default=4) |
rand_seed |
numeric Random seed to use (default=10) |
Value
A list of the experiment parameters to use.
Examples
param_list <- initialize_params(ctypes_use = c("CD4+ T", "CD8+ T"),
ncores = 1, rand_seed = 10)
Create an scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.
Description
Create an scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.
Usage
instantiate_scMinimal(
count_data,
meta_data,
metadata_cols = NULL,
metadata_col_nm = NULL
)
Arguments
count_data |
sparseMatrix Matrix of raw counts with genes as rows and cells as columns |
meta_data |
data.frame Metadata with cells as rows and variables as columns. Number of rows in metadata should equal number of columns in count matrix. |
metadata_cols |
character The names of the metadata columns to use (default=NULL) |
metadata_col_nm |
character New names for the selected metadata columns, if you wish to rename them. If NULL, the preexisting column names are used. (default=NULL) |
Value
An scMinimal object holding counts and metadata for a project.
Examples
scMinimal <- instantiate_scMinimal(count_data=test_container$scMinimal_full$count_data,
meta_data=test_container$scMinimal_full$metadata)
Check whether a character string is a GO ID
Description
Check whether a character string is a GO ID
Usage
is_GO_id(x)
Arguments
x |
A character |
Value
A logical
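GO identifiers have the form "GO:" followed by seven digits, so a regular-expression check is enough for a sketch. is_GO_id_sketch() is hypothetical, and the package's version may use a different pattern.

```r
# Sketch: TRUE for strings of the form "GO:" followed by exactly seven digits.
is_GO_id_sketch <- function(x) {
  grepl("^GO:[0-9]{7}$", x)
}

is_GO_id_sketch(c("GO:0006955", "HALLMARK_INTERFERON_GAMMA_RESPONSE", "GO:123"))
```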
Create a container to store all data and results for the project. You must provide a params list as generated by initialize_params(). You also need to provide either a Seurat object or both a count_data matrix and a meta_data matrix.
Description
Create a container to store all data and results for the project. You must provide a params list as generated by initialize_params(). You also need to provide either a Seurat object or both a count_data matrix and a meta_data matrix.
Usage
make_new_container(
params,
count_data = NULL,
meta_data = NULL,
seurat_obj = NULL,
scMinimal = NULL,
gn_convert = NULL,
metadata_cols = NULL,
metadata_col_nm = NULL,
label_donor_sex = FALSE
)
Arguments
params |
list A list of the experiment params to use as generated by initialize_params() |
count_data |
dgCMatrix Matrix of raw counts with genes as rows and cells as columns (default=NULL) |
meta_data |
data.frame Metadata with cells as rows and variables as columns. Number of rows in metadata should equal number of columns in count matrix (default=NULL) |
seurat_obj |
Seurat object that has been cleaned and includes the normalized, log-transformed counts. The meta.data should include a column with the header 'sex' and values of 'M' or 'F' if available. The metadata should also have a column with the header 'ctypes' with the corresponding names of the cell types as well as a column with header 'donors' that contains identifiers for each donor. (default=NULL) |
scMinimal |
environment A sub-container for the project typically consisting of gene expression data in its raw and processed forms as well as metadata (default=NULL) |
gn_convert |
data.frame Gene identifier -> gene name conversion table. Gene identifiers used in the count matrices should appear in the first column and the corresponding gene symbols in the second column. Can remain NULL if the identifiers are already gene symbols. (default=NULL) |
metadata_cols |
character The names of the metadata columns to use (default=NULL) |
metadata_col_nm |
character New names for the selected metadata columns, if you wish to rename them. If NULL, the preexisting column names are used. (default=NULL) |
label_donor_sex |
logical Set to TRUE to label donor sex in the metadata using expression of sex-associated genes (default=FALSE) |
Value
A project container of class environment that stores sub-containers for each cell type as well as results and plots from all analyses.
Merge small subclusters into larger ones
Description
Merge small subclusters into larger ones
Usage
merge_small_clusts(con, clusts, min_cells_group)
Arguments
con |
conos Object for the dataset with umap projection and groups as cell types |
clusts |
character The initially assigned subclusters by leiden clustering |
min_cells_group |
numeric The minimum allowable cluster size |
Value
The subcluster labels with small clusters below the size threshold merged into the nearest larger cluster.
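This entry does not state how the "nearest" cluster is measured. The sketch below assumes it means the closest centroid in a 2-D embedding, such as the UMAP coordinates stored in the conos object. merge_small_clusts_sketch() is purely illustrative and takes a plain coordinate matrix instead of a conos object.

```r
# Hypothetical sketch: reassign cells in clusters smaller than min_cells_group to
# the large cluster whose centroid in the embedding is closest.
# emb: cells x dims coordinate matrix; clusts: one cluster label per cell.
merge_small_clusts_sketch <- function(emb, clusts, min_cells_group) {
  clusts <- as.character(clusts)
  sizes <- table(clusts)
  large <- names(sizes)[sizes >= min_cells_group]
  small <- names(sizes)[sizes < min_cells_group]
  if (length(small) == 0 || length(large) == 0) return(clusts)
  centroid <- function(cl) colMeans(emb[clusts == cl, , drop = FALSE])
  large_centroids <- t(vapply(large, centroid, numeric(ncol(emb))))  # large x dims
  for (cl in small) {
    # Euclidean distance from this small cluster's centroid to each large centroid
    d <- sqrt(colSums((t(large_centroids) - centroid(cl))^2))
    clusts[clusts == cl] <- large[which.min(d)]
  }
  clusts
}

set.seed(11)
emb <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),   # cluster "1": 100 cells
             matrix(rnorm(200, mean = 10), ncol = 2),  # cluster "2": 100 cells
             matrix(rnorm(10, mean = 9), ncol = 2))    # cluster "3": 5 cells
clusts <- rep(c("1", "2", "3"), times = c(100, 100, 5))
merged <- merge_small_clusts_sketch(emb, clusts, min_cells_group = 50)
table(merged)
```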
Computes non-negative matrix factorization on the tensor unfolded along the donor dimension
Description
Computes non-negative matrix factorization on the tensor unfolded along the donor dimension
Usage
nmf_unfolded(container, ranks)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ranks |
numeric The number of factors to extract. Unlike with the Tucker decomposition, this should be a single number. |
Value
The project container with results of the decomposition in container$tucker_results. The results object is a list with the donor scores matrix in the first element and the unfolded loadings matrix in the second element.
Examples
test_container <- nmf_unfolded(test_container, 2)
Calculates the normalized variance for each gene. This is adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/R/Pagoda2.R Generally, this should be done through calling the form_tensor() wrapper function.
Description
Calculates the normalized variance for each gene. This is adapted from pagoda2. https://github.com/kharchenkolab/pagoda2/blob/main/R/Pagoda2.R Generally, this should be done through calling the form_tensor() wrapper function.
Usage
norm_var_helper(scMinimal)
Arguments
scMinimal |
environment A sub-container for the project typically consisting of gene expression data in its raw and processed forms as well as metadata |
Value
A list with the first element containing a vector of the normalized variance for each gene and the second element containing log-transformed adjusted p-values for the overdispersion of each gene.
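The core idea of normalizing variance against a mean-variance trend is sketched below. The sketch fits a smooth trend of log variance on log mean across genes and takes each gene's exponentiated residual as its normalized variance. This simplified, hypothetical version uses stats::loess(). pagoda2 instead fits a generalized additive model, rescales the result and derives overdispersion p-values, and the sketch omits all of these.

```r
# Simplified sketch: normalized variance = observed variance divided by the
# variance expected from a log-scale mean-variance trend.
# expr: genes x samples matrix of normalized expression (row names = genes).
norm_var_sketch <- function(expr) {
  gene_mean <- rowMeans(expr)
  gene_var <- apply(expr, 1, stats::var)
  keep <- gene_mean > 0 & gene_var > 0  # log() needs strictly positive values
  m <- log(gene_mean[keep])
  v <- log(gene_var[keep])
  trend <- stats::loess(v ~ m)
  # residual on the log scale = log(observed variance / trend-expected variance)
  setNames(exp(as.numeric(stats::residuals(trend))), names(gene_mean)[keep])
}

set.seed(5)
n_genes <- 200; n_donors <- 30
mu <- exp(runif(n_genes, 0, 5))
expr <- matrix(rpois(n_genes * n_donors, lambda = rep(mu, n_donors)), nrow = n_genes,
               dimnames = list(paste0("g", 1:n_genes), NULL))
expr[1, ] <- rpois(n_donors, lambda = rep(c(5, 100), length.out = n_donors))  # overdispersed
nv <- norm_var_sketch(expr)
```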
Helper function to normalize and log-transform count data
Description
Helper function to normalize and log-transform count data
Usage
normalize_counts(count_data, scale_factor = 10000)
Arguments
count_data |
matrix or sparse matrix Gene by cell matrix of counts |
scale_factor |
numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000) |
Value
The normalized, log-transformed matrix.
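A minimal sketch of the usual normalization follows. It assumes each column is divided by its total, multiplied by scale_factor and transformed with log(1 + x). The logarithm base is an assumption, since the package may use a different one. The sketch expects a dense matrix, and a column with zero total would produce NaN.

```r
# Sketch: library-size normalization followed by log(1 + x).
normalize_counts_sketch <- function(count_data, scale_factor = 10000) {
  lib_size <- colSums(count_data)
  # divide each column by its total, rescale, then take the natural log of 1 + x
  log1p(sweep(count_data, 2, lib_size, "/") * scale_factor)
}

counts <- matrix(c(1, 3, 0, 4, 2, 2), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("c1", "c2")))
normalize_counts_sketch(counts)
```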
Normalize the pseudobulked counts matrices. Generally, this should be done through calling the form_tensor() wrapper function.
Description
Normalize the pseudobulked counts matrices. Generally, this should be done through calling the form_tensor() wrapper function.
Usage
normalize_pseudobulk(container, method = "trim", scale_factor = 10000)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
method |
character The normalization method to use on the pseudobulked count data. Set to 'regular' for standard normalization by dividing by library size. Set to 'trim' to use edgeR trimmed mean of M-values (TMM) normalization, whereby counts are divided by library size times a normalization factor. (default='trim') |
scale_factor |
numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000) |
Value
The project container with normalized pseudobulk matrices in container$scMinimal_ctype$<ctype>$pseudobulk slots.
Parse the main counts matrix into per-cell-type matrices. Generally, this should be done through calling the form_tensor() wrapper function.
Description
Parse the main counts matrix into per-cell-type matrices. Generally, this should be done through calling the form_tensor() wrapper function.
Usage
parse_data_by_ctypes(container)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
Value
The project container with separate scMinimal objects per cell type in the container$scMinimal_ctype slot
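Conceptually, the split takes a column subset of the full genes x cells matrix for each cell type. The hypothetical sketch below assumes a per-cell vector of cell type labels, such as the 'ctypes' metadata column described under make_new_container(). It drops cells whose type is not in ctypes_use.

```r
# Hypothetical sketch: one genes x cells matrix per cell type in ctypes_use.
split_by_ctype_sketch <- function(count_data, ctypes, ctypes_use = unique(ctypes)) {
  # cell types outside ctypes_use become NA in the factor and are dropped by split()
  idx <- split(seq_len(ncol(count_data)), factor(ctypes, levels = ctypes_use))
  lapply(idx, function(i) count_data[, i, drop = FALSE])
}

counts <- matrix(1:12, nrow = 2, dimnames = list(c("g1", "g2"), paste0("c", 1:6)))
ctypes <- c("CD4+ T", "CD8+ T", "B", "CD4+ T", "CD8+ T", "CD4+ T")
by_ct <- split_by_ctype_sketch(counts, ctypes, ctypes_use = c("CD4+ T", "CD8+ T"))
sapply(by_ct, ncol)
```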
Computes singular-value decomposition on the tensor unfolded along the donor dimension
Description
Computes singular-value decomposition on the tensor unfolded along the donor dimension
Usage
pca_unfolded(container, ranks)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ranks |
numeric The number of factors to extract. Unlike with the Tucker decomposition, this should be a single number. |
Value
The project container with results of the decomposition in container$tucker_results. The results object is a list with the donor scores matrix in the first element and the unfolded loadings matrix in the second element.
Examples
test_container <- pca_unfolded(test_container, 2)
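Unfolding along the donor dimension turns the donors x genes x cell types array into a donors x (genes * cell types) matrix. A truncated SVD of that matrix then gives donor scores and unfolded loadings. This sketch rests on assumptions: this entry does not say whether scITD centers or scales the unfolded matrix, or how it splits scale between scores and loadings. In R's column-major layout, each row of the loadings folds back into a genes x cell types matrix.

```r
# Hypothetical sketch of PCA on the donor-mode unfolding of a
# donors x genes x cell types array.
pca_unfolded_sketch <- function(tnsr, ranks) {
  d <- dim(tnsr)
  unfolded <- matrix(tnsr, nrow = d[1])  # donors x (genes * cell types), genes fastest
  unfolded <- scale(unfolded, center = TRUE, scale = FALSE)  # centering is an assumption
  s <- svd(unfolded, nu = ranks, nv = ranks)
  list(donor_scores = s$u, loadings = t(s$v))  # loadings: ranks x (genes * cell types)
}

set.seed(9)
tnsr <- array(rnorm(10 * 5 * 3), dim = c(10, 5, 3))
res <- pca_unfolded_sketch(tnsr, ranks = 2)
factor1_lds <- matrix(res$loadings[1, ], nrow = dim(tnsr)[2])  # genes x cell types
```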
Plot a heatmap of differential genes. Code is adapted from Conos package. https://github.com/kharchenkolab/conos/blob/master/R/plot.R
Description
Plot a heatmap of differential genes. Code is adapted from Conos package. https://github.com/kharchenkolab/conos/blob/master/R/plot.R
Usage
plotDEheatmap_conos(
con,
groups,
container,
de = NULL,
min.auc = NULL,
min.specificity = NULL,
min.precision = NULL,
n.genes.per.cluster = 10,
additional.genes = NULL,
exclude.genes = NULL,
labeled.gene.subset = NULL,
expression.quantile = 0.99,
pal = (grDevices::colorRampPalette(c("dodgerblue1", "grey95", "indianred1")))(1024),
ordering = "-AUC",
column.metadata = NULL,
show.gene.clusters = TRUE,
remove.duplicates = TRUE,
column.metadata.colors = NULL,
show.cluster.legend = TRUE,
show_heatmap_legend = FALSE,
border = TRUE,
return.details = FALSE,
row.label.font.size = 10,
order.clusters = FALSE,
split = FALSE,
split.gap = 0,
cell.order = NULL,
averaging.window = 0,
...
)
Arguments
con |
conos (or p2) object |
groups |
groups in which the DE genes were determined (so that the cells can be ordered correctly) |
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
de |
differential expression result (list of data frames) |
min.auc |
optional minimum AUC threshold |
min.specificity |
optional minimum specificity threshold |
min.precision |
optional minimum precision threshold |
n.genes.per.cluster |
number of genes to show for each cluster |
additional.genes |
optional additional genes to include (the genes will be assigned to the closest cluster) |
exclude.genes |
an optional list of genes to exclude from the heatmap |
labeled.gene.subset |
a subset of gene names to show (instead of all genes). Can be a vector of gene names, or a number of top genes (in each cluster) to show the names for. |
expression.quantile |
expression quantile to show (0.99 by default) |
pal |
palette to use for the main heatmap |
ordering |
order by which the top DE genes (to be shown) are determined (default "-AUC") |
column.metadata |
additional column metadata, passed either as a data.frame with rows named as cells, or as a list of named cell factors. |
show.gene.clusters |
whether to show gene cluster color codes |
remove.duplicates |
remove duplicated genes (leaving them in just one of the clusters) |
column.metadata.colors |
a list of color specifications for additional column metadata, specified according to the HeatmapMetadata format. Use "clusters" slot to specify cluster colors. |
show.cluster.legend |
whether to show the cluster legend |
show_heatmap_legend |
whether to show the expression heatmap legend |
border |
show borders around the heatmap and annotations |
return.details |
if TRUE will return a list containing the heatmap (ha), but also raw matrix (x), expression list (expl) and other info to produce the heatmap on your own. |
row.label.font.size |
font size for the row labels |
order.clusters |
whether to re-order the clusters according to the similarity of the expression patterns (of the genes being shown) |
split |
logical If TRUE splits the heatmap by cell type (default=FALSE) |
split.gap |
numeric The distance to put in the gaps between split parts of the heatmap if split=TRUE (default=0) |
cell.order |
explicitly supply cell order |
averaging.window |
optional window averaging between neighboring cells within each group (turned off by default); useful when a very large number of cells is shown (requires the zoo package) |
... |
extra parameters are passed to pheatmap |
Value
ComplexHeatmap::Heatmap object (see return.details param for other output)
Plot matrix of donor scores extracted from Tucker decomposition
Description
Plot matrix of donor scores extracted from Tucker decomposition
Usage
plot_donor_matrix(
container,
meta_vars = NULL,
cluster_by_meta = NULL,
show_donor_ids = FALSE,
add_meta_associations = NULL,
show_var_explained = TRUE,
donors_sel = NULL,
h_w = NULL
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
meta_vars |
character Names of metadata variables to plot alongside the donor scores. Can include more than one variable. (default=NULL) |
cluster_by_meta |
character One metadata variable to cluster the heatmap by. If NULL, donor clustering is done using donor scores. (default=NULL) |
show_donor_ids |
logical Set to TRUE to show donor IDs as row names on the heatmap (default=FALSE) |
add_meta_associations |
character Adds metadata associations with each factor as a top annotation. These should be generated first with plot_meta_associations(). Set to 'pval' if 'pval' was used in plot_meta_associations(); otherwise, set to 'rsq'. If NULL, no annotation is added. (default=NULL) |
show_var_explained |
logical Set to TRUE to display the explained variance for each factor (default=TRUE) |
donors_sel |
character A vector of a subset of donors to include in the plot (default=NULL) |
h_w |
numeric Vector specifying height and width (default=NULL) |
Value
The project container with a heatmap plot of donor scores in container$plots$donor_matrix.
Examples
test_container <- plot_donor_matrix(test_container, show_donor_ids = TRUE)
Plot donor celltype/subtype proportions against each factor
Description
Plot donor celltype/subtype proportions against each factor
Usage
plot_donor_props(
donor_props,
donor_scores,
significance,
ctype_mapping = NULL,
stat_type = "adj_pval",
n_col = 2
)
Arguments
donor_props |
data.frame Donor proportions as output from compute_donor_props() |
donor_scores |
data.frame Donor scores from tucker results |
significance |
numeric F-Statistics as output from compute_associations() |
ctype_mapping |
character The cell types corresponding with columns of donor_props (default=NULL) |
stat_type |
character Either "fstat" to get F-Statistics, "adj_rsq" to get adjusted R-squared values, or "adj_pval" to get adjusted pvalues (default='adj_pval') |
n_col |
numeric The number of columns to organize the plots into (default=2) |
Value
A cowplot figure of ggplot objects for proportions of each cell type against donor factor scores for each factor.
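The stat_type options correspond to standard linear-regression summaries. As a rough illustration only, the sketch below regresses simulated proportions for one cell type on each factor's donor scores with lm() and collects the F-statistic, adjusted R-squared, and adjusted p-value for each factor. The simulated data and the choice of Benjamini-Hochberg adjustment are assumptions; this is not the code used by compute_associations().

```r
# Hypothetical sketch: regress one cell type's donor proportions on each
# factor's donor scores and collect the statistics named by stat_type.
# Simulated data; not the scITD implementation.
set.seed(1)
n_donors <- 40
donor_scores <- matrix(rnorm(n_donors * 2), ncol = 2,
                       dimnames = list(paste0("d", 1:n_donors), c("Factor1", "Factor2")))
# Proportions that depend on Factor1 only
props <- plogis(-1 + 0.8 * donor_scores[, "Factor1"] + rnorm(n_donors, sd = 0.3))

assoc_stats <- t(sapply(colnames(donor_scores), function(f) {
  fit_sum <- summary(lm(props ~ donor_scores[, f]))
  fstat <- fit_sum$fstatistic  # named: value, numdf, dendf
  c(fstat = unname(fstat["value"]),
    adj_rsq = fit_sum$adj.r.squared,
    pval = unname(pf(fstat["value"], fstat["numdf"], fstat["dendf"],
                     lower.tail = FALSE)))
}))
# Adjust p-values across factors (Benjamini-Hochberg, an assumed choice)
assoc_stats <- cbind(assoc_stats,
                     adj_pval = p.adjust(assoc_stats[, "pval"], method = "BH"))
print(round(assoc_stats, 4))
```

Only Factor1 drives the simulated proportions, so its row should show a large F-statistic and a small adjusted p-value.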
Generate a gene by donor heatmap showing scaled expression of top loading genes for a given factor
Description
Generate a gene by donor heatmap showing scaled expression of top loading genes for a given factor
Usage
plot_donor_sig_genes(
container,
factor_select,
top_n_per_ctype,
ctypes_use = NULL,
show_donor_labels = FALSE,
additional_meta = NULL,
add_genes = NULL
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to query |
top_n_per_ctype |
numeric Vector of the number of top genes from each cell type to plot |
ctypes_use |
character The cell types for which to get the top genes to make callouts for. If NULL then uses all cell types. (default=NULL) |
show_donor_labels |
logical Set to TRUE to display donor labels (default=FALSE) |
additional_meta |
character Another meta variable to plot (default=NULL) |
add_genes |
character Additional genes to plot for all ctypes (default=NULL) |
Value
The project container with a heatmap plot in the slot container$plots$donor_sig_genes$<Factor#>. This heatmap shows scaled expression of top loading genes in each cell type for a selected factor.
Examples
test_container <- plot_donor_sig_genes(test_container, factor_select=1,
top_n_per_ctype=2)
Compute enrichment of donor metadata categorical variables at high/low factor scores
Description
Compute enrichment of donor metadata categorical variables at high/low factor scores
Usage
plot_dscore_enr(container, factor_use, meta_var)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_use |
numeric The factor to test |
meta_var |
character The name of the metadata variable to test |
Value
A cowplot figure of enrichment plots.
Examples
fig <- plot_dscore_enr(test_container, factor_use=1, meta_var='lanes')
Plot enriched gene sets from all cell types in a heatmap
Description
Plot enriched gene sets from all cell types in a heatmap
Usage
plot_gsea_hmap(container, factor_select, thresh)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to plot |
thresh |
numeric Pvalue threshold to use for including gene sets in the heatmap |
Value
A stacked heatmap object from ComplexHeatmap.
Plot already computed enriched gene sets to show semantic similarity between sets
Description
Plot already computed enriched gene sets to show semantic similarity between sets
Usage
plot_gsea_hmap_w_similarity(
container,
factor_select,
direc,
thresh,
exclude_words = character(0)
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to plot |
direc |
character Set to either 'up' or 'down' to use the appropriate sets |
thresh |
numeric Pvalue threshold to use for including gene sets in the heatmap |
exclude_words |
character Vector of words to exclude from word cloud (default=character(0)) |
Value
No value is returned. A heatmap showing enriched gene sets clustered by semantic similarity is drawn.
Look at enriched gene sets from a cluster of semantically similar gene sets. Uses the results from a previous run of plot_gsea_hmap_w_similarity()
Description
Look at enriched gene sets from a cluster of semantically similar gene sets. Uses the results from a previous run of plot_gsea_hmap_w_similarity()
Usage
plot_gsea_sub(container, clust_select, thresh = 0.05)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
clust_select |
numeric The cluster to plot gene sets from. On the previous semantic similarity plot, cluster numbering starts from the top as 1. |
thresh |
numeric Color threshold to use for showing significance (default=0.05) |
Value
A heatmap plot from ComplexHeatmap showing one semantic similarity cluster of enriched gene sets with adjusted p-values for each cell type.
Plot the gene by celltype loadings for a factor
Description
Plot the gene by celltype loadings for a factor
Usage
plot_loadings_annot(
container,
factor_select,
use_sig_only = FALSE,
nonsig_to_zero = FALSE,
annot = "none",
pathways = NULL,
sim_de_donor_group = NULL,
sig_thresh = 0.05,
display_genes = FALSE,
gene_callouts = FALSE,
callout_n_gene_per_ctype = 5,
callout_ctypes = NULL,
specific_callouts = NULL,
le_set_callouts = NULL,
le_set_colormap = NULL,
le_set_num_per = 5,
show_le_legend = FALSE,
show_xlab = TRUE,
show_var_explained = TRUE,
clust_method = "median",
h_w = NULL,
reset_other_factor_plots = FALSE,
draw_plot = TRUE
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to plot |
use_sig_only |
logical If TRUE, includes only significant genes from jackstraw in the heatmap. If FALSE, includes all the variable genes. (default = FALSE) |
nonsig_to_zero |
logical If TRUE, makes the loadings of all nonsignificant genes 0 (default=FALSE) |
annot |
character If set to "pathways" then creates an adjacent heatmap showing which genes are in which pathways. If set to "sig_genes" then creates an adjacent heatmap showing which genes were significant from jackstraw. If set to "none" no adjacent heatmap is plotted. (default="none") |
pathways |
character Gene sets to plot if annot is set to "pathways" (default=NULL) |
sim_de_donor_group |
numeric To plot the ground truth significant genes from a simulation next to the heatmap, put the number of the donor group that corresponds to the factor being plotted (default=NULL) |
sig_thresh |
numeric Pvalue significance threshold to use. If use_sig_only is TRUE the threshold is used as a cutoff for genes to include. If annot is "sig_genes" this value is used in the gene significance colormap as a minimum threshold. (default=0.05) |
display_genes |
logical If TRUE, displays gene names (default=FALSE) |
gene_callouts |
logical If TRUE, then adds gene callout annotations to the heatmap (default=FALSE) |
callout_n_gene_per_ctype |
numeric To use if gene_callouts is TRUE. Sets the number of largest magnitude significant genes from each cell type to include in gene callouts. (default=5) |
callout_ctypes |
character To use if gene_callouts is TRUE. Specifies which cell types to get gene callouts for. If NULL, then gets gene callouts for largest magnitude significant genes for all cell types. (default=NULL) |
specific_callouts |
character A vector of gene names to show callouts for (default=NULL) |
le_set_callouts |
character Pass a vector of gene set names to show leading edge genes for a select set of gene sets (default=NULL) |
le_set_colormap |
character A named vector with names as gene sets and values as colors. If NULL, then selects first n colors of Set3 color palette. (default=NULL) |
le_set_num_per |
numeric The number of leading edge genes to show for each gene set (default=5) |
show_le_legend |
logical Set to TRUE to show the color map legend for leading edge genes (default=FALSE) |
show_xlab |
logical If TRUE, displays the xlabel 'genes' (default=TRUE) |
show_var_explained |
logical If TRUE, shows an annotation with the explained variance for each cell type (default=TRUE) |
clust_method |
character The hclust method to use for clustering rows (default='median') |
h_w |
numeric Vector specifying height and width (default=NULL) |
reset_other_factor_plots |
logical Set to TRUE to set all other loadings plots to NULL. Useful if you ran get_all_lds_factor_plots() but only want to show one or two plots. (default=FALSE) |
draw_plot |
logical Set to TRUE to show the plot. Plot is stored regardless. (default=TRUE) |
Value
The project container with a heatmap of loadings for one factor put in container$plots$all_lds_plots. The legend for the heatmap is put in container$plots$all_legends. Use draw(<hmap obj>,annotation_legend_list = <hmap legend obj>) to re-render the plot with legend.
Examples
test_container <- plot_loadings_annot(test_container, 1, display_genes=FALSE,
show_var_explained = TRUE)
Plot trio of associations between ligand expression, module eigengenes, and factor scores
Description
Plot trio of associations between ligand expression, module eigengenes, and factor scores
Usage
plot_mod_and_lig(container, factor_select, mod_ct, mod, lig_ct, lig)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor to use |
mod_ct |
character The name of the cell type for the corresponding module |
mod |
numeric The number of the corresponding module |
lig_ct |
character The name of the cell type where the ligand is expressed |
lig |
character The name of the ligand to use |
Value
A cowplot figure of ggplot objects for the three association scatter plots.
Generate gene set x ct_module heatmap showing co-expression module gene set enrichment results
Description
Generate gene set x ct_module heatmap showing co-expression module gene set enrichment results
Usage
plot_multi_module_enr(
container,
ctypes,
modules,
sig_thresh = 0.05,
db_use = "TF",
max_plt_pval = 0.1,
h_w = NULL
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ctypes |
character A vector of cell type names corresponding to the module numbers in modules, specifying the modules to compute enrichment for |
modules |
numeric A vector of module numbers corresponding to the cell types in ctypes, specifying the modules to compute enrichment for |
sig_thresh |
numeric P-value threshold for results to include. Only shows a given gene set if at least one module has a result lower than the threshold. (default=0.05) |
db_use |
character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", "BioCarta", "Hallmark", "TF", and "immuno". More than one database can be used. (default="TF") |
max_plt_pval |
numeric The maximum p-value shown on the plot. Unlike sig_thresh, it is not used to remove rows. (default=0.1) |
h_w |
numeric Vector specifying height and width (default=NULL) |
Value
A ComplexHeatmap object of enrichment results.
Plot reconstruction errors as bar plot for svd method
Description
Plot reconstruction errors as bar plot for svd method
Usage
plot_rec_errors_bar_svd(real, shuffled, mode_to_show)
Arguments
real |
list The real reconstruction errors |
shuffled |
list The reconstruction errors under null model |
mode_to_show |
numeric The mode to plot the results for |
Value
A ggplot object showing the difference in reconstruction errors for successive factors.
Plot reconstruction errors as line plot for svd method
Description
Plot reconstruction errors as line plot for svd method
Usage
plot_rec_errors_line_svd(real, shuffled, mode_to_show)
Arguments
real |
list The real reconstruction errors |
shuffled |
list The reconstruction errors under null model |
mode_to_show |
numeric The mode to plot the results for |
Value
A ggplot object showing relative reconstruction errors.
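Neither of the two reconstruction-error entries above specifies how the errors are computed. The sketch below assumes the common definition of relative error for a rank-k truncated SVD, ||X - X_k||_F^2 / ||X||_F^2, and a null model that shuffles each column independently. The function name rel_rec_errors and the simulated data are hypothetical and not part of scITD.

```r
# Hypothetical sketch: relative reconstruction error of rank-k SVD
# approximations for a real matrix versus a column-wise shuffled null.
rel_rec_errors <- function(x, max_rank) {
  s <- svd(x)
  total <- sum(x^2)
  sapply(seq_len(max_rank), function(k) {
    approx_k <- s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], k) %*%
      t(s$v[, 1:k, drop = FALSE])
    sum((x - approx_k)^2) / total  # squared Frobenius error, relative
  })
}
set.seed(2)
# 50 donors x 30 genes with two latent factors plus noise
x <- matrix(rnorm(50 * 2), 50) %*% matrix(rnorm(2 * 30), 2) +
  matrix(rnorm(50 * 30, sd = 0.5), 50)
# Null model: shuffle each column independently to break shared structure
x_shuf <- apply(x, 2, sample)
real <- rel_rec_errors(x, 5)
shuffled <- rel_rec_errors(x_shuf, 5)
# Error removed by each successive factor (the quantity a bar plot could show)
data.frame(rank = 1:5, real = real, shuffled = shuffled,
           real_drop = c(1 - real[1], -diff(real)),
           shuffled_drop = c(1 - shuffled[1], -diff(shuffled)))
```

With two true factors, the real errors should fall sharply over the first two ranks and then flatten, while the shuffled errors decline slowly.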
Plot dotplots for each factor to compare donor scores between metadata groups
Description
Plot dotplots for each factor to compare donor scores between metadata groups
Usage
plot_scores_by_meta(container, meta_var)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
meta_var |
character The meta data variable to compare groups for |
Value
The project container with a figure of comparison plots (one for each factor) placed in container$plots$indv_meta_scores_associations.
Plot enrichment results for hand picked gene sets
Description
Plot enrichment results for hand picked gene sets
Usage
plot_select_sets(
container,
factors_all,
sets_plot,
color_sets = NULL,
cl_rows = FALSE,
h_w = NULL,
myfontsize = 8
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factors_all |
numeric Vector of one or more factor numbers to get plots for |
sets_plot |
character Vector of gene set names to show enrichment values for |
color_sets |
named character Values are colors corresponding to each set, with names as the gene set names (default=NULL) |
cl_rows |
logical Set to TRUE to cluster gene set results (default=FALSE) |
h_w |
numeric Vector specifying height and width (default=NULL) |
myfontsize |
numeric Gene set label fontsize (default=8) |
Value
A list with a ComplexHeatmap object of select enriched gene sets as the first element and with a legend object as the second element.
Generate a plot for either the donor scores or loadings stability test
Description
Generate a plot for either the donor scores or loadings stability test
Usage
plot_stability_results(container, plt_data)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
plt_data |
character Either 'lds' (loadings) or 'dsc' (donor scores), indicating which plot to make |
Value
The stability plot selected by plt_data.
Plot association significances for varying clustering resolutions
Description
Plot association significances for varying clustering resolutions
Usage
plot_subclust_associations(res, n_col = 2)
Arguments
res |
data.frame Regression statistics for each subcluster analysis |
n_col |
numeric The number of columns to organize the plots into (default=2) |
Value
A cowplot of ggplot objects showing statistics for regressions of proportions of each cell subtype (at varying clustering resolutions) against each factor.
Prepare data for LR analysis and get soft thresholds to use for gene modules
Description
Prepare data for LR analysis and get soft thresholds to use for gene modules
Usage
prep_LR_interact(
container,
lr_pairs,
norm_method = "trim",
scale_factor = 10000,
var_scale_power = 0.5,
batch_var = NULL
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
lr_pairs |
data.frame Data of ligand-receptor pairs. First column should be ligands and second column should be one or more receptors separated by an underscore such as receptor1_receptor2 in the case that multiple receptors are required for signaling. |
norm_method |
character The normalization method to use on the pseudobulked count data. Set to 'regular' to do standard normalization of dividing by library size. Set to 'trim' to use edgeR trim-mean normalization, whereby counts are divided by library size times a normalization factor. (default='trim') |
scale_factor |
numeric The number that gets multiplied by fractional counts during normalization of the pseudobulked data (default=10000) |
var_scale_power |
numeric Exponent of normalized variance that is used for variance scaling. Variance for each gene is initially set to unit variance across donors (for a given cell type). Variance for each gene is then scaled by multiplying the unit scaled values by each gene's normalized variance (where the effect of the mean-variance dependence is taken into account) to the exponent specified here. If NULL, uses var_scale_power from container$experiment_params. (default=.5) |
batch_var |
character A batch variable from metadata to remove (default=NULL) |
Value
The project container with added container$scale_pb_extra slot that contains the tensor with additional ligands and receptors. Also has container$no_scale_pb_extra slot with pseudobulked, normalized data that is not scaled.
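The two norm_method options can be illustrated on a small genes x donors count matrix. This is a minimal sketch, not the package code. It assumes that 'trim' divides counts by an effective library size, defined as library size times normalization factor, and it takes those factors as given. In practice, trimmed-mean factors could come from edgeR::calcNormFactors(method = "TMM"). The helper normalize_pb is hypothetical.

```r
# Hypothetical sketch of the 'regular' and 'trim' normalization options for a
# genes x donors pseudobulk count matrix. norm_factors are supplied directly.
normalize_pb <- function(counts, method = c("trim", "regular"),
                         norm_factors = NULL, scale_factor = 10000) {
  method <- match.arg(method)
  lib_size <- colSums(counts)
  if (method == "trim") {
    stopifnot(length(norm_factors) == ncol(counts))
    lib_size <- lib_size * norm_factors  # effective library size (assumed)
  }
  # Divide each donor's counts by its library size, then rescale
  sweep(counts, 2, lib_size, "/") * scale_factor
}
counts <- matrix(c(10, 30, 60,
                   5, 15, 80), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("donorA", "donorB")))
normalize_pb(counts, "regular")
normalize_pb(counts, "trim", norm_factors = c(1.1, 0.9))
```

With 'regular', every donor's normalized counts sum to scale_factor. With 'trim', a factor above 1 shrinks that donor's values, compensating for composition effects.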
Project multicellular patterns to get scores on new data
Description
Project multicellular patterns to get scores on new data
Usage
project_new_data(new_container, old_container)
Arguments
new_container |
environment A project container with new data to project scores for. The form_tensor() function should already have been run on this container. |
old_container |
environment The original project container that has the multicellular gene expression patterns already extracted. These patterns will be projected onto the new data. |
Value
The new container environment object with projected scores in new_container$projected_scores. The factors will be ordered the same as the factors in old_container.
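The entry above does not describe how the projection is computed. One standard way to score new samples against fixed loadings is least squares. Given new data X_new and loadings L (factors x features, with features in the same gene/cell type order as the original decomposition), choose scores S that minimize ||X_new - S L||_F. The sketch below shows only that generic technique; project_scores is a hypothetical helper and is not necessarily how project_new_data() works.

```r
# Hypothetical sketch: least-squares projection of new data onto fixed
# loadings. Solution of min ||x_new - S L||_F is S = x_new L^T (L L^T)^-1.
project_scores <- function(x_new, loadings) {
  x_new %*% t(loadings) %*% solve(loadings %*% t(loadings))
}
set.seed(3)
loadings <- matrix(rnorm(2 * 20), nrow = 2)     # 2 factors x 20 features
true_scores <- matrix(rnorm(10 * 2), ncol = 2)  # 10 new donors
x_new <- true_scores %*% loadings               # noise-free new data
est <- project_scores(x_new, loadings)
max(abs(est - true_scores))  # near zero in the noise-free case
```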
Gets a conos object of the data, aligning datasets across a specified variable such as batch or donors. This can be run independently or through get_subtype_prop_associations().
Description
Gets a conos object of the data, aligning datasets across a specified variable such as batch or donors. This can be run independently or through get_subtype_prop_associations().
Usage
reduce_dimensions(
container,
integration_var,
ncores = container$experiment_params$ncores
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
integration_var |
character The meta data variable to use for creating the joint embedding with Conos. |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
Value
The project container with a conos object in container$embedding.
Reduce each cell type's expression matrix to just the significantly variable genes. Generally, this should be done through calling the form_tensor() wrapper function.
Description
Reduce each cell type's expression matrix to just the significantly variable genes. Generally, this should be done through calling the form_tensor() wrapper function.
Usage
reduce_to_vargenes(container)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
Value
The project container with pseudobulked matrices reduced to only the most variable genes.
Create a figure of all loadings plots arranged
Description
Create a figure of all loadings plots arranged
Usage
render_multi_plots(container, data_type, max_cols = 3)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
data_type |
character Can be either "loadings", "gsea", or "dgenes". This determines which list of heatmaps to organize into the figure. |
max_cols |
numeric The maximum number of columns to plot. Can only be 2 or 3, since these are large plots. (default=3) |
Value
The multi-plot figure.
Examples
test_container <- get_all_lds_factor_plots(test_container)
fig <- render_multi_plots(test_container, data_type='loadings')
Reshape loadings for a factor from linearized to matrix form
Description
Reshape loadings for a factor from linearized to matrix form
Usage
reshape_loadings(ldngs_row, genes, ctypes)
Arguments
ldngs_row |
numeric A vector of loadings values for one factor |
genes |
character The gene identifiers corresponding to each loading |
ctypes |
character The cell type corresponding to each loading |
Value
A loadings matrix with dimensions of genes by cell types.
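The reshaping described above can be sketched in a few lines. Each loading carries a gene label and a cell type label, so the values can be placed into a genes x cell types matrix by name. The helper reshape_loadings_sketch and the example labels are hypothetical. The package's own reshape_loadings() may order rows and columns differently.

```r
# Hypothetical sketch: turn one factor's linearized loadings, labelled by
# gene and cell type, into a genes x cell types matrix.
reshape_loadings_sketch <- function(ldngs_row, genes, ctypes) {
  u_genes <- unique(genes)
  u_ctypes <- unique(ctypes)
  mat <- matrix(NA_real_, nrow = length(u_genes), ncol = length(u_ctypes),
                dimnames = list(u_genes, u_ctypes))
  # A two-column character matrix indexes cells by (row name, column name)
  mat[cbind(genes, ctypes)] <- ldngs_row
  mat
}
ld <- c(0.5, -0.2, 0.1, 0.9)
genes <- c("CD3E", "MS4A1", "CD3E", "MS4A1")
ctypes <- c("T4", "T4", "B", "B")
reshape_loadings_sketch(ld, genes, ctypes)
```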
Run fgsea for one cell type of one factor
Description
Run fgsea for one cell type of one factor
Usage
run_fgsea(
container,
factor_select,
ctype,
db_use = "GO",
signed = TRUE,
min_gs_size = 15,
max_gs_size = 500,
ncores = container$experiment_params$ncores
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor of interest |
ctype |
character The cell type of interest |
db_use |
character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", "BioCarta", and "Hallmark". More than one database can be used. (default="GO") |
signed |
logical If TRUE, uses signed gsea. If FALSE, uses unsigned gsea. Currently only works with fgsea method. (default=TRUE) |
min_gs_size |
numeric Minimum gene set size (default=15) |
max_gs_size |
numeric Maximum gene set size (default=500) |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
Value
A data.frame of the fgsea results for enrichment of gene sets in a given cell type for a given factor. The results contain adjusted p-values, normalized enrichment scores, leading edge genes, and other information output by fgsea.
Run gsea separately for all cell types of one specified factor and plot results
Description
Run gsea separately for all cell types of one specified factor and plot results
Usage
run_gsea_one_factor(
container,
factor_select,
method = "fgsea",
thresh = 0.05,
db_use = "GO",
signed = TRUE,
min_gs_size = 15,
max_gs_size = 500,
reset_other_factor_plots = FALSE,
draw_plot = TRUE,
ncores = container$experiment_params$ncores
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor of interest |
method |
character The method of gsea to use. Can be "fgsea", "fgsea_special", or "hypergeometric". (default="fgsea") |
thresh |
numeric Pvalue significance threshold to use. Will include gene sets in resulting heatmap if pvalue is below this threshold for at least one cell type. (default=0.05) |
db_use |
character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", and "BioCarta". More than one database can be used. (default="GO") |
signed |
logical If TRUE, uses signed gsea. If FALSE, uses unsigned gsea. Currently only works with fgsea method (default=TRUE) |
min_gs_size |
numeric Minimum gene set size (default=15) |
max_gs_size |
numeric Maximum gene set size (default=500) |
reset_other_factor_plots |
logical Set to TRUE to set all other gsea plots to NULL (default=FALSE) |
draw_plot |
logical Set to TRUE to show the plot. Plot is stored regardless. (default=TRUE) |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
Value
The project container with a stacked heatmap plot of the gsea results in the slot container$plots$gsea$<Factor#>. The heatmaps show adjusted p-values for the enrichment of each gene set in each cell type for the selected factor. The top heatmap shows enriched gene sets among the positive loading genes and the bottom heatmap shows enriched gene sets among the negative loading genes for the factor.
Examples
test_container <- run_gsea_one_factor(test_container, factor_select=1,
method="fgsea", thresh=0.05, db_use="Hallmark", signed=TRUE)
Compute enriched gene sets among significant genes in a cell type for a factor using hypergeometric test
Description
Compute enriched gene sets among significant genes in a cell type for a factor using hypergeometric test
Usage
run_hypergeometric_gsea(
container,
factor_select,
ctype,
up_down,
thresh = 0.05,
min_gs_size = 15,
max_gs_size = 500,
db_use = "GO"
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
factor_select |
numeric The factor of interest |
ctype |
character The cell type of interest |
up_down |
character Either "up" to compute enrichment among the significant positive loading genes or "down" to compute enrichment among the significant negative loading genes. |
thresh |
numeric Pvalue significance threshold. Used as cutoff for calling genes as significant to use for enrichment tests. (default=0.05) |
min_gs_size |
numeric Minimum gene set size (default=15) |
max_gs_size |
numeric Maximum gene set size (default=500) |
db_use |
character The database of gene sets to use. Database options include "GO", "Reactome", "KEGG", and "BioCarta". More than one database can be used. (default="GO") |
Value
A vector of adjusted p-values for enrichment of gene sets in the significant genes of a given cell type in a given factor.
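A hypergeometric enrichment test asks whether a gene set overlaps the significant genes more often than random draws from the tested universe would. The sketch below shows that generic test with phyper(), using the same size filters as min_gs_size and max_gs_size, followed by Benjamini-Hochberg adjustment. The helper hypergeo_enrich, the toy gene sets, and the choice of BH correction are assumptions, not the package's implementation.

```r
# Hypothetical sketch: one-sided hypergeometric test of gene-set enrichment
# among significant genes, with BH adjustment across gene sets.
hypergeo_enrich <- function(sig_genes, universe, gene_sets,
                            min_gs_size = 15, max_gs_size = 500) {
  # Restrict sets to the tested universe and filter by size
  gene_sets <- lapply(gene_sets, intersect, universe)
  sizes <- lengths(gene_sets)
  gene_sets <- gene_sets[sizes >= min_gs_size & sizes <= max_gs_size]
  sig_genes <- intersect(sig_genes, universe)
  pvals <- vapply(gene_sets, function(gs) {
    overlap <- length(intersect(sig_genes, gs))
    # P(X >= overlap) when drawing length(sig_genes) genes from the universe
    phyper(overlap - 1, length(gs), length(universe) - length(gs),
           length(sig_genes), lower.tail = FALSE)
  }, numeric(1))
  p.adjust(pvals, method = "BH")
}
universe <- paste0("g", 1:1000)
gene_sets <- list(setA = paste0("g", 1:50),     # overlaps heavily with sig genes
                  setB = paste0("g", 501:560),  # no overlap
                  tiny = paste0("g", 1:5))      # dropped by min_gs_size
sig_genes <- paste0("g", c(1:30, 900:919))
hypergeo_enrich(sig_genes, universe, gene_sets)
```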
Run jackstraw to get genes that are significantly associated with donor scores for factors extracted by Tucker decomposition
Description
Run jackstraw to get genes that are significantly associated with donor scores for factors extracted by Tucker decomposition
Usage
run_jackstraw(
container,
ranks,
n_fibers = 100,
n_iter = 500,
tucker_type = "regular",
rotation_type = "hybrid",
seed = container$experiment_params$rand_seed,
ncores = container$experiment_params$ncores
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ranks |
numeric The number of donor ranks and gene ranks to decompose to using Tucker decomposition |
n_fibers |
numeric The number of fibers to randomly shuffle in each iteration (default=100) |
n_iter |
numeric The number of shuffling iterations to complete (default=500) |
tucker_type |
character Set to 'regular' to run regular tucker or to 'sparse' to run tucker with sparsity constraints (default='regular') |
rotation_type |
character Set to 'hybrid' to perform hybrid rotation on the resulting donor factor matrix and loadings. Otherwise, set to 'ica_lds' to perform ICA rotation on the loadings or 'ica_dsc' to perform ICA rotation on the donor scores. (default='hybrid') |
seed |
numeric Seed passed to set.seed() (default=container$experiment_params$rand_seed) |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
Value
The project container with a vector of adjusted pvalues in container$gene_score_associations.
Examples
test_container <- run_jackstraw(test_container, ranks=c(2,4), n_fibers=2, n_iter=10,
tucker_type='regular', rotation_type='hybrid', ncores=1)
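The entry does not spell out how the adjusted p-values are derived from the shuffling iterations. In the jackstraw approach generally, each gene's observed association statistic is compared with a pooled null distribution of statistics from the shuffled fibers. The sketch below shows only that final comparison step on simulated statistics. The helper jackstraw_pvals, the chi-squared null, the +1 correction, and the BH adjustment are assumptions.

```r
# Hypothetical sketch of a jackstraw-style final step: empirical p-values from
# pooled null statistics, then BH adjustment. Statistics are simulated here.
jackstraw_pvals <- function(obs_stats, null_stats) {
  # Fraction of null statistics at least as large as each observed one
  # (+1 in numerator and denominator so p-values are never exactly zero)
  pvals <- vapply(obs_stats, function(s) {
    (sum(null_stats >= s) + 1) / (length(null_stats) + 1)
  }, numeric(1))
  p.adjust(pvals, method = "BH")
}
set.seed(4)
null_stats <- rchisq(2000, df = 1)  # pooled statistics from shuffled fibers
obs_stats <- c(geneA = 25, geneB = 0.3, geneC = 12)
jackstraw_pvals(obs_stats, null_stats)
```

Larger n_fibers and n_iter values enlarge the null pool, which lowers the smallest attainable p-value.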
Test stability of a decomposition by subsampling or bootstrapping donors. Note that running this function will replace the decomposition in the project container with one resulting from the tucker parameters entered here.
Description
Test stability of a decomposition by subsampling or bootstrapping donors. Note that running this function will replace the decomposition in the project container with one resulting from the tucker parameters entered here.
Usage
run_stability_analysis(
container,
ranks,
tucker_type = "regular",
rotation_type = "hybrid",
subset_type = "subset",
sub_prop = 0.75,
n_iterations = 100,
ncores = container$experiment_params$ncores
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ranks |
numeric The number of donor, gene, and cell type ranks, respectively, to decompose to using Tucker decomposition. |
tucker_type |
character The 'regular' type is the only one currently implemented (default='regular') |
rotation_type |
character Set to 'hybrid' to optimize loadings via our hybrid method (see paper for details). Set to 'ica_dsc' to perform ICA rotation on resulting donor factor matrix. Set to 'ica_lds' to optimize loadings by the ICA rotation. (default='hybrid') |
subset_type |
character Set to either 'subset' or 'bootstrap' (default='subset') |
sub_prop |
numeric The proportion of donors to keep when using subset_type='subset' (default=.75) |
n_iterations |
numeric The number of iterations to perform (default=100) |
ncores |
numeric The number of cores to use (default=container$experiment_params$ncores) |
Value
The project container with the donor scores stability plot in container$plots$stability_plot_dsc and the loadings stability plot in container$plots$stability_plot_lds
Examples
test_container <- run_stability_analysis(test_container, ranks=c(2,4),
tucker_type='regular', rotation_type='hybrid', subset_type='subset',
sub_prop=0.75, n_iterations=5, ncores=1)
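The two subset_type options can be sketched as follows. 'subset' draws a fraction sub_prop of donors without replacement, while 'bootstrap' draws as many donors as the original set, with replacement. The helper resample_donors is hypothetical, and how scITD then compares each resampled decomposition with the full one is not reproduced here.

```r
# Hypothetical sketch of per-iteration donor resampling for the two
# subset_type options.
resample_donors <- function(donors, subset_type = c("subset", "bootstrap"),
                            sub_prop = 0.75) {
  subset_type <- match.arg(subset_type)
  if (subset_type == "subset") {
    sample(donors, size = round(sub_prop * length(donors)), replace = FALSE)
  } else {
    sample(donors, size = length(donors), replace = TRUE)
  }
}
set.seed(5)
donors <- paste0("donor", 1:20)
sub <- resample_donors(donors, "subset", sub_prop = 0.75)
boot <- resample_donors(donors, "bootstrap")
length(sub)           # 15 distinct donors
length(unique(boot))  # typically fewer than 20 because of repeats
```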
Run the Tucker decomposition and rotate the factors
Description
Run the Tucker decomposition and rotate the factors
Usage
run_tucker_ica(
container,
ranks,
tucker_type = "regular",
rotation_type = "hybrid"
)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ranks |
numeric The number of donor factors and gene factors, respectively, to decompose the data into. Since we rearrange the standard output of the Tucker decomposition to be 'donor centric', the number of donor factors will also be the total number of main factors that can be used for downstream analysis. The number of gene factors will only impact the quality of the decomposition. |
tucker_type |
character The 'regular' type is the only one currently implemented |
rotation_type |
character Set to 'hybrid' to optimize loadings via our hybrid method (see paper for details). Set to 'ica_dsc' to perform ICA rotation on resulting donor factor matrix. Set to 'ica_lds' to optimize loadings by the ICA rotation. (default='hybrid') |
Value
The project container with results of the decomposition in container$tucker_results. The results object is a list with the donor scores matrix in the first element and the unfolded loadings matrix in the second element.
Examples
test_container <- run_tucker_ica(test_container,ranks=c(2,4))
Get a list of tensor fibers to shuffle
Description
Get a list of tensor fibers to shuffle
Usage
sample_fibers(tensor_data, n_fibers)
Arguments
tensor_data |
list The tensor data including donor, gene, and cell type labels as well as the tensor array itself |
n_fibers |
numeric The number of fibers to get |
Value
A list of gene and cell type indices for the randomly selected fibers
Scale font size. From simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R
Description
Scale font size. From simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/ht_clusters.R
Usage
scale_fontsize(x, rg = c(1, 30), fs = c(4, 16))
Arguments
x |
A numeric vector. |
rg |
numeric The range of x values to map onto the font size range. |
fs |
numeric The minimum and maximum font sizes. |
Value
A numeric vector.
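A minimal sketch of this kind of font-size scaling maps values in rg linearly onto fs and clamps values outside that range. The helper scale_fontsize_sketch is hypothetical, and the simplifyEnrichment original may differ in details such as rounding.

```r
# Hypothetical sketch of linear font-size scaling with clamping.
scale_fontsize_sketch <- function(x, rg = c(1, 30), fs = c(4, 16)) {
  k <- (fs[2] - fs[1]) / (rg[2] - rg[1])  # slope of the linear map
  y <- fs[1] + k * (x - rg[1])
  pmin(pmax(y, fs[1]), fs[2])             # clamp to [fs[1], fs[2]]
}
scale_fontsize_sketch(c(0, 1, 15.5, 30, 100))
# 4 4 10 16 16
```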
Scale variance across donors for each gene within each cell type. Generally, this should be done through calling the form_tensor() wrapper function.
Description
Scale variance across donors for each gene within each cell type. Generally, this should be done through calling the form_tensor() wrapper function.
Usage
scale_variance(container, var_scale_power)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
var_scale_power |
numeric Exponent of normalized variance that is used for variance scaling. Variance for each gene is initially set to unit variance across donors (for a given cell type). Variance for each gene is then scaled by multiplying the unit scaled values by each gene's normalized variance (where the effect of the mean-variance dependence is taken into account) to the exponent specified here. If NULL, uses var_scale_power from container$experiment_params. |
Value
The project container with the variance altered for each gene within the pseudobulked matrices for each cell type.
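The two-step scaling described for var_scale_power can be illustrated on one cell type's donors x genes matrix. This is a minimal sketch. It assumes each gene is centered as well as scaled to unit variance, and it takes the normalized variances as input. The mean-variance correction that produces those variances is not reproduced, and the helper name is hypothetical.

```r
# Hypothetical sketch: unit-scale each gene across donors, then multiply by
# the gene's normalized variance raised to var_scale_power.
scale_variance_sketch <- function(pb, norm_var, var_scale_power = 0.5) {
  stopifnot(length(norm_var) == ncol(pb))
  unit <- scale(pb, center = TRUE, scale = TRUE)  # unit variance per gene
  sweep(unit, 2, norm_var^var_scale_power, "*")
}
set.seed(6)
pb <- matrix(rnorm(10 * 3, mean = 5), nrow = 10,
             dimnames = list(paste0("donor", 1:10), c("g1", "g2", "g3")))
norm_var <- c(g1 = 4, g2 = 1, g3 = 0.25)
scaled <- scale_variance_sketch(pb, norm_var)
apply(scaled, 2, var)  # equals norm_var^(2 * 0.5), i.e. norm_var itself
```

Because multiplying by c multiplies variance by c^2, the final per-gene variance is norm_var^(2 * var_scale_power). The default of 0.5 therefore leaves each gene with variance equal to its normalized variance.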
Convert Seurat object to scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.
Description
Convert Seurat object to scMinimal object. Generally, this should be done through calling the make_new_container() wrapper function.
Usage
seurat_to_scMinimal(seurat_obj, metadata_cols = NULL, metadata_col_nm = NULL)
Arguments
seurat_obj |
Seurat object that has been cleaned and includes the normalized, log-transformed counts. The meta.data should include a column with the header 'sex' and values of 'M' or 'F' if available. The metadata should also have a column with the header 'ctypes' with the corresponding names of the cell types as well as a column with header 'donors' that contains identifiers for each donor. |
metadata_cols |
character The names of the metadata columns to use (default=NULL) |
metadata_col_nm |
character New names for the selected metadata columns, if you wish to rename them. If NULL, the existing column names are used. (default=NULL) |
Value
An scMinimal object holding counts and metadata for a project.
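Before conversion, it can help to confirm that the meta.data has the columns described above. The helper below is hypothetical and not part of scITD; it only checks a plain data.frame shaped like Seurat meta.data (one row per cell), so it runs without Seurat installed.

```r
# Hypothetical helper (not part of scITD): check a meta.data-like data.frame
check_scitd_metadata <- function(meta) {
  required <- c("donors", "ctypes")
  missing_cols <- setdiff(required, colnames(meta))
  if (length(missing_cols) > 0) {
    stop("meta.data lacks required column(s): ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  # 'sex' is optional, but if present its values should be 'M' or 'F'
  if ("sex" %in% colnames(meta) && !all(meta$sex %in% c("M", "F"))) {
    stop("column 'sex' must contain only 'M' or 'F'", call. = FALSE)
  }
  invisible(TRUE)
}

meta <- data.frame(
  donors = c("d1", "d1", "d2"),
  ctypes = c("T4", "cM", "T4"),
  sex    = c("F", "F", "M"),
  row.names = c("cellA", "cellB", "cellC")
)
check_scitd_metadata(meta)  # passes silently
```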
Shuffle elements within the selected fibers
Description
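Randomly permute the elements within each selected fiber of the tensor. Each fiber is specified by a gene index and a cell type index, so its elements are the values across donors.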
Usage
shuffle_fibers(tensor_data, s_fibers)
Arguments
tensor_data |
list The tensor data including donor, gene, and cell type labels as well as the tensor array itself |
s_fibers |
list Gene and cell type indices for the randomly selected fibers |
Value
The tensor_data object with the values for the selected fibers shuffled.
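A minimal sketch of fiber shuffling on a plain 3-D array, assuming a donors x genes x cell types layout and a two-column matrix of (gene index, cell type index) pairs. The real s_fibers is described as a list, so this representation and the function name are hypothetical.

```r
# Hypothetical sketch: permute donor values within selected (gene, cell type) fibers
shuffle_fibers_sketch <- function(tnsr, s_fibers) {
  for (i in seq_len(nrow(s_fibers))) {
    g  <- s_fibers[i, 1]
    ct <- s_fibers[i, 2]
    fiber <- tnsr[, g, ct]
    # Index-based permutation avoids sample()'s special case for length-1 input
    tnsr[, g, ct] <- fiber[sample.int(length(fiber))]
  }
  tnsr
}

set.seed(4)
tnsr <- array(as.numeric(1:60), dim = c(5, 6, 2))  # donors x genes x cell types
s_fibers <- cbind(gene = c(2, 5), ctype = c(1, 2))
shuffled <- shuffle_fibers_sketch(tnsr, s_fibers)
```

Shuffling along the donor mode keeps each fiber's values intact but breaks their association with specific donors, which is the kind of perturbation useful when assessing stability.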
Create the tensor object by stacking each pseudobulk cell type matrix. Generally, this should be done through calling the form_tensor() wrapper function.
Description
Create the tensor object by stacking each pseudobulk cell type matrix. Generally, this should be done through calling the form_tensor() wrapper function.
Usage
stack_tensor(container)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
Value
The project container with the list of tensor data in container$tensor_data.
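Stacking can be sketched in base R by concatenating per-cell-type matrices into a 3-D array. The names pb_list and stack_pseudobulk are hypothetical, the donors x genes x cell types orientation is an assumption, and the sketch returns only the labeled array rather than the full container$tensor_data list.

```r
# Hypothetical sketch: stack donors x genes pseudobulk matrices into one array
stack_pseudobulk <- function(pb_list) {
  ref <- pb_list[[1]]
  # All cell types must share the same donors and genes, in the same order
  for (m in pb_list) {
    stopifnot(identical(rownames(m), rownames(ref)),
              identical(colnames(m), colnames(ref)))
  }
  tnsr <- array(unlist(pb_list, use.names = FALSE),
                dim = c(dim(ref), length(pb_list)))
  dimnames(tnsr) <- list(rownames(ref), colnames(ref), names(pb_list))
  tnsr
}

dn <- list(paste0("donor", 1:3), paste0("gene", 1:4))
pb_list <- list(
  T4 = matrix(rnorm(12), nrow = 3, dimnames = dn),
  cM = matrix(rnorm(12), nrow = 3, dimnames = dn)
)
tnsr <- stack_pseudobulk(pb_list)
dim(tnsr)  # 3 donors x 4 genes x 2 cell types
```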
Helper function from the simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/utils.R
Description
Helper function from the simplifyEnrichment package. https://github.com/jokergoo/simplifyEnrichment/blob/master/R/utils.R
Usage
stop_wrap(...)
Arguments
... |
Arguments that are combined into the error message. |
Value
No value is returned; the function signals an error.
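A minimal re-implementation sketch of a wrapped-error helper: the arguments are pasted into one message, wrapped with strwrap(), and raised without the calling expression. This is an assumption about what the simplifyEnrichment helper does; the linked source is authoritative.

```r
# Hypothetical sketch: raise an error whose long message is line-wrapped
stop_wrap_sketch <- function(...) {
  msg <- paste0(...)
  msg <- paste(strwrap(msg), collapse = "\n")
  stop(msg, call. = FALSE)
}

err <- tryCatch(
  stop_wrap_sketch("The tensor could not be formed because ", strrep("word ", 30)),
  error = function(e) e
)
cat(conditionMessage(err))
```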
Subset an scMinimal object by specified genes, donors, cells, or cell types
Description
Subset an scMinimal object by specified genes, donors, cells, or cell types
Usage
subset_scMinimal(
scMinimal,
ctypes_use = NULL,
cells_use = NULL,
donors_use = NULL,
genes_use = NULL,
in_place = TRUE
)
Arguments
scMinimal |
environment A sub-container for the project typically consisting of gene expression data in its raw and processed forms as well as metadata |
ctypes_use |
character The cell types to keep (default=NULL) |
cells_use |
character Cell barcodes for the cells to keep (default=NULL) |
donors_use |
character The donors to keep (default=NULL) |
genes_use |
character The genes to keep (default=NULL) |
in_place |
logical If set to TRUE then replaces the input object with the new subsetted object (default=TRUE) |
Value
A subsetted scMinimal object.
Examples
cell_names <- colnames(test_container$scMinimal_full$count_data)
cells_sub <- sample(cell_names,40)
scMinimal <- subset_scMinimal(test_container$scMinimal_full,
cells_use=cells_sub)
Data container for testing tensor formation steps
Description
Data container for testing tensor formation steps
Usage
test_container
Format
An object of class environment of length 10.
Helper function for running the decomposition. Use the run_tucker_ica() wrapper function instead.
Description
Helper function for running the decomposition. Use the run_tucker_ica() wrapper function instead.
Usage
tucker_ica_helper(
tensor_data,
ranks,
tucker_type,
rotation_type,
projection_container = NULL
)
Arguments
tensor_data |
list The tensor data including donor, gene, and cell type labels as well as the tensor array itself |
ranks |
numeric The numbers of donor and gene factors, respectively, to extract with the Tucker decomposition. |
tucker_type |
character The decomposition type; 'regular' is currently the only one implemented. |
rotation_type |
character Set to 'hybrid' to optimize loadings via our hybrid method (see paper for details). Set to 'ica_dsc' to perform ICA rotation on resulting donor factor matrix. Set to 'ica_lds' to optimize loadings by the ICA rotation. |
projection_container |
environment A project container to store projection data in. Currently only implemented for 'hybrid' and 'ica_dsc' rotations. (default=NULL) |
Value
The list of Tucker decomposition results, with the donor scores matrix as the first element and the loadings matrix as the second element.
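The Tucker step can be illustrated with a truncated higher-order SVD (HOSVD), a simple non-iterative way to obtain orthonormal Tucker factor matrices from the leading left singular vectors of each mode unfolding. This is a stand-in, not the routine tucker_ica_helper() uses, and the ICA and hybrid rotations are not shown. The donors x genes x cell types layout, keeping the cell type mode at full rank, and all function names are assumptions of this sketch.

```r
# Mode-n unfolding: rows index mode n, columns index all remaining modes
unfold <- function(x, n) {
  d <- dim(x)
  perm <- c(n, setdiff(seq_along(d), n))
  matrix(aperm(x, perm), nrow = d[n])
}

# Truncated HOSVD: one factor matrix per mode, with ranks[n] columns
hosvd_factors <- function(x, ranks) {
  lapply(seq_along(ranks), function(n) {
    svd(unfold(x, n), nu = ranks[n], nv = 0)$u
  })
}

set.seed(1)
tnsr <- array(rnorm(10 * 50 * 3), dim = c(10, 50, 3))  # donors x genes x cell types
# 2 donor factors, 4 gene factors, cell type mode kept at full rank (3)
fac <- hosvd_factors(tnsr, ranks = c(2, 4, 3))
donor_factors <- fac[[1]]  # 10 x 2
```

A rotation such as ICA would then be applied to factor matrices like these to make the donor scores or loadings more interpretable.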
Update any of the experiment-wide parameters
Description
Update any of the experiment-wide parameters
Usage
update_params(container, ctypes_use = NULL, ncores = NULL, rand_seed = NULL)
Arguments
container |
environment Project container that stores sub-containers for each cell type as well as results and plots from all analyses |
ctypes_use |
character Names of the cell types to use for the analysis (default=NULL) |
ncores |
numeric Number of cores to use (default=NULL) |
rand_seed |
numeric Random seed to use (default=NULL) |
Value
The project container with updated experiment parameters in container$experiment_params.
Examples
test_container <- update_params(test_container, ncores=1)
Compute significantly variable genes via ANOVA. Generally, this should be done through calling the form_tensor() wrapper function.
Description
Compute significantly variable genes via ANOVA. Generally, this should be done through calling the form_tensor() wrapper function.
Usage
vargenes_anova(scMinimal, ncores)
Arguments
scMinimal |
environment A sub-container for the project typically consisting of gene expression data in its raw and processed forms |
ncores |
numeric Number of cores to use |
Value
A list of raw p-values, one per gene.
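A minimal sketch of a per-gene one-way ANOVA with donor as the grouping factor, using stats::lm() and anova(). The inputs (a genes x cells expression matrix and a vector of donor labels per cell) and the function name are assumptions; the package may structure the test differently, and multiple-testing correction is left to a later step, as the raw p-values in the Value section suggest.

```r
# Hypothetical sketch: raw one-way ANOVA p-value per gene, donors as groups
anova_pvals <- function(expr, donors) {
  donors <- factor(donors)
  apply(expr, 1, function(y) {
    fit <- lm(y ~ donors)
    anova(fit)[["Pr(>F)"]][1]  # p-value for the donor term
  })
}

set.seed(2)
donors <- rep(c("d1", "d2", "d3"), each = 20)
expr <- matrix(rnorm(5 * 60), nrow = 5,
               dimnames = list(paste0("g", 1:5), NULL))
# Give gene g1 a strong donor effect
expr["g1", ] <- expr["g1", ] + rep(c(0, 3, 6), each = 20)
p <- anova_pvals(expr, donors)
p.adjust(p, method = "BH")  # e.g. adjust before calling genes significant
```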