Type: | Package |
Title: | Implementation of 'scSorter' Algorithm |
Version: | 0.0.2 |
Description: | Implements the algorithm described in Guo, H., and Li, J., "scSorter: assigning cells to known cell types according to known marker genes". Cluster cells to known cell types based on marker genes specified for each cell type. |
Depends: | R (≥ 3.6.0) |
Imports: | stats (≥ 3.6.0) |
License: | GPL-3 |
LazyData: | true |
RoxygenNote: | 7.1.1 |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2021-03-17 01:58:59 UTC; roder |
Author: | Hongyu Guo [aut], Jun Li [aut, cre] |
Maintainer: | Jun Li <jun.li@nd.edu> |
Repository: | CRAN |
Date/Publication: | 2021-03-17 06:40:02 UTC |
Cost Function
Description
Calculates the cost of a cluster assignment given the current parameter estimates.
Usage
cost_func(dat, clus, mu, designmat)
Arguments
dat |
A matrix of input data. |
clus |
A vector of predicted cell types. |
mu |
Parameter estimates from update_mu. |
designmat |
An indicator matrix that records the marker genes specified for each cell type. |
Preprocess Data
Description
This function validates and preprocesses the input data for the downstream analysis.
Usage
data_preprocess(expr, anno_processed)
Arguments
expr |
A matrix of input data. Each row represents a gene and each column represents a cell. |
anno_processed |
A list of processed annotation information that consists of the design matrix and the weight matrix for marker genes. |
Value
A list containing the processed expression matrix, design matrix, and weight matrix.
Design Matrix Builder
Description
Builds the design matrix required by update_func based on user input.
Usage
design_matrix_builder(anno, weight)
Arguments
anno |
A matrix or data frame that contains marker genes specified for cell types of interest. |
weight |
The default weight assigned to marker genes. |
Value
A list containing the processed design matrix and weight matrix.
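To make the relationship between the annotation table and an indicator matrix concrete, the following minimal sketch builds both from a toy `anno` data frame using base R only. It does not call design_matrix_builder and is not taken from the package source. The cell types, gene names, and weights are made up, and the gene-by-type orientation of the matrices is an assumption.

```r
# Marker annotation in the three-column layout that `anno` is documented to use.
# Cell types, genes, and weights below are made-up illustration values.
anno <- data.frame(
  Type   = c("T_cell", "T_cell", "B_cell", "B_cell"),
  Marker = c("CD3D", "CD3E", "CD79A", "MS4A1"),
  Weight = c(2, 2, 3, 3),
  stringsAsFactors = FALSE
)

genes <- unique(anno$Marker)
types <- unique(anno$Type)

# Assumed orientation: one row per marker gene, one column per cell type.
indicator <- matrix(0, nrow = length(genes), ncol = length(types),
                    dimnames = list(genes, types))
weights <- indicator

# A two-column character matrix indexes entries by (row name, column name).
idx <- cbind(anno$Marker, anno$Type)
indicator[idx] <- 1
weights[idx] <- anno$Weight

print(indicator)
print(weights)
```

Entries equal to 1 in `indicator` mark a gene as a marker of that cell type, and `weights` carries the corresponding weight at the same positions.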
scSorter
Description
This is the main function that implements the scSorter method.
Usage
scSorter(
expr,
anno,
default_weight = 2,
n_start = 10,
alpha = 0,
u = 0.05,
max_iter = 100,
setseed = 0
)
Arguments
expr |
A matrix of the input expression data. Each row represents a gene and each column represents a cell. Each row of this matrix should be named by the gene name it represents. |
anno |
A matrix or data frame that contains marker genes specified for cell types of interest.
It should contain three columns named "Type", "Marker", and "Weight" that record the name and weight of the marker genes specified for each cell type.
The "Weight" column is optional. If it is not specified, the value of default_weight is used. |
default_weight |
The default weight assigned to marker genes. The default value is 2. |
n_start |
The number of possible cluster initializations. The default value is 10. |
alpha |
The cutoff that determines whether the cell type of a cell is considered undecided during unknown cell calling. The default value is 0. |
u |
The parameter that determines whether undecided cells are further processed. The default value is 0.05. |
max_iter |
The maximum number of iterations for the algorithm to update parameters. The default value is 100. |
setseed |
Random seed for cluster initialization. The default value is 0. |
Value
A list containing the elements:
Pred_Type: The predicted cell types.
Pred_param: The parameter estimates of mu and delta.
Examples
load(system.file('extdata', 'example_data.RData', package = 'scSorter'))
result = scSorter(expr, anno)
misclassification_rate = 1 - mean(result$Pred_Type == true_type)
table(result$Pred_Type, true_type)
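The last two lines of the example compute a misclassification rate and a confusion table. The sketch below reproduces that arithmetic on toy vectors so it can be run without the package or its example data. `pred_type` stands in for `result$Pred_Type`; both vectors, and the "Unknown" label used for an undecided cell, are hypothetical values chosen for illustration.

```r
# Toy stand-ins for result$Pred_Type and true_type (hypothetical values).
pred_type <- c("A", "A", "B", "Unknown", "B")
true_type <- c("A", "B", "B", "A", "B")

# Fraction of cells whose predicted label differs from the true label.
misclassification_rate <- 1 - mean(pred_type == true_type)
print(misclassification_rate)

# Cross-tabulate predictions (rows) against the truth (columns).
confusion <- table(Predicted = pred_type, Truth = true_type)
print(confusion)
```

In this toy case three of the five labels match, so the rate is 0.4. Note that a cell left undecided counts as misclassified under this definition.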
Update Cluster
Description
Updates cluster assignments based on center estimates from update_mu.
Usage
update_C(dat, mu_mat, designmat)
Arguments
dat |
A matrix of input data. |
mu_mat |
Center estimates from update_mu. |
designmat |
An indicator matrix that records the marker genes specified for each cell type. |
Update Function
Description
Implements the scSorter method by iteratively running update_mu and update_C.
Usage
update_func(
dat,
design_mat,
weightmat,
unknown_threshold1 = 0,
unknown_threshold2 = 0.05,
max_iter = 100
)
Arguments
dat |
A matrix of input data. |
design_mat |
An indicator matrix that records the marker genes specified for each cell type. |
weightmat |
A matrix of weights assigned to each marker gene. |
unknown_threshold1 |
The cutoff that determines whether a cell is considered undecided. The default value is 0. |
unknown_threshold2 |
The parameter that determines whether undecided cells are further processed. The default value is 0.05. |
max_iter |
The maximum number of iterations for the algorithm to update parameters. The default value is 100. |
Value
A list containing the parameter estimates, type assignments, and the corresponding cost.
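The general shape of an alternating scheme, in which centers are re-estimated from the current assignments and cells are then re-assigned to the nearest center until nothing changes or max_iter is reached, can be sketched as follows. This is a deliberately simplified illustration that uses plain k-means steps. It ignores the design matrix, the weights, delta, and unknown cell calling, so it is not the scSorter estimator. All `toy_*` names are hypothetical, and the sketch assumes no cluster becomes empty.

```r
# Simplified stand-ins (hypothetical names): plain k-means steps only.

# Center step: mean expression of each gene within each cluster.
toy_update_centers <- function(dat, clus, k) {
  sapply(seq_len(k), function(j) rowMeans(dat[, clus == j, drop = FALSE]))
}

# Assignment step: each cell (column) goes to its nearest center.
toy_update_clusters <- function(dat, centers) {
  d <- sapply(seq_len(ncol(centers)),
              function(j) colSums((dat - centers[, j])^2))
  max.col(-d, ties.method = "first")  # index of the smallest distance per cell
}

# Cost: total squared distance of cells to their assigned centers.
toy_cost <- function(dat, clus, centers) sum((dat - centers[, clus])^2)

toy_fit <- function(dat, clus, k, max_iter = 100) {
  for (iter in seq_len(max_iter)) {
    centers <- toy_update_centers(dat, clus, k)
    new_clus <- toy_update_clusters(dat, centers)
    if (all(new_clus == clus)) break  # assignments are stable
    clus <- new_clus
  }
  centers <- toy_update_centers(dat, clus, k)
  list(centers = centers, clus = clus, cost = toy_cost(dat, clus, centers))
}

# Two genes (rows) measured in four cells (columns), poorly initialized.
dat <- cbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
fit <- toy_fit(dat, clus = c(1, 2, 1, 2), k = 2)
print(fit$clus)
print(fit$cost)
```

Like update_func, the toy function returns a list with estimates, assignments, and a cost, which makes it straightforward to compare several starting points and keep the one with the lowest cost.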
Mu Update
Description
Solves for mu and delta given the cluster assignment of each sample.
Usage
update_mu(dat, designmat, clus)
Arguments
dat |
A matrix of input data. |
designmat |
An indicator matrix that records the marker genes of each pre-specified cell type. |
clus |
A vector of cluster assignments. |
Value
A matrix of parameter estimates.
Select Highly Variable Genes
Description
Selects highly variable genes following the vst approach. Use this function only when the Seurat package is not available. More details are available in the vignette of this package.
Usage
xfindvariable_genes(expr, ngenes = 2000)
Arguments
expr |
A matrix of input scRNA-seq data. Rows correspond to genes and columns correspond to cells. |
ngenes |
The number of most variable genes to be selected. |
Value
A vector of the top highly variable genes, with the total number determined by the ngenes argument.
Normalize scRNA-seq Data
Description
Normalizes scRNA-seq data. Use this function only when the Seurat package is not available. More details are available in the vignette of this package.
Usage
xnormalize_scData(expr)
Arguments
expr |
A matrix of input scRNA-seq data. Rows correspond to genes and columns correspond to cells. |
Value
A matrix of normalized expression data.
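For readers unfamiliar with this kind of preprocessing, the sketch below shows the log-normalization commonly applied to scRNA-seq counts (the "LogNormalize" method in Seurat): each cell is scaled to a fixed total, then log1p is applied. Whether xnormalize_scData performs exactly this computation is an assumption, not something stated in this manual. The function name `log_normalize`, the scale factor, and the toy counts are illustrative.

```r
# Seurat-style log-normalization written in base R (illustrative helper;
# not confirmed to match the internals of xnormalize_scData).
log_normalize <- function(counts, scale_factor = 1e4) {
  lib_size <- colSums(counts)  # total counts per cell
  # Divide each column by its total, rescale, and apply log(1 + x).
  # A cell with zero total counts would yield NaN and should be removed first.
  log1p(sweep(counts, 2, lib_size, "/") * scale_factor)
}

# Toy count matrix: rows are genes, columns are cells.
counts <- matrix(c(1, 3, 0,
                   4, 0, 6),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("c1", "c2")))

norm <- log_normalize(counts)
print(norm)
```

Zero counts remain zero after the transformation, and every cell has the same total on the back-transformed scale, which removes differences in sequencing depth between cells.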