Type: Package
Title: Implementation of 'scSorter' Algorithm
Version: 0.0.2
Description: Implements the algorithm described in Guo, H., and Li, J., "scSorter: assigning cells to known cell types according to known marker genes". Clusters cells into known cell types based on the marker genes specified for each cell type.
Depends: R (>= 3.6.0)
Imports: stats (>= 3.6.0)
License: GPL-3
LazyData: true
RoxygenNote: 7.1.1
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2021-03-17 01:58:59 UTC; roder
Author: Hongyu Guo [aut], Jun Li [aut, cre]
Maintainer: Jun Li <jun.li@nd.edu>
Repository: CRAN
Date/Publication: 2021-03-17 06:40:02 UTC

Cost Function

Description

Calculates the cost of the current cluster assignment given the parameter estimates.

Usage

cost_func(dat, clus, mu, designmat)

Arguments

dat

A matrix of input data.

clus

A vector of predicted cell types.

mu

Parameter estimates from update_mu.

designmat

An indicator matrix that records the marker genes specified for each cell type.


Preprocess Data

Description

This function validates and preprocesses the input data for the downstream analysis.

Usage

data_preprocess(expr, anno_processed)

Arguments

expr

A matrix of input data. Each row represents a gene and each column represents a cell.

anno_processed

A list of processed annotation information that consists of the design matrix and the weight matrix for marker genes.

Value

A list containing the processed expression matrix, design matrix, and weight matrix.


Design Matrix Builder

Description

Builds the design matrix required by update_func based on user input.

Usage

design_matrix_builder(anno, weight)

Arguments

anno

A matrix or data frame that contains marker genes specified for cell types of interest.

weight

The default weight assigned to marker genes.

Value

A list containing the processed design matrix and weight matrix.
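
As an illustration of what a design matrix and a weight matrix can look like, the following base R sketch builds both from a small annotation table. It is not the package's implementation: the gene and cell type names are made up, and the orientation (marker genes in rows, cell types in columns) is an assumption.

```r
# Hypothetical annotation table in long format (names are illustrative only).
anno <- data.frame(
  Type   = c("T_cell", "T_cell", "B_cell", "B_cell"),
  Marker = c("CD3D", "CD3E", "CD79A", "MS4A1"),
  stringsAsFactors = FALSE
)
default_weight <- 2  # same value as the documented default_weight of scSorter()

types   <- unique(anno$Type)
markers <- unique(anno$Marker)

# Indicator matrix: 1 where a gene is a marker of a cell type, 0 elsewhere.
# Assumed orientation: marker genes in rows, cell types in columns.
designmat <- matrix(0, nrow = length(markers), ncol = length(types),
                    dimnames = list(markers, types))
idx <- cbind(match(anno$Marker, markers), match(anno$Type, types))
designmat[idx] <- 1

# Weight matrix: same shape, carrying the weight of each marker gene.
weightmat <- designmat * default_weight

print(designmat)
print(weightmat)
```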


scSorter

Description

This is the main function that implements the scSorter method.

Usage

scSorter(
  expr,
  anno,
  default_weight = 2,
  n_start = 10,
  alpha = 0,
  u = 0.05,
  max_iter = 100,
  setseed = 0
)

Arguments

expr

A matrix of the input expression data. Each row represents a gene and each column represents a cell. Each row of this matrix should be named by the gene name it represents.

anno

A matrix or data frame that contains the marker genes specified for the cell types of interest. It should contain three columns named "Type", "Marker", and "Weight" that record the cell type, the marker gene name, and the marker gene weight. The "Weight" column is optional. If it is not specified, default_weight is applied to all marker genes.

default_weight

The default weight assigned to marker genes. The default value is 2.

n_start

The number of random cluster initializations to try. The default value is 10.

alpha

The cutoff that determines whether the cell type of a cell is considered undecided during unknown cell calling. The default value is 0.

u

The parameter determines whether undecided cells are further processed. The default value is 0.05.

max_iter

The maximum number of iterations for the algorithm to update parameters. The default value is 100.

setseed

Random seed for cluster initialization. The default value is 0.

Value

A list containing the following elements. Pred_Type: the predicted cell types. Pred_param: the parameter estimates of mu and delta.

Examples

load(system.file('extdata', 'example_data.RData', package = 'scSorter'))
result = scSorter(expr, anno)
misclassification_rate = 1 - mean(result$Pred_Type == true_type)
table(result$Pred_Type, true_type)
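
When preparing your own inputs instead of the bundled example data, the sketch below shows one way to lay out expr and anno in the documented shape and to check them before calling scSorter. The gene names, cell names, counts, and weights are invented for illustration, and the checks are a suggestion rather than part of the package.

```r
set.seed(1)

# Toy expression matrix: genes in rows (named by gene), cells in columns.
genes <- c("CD3D", "CD3E", "CD79A", "MS4A1", "GAPDH")
expr <- matrix(rpois(length(genes) * 6, lambda = 5),
               nrow = length(genes),
               dimnames = list(genes, paste0("cell", 1:6)))

# Annotation with the documented columns; "Weight" is optional.
anno <- data.frame(
  Type   = c("T_cell", "T_cell", "B_cell", "B_cell"),
  Marker = c("CD3D", "CD3E", "CD79A", "MS4A1"),
  Weight = c(2, 2, 2, 5),
  stringsAsFactors = FALSE
)

# Basic sanity checks (hypothetical, not performed by this code's caller).
has_required_cols <- all(c("Type", "Marker") %in% colnames(anno))
missing_markers   <- setdiff(anno$Marker, rownames(expr))

stopifnot(has_required_cols, !is.null(rownames(expr)))
if (length(missing_markers) > 0) {
  warning("Markers absent from expr: ", paste(missing_markers, collapse = ", "))
}

# With the package installed, the inputs would then be passed as in Usage:
# result <- scSorter(expr, anno)
```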


Update Cluster

Description

Updates cluster assignments based on the center estimates from update_mu.

Usage

update_C(dat, mu_mat, designmat)

Arguments

dat

A matrix of input data.

mu_mat

Center estimates from update_mu.

designmat

An indicator matrix that records the marker genes specified for each cell type.


Update Function

Description

Implements the scSorter method by iteratively running update_mu and update_C.

Usage

update_func(
  dat,
  design_mat,
  weightmat,
  unknown_threshold1 = 0,
  unknown_threshold2 = 0.05,
  max_iter = 100
)

Arguments

dat

A matrix of input data.

design_mat

An indicator matrix that records the marker genes specified for each cell type.

weightmat

A matrix of weights assigned to each marker gene.

unknown_threshold1

The cutoff used to call a cell undecided. The default value is 0.

unknown_threshold2

The parameter determines whether undecided cells are further processed. The default value is 0.05.

max_iter

The maximum number of iterations for the algorithm to update parameters. The default value is 100.

Value

A list containing the parameter estimates, type assignments, and the corresponding cost.
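
The general pattern of alternating a parameter update with a cluster update until the assignments stop changing can be sketched in base R as below. This is plain k-means-style alternation on toy data, not the scSorter update itself: it ignores the design matrix, the marker weights, and unknown cell calling, and the helper names centre_step, assign_step, and cost are hypothetical.

```r
# Toy data: 2 genes (rows) x 6 cells (columns), two well-separated groups.
dat <- matrix(c(1, 1,  1.2, 0.9,  0.8, 1.1,
                5, 5,  5.1, 4.8,  4.9, 5.2),
              nrow = 2,
              dimnames = list(c("g1", "g2"), paste0("cell", 1:6)))

# Parameter step: mean expression profile of each cluster (genes x clusters).
centre_step <- function(dat, clus, k) {
  sapply(seq_len(k), function(j) rowMeans(dat[, clus == j, drop = FALSE]))
}

# Cluster step: assign each cell to the centre at the smallest squared distance.
assign_step <- function(dat, mu) {
  d <- sapply(seq_len(ncol(mu)), function(j) colSums((dat - mu[, j])^2))
  max.col(-d, ties.method = "first")
}

# Cost: total squared distance between cells and their assigned centres.
cost <- function(dat, clus, mu) sum((dat - mu[, clus])^2)

clus <- c(1L, 2L, 1L, 2L, 1L, 2L)  # deliberately poor starting assignment
max_iter <- 100
for (iter in seq_len(max_iter)) {
  mu <- centre_step(dat, clus, k = 2)
  new_clus <- assign_step(dat, mu)
  if (identical(new_clus, clus)) break  # converged: assignments unchanged
  clus <- new_clus
}

print(clus)
print(cost(dat, clus, mu))
```

Running several such loops from different starting assignments and keeping the one with the lowest cost is the usual reason for an option like n_start.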


Mu Update

Description

Solves for mu and delta given the cluster assignment of each sample.

Usage

update_mu(dat, designmat, clus)

Arguments

dat

A matrix of input data.

designmat

An indicator matrix that records the marker genes of each pre-specified cell type.

clus

A vector of cluster assignments.

Value

A matrix of parameter estimates.


Select Highly Variable Genes

Description

Selects highly variable genes following the vst approach. Please only use this function when you do not have access to the Seurat package. More details are available in the vignette of this package.

Usage

xfindvariable_genes(expr, ngenes = 2000)

Arguments

expr

A matrix of input scRNA-seq data. Rows correspond to genes and columns correspond to cells.

ngenes

The number of most variable genes to be selected.

Value

A vector of the most highly variable genes, with the total number determined by the ngenes argument.
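
For readers unfamiliar with the vst idea, the base R sketch below outlines it on simulated counts: fit a loess trend of log10 variance against log10 mean, standardize each gene by its expected variance, clip large values, and rank genes by the variance of the standardized values. It is a rough approximation written for illustration. The function name vst_select, the span of 0.3, and the clipping rule are assumptions, and the result is not guaranteed to match xfindvariable_genes or Seurat.

```r
set.seed(42)
n_genes <- 300
n_cells <- 100

# Simulated Poisson counts; the first 10 genes get extra between-cell variation.
gene_means <- exp(runif(n_genes, log(1), log(50)))
gene_means[1:10] <- seq(5, 15, length.out = 10)
counts <- matrix(rpois(n_genes * n_cells, lambda = rep(gene_means, times = n_cells)),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
half <- n_cells / 2
counts[1:10, 1:half] <- rpois(10 * half, rep(0.2 * gene_means[1:10], times = half))
counts[1:10, (half + 1):n_cells] <- rpois(10 * half, rep(1.8 * gene_means[1:10], times = half))

# Hypothetical helper sketching the vst ranking.
vst_select <- function(counts, ngenes = 2000) {
  gene_mean <- rowMeans(counts)
  gene_var  <- apply(counts, 1, var)
  keep <- gene_var > 0

  # Mean-variance trend on the log10 scale.
  trend <- data.frame(x = log10(gene_mean[keep]), y = log10(gene_var[keep]))
  fit <- loess(y ~ x, data = trend, span = 0.3)

  expected_sd <- rep(NA_real_, nrow(counts))
  expected_sd[keep] <- sqrt(10^fitted(fit))

  # Standardize per gene, clip large values, then take the variance.
  z <- (counts - gene_mean) / expected_sd
  z <- pmin(z, sqrt(ncol(counts)))
  std_var <- rowSums(z^2) / (ncol(counts) - 1)
  std_var[!keep] <- 0

  rownames(counts)[head(order(std_var, decreasing = TRUE), ngenes)]
}

top_genes <- vst_select(counts, ngenes = 10)
print(top_genes)
```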


Normalize scRNA-seq Data

Description

Normalizes scRNA-seq data. Please only use this function when you do not have access to the Seurat package. More details are available in the vignette of this package.

Usage

xnormalize_scData(expr)

Arguments

expr

A matrix of input scRNA-seq data. Rows correspond to genes and columns correspond to cells.

Value

A matrix of normalized expression data.
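
As a point of reference, Seurat's default "LogNormalize" method divides each cell's counts by the cell's total count, multiplies by a scale factor of 10,000, and applies log1p. The base R sketch below reproduces that calculation on a toy matrix. Whether xnormalize_scData performs exactly this computation is an assumption, and log_normalize is a hypothetical helper name.

```r
# Toy count matrix: 3 genes (rows) x 2 cells (columns).
counts <- matrix(c(0, 2, 8,
                   5, 5, 0),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("cellA", "cellB")))

# Hypothetical helper: library-size scaling followed by log1p.
log_normalize <- function(counts, scale_factor = 1e4) {
  lib_size <- colSums(counts)                 # total counts per cell
  scaled <- sweep(counts, 2, lib_size, "/")   # fraction of each cell's counts
  log1p(scaled * scale_factor)                # natural log of (1 + scaled value)
}

norm <- log_normalize(counts)
print(round(norm, 3))
```

Zero counts stay at zero after this transformation, and back-transforming with expm1 recovers per-cell totals equal to the scale factor.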