Type: | Package |
Title: | Dynamic Slicing |
Version: | 1.2.2 |
Date: | 2023-11-22 |
Maintainer: | Chao Ye <yechao1009@gmail.com> |
Description: | Dynamic slicing is a method designed for dependency detection between a categorical variable and a continuous variable. It could be applied for non-parametric hypothesis testing and gene set enrichment analysis. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Depends: | R (≥ 2.10), stats, utils, Rcpp (≥ 0.11.1), ggplot2 (≥ 0.9.3.1), scales |
LinkingTo: | Rcpp |
NeedsCompilation: | yes |
Packaged: | 2023-11-21 16:53:11 UTC; yechao |
Author: | Chao Ye [aut, cre], Bo Jiang [aut] |
Repository: | CRAN |
Date/Publication: | 2023-11-22 01:10:02 UTC |
Dependency and conditional dependency detection between a level k
(k > 1
) categorical variable and a continuous variable via Bayes factor.
Description
Conditional dependency detection between a level k_x
(k_x > 1
) categorical variable x
and a continuous variable y
via Bayes factor given a level k_z
categorical variable z
. If k_z = 1
, it is unconditional dependency detection method. It could be applied for non-parametric variable selecltion.
Usage
bfslice_c(z, x, zdim, xdim, lambda, alpha)
Arguments
z |
Vector: observations of given (preselected) categorical variable, |
x |
Vector: observations of categorical variable, |
zdim |
Level of |
xdim |
Level of |
lambda |
|
alpha |
|
Value
Value of Bayes factor (nonnegative). Bayes factor could be treated as a statistic and one can take some threshold then calculates the corresponded Type I error rate. One can also take the value of Bayes factor for judgement.
References
Jiang, B., Ye, C. and Liu, J.S. Bayesian nonparametric tests via sliced inverse modeling. Bayesian Analysis, 12(1): 89-112, 2017.
See Also
Examples
n <- 100
mu <- 0.5
## Unconditional test
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))
x <- c(rep(0, n), rep(1, n))
z <- rep(0, 2*n)
## Conditional test
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))
x <- c(rep(0, n/5), rep(1, n), rep(0, 4*n/5))
z <- c(rep(0, n), rep(1, n))
z <- z[order(y)]
x <- x[order(y)]
zdim <- max(z) + 1
xdim <- max(x) + 1
lambda <- 1.0
alpha <- 1.0
bfval <- bfslice_c(z, x, zdim, xdim, lambda, alpha)
Dependency and conditional dependency detection between a level k
(k > 1
) categorical variable and a continuous variable via Bayes factor.
Description
Conditional dependency detection between a level k_x
(k_x > 1
) categorical variable x
and a continuous variable y
via Bayes factor given a level k_z
categorical variable z
with O(n^{1/2})
-resolution. The basic idea is almost the same as bfslice_c
. The only different is that bfslice_eqp_c
groups samples into approximate O(n^{1/2})
groups which contain approximate O(n^{1/2})
samples and treat the groups as a sample to calculate Bayes facor. If k_z = 1
, it is unconditional dependency detection method. It could be applied for non-parametric variable selecltion.
Usage
bfslice_eqp_c(z, x, zdim, xdim, lambda, alpha)
Arguments
z |
Vector: observations of given (preselected) categorical variable, |
x |
Vector: observations of categorical variable, |
zdim |
Level of |
xdim |
Level of |
lambda |
|
alpha |
|
Value
Value of Bayes factor (nonnegative). Bayes factor could be treated as a statistic and one can take some threshold then calculates the corresponded Type I error rate. One can also take the value of Bayes factor for judgement.
References
Jiang, B., Ye, C. and Liu, J.S. Bayesian nonparametric tests via sliced inverse modeling. Bayesian Analysis, 12(1): 89-112, 2017.
See Also
Examples
n <- 1000
mu <- 0.2
## Unconditional test
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))
x <- c(rep(0, n), rep(1, n))
z <- rep(0, 2*n)
## Conditional test
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))
x <- c(rep(0, n/5), rep(1, n), rep(0, 4*n/5))
z <- c(rep(0, n), rep(1, n))
z <- z[order(y)]
x <- x[order(y)]
zdim <- max(z) + 1
xdim <- max(x) + 1
lambda <- 1.0
alpha <- 1.0
bfval <- bfslice_eqp_c(z, x, zdim, xdim, lambda, alpha)
Dependency detection between a level k
(k > 1
) categorical variable and a continuous variable via Bayes factor with given size of each group.
Description
Dependency detection between a level k
(k > 1
) categorical variable x
and a continuous variable y
via Bayes factor with O(n^{1/2})
-resolution. The basic idea is almost the same as bfslice_u
. The only different is that bfslice_eqp_u
groups samples into approximate O(n^{1/2})
groups which contain approximate O(n^{1/2})
samples and treat the groups as a sample to calculate Bayes facor.
Usage
bfslice_eqp_u(x, dim, lambda, alpha)
Arguments
x |
Vector: observations of categorical variable, |
dim |
Level of |
lambda |
|
alpha |
|
Value
Value of Bayes factor (nonnegative). Bayes factor could be treated as a statistic and one can take some threshold then calculates the corresponded Type I error rate. One can also take the value of Bayes factor for judgement.
References
Jiang, B., Ye, C. and Liu, J.S. Bayesian nonparametric tests via sliced inverse modeling. Bayesian Analysis, 12(1): 89-112, 2017.
See Also
Examples
n <- 1000
mu <- 0.2
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))
x <- c(rep(0, n), rep(1, n))
x <- x[order(y)]
dim <- max(x) + 1
lambda <- 1.0
alpha <- 1.0
bfval <- bfslice_eqp_u(x, dim, lambda, alpha)
Dependency detection between a level k
(k > 1
) categorical variable and a continuous variable via Bayes factor.
Description
Dependency detection between a level k
(k > 1
) categorical variable x
and a continuous variable y
via Bayes factor.
Usage
bfslice_u(x, dim, lambda, alpha)
Arguments
x |
Vector: observations of categorical variable, |
dim |
Level of |
lambda |
|
alpha |
|
Value
Value of Bayes factor (nonnegative). Bayes factor could be treated as a statistic and one can take some threshold then calculates the corresponded Type I error rate. One can also take the value of Bayes factor for judgement.
References
Jiang, B., Ye, C. and Liu, J.S. Bayesian nonparametric tests via sliced inverse modeling. Bayesian Analysis, 12(1): 89-112, 2017.
See Also
Examples
n <- 100
mu <- 0.5
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))
x <- c(rep(0, n), rep(1, n))
x <- x[order(y)]
dim <- max(x) + 1
lambda <- 1.0
alpha <- 1.0
bfval <- bfslice_u(x, dim, lambda, alpha)
Non-parametric one-sample hypothesis testing via dynamic slicing
Description
Non-parametric one-sample hypothesis testing via dynamic slicing. By mapping sample values to the quantile of null distribution, ds_1
test whether they follow uniform distribution on [0, 1] via a regularized likelihood-ratio. Its calculated is based on a dynamic programming procedure.
Usage
ds_1(y, lambda, alpha)
Arguments
y |
Vector: quantiles of observations according to null distribution. |
lambda |
|
alpha |
|
Value
Value of dynamic slicing statistic for one-sample test. It is nonnegative. The null hypothesis that observations are from the null distribution is rejected if this statistic is greater than zero, otherwise accept the null hypothesis.
See Also
Examples
n <- 100
mu <- 0.5
x <- rnorm(n, mu, 1)
y <- pnorm(sort(x), 0, 1)
lambda <- 1.0
alpha <- 1.0
dsres <- ds_1(y, lambda, alpha)
Non-parametric one-sample hypothesis testing via dynamic slicing
Description
Non-parametric one-sample hypothesis testing via dynamic slicing with O(n)
-resolution. The basic idea of ds_eqp_1
is almost the same as ds_1
. Difference between these two functions is that ds_eqp_1
considers an equal partition on [0, 1] but ds_1
does not. Candidate slicing boundaries in ds_eqp_1
only depend on the total number of samples and are unrelated to sample quantiles. In ds_1
they are immediately to the left or right of sample quantile.
Usage
ds_eqp_1(y, lambda)
Arguments
y |
Vector: quantiles of observations according to null distribution. |
lambda |
|
Value
Value of dynamic slicing statistic for one-sample test. It is nonnegative. The null hypothesis that observations are from the null distribution is rejected if this statistic is greater than zero, otherwise accept the null hypothesis.
See Also
ds_1
.
Examples
n <- 100
mu <- 0.5
x <- rnorm(n, mu, 1)
y <- pnorm(sort(x), 0, 1)
lambda <- 1.0
dsres <- ds_eqp_1(y, lambda)
Dependency detection between level k
(k > 1
) categorical variable and continuous variable
Description
Dependency detection between level k
(k > 1
) categorical variable and continuous variable via dynamic slicing with O(n^{1/2})
-resolution. The basic idea is almost the same as ds_k
. The only different is that ds_eqp_k
groups samples into approximate O(n^{1/2})
groups which contain approximate O(n^{1/2})
samples and performs dynamic slicing on their boundaries. This much faster version could reduce computation time substantially without too much power loss. Based on the strategy of ds_eqp_k
, we recommend to apply it in large sample size problem and use ds_k
for ordinary problem. For more details please refer to Jiang, Ye & Liu (2015). Results contains value of dynamic slicing statistic and slicing strategy. It could be applied for non-parametric K-sample hypothesis testing.
Usage
ds_eqp_k(x, xdim, lambda, slice = FALSE)
Arguments
x |
Vector: observations of categorical variable, |
xdim |
Level of |
lambda |
Penalty for introducing an additional slice, which is used to avoid making too many slices. It corresponds to the type I error under the scenario that the two variables are independent. |
slice |
Indicator for reporting slicing strategy or not. |
Value
dsval |
Value of dynamic slicing statistic. It is nonnegative. If it equals zero, the categorical variable and continuous variable will be treated as independent of each other, otherwise they will be treated as dependent. |
slices |
Slicing strategy that maximize dynamic slicing statistic based on currently ranked vector |
References
Jiang, B., Ye, C. and Liu, J.S. Non-parametric K-sample tests via dynamic slicing. Journal of the American Statistical Association, 110(510): 642-653, 2015.
See Also
ds_k
.
Examples
n <- 100
mu <- 0.5
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))
x <- c(rep("1", n), rep("2", n))
x <- relabel(x)
x <- x[order(y)]
xdim <- max(x) + 1
lambda <- 1.0
dsres <- ds_eqp_k(x, xdim, lambda, slice = TRUE)
Gene set analysis via dynamic slicing
Description
Gene set analysis via dynamic slicing.
Usage
ds_gsa(expdat, geneset, label, generank, ..., lambda = 1, bycol = FALSE,
minsize = 15, maxsize = 500, randseed = 11235, rounds = 1000)
Arguments
expdat |
Either a character string of gene expression file name (.gct file), or an expression matrix with rownames, each row is a gene and each column is a sample. |
geneset |
Either a character string of gene set file name (.gmt file), or a list contains a vector of gene set names, a vector of gene set description and a list of gene symbols in each gene set. |
label |
Either a character string of phenotypes file (.cls file), or a list contains a vector of types of pheotype and a vector of encoded pheotypes of samples. It should match gene expression matrix. |
generank |
Either an integer vector of rank of each gene according to some statistic, or a character string naming a function which takes gene expression matrix as input and returns a vector of gene rank (not tie). |
... |
Parameters of the function specified (as a character string) by |
lambda |
Penalty for introducing an additional slice in dynamic slicing procedure, which is used to avoid making too many slices. It corresponds to the type I error under the scenario that the two variables are independent. |
bycol |
Type of permutation, by row (default) or by column. Permutation by row means shuffling the gene rank. Permutation by column means shuffling pheotypes then obtain gene rank. |
minsize |
Minimum number of genes in genesets to be considered. |
maxsize |
Maximum number of genes in genesets to be considered. |
randseed |
Optional initial seed for random number generator (integer). |
rounds |
Number of permutations for estimating significant level of results. |
Details
ds_gsa
performs gene set analysis via dynamic slicing. It returns the DS statistics and slicing strategy of each gene set.
ds_gsa
does not attempt to integrate the ranking method into it. It requires ranking method or directly the gene rank as a parameter. Leaving ranking method as an optional input parameter is convenience for users who would like to use any ranking methods they want.
Value
A list with informations of gene sets whose size satisfy the minimum and maximum size thresholds. Its contains the following components:
set_name |
A vector of gene set names. |
set_size |
A vector of gene set sizes. |
DS_value |
A vector of dynamic slicing statistic of each gene set. |
pvalue |
A vector of p-value of each gene set. |
FDR |
A vector of FDR of each gene set. |
slices |
A list of slicing strategy of each gene set. Each component is a matrix of slices. |
References
Jiang, B., Ye, C. and Liu, J.S. Non-parametric K-sample tests via dynamic slicing. Journal of the American Statistical Association, 110(510): 642-653, 2015.
Subramanian, A., Tamayo, P., Mootha, V. K., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 2005, 102(43): 15545-15550.
Benjamini, Y. and Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 1995, 57(1): 289-300.
See Also
ds_k
.
Examples
## Loading data from files
## Not run:
gctpath <- "P53.gct"
clspath <- "P53.cls"
gmtpath <- "C2.gmt"
expdat <- load_gct(gctpath)
label <- load_cls(clspath)
geneset <- load_gmt(gmtpath)
fc <- function(x, label)
{
d0 <- apply(x[,which(label == 0)], 1, mean)
d1 <- apply(x[,which(label == 1)], 1, mean)
d <- d1 / d0
return(order(d))
}
ds_gsa_obj <- ds_gsa(expdat, geneset, label, "fc", lambda = 1.2, bycol = TRUE,
minsize = 15, maxsize = 500, randseed = 11235, rounds = 100)
## End(Not run)
Dependency detection between level k
(k > 1
) categorical variable and continuous variable
Description
Dependency detection between level k
(k > 1
) categorical variable and continuous variable. The basic idea is that the different values of categorical variable correspond to different distribution of continuous variable if there exist dependency between this two varibles, otherwise the distributions of continuous variable do not show difference conditioning on the values of categorical variable. Statistic for this dynamic slicing method is a regularized likelihood-ratio calculated via a dynamic programming procedure. For more details please refer to Jiang, Ye & Liu (2015). Results contains value of dynamic slicing statistic and slicing strategy. It could be applied for non-parametric K-sample hypothesis testing.
Usage
ds_k(x, xdim, lambda, slice = FALSE)
Arguments
x |
Vector: observations of categorical variable, |
xdim |
Level of |
lambda |
Penalty for introducing an additional slice, which is used to avoid making too many slices. It corresponds to the type I error under the scenario that the two variables are independent. |
slice |
Indicator for reporting slicing strategy or not. |
Value
dsval |
Value of dynamic slicing statistic. It is nonnegative. If it equals zero, the categorical variable and continuous variable will be treated as independent of each other, otherwise they will be treated as dependent. |
slices |
Slicing strategy that maximize dynamic slicing statistic based on currently ranked vector |
References
Jiang, B., Ye, C. and Liu, J.S. Non-parametric K-sample tests via dynamic slicing. Journal of the American Statistical Association, 110(510): 642-653, 2015.
See Also
Examples
n <- 100
mu <- 0.5
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))
x <- c(rep("1", n), rep("2", n))
x <- relabel(x)
x <- x[order(y)]
xdim <- max(x) + 1
lambda <- 1.0
dsres <- ds_k(x, xdim, lambda, slice = TRUE)
Hypothesis testing via dynamic slicing
Description
Perform a one- or K-sample (K > 1
) hypothesis testing via dynamic slicing.
Usage
ds_test(y, x, ..., type = c("ds", "eqp"), lambda = 1, alpha = 1, rounds = 0)
Arguments
y |
A numeric vector of data values. |
x |
Either an integer vector of data values, from 0 to |
... |
Parameters of the distribution specified (as a character string) by |
type |
Methods applied for dynamic slicing. " |
lambda |
Penalty for introducing an additional slice, which is used to avoid making too many slices. It corresponds to the type I error under the scenario that the two variables are independent. |
alpha |
Penalty required for " |
rounds |
Number of permutations for estimating empirical p-value. |
Details
If x
is an integer vector, ds_test
performs K-sample test (K > 1
).
Under this scenario, suppose that there are observations y
drawn from some continuous populations. Let x
be a vector that stores values of indicator of samples from different populations, i.e., x
has values 0, 1, \ldots, K-1
. The null hypothesis is that these populations have the same distribution.
If x
is a character string naming a continuous (cumulative) distribution function, ds_test
performs one-sample test with the null hypothesis that the distribution function which generated y
is distribution x
with parameters specified by \ldots
. The parameters specified in \ldots
must be pre-specified and not estimated from the data.
Only empirical p-values are available by specifying the value of parameter rounds
, the number of permutation. lambda
and alpha
(for one-sample test with type "ds
") contributes to p-value.
The procedure of choosing parameter lambda
was described in Jiang, Ye & Liu (2015). Refer to dataset ds_type_one_error
in this package for the empirical relationship of lambda
, sample size and type I error.
Value
A list with class "htest
" containing the following components:
statistic |
The value of the dynamic slicing statistic. |
p.value |
The p-value of the test. |
alternative |
A character string describing the alternative hypothesis. |
method |
A character string indicating what type of test was performed. |
data.name |
A character string giving the name(s) of the data. |
slices |
Slicing strategy that maximize dynamic slicing statistic in K-sample test. Each row stands for a slice. Each column except the last one stands for the number of observations take each value in each slice. The last column is the number of observations in each slice i.e., the sum of the first column to the kth column. |
References
Jiang, B., Ye, C. and Liu, J.S. Non-parametric K-sample tests via dynamic slicing. Journal of the American Statistical Association, 110(510): 642-653, 2015.
Examples
## One-sample test
n <- 100
mu <- 0.5
y <- rnorm(n, mu, 1)
lambda <- 1.0
alpha <- 1.0
dsres <- ds_test(y, "pnorm", 0, 1, lambda = 1, alpha = 1, rounds = 100)
dsres <- ds_test(y, "pnorm", 0, 1, type = "ds", lambda = 1, alpha = 1)
dsres <- ds_test(y, "pnorm", 0, 1, type = "eqp", lambda = 1, rounds = 100)
dsres <- ds_test(y, "pnorm", 0, 1, type = "eqp", lambda = 1)
## K-sample test
n <- 100
mu <- 0.5
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))
## generate x in this way:
x <- c(rep(0, n), rep(1, n))
x <- as.integer(x)
## or in this way:
x <- c(rep("G1", n), rep("G2", n))
x <- relabel(x)
lambda <- 1.0
dsres <- ds_test(y, x, lambda = 1, rounds = 100)
dsres <- ds_test(y, x, type = "eqp", lambda = 1, rounds = 100)
Relationship between penalty and Type I error
Description
Because there are not close form relationship between the value of slicing penalty and Type-I error rate of dynamic slicing. Empirical relationship between them is provided in this dataset. Each row corresponds to the value of penalty. Each column corresponds to the total number of samples.
Usage
data(ds_type_one_error)
Format
A list on the following 2 variables.
two_sample
a matrix.
three_sample
a matrix.
Examples
data(ds_type_one_error)
Export gene set analysis result
Description
Export gene set analysis result.
Usage
export_res(ds_gsa_obj, file = "", ..., cutoff = 1, decreasing = FALSE,
type = c("name", "size", "DS", "p-value", "FDR", "slice"))
Arguments
ds_gsa_obj |
Object returned by |
file |
Either a character string naming a file or a connection open for writing. "" indicates output to the console. |
... |
Parameters of |
cutoff |
threshold for selecting gene set analysis result according to parameter |
decreasing |
Ranking results decreasingly (default) or ascendingly. |
type |
Opitons for ranking results by gene set name (“name”), gene set size (“size”), value of dynamic slicing statistic (" |
Details
The usage of export_res
is similar to write.table
.
See Also
The ‘R Data Import/Export’ manual.
Examples
## Not run:
ds_gsa_obj <- ds_gsa(expdat, geneset, label, fc, lambda = 1.2, bycol = TRUE,
minsize = 15, maxsize = 500, randseed = 11235, rounds = 100)
export_res(ds_gsa_obj, "ds_gsa_res.txt", sep = "\t", type = "DS", cutoff = 0,
row.names = F, col.names = T, quote = F, append = F)
## End(Not run)
Gene expression matrix in gene set analysis
Description
P53 NCI-60 data set provided by Subramanian et al., (2005). A gene expression matrix whose rows correspond to genes and column correspond to samples.
Usage
data(gsa_exp)
Format
A matrix.
Source
http://www.broadinstitute.org/gsea
References
Subramanian, A., Tamayo, P., Mootha, V. K., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 2005, 102(43): 15545-15550.
Examples
data(gsa_exp)
Sample labels in gene set analysis
Description
P53 NCI-60 data set provided by Subramanian et al., (2005). A list with phenotypes vector and a vector of sample label values.
Usage
data(gsa_label)
Format
A list on the following 2 variables.
pheotype
a character vector contains two genotypes.
value
a numeric vector contains sample label 0 and 1, where 0 and 1 stands for “MUT” and “WT”, respectively.
Source
http://www.broadinstitute.org/gsea
References
Subramanian, A., Tamayo, P., Mootha, V. K., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 2005, 102(43): 15545-15550.
Examples
data(gsa_label)
Gene set list in gene set analysis
Description
Gene set provided by Subramanian et al., (2005). A list with gene set vector, gene set description vector and a list whose elements are vector of genes in each gene set.
Usage
data(gsa_set)
Format
A list with three elements: gene set name, gene set description and genes in gene set.
set_name
a character vector of gene set name.
set_description
a character vector of gene set description.
set_description
a list whose elements are vector. Each of them contains genes in each gene set.
Source
http://www.broadinstitute.org/gsea
References
Subramanian, A., Tamayo, P., Mootha, V. K., et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 2005, 102(43): 15545-15550.
Examples
data(gsa_set)
Load phenotype file
Description
Load phenotype file from .cls file
Usage
load_cls(file)
Arguments
file |
Name of .cls file which contains phenotypes of samples, separared by spaces. |
Value
A list with components:
pheotype |
Pheotype of samples |
value |
A vector with encoded value of pheotypes. If there is k pheotypes, it takes values |
Examples
# Pheotype files are available after registration at Broad institute GSEA website
# http://www.broadinstitute.org/gsea
## Not run:
filename <- "P53.cls"
label <- load_cls(filename)
## End(Not run)
Load gene expression file
Description
Load gene expression data from .gct file
Usage
load_gct(file)
Arguments
file |
Name of .gct file which contains gene expression data. Should be a tab-separated text file. The first row is version and the second is the dimension of expression matrix. There is an expression matrix from the third row to the end. The third row is column name of expression matrix. The first column is gene symbol and the second is the description of gene. For the remaining rows and columns, each row is a gene and each column is a sample. |
Value
A matrix with row names and column names.
Examples
# Gene expression files are available after registration at Broad institute GSEA website
# http://www.broadinstitute.org/gsea
## Not run:
filename <- "P53.gct"
expdat <- load_gct(filename)
## End(Not run)
Load gene set file
Description
Load gene set from .gmt file
Usage
load_gmt(file)
Arguments
file |
Name of .gmt file which contains gene sets. Should be a tab-separated text file. Each row is a gene set. The first column is gene set name and the second is its description. Remaining columns are gene symbols of genes in this set. |
Value
A list with components:
set_name |
Vector of gene set names. |
set_description |
Vector of gene set descriptions. |
gene_symbol |
List of gene symbols in each gene set. |
Examples
# Gene set files are available after registration at Broad institute GSEA website
# http://www.broadinstitute.org/gsea
## Not run:
filename <- "C2.gmt"
geneset <- load_gmt(filename)
## End(Not run)
Ranking genes by signal to noise ratio
Description
Ranking genes by signal to noise ratio according to their expression data.
Usage
rank_by_s2n(expmat, label)
Arguments
expmat |
A matrix of gene expression data. Each row is a gene and each column is a sample. |
label |
An integer vector of encoded pheotypes. Its value is 0 and 1. Its length should match the column number of expression matrix. |
Value
A vector of rank of each gene according to signal to noise ratio.
Examples
expdat <- matrix(rnorm(500), nrow = 25, ncol = 20)
label <- rep(c(0, 1), 10)
ranklist <- rank_by_s2n(expdat, label)
Reassigning values of categorical variable
Description
Reassigning values of categorical variable. It is used for generating legal value of categorical variable before applying dynamic slicing.
Usage
relabel(x)
Arguments
x |
A vector of data values. |
Value
An integer vector with values range from 0 to k (k > 0).
See Also
Examples
n <- 10
x <- c(rep("G1", n), rep("G2", n))
x <- relabel(x)
x <- c(rep(4, n), rep(5, n), rep(NA, n))
x <- relabel(x)
Show the slicing result
Description
Showing slicing result and plotting counts of observations in each slice.
Usage
slice_show(slices_obj, main="Counts in each slice", xlab="Slices", ylab="Percentage")
Arguments
slices_obj |
A matrix stores slicing strategy. It is a component of object returned by function |
main |
An overall title for the plo |
xlab |
A title for the x axis |
ylab |
A title for the y axis |
Value
A “ggplot” object which illustrates details of slicing.
See Also
Examples
n <- 100
mu <- 0.5
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))
x <- c(rep(0, n), rep(1, n))
x <- x[order(y)]
xdim <- max(x) + 1
lambda <- 1.0
dsres <- ds_k(x, xdim, lambda, slice = TRUE)
ds_show <- slice_show(dsres$slices)