Type: Package
Title: Inferring the Topology of Omics Data
Version: 1.0.2
Author: Nanne Aben
Maintainer: Nanne Aben <nanne.aben@gmail.com>
Description: Infers a topology of relationships between different datasets, such as multi-omics and phenotypic data recorded on the same samples. We based this methodology on the RV coefficient (Robert & Escoufier, 1976, <doi:10.2307/2347233>), a measure of matrix correlation, which we have extended for partial matrix correlations and binary data (Aben et al., 2018, <doi:10.1101/293993>).
Imports: Matrix, corpcor
License: GPL-2
Encoding: UTF-8
LazyData: true
Suggests: knitr, rmarkdown, NMF, pcalg, Rgraphviz
VignetteBuilder: knitr
RoxygenNote: 6.0.1
NeedsCompilation: no
Packaged: 2018-06-13 08:58:43 UTC; n.aben
Repository: CRAN
Date/Publication: 2018-06-13 09:16:06 UTC
Performing a single bootstrap
Description
Helper function for run.bootstraps(). It's unlikely you'll ever need to run this function directly.
Usage
bootstrap.config.matrices(config_matrices)
Arguments
config_matrices |
The result from compute.config.matrices(). |
Value
An n x n matrix of RV coefficients for the bootstrapped data, where n is the number of datasets.
Compute configuration matrices
Description
Given a list of n data matrices (corresponding to n datasets), this function computes the configuration matrix for each of these datasets. By default inner product similarity is used, but other similarity measures (such as Jaccard similarity for binary data) can also be used (see the vignette 'A quick introduction to iTOP' for more information). In addition, the configuration matrices can be centered and prepared for use with the modified RV coefficient, both of which we will briefly explain here.
Usage
compute.config.matrices(data, similarity_fun = inner.product, center = TRUE,
mod.rv = TRUE)
Arguments
data |
List of datasets. |
similarity_fun |
Either a function pointer to the similarity function to be used for all datasets; or a list of function pointers, if different similarity functions need to be used for different datasets (default=inner.product). |
center |
Either a boolean indicating whether centering should be used for all datasets; or a list of booleans, if centering should be used for some datasets but not all of them (default=TRUE). |
mod.rv |
Either a boolean indicating whether the modified RV coefficient should be used for all datasets; or a list of booleans, if the modified RV should be used for some datasets but not all of them (default=TRUE). |
Details
The RV coefficient often results in values very close to one when neither dataset is centered around zero, even for orthogonal data. For inner product similarity and Jaccard similarity, we recommend using centering. However, for some other similarity measures, centering may not be beneficial (for example, because the measure itself is already centered, as is the case for Pearson correlation). For more information on centering of binary (and other non-continuous) data, for which we used kernel centering of the configuration matrix, we refer to our manuscript: Aben et al., 2018, doi.org/10.1101/293993.
The modified RV coefficient was proposed for high-dimensional data, as the regular RV coefficient would result in values close to one even for orthogonal data. We recommend always using the modified RV coefficient.
Value
A list of n configuration matrices, where n is the number of datasets.
Examples
set.seed(2)
n = 100
p = 100
x1 = matrix(rnorm(n*p), n, p)
x2 = x1 + matrix(rnorm(n*p), n, p)
x3 = x2 + matrix(rnorm(n*p), n, p)
data = list(x1=x1, x2=x2, x3=x3)
config_matrices = compute.config.matrices(data)
cors = rv.cor.matrix(config_matrices)
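The effect of centering described in the Details can be illustrated with a small base-R sketch. It does not call iTOP: rv() and kernel_center() are illustrative helpers written for this example, and the shift of 10 and the dimensions are arbitrary choices. Kernel centering is taken here to be H S H with H = I - (1/n) 11', which is an assumption about the usual definition rather than a copy of the package internals.

```r
# Base-R sketch; rv() and kernel_center() are illustrative, not iTOP functions.
set.seed(2)
n <- 100
p <- 5

# Two independent datasets whose values are shifted away from zero.
x <- matrix(rnorm(n * p) + 10, n, p)
y <- matrix(rnorm(n * p) + 10, n, p)

# RV coefficient between two symmetric configuration matrices:
# tr(S1 S2) / sqrt(tr(S1 S1) * tr(S2 S2)).
rv <- function(S1, S2) sum(S1 * S2) / sqrt(sum(S1 * S1) * sum(S2 * S2))

# Kernel centering: H S H with H = I - (1/n) 11'.
kernel_center <- function(S) {
  H <- diag(nrow(S)) - 1 / nrow(S)
  H %*% S %*% H
}

# Inner product configuration matrices (n x n).
Sx <- x %*% t(x)
Sy <- y %*% t(y)

rv_raw <- rv(Sx, Sy)                                      # no centering
rv_centered <- rv(kernel_center(Sx), kernel_center(Sy))   # with centering

print(c(raw = rv_raw, centered = rv_centered))
```

Without centering, both configuration matrices are dominated by the shared offset, so the coefficient is close to one even though x and y are independent; after centering it drops to a small value.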
Computes a configuration matrix
Description
Given a data matrix, this function computes the configuration matrix for the corresponding dataset. You typically won't need to call this function directly, but should use compute.config.matrices() instead, as it will make determining partial RV coefficients, p-values and confidence intervals easier later on.
Usage
compute.config.matrix(x, similarity_fun = inner.product, center = TRUE,
mod.rv = TRUE)
Arguments
x |
Data matrix. |
similarity_fun |
A function pointer to the similarity function to be used (default=inner.product). |
center |
A boolean indicating whether centering should be used (default=TRUE). |
mod.rv |
A boolean indicating whether the modified RV coefficient should be used (default=TRUE). |
Value
A configuration matrix.
Examples
set.seed(2)
n = 100
p = 100
x1 = matrix(rnorm(n*p), n, p)
x2 = x1 + matrix(rnorm(n*p), n, p)
S1 = compute.config.matrix(x1)
S2 = compute.config.matrix(x2)
rv.coef(S1, S2)
Inner product similarity.
Description
Computes the inner product between x and y.
Usage
inner.product(x, y)
Arguments
x |
A vector of numbers. |
y |
A vector of numbers. |
Value
The inner product similarity between x and y.
Examples
set.seed(2)
n = 100
x = rnorm(n)
y = rnorm(n)
inner.product(x, y)
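For reference, the inner product of two vectors is the sum of their elementwise products, and applying it to every pair of samples (rows) of a data matrix gives the matrix X X'. The following base-R sketch shows both; it does not call iTOP, and the variable names are chosen for this example only.

```r
# Base-R sketch of inner product similarity; no iTOP functions are used.
set.seed(2)
x <- rnorm(5)
y <- rnorm(5)

# Inner product of two vectors.
ip <- sum(x * y)

# Pairwise inner products between the rows (samples) of a data matrix
# form an n x n similarity matrix, equal to X %*% t(X).
X <- matrix(rnorm(12), nrow = 4, ncol = 3)
S <- tcrossprod(X)

# Entry (1, 2) is the inner product between sample 1 and sample 2.
s12 <- sum(X[1, ] * X[2, ])
print(c(ip = ip, S12 = S[1, 2], s12 = s12))
```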
Intersect samples between datasets.
Description
In order to make all datasets comparable, we have to make sure they describe the same set of samples. This function takes a list of datasets (i.e. data matrices), takes the intersect of all rownames, and returns a list of datasets with only those samples.
Usage
intersect.samples(data)
Arguments
data |
A list of data matrices. The data matrices need to have rownames. |
Value
A list of data matrices, all with the same set of samples.
Examples
set.seed(2)
n = 100
p = 100
x1 = matrix(rnorm(n*p), n, p)
x2 = matrix(rnorm(n*p), n, p)
rownames(x1) = rownames(x2) = paste0("X",1:n)
data = list(x1=x1[1:90,], x2=x2[10:100,])
data = intersect.samples(data)
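The intersection step can be sketched in base R as follows. This is a minimal illustration of the idea (take the row names common to all matrices, then subset each matrix to them), not the package's own implementation; the small matrices and the names shared and data_shared are made up for the example.

```r
# Base-R sketch of intersecting samples across datasets; not the iTOP code.
set.seed(2)
x1 <- matrix(rnorm(20), 5, 4, dimnames = list(paste0("X", 1:5), NULL))
x2 <- matrix(rnorm(20), 5, 4, dimnames = list(paste0("X", 3:7), NULL))
data <- list(x1 = x1, x2 = x2)

# Row names present in every dataset.
shared <- Reduce(intersect, lapply(data, rownames))

# Keep only the shared samples, in the same order for every dataset.
data_shared <- lapply(data, function(m) m[shared, , drop = FALSE])

print(shared)
print(sapply(data_shared, nrow))
```

Subsetting by name rather than by position also puts the samples in the same order in every matrix, which matters when the resulting configuration matrices are compared entry by entry.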
Jaccard similarity.
Description
Computes the Jaccard similarity between x and y. When both x and y only contain zeroes, the Jaccard similarity is not defined. This function returns zero for that specific case.
Usage
jaccard(x, y)
Arguments
x |
A vector of zeroes and ones. |
y |
A vector of zeroes and ones. |
Value
The Jaccard similarity between x and y.
Examples
set.seed(2)
n = 100
x = rbinom(n, 1, 0.5)
y = rbinom(n, 1, 0.5)
jaccard(x, y)
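The Jaccard similarity of two binary vectors is the number of positions where both are one, divided by the number of positions where at least one is one. A base-R sketch, including the all-zero convention described above, could look like this; jaccard_sketch() is a hypothetical name used to avoid confusion with the package function.

```r
# Base-R sketch of Jaccard similarity; jaccard_sketch() is a hypothetical helper.
jaccard_sketch <- function(x, y) {
  union_size <- sum(x | y)          # positions where x or y is one
  if (union_size == 0) return(0)    # undefined case: return zero by convention
  sum(x & y) / union_size           # shared ones divided by union size
}

j_mixed <- jaccard_sketch(c(1, 1, 0, 0), c(1, 0, 1, 0))  # one shared, three in union
j_zero  <- jaccard_sketch(c(0, 0, 0), c(0, 0, 0))        # all-zero case
j_same  <- jaccard_sketch(c(1, 0, 1), c(1, 0, 1))        # identical vectors

print(c(mixed = j_mixed, zero = j_zero, same = j_same))
```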
Performing a permutation
Description
Helper function for run.permutations(). It's unlikely you'll ever need to run this function directly.
Usage
permute.config.matrices(config_matrices)
Arguments
config_matrices |
The result from compute.config.matrices(). |
Value
An n x n matrix of RV coefficients for the permuted data, where n is the number of datasets.
Process a custom configuration matrix.
Description
This function can be used to process a custom-made configuration matrix (i.e. similarity matrix) for use with the RV coefficient. The function can perform two tasks: centering and preparation for the modified RV coefficient, both of which we will briefly explain here.
Usage
process.custom.config.matrix(S, center = TRUE, mod.rv = TRUE)
Arguments
S |
A configuration matrix. |
center |
Should the configuration matrix be centered using kernel centering? |
mod.rv |
Should the configuration matrix be prepared for the modified RV coefficient? |
Details
The RV coefficient often results in values very close to one when neither dataset is centered around zero, even for orthogonal data. For inner product similarity and Jaccard similarity, we recommend using centering. However, for some other similarity measures, centering may not be beneficial (for example, because the measure itself is already centered, as is the case for Pearson correlation). For more information on centering of binary (and other non-continuous) data, for which we used kernel centering of the configuration matrix, we refer to our manuscript: Aben et al., 2018, doi.org/10.1101/293993.
The modified RV coefficient was proposed for high-dimensional data, as the regular RV coefficient would result in values close to one even for orthogonal data. We recommend always using the modified RV coefficient.
Value
The processed configuration matrix.
Examples
set.seed(2)
n = 100
p = 100
x = matrix(rnorm(n*p)+10, n, p)
S = x%*%t(x)
S_dash = process.custom.config.matrix(S, center=TRUE, mod.rv=TRUE)
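The high-dimensional behaviour mentioned in the Details can be illustrated in base R. The sketch below assumes one common formulation of the modified RV coefficient, in which the diagonal of each configuration matrix is set to zero before the RV formula is applied; whether and in which order process.custom.config.matrix() combines this with centering is not shown here. The data are generated with mean zero, so no centering is applied, and rv() and zero_diag() are illustrative helpers rather than iTOP functions.

```r
# Base-R sketch; rv() and zero_diag() are illustrative, not iTOP functions.
set.seed(2)
n <- 20     # few samples
p <- 1000   # many features

# Two independent, mean-zero datasets.
x <- matrix(rnorm(n * p), n, p)
y <- matrix(rnorm(n * p), n, p)

# RV coefficient between two symmetric configuration matrices.
rv <- function(S1, S2) sum(S1 * S2) / sqrt(sum(S1 * S1) * sum(S2 * S2))

# Assumed modification: discard the diagonal of the configuration matrix.
zero_diag <- function(S) {
  diag(S) <- 0
  S
}

Sx <- tcrossprod(x)
Sy <- tcrossprod(y)

rv_regular  <- rv(Sx, Sy)
rv_modified <- rv(zero_diag(Sx), zero_diag(Sy))

print(c(regular = rv_regular, modified = rv_modified))
```

With many more features than samples, the diagonals of both configuration matrices are large and nearly constant, which pushes the regular coefficient toward one; removing them leaves only the off-diagonal entries, which are unrelated for independent data.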
Bootstrapping procedure
Description
Performs a bootstrapping procedure. The result from this function can be used with rv.conf.interval() to determine confidence intervals. Because bootstrapping and interval estimation are decoupled into two functions, you don't have to redo the bootstrapping for every confidence interval, which saves runtime.
Usage
run.bootstraps(config_matrices, nboots = 1000)
Arguments
config_matrices |
The result from compute.config.matrices(). |
nboots |
The number of bootstraps to perform (default=1000). |
Value
An n x n x nboots array of RV coefficients for the bootstrapped data, where n is the number of datasets.
Examples
set.seed(2)
n = 100
p = 100
x1 = matrix(rnorm(n*p), n, p)
x2 = x1 + matrix(rnorm(n*p), n, p)
x3 = x2 + matrix(rnorm(n*p), n, p)
data = list(x1=x1, x2=x2, x3=x3)
config_matrices = compute.config.matrices(data)
cors_boot = run.bootstraps(config_matrices, nboots=1000)
rv.conf.interval(cors_boot, "x1", "x3", "x2")
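The two-step design (bootstrap once, then derive as many intervals as needed) can be sketched in base R for a single pair of datasets. This is a generic percentile bootstrap over samples, given as an assumption about the general approach and not as the package's exact procedure; config(), rv() and the smaller n, p and nboots are choices made for this example.

```r
# Base-R sketch of a percentile bootstrap for the RV coefficient; not the iTOP code.
set.seed(2)
n <- 50
p <- 5
x1 <- matrix(rnorm(n * p), n, p)
x2 <- x1 + matrix(rnorm(n * p), n, p)

# Centered inner product configuration matrix.
config <- function(x) tcrossprod(scale(x, center = TRUE, scale = FALSE))

# RV coefficient between two symmetric configuration matrices.
rv <- function(S1, S2) sum(S1 * S2) / sqrt(sum(S1 * S1) * sum(S2 * S2))

# Step 1: resample the samples with replacement and recompute the coefficient.
nboots <- 200
boot_rv <- replicate(nboots, {
  idx <- sample(n, n, replace = TRUE)
  rv(config(x1[idx, , drop = FALSE]), config(x2[idx, , drop = FALSE]))
})

# Step 2: reuse the same bootstrap values for several confidence levels.
ci_95 <- quantile(boot_rv, c(0.025, 0.975))
ci_80 <- quantile(boot_rv, c(0.10, 0.90))

print(ci_95)
print(ci_80)
```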
Permutations for significance testing
Description
Performs permutations for significance testing. The result from this function can be used with rv.pval() to determine a p-value. Because the permutations and the p-value computation are decoupled into two functions, you don't have to redo the permutations for every p-value, which saves runtime.
Usage
run.permutations(config_matrices, nperm = 1000)
Arguments
config_matrices |
The result from compute.config.matrices(). |
nperm |
The number of permutations to perform (default=1000). |
Value
An n x n x nperm array of RV coefficients for the permuted data, where n is the number of datasets.
Examples
set.seed(2)
n = 100
p = 100
x1 = matrix(rnorm(n*p), n, p)
x2 = x1 + matrix(rnorm(n*p), n, p)
x3 = x2 + matrix(rnorm(n*p), n, p)
data = list(x1=x1, x2=x2, x3=x3)
config_matrices = compute.config.matrices(data)
cors = rv.cor.matrix(config_matrices)
cors_perm = run.permutations(config_matrices, nperm=1000)
rv.pval(cors, cors_perm, "x1", "x3", "x2")
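A permutation test for a single (non-partial) RV coefficient can be sketched in base R as follows. The sketch permutes the samples of one configuration matrix to break the pairing between datasets, and uses the common (1 + count) / (1 + nperm) estimate of the p-value; both are assumptions about the general technique and are not taken from the package source. config() and rv() are illustrative helpers.

```r
# Base-R sketch of a permutation test for the RV coefficient; not the iTOP code.
set.seed(2)
n <- 50
p <- 5
x1 <- matrix(rnorm(n * p), n, p)
x2 <- x1 + matrix(rnorm(n * p), n, p)

# Centered inner product configuration matrix.
config <- function(x) tcrossprod(scale(x, center = TRUE, scale = FALSE))

# RV coefficient between two symmetric configuration matrices.
rv <- function(S1, S2) sum(S1 * S2) / sqrt(sum(S1 * S1) * sum(S2 * S2))

S1 <- config(x1)
S2 <- config(x2)
rv_obs <- rv(S1, S2)

# Null distribution: permute rows and columns of S2 together, so that
# sample i in S1 no longer corresponds to sample i in S2.
nperm <- 199
rv_null <- replicate(nperm, {
  perm <- sample(n)
  rv(S1, S2[perm, perm])
})

# One-sided p-value: fraction of permuted values at least as large as observed.
p_value <- (1 + sum(rv_null >= rv_obs)) / (1 + nperm)

print(c(observed = rv_obs, null_mean = mean(rv_null), p_value = p_value))
```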
Computes the RV coefficient
Description
Computes the RV coefficient between dataset 1 and dataset 2. You typically won't need to call this function directly, but should use rv.cor.matrix() instead, as it will make determining partial RV coefficients, p-values and confidence intervals easier later on.
Usage
rv.coef(S1, S2)
Arguments
S1 |
Configuration matrix corresponding to dataset 1. |
S2 |
Configuration matrix corresponding to dataset 2. |
Value
The RV coefficient between dataset 1 and dataset 2.
Examples
set.seed(2)
n = 100
p = 100
x1 = matrix(rnorm(n*p), n, p)
x2 = x1 + matrix(rnorm(n*p), n, p)
S1 = compute.config.matrix(x1)
S2 = compute.config.matrix(x2)
rv.coef(S1, S2)
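For orientation, the RV coefficient of Robert & Escoufier can be written as tr(S1 S2) / sqrt(tr(S1 S1) tr(S2 S2)). The base-R sketch below implements that formula directly and checks a few of its basic properties; it is an illustration of the formula, not a copy of rv.coef(), and config() and rv() are names chosen for this example.

```r
# Base-R sketch of the RV coefficient formula; not the iTOP implementation.
set.seed(2)
n <- 30
p <- 4
x1 <- matrix(rnorm(n * p), n, p)
x2 <- x1 + matrix(rnorm(n * p), n, p)

# Centered inner product configuration matrix.
config <- function(x) tcrossprod(scale(x, center = TRUE, scale = FALSE))

# For symmetric matrices, tr(S1 %*% S2) equals sum(S1 * S2).
rv <- function(S1, S2) sum(S1 * S2) / sqrt(sum(S1 * S1) * sum(S2 * S2))

S1 <- config(x1)
S2 <- config(x2)

rv_12     <- rv(S1, S2)       # coefficient between the two datasets
rv_21     <- rv(S2, S1)       # symmetric in its arguments
rv_self   <- rv(S1, S1)       # a dataset compared with itself gives one
rv_scaled <- rv(S1, 5 * S2)   # unaffected by rescaling a configuration matrix

print(c(rv_12 = rv_12, rv_21 = rv_21, rv_self = rv_self, rv_scaled = rv_scaled))
```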
Determining a confidence interval for the (partial) RV coefficient
Description
This function uses a bootstrapping procedure to determine a confidence interval for the RV coefficient RV(a, b) or the partial RV coefficient RV(a, b | set).
Usage
rv.conf.interval(cors_boot, a, b, set = NULL, conf = 0.95)
Arguments
cors_boot |
The result from run.bootstraps(). |
a |
Either an index or a string to identify dataset a. |
b |
Either an index or a string to identify dataset b. |
set |
Optional parameter to define the datasets that need to be partialized for. If set consists of one dataset, then provide an index or a string to identify set. If set consists of multiple datasets, then provide a vector of indices or a vector of strings. |
conf |
The size of the confidence interval (default=0.95). |
Value
The confidence interval.
Examples
set.seed(2)
n = 100
p = 100
x1 = matrix(rnorm(n*p), n, p)
x2 = x1 + matrix(rnorm(n*p), n, p)
x3 = x2 + matrix(rnorm(n*p), n, p)
data = list(x1=x1, x2=x2, x3=x3)
config_matrices = compute.config.matrices(data)
cors_boot = run.bootstraps(config_matrices, nboots=1000)
rv.conf.interval(cors_boot, "x1", "x3", "x2")
A correlation matrix of RV coefficients
Description
Given a list of n configuration matrices (corresponding to n datasets), this function computes an n x n matrix of pairwise RV coefficients.
Usage
rv.cor.matrix(config_matrices)
Arguments
config_matrices |
The result from compute.config.matrices(). |
Value
An n x n matrix of pairwise RV coefficients, where n is the number of datasets.
Examples
set.seed(2)
n = 100
p = 100
x1 = matrix(rnorm(n*p), n, p)
x2 = x1 + matrix(rnorm(n*p), n, p)
x3 = x2 + matrix(rnorm(n*p), n, p)
data = list(x1=x1, x2=x2, x3=x3)
config_matrices = compute.config.matrices(data)
cors = rv.cor.matrix(config_matrices)
Wrapper function to determine significance in the PC algorithm
Description
This function is a wrapper around rv.pval(), such that it can easily be used with pc() from the pcalg package. If you have trouble installing the pcalg package, have a look at our vignette 'A quick introduction to iTOP'.
Usage
rv.link.significance(a, b, set, suffStat)
Arguments
a |
Either an index or a string to identify dataset a. |
b |
Either an index or a string to identify dataset b. |
set |
Datasets that need to be partialized for. Set to NULL if there are none (i.e. if you're computing a regular, non-partial RV). If set consists of one dataset, then provide an index or a string to identify set. If set consists of multiple datasets, then provide a vector of indices or a vector of strings. |
suffStat |
A named list with two items: cors, which is the result from rv.cor.matrix(); and cors_perm, which is the result from run.permutations(). |
Value
The p-value.
Examples
set.seed(2)
n = 100
p = 100
x1 = matrix(rnorm(n*p), n, p)
x2 = x1 + matrix(rnorm(n*p), n, p)
x3 = x2 + matrix(rnorm(n*p), n, p)
data = list(x1=x1, x2=x2, x3=x3)
config_matrices = compute.config.matrices(data)
cors = rv.cor.matrix(config_matrices)
cors_perm = run.permutations(config_matrices, nperm=1000)
## Not run:
library(pcalg)
suffStat = list(cors=cors, cors_perm=cors_perm)
pc.fit = pc(suffStat=suffStat, indepTest=rv.link.significance, labels=names(data),
alpha=0.05, conservative=TRUE, solve.confl=TRUE)
plot(pc.fit, main="")
## End(Not run)
Determining a (partial) RV coefficient
Description
Determines the RV coefficient RV(a, b) or the partial RV coefficient RV(a, b | set).
Usage
rv.pcor(cors, a, b, set = NULL)
Arguments
cors |
The result from rv.cor.matrix(). |
a |
Either an index or a string to identify dataset a. |
b |
Either an index or a string to identify dataset b. |
set |
Optional parameter to define the datasets that need to be partialized for. If set consists of one dataset, then provide an index or a string to identify set. If set consists of multiple datasets, then provide a vector of indices or a vector of strings. |
Value
The (partial) RV coefficient.
Examples
set.seed(2)
n = 100
p = 100
x1 = matrix(rnorm(n*p), n, p)
x2 = x1 + matrix(rnorm(n*p), n, p)
x3 = x2 + matrix(rnorm(n*p), n, p)
data = list(x1=x1, x2=x2, x3=x3)
config_matrices = compute.config.matrices(data)
cors = rv.cor.matrix(config_matrices)
rv.pcor(cors, "x1", "x3", "x2")
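How a partial coefficient can be derived from a matrix of pairwise coefficients is sketched below in base R, using the standard partial correlation construction: invert the submatrix for a, b and the conditioning set, then normalise the off-diagonal entry. Whether rv.pcor() follows exactly this route is an assumption; partial_cor() is a hypothetical helper and the numbers in cors are made up for illustration.

```r
# Base-R sketch of a partial coefficient from a correlation-like matrix.
# partial_cor() is hypothetical and the values in cors are illustrative only.
ids <- c("x1", "x2", "x3")
cors <- matrix(c(1.0, 0.7, 0.5,
                 0.7, 1.0, 0.7,
                 0.5, 0.7, 1.0),
               nrow = 3, ncol = 3, dimnames = list(ids, ids))

partial_cor <- function(R, a, b, set = NULL) {
  if (is.null(set)) return(R[a, b])     # no conditioning: plain coefficient
  keep <- c(a, b, set)
  P <- solve(R[keep, keep])             # precision matrix of the subset
  -P[1, 2] / sqrt(P[1, 1] * P[2, 2])    # normalised off-diagonal entry
}

plain <- partial_cor(cors, "x1", "x3")
given_x2 <- partial_cor(cors, "x1", "x3", "x2")

# Closed form for a single conditioning variable, as a cross-check.
first_order <- (0.5 - 0.7 * 0.7) / sqrt((1 - 0.7^2) * (1 - 0.7^2))

print(c(plain = plain, given_x2 = given_x2, first_order = first_order))
```

In this made-up example the association between x1 and x3 largely disappears once x2 is accounted for, which is the pattern expected when x2 sits between x1 and x3 in a chain.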
Determining a p-value for the (partial) RV coefficient
Description
This function uses a permutation test to determine a p-value for the RV coefficient RV(a, b) or the partial RV coefficient RV(a, b | set).
Usage
rv.pval(cors, cors_perm, a, b, set = NULL)
Arguments
cors |
The result from rv.cor.matrix(). |
cors_perm |
The result from run.permutations(). |
a |
Either an index or a string to identify dataset a. |
b |
Either an index or a string to identify dataset b. |
set |
Optional parameter to define the datasets that need to be partialized for. If set consists of one dataset, then provide an index or a string to identify set. If set consists of multiple datasets, then provide a vector of indices or a vector of strings. |
Value
The p-value.
Examples
set.seed(2)
n = 100
p = 100
x1 = matrix(rnorm(n*p), n, p)
x2 = x1 + matrix(rnorm(n*p), n, p)
x3 = x2 + matrix(rnorm(n*p), n, p)
data = list(x1=x1, x2=x2, x3=x3)
config_matrices = compute.config.matrices(data)
cors = rv.cor.matrix(config_matrices)
cors_perm = run.permutations(config_matrices, nperm=1000)
rv.pval(cors, cors_perm, "x1", "x3", "x2")