| Type: | Package | 
| Title: | Multiscale Graph Correlation | 
| Version: | 2.0.2 | 
| Date: | 2020-06-20 | 
| Maintainer: | Eric Bridgeford <ericwb95@gmail.com> | 
| Description: | Multiscale Graph Correlation (MGC) is a framework developed by Vogelstein et al. (2019) <doi:10.7554/eLife.41690> that extends global correlation procedures to be multiscale; consequently, MGC tests typically require far fewer samples than existing methods for a wide variety of dependence structures and dimensionalities, while maintaining computational efficiency. Moreover, MGC provides a simple and elegant multiscale characterization of the potentially complex latent geometry underlying the relationship. | 
| Depends: | R (≥ 3.4.0) | 
| Imports: | stats, MASS, abind, boot, energy, raster | 
| URL: | https://github.com/neurodata/r-mgc | 
| Suggests: | testthat (≥ 2.1.0), ggplot2, reshape2, knitr, rmarkdown | 
| License: | GPL-2 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.0.2 | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | yes | 
| Packaged: | 2020-06-22 22:51:13 UTC; eric | 
| Author: | Eric Bridgeford [aut, cre], Censheng Shen [aut], Shangsi Wang [aut], Joshua Vogelstein [ths] | 
| Repository: | CRAN | 
| Date/Publication: | 2020-06-23 12:50:18 UTC | 
Connected Components Labelling – Unique Patch Labelling
Description
ConnCompLabel is a one-pass implementation of connected components
labelling. Here it is applied to identify disjoint patches within a
distribution. 
 
 The raster matrix can be a raster of class 'asc'
(adehabitat package), 'RasterLayer' (raster package) or
'SpatialGridDataFrame' (sp package).
Usage
ConnCompLabel(mat)
Arguments
| mat | is a binary matrix of data with 0 representing background and 1 representing environment of interest. NA values are acceptable. The matrix can be a raster of class 'asc' (this & adehabitat package), 'RasterLayer' (raster package) or 'SpatialGridDataFrame' (sp package) | 
Value
A matrix of the same dim and class as mat in which unique
components (individual patches) are numbered 1:n, with 0 remaining the
background value.
Author(s)
Jeremy VanDerWal jjvanderwal@gmail.com
References
Chang, F., C.-J. Chen, and C.-J. Lu. 2004. A linear-time component-labeling algorithm using contour tracing technique. Comput. Vis. Image Underst. 93:206-220.
Examples
#define a simple binary matrix
tmat = { matrix(c( 0,0,0,1,0,0,1,1,0,1,
                   0,0,1,0,1,0,0,0,0,0,
                   0,1,NA,1,0,1,0,0,0,1,
                   1,0,1,1,1,0,1,0,0,1,
                   0,1,0,1,0,1,0,0,0,1,
                   0,0,1,0,1,0,0,1,1,0,
                   1,0,0,1,0,0,1,0,0,1,
                   0,1,0,0,0,1,0,0,0,1,
                   0,0,1,1,1,0,0,0,0,1,
                   1,1,1,0,0,0,0,0,0,1),nr=10,byrow=TRUE) }
#do the connected component labelling
ccl.mat = ConnCompLabel(tmat)
ccl.mat
image(t(ccl.mat[10:1,]),col=c('grey',rainbow(length(unique(ccl.mat))-1)))
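A minimal base R sketch of patch labelling, for illustration only. It uses a breadth-first flood fill rather than the contour-tracing algorithm of Chang et al. (2004) cited above. The helper name label_patches, the use of 8-connectivity, and leaving NA cells as NA are all assumptions, not documented behaviour of ConnCompLabel.

```r
# Hypothetical helper: label 8-connected patches of 1s in a binary matrix.
# Assumption: NA cells are never part of a patch and stay NA in the output.
label_patches <- function(mat) {
  nr <- nrow(mat); nc <- ncol(mat)
  out <- matrix(0L, nr, nc)
  out[is.na(mat)] <- NA_integer_
  fg <- !is.na(mat) & mat == 1            # foreground cells
  next_label <- 0L
  for (start in which(fg)) {
    if (out[start] != 0L) next            # already labelled
    next_label <- next_label + 1L
    out[start] <- next_label
    queue <- start
    while (length(queue) > 0) {
      cur <- queue[1]; queue <- queue[-1]
      row_i <- (cur - 1L) %% nr + 1L      # linear index -> row
      col_i <- (cur - 1L) %/% nr + 1L     # linear index -> column
      for (dr in -1:1) for (dc in -1:1) {
        rr <- row_i + dr; cc <- col_i + dc
        if (rr < 1 || rr > nr || cc < 1 || cc > nc) next
        if (fg[rr, cc] && out[rr, cc] == 0L) {
          out[rr, cc] <- next_label
          queue <- c(queue, (cc - 1L) * nr + rr)
        }
      }
    }
  }
  out
}

m <- matrix(c(1, 0, 0,
              0, 0, 1,
              0, NA, 1), nrow = 3, byrow = TRUE)
labelled <- label_patches(m)
```

Background cells keep the value 0 and each patch receives one integer label, which matches the Value section above in shape but not necessarily in label order.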
An auxiliary function that properly transforms the distance matrix X
Description
An auxiliary function that properly transforms the distance matrix X
Usage
DistCentering(X, option, optionRk)
Arguments
| X | is a symmetric distance matrix | 
| option | is a string that specifies which global correlation to build upon, including 'mgc', 'dcor', 'mantel', and 'rank' | 
| optionRk | is a string that specifies whether ranking within column is computed or not. | 
Value
A list containing the following:
| A | is the centered distance matrix | 
| RX | is the column rank matrices of X. | 
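As an illustration of what centering a distance matrix can look like, the sketch below applies the classical double centering associated with distance correlation. This is an assumption about one of the options only: the exact transforms DistCentering applies for 'mgc', 'dcor', 'mantel', and 'rank' are not stated here, and double_center is a hypothetical name.

```r
# Hypothetical helper: classical double centering of a distance matrix.
# A[i, j] = D[i, j] - rowmean_i - colmean_j + grandmean
double_center <- function(D) {
  centered <- sweep(D, 1, rowMeans(D))          # subtract row means
  centered <- sweep(centered, 2, colMeans(D))   # subtract column means
  centered + mean(D)                            # add back the grand mean
}

X <- c(0, 1, 3, 7)
D <- as.matrix(dist(X))
A <- double_center(D)
```

After this transform every row and every column of A sums to zero.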
An auxiliary function that sorts the entries within each column by ascending order: For ties, the minimum ranking is used, e.g. if there are repeating distance entries, the order is like 1,2,3,3,4,..,n-1.
Description
An auxiliary function that sorts the entries within each column by ascending order: For ties, the minimum ranking is used, e.g. if there are repeating distance entries, the order is like 1,2,3,3,4,..,n-1.
Usage
DistRanks(dis)
Arguments
| dis | is a symmetric distance matrix. | 
Value
disRank is the column rank matrix of dis.
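The tie pattern in the example above (1,2,3,3,4) can be reproduced in base R with a dense ranking, as sketched below. Whether DistRanks computes its ranks this way is an assumption; dense_col_ranks is a hypothetical name.

```r
# Hypothetical helper: rank each column in ascending order, giving tied
# entries the same rank and leaving no gap after a tie (dense ranking).
dense_col_ranks <- function(dis) {
  apply(dis, 2, function(v) match(v, sort(unique(v))))
}

col <- cbind(c(0, 1, 2, 2, 5))   # one column with a repeated distance
ranks <- dense_col_ranks(col)
```

Note that rank(v, ties.method = "min") would instead give 1,2,3,3,5 for the same column, so the two conventions differ after the first tie.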
An auxiliary function that computes all local correlations simultaneously in O(n^2).
Description
An auxiliary function that computes all local correlations simultaneously in O(n^2).
Usage
LocalCov(A, B, RX, RY)
Arguments
| A | is a properly transformed distance matrix | 
| B | is the second distance matrix properly transformed | 
| RX | is the column-ranking matrix of A | 
| RY | is the column-ranking matrix of B. | 
Value
covXY is all local covariances computed iteratively.
An auxiliary function that finds the smoothed maximum within the significant region R: if the area of R is too small, it returns the last local correlation; otherwise it takes the maximum within R.
Description
An auxiliary function that finds the smoothed maximum within the significant region R: if the area of R is too small, it returns the last local correlation; otherwise it takes the maximum within R.
Usage
Smoothing(localCorr, m, n, R)
Arguments
| localCorr | is all local correlations | 
| m | is the number of rows of localCorr | 
| n | is the number of columns of localCorr | 
| R | is a binary matrix of size m by n indicating the significant region. | 
Value
A list containing the following:
| stat | is the sample MGC statistic within  | 
| optimalScale | the estimated optimal scale as a list. | 
Author(s)
C. Shen
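The rule described above can be sketched in base R as follows. The function name smooth_max and the cut-off argument min_area are hypothetical; the documentation does not say how small "too small" is, so the threshold is left to the caller here.

```r
# Hypothetical sketch of the smoothing rule.
# Assumption: "the last local correlation" is the entry localCorr[m, n].
smooth_max <- function(localCorr, R, min_area) {
  m <- nrow(localCorr); n <- ncol(localCorr)
  stat <- localCorr[m, n]                    # fall back to the last entry
  scale <- list(x = m, y = n)
  if (sum(R) >= min_area) {
    inside <- which(R == 1, arr.ind = TRUE)  # (row, col) pairs inside R
    best <- inside[which.max(localCorr[inside]), ]
    stat <- localCorr[best[1], best[2]]
    scale <- list(x = best[[1]], y = best[[2]])
  }
  list(stat = stat, optimalScale = scale)
}

lc <- matrix(c(0.1, 0.9, 0.2, 0.3), 2, 2)
R  <- matrix(c(0, 1, 0, 0), 2, 2)
big_enough <- smooth_max(lc, R, min_area = 1)
too_small  <- smooth_max(lc, R, min_area = 2)
```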
An auxiliary function that finds a region of significance in the local correlation map by thresholding.
Description
An auxiliary function that finds a region of significance in the local correlation map by thresholding.
Usage
Thresholding(localCorr, m, n, sz)
Arguments
| localCorr | is all local correlations | 
| m | is the number of rows of localCorr | 
| n | is the number of columns of localCorr | 
| sz | is the sample size of original data (which may not equal m or n in case of repeating data). | 
Value
R is a binary matrix of size m by n, with 1's indicating the significant region.
Author(s)
Eric Bridgeford and C. Shen
Discriminability Mean Normalized Rank
Description
Discriminability Mean Normalized Rank
Usage
discr.mnr(rdf)
Arguments
| rdf | the reliability densities. | 
Value
the mnr.
Reliability Density Function
Description
A function for computing the reliability density function of a dataset.
Usage
discr.rdf(X, ids)
Arguments
| X | 
 | 
| ids | 
 | 
Value
[n] vector of the reliability per sample.
Author(s)
Eric Bridgeford
Discriminability Cross Simulation
Description
A function to simulate data with the same mean that spreads as class id increases.
Usage
discr.sims.cross(
  n,
  d,
  K,
  signal.scale = 10,
  non.scale = 1,
  mean.scale = 0,
  rotate = FALSE,
  class.equal = TRUE,
  ind = FALSE
)
Arguments
| n | the number of samples. | 
| d | the number of dimensions. | 
| K | the number of classes in the dataset. | 
| signal.scale | the scaling for the signal dimension. Defaults to 10. | 
| non.scale | the scaling for the non-signal dimensions. Defaults to 1. | 
| mean.scale | whether the magnitude of the difference in the means between the two classes.
If a mean scale is requested,  | 
| rotate | whether to apply a random rotation. Defaults to FALSE. | 
| class.equal | whether the number of samples/class should be equal, with each
class having a prior of 1/K, or unequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to TRUE. | 
| ind | whether to sample x and y independently. Defaults to FALSE. | 
Author(s)
Eric Bridgeford
Examples
library(mgc)
sim <- discr.sims.cross(100, 3, 2)
Discriminability Exponential Simulation
Description
A function to simulate multi-class data with an Exponential class-mean trend.
Usage
discr.sims.exp(
  n,
  d,
  K,
  signal.scale = 1,
  signal.lshift = 1,
  non.scale = 1,
  rotate = FALSE,
  class.equal = TRUE,
  ind = FALSE
)
Arguments
| n | the number of samples. | 
| d | the number of dimensions. The first dimension will be the signal dimension; the remaining dimensions are noise. | 
| K | the number of classes in the dataset. | 
| signal.scale | the scaling for the signal dimension. Defaults to 1. | 
| signal.lshift | the location shift for the signal dimension between the classes. Defaults to 1. | 
| non.scale | the scaling for the non-signal dimensions. Defaults to 1. | 
| rotate | whether to apply a random rotation. Defaults to FALSE. | 
| class.equal | whether the number of samples/class should be equal, with each
class having a prior of 1/K, or unequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to TRUE. | 
| ind | whether to sample x and y independently. Defaults to FALSE. | 
Author(s)
Eric Bridgeford
Discriminability Spread Simulation
Description
A function to simulate data with the same mean that spreads as class id increases.
Usage
discr.sims.fat_tails(
  n,
  d,
  K,
  signal.scale = 1,
  rotate = FALSE,
  class.equal = TRUE,
  ind = FALSE
)
Arguments
| n | the number of samples. | 
| d | the number of dimensions. | 
| K | the number of classes in the dataset. | 
| signal.scale | the scaling for the signal dimension. Defaults to 1. | 
| rotate | whether to apply a random rotation. Defaults to FALSE. | 
| class.equal | whether the number of samples/class should be equal, with each
class having a prior of 1/K, or unequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to TRUE. | 
| ind | whether to sample x and y independently. Defaults to FALSE. | 
Author(s)
Eric Bridgeford
Examples
library(mgc)
sim <- discr.sims.fat_tails(100, 3, 2)
Discriminability Linear Simulation
Description
A function to simulate multi-class data with a linear class-mean trend. The signal dimension is the dimension carrying all of the between-class difference, and the non-signal dimensions are noise.
Usage
discr.sims.linear(
  n,
  d,
  K,
  signal.scale = 1,
  signal.lshift = 1,
  non.scale = 1,
  rotate = FALSE,
  class.equal = TRUE,
  ind = FALSE
)
Arguments
| n | the number of samples. | 
| d | the number of dimensions. The first dimension will be the signal dimension; the remaining dimensions are noise. | 
| K | the number of classes in the dataset. | 
| signal.scale | the scaling for the signal dimension. Defaults to 1. | 
| signal.lshift | the location shift for the signal dimension between the classes. Defaults to 1. | 
| non.scale | the scaling for the non-signal dimensions. Defaults to 1. | 
| rotate | whether to apply a random rotation. Defaults to FALSE. | 
| class.equal | whether the number of samples/class should be equal, with each
class having a prior of 1/K, or unequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to TRUE. | 
| ind | whether to sample x and y independently. Defaults to FALSE. | 
Author(s)
Eric Bridgeford
Discriminability Radial Simulation
Description
A function to simulate data with the same mean with radial symmetry as class id increases.
Usage
discr.sims.radial(
  n,
  d,
  K,
  er.scale = 0.1,
  r = 1,
  class.equal = TRUE,
  ind = FALSE
)
Arguments
| n | the number of samples. | 
| d | the number of dimensions. | 
| K | the number of classes in the dataset. | 
| er.scale | the scaling for the error of the samples. Defaults to 0.1. | 
| r | the radial spacing between each class. Defaults to 1. | 
| class.equal | whether the number of samples/class should be equal, with each
class having a prior of 1/K, or unequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to TRUE. | 
| ind | whether to sample x and y independently. Defaults to FALSE. | 
Author(s)
Eric Bridgeford
Examples
library(mgc)
sim <- discr.sims.radial(100, 3, 2)
Discriminability Statistic
Description
A function for computing the discriminability from a distance matrix and a set of associated labels.
Usage
discr.stat(
  X,
  Y,
  is.dist = FALSE,
  dist.xfm = mgc.distance,
  dist.params = list(method = "euclidean"),
  dist.return = NULL,
  remove.isolates = TRUE
)
Arguments
| X | is interpreted as: 
 | 
| Y | 
 | 
| is.dist | a boolean indicating whether your  | 
| dist.xfm | if  | 
| dist.params | a list of trailing arguments to pass to the distance function specified in  | 
| dist.return | the return argument for the specified  
 | 
| remove.isolates | remove isolated samples from the dataset. Isolated samples are samples with only
one instance of their class appearing in the  | 
Value
A list containing the following:
| discr | the discriminability statistic. | 
| rdf | the rdfs for each sample. | 
Details
For more details see the help vignette:
vignette("discriminability", package = "mgc")
Author(s)
Eric Bridgeford
References
Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).
Examples
sim <- discr.sims.linear(100, 10, K=2)
X <- sim$X; Y <- sim$Y
discr.stat(X, Y)$discr
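To make the statistic concrete, the following base R sketch computes a discriminability-style score directly from a distance matrix and labels. It assumes the usual reading of discriminability: for each sample and each other sample of the same class, the fraction of different-class samples that lie farther away, with ties counted as half. The function name discr_sketch and the tie handling are assumptions, and the code is not the implementation behind discr.stat.

```r
# Hypothetical sketch: mean, over all same-class pairs (i, j), of the
# fraction of other-class samples that are farther from i than j is.
discr_sketch <- function(D, Y) {
  rdf <- c()
  for (i in seq_along(Y)) {
    same  <- setdiff(which(Y == Y[i]), i)   # same class, excluding i itself
    other <- which(Y != Y[i])               # all samples of other classes
    for (j in same) {
      closer <- sum(D[i, other] < D[i, j]) +
        0.5 * sum(D[i, other] == D[i, j])   # assumption: ties count as half
      rdf <- c(rdf, 1 - closer / length(other))
    }
  }
  mean(rdf)
}

X <- c(0, 0.1, 5, 5.1)          # two tight, well separated classes
Y <- c(1, 1, 2, 2)
D <- as.matrix(dist(X))
score <- discr_sketch(D, Y)
```

With two well separated classes every same-class neighbour is closer than every other-class sample, so the score reaches its maximum of 1.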
Discriminability One Sample Permutation Test
Description
A function that performs a one-sample test for whether the discriminability differs from random chance.
Usage
discr.test.one_sample(
  X,
  Y,
  is.dist = FALSE,
  dist.xfm = mgc.distance,
  dist.params = list(method = "euclidean"),
  dist.return = NULL,
  remove.isolates = TRUE,
  nperm = 500,
  no_cores = 1
)
Arguments
| X | is interpreted as: 
 | 
| Y | 
 | 
| is.dist | a boolean indicating whether your  | 
| dist.xfm | if  | 
| dist.params | a list of trailing arguments to pass to the distance function specified in  | 
| dist.return | the return argument for the specified  
 | 
| remove.isolates | remove isolated samples from the dataset. Isolated samples are samples with only
one instance of their class appearing in the  | 
| nperm | the number of permutations to perform. Defaults to 500. | 
| no_cores | the number of cores to use for permutation test. Defaults to 1. | 
Value
A list containing the following:
| stat | the discriminability of the data. | 
| null | the discriminability scores under the null, computed via permutation. | 
| p.value | the pvalue associated with the permutation test. | 
Details
Performs a test of whether an observed discriminability is significantly different from chance, as described in Bridgeford et al. (2019).
With \hat D_X the sample discriminability of X, the null hypothesis is
H_0: D_X = D_0
and the alternative is
H_A: D_X > D_0,
where D_0 is the discriminability that would be observed by random chance.
Author(s)
Eric Bridgeford
References
Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).
Examples
## Not run: 
require(mgc)
n = 100; d=5
# simulation with a large difference between the classes
# meaning they are more discriminable
sim <- discr.sims.linear(n=n, d=d, K=2, signal.lshift=10)
X <- sim$X; Y <- sim$Y
# p-value is small
discr.test.one_sample(X, Y)$p.value
## End(Not run)
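The general shape of a label-permutation test can be sketched in base R as below. The helper perm_test, the toy statistic gap, and the add-one correction in the p-value are assumptions chosen for illustration; they are not taken from discr.test.one_sample.

```r
# Hypothetical sketch of a one-sided permutation test.
# stat_fn(X, Y) returns a scalar; larger values mean a stronger effect.
perm_test <- function(stat_fn, X, Y, nperm = 500) {
  observed <- stat_fn(X, Y)
  # Null distribution: recompute the statistic with shuffled labels
  null <- replicate(nperm, stat_fn(X, sample(Y)))
  # Assumption: add-one correction so the p-value is never exactly zero
  p.value <- (sum(null >= observed) + 1) / (nperm + 1)
  list(stat = observed, null = null, p.value = p.value)
}

set.seed(1)
X <- c(rnorm(20, mean = 0), rnorm(20, mean = 3))
Y <- rep(1:2, each = 20)
gap <- function(X, Y) abs(mean(X[Y == 2]) - mean(X[Y == 1]))  # toy statistic
res <- perm_test(gap, X, Y, nperm = 200)
```

The returned list mirrors the stat, null, and p.value fields listed in the Value section above.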
Discriminability Two Sample Permutation Test
Description
A function that takes two sets of paired data and tests whether the first set is more, less, or unequally discriminable compared with the second.
Usage
discr.test.two_sample(
  X1,
  X2,
  Y,
  dist.xfm = mgc.distance,
  dist.params = list(method = "euclidian"),
  dist.return = NULL,
  remove.isolates = TRUE,
  nperm = 500,
  no_cores = 1,
  alt = "greater"
)
Arguments
| X1 | is interpreted as a  | 
| X2 | is interpreted as a  | 
| Y | 
 | 
| dist.xfm | if  | 
| dist.params | a list of trailing arguments to pass to the distance function specified in  | 
| dist.return | the return argument for the specified  
 | 
| remove.isolates | remove isolated samples from the dataset. Isolated samples are samples with only
one instance of their class appearing in the  | 
| nperm | the number of permutations for permutation test. Defaults to 500. | 
| no_cores | the number of cores to use for the permutations. Defaults to 1. | 
| alt | the alternative hypothesis. Can be that first dataset is more discriminable ( | 
Value
A list containing the following:
| stat | the observed test statistic, which is the difference in discriminability of X1 vs X2. | 
| discr | the discriminabilities for each of the two data sets, as a list. | 
| null | the null distribution of the test statistic, computed via permutation. | 
| p.value | The p-value associated with the test. | 
| alt | The alternative hypothesis for the test. | 
Details
Performs a two-sample test for whether the discriminability of one dataset differs from that of another, as described in Bridgeford et al. (2019). With \hat D_{X_1} the sample discriminability of one approach, and \hat D_{X_2} the sample discriminability of another approach, the null hypothesis is
H_0: D_{X_1} = D_{X_2}
and the alternative is
H_A: D_{X_1} > D_{X_2}.
Also implemented are tests of < and \neq.
Author(s)
Eric Bridgeford
References
Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).
Examples
## Not run: 
require(mgc)
require(MASS)
n = 100; d=5
# generate two subjects truths; true difference btwn
# subject 1 (column 1) and subject 2 (column 2)
mus <- cbind(c(0, 0), c(1, 1))
Sigma <- diag(2)  # dimensions are independent
# first dataset X1 contains less noise than X2
X1 <- do.call(rbind, lapply(1:dim(mus)[2],
  function(k) {mvrnorm(n=50, mus[,k], 0.5*Sigma)}))
X2 <- do.call(rbind, lapply(1:dim(mus)[2],
  function(k) {mvrnorm(n=50, mus[,k], 2*Sigma)}))
Y <- do.call(c, lapply(1:2, function(i) rep(i, 50)))
# X1 should be more discriminable, as less noise
discr.test.two_sample(X1, X2, Y, alt="greater")$p.value  # p-value is small
## End(Not run)
Discriminability Utility Validator
Description
A script that validates that data inputs are correct, and returns a distance matrix and an ids vector.
Usage
discr.validator(
  X,
  Y,
  is.dist = FALSE,
  dist.xfm = mgc.distance,
  dist.params = list(method = "euclidean"),
  dist.return = NULL,
  remove.isolates = TRUE
)
Arguments
| X | is interpreted as: 
 | 
| Y | is interpreted as: 
 | 
| is.dist | a boolean indicating whether your  | 
| dist.xfm | if  | 
| dist.params | a list of trailing arguments to pass to the distance function specified in  | 
| dist.return | the return argument for the specified  
 | 
| remove.isolates | whether to remove isolated samples, or samples with only a single instance in the  | 
Value
A list containing the following:
| DX | The X distance matrix, as a  | 
| Y | The sample ids, with isolates removed. | 
A helper function to generate a d-dimensional linear transformation matrix.
Description
A helper function to generate a d-dimensional linear transformation matrix.
Usage
gen.coefs(d)
Arguments
| d | the number of dimensions. | 
Value
A [d] coefficient vector.
Author(s)
Eric Bridgeford
A helper function for simulating sample labels
Description
A helper function for simulating sample labels
Usage
gen.sample.labels(K, class.equal = TRUE)
Arguments
| K | the number of classes | 
| class.equal | whether the number of samples/class should be equal, with each
class having a prior of 1/K, or unequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to TRUE. | 
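The two prior schemes described for class.equal can be written out in base R as below. Reading "k/sum(K)" as k divided by the sum of 1:K is an assumption, since that is the reading under which the priors sum to one; class_priors is a hypothetical name.

```r
# Hypothetical helper: class priors for K classes.
class_priors <- function(K, class.equal = TRUE) {
  if (class.equal) {
    rep(1 / K, K)                       # every class equally likely
  } else {
    seq_len(K) / sum(seq_len(K))        # assumption: prior of class k is k / sum(1:K)
  }
}

equal_p   <- class_priors(3)
unequal_p <- class_priors(3, class.equal = FALSE)
# Labels for n samples could then be drawn with, for example:
set.seed(1)
labels <- sample(3, size = 50, replace = TRUE, prob = unequal_p)
```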
A helper function to generate n samples of a d-dimensional uniform vector.
Description
A helper function to generate n samples of a d-dimensional uniform vector.
Usage
gen.x.unif(n, d, a = -1, b = 1)
Arguments
| n | the number of samples. | 
| d | the number of dimensions. | 
| a | the lower limit. | 
| b | the upper limit. | 
| x | 
 | 
Author(s)
Eric Bridgeford
Distance Matrix Validator
Description
A utility to validate a distance matrix.
Usage
mgc.dist.validator(
  X,
  is.dist = FALSE,
  dist.xfm = mgc.distance,
  dist.params = list(method = "euclidean"),
  dist.return = NULL
)
Arguments
| X | is interpreted as: 
 | 
| is.dist | a boolean indicating whether your  | 
| dist.xfm | if  | 
| dist.params | a list of trailing arguments to pass to the distance function specified in  | 
| dist.return | the return argument for the specified  
 | 
Value
A distance matrix.
Author(s)
Eric Bridgeford
MGC Distance Transform
Description
Transform the distance matrices, with column-wise ranking if needed.
Usage
mgc.dist.xfm(X, Y, option = "mgc", optionRk = TRUE)
Arguments
| X | 
 | 
| Y | 
 | 
| option | is a string that specifies which global correlation to build up-on. Defaults to  
 | 
| optionRk | is a string that specifies whether ranking within column is computed or not. If  | 
Value
A list containing the following:
| A | 
 | 
| B | 
 | 
| RX | 
 | 
| RY | 
 | 
Author(s)
C. Shen
Examples
library(mgc)
n=200; d=2
data <- mgc.sims.linear(n, d)
Dx <- as.matrix(dist(data$X), nrow=n); Dy <- as.matrix(dist(data$Y), nrow=n)
dt <- mgc.dist.xfm(Dx, Dy)
Distance
Description
A function that returns a distance matrix given a collection of observations.
Usage
mgc.distance(X, method = "euclidean")
Arguments
| X | 
 | 
| method | the method for computing distances. Defaults to 'euclidean'. | 
Value
a [n x n] distance matrix indicating the pairwise distances between all samples passed in.
Author(s)
Eric Bridgeford
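For reference, an [n x n] pairwise distance matrix of the kind described above can be built in base R with stats::dist. This is a sketch of the output shape only; it is an assumption that mgc.distance behaves the same way for every method.

```r
# Three samples (rows) in two dimensions
X <- rbind(c(0, 0), c(3, 4), c(6, 8))

# dist() returns the lower triangle; as.matrix() expands it to n x n
D <- as.matrix(dist(X, method = "euclidean"))
```

The result is symmetric with a zero diagonal, and entry [i, j] is the distance between samples i and j.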
MGC K Sample Testing
Description
MGC K Sample Testing provides a wrapper for MGC Sample testing under the constraint that the Ys here are categorical labels with K possible sample ids. This function uses a 0-1 loss for the Ys (one-hot encoding).
Usage
mgc.ksample(X, Y, mgc.opts = list(), ...)
Arguments
| X | is interpreted as: 
 | 
| Y | 
 | 
| mgc.opts | Arguments to pass to MGC, as a named list. See  | 
| ... | trailing args. | 
Value
A list containing the following:
| p.value | P-value of MGC | 
| stat | is the sample MGC statistic within  | 
| pLocalCorr | P-value of the local correlations by double matrix index | 
| localCorr | the local correlations | 
| optimalScale | the optimal scale identified by MGC | 
Author(s)
Eric Bridgeford
References
Youjin Lee, et al. "Network Dependence Testing via Diffusion Maps and Distance-Based Correlations." ArXiv (2019).
Examples
## Not run: 
library(mgc)
library(MASS)
n = 100; d = 2
# simulate 100 samples, where first 50 have mean [0,0] and second 50 have mean [1,1]
Y <- c(replicate(n/2, 0), replicate(n/2, 1))
X <- do.call(rbind, lapply(Y, function(y) {
    return(rnorm(d) + y)
}))
# p value is small
mgc.ksample(X, Y, mgc.opts=list(nperm=100))$p.value
## End(Not run)
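The 0-1 loss on one-hot encoded labels mentioned in the Description can be illustrated in base R as follows. This is a sketch of the idea, not the code inside mgc.ksample; the variable names are illustrative.

```r
Y <- c("a", "a", "b", "c")

# One-hot encode: one indicator column per class
classes <- sort(unique(Y))
onehot <- outer(Y, classes, "==") * 1

# Distinct one-hot rows are sqrt(2) apart in Euclidean distance;
# thresholding gives a 0-1 loss: 0 for equal labels, 1 otherwise
DY <- (as.matrix(dist(onehot)) > 0) * 1
```

The same matrix can be obtained without the encoding step as outer(Y, Y, "!=") * 1.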
MGC Local Correlations
Description
Compute all local correlation coefficients in O(n^2 log n)
Usage
mgc.localcorr(
  X,
  Y,
  is.dist.X = FALSE,
  dist.xfm.X = mgc.distance,
  dist.params.X = list(method = "euclidean"),
  dist.return.X = NULL,
  is.dist.Y = FALSE,
  dist.xfm.Y = mgc.distance,
  dist.params.Y = list(method = "euclidean"),
  dist.return.Y = NULL,
  option = "mgc"
)
Arguments
| X | is interpreted as: 
 | 
| Y | is interpreted as: 
 | 
| is.dist.X | a boolean indicating whether your  | 
| dist.xfm.X | if  | 
| dist.params.X | a list of trailing arguments to pass to the distance function specified in  | 
| dist.return.X | the return argument for the specified  
 | 
| is.dist.Y | a boolean indicating whether your  | 
| dist.xfm.Y | if  | 
| dist.params.Y | a list of trailing arguments to pass to the distance function specified in  | 
| dist.return.Y | the return argument for the specified  
 | 
| option | is a string that specifies which global correlation to build up-on. Defaults to  
 | 
Value
A list containing the following:
| corr | consists of all local correlations within [-1,1] by double matrix index | 
| varX | contains all local variances for X. | 
| varY | contains all local variances for Y. | 
Author(s)
C. Shen
Examples
library(mgc)
n=200; d=2
data <- mgc.sims.linear(n, d)
lcor <- mgc.localcorr(data$X, data$Y)
Driver for MGC Local Correlations
Description
Driver for MGC Local Correlations
Usage
mgc.localcorr.driver(DX, DY, option = "mgc")
Arguments
| DX | the first distance matrix. | 
| DY | the second distance matrix. | 
| option | is a string that specifies which global correlation to build up-on. Defaults to  
 | 
Value
A list containing the following:
| corr | consists of all local correlations within [-1,1] by double matrix index | 
| varX | contains all local variances for X. | 
| varY | contains all local variances for Y. | 
Author(s)
C. Shen
Sample from Unit 2-Ball
Description
Sample from the 2-ball in d-dimensions.
Usage
mgc.sims.2ball(n, d, r = 1, cov.scale = 0)
Arguments
| n | the number of samples. | 
| d | the number of dimensions. | 
| r | the radius of the 2-ball. Defaults to 1. | 
| cov.scale | if desired, sample from 2-ball with error sigma. Defaults to 0. | 
Value
the points sampled from the ball, as a [n, d] array.
Author(s)
Eric Bridgeford
Examples
library(mgc)
# sample 100 points from 3-d 2-ball with radius 2
X <- mgc.sims.2ball(100, 3, 2)
Sample from Unit 2-Sphere
Description
Sample from the 2-sphere in d-dimensions.
Usage
mgc.sims.2sphere(n, d, r, cov.scale = 0)
Arguments
| n | the number of samples. | 
| d | the number of dimensions. | 
| r | the radius of the 2-sphere. Defaults to  | 
| cov.scale | if desired, sample from 2-sphere with error sigma. Defaults to 0. | 
Value
the points sampled from the sphere, as a [n, d] array.
Author(s)
Eric Bridgeford
Examples
library(mgc)
# sample 100 points from 3-d 2-sphere with radius 2
X <- mgc.sims.2sphere(100, 3, 2)
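A standard way to draw points from a sphere or ball under the Euclidean norm is to normalise Gaussian vectors, as sketched below in base R. It is an assumption that mgc.sims.2sphere and mgc.sims.2ball work this way; the function names here are hypothetical and the cov.scale noise term is left out.

```r
# Hypothetical helper: n points uniformly on the sphere of radius r in d dimensions
sample_sphere <- function(n, d, r = 1) {
  Z <- matrix(rnorm(n * d), n, d)
  r * Z / sqrt(rowSums(Z^2))            # rescale each row to length r
}

# Hypothetical helper: n points uniformly inside the ball of radius r
sample_ball <- function(n, d, r = 1) {
  # a radius factor of U^(1/d) makes the points uniform in volume
  sample_sphere(n, d, r) * runif(n)^(1 / d)
}

set.seed(1)
S <- sample_sphere(100, 3, r = 2)
B <- sample_ball(100, 3, r = 2)
```

Both results are [n, d] arrays, matching the Value sections above.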
Cubic Simulation
Description
A function for generating a cubic simulation.
Usage
mgc.sims.cubic(
  n,
  d,
  eps = 80,
  ind = FALSE,
  a = -1,
  b = 1,
  c.coef = c(-12, 48, 128),
  s = 1/3
)
Arguments
| n | the number of samples for the simulation. | 
| d | the number of dimensions for the simulation setting. | 
| eps | the noise level for the simulation. Defaults to 80. | 
| ind | whether to sample x and y independently. Defaults to FALSE. | 
| a | the lower limit for the range of the data matrix. Defaults to -1. | 
| b | the upper limit for the range of the data matrix. Defaults to 1. | 
| c.coef | the coefficients for the cubic function, where the first value is the first order coefficient, the second value the quadratic coefficient, and the third the cubic coefficient. Defaults to c(-12, 48, 128). | 
| s | the scaling for the center of the cubic. Defaults to 1/3. | 
Value
a list containing the following:
| X | 
 | 
| Y | 
 | 
Details
Given: w_i = \frac{1}{i} is a weight-vector that scales with the dimensionality.
Simulates n points from Cubic(X, Y) \in \mathbf{R}^d \times \mathbf{R}, where:
X \sim U(a, b)^d,
Y = c_3 (w^T X - s)^3 + c_2 (w^T X - s)^2 + c_1 (w^T X - s) + \kappa \epsilon,
and \kappa, which equals 1 if d = 1 and 0 otherwise, controls the noise for higher dimensions.
Author(s)
Eric Bridgeford
Examples
library(mgc)
result  <- mgc.sims.cubic(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
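The formula in the Details section can be transcribed into base R roughly as below. Treating \epsilon as eps times a standard normal draw is an assumption, the ind argument is omitted, and sim_cubic is a hypothetical name rather than the package's implementation.

```r
# Hypothetical transcription of the cubic simulation formula.
sim_cubic <- function(n, d, eps = 80, a = -1, b = 1,
                      c.coef = c(-12, 48, 128), s = 1/3) {
  X <- matrix(runif(n * d, a, b), n, d)   # X ~ U(a, b)^d
  w <- 1 / seq_len(d)                     # w_i = 1/i
  z <- as.vector(X %*% w) - s             # w^T X - s
  kappa <- as.numeric(d == 1)             # 1 if d = 1, 0 otherwise
  noise <- kappa * eps * rnorm(n)         # assumption: Gaussian noise scaled by eps
  Y <- c.coef[3] * z^3 + c.coef[2] * z^2 + c.coef[1] * z + noise
  list(X = X, Y = Y)
}

set.seed(1)
sketch <- sim_cubic(n = 100, d = 10)
```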
Exponential Simulation
Description
A function for generating an exponential simulation.
Usage
mgc.sims.exp(n, d, eps = 10, ind = FALSE, a = 0, b = 3)
Arguments
| n | the number of samples for the simulation. | 
| d | the number of dimensions for the simulation setting. | 
| eps | the noise level for the simulation. Defaults to 10. | 
| ind | whether to sample x and y independently. Defaults to FALSE. | 
| a | the lower limit for the range of the data matrix. Defaults to 0. | 
| b | the upper limit for the range of the data matrix. Defaults to 3. | 
Value
a list containing the following:
| X | 
 | 
| Y | 
 | 
Details
Given: w_i = \frac{1}{i} is a weight-vector that scales with the dimensionality.
Simulates n points from Exponential(X, Y) \in \mathbf{R}^d \times \mathbf{R}, where:
X \sim U(a, b)^d,
Y = e^{w^T X} + \kappa \epsilon,
and \kappa, which equals 1 if d = 1 and 0 otherwise, controls the noise for higher dimensions.
Author(s)
Eric Bridgeford
Examples
library(mgc)
result  <- mgc.sims.exp(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
Joint Normal Simulation
Description
A function for generating a joint-normal simulation.
Usage
mgc.sims.joint(n, d, eps = 0.5)
Arguments
| n | the number of samples for the simulation. | 
| d | the number of dimensions for the simulation setting. | 
| eps | the noise level for the simulation. Defaults to 0.5. | 
Value
a list containing the following:
| X | 
 | 
| Y | 
 | 
Details
Given: \rho = \frac{1}{2}d, I_d is the identity matrix of size d \times d, J_d is the matrix of ones of size d \times d.
Simulates n points from Joint-Normal(X, Y) \in \mathbf{R}^d \times \mathbf{R}^d, where:
(X, Y) \sim N(0, \Sigma),
\Sigma = \left[I_d, \rho J_d; \rho J_d, (1 + \epsilon\kappa) I_d\right],
and \kappa, which equals 1 if d = 1 and 0 otherwise, controls the noise for higher dimensions.
For more details see the help vignette:
vignette("sims", package = "mgc")
Author(s)
Eric Bridgeford
Examples
library(mgc)
result  <- mgc.sims.joint(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
Linear Simulation
Description
A function for generating a linear simulation.
Usage
mgc.sims.linear(n, d, eps = 1, ind = FALSE, a = -1, b = 1)
Arguments
| n | the number of samples for the simulation. | 
| d | the number of dimensions for the simulation setting. | 
| eps | the noise level for the simulation. Defaults to 1. | 
| ind | whether to sample x and y independently. Defaults to FALSE. | 
| a | the lower limit for the range of the data matrix. Defaults to -1. | 
| b | the upper limit for the range of the data matrix. Defaults to 1. | 
Value
a list containing the following:
| X | 
 | 
| Y | 
 | 
Details
Given: w_i = \frac{1}{i} is a weight-vector that scales with the dimensionality.
Simulates n points from Linear(X, Y) \in \mathbf{R}^d \times \mathbf{R}, where:
X \sim U(a, b)^d,
Y = w^T X + \kappa \epsilon,
and \kappa, which equals 1 if d = 1 and 0 otherwise, controls the noise for higher dimensions.
Author(s)
Eric Bridgeford
Examples
library(mgc)
result  <- mgc.sims.linear(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
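One way to read the ind argument is that Y is generated from a separate draw of the inputs, so that the returned X and Y share no dependence. The base R sketch below illustrates that reading for the linear case. This interpretation, the Gaussian form of the noise, and the name sim_linear are all assumptions.

```r
# Hypothetical transcription of the linear simulation, with an ind switch.
sim_linear <- function(n, d, eps = 1, ind = FALSE, a = -1, b = 1) {
  X <- matrix(runif(n * d, a, b), n, d)   # X ~ U(a, b)^d
  w <- 1 / seq_len(d)                     # w_i = 1/i
  kappa <- as.numeric(d == 1)             # 1 if d = 1, 0 otherwise
  # Assumption: with ind = TRUE, Y is built from a fresh draw unrelated to X
  X_for_y <- if (ind) matrix(runif(n * d, a, b), n, d) else X
  Y <- as.vector(X_for_y %*% w) + kappa * eps * rnorm(n)
  list(X = X, Y = Y)
}

set.seed(42)
dep   <- sim_linear(n = 100, d = 10)              # Y depends on X
indep <- sim_linear(n = 100, d = 10, ind = TRUE)  # Y unrelated to X
```

In the dependent case with d > 1 the noise term vanishes, so Y is an exact linear function of X.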
Quadratic Simulation
Description
A function for generating a quadratic simulation.
Usage
mgc.sims.quad(n, d, eps = 0.5, ind = FALSE, a = -1, b = 1)
Arguments
| n | the number of samples for the simulation. | 
| d | the number of dimensions for the simulation setting. | 
| eps | the noise level for the simulation. Defaults to 0.5. | 
| ind | whether to sample x and y independently. Defaults to FALSE. | 
| a | the lower limit for the data matrix. Defaults to -1. | 
| b | the upper limit for the data matrix. Defaults to 1. | 
Value
a list containing the following:
| X | 
 | 
| Y | 
 | 
Details
Given: w_i = \frac{1}{i} is a weight-vector that scales with the dimensionality.
Simulates n points from Quadratic(X, Y) \in \mathbf{R}^d \times \mathbf{R}, where:
X \sim U(a, b)^d,
Y = (w^T X)^2 + \kappa \epsilon N(0, 1),
and \kappa, which equals 1 if d = 1 and 0 otherwise, controls the noise for higher dimensions.
For more details see the help vignette:
vignette("sims", package = "mgc")
Author(s)
Eric Bridgeford
Examples
library(mgc)
result  <- mgc.sims.quad(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
Random Rotation
Description
A helper function for applying a random rotation to gaussian parameter set.
Usage
mgc.sims.random_rotate(mus, Sigmas, Q = NULL)
Arguments
| mus | means per class. | 
| Sigmas | covariances per class. | 
| Q | rotation to use, if any | 
Author(s)
Eric Bridgeford
Sample Random Rotation
Description
A helper function for estimating a random rotation matrix.
Usage
mgc.sims.rotation(d)
Arguments
| d | dimensions to generate a rotation matrix for. | 
Value
the rotation matrix
Author(s)
Eric Bridgeford
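A common way to obtain a random rotation matrix is to take the Q factor from the QR decomposition of a Gaussian matrix, as sketched below in base R. It is an assumption that mgc.sims.rotation uses this construction; random_rotation is a hypothetical name.

```r
# Hypothetical helper: random d x d rotation (orthogonal, determinant +1).
random_rotation <- function(d) {
  qrd <- qr(matrix(rnorm(d * d), d, d))
  Q <- qr.Q(qrd)
  # Remove the sign ambiguity of the QR factorisation
  Q <- Q %*% diag(sign(diag(qr.R(qrd))), d)
  # Flip one column if needed so that Q is a rotation, not a reflection
  if (det(Q) < 0) Q[, 1] <- -Q[, 1]
  Q
}

set.seed(1)
Q <- random_rotation(3)
```

Because Q is orthogonal, applying it to class means and covariances (mu to Q mu, Sigma to Q Sigma Q^T) preserves pairwise Euclidean distances between points.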
GMM Simulate
Description
A helper function for simulating from Gaussian Mixture.
Usage
mgc.sims.sim_gmm(mus, Sigmas, n, priors)
Arguments
| mus | 
 | 
| Sigmas | 
 | 
| n | the number of examples. | 
| priors | 
 | 
Value
A list with the following:
| X | 
 | 
| Y | 
 | 
| priors | 
 | 
Author(s)
Eric Bridgeford
Spiral Simulation
Description
A function for generating a spiral simulation.
Usage
mgc.sims.spiral(n, d, eps = 0.4, a = 0, b = 5)
Arguments
| n | the number of samples for the simulation. | 
| d | the number of dimensions for the simulation setting. | 
| eps | the noise level for the simulation. Defaults to 0.4. | 
| a | the lower limit for the data matrix. Defaults to 0. | 
| b | the upper limit for the data matrix. Defaults to 5. | 
Value
a list containing the following:
| X | 
 | 
| Y | 
 | 
Details
Given: U \sim U(a, b) a random variable.
Simulates n points from Spiral(X, Y) \in \mathbf{R}^d \times \mathbf{R} where:
X_i = U\, \textrm{cos}^d(\pi\, U) if i = d, and U\, \textrm{sin}(\pi\, U)\, \textrm{cos}^i(\pi\, U) otherwise
Y = U\, \textrm{sin}(\pi\, U) + \epsilon p N(0, 1)
For more details see the help vignette:
vignette("sims", package = "mgc")
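The spiral model above can be sketched in base R. This is an illustration only, not the package's code: sim_spiral_sketch is a hypothetical name, and because the factor p in the noise term is not defined in this section, the sketch exposes it as an explicit argument with an assumed default of 1.

```r
# Hypothetical base-R sketch of the Spiral model described above.
# The noise multiplier p is not defined in the text; p = 1 is an assumption.
sim_spiral_sketch <- function(n, d, eps = 0.4, a = 0, b = 5, p = 1) {
  u <- runif(n, min = a, max = b)                   # U ~ U(a, b)
  X <- matrix(0, nrow = n, ncol = d)
  for (i in seq_len(d)) {
    if (i == d) {
      X[, i] <- u * cos(pi * u)^d                   # last coordinate
    } else {
      X[, i] <- u * sin(pi * u) * cos(pi * u)^i     # all other coordinates
    }
  }
  Y <- u * sin(pi * u) + eps * p * rnorm(n)
  list(X = X, Y = Y)
}

set.seed(1)
res <- sim_spiral_sketch(n = 100, d = 10)
```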
Author(s)
Eric Bridgeford
Examples
library(mgc)
result  <- mgc.sims.spiral(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
Step Function Simulation
Description
A function for generating a step function simulation.
Usage
mgc.sims.step(n, d, eps = 1, ind = FALSE, a = -1, b = 1)
Arguments
| n | the number of samples for the simulation. | 
| d | the number of dimensions for the simulation setting. | 
| eps | the noise level for the simulation. Defaults to 1. | 
| ind | whether to sample x and y independently. Defaults to FALSE. | 
| a | the lower limit for the data matrix. Defaults to -1. | 
| b | the upper limit for the data matrix. Defaults to 1. | 
Value
a list containing the following:
| X | 
 | 
| Y | 
 | 
Details
Given: w_i = \frac{1}{i} is a weight-vector that scales with the dimensionality.
Simulates n points from Step(X, Y) \in \mathbf{R}^d\times \mathbf{R} where:
X \sim {U}\left(a, b\right)^d,
Y = \mathbf{I}\left\{w^TX > 0\right\} + \kappa \epsilon N(0, 1)
and \kappa = 1\textrm{ if }d = 1, \textrm{ and 0 otherwise} controls the noise for higher dimensions.
For more details see the help vignette:
vignette("sims", package = "mgc")
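A minimal base-R sketch of the step model above follows. It is not the package's implementation: sim_step_sketch is a hypothetical name, and the ind argument is left out because this section does not spell out how independent sampling is done.

```r
# Hypothetical base-R sketch of the Step model described above.
sim_step_sketch <- function(n, d, eps = 1, a = -1, b = 1) {
  w <- 1 / seq_len(d)                               # weight vector, w_i = 1/i
  X <- matrix(runif(n * d, min = a, max = b), nrow = n, ncol = d)
  kappa <- as.numeric(d == 1)                       # kappa = 1 if d = 1, else 0
  step <- as.numeric(as.vector(X %*% w) > 0)        # indicator of w^T X > 0
  Y <- step + kappa * eps * rnorm(n)
  list(X = X, Y = Y)
}

set.seed(1)
res <- sim_step_sketch(n = 100, d = 10)
```

For d > 1 the noise term vanishes under the stated \kappa, so Y takes only the values 0 and 1.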
Author(s)
Eric Bridgeford
Examples
library(mgc)
result  <- mgc.sims.step(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
Uncorrelated Bernoulli Simulation
Description
A function for generating an uncorrelated Bernoulli simulation.
Usage
mgc.sims.ubern(n, d, eps = 0.5, p = 0.5)
Arguments
| n | the number of samples for the simulation. | 
| d | the number of dimensions for the simulation setting. | 
| eps | the noise level for the simulation. Defaults to 0.5. | 
| p | the Bernoulli probability. Defaults to 0.5. | 
Value
a list containing the following:
| X | 
 | 
| Y | 
 | 
Details
Given: w_i = \frac{1}{i} is a weight-vector that scales with the dimensionality.
Simulates n points from UBern(X, Y) \in \mathbf{R}^d \times \mathbf{R} where:
U \sim Bern(p)
X \sim Bern\left(p\right)^d + \epsilon N(0, I_d)
Y = (2U - 1)w^TX + \epsilon N(0, 1)
For more details see the help vignette:
vignette("sims", package = "mgc")
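The uncorrelated Bernoulli model above can be sketched in base R as follows. The function name sim_ubern_sketch is hypothetical, and the code is an illustration of the stated equations rather than the package's implementation.

```r
# Hypothetical base-R sketch of the uncorrelated Bernoulli model above.
sim_ubern_sketch <- function(n, d, eps = 0.5, p = 0.5) {
  w <- 1 / seq_len(d)                               # weight vector, w_i = 1/i
  U <- rbinom(n, size = 1, prob = p)                # U ~ Bern(p)
  X <- matrix(rbinom(n * d, size = 1, prob = p), nrow = n, ncol = d) +
    eps * matrix(rnorm(n * d), nrow = n, ncol = d)  # Bern(p)^d + eps N(0, I_d)
  Y <- (2 * U - 1) * as.vector(X %*% w) + eps * rnorm(n)
  list(X = X, Y = Y)
}

set.seed(1)
res <- sim_ubern_sketch(n = 100, d = 10)
```

The factor (2U - 1) flips the sign of w^T X at random, which is what makes X and Y dependent yet linearly uncorrelated when p = 0.5.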
Author(s)
Eric Bridgeford
Examples
library(mgc)
result  <- mgc.sims.ubern(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
W Shaped Simulation
Description
A function for generating a W-shaped simulation.
Usage
mgc.sims.wshape(n, d, eps = 0.5, ind = FALSE, a = -1, b = 1)
Arguments
| n | the number of samples for the simulation. | 
| d | the number of dimensions for the simulation setting. | 
| eps | the noise level for the simulation. Defaults to 0.5. | 
| ind | whether to sample x and y independently. Defaults to FALSE. | 
| a | the lower limit for the data matrix. Defaults to -1. | 
| b | the upper limit for the data matrix. Defaults to 1. | 
Value
a list containing the following:
| X | 
 | 
| Y | 
 | 
Details
Given: w_i = \frac{1}{i} is a weight-vector that scales with the dimensionality.
Simulates n points from W-shape(X, Y) \in \mathbf{R}^d \times \mathbf{R} where:
U \sim {U}(a, b)^d,
X \sim {U}(a, b)^d,
Y = \left[\left((w^TX)^2 - \frac{1}{2}\right)^2 + \frac{w^TU}{500}\right] + \kappa \epsilon N(0, 1)
and \kappa = 1\textrm{ if }d = 1, \textrm{ and 0 otherwise} controls the noise for higher dimensions.
For more details see the help vignette:
vignette("sims", package = "mgc")
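A minimal base-R sketch of the W-shaped model above follows. The name sim_wshape_sketch is hypothetical, the ind argument is left out, and the code only illustrates the stated equations; it is not the package's implementation.

```r
# Hypothetical base-R sketch of the W-shape model described above.
sim_wshape_sketch <- function(n, d, eps = 0.5, a = -1, b = 1) {
  w <- 1 / seq_len(d)                               # weight vector, w_i = 1/i
  U <- matrix(runif(n * d, min = a, max = b), nrow = n, ncol = d)
  X <- matrix(runif(n * d, min = a, max = b), nrow = n, ncol = d)
  kappa <- as.numeric(d == 1)                       # kappa = 1 if d = 1, else 0
  wx <- as.vector(X %*% w)
  wu <- as.vector(U %*% w)
  Y <- (wx^2 - 0.5)^2 + wu / 500 + kappa * eps * rnorm(n)
  list(X = X, Y = Y)
}

set.seed(1)
res <- sim_wshape_sketch(n = 100, d = 10)
```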
Author(s)
Eric Bridgeford
Examples
library(mgc)
result  <- mgc.sims.wshape(n=100, d=10)  # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
MGC Test
Description
The main function that computes the MGC measure between two datasets. It first computes all local correlations, then uses the maximal statistic among all local correlations based on thresholding.
Usage
mgc.stat(
  X,
  Y,
  is.dist.X = FALSE,
  dist.xfm.X = mgc.distance,
  dist.params.X = list(method = "euclidean"),
  dist.return.X = NULL,
  is.dist.Y = FALSE,
  dist.xfm.Y = mgc.distance,
  dist.params.Y = list(method = "euclidean"),
  dist.return.Y = NULL,
  option = "mgc"
)
Arguments
| X | is interpreted as: 
 | 
| Y | is interpreted as: 
 | 
| is.dist.X | a boolean indicating whether your  | 
| dist.xfm.X | if  | 
| dist.params.X | a list of trailing arguments to pass to the distance function specified in  | 
| dist.return.X | the return argument for the specified  
 | 
| is.dist.Y | a boolean indicating whether your  | 
| dist.xfm.Y | if  | 
| dist.params.Y | a list of trailing arguments to pass to the distance function specified in  | 
| dist.return.Y | the return argument for the specified  
 | 
| option | is a string that specifies which global correlation to build upon. Defaults to "mgc". 
 | 
Value
A list containing the following:
| stat | is the sample MGC statistic within  | 
| localCorr | the local correlations | 
| optimalScale | the optimal scale identified by MGC | 
| option | specifies which global correlation was used | 
Author(s)
C. Shen and Eric Bridgeford
References
Joshua T. Vogelstein, et al. "Discovering and deciphering relationships across disparate data modalities." eLife (2019).
Examples
library(mgc)
n <- 200; d <- 2
data <- mgc.sims.linear(n, d)
mgc.stat.res <- mgc.stat(data$X, data$Y)
MGC Sample Statistic Internal Driver
Description
MGC Sample Statistic Internal Driver
Usage
mgc.stat.driver(DX, DY, option = "mgc")
Arguments
| DX | the first distance matrix. | 
| DY | the second distance matrix. | 
| option | is a string that specifies which global correlation to build upon. Defaults to "mgc". 
 | 
MGC Permutation Test
Description
Test of Dependence using MGC Approach.
Usage
mgc.test(
  X,
  Y,
  is.dist.X = FALSE,
  dist.xfm.X = mgc.distance,
  dist.params.X = list(method = "euclidean"),
  dist.return.X = NULL,
  is.dist.Y = FALSE,
  dist.xfm.Y = mgc.distance,
  dist.params.Y = list(method = "euclidean"),
  dist.return.Y = NULL,
  nperm = 1000,
  option = "mgc",
  no_cores = 1
)
Arguments
| X | is interpreted as: 
 | 
| Y | is interpreted as: 
 | 
| is.dist.X | a boolean indicating whether your  | 
| dist.xfm.X | if  | 
| dist.params.X | a list of trailing arguments to pass to the distance function specified in  | 
| dist.return.X | the return argument for the specified  
 | 
| is.dist.Y | a boolean indicating whether your  | 
| dist.xfm.Y | if  | 
| dist.params.Y | a list of trailing arguments to pass to the distance function specified in  | 
| dist.return.Y | the return argument for the specified  
 | 
| nperm | specifies the number of replicates to use for the permutation test. Defaults to 1000. | 
| option | is a string that specifies which global correlation to build upon. Defaults to "mgc". 
 | 
| no_cores | the number of cores to use for the permutations. Defaults to 1. | 
Value
A list containing the following:
| p.value | P-value of MGC | 
| stat | is the sample MGC statistic within  | 
| p.localCorr | P-value of the local correlations by double matrix index. | 
| localCorr | the local correlations | 
| optimalScale | the optimal scale identified by MGC | 
| option | specifies which global correlation was used | 
Details
A test of independence using the MGC approach, described in Vogelstein et al. (2019). For (X, Y) \sim F_{XY} with marginal distributions F_X and F_Y:
H_0: F_{XY} = F_X F_Y
and:
H_A: F_{XY} \neq F_X F_Y
Note that one should avoid reporting a positive discovery by minimizing over the individual p-values of the local correlations, unless these are corrected for multiple hypotheses.
For details on usage see the help vignette:
vignette("mgc", package = "mgc")
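The general shape of a permutation test of independence can be sketched in base R. The sketch below is not MGC: it uses the correlation between pairwise distances as a stand-in statistic, and the names perm_test_sketch and dist_cor_stat, as well as the +1 correction in the p-value, are assumptions for the example rather than details of mgc.test.

```r
# Hypothetical sketch of a permutation test of independence.
# stat_fn is any dependence statistic where larger means more dependent.
perm_test_sketch <- function(X, Y, stat_fn, nperm = 1000) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  n <- nrow(X)
  observed <- stat_fn(X, Y)
  # Permuting the rows of Y breaks any dependence between X and Y
  null_stats <- vapply(seq_len(nperm), function(i) {
    stat_fn(X, Y[sample.int(n), , drop = FALSE])
  }, numeric(1))
  # Common convention: count the observed statistic among the replicates
  p_value <- (sum(null_stats >= observed) + 1) / (nperm + 1)
  list(stat = observed, p.value = p_value)
}

# Stand-in statistic (NOT MGC): correlation between pairwise distances
dist_cor_stat <- function(X, Y) cor(as.vector(dist(X)), as.vector(dist(Y)))

set.seed(1)
X <- matrix(runif(50), ncol = 1)
Y <- X + 0.1 * matrix(rnorm(50), ncol = 1)
out <- perm_test_sketch(X, Y, dist_cor_stat, nperm = 200)
```

Because the smallest attainable p-value is 1 / (nperm + 1), a very small nperm limits how strong a result the test can report.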
Author(s)
Eric Bridgeford and C. Shen
References
Joshua T. Vogelstein, et al. "Discovering and deciphering relationships across disparate data modalities." eLife (2019).
Examples
## Not run: 
library(mgc)
n = 100; d = 2
data <- mgc.sims.linear(n, d)
# note: on real data, one would put nperm much higher (at least 100)
# nperm is set to 10 merely for demonstration purposes
result <- mgc.test(data$X, data$Y, nperm=10)
## End(Not run)
MGC Utility Validator
Description
A utility that validates the data inputs and returns the X and Y distance matrices for MGC.
Usage
mgc.validator(
  X,
  Y,
  is.dist.X = FALSE,
  dist.xfm.X = mgc.distance,
  dist.params.X = list(method = "euclidean"),
  dist.return.X = NULL,
  is.dist.Y = FALSE,
  dist.xfm.Y = mgc.distance,
  dist.params.Y = list(method = "euclidean"),
  dist.return.Y = NULL
)
Arguments
| X | is interpreted as: 
 | 
| Y | 
 | 
| is.dist.X | a boolean indicating whether your  | 
| dist.xfm.X | if  | 
| dist.params.X | a list of trailing arguments to pass to the distance function specified in  | 
| dist.return.X | the return argument for the specified  
 | 
| is.dist.Y | a boolean indicating whether your  | 
| dist.xfm.Y | if  | 
| dist.params.Y | a list of trailing arguments to pass to the distance function specified in  | 
| dist.return.Y | the return argument for the specified  
 | 
Value
A list containing the following:
| D | The distance matrix, as a  | 
| Y | the sample ids, as a  | 
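As an illustration of the kind of checking described here, the sketch below converts two inputs to distance matrices and verifies that they describe the same number of samples. It uses stats::dist as a stand-in for the distance function; the name validate_sketch and the returned element names DX and DY are hypothetical, and the package's own validator may check more than this.

```r
# Hypothetical sketch of input validation before computing a statistic.
validate_sketch <- function(X, Y, is.dist.X = FALSE, is.dist.Y = FALSE) {
  to_dist <- function(Z, is.dist) {
    Z <- as.matrix(Z)
    if (is.dist) {
      # A supplied distance matrix must at least be square
      if (nrow(Z) != ncol(Z)) stop("a distance matrix must be square")
      return(Z)
    }
    as.matrix(dist(Z, method = "euclidean"))        # stand-in distance function
  }
  DX <- to_dist(X, is.dist.X)
  DY <- to_dist(Y, is.dist.Y)
  if (nrow(DX) != nrow(DY)) stop("X and Y must have the same number of samples")
  list(DX = DX, DY = DY)
}

set.seed(1)
X <- matrix(rnorm(20), nrow = 5)                    # 5 samples in 4 dimensions
Y <- rnorm(5)                                       # 5 scalar responses
out <- validate_sketch(X, Y)
```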
Remove Isolates
Description
A function to remove isolates from a dataset, given a data matrix or a distance matrix.
Usage
remove.isolates(X, Y, is.dist = FALSE)
Arguments
| X | is interpreted as: 
 | 
| Y | 
 | 
| is.dist | a boolean indicating whether your  | 
Author(s)
Eric Bridgeford