Type: | Package |
Title: | Uniformity Tests on the Circle, Sphere, and Hypersphere |
Version: | 1.4.0 |
Date: | 2024-05-21 |
Description: | Implementation of uniformity tests on the circle and (hyper)sphere. The main function of the package is unif_test(), which conveniently collects more than 35 tests for assessing uniformity on S^{p-1} = {x in R^p : ||x|| = 1}, p >= 2. The test statistics are implemented in the unif_stat() function, which allows computing several statistics for different samples within a single call, thus facilitating Monte Carlo experiments. Furthermore, the unif_stat_MC() function allows parallelizing them in a simple way. The asymptotic null distributions of the statistics are available through the function unif_stat_distr(). The core of 'sphunif' is coded in C++ by relying on the 'Rcpp' package. The package also provides several novel datasets and gives the replicability for the data applications/simulations in García-Portugués et al. (2021) <doi:10.1007/978-3-030-69944-4_12>, García-Portugués et al. (2023) <doi:10.3150/21-BEJ1454>, García-Portugués et al. (2024) <doi:10.48550/arXiv.2108.09874>, and Fernández-de-Marcos and García-Portugués (2024) <doi:10.48550/arXiv.2405.13531>. |
License: | GPL-3 |
LazyData: | true |
Depends: | R (≥ 3.5.0), Rcpp |
Imports: | doFuture, doRNG, foreach, future, gsl, rotasym |
Suggests: | CompQuadForm, goftest, knitr, markdown, mvtnorm, numDeriv, progress, progressr, rmarkdown, scatterplot3d, testthat, viridisLite |
LinkingTo: | Rcpp, RcppArmadillo |
URL: | https://github.com/egarpor/sphunif |
BugReports: | https://github.com/egarpor/sphunif |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | yes |
Packaged: | 2024-05-24 11:24:25 UTC; Eduardo |
Author: | Eduardo García-Portugués
|
Maintainer: | Eduardo García-Portugués <edgarcia@est-econ.uc3m.es> |
Repository: | CRAN |
Date/Publication: | 2024-05-24 21:50:01 UTC |
sphunif
: Uniformity Tests on the Circle, Sphere, and
Hypersphere
Description
Implementation of uniformity tests on the circle and
(hyper)sphere. The main function of the package is unif_test
,
which conveniently collects more than 35 tests for assessing uniformity on
S^{p-1}=\{{\bf x}\in R^p:||{\bf x}||=1\}
, p\ge 2
. The test statistics are
implemented in the unif_stat
function, which allows computing
several statistics for different samples within a single call, thus
facilitating Monte Carlo experiments. Furthermore, the
unif_stat_MC
function allows parallelizing them in
a simple way. The asymptotic null distributions of the statistics are
available through the function unif_stat_distr
. The core of
sphunif-package
is coded in C++ by relying on the
Rcpp-package
. The package also provides several
novel datasets and gives the replicability for the data applications/
simulations in García-Portugués et al. (2021)
<doi:10.1007/978-3-030-69944-4_12>, García-Portugués et al. (2023)
<doi:10.3150/21-BEJ1454>, García-Portugués et al. (2024)
<doi:10.48550/arXiv.2108.09874>, and Fernández-de-Marcos and
García-Portugués (2024) <doi:10.48550/arXiv.405.13531>.
Author(s)
Eduardo García-Portugués and Thomas Verdebout.
References
Fernández-de-Marcos, A. and García-Portugués, E. (2024) A stereographic test of spherical uniformity. arXiv:2405.13531. doi:10.48550/arXiv.2405.13531.
García-Portugués, E. and Verdebout, T. (2018) An overview of uniformity tests on the hypersphere. arXiv:1804.00286. doi:10.48550/arXiv.1804.00286.
García-Portugués, E., Navarro-Esteban, P., Cuesta-Albertos, J. A. (2023) On a projection-based class of uniformity tests on the hypersphere. Bernoulli, 29(1):181–204. doi:10.3150/21-BEJ1454.
García-Portugués, E., Navarro-Esteban, P., and Cuesta-Albertos, J. A. (2021). A Cramér–von Mises test of uniformity on the hypersphere. In Balzano, S., Porzio, G. C., Salvatore, R., Vistocco, D., and Vichi, M. (Eds.), Statistical Learning and Modeling in Data Analysis, Studies in Classification, Data Analysis and Knowledge Organization, pp. 107–116. Springer, Cham. doi:10.1007/978-3-030-69944-4_12.
García-Portugués, E., Paindaveine, D., and Verdebout, T. (2024). On a class of Sobolev tests for symmetry of directions, their detection thresholds, and asymptotic powers. arXiv:2108.09874v2. doi:10.48550/arXiv.2108.09874.
Surface area of the intersection of two hyperspherical caps
Description
Computation of
A_x(\theta_{ij}) := \frac{1}{\omega_p}
\int_{S^{p - 1}} 1_{\{{\bf X}_i'\boldsymbol\gamma \le x,
{\bf X}_j'\boldsymbol\gamma \le x\}}\,\mathrm{d}\boldsymbol\gamma,
where \theta_{ij} := \cos^{-1}({\bf X}_i'{\bf X}_j)
\in [0, \pi]
,
x \in [-1, 1]
, and \omega_{p}
is the surface area of
S^{p - 1}
. A_x(\theta_{ij})
is the proportion of surface area
of S^{p - 1}
covered by the intersection of two hyperspherical caps
centered at {\bf X}_i
and {\bf X}_j
and with
common solid angle \pi - \cos^{-1}(x)
.
Usage
A_theta_x(theta, x, p, N = 160L, as_matrix = TRUE)
Arguments
theta |
vector with values in |
x |
vector with values in |
p |
integer giving the dimension of the ambient space |
N |
number of points used in the
Gauss-Legendre quadrature. Defaults to
|
as_matrix |
return a matrix with the values of |
Details
See García-Portugués et al. (2023) for more details about the
A_x(\theta)
function.
Value
A matrix of size c(length(theta), length(x))
containing the
evaluation of A_x(\theta)
if as_matrix = TRUE
. Otherwise,
a vector of size c(length(theta)
if theta
and x
equal
in size.
References
García-Portugués, E., Navarro-Esteban, P., Cuesta-Albertos, J. A. (2023) On a projection-based class of uniformity tests on the hypersphere. Bernoulli, 29(1):181–204. doi:10.3150/21-BEJ1454.
Examples
# Plot A_x(theta) for several dimensions and x's
A_lines <- function(x, th = seq(0, pi, l = 200)) {
plot(th, A_theta_x(theta = th, x = x, p = 2), type = "l",
col = 1, ylim = c(0, 1.25), main = paste("x =", x),
ylab = expression(A[x](theta)),
xlab = expression(theta), axes = FALSE)
axis(1, at = c(0, pi / 4, pi / 2, 3 * pi / 4, pi),
labels = expression(0, pi / 4, pi / 2, 3 * pi / 4, pi))
axis(2); box()
abline(h = c(0, 1), lty = 2)
lines(th, A_theta_x(theta = th, x = x, p = 3), col = 2)
lines(th, A_theta_x(theta = th, x = x, p = 4), col = 3)
lines(th, A_theta_x(theta = th, x = x, p = 5), col = 4)
legend("top", lwd = 2, legend = paste("p =", 2:5),
col = 1:4, cex = 0.75, horiz = TRUE)
}
old_par <- par(mfrow = c(2, 3))
A_lines(x = -0.75)
A_lines(x = -0.25)
A_lines(x = 0)
A_lines(x = 0.25)
A_lines(x = 0.5)
A_lines(x = 0.75)
par(old_par)
# As surface of (theta, x) for several dimensions
A_surf <- function(p, x = seq(-1, 1, l = 201), th = seq(0, pi, l = 201)) {
col <- c("white", viridisLite::viridis(20))
breaks <- c(-1, seq(1e-15, 1, l = 21))
A <- A_theta_x(theta = th, x = x, p = p)
image(th, x, A, main = paste("p =", p), col = col, breaks = breaks,
xlab = expression(theta), axes = FALSE)
axis(1, at = c(0, pi / 4, pi / 2, 3 * pi / 4, pi),
labels = expression(0, pi / 4, pi / 2, 3 * pi / 4, pi))
axis(2); box()
contour(th, x, A, levels = breaks, add = TRUE)
}
old_par <- par(mfrow = c(2, 2))
A_surf(p = 2)
A_surf(p = 3)
A_surf(p = 4)
A_surf(p = 5)
par(old_par)
# No matrix return
th <- seq(0, pi, l = 5)
x <- seq(-1, 1, l = 5)
diag(A_theta_x(theta = th, x = x, p = 2))
A_theta_x(theta = th, x = x, p = 2, as_matrix = FALSE)
Distribution and quantile functions from angular function
Description
Numerical computation of the distribution function F
and
the quantile function F^{-1}
for an angular function
f
in a tangent-normal decomposition.
F^{-1}(x)
results from the inversion of
F(x) = \int_{-1}^x \omega_{p - 1}c_f f(z) (1 - z^2)^{(p - 3) / 2}
\,\mathrm{d}z
for x\in [-1, 1]
, where c_f
is a normalizing constant and
\omega_{p - 1}
is the surface area of S^{p - 2}
.
Usage
F_from_f(f, p, Gauss = TRUE, N = 320, K = 1000, tol = 1e-06, ...)
F_inv_from_f(f, p, Gauss = TRUE, N = 320, K = 1000, tol = 1e-06, ...)
Arguments
f |
angular function defined on |
p |
integer giving the dimension of the ambient space |
Gauss |
use a Gauss–Legendre quadrature
rule to integrate |
N |
number of points used in the Gauss–Legendre quadrature. Defaults
to |
K |
number of equispaced points on |
tol |
tolerance passed to |
... |
further parameters passed to |
Details
The normalizing constant c_f
is such that F(1) = 1
. It does not
need to be part of f
as it is computed internally.
Interpolation is performed by a monotone cubic spline. Gauss = TRUE
yields more accurate results, at expenses of a heavier computation.
If f
yields negative values, these are silently truncated to zero.
Value
A splinefun
object ready to evaluate F
or
F^{-1}
, as specified.
Examples
f <- function(x) rep(1, length(x))
plot(F_from_f(f = f, p = 4, Gauss = TRUE), ylab = "F(x)", xlim = c(-1, 1))
plot(F_from_f(f = f, p = 4, Gauss = FALSE), col = 2, add = TRUE,
xlim = c(-1, 1))
curve(p_proj_unif(x = x, p = 4), col = 3, add = TRUE, n = 300)
plot(F_inv_from_f(f = f, p = 4, Gauss = TRUE), ylab = "F^{-1}(x)")
plot(F_inv_from_f(f = f, p = 4, Gauss = FALSE), col = 2, add = TRUE)
curve(q_proj_unif(u = x, p = 4), col = 3, add = TRUE, n = 300)
Gauss–Legendre quadrature
Description
Convenience for computing the nodes x_k
and weights
w_k
of the Gauss–Legendre quadrature formula
in (a, b)
:
\int_a^b f(x) w(x)\,\mathrm{d}x\approx\sum_{k=1}^N w_k f(x_k).
.
Usage
Gauss_Legen_nodes(a = -1, b = 1, N = 40L)
Gauss_Legen_weights(a = -1, b = 1, N = 40L)
Arguments
a , b |
scalars giving the interval |
N |
number of points used in the Gauss–Legendre quadrature. The
following choices are supported: |
Details
For C^\infty
functions, Gauss–Legendre quadrature
can be very efficient. It is exact for polynomials up to degree
2N - 1
.
The nodes and weights up to N = 80
were retrieved from
NIST and have 10^{-21}
precision.
For N = 160
onwards, the nodes and weights were computed with the
gauss.quad
function from the
statmod
package (Smyth, 1998), and have 10^{-15}
precision.
Value
A matrix of size c(N, 1)
with the nodes x_k
(Gauss_Legen_nodes
) or the corresponding weights w_k
(Gauss_Legen_weights
).
References
NIST Digital Library of Mathematical Functions. Release 1.0.20 of 2018-09-15. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, and B. V. Saunders, eds. https://dlmf.nist.gov/
Smyth, G. K. (1998). Numerical integration. In: Encyclopedia of Biostatistics, P. Armitage and T. Colton (eds.), Wiley, London, pp. 3088-3095.
Examples
## Integration of a smooth function in (1, 10)
# Weights and nodes for integrating
x_k <- Gauss_Legen_nodes(a = 1, b = 10, N = 40)
w_k <- Gauss_Legen_weights(a = 1, b = 10, N = 40)
# Check quadrature
f <- function(x) sin(x) * x^2 - log(x + 1)
integrate(f, lower = 1, upper = 10, rel.tol = 1e-12)
sum(w_k * f(x_k))
# Exact for polynomials up to degree 2 * N - 1
f <- function(x) (((x + 0.5) / 1e3)^5 - ((x - 0.5)/ 5)^4 +
((x - 0.25) / 10)^2 + 1)^20
sum(w_k * f(x_k))
integrate(f, lower = -1, upper = 1, rel.tol = 1e-12)
## Integration on (0, pi)
# Weights and nodes for integrating
th_k <- Gauss_Legen_nodes(a = 0, b = pi, N = 40)
w_k <- Gauss_Legen_weights(a = 0, b = pi, N = 40)
# Check quadrature
p <- 4
psi <- function(th) -sin(th / 2)
w <- function(th) sin(th)^(p - 2)
integrate(function(th) psi(th) * w(th), lower = 0, upper = pi,
rel.tol = 1e-12)
sum(w_k * psi(th_k) * w(th_k))
# Integral with Gegenbauer polynomial
k <- 3
C_k <- function(th) drop(Gegen_polyn(theta = th, k = k, p = p))
integrate(function(th) psi(th) * C_k(th) * w(th), lower = 0, upper = pi,
rel.tol = 1e-12)
th_k <- drop(Gauss_Legen_nodes(a = 0, b = pi, N = 80))
w_k <- drop(Gauss_Legen_weights(a = 0, b = pi, N = 80))
sum(w_k * psi(th_k) * C_k(th_k) * w(th_k))
Gegenbauer polynomials and coefficients
Description
The Gegenbauer polynomials
\{C_k^{(\lambda)}(x)\}_{k = 0}^\infty
form a family of orthogonal polynomials on the interval [-1, 1]
with respect to the weight function (1 - x^2)^{\lambda - 1/2}
,
for \lambda > -1/2
, \lambda \neq 0
. They usually appear
when dealing with functions defined on
S^{p-1} := \{{\bf x} \in R^p : ||{\bf x}|| = 1\}
with index \lambda = p / 2 - 1
.
The Gegenbauer polynomials are somehow simpler to evaluate for
x = \cos(\theta)
, with \theta \in [0, \pi]
. This simplifies
also the connection with the Chebyshev polynomials
\{T_k(x)\}_{k = 0}^\infty
, which admit
the explicit expression
T_k(\cos(\theta)) = \cos(k\theta)
. The Chebyshev polynomials
appear as the limit of the Gegenbauer polynomials
(divided by \lambda
) when \lambda
goes to 0
, so they
can be regarded as the extension by continuity of
\{C_k^{(p/2 - 1)}(x)\}_{k = 0}^\infty
to the case p = 2
.
For a reasonably smooth function
\psi
defined on [0, \pi]
,
\psi(\theta) = \sum_{k = 0}^\infty b_{k, p}
C_k^{(p/2 - 1)}(\cos(\theta)),
provided that the coefficients
b_{k, p} := \frac{1}{c_{k, p}} \int_0^\pi \psi(\theta)
C_k^{(p/2 - 1)}(\cos(\theta)) (\sin(\theta))^{p - 2}\,\mathrm{d}\theta
are finite, where the normalizing constants are
c_{k, p} := \int_0^\pi (C_k^{(p/2 - 1)}(\cos(\theta)))^2
(\sin(\theta))^{p - 2} \,\mathrm{d}\theta.
The (squared) "Gegenbauer norm" of \psi
is
\|\psi\|_{G, p}^2 := \int_0^\pi \psi(\theta)^2
C_k^{(p/2 - 1)}(\cos(\theta)) (\sin(\theta))^{p - 2}\,\mathrm{d}\theta.
The previous expansion can be generalized for a 2-dimensional function
\psi
defined on [0, \pi] \times [0, \pi]
:
\psi(\theta_1, \theta_2) = \sum_{k = 0}^\infty \sum_{m = 0}^\infty
b_{k, m, p} C_k^{(p/2 - 1)}(\cos(\theta_1))
C_k^{(p/2 - 1)}(\cos(\theta_2)),
with coefficients
b_{k, m, p} := \frac{1}{c_{k, p} c_{m, p}} \int_0^\pi\int_0^\pi
\psi(\theta_1, \theta_2) C_k^{(p/2 - 1)}(\cos(\theta_1))
C_k^{(p/2 - 1)}(\cos(\theta_2)) (\sin(\theta_1))^{p - 2}
(\sin(\theta_2))^{p - 2}\,\mathrm{d}\theta_1\,\mathrm{d}\theta_2.
The (squared) "Gegenbauer norm" of \psi
is
\|\psi\|_{G, p}^2 := \int_0^\pi\int_0^\pi \psi(\theta_1, \theta_2)^2
C_k^{(p/2 - 1)}(\cos(\theta_1)) C_k^{(p/2 - 1)}(\cos(\theta_2))
(\sin(\theta_1))^{p - 2} (\sin(\theta_2))^{p - 2}
\,\mathrm{d}\theta_1\,\mathrm{d}\theta_2.
Usage
Gegen_polyn(theta, k, p)
Gegen_coefs(k, p, psi, Gauss = TRUE, N = 320, normalize = TRUE,
only_const = FALSE, tol = 1e-06, ...)
Gegen_series(theta, coefs, k, p, normalize = TRUE)
Gegen_norm(coefs, k, p, normalize = TRUE, cumulative = FALSE)
Gegen_polyn_2d(theta_1, theta_2, k, m, p)
Gegen_coefs_2d(k, m, p, psi, Gauss = TRUE, N = 320, normalize = TRUE,
only_const = FALSE, tol = 1e-06, ...)
Gegen_series_2d(theta_1, theta_2, coefs, k, m, p, normalize = TRUE)
Gegen_norm_2d(coefs, k, m, p, normalize = TRUE)
Arguments
theta , theta_1 , theta_2 |
vectors with values in |
k , m |
vectors with the orders of the Gegenbauer polynomials. Must
be integers larger or equal than |
p |
integer giving the dimension of the ambient space |
psi |
function defined in |
Gauss |
use a Gauss–Legendre quadrature rule of |
N |
number of points used in the
Gauss–Legendre quadrature for computing the Gegenbauer coefficients.
Defaults to |
normalize |
consider normalized coefficients (divided by
|
only_const |
return only the normalizing constants |
tol |
tolerance passed to |
... |
further arguments to be passed to |
coefs |
for |
cumulative |
return the cumulative norm for increasing truncation of
the series? Defaults to |
Details
The Gegen_polyn
function is a wrapper to the functions
gegenpoly_n
and
gegenpoly_array
in the
gsl-package
, which they interface the functions
defined in the header file gsl_sf_gegenbauer.h
(documented
here) of the
GNU Scientific Library.
Note that the function Gegen_polyn
computes the regular
unnormalized Gegenbauer polynomials.
For the case p = 2
, the Chebyshev polynomials are considered.
Value
-
Gegen_polyn
: a matrix of sizec(length(theta), length(k))
containing the evaluation of thelength(k)
Gegenbauer polynomials attheta
. -
Gegen_coefs
: a vector of sizelength(k)
containing the coefficientsb_{k, p}
. -
Gegen_series
: the evaluation of the truncated series expansion, a vector of sizelength(theta)
. -
Gegen_norm
: the Gegenbauer norm of the truncated series, a scalar ifcumulative = FALSE
, otherwise a vector of sizelength(k)
. -
Gegen_polyn_2d
: a 4-dimensional array of sizec(length(theta_1), length(theta_2), length(k), length(m))
containing the evaluation of thelength(k) * length(m)
2-dimensional Gegenbauer polynomials at the bivariate grid spanned bytheta_1
andtheta_2
. -
Gegen_coefs_2d
: a matrix of sizec(length(k), length(m))
containing the coefficientsb_{k, m, p}
. -
Gegen_series_2d
: the evaluation of the truncated series expansion, a matrix of sizec(length(theta_1), length(theta_2))
. -
Gegen_norm_2d
: the 2-dimensional Gegenbauer norm of the truncated series, a scalar.
References
Galassi, M., Davies, J., Theiler, J., Gough, B., Jungman, G., Alken, P., Booth, M., and Rossi, F. (2009) GNU Scientific Library Reference Manual. Network Theory Ltd. http://www.gnu.org/software/gsl/
NIST Digital Library of Mathematical Functions. Release 1.0.20 of 2018-09-15. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, and B. V. Saunders, eds. https://dlmf.nist.gov/
Examples
## Representation of Gegenbauer polynomials (Chebyshev polynomials for p = 2)
th <- seq(0, pi, l = 500)
k <- 0:3
old_par <- par(mfrow = c(2, 2))
for (p in 2:5) {
matplot(th, t(Gegen_polyn(theta = th, k = k, p = p)), lty = 1,
type = "l", main = substitute(p == d, list(d = p)),
axes = FALSE, xlab = expression(theta), ylab = "")
axis(1, at = c(0, pi / 4, pi / 2, 3 * pi / 4, pi),
labels = expression(0, pi / 4, pi / 2, 3 * pi / 4, pi))
axis(2); box()
mtext(text = expression({C[k]^{p/2 - 1}}(cos(theta))), side = 2,
line = 2, cex = 0.75)
legend("bottomleft", legend = paste("k =", k), lwd = 2, col = seq_along(k))
}
par(old_par)
## Coefficients and series in p = 2
# Function in [0, pi] to be projected in Chebyshev polynomials
psi <- function(th) -sin(th / 2)
# Coefficients
p <- 2
k <- 0:4
(coefs <- Gegen_coefs(k = k, p = p, psi = psi))
# Series
plot(th, psi(th), type = "l", axes = FALSE, xlab = expression(theta),
ylab = "", ylim = c(-1.25, 0))
axis(1, at = c(0, pi / 4, pi / 2, 3 * pi / 4, pi),
labels = expression(0, pi / 4, pi / 2, 3 * pi / 4, pi))
axis(2); box()
col <- viridisLite::viridis(length(coefs))
for (i in seq_along(coefs)) {
lines(th, Gegen_series(theta = th, coefs = coefs[1:(i + 1)], k = 0:i,
p = p), col = col[i])
}
lines(th, psi(th), lwd = 2)
## Coefficients and series in p = 3
# Function in [0, pi] to be projected in Gegenbauer polynomials
psi <- function(th) tan(th / 3)
# Coefficients
p <- 3
k <- 0:10
(coefs <- Gegen_coefs(k = k, p = p, psi = psi))
# Series
plot(th, psi(th), type = "l", axes = FALSE, xlab = expression(theta),
ylab = "", ylim = c(0, 2))
axis(1, at = c(0, pi / 4, pi / 2, 3 * pi / 4, pi),
labels = expression(0, pi / 4, pi / 2, 3 * pi / 4, pi))
axis(2); box()
col <- viridisLite::viridis(length(coefs))
for (i in seq_along(coefs)) {
lines(th, Gegen_series(theta = th, coefs = coefs[1:(i + 1)], k = 0:i,
p = p), col = col[i])
}
lines(th, psi(th), lwd = 2)
## Surface representation
# Surface in [0, pi]^2 to be projected in Gegenbauer polynomials
p <- 3
psi <- function(th_1, th_2) A_theta_x(theta = th_1, x = cos(th_2),
p = p, as_matrix = TRUE)
# Coefficients
k <- 0:20
m <- 0:10
coefs <- Gegen_coefs_2d(k = k, m = m, p = p, psi = psi)
# Series
th <- seq(0, pi, l = 100)
col <- viridisLite::viridis(20)
old_par <- par(mfrow = c(2, 2))
image(th, th, A_theta_x(theta = th, x = cos(th), p = p), axes = FALSE,
col = col, zlim = c(0, 1), xlab = expression(theta[1]),
ylab = expression(theta[2]), main = "Original")
axis(1, at = c(0, pi / 4, pi / 2, 3 * pi / 4, pi),
labels = expression(0, pi / 4, pi / 2, 3 * pi / 4, pi))
axis(2, at = c(0, pi / 4, pi / 2, 3 * pi / 4, pi),
labels = expression(0, pi / 4, pi / 2, 3 * pi / 4, pi))
box()
for(K in c(5, 10, 20)) {
A <- Gegen_series_2d(theta_1 = th, theta_2 = th,
coefs = coefs[1:(K + 1), ], k = 0:K, m = m, p = p)
image(th, th, A, axes = FALSE, col = col, zlim = c(0, 1),
xlab = expression(theta[1]), ylab = expression(theta[2]),
main = paste(K, "x", m[length(m)], "coefficients"))
axis(1, at = c(0, pi / 4, pi / 2, 3 * pi / 4, pi),
labels = expression(0, pi / 4, pi / 2, 3 * pi / 4, pi))
axis(2, at = c(0, pi / 4, pi / 2, 3 * pi / 4, pi),
labels = expression(0, pi / 4, pi / 2, 3 * pi / 4, pi))
box()
}
par(old_par)
Utilities for projected-ecdf statistics of spherical uniformity
Description
Computation of the kernels
\psi_p^W(\theta) := \int_{-1}^1 A_x(\theta)\,\mathrm{d}W(F_p(x)),
where A_x(\theta)
is the proportion of area surface of
S^{p - 1}
covered by the
intersection of two hyperspherical caps with common solid
angle \pi - \cos^{-1}(x)
and centers separated by
an angle \theta \in [0, \pi]
, F_p
is the distribution function
of the projected spherical uniform distribution,
and W
is a measure on [0, 1]
.
Also, computation of the Gegenbauer coefficients of
\psi_p^W
:
b_{k, p}^W := \frac{1}{c_{k, p}}\int_0^\pi \psi_p^W(\theta)
C_k^{p / 2 - 1}(\cos\theta)\,\mathrm{d}\theta.
These coefficients can also be computed via
b_{k, p}^W = \int_{-1}^1 a_{k, p}^x\,\mathrm{d}W(F_p(x))
for a certain function x\rightarrow a_{k, p}^x
. They serve to define
projected alternatives to uniformity.
Usage
psi_Pn(theta, q, type, Rothman_t = 1/3, tilde = FALSE, psi_Gauss = TRUE,
psi_N = 320, tol = 1e-06)
Gegen_coefs_Pn(k, p, type, Rothman_t = 1/3, Gauss = TRUE, N = 320,
tol = 1e-06, verbose = FALSE)
akx(x, p, k, sqr = FALSE)
f_locdev_Pn(p, type, K = 1000, N = 320, K_max = 10000, thre = 0.001,
Rothman_t = 1/3, verbose = FALSE)
Arguments
theta |
vector with values in |
q |
integer giving the dimension of the sphere |
type |
type of projected-ecdf test statistic. Must be either
|
Rothman_t |
|
tilde |
include the constant and bias term? Defaults to |
psi_Gauss |
use a Gauss–Legendre quadrature
rule with |
psi_N |
number of points used in the Gauss–Legendre quadrature for
computing the kernel function. Defaults to |
tol |
tolerance passed to |
k |
vector with the index of coefficients. |
p |
integer giving the dimension of the ambient space |
Gauss |
use a Gauss–Legendre quadrature rule of |
N |
number of points used in the
Gauss–Legendre quadrature for computing the Gegenbauer coefficients.
Defaults to |
verbose |
flag to print informative messages. Defaults to |
x |
evaluation points for |
sqr |
return the signed square root of |
K |
number of equispaced points on |
K_max |
integer giving the truncation of the series. Defaults to
|
thre |
proportion of norm not explained by the first terms of the
truncated series. Defaults to |
Details
The evaluation of \psi_p^W
and b_{k, p}^W
depends on the type of
projected-ecdf statistic:
PCvM: closed-form expressions for
\psi_p^W
andb_{k, p}^W
withp = 2, 3, 4
, numerical integration required forp \ge 5
.PAD: closed-form expressions for
\psi_2^W
andb_{k, 3}^W
, numerical integration required for\psi_p^W
withp \ge 3
andb_{k, p}^W
withp = 2
andp \ge 4
.PRt: closed-form expressions for
\psi_p^W
andb_{k, p}^W
for anyp \ge 2
.
See García-Portugués et al. (2023) for more details.
Value
-
psi_Pn
: a vector of sizelength(theta)
with the evaluation of\psi
. -
Gegen_coefs_Pn
: a vector of sizelength(k)
containing the coefficientsb_{k, p}^W
. -
akx
: a matrix of sizec(length(x), length(k))
containing the coefficientsa_{k, p}^x
. -
f_locdev_Pn
: the projected alternativef
as a function ready to be evaluated.
Author(s)
Eduardo García-Portugués and Paula Navarro-Esteban.
References
García-Portugués, E., Navarro-Esteban, P., Cuesta-Albertos, J. A. (2023) On a projection-based class of uniformity tests on the hypersphere. Bernoulli, 29(1):181–204. doi:10.3150/21-BEJ1454.
Examples
# Kernels in the projected-ecdf test statistics
k <- 0:10
coefs <- list()
(coefs$PCvM <- t(sapply(2:5, function(p)
Gegen_coefs_Pn(k = k, p = p, type = "PCvM"))))
(coefs$PAD <- t(sapply(2:5, function(p)
Gegen_coefs_Pn(k = k, p = p, type = "PAD"))))
(coefs$PRt <- t(sapply(2:5, function(p)
Gegen_coefs_Pn(k = k, p = p, type = "PRt"))))
# Gegenbauer expansion
th <- seq(0, pi, length.out = 501)[-501]
old_par <- par(mfrow = c(3, 4))
for (type in c("PCvM", "PAD", "PRt")) {
for (p in 2:5) {
plot(th, psi_Pn(theta = th, q = p - 1, type = type), type = "l",
main = paste0(type, ", p = ", p), xlab = expression(theta),
ylab = expression(psi(theta)), axes = FALSE, ylim = c(-1.5, 1))
axis(1, at = c(0, pi / 4, pi / 2, 3 * pi / 4, pi),
labels = expression(0, pi / 4, pi / 2, 3 * pi / 4, pi))
axis(2); box()
lines(th, Gegen_series(theta = th, coefs = coefs[[type]][p - 1, ],
k = k, p = p), col = 2)
}
}
par(old_par)
# Analytical coefficients vs. numerical integration
test_coef <- function(type, p, k = 0:20) {
plot(k, log1p(abs(Gegen_coefs_Pn(k = k, p = p, type = type))),
ylab = "Coefficients", main = paste0(type, ", p = ", p))
points(k, log1p(abs(Gegen_coefs(k = k, p = p, psi = psi_Pn, type = type,
q = p - 1))), col = 2)
legend("topright", legend = c("log(1 + Gegen_coefs_Pn))",
"log(1 + Gegen_coefs(psi_Pn))"),
lwd = 2, col = 1:2)
}
# PCvM statistic
old_par <- par(mfrow = c(2, 2))
for (p in 2:5) test_coef(type = "PCvM", p = p)
par(old_par)
# PAD statistic
old_par <- par(mfrow = c(2, 2))
for (p in 2:5) test_coef(type = "PAD", p = p)
par(old_par)
# PRt statistic
old_par <- par(mfrow = c(2, 2))
for (p in 2:5) test_coef(type = "PRt", p = p)
par(old_par)
# akx
akx(x = seq(-1, 1, l = 5), k = 1:4, p = 2)
akx(x = 0, k = 1:4, p = 3)
# PRt alternative to uniformity
z <- seq(-1, 1, l = 1e3)
p <- c(2:5, 10, 15, 17)
col <- viridisLite::viridis(length(p))
plot(z, f_locdev_Pn(p = p[1], type = "PRt")(z), type = "s",
col = col[1], ylim = c(0, 0.6), ylab = expression(f[Rt](z)))
for (k in 2:length(p)) {
lines(z, f_locdev_Pn(p = p[k], type = "PRt")(z), type = "s", col = col[k])
}
legend("topleft", legend = paste("p =", p), col = col, lwd = 2)
Shortest angles matrix
Description
Efficient computation of the shortest angles matrix
\boldsymbol\Psi
, defined as
\Psi_{ij}:=\cos^{-1}({\bf X}_i'{\bf X}_j),\quad
i,j=1,\ldots,n,
for a sample {\bf X}_1,\ldots,{\bf X}_n\in S^{p-1}:=\{{\bf x}\in
R^p:||{\bf x}||=1\}
, p\ge 2
.
For a circular sample \Theta_1, \ldots, \Theta_n \in [0, 2\pi)
,
\boldsymbol\Psi
can be expressed as
\Psi_{ij}=\pi-|\pi-|\Theta_i-\Theta_j||,\quad
i,j=1,\ldots,n.
Usage
Psi_mat(data, ind_tri = 0L, use_ind_tri = FALSE, scalar_prod = FALSE,
angles_diff = FALSE)
upper_tri_ind(n)
Arguments
data |
an array of size |
ind_tri |
if |
use_ind_tri |
use the already computed vector index |
scalar_prod |
return the scalar products
|
angles_diff |
return the (unwrapped) angles difference
|
n |
sample size, used to determine the index vector that gives the
upper triangular part of |
Value
-
Psi_mat
: a matrix of sizec(n * (n - 1) / 2, M)
containing, for each column, the vector half of\boldsymbol\Psi
for each of theM
samples. -
upper_tri_ind
: a matrix of sizen * (n - 1) / 2
containing the 0-based linear indexes for extracting the upper triangular matrix of a matrix of sizec(n, n)
, diagonal excluded, assuming column-major order.
Warning
Be careful on avoiding the next bad usages of Psi_mat
, which will
produce spurious results:
The directions in
data
do not have unit norm when Cartesian coordinates are employed.The entries of
data
are not in[0, 2\pi)
when polar coordinates are employed.-
ind_tri
is a vector of sizen * (n - 1) / 2
that does not contain the indexes produced byupper_tri_ind(n)
.
Examples
# Shortest angles
n <- 5
X <- r_unif_sph(n = n, p = 2, M = 2)
Theta <- X_to_Theta(X)
dim(Theta) <- c(n, 1, 2)
Psi_mat(X)
Psi_mat(Theta)
# Precompute ind_tri
ind_tri <- upper_tri_ind(n)
Psi_mat(X, ind_tri = ind_tri, use_ind_tri = TRUE)
# Compare with R
A <- acos(tcrossprod(X[, , 1]))
ind <- upper.tri(A)
A[ind]
# Reconstruct matrix
Psi_vec <- Psi_mat(Theta[, , 1, drop = FALSE])
Psi <- matrix(0, nrow = n, ncol = n)
Psi[upper.tri(Psi)] <- Psi_vec
Psi <- Psi + t(Psi)
Asymptotic distributions of Sobolev statistics of spherical uniformity
Description
Approximated density, distribution, and quantile functions for
the asymptotic null distributions of Sobolev statistics of uniformity
on S^{p-1}:=\{{\bf x}\in R^p:||{\bf x}||=1\}
. These asymptotic distributions are infinite
weighted sums of (central) chi squared random variables:
\sum_{k = 1}^\infty v_k^2 \chi^2_{d_{p, k}},
where
d_{p, k} := {{p + k - 3}\choose{p - 2}} + {{p + k - 2}\choose{p - 2}}
is the dimension of the space of eigenfunctions of the Laplacian on
S^{p-1}
, p\ge 2
, associated to the k
-th
eigenvalue, k\ge 1
.
Usage
d_p_k(p, k, log = FALSE)
weights_dfs_Sobolev(p, K_max = 1000, thre = 0.001, type, Rothman_t = 1/3,
Pycke_q = 0.5, Riesz_s = 1, Poisson_rho = 0.5, Softmax_kappa = 1,
Stereo_a = 0, Sobolev_vk2 = c(0, 0, 1), log = FALSE, verbose = TRUE,
Gauss = TRUE, N = 320, tol = 1e-06, force_positive = TRUE,
x_tail = NULL)
d_Sobolev(x, p, type, method = c("I", "SW", "HBE")[1], K_max = 1000,
thre = 0.001, Rothman_t = 1/3, Pycke_q = 0.5, Riesz_s = 1,
Poisson_rho = 0.5, Softmax_kappa = 1, Stereo_a = 0,
Sobolev_vk2 = c(0, 0, 1), ncps = 0, verbose = TRUE, N = 320,
x_tail = NULL, ...)
p_Sobolev(x, p, type, method = c("I", "SW", "HBE", "MC")[1], K_max = 1000,
thre = 0.001, Rothman_t = 1/3, Pycke_q = 0.5, Riesz_s = 1,
Poisson_rho = 0.5, Softmax_kappa = 1, Stereo_a = 0,
Sobolev_vk2 = c(0, 0, 1), ncps = 0, verbose = TRUE, N = 320,
x_tail = NULL, ...)
q_Sobolev(u, p, type, method = c("I", "SW", "HBE", "MC")[1], K_max = 1000,
thre = 0.001, Rothman_t = 1/3, Pycke_q = 0.5, Riesz_s = 1,
Poisson_rho = 0.5, Softmax_kappa = 1, Stereo_a = 0,
Sobolev_vk2 = c(0, 0, 1), ncps = 0, verbose = TRUE, N = 320,
x_tail = NULL, ...)
Arguments
p |
integer giving the dimension of the ambient space |
k |
sequence of integer indexes. |
log |
compute the logarithm of |
K_max |
integer giving the truncation of the series that compute the
asymptotic p-value of a Sobolev test. Defaults to |
thre |
error threshold for the tail probability given by the
the first terms of the truncated series of a Sobolev test. Defaults to
|
type |
name of the Sobolev statistic, using the naming from
|
Rothman_t |
|
Pycke_q |
|
Riesz_s |
|
Poisson_rho |
|
Softmax_kappa |
|
Stereo_a |
|
Sobolev_vk2 |
weights for the finite Sobolev test. A non-negative
vector or matrix. Defaults to |
verbose |
output information about the truncation? Defaults to
|
Gauss |
use a Gauss–Legendre quadrature rule of |
N |
number of points used in the
Gauss–Legendre quadrature for computing the Gegenbauer coefficients.
Defaults to |
tol |
tolerance passed to |
force_positive |
set negative weights to zero? Defaults to |
x_tail |
scalar evaluation point for determining the upper tail
probability. If |
x |
vector of quantiles. |
method |
method for approximating the density, distribution, or
quantile function of the weighted sum of chi squared random variables. Must
be |
ncps |
non-centrality parameters. Either |
... |
further parameters passed to |
u |
vector of probabilities. |
Details
The truncation of \sum_{k = 1}^\infty v_k^2 \chi^2_{d_{p, k}}
is
done to the first K_max
terms and then up to the index such that
the first terms explain the tail probability at the x_tail
with
an absolute error smaller than thre
(see details in
cutoff_wschisq
). This automatic truncation takes place when
calling *_Sobolev
. Setting thre = 0
truncates to K_max
terms exactly. If the series only contains odd or even non-zero terms, then
only K_max / 2
addends are effectively taken into account
in the first truncation.
Value
-
d_p_k
: a vector of sizelength(k)
with the evaluation ofd_{p,k}
. -
weights_dfs_Sobolev
: a list with entriesweights
anddfs
, automatically truncated according toK_max
andthre
(see details). -
d_Sobolev
: density function evaluated atx
, a vector. -
p_Sobolev
: distribution function evaluated atx
, a vector. -
q_Sobolev
: quantile function evaluated atu
, a vector.
Author(s)
Eduardo García-Portugués and Paula Navarro-Esteban.
Examples
# Circular-specific statistics
curve(p_Sobolev(x = x, p = 2, type = "Watson", method = "HBE"),
n = 2e2, ylab = "Distribution", main = "Watson")
curve(p_Sobolev(x = x, p = 2, type = "Rothman", method = "HBE"),
n = 2e2, ylab = "Distribution", main = "Rothman")
curve(p_Sobolev(x = x, p = 2, type = "Pycke_q", method = "HBE"), to = 10,
n = 2e2, ylab = "Distribution", main = "Pycke_q")
curve(p_Sobolev(x = x, p = 2, type = "Hermans_Rasson", method = "HBE"),
to = 10, n = 2e2, ylab = "Distribution", main = "Hermans_Rasson")
# Statistics for arbitrary dimensions
test_statistic <- function(type, to = 1, pmax = 5, M = 1e3, ...) {
col <- viridisLite::viridis(pmax - 1)
curve(p_Sobolev(x = x, p = 2, type = type, method = "MC", M = M,
...), to = to, n = 2e2, col = col[pmax - 1],
ylab = "Distribution", main = type, ylim = c(0, 1))
for (p in 3:pmax) {
curve(p_Sobolev(x = x, p = p, type = type, method = "MC", M = M,
...), add = TRUE, n = 2e2, col = col[pmax - p + 1])
}
legend("bottomright", legend = paste("p =", 2:pmax), col = rev(col),
lwd = 2)
}
# Ajne
test_statistic(type = "Ajne")
# Gine_Gn
test_statistic(type = "Gine_Gn", to = 1.5)
# Gine_Fn
test_statistic(type = "Gine_Fn", to = 2)
# Bakshaev
test_statistic(type = "Bakshaev", to = 3)
# Riesz
test_statistic(type = "Riesz", Riesz_s = 0.5, to = 3)
# PCvM
test_statistic(type = "PCvM", to = 0.6)
# PAD
test_statistic(type = "PAD", to = 3)
# PRt
test_statistic(type = "PRt", Rothman_t = 0.5)
# Quantiles
p <- c(2, 3, 4, 11)
t(sapply(p, function(p) q_Sobolev(u = c(0.10, 0.05, 0.01), p = p,
type = "PCvM")))
t(sapply(p, function(p) q_Sobolev(u = c(0.10, 0.05, 0.01), p = p,
type = "PAD")))
t(sapply(p, function(p) q_Sobolev(u = c(0.10, 0.05, 0.01), p = p,
type = "PRt")))
# Series truncation for thre = 1e-5
sapply(p, function(p) length(weights_dfs_Sobolev(p = p, type = "PCvM")$dfs))
sapply(p, function(p) length(weights_dfs_Sobolev(p = p, type = "PRt")$dfs))
sapply(p, function(p) length(weights_dfs_Sobolev(p = p, type = "PAD")$dfs))
Transformation between different coefficients in Sobolev statistics
Description
Given a Sobolev statistic
S_{n, p} = \sum_{i, j = 1}^n \psi(\cos^{-1}({\bf X}_i'{\bf X}_j)),
for a sample {\bf X}_1, \ldots, {\bf X}_n \in S^{p - 1} := \{{\bf x}
\in R^p : ||{\bf x}|| = 1\}
, p\ge 2
, three important sequences
are related to S_{n, p}
.
-
Gegenbauer coefficients
\{b_{k, p}\}
of\psi_p
(see, e.g., the projected-ecdf statistics), given byb_{k, p} := \frac{1}{c_{k, p}}\int_0^\pi \psi_p(\theta) C_k^{p / 2 - 1}(\cos\theta)\,\mathrm{d}\theta.
Weights
\{v_{k, p}^2\}
of the asymptotic distribution of the Sobolev statistic,\sum_{k = 1}^\infty v_k^2 \chi^2_{d_{p, k}}
, given byv_{k, p}^2 = \left(1 + \frac{2k}{p - 2}\right)^{-1} b_{k, p}, \quad p \ge 3.
Gegenbauer coefficients
\{u_{k, p}\}
of the local projected alternative associated toS_{n, p}
, given byu_{k, p} = \left(1 + \frac{2k}{p - 2}\right) v_{k, p}, \quad p \ge 3.
For p = 2
, the factor (1 + 2k / (p - 2))
is replaced by 2
.
Usage
bk_to_vk2(bk, p, log = FALSE)
bk_to_uk(bk, p, signs = 1)
vk2_to_bk(vk2, p, log = FALSE)
vk2_to_uk(vk2, p, signs = 1)
uk_to_vk2(uk, p)
uk_to_bk(uk, p)
Arguments
bk |
coefficients |
p |
integer giving the dimension of the ambient space |
log |
do operations in log scale (log-in, log-out)? Defaults to
|
signs |
signs of the coefficients |
vk2 |
squared coefficients |
uk |
coefficients |
Details
See more details in Prentice (1978) and García-Portugués et al. (2023). The
adequate signs of uk
for the "PRt"
Rothman test
can be retrieved with akx
and sqr = TRUE
, see the
examples.
Value
The corresponding vectors of coefficients vk2
, bk
, or
uk
, depending on the call.
References
García-Portugués, E., Navarro-Esteban, P., Cuesta-Albertos, J. A. (2023) On a projection-based class of uniformity tests on the hypersphere. Bernoulli, 29(1):181–204. doi:10.3150/21-BEJ1454.
Prentice, M. J. (1978). On invariant tests of uniformity for directions and orientations. The Annals of Statistics, 6(1):169–176. doi:10.1214/aos/1176344075
Examples
# bk, vk2, and uk for the PCvM test in p = 3
(bk <- Gegen_coefs_Pn(k = 1:5, type = "PCvM", p = 3))
(vk2 <- bk_to_vk2(bk = bk, p = 3))
(uk <- bk_to_uk(bk = bk, p = 3))
# vk2 is the same as
weights_dfs_Sobolev(K_max = 10, thre = 0, p = 3, type = "PCvM")$weights
# bk and uk for the Rothman test in p = 3, with adequate signs
t <- 1 / 3
(bk <- Gegen_coefs_Pn(k = 1:5, type = "PRt", p = 3, Rothman_t = t))
(ak <- akx(x = drop(q_proj_unif(t, p = 3)), p = 3, k = 1:5, sqr = TRUE))
(uk <- bk_to_uk(bk = bk, p = 3, signs = ak))
Conversion between angular and Cartesian coordinates of the (hyper)sphere
Description
Transforms the angles (\theta_1,\ldots,\theta_{p-1})'
in
[0,\pi)^{p-2}\times[-\pi,\pi)
into the Cartesian coordinates
(\cos(x_1),\sin(x_1)\cos(x_2),\ldots,
\sin(x_1)\cdots\sin(x_{p-2})\cos(x_{p-1}),
\sin(x_1)\cdots\sin(x_{p-2})\sin(x_{p-1}))'
of S^{p-1}
, and vice versa.
Usage
angles_to_sphere(theta)
sphere_to_angles(x)
Arguments
theta |
matrix of size |
x |
matrix of size |
Value
For angles_to_sphere
, the matrix x
. For
sphere_to_angles
, the matrix theta
.
Examples
# Check changes of coordinates
sphere_to_angles(angles_to_sphere(c(pi / 2, 0, pi)))
sphere_to_angles(angles_to_sphere(rbind(c(pi / 2, 0, pi), c(pi, pi / 2, 0))))
angles_to_sphere(sphere_to_angles(c(0, sqrt(0.5), sqrt(0.1), sqrt(0.4))))
angles_to_sphere(sphere_to_angles(
rbind(c(0, sqrt(0.5), sqrt(0.1), sqrt(0.4)),
c(0, sqrt(0.5), sqrt(0.5), 0),
c(0, 1, 0, 0),
c(0, 0, 0, -1),
c(0, 0, 1, 0))))
# Circle
sphere_to_angles(angles_to_sphere(0))
sphere_to_angles(angles_to_sphere(cbind(0:3)))
angles_to_sphere(cbind(sphere_to_angles(rbind(c(0, 1), c(1, 0)))))
angles_to_sphere(cbind(sphere_to_angles(rbind(c(0, 1)))))
Available circular and (hyper)spherical uniformity tests
Description
Listing of the tests implemented in the sphunif
package.
Usage
avail_cir_tests
avail_sph_tests
Format
An object of class character
of length 33.
An object of class character
of length 18.
Value
A character vector whose elements are valid inputs for the
type
argument in unif_test
, unif_stat
,
unif_stat_distr
, and unif_stat_MC
.
avail_cir_tests
provides the available circular tests and
avail_sph_tests
the (hyper)spherical tests.
Examples
# Circular tests
avail_cir_tests
# Spherical tests
avail_sph_tests
The incomplete beta function and its inverse
Description
Computes the incomplete beta function
I_x(a,b):=\int_0^x u^{a-1}(1-u)^{b-1}\,d\mathrm{u},\quad a,b>0
and its inverse function.
Usage
beta_inc(x, a, b, lower_tail = TRUE, log = FALSE)
beta_inc_inv(u, a, b, lower_tail = TRUE, log = FALSE)
Arguments
x |
a vector of size |
a , b |
scalars giving the parameters of the beta function. |
lower_tail |
accumulate the probability from the lower tail? If
|
log |
use log-scale? If |
u |
a vector of probabilities of size |
Details
The functions are mere wrappers to R's internal pbeta
and
qbeta
functions.
Value
-
beta_inc
: a matrix of sizec(nx, 1)
with the evaluation of the incomplete beta function atx
. -
beta_inc_inv
: a matrix of sizec(nu, 1)
with the evaluation of the inverse incomplete beta function atu
.
Density and distribution of a chi squared
Description
Computation of the density and distribution functions of a chi squared.
Usage
d_chisq(x, df, ncp = 0L)
p_chisq(x, df, ncp = 0L)
Arguments
x |
a vector of size |
df |
degrees of freedom. |
ncp |
non-centrality parameter. |
Value
A matrix of size c(nx, 1)
with the evaluation of the
density or distribution function at x
.
Transforming between polar and Cartesian coordinates
Description
Transformation between a matrix Theta
containing
M
circular samples of size n
on [0, 2\pi)
and an array
X
containing the associated Cartesian coordinates on
S^1:=\{{\bf x}\in R^2:||{\bf x}||=1\}
.
Usage
Theta_to_X(Theta)
X_to_Theta(X)
Arguments
Theta |
a matrix of size |
X |
an array of size |
Value
-
Theta_to_X
: the correspondingX
. -
X_to_Theta
: the correspondingTheta
.
Examples
# Sample
Theta <- r_unif_cir(n = 10, M = 2)
X <- r_unif_sph(n = 10, p = 2, M = 2)
# Check equality
sum(abs(X - Theta_to_X(X_to_Theta(X))))
sum(abs(Theta - X_to_Theta(Theta_to_X(Theta))))
Circular gaps
Description
Computation of the circular gaps of an angular sample
\Theta_1,\ldots,\Theta_n
on [0, 2\pi)
, defined as
\Theta_{(2)} - \Theta_{(1)},\ldots,\Theta_{(n)} - \Theta_{(n - 1)},
2\pi - \Theta_{(n)} - \Theta_{(1)},
where
0 \le \Theta_{(1)} \le \Theta_{(2)} \le \ldots \le
\Theta_{(n)} \le 2\pi.
Usage
cir_gaps(Theta, sorted = FALSE)
Arguments
Theta |
a matrix of size |
sorted |
are the columns of |
Value
A matrix of size c(n, M)
containing the n
circular
gaps for each of the M
circular samples.
Warning
Be careful on avoiding the next bad usages of cir_gaps
, which will
produce spurious results:
The entries of
Theta
are not in[0, 2\pi)
.-
Theta
is not sorted increasingly whendata_sorted = TRUE
.
Examples
Theta <- cbind(c(pi, 0, 3 * pi / 2), c(0, 3 * pi / 2, pi), c(5, 3, 1))
cir_gaps(Theta)
Statistics for testing circular uniformity
Description
Low-level implementation of several statistics for assessing
circular uniformity on [0, 2\pi)
or, equivalently,
S^1:=\{{\bf x}\in R^2:||{\bf x}||=1\}
.
Usage
cir_stat_Kuiper(Theta, sorted = FALSE, KS = FALSE, Stephens = FALSE)
cir_stat_Watson(Theta, sorted = FALSE, CvM = FALSE, Stephens = FALSE)
cir_stat_Watson_1976(Theta, sorted = FALSE, minus = FALSE)
cir_stat_Range(Theta, sorted = FALSE, gaps_in_Theta = FALSE,
max_gap = TRUE)
cir_stat_Rao(Theta, sorted = FALSE, gaps_in_Theta = FALSE)
cir_stat_Greenwood(Theta, sorted = FALSE, gaps_in_Theta = FALSE)
cir_stat_Log_gaps(Theta, sorted = FALSE, gaps_in_Theta = FALSE,
abs_val = TRUE)
cir_stat_Vacancy(Theta, a = 2 * pi, sorted = FALSE,
gaps_in_Theta = FALSE)
cir_stat_Max_uncover(Theta, a = 2 * pi, sorted = FALSE,
gaps_in_Theta = FALSE)
cir_stat_Num_uncover(Theta, a = 2 * pi, sorted = FALSE,
gaps_in_Theta = FALSE, minus_val = TRUE)
cir_stat_Gini(Theta, sorted = FALSE, gaps_in_Theta = FALSE)
cir_stat_Gini_squared(Theta, sorted = FALSE, gaps_in_Theta = FALSE)
cir_stat_Ajne(Theta, Psi_in_Theta = FALSE)
cir_stat_Rothman(Theta, Psi_in_Theta = FALSE, t = 1/3)
cir_stat_Hodges_Ajne(Theta, asymp_std = FALSE, sorted = FALSE,
use_Cressie = TRUE)
cir_stat_Cressie(Theta, t = 1/3, sorted = FALSE)
cir_stat_FG01(Theta, sorted = FALSE)
cir_stat_Rayleigh(Theta, m = 1L)
cir_stat_Bingham(Theta)
cir_stat_Hermans_Rasson(Theta, Psi_in_Theta = FALSE)
cir_stat_Gine_Gn(Theta, Psi_in_Theta = FALSE)
cir_stat_Gine_Fn(Theta, Psi_in_Theta = FALSE)
cir_stat_Pycke(Theta, Psi_in_Theta = FALSE)
cir_stat_Pycke_q(Theta, Psi_in_Theta = FALSE, q = 0.5)
cir_stat_Bakshaev(Theta, Psi_in_Theta = FALSE)
cir_stat_Riesz(Theta, Psi_in_Theta = FALSE, s = 1)
cir_stat_PCvM(Theta, Psi_in_Theta = FALSE)
cir_stat_PRt(Theta, Psi_in_Theta = FALSE, t = 1/3)
cir_stat_PAD(Theta, Psi_in_Theta = FALSE, AD = FALSE, sorted = FALSE)
cir_stat_Poisson(Theta, Psi_in_Theta = FALSE, rho = 0.5)
cir_stat_Softmax(Theta, Psi_in_Theta = FALSE, kappa = 1)
cir_stat_CCF09(Theta, dirs, K_CCF09 = 25L, original = FALSE)
Arguments
Theta |
a matrix of size |
sorted |
are the columns of |
KS |
compute the Kolmogorov-Smirnov statistic (which is
not invariant under origin shifts) instead of the Kuiper statistic?
Defaults to |
Stephens |
compute Stephens (1970) modification so that the null distribution of the is less dependent on the sample size? The modification does not alter the test decision. |
CvM |
compute the Cramér-von Mises statistic (which is not
invariant under origin shifts) instead of the Watson statistic? Defaults to
|
minus |
compute the invariant |
gaps_in_Theta |
does |
max_gap |
compute the maximum gap for the range statistic? If
|
abs_val |
return the absolute value of the Darling's log gaps
statistic? If |
a |
either:
|
minus_val |
return the negative value of the (standardized) number of
uncovered spacings? If |
Psi_in_Theta |
does |
t |
|
asymp_std |
normalize the Hodges-Ajne statistic in terms of its
asymptotic distribution? Defaults to |
use_Cressie |
compute the Hodges-Ajne statistic as a particular case
of the Cressie statistic? Defaults to |
m |
integer |
q |
|
s |
|
AD |
compute the Anderson-Darling statistic (which is not
invariant under origin shifts) instead of the Projected Anderson-Darling
statistic? Defaults to |
rho |
|
kappa |
|
dirs |
a matrix of size |
K_CCF09 |
integer giving the truncation of the series present in the
asymptotic distribution of the Kolmogorov-Smirnov statistic. Defaults to
|
original |
return the CCF09 statistic as originally defined?
If |
Details
Descriptions and references for most of the statistics are available in García-Portugués and Verdebout (2018).
The statistics cir_stat_PCvM
and cir_stat_PRt
are provided
for the sake of completion, but they equal the more efficiently-implemented
statistics 2 * cir_stat_Watson
and cir_stat_Rothman
,
respectively.
Value
A matrix of size c(M, 1)
containing the statistics for each
of the M
samples.
Warning
Be careful on avoiding the next bad usages of the functions, which will produce spurious results:
The entries of
Theta
are not in[0, 2\pi)
.-
Theta
does not contain the circular gaps whengaps_in_Theta = TRUE
. -
Theta
is not sorted increasingly whendata_sorted = TRUE
. -
Theta
does not containPsi_mat(array(Theta, dim = c(n, 1, M)))
whenPsi_in_Theta = TRUE
. The directions in
dirs
do not have unit norm.
References
García-Portugués, E. and Verdebout, T. (2018) An overview of uniformity tests on the hypersphere. arXiv:1804.00286. doi:10.48550/arXiv.1804.00286.
Examples
## Sample uniform circular data
M <- 2
n <- 100
set.seed(987202226)
Theta <- r_unif_cir(n = n, M = M)
## Tests based on the empirical cumulative distribution function
# Kuiper
cir_stat_Kuiper(Theta)
cir_stat_Kuiper(Theta, Stephens = TRUE)
# Watson
cir_stat_Watson(Theta)
cir_stat_Watson(Theta, Stephens = TRUE)
# Watson (1976)
cir_stat_Watson_1976(Theta)
## Partition-based tests
# Ajne
Theta_array <- Theta
dim(Theta_array) <- c(nrow(Theta), 1, ncol(Theta))
Psi <- Psi_mat(Theta_array)
cir_stat_Ajne(Theta)
cir_stat_Ajne(Psi, Psi_in_Theta = TRUE)
# Rothman
cir_stat_Rothman(Theta, t = 0.5)
cir_stat_Rothman(Theta)
cir_stat_Rothman(Psi, Psi_in_Theta = TRUE)
# Hodges-Ajne
cir_stat_Hodges_Ajne(Theta)
cir_stat_Hodges_Ajne(Theta, use_Cressie = FALSE)
# Cressie
cir_stat_Cressie(Theta, t = 0.5)
cir_stat_Cressie(Theta)
# FG01
cir_stat_FG01(Theta)
## Spacings-based tests
# Range
cir_stat_Range(Theta)
# Rao
cir_stat_Rao(Theta)
# Greenwood
cir_stat_Greenwood(Theta)
# Log gaps
cir_stat_Log_gaps(Theta)
# Vacancy
cir_stat_Vacancy(Theta)
# Maximum uncovered spacing
cir_stat_Max_uncover(Theta)
# Number of uncovered spacings
cir_stat_Num_uncover(Theta)
# Gini mean difference
cir_stat_Gini(Theta)
# Gini mean squared difference
cir_stat_Gini_squared(Theta)
## Sobolev tests
# Rayleigh
cir_stat_Rayleigh(Theta)
cir_stat_Rayleigh(Theta, m = 2)
# Bingham
cir_stat_Bingham(Theta)
# Hermans-Rasson
cir_stat_Hermans_Rasson(Theta)
cir_stat_Hermans_Rasson(Psi, Psi_in_Theta = TRUE)
# Gine Fn
cir_stat_Gine_Fn(Theta)
cir_stat_Gine_Fn(Psi, Psi_in_Theta = TRUE)
# Gine Gn
cir_stat_Gine_Gn(Theta)
cir_stat_Gine_Gn(Psi, Psi_in_Theta = TRUE)
# Pycke
cir_stat_Pycke(Theta)
cir_stat_Pycke(Psi, Psi_in_Theta = TRUE)
# Pycke q
cir_stat_Pycke_q(Theta)
cir_stat_Pycke_q(Psi, Psi_in_Theta = TRUE)
# Bakshaev
cir_stat_Bakshaev(Theta)
cir_stat_Bakshaev(Psi, Psi_in_Theta = TRUE)
# Riesz
cir_stat_Riesz(Theta, s = 1)
cir_stat_Riesz(Psi, Psi_in_Theta = TRUE, s = 1)
# Projected Cramér-von Mises
cir_stat_PCvM(Theta)
cir_stat_PCvM(Psi, Psi_in_Theta = TRUE)
# Projected Rothman
cir_stat_PRt(Theta, t = 0.5)
cir_stat_PRt(Theta)
cir_stat_PRt(Psi, Psi_in_Theta = TRUE)
# Projected Anderson-Darling
cir_stat_PAD(Theta)
cir_stat_PAD(Psi, Psi_in_Theta = TRUE)
## Other tests
# CCF09
dirs <- r_unif_sph(n = 3, p = 2, M = 1)[, , 1]
cir_stat_CCF09(Theta, dirs = dirs)
## Connection of Kuiper and Watson statistics with KS and CvM, respectively
# Rotate sample for KS and CvM
alpha <- seq(0, 2 * pi, l = 1e4)
KS_alpha <- sapply(alpha, function(a) {
cir_stat_Kuiper((Theta[, 2, drop = FALSE] + a) %% (2 * pi), KS = TRUE)
})
CvM_alpha <- sapply(alpha, function(a) {
cir_stat_Watson((Theta[, 2, drop = FALSE] + a) %% (2 * pi), CvM = TRUE)
})
AD_alpha <- sapply(alpha, function(a) {
cir_stat_PAD((Theta[, 2, drop = FALSE] + a) %% (2 * pi), AD = TRUE)
})
# Kuiper is the maximum rotated KS
plot(alpha, KS_alpha, type = "l")
abline(h = cir_stat_Kuiper(Theta[, 2, drop = FALSE]), col = 2)
points(alpha[which.max(KS_alpha)], max(KS_alpha), col = 2, pch = 16)
# Watson is the minimum rotated CvM
plot(alpha, CvM_alpha, type = "l")
abline(h = cir_stat_Watson(Theta[, 2, drop = FALSE]), col = 2)
points(alpha[which.min(CvM_alpha)], min(CvM_alpha), col = 2, pch = 16)
# Anderson-Darling is the average rotated AD?
plot(alpha, AD_alpha, type = "l")
abline(h = cir_stat_PAD(Theta[, 2, drop = FALSE]), col = 2)
abline(h = mean(AD_alpha), col = 3)
Comet orbits
Description
Comet orbits data from the
JPL Small-Body Database Search Engine. The normal vector of a comet orbit
represents is a vector on S^2
.
Usage
comets
Format
A data frame with 3798 rows and 13 variables:
- id
database ID.
- spkid
object primary SPK-ID.
- full_name
full name/designation following the IUA naming convention.
- pdes
object primary designation.
- frag
flag indicating if the record is a comet fragment.
- diameter
diameter from equivalent sphere (in km).
- i
inclination; the orbit's plane angle with respect to the ecliptic plane, in radians in
[0, \pi]
.- om
longitude of the ascending node; the counterclockwise angle from the vector pointing to the First Point of Aries and that pointing to the ascending node (the intersection between orbit and ecliptic plane), in radians in
[0, 2\pi)
. (Both vectors are heliocentric and within the ecliptic plane.)- per_y
sidereal orbital period (in years).
- class
orbit classification. A factor with levels given below.
- e
eccentricity of the orbit.
- a
semi-major axis of the orbit (in AU).
- w
argument of perihelion; the (shortest) angle between the vector pointing to the ascending node and that pointing to the perihelion (nearest orbit point to the Sun), in radians in
[0, \pi]
. (Both vectors are heliocentric and within the orbit's plane.)- first_obs, last_obs
Date
of the first and last recorded observations used in the orbit fit.- ccf09
flag indicating if the comet was considered in the data application in Cuesta-Albertos et al. (2009); see details below.
Details
The normal vector to the ecliptic plane of the comet with inclination
i
and longitude of the ascending node \omega
is
(\sin(i) \sin(\omega), -\sin(i) \cos(\omega), \cos(i))'.
A prograde comet has positive \cos(i)
, negative
\cos(i)
represents a retrograde comet.
class
has the following levels:
-
COM
: comet orbit not matching any defined orbit class. -
CTc
: Chiron-type comet, as defined by Levison and Duncan (T_Jupiter > 3; a > a_Jupiter). -
ETc
: Encke-type comet, as defined by Levison and Duncan (T_Jupiter > 3; a < a_Jupiter). -
HTC
: Halley-type comet, classical definition (20y < P < 200y). -
HYP
: comets on hyperbolic orbits. -
JFc
: Jupiter-family comet, as defined by Levison and Duncan (2 < T_Jupiter < 3). -
JFC
: Jupiter-family comet, classical definition (P < 20y). -
PAR
: comets on parabolic orbits.
Hyperbolic and parabolic comets are not periodic; only elliptical comets are periodic.
The ccf09
variable gives the observations considered in
Cuesta-Albertos et al. (2009) after fetching in the database in 2007-12-14
for the comets such that !(class %in% c("HYP", "PAR")) &
per_y >= 200
. Due to the dynamic nature of the data, more comets were added
to the database since 2007 and also some past records were updated.
The script performing the data preprocessing is available at
comets.R
. The data was retrieved on 2022-05-28. A previous version
of this dataset based on the old NASA's JPL Database (accessed on
2020-05-07) is available at
comets-old.rda
and was obtained with
comets-old.R
.
Source
https://ssd.jpl.nasa.gov/tools/sbdb_query.html
References
Cuesta-Albertos, J. A., Cuevas, A., Fraiman, R. (2009) On projection-based tests for directional and compositional data. Statistics and Computing, 19:367–380. doi:10.1007/s11222-008-9098-3
Examples
# Load data
data("comets")
# Add normal vectors
comets$normal <- cbind(sin(comets$i) * sin(comets$om),
-sin(comets$i) * cos(comets$om),
cos(comets$i))
# Tests to be performed
type_tests <- c("PCvM", "PAD", "PRt")
# Excluding the C/1882 R1-X (Great September comet) records with X = B, C, D
comets_ccf09 <- comets[comets$ccf09, ][-c(13:15), ]
# Sample size
nrow(comets_ccf09)
# Tests for the data in Cuesta-Albertos et al. (2009)
tests_ccf09 <- unif_test(data = comets_ccf09$normal, type = type_tests,
p_value = "asymp")
tests_ccf09
Craters named by the IUA
Description
Named craters of the Solar System by the Gazetteer of Planetary Nomenclature of the International Astronomical Union (IUA).
Usage
craters
Format
A data frame with 5235 rows and 7 variables:
- ID
database ID.
- name
name of the crater.
- target
name of the celestial body. A factor with 43 levels, such as
"Moon"
,"Venus"
, or"Europa"
.- target_type
type of celestial body. A factor with 3 levels:
"Planet"
,"Moon"
,"Dwarf planet"
, or"Asteroid"
.- diameter
diameter of the crater (in km).
- theta
longitude angle
\theta \in [0, 2\pi)
of the crater center.- phi
latitude angle
\phi \in [-\pi/2, \pi/2]
of the crater center.
Details
"Craters" are understood in the Gazetteer of Planetary Nomenclature as roughly circular depressions resulting from impact or volcanic activity (the geological origin is unspecified).
Be aware that the dataset only contains named craters by the IUA.
Therefore, there is likely a high uniform bias on the distribution
of craters. Presumably the naming process attempts to cover the planet in
a somehow uniform fashion (distant craters are more likely to be named than
neighboring craters). Also, there are substantially more craters in the
listed bodies than those named by the IUA. See venus
and
rhea
for more detailed and specific crater datasets.
The (\theta, \phi)
angles are such their associated planetocentric
coordinates are:
(\cos(\phi) \cos(\theta), \cos(\phi) \sin(\theta), \sin(\phi))',
with (0, 0, 1)'
denoting the north pole.
The script performing the data preprocessing is available at
craters.R
. The data was retrieved on 2020-05-31.
Source
https://planetarynames.wr.usgs.gov/AdvancedSearch
Examples
# Load data
data("craters")
# Add Cartesian coordinates
craters$X <- cbind(cos(craters$theta) * cos(craters$phi),
sin(craters$theta) * cos(craters$phi),
sin(craters$phi))
# Tests to be performed
type_tests <- c("PCvM", "PAD", "PRt")
# Tests for Venus and Rhea
unif_test(data = craters$X[craters$target == "Venus", ], type = type_tests,
p_value = "asymp")
unif_test(data = craters$X[craters$target == "Rhea", ], type = type_tests,
p_value = "asymp")
Efficient evaluation of the empirical cumulative distribution function
Description
Evaluates the empirical cumulative distribution function
(ecdf) of a sample data
at the evaluation points sorted_x
.
This is done through binary search.
Usage
ecdf_bin(data, sorted_x, data_sorted = FALSE, efic = TRUE,
divide_n = TRUE)
Arguments
data |
a vector or column matrix containing the sample. |
sorted_x |
a vector or column matrix with the evaluation points sorted increasingly. |
data_sorted |
is |
efic |
use the more efficient version of the ecdf evaluation? Set to
|
divide_n |
if |
Value
The ecdf evaluated at sorted_x
.
Warning
Be careful on avoiding the next bad usages of the function, which will produce spurious results:
-
sorted_x
is not sorted increasingly. -
data
is not sorted increasingly whendata_sorted = TRUE
-
Author(s)
Original code from Douglas Bates' https://github.com/dmbates/ecdfExample. Minor adaptations by Eduardo García-Portugués.
(Hyper)spherical harmonics
Description
Computation of a certain explicit representation of
(hyper)spherical harmonics on
S^{p-1}:=\{{\bf x}\in R^p:||{\bf x}||=1\}
, p\ge 2
. Details are available in
García-Portugués et al. (2024).
Usage
g_i_k(x, i = 1, k = 1, m = NULL, show_m = FALSE)
Arguments
x |
locations in |
i , k |
alternative indexing to refer to the |
m |
(hyper)spherical harmonic index, as used in Proposition 3.1. The
index is computed internally from |
show_m |
flag to print |
Details
The implementation uses Proposition 3.1 in García-Portugués et al. (2024),
which adapts Theorem 1.5.1 in Dai and Xu (2013) with the correction of
typos in the normalizing constant h_\alpha
and in the definition of
the function g_\alpha
of the latter theorem.
Value
A vector of size nrow(x)
.
References
Dai, F. and Xu, Y. (2013). Approximation Theory and Harmonic Analysis on Spheres and Balls. Springer, New York. doi:10.1007/978-1-4614-6660-4
García-Portugués, E., Paindaveine, D., and Verdebout, T. (2024). On a class of Sobolev tests for symmetry of directions, their detection thresholds, and asymptotic powers. arXiv:2108.09874v2. doi:10.48550/arXiv.2108.09874
Examples
n <- 3e3
old_par <- par(mfrow = c(2, 3))
k <- 2
for (i in 1:d_p_k(p = 3, k = k)) {
X <- r_unif_sph(n = n, p = 3, M = 1)[, , 1]
col <- rainbow(n)[rank(g_i_k(x = X, k = k, i = i, show_m = TRUE))]
scatterplot3d::scatterplot3d(X[, 1], X[, 2], X[, 3], color = col,
axis = FALSE, pch = 19)
}
for (k in 0:5) {
X <- r_unif_sph(n = n, p = 3, M = 1)[, , 1]
col <- rainbow(n)[rank(g_i_k(x = X, k = k, i = 1, show_m = TRUE))]
scatterplot3d::scatterplot3d(X[, 1], X[, 2], X[, 3], color = col,
axis = FALSE, pch = 19)
}
par(old_par)
Monte Carlo integration of functions on the (hyper)sphere
Description
Monte Carlo approximation of the integral
\int_{S^{p-1}}f(x)\,\mathrm{d}x
of a function f:S^{p-1} \rightarrow R
defined on the (hyper)sphere
S^{p-1}:=\{{\bf x}\in R^p:||{\bf x}||=1\}
, p\ge 2
.
Usage
int_sph_MC(f, p, M = 10000, cores = 1, chunks = ceiling(M/1000),
seeds = NULL, ...)
Arguments
f |
function to be integrated. Its first argument must be the
(hyper)sphere position. Must be vectorized and return a vector of size
|
p |
integer giving the dimension of the ambient space |
M |
number of Monte Carlo samples. Defaults to |
cores |
number of cores to perform the integration. Defaults to
|
chunks |
number of chunks to split the |
seeds |
if provided, a vector of size |
... |
optional arguments to be passed to |
Details
It is possible to have a progress bar if int_sph_MC
is wrapped with
progressr::with_progress
or if
progressr::handlers(global = TRUE)
is invoked (once) by the user.
See the examples below. The progress bar is updated with the number of
finished chunks.
Value
A scalar with the approximate integral.
Examples
## Sequential simulation
# Vectorized functions to be integrated
x1 <- function(x) x[, 1]
quad <- function(x, a = 0) a + rowSums(x^4)
# Approximate \int_{S^{p-1}} x_1 dx = 0
int_sph_MC(f = x1, p = 3, M = 1e4, chunks = 2)
# Approximate \int_{S^{p-1}} (a + \sum_i x_i^4) dx
int_sph_MC(f = quad, p = 2, M = 1e4, a = 0, chunks = 2)
# Compare with Gauss--Legendre integration on S^2
th_k <- Gauss_Legen_nodes(a = 0, b = 2 * pi, N = 40)
w_k <- Gauss_Legen_weights(a = 0, b = 2 * pi, N = 40)
sum(w_k * quad(cbind(cos(th_k), sin(th_k)), a = 1))
## Parallel simulation with a progress bar
# Define a progress bar
require(progress)
require(progressr)
handlers(handler_progress(
format = paste("(:spin) [:bar] :percent Iter: :current/:total Rate:",
":tick_rate iter/sec ETA: :eta Elapsed: :elapsedfull"),
clear = FALSE))
# Call int_sph_MC() within with_progress()
with_progress(int_sph_MC(f = x1, p = 3, cores = 2, M = 1e5, chunks = 100))
# Instead of using with_progress() each time, it is more practical to run
# handlers(global = TRUE)
# once to activate progress bars in your R session
Local projected alternatives to uniformity
Description
Density and random generation for local projected alternatives to uniformity with densities
f_{\kappa, \boldsymbol{\mu}}({\bf x}): =
\frac{1 - \kappa}{\omega_p} + \kappa f({\bf x}'\boldsymbol{\mu})
where
f(z) = \frac{1}{\omega_p}\left\{1 + \sum_{k = 1}^\infty u_{k, p}
C_k^{p / 2 - 1}(z)\right\}
is the angular function controlling the local alternative in a
Gegenbauer series, 0\le \kappa \le 1
,
\boldsymbol{\mu}
is a direction on S^{p - 1}
, and
\omega_p
is the surface area of S^{p - 1}
. The sequence
\{u_{k, p}\}
is typically such that
u_{k, p} = \left(1 + \frac{2k}{p - 2}\right) b_{k, p}
for the Gegenbauer coefficients
\{b_{k, p}\}
of the kernel function of a Sobolev statistic (see the
transformation between the coefficients u_{k, p}
and b_{k, p}
).
Also, automatic truncation of the series \sum_{k = 1}^\infty u_{k, p}
C_k^{p / 2 - 1}(z)
according to the proportion of "Gegenbauer norm"
explained.
Usage
f_locdev(z, p, uk)
con_f(f, p, N = 320)
d_locdev(x, mu, f, kappa)
r_locdev(n, mu, f, kappa, F_inv = NULL, ...)
cutoff_locdev(p, K_max = 10000, thre = 0.001, type, Rothman_t = 1/3,
Pycke_q = 0.5, verbose = FALSE, Gauss = TRUE, N = 320, tol = 1e-06)
Arguments
z |
projected evaluation points for |
p |
integer giving the dimension of the ambient space |
uk |
coefficients |
f |
angular function defined on |
N |
number of points used in the
Gauss–Legendre quadrature for computing the Gegenbauer coefficients.
Defaults to |
x |
locations in |
mu |
a unit norm vector of size |
kappa |
the strength of the local alternative, between |
n |
sample size, a positive integer. |
F_inv |
quantile function associated to |
... |
further parameters passed to |
K_max |
integer giving the truncation of the series. Defaults to
|
thre |
proportion of norm not explained by the first terms of the
truncated series. Defaults to |
type |
name of the Sobolev statistic, using the naming from
|
Rothman_t |
|
Pycke_q |
|
verbose |
output information about the truncation ( |
Gauss |
use a Gauss–Legendre quadrature rule of |
tol |
tolerance passed to |
Details
See the definitions of local alternatives in Prentice (1978) and in García-Portugués et al. (2023).
The truncation of \sum_{k = 1}^\infty u_{k, p} C_k^{p / 2 - 1}(z)
is done to the first
K_max
terms and then up to the index such that the first terms
leave unexplained the proportion thre
of the norm of the whole series.
Setting thre = 0
truncates to K_max
terms exactly. If the
series only contains odd or even non-zero terms, then only K_max / 2
addends are effectively taken into account in the first truncation.
Value
-
f_locdev
: angular function evaluated atx
, a vector. -
con_f
: normalizing constantc_f
off
, a scalar. -
d_locdev
: density function evaluated atx
, a vector. -
r_locdev
: a matrix of sizec(n, p)
containing a random sample from the densityf_{\kappa, \boldsymbol{\mu}}
. -
cutoff_locdev
: vector of coefficients\{u_{k, p}\}
automatically truncated according toK_max
andthre
(see details).
References
García-Portugués, E., Navarro-Esteban, P., Cuesta-Albertos, J. A. (2023) On a projection-based class of uniformity tests on the hypersphere. Bernoulli, 29(1):181–204. doi:10.3150/21-BEJ1454.
Prentice, M. J. (1978). On invariant tests of uniformity for directions and orientations. The Annals of Statistics, 6(1):169–176. doi:10.1214/aos/1176344075
Examples
## Local alternatives diagnostics
loc_alt_diagnostic <- function(p, type, thre = 1e-3, K_max = 1e3) {
# Coefficients of the alternative
uk <- cutoff_locdev(K_max = K_max, p = p, type = type, thre = thre,
N = 640)
old_par <- par(mfrow = c(2, 2))
# Construction of f
z <- seq(-1, 1, l = 1e3)
f <- function(z) f_locdev(z = z, p = p, uk = uk)
plot(z, f(z), type = "l", xlab = expression(z), ylab = expression(f(z)),
main = paste0("Local alternative f, ", type, ", p = ", p), log = "y")
# Projected density on [-1, 1]
f_proj <- function(z) rotasym::w_p(p = p - 1) * f(z) *
(1 - z^2)^((p - 3) / 2)
plot(z, f_proj(z), type = "l", xlab = expression(z),
ylab = expression(omega[p - 1] * f(z) * (1 - z^2)^{(p - 3) / 2}),
main = paste0("Projected density, ", type, ", p = ", p), log = "y",
sub = paste("Integral:", round(con_f(f = f, p = p), 4)))
# Quantile function for projected density
mu <- c(rep(0, p - 1), 1)
F_inv <- F_inv_from_f(f = f, p = p, K = 5e2)
plot(F_inv, xlab = expression(x), ylab = expression(F^{-1}*(x)),
main = paste0("Quantile function, ", type, ", p = ", p))
# Sample from the alternative and plot the projected sample
n <- 5e4
samp <- r_locdev(n = n, mu = mu, f = f, kappa = 1, F_inv = F_inv)
plot(z, f_proj(z), col = 2, type = "l",
main = paste0("Simulated projected data, ", type, ", p = ", p),
ylim = c(0, 1.75))
hist(samp %*% mu, freq = FALSE, breaks = seq(-1, 1, l = 50), add = TRUE)
par(old_par)
}
## Local alternatives for the PCvM test
loc_alt_diagnostic(p = 2, type = "PCvM")
loc_alt_diagnostic(p = 3, type = "PCvM")
loc_alt_diagnostic(p = 4, type = "PCvM")
loc_alt_diagnostic(p = 5, type = "PCvM")
loc_alt_diagnostic(p = 11, type = "PCvM")
## Local alternatives for the PAD test
loc_alt_diagnostic(p = 2, type = "PAD")
loc_alt_diagnostic(p = 3, type = "PAD")
loc_alt_diagnostic(p = 4, type = "PAD")
loc_alt_diagnostic(p = 5, type = "PAD")
loc_alt_diagnostic(p = 11, type = "PAD")
## Local alternatives for the PRt test
loc_alt_diagnostic(p = 2, type = "PRt")
loc_alt_diagnostic(p = 3, type = "PRt")
loc_alt_diagnostic(p = 4, type = "PRt")
loc_alt_diagnostic(p = 5, type = "PRt")
loc_alt_diagnostic(p = 11, type = "PRt")
Asymptotic distributions for circular uniformity statistics
Description
Computation of the asymptotic null distributions of circular uniformity statistics.
Usage
p_Kolmogorov(x, K_Kolmogorov = 25L, alternating = TRUE)
d_Kolmogorov(x, K_Kolmogorov = 25L, alternating = TRUE)
p_cir_stat_Ajne(x, K_Ajne = 15L)
d_cir_stat_Ajne(x, K_Ajne = 15L)
p_cir_stat_Bingham(x)
d_cir_stat_Bingham(x)
p_cir_stat_Greenwood(x)
d_cir_stat_Greenwood(x)
p_cir_stat_Gini(x)
d_cir_stat_Gini(x)
p_cir_stat_Gini_squared(x)
d_cir_stat_Gini_squared(x)
p_cir_stat_Hodges_Ajne2(x, n, asymp_std = FALSE)
p_cir_stat_Hodges_Ajne(x, n, exact = TRUE, asymp_std = FALSE)
d_cir_stat_Hodges_Ajne(x, n, exact = TRUE, asymp_std = FALSE)
p_cir_stat_Kuiper(x, n, K_Kuiper = 12L, second_term = TRUE,
Stephens = FALSE)
d_cir_stat_Kuiper(x, n, K_Kuiper = 12L, second_term = TRUE,
Stephens = FALSE)
p_cir_stat_Log_gaps(x, abs_val = TRUE)
d_cir_stat_Log_gaps(x, abs_val = TRUE)
p_cir_stat_Max_uncover(x)
d_cir_stat_Max_uncover(x)
p_cir_stat_Num_uncover(x)
d_cir_stat_Num_uncover(x)
p_cir_stat_Pycke(x)
d_cir_stat_Pycke(x)
p_cir_stat_Vacancy(x)
d_cir_stat_Vacancy(x)
p_cir_stat_Watson(x, n = 0L, K_Watson = 25L, Stephens = FALSE)
d_cir_stat_Watson(x, n = 0L, K_Watson = 25L, Stephens = FALSE)
p_cir_stat_Watson_1976(x, K_Watson_1976 = 8L, N = 40L)
d_cir_stat_Watson_1976(x, K_Watson_1976 = 8L)
p_cir_stat_Range(x, n, max_gap = TRUE)
d_cir_stat_Range(x, n, max_gap = TRUE)
p_cir_stat_Rao(x)
d_cir_stat_Rao(x)
p_cir_stat_Rayleigh(x)
d_cir_stat_Rayleigh(x)
p_cir_stat_Bakshaev(x, K_max = 1000, thre = 0, method = "I", ...)
d_cir_stat_Bakshaev(x, K_max = 1000, thre = 0, method = "I", ...)
p_cir_stat_Gine_Fn(x, K_max = 1000, thre = 0, method = "I", ...)
d_cir_stat_Gine_Fn(x, K_max = 1000, thre = 0, method = "I", ...)
p_cir_stat_Gine_Gn(x, K_max = 1000, thre = 0, method = "I", ...)
d_cir_stat_Gine_Gn(x, K_max = 1000, thre = 0, method = "I", ...)
p_cir_stat_Hermans_Rasson(x, K_max = 1000, thre = 0, method = "I", ...)
d_cir_stat_Hermans_Rasson(x, K_max = 1000, thre = 0, method = "I", ...)
p_cir_stat_PAD(x, K_max = 1000, thre = 0, method = "I", ...)
d_cir_stat_PAD(x, K_max = 1000, thre = 0, method = "I", ...)
p_cir_stat_PCvM(x, K_max = 1000, thre = 0, method = "I", ...)
d_cir_stat_PCvM(x, K_max = 1000, thre = 0, method = "I", ...)
p_cir_stat_PRt(x, t = 1/3, K_max = 1000, thre = 0, method = "I", ...)
d_cir_stat_PRt(x, t = 1/3, K_max = 1000, thre = 0, method = "I", ...)
p_cir_stat_Poisson(x, rho = 0.5, K_max = 1000, thre = 0, method = "I",
...)
d_cir_stat_Poisson(x, rho = 0.5, K_max = 1000, thre = 0, method = "I",
...)
p_cir_stat_Pycke_q(x, q = 0.5, K_max = 1000, thre = 0, method = "I",
...)
d_cir_stat_Pycke_q(x, q = 0.5, K_max = 1000, thre = 0, method = "I",
...)
p_cir_stat_Rothman(x, t = 1/3, K_max = 1000, thre = 0, method = "I",
...)
d_cir_stat_Rothman(x, t = 1/3, K_max = 1000, thre = 0, method = "I",
...)
p_cir_stat_Riesz(x, s = 1, K_max = 1000, thre = 0, method = "I", ...)
d_cir_stat_Riesz(x, s = 1, K_max = 1000, thre = 0, method = "I", ...)
p_cir_stat_Sobolev(x, vk2 = c(0, 0, 1), method = "I", ...)
d_cir_stat_Sobolev(x, vk2 = c(0, 0, 1), method = "I", ...)
p_cir_stat_Softmax(x, kappa = 1, K_max = 1000, thre = 0, method = "I",
...)
d_cir_stat_Softmax(x, kappa = 1, K_max = 1000, thre = 0, method = "I",
...)
Arguments
x |
a vector of size |
K_Kolmogorov , K_Kuiper , K_Watson , K_Watson_1976 , K_Ajne |
integer giving
the truncation of the series present in the null asymptotic distributions.
For the Kolmogorov-Smirnov-related series defaults to |
alternating |
use the alternating series expansion for the distribution
of the Kolmogorov-Smirnov statistic? Defaults to |
n |
sample size employed for computing the statistic. |
asymp_std |
compute the distribution associated to the normalized
Hodges-Ajne statistic? Defaults to |
exact |
use the exact distribution for the Hodges-Ajne statistic?
Defaults to |
second_term |
use the second-order series expansion for the
distribution of the Kuiper statistic? Defaults to |
Stephens |
compute Stephens (1970) modification so that the null distribution of the is less dependent on the sample size? The modification does not alter the test decision. |
abs_val |
compute the distribution associated to the absolute value of
the Darling's log gaps statistic? Defaults to |
N |
number of points used in the
Gauss-Legendre quadrature. Defaults to |
max_gap |
compute the distribution associated to the maximum gap for
the range statistic? Defaults to |
K_max |
integer giving the truncation of the series that compute the
asymptotic p-value of a Sobolev test. Defaults to |
thre |
error threshold for the tail probability given by the
the first terms of the truncated series of a Sobolev test. Defaults to
|
method |
method for approximating the density, distribution, or
quantile function of the weighted sum of chi squared random variables. Must
be |
... |
further parameters passed to |
t |
|
rho |
|
q |
|
s |
|
vk2 |
weights for the finite Sobolev test. A non-negative vector or
matrix. Defaults to |
kappa |
|
Details
Descriptions and references for most of the tests are available in García-Portugués and Verdebout (2018).
Value
A matrix of size c(nx, 1)
with the evaluation of the
distribution or density function at x
.
References
García-Portugués, E. and Verdebout, T. (2018) An overview of uniformity tests on the hypersphere. arXiv:1804.00286. doi:10.48550/arXiv.1804.00286.
Examples
# Ajne
curve(d_cir_stat_Ajne(x), to = 1.5, n = 2e2, ylim = c(0, 4))
curve(p_cir_stat_Ajne(x), n = 2e2, col = 2, add = TRUE)
# Bakshaev
curve(d_cir_stat_Bakshaev(x, method = "HBE"), to = 6, n = 2e2,
ylim = c(0, 1))
curve(p_cir_stat_Bakshaev(x, method = "HBE"), n = 2e2, add = TRUE, col = 2)
# Bingham
curve(d_cir_stat_Bingham(x), to = 12, n = 2e2, ylim = c(0, 1))
curve(p_cir_stat_Bingham(x), n = 2e2, col = 2, add = TRUE)
# Greenwood
curve(d_cir_stat_Greenwood(x), from = -6, to = 6, n = 2e2, ylim = c(0, 1))
curve(p_cir_stat_Greenwood(x), n = 2e2, col = 2, add = TRUE)
# Hermans-Rasson
curve(p_cir_stat_Hermans_Rasson(x, method = "HBE"), to = 10, n = 2e2,
ylim = c(0, 1))
curve(d_cir_stat_Hermans_Rasson(x, method = "HBE"), n = 2e2, add = TRUE,
col = 2)
# Hodges-Ajne
plot(25:45, d_cir_stat_Hodges_Ajne(cbind(25:45), n = 50), type = "h",
lwd = 2, ylim = c(0, 1))
lines(25:45, p_cir_stat_Hodges_Ajne(cbind(25:45), n = 50), type = "s",
col = 2)
# Kolmogorov-Smirnov
curve(d_Kolmogorov(x), to = 3, n = 2e2, ylim = c(0, 2))
curve(p_Kolmogorov(x), n = 2e2, col = 2, add = TRUE)
# Kuiper
curve(d_cir_stat_Kuiper(x, n = 50), to = 3, n = 2e2, ylim = c(0, 2))
curve(p_cir_stat_Kuiper(x, n = 50), n = 2e2, col = 2, add = TRUE)
# Kuiper and Watson with Stephens modification
curve(d_cir_stat_Kuiper(x, n = 8, Stephens = TRUE), to = 2.5, n = 2e2,
ylim = c(0, 10))
curve(d_cir_stat_Watson(x, n = 8, Stephens = TRUE), n = 2e2, lty = 2,
add = TRUE)
n <- c(10, 20, 30, 40, 50, 100, 500)
col <- rainbow(length(n))
for (i in seq_along(n)) {
curve(d_cir_stat_Kuiper(x, n = n[i], Stephens = TRUE), n = 2e2,
col = col[i], add = TRUE)
curve(d_cir_stat_Watson(x, n = n[i], Stephens = TRUE), n = 2e2,
col = col[i], lty = 2, add = TRUE)
}
# Maximum uncovered spacing
curve(d_cir_stat_Max_uncover(x), from = -3, to = 6, n = 2e2, ylim = c(0, 1))
curve(p_cir_stat_Max_uncover(x), n = 2e2, col = 2, add = TRUE)
# Number of uncovered spacing
curve(d_cir_stat_Num_uncover(x), from = -4, to = 4, n = 2e2, ylim = c(0, 1))
curve(p_cir_stat_Num_uncover(x), n = 2e2, col = 2, add = TRUE)
# Log gaps
curve(d_cir_stat_Log_gaps(x), from = -1, to = 4, n = 2e2, ylim = c(0, 1))
curve(p_cir_stat_Log_gaps(x), n = 2e2, col = 2, add = TRUE)
# Gine Fn
curve(d_cir_stat_Gine_Fn(x, method = "HBE"), to = 2.5, n = 2e2,
ylim = c(0, 2))
curve(p_cir_stat_Gine_Fn(x, method = "HBE"), n = 2e2, add = TRUE, col = 2)
# Gine Gn
curve(d_cir_stat_Gine_Gn(x, method = "HBE"), to = 2.5, n = 2e2,
ylim = c(0, 2))
curve(p_cir_stat_Gine_Gn(x, method = "HBE"), n = 2e2, add = TRUE, col = 2)
# Gini mean difference
curve(d_cir_stat_Gini(x), from = -4, to = 4, n = 2e2, ylim = c(0, 1))
curve(p_cir_stat_Gini(x), n = 2e2, col = 2, add = TRUE)
# Gini mean squared difference
curve(d_cir_stat_Gini_squared(x), from = -10, to = 10, n = 2e2,
ylim = c(0, 1))
curve(p_cir_stat_Gini_squared(x), n = 2e2, col = 2, add = TRUE)
# PAD
curve(d_cir_stat_PAD(x, method = "HBE"), to = 3, n = 2e2, ylim = c(0, 1.5))
curve(p_cir_stat_PAD(x, method = "HBE"), n = 2e2, add = TRUE, col = 2)
# PCvM
curve(d_cir_stat_PCvM(x, method = "HBE"), to = 4, n = 2e2, ylim = c(0, 2))
curve(p_cir_stat_PCvM(x, method = "HBE"), n = 2e2, add = TRUE, col = 2)
# PRt
curve(d_cir_stat_PRt(x, method = "HBE"), n = 2e2, ylim = c(0, 5))
curve(p_cir_stat_PRt(x, method = "HBE"), n = 2e2, add = TRUE, col = 2)
# Poisson
curve(d_cir_stat_Poisson(x, method = "HBE"), from = -1, to = 5, n = 2e2,
ylim = c(0, 2))
curve(p_cir_stat_Poisson(x, method = "HBE"), n = 2e2, col = 2, add = TRUE)
# Pycke
curve(d_cir_stat_Pycke(x), from = -5, to = 10, n = 2e2, ylim = c(0, 1))
curve(p_cir_stat_Pycke(x), n = 2e2, col = 2, add = TRUE)
# Pycke q
curve(d_cir_stat_Pycke_q(x, method = "HBE"), to = 15, n = 2e2,
ylim = c(0, 1))
curve(p_cir_stat_Pycke_q(x, method = "HBE"), n = 2e2, add = TRUE, col = 2)
# Range
curve(d_cir_stat_Range(x, n = 50), to = 2, n = 2e2, ylim = c(0, 4))
curve(p_cir_stat_Range(x, n = 50), n = 2e2, col = 2, add = TRUE)
# Rao
curve(d_cir_stat_Rao(x), from = -6, to = 6, n = 2e2, ylim = c(0, 1))
curve(p_cir_stat_Rao(x), n = 2e2, col = 2, add = TRUE)
# Rayleigh
curve(d_cir_stat_Rayleigh(x), to = 12, n = 2e2, ylim = c(0, 1))
curve(p_cir_stat_Rayleigh(x), n = 2e2, col = 2, add = TRUE)
# Riesz
curve(d_cir_stat_Riesz(x, method = "HBE"), to = 6, n = 2e2,
ylim = c(0, 1))
curve(p_cir_stat_Riesz(x, method = "HBE"), n = 2e2, add = TRUE, col = 2)
# Rothman
curve(d_cir_stat_Rothman(x, method = "HBE"), n = 2e2, ylim = c(0, 5))
curve(p_cir_stat_Rothman(x, method = "HBE"), n = 2e2, add = TRUE, col = 2)
# Vacancy
curve(d_cir_stat_Vacancy(x), from = -4, to = 4, n = 2e2, ylim = c(0, 1))
curve(p_cir_stat_Vacancy(x), n = 2e2, col = 2, add = TRUE)
# Watson
curve(d_cir_stat_Watson(x), to = 0.5, n = 2e2, ylim = c(0, 15))
curve(p_cir_stat_Watson(x), n = 2e2, col = 2, add = TRUE)
# Watson (1976)
curve(d_cir_stat_Watson_1976(x), to = 1.5, n = 2e2, ylim = c(0, 3))
curve(p_cir_stat_Watson_1976(x), n = 2e2, col = 2, add = TRUE)
# Softmax
curve(d_cir_stat_Softmax(x, method = "HBE"), to = 3, n = 2e2, ylim = c(0, 2))
curve(p_cir_stat_Softmax(x, method = "HBE"), n = 2e2, col = 2, add = TRUE)
# Sobolev
vk2 <- c(0.5, 0)
curve(d_cir_stat_Sobolev(x = x, vk2 = vk2), to = 3, n = 2e2, ylim = c(0, 2))
curve(p_cir_stat_Sobolev(x = x, vk2 = vk2), n = 2e2, col = 2, add = TRUE)
Asymptotic distributions for spherical uniformity statistics
Description
Computation of the asymptotic null distributions of spherical uniformity statistics.
Usage
p_sph_stat_Bingham(x, p)
d_sph_stat_Bingham(x, p)
p_sph_stat_CJ12(x, regime = 1L, beta = 0)
d_sph_stat_CJ12(x, regime = 3L, beta = 0)
p_sph_stat_Rayleigh(x, p)
d_sph_stat_Rayleigh(x, p)
p_sph_stat_Rayleigh_HD(x, p)
d_sph_stat_Rayleigh_HD(x, p)
p_sph_stat_Ajne(x, p, K_max = 1000, thre = 0, method = "I", ...)
d_sph_stat_Ajne(x, p, K_max = 1000, thre = 0, method = "I", ...)
p_sph_stat_Bakshaev(x, p, K_max = 1000, thre = 0, method = "I", ...)
d_sph_stat_Bakshaev(x, p, K_max = 1000, thre = 0, method = "I", ...)
p_sph_stat_Gine_Fn(x, p, K_max = 1000, thre = 0, method = "I", ...)
d_sph_stat_Gine_Fn(x, p, K_max = 1000, thre = 0, method = "I", ...)
p_sph_stat_Gine_Gn(x, p, K_max = 1000, thre = 0, method = "I", ...)
d_sph_stat_Gine_Gn(x, p, K_max = 1000, thre = 0, method = "I", ...)
p_sph_stat_PAD(x, p, K_max = 1000, thre = 0, method = "I", ...)
d_sph_stat_PAD(x, p, K_max = 1000, thre = 0, method = "I", ...)
p_sph_stat_PCvM(x, p, K_max = 1000, thre = 0, method = "I", ...)
d_sph_stat_PCvM(x, p, K_max = 1000, thre = 0, method = "I", ...)
p_sph_stat_Poisson(x, p, rho = 0.5, K_max = 1000, thre = 0,
method = "I", ...)
d_sph_stat_Poisson(x, p, rho = 0.5, K_max = 1000, thre = 0,
method = "I", ...)
p_sph_stat_PRt(x, p, t = 1/3, K_max = 1000, thre = 0, method = "I",
...)
d_sph_stat_PRt(x, p, t = 1/3, K_max = 1000, thre = 0, method = "I",
...)
p_sph_stat_Riesz(x, p, s = 1, K_max = 1000, thre = 0, method = "I",
...)
d_sph_stat_Riesz(x, p, s = 1, K_max = 1000, thre = 0, method = "I",
...)
p_sph_stat_Sobolev(x, p, vk2 = c(0, 0, 1), method = "I", ...)
d_sph_stat_Sobolev(x, p, vk2 = c(0, 0, 1), method = "I", ...)
p_sph_stat_Softmax(x, p, kappa = 1, K_max = 1000, thre = 0,
method = "I", ...)
d_sph_stat_Softmax(x, p, kappa = 1, K_max = 1000, thre = 0,
method = "I", ...)
p_sph_stat_Stereo(x, p, a = 0, K_max = 1000, method = "I", ...)
d_sph_stat_Stereo(x, p, a = 0, K_max = 1000, method = "I", ...)
Arguments
x |
a vector of size |
p |
integer giving the dimension of the ambient space |
regime |
type of asymptotic regime for the CJ12 test, either |
beta |
|
K_max |
integer giving the truncation of the series that compute the
asymptotic p-value of a Sobolev test. Defaults to |
thre |
error threshold for the tail probability given by the
the first terms of the truncated series of a Sobolev test. Defaults to
|
method |
method for approximating the density, distribution, or
quantile function of the weighted sum of chi squared random variables. Must
be |
... |
further parameters passed to |
rho |
|
t |
|
s |
|
vk2 |
weights for the finite Sobolev test. A non-negative vector or
matrix. Defaults to |
kappa |
|
a |
either:
|
Details
Descriptions and references on most of the asymptotic distributions are available in García-Portugués and Verdebout (2018).
Value
-
r_sph_stat_*
: a matrix of sizec(n, 1)
containing the sample. -
p_sph_stat_*
,d_sph_stat_*
: a matrix of sizec(nx, 1)
with the evaluation of the distribution or density functions atx
.
Examples
# Ajne
curve(d_sph_stat_Ajne(x, p = 3, method = "HBE"), n = 2e2, ylim = c(0, 4))
curve(p_sph_stat_Ajne(x, p = 3, method = "HBE"), n = 2e2, col = 2,
add = TRUE)
# Bakshaev
curve(d_sph_stat_Bakshaev(x, p = 3, method = "HBE"), to = 5, n = 2e2,
ylim = c(0, 2))
curve(p_sph_stat_Bakshaev(x, p = 3, method = "HBE"), n = 2e2, col = 2,
add = TRUE)
# Bingham
curve(d_sph_stat_Bingham(x, p = 3), to = 20, n = 2e2, ylim = c(0, 1))
curve(p_sph_stat_Bingham(x, p = 3), n = 2e2, col = 2, add = TRUE)
# CJ12
curve(d_sph_stat_CJ12(x, regime = 1), from = -10, to = 10, n = 2e2,
ylim = c(0, 1))
curve(d_sph_stat_CJ12(x, regime = 2, beta = 0.1), n = 2e2, col = 2,
add = TRUE)
curve(d_sph_stat_CJ12(x, regime = 3), n = 2e2, col = 3, add = TRUE)
curve(p_sph_stat_CJ12(x, regime = 1), n = 2e2, col = 1, add = TRUE)
curve(p_sph_stat_CJ12(x, regime = 2, beta = 0.1), n = 2e2, col = 2,
add = TRUE)
curve(p_sph_stat_CJ12(x, regime = 3), col = 3, add = TRUE)
# Gine Fn
curve(d_sph_stat_Gine_Fn(x, p = 3, method = "HBE"), to = 2, n = 2e2,
ylim = c(0, 2))
curve(p_sph_stat_Gine_Fn(x, p = 3, method = "HBE"), n = 2e2, col = 2,
add = TRUE)
# Gine Gn
curve(d_sph_stat_Gine_Gn(x, p = 3, method = "HBE"), to = 1.5, n = 2e2,
ylim = c(0, 2.5))
curve(p_sph_stat_Gine_Gn(x, p = 3, method = "HBE"), n = 2e2, col = 2,
add = TRUE)
# PAD
curve(d_sph_stat_PAD(x, p = 3, method = "HBE"), to = 3, n = 2e2,
ylim = c(0, 1.5))
curve(p_sph_stat_PAD(x, p = 3, method = "HBE"), n = 2e2, col = 2,
add = TRUE)
# PCvM
curve(d_sph_stat_PCvM(x, p = 3, method = "HBE"), to = 0.6, n = 2e2,
ylim = c(0, 7))
curve(p_sph_stat_PCvM(x, p = 3, method = "HBE"), n = 2e2, col = 2,
add = TRUE)
# Poisson
curve(d_sph_stat_Poisson(x, p = 3, method = "HBE"), to = 2, n = 2e2,
ylim = c(0, 2))
curve(p_sph_stat_Poisson(x, p = 3, method = "HBE"), n = 2e2, col = 2,
add = TRUE)
# PRt
curve(d_sph_stat_PRt(x, p = 3, method = "HBE"), n = 2e2, ylim = c(0, 5))
curve(p_sph_stat_PRt(x, p = 3, method = "HBE"), n = 2e2, col = 2, add = TRUE)
# Rayleigh
curve(d_sph_stat_Rayleigh(x, p = 3), to = 15, n = 2e2, ylim = c(0, 1))
curve(p_sph_stat_Rayleigh(x, p = 3), n = 2e2, col = 2, add = TRUE)
# HD-standardized Rayleigh
curve(d_sph_stat_Rayleigh_HD(x, p = 3), from = -4, to = 4, n = 2e2,
ylim = c(0, 1))
curve(p_sph_stat_Rayleigh_HD(x, p = 3), n = 2e2, col = 2, add = TRUE)
# Riesz
curve(d_sph_stat_Riesz(x, p = 3, method = "HBE"), n = 2e2, from = 0, to = 5,
ylim = c(0, 2))
curve(p_sph_stat_Riesz(x, p = 3, method = "HBE"), n = 2e2, col = 2,
add = TRUE)
# Sobolev
x <- seq(-1, 5, by = 0.05)
vk2 <- diag(rep(0.3, 2))
matplot(x, d_sph_stat_Sobolev(x = x, vk2 = vk2, p = 3), type = "l",
ylim = c(0, 1), lty = 1)
matlines(x, p_sph_stat_Sobolev(x = x, vk2 = vk2, p = 3), lty = 1)
matlines(x, d_sph_stat_Sobolev(x = x, vk2 = vk2 + 0.01, p = 3), lty = 2)
matlines(x, p_sph_stat_Sobolev(x = x, vk2 = vk2 + 0.01, p = 3), lty = 2)
# Softmax
curve(d_sph_stat_Softmax(x, p = 3, method = "HBE"), to = 2, n = 2e2,
ylim = c(0, 2))
curve(p_sph_stat_Softmax(x, p = 3, method = "HBE"), n = 2e2, col = 2,
add = TRUE)
# Stereo
curve(d_sph_stat_Stereo(x, p = 4, method = "HBE"), from=-5,to = 10, n = 2e2,
ylim = c(0, 2))
curve(p_sph_stat_Stereo(x, p = 4, method = "HBE"), n = 2e2, col = 2,
add = TRUE)
Planet orbits
Description
Planet orbits data from the
JPL Keplerian Elements for Approximate Positions of the Major Planets.
The normal vector of a planet orbit represents is a vector on S^2
.
Usage
planets
Format
A data frame with 9 rows and 3 variables:
- planet
names of the planets and Pluto.
- i
inclination; the orbit's plane angle with respect to the ecliptic plane, in radians in
[0, \pi]
.- om
longitude of the ascending node; the counterclockwise angle from the vector pointing to the First Point of Aries and that pointing to the ascending node (the intersection between orbit and ecliptic plane), in radians in
[0, 2\pi)
. (Both vectors are heliocentric and within the ecliptic plane.)
Details
The normal vector to the ecliptic plane of the planet with inclination
i
and longitude of the ascending node \omega
is
(\sin(i) \sin(\omega), -\sin(i) \cos(\omega), \cos(i))'.
The script performing the data preprocessing is available at
planets.R
. The data was retrieved on 2020-05-16.
Source
Table 2a in https://ssd.jpl.nasa.gov/planets/approx_pos.html
Examples
# Load data
data("planets")
# Add normal vectors
planets$normal <- cbind(sin(planets$i) * sin(planets$om),
-sin(planets$i) * cos(planets$om),
cos(planets$i))
# Tests to be performed
type_tests <- c("PCvM", "PAD", "PRt")
# Tests with Pluto
unif_test(data = planets$normal, type = type_tests, p_value = "MC")
# Tests without Pluto
unif_test(data = planets$normal[-9, ], type = type_tests, p_value = "MC")
Projection of the spherical uniform distribution
Description
Density, distribution, and quantile functions of the
projection of the spherical uniform random variable on an arbitrary
direction, that is, the random variable
\boldsymbol{\gamma}'{\bf X}
, where {\bf X}
is uniformly distributed on the (hyper)sphere
S^{p-1}:=\{{\bf x}\in R^p:||{\bf x}||=1\}
, p\ge 2
, and
\boldsymbol{\gamma}\in S^{p-1}
is an
arbitrary projection direction. Note that the distribution is
invariant to the choice of \boldsymbol{\gamma}
. Also,
efficient simulation of \boldsymbol{\gamma}'{\bf X}
.
Usage
d_proj_unif(x, p, log = FALSE)
p_proj_unif(x, p, log = FALSE)
q_proj_unif(u, p)
r_proj_unif(n, p)
Arguments
x |
a vector of size |
p |
integer giving the dimension of the ambient space |
log |
compute the logarithm of the density or distribution? |
u |
vector of probabilities. |
n |
sample size. |
Value
A matrix of size c(nx, 1)
with the evaluation of the
density, distribution, or quantile function at x
or u
.
For r_proj_unif
, a random vector of size n
.
Author(s)
Eduardo García-Portugués and Paula Navarro-Esteban.
Examples
# Density function
curve(d_proj_unif(x, p = 2), from = -2, to = 2, n = 2e2, ylim = c(0, 2))
curve(d_proj_unif(x, p = 3), n = 2e2, col = 2, add = TRUE)
curve(d_proj_unif(x, p = 4), n = 2e2, col = 3, add = TRUE)
curve(d_proj_unif(x, p = 5), n = 2e2, col = 4, add = TRUE)
curve(d_proj_unif(x, p = 6), n = 2e2, col = 5, add = TRUE)
# Distribution function
curve(p_proj_unif(x, p = 2), from = -2, to = 2, n = 2e2, ylim = c(0, 1))
curve(p_proj_unif(x, p = 3), n = 2e2, col = 2, add = TRUE)
curve(p_proj_unif(x, p = 4), n = 2e2, col = 3, add = TRUE)
curve(p_proj_unif(x, p = 5), n = 2e2, col = 4, add = TRUE)
curve(p_proj_unif(x, p = 6), n = 2e2, col = 5, add = TRUE)
# Quantile function
curve(q_proj_unif(u = x, p = 2), from = 0, to = 1, n = 2e2, ylim = c(-1, 1))
curve(q_proj_unif(u = x, p = 3), n = 2e2, col = 2, add = TRUE)
curve(q_proj_unif(u = x, p = 4), n = 2e2, col = 3, add = TRUE)
curve(q_proj_unif(u = x, p = 5), n = 2e2, col = 4, add = TRUE)
curve(q_proj_unif(u = x, p = 6), n = 2e2, col = 5, add = TRUE)
# Sampling
hist(r_proj_unif(n = 1e4, p = 4), freq = FALSE, breaks = 50)
curve(d_proj_unif(x, p = 4), n = 2e2, col = 3, add = TRUE)
Sample non-uniformly distributed spherical data
Description
Simple simulation of prespecified non-uniform spherical distributions: von Mises–Fisher (vMF), Mixture of vMF (MvMF), Angular Central Gaussian (ACG), Small Circle (SC), Watson (W), Cauchy-like (C), Mixture of Cauchy-like (MC), or Uniform distribution with Antipodal-Dependent observations (UAD).
Usage
r_alt(n, p, M = 1, alt = "vMF", mu = c(rep(0, p - 1), 1), kappa = 1,
nu = 0.5, F_inv = NULL, K = 1000, axial_mix = TRUE)
Arguments
n |
sample size. |
p |
integer giving the dimension of the ambient space |
M |
number of samples of size |
alt |
alternative, must be |
mu |
location parameter for |
kappa |
non-negative parameter measuring the strength of the deviation
with respect to uniformity (obtained with |
nu |
projection along |
F_inv |
quantile function returned by |
K |
number of equispaced points on |
axial_mix |
use a mixture of von Mises–Fisher or Cauchy-like that is
axial (i.e., symmetrically distributed about the origin)? Defaults to
|
Details
The parameter kappa
is used as \kappa
in the following
distributions:
-
"vMF"
: von Mises–Fisher distribution with concentration\kappa
and directional mean\boldsymbol{\mu}
. -
"MvMF"
: equally-weighted mixture ofp
von Mises–Fisher distributions with common concentration\kappa
and directional means\pm{\bf e}_1, \ldots, \pm{\bf e}_p
ifaxial_mix = TRUE
. Ifaxial_mix = FALSE
, then only means with positive signs are considered. -
"ACG"
: Angular Central Gaussian distribution with diagonal shape matrix with diagonal given by(1, \ldots, 1, 1 + \kappa) / (p + \kappa).
-
"SC"
: Small Circle distribution with axis mean\boldsymbol{\mu}
and concentration\kappa
about the projection along the mean,\nu
. -
"W"
: Watson distribution with axis mean\boldsymbol{\mu}
and concentration\kappa
. The Watson distribution is a particular case of the Bingham distribution. -
"C"
: Cauchy-like distribution with directional mode\boldsymbol{\mu}
and concentration\kappa = \rho / (1 - \rho^2)
. The circular Wrapped Cauchy distribution is a particular case of this Cauchy-like distribution. -
"MC"
: equally-weighted mixture ofp
Cauchy-like distributions with common concentration\kappa
and directional means\pm{\bf e}_1, \ldots, \pm{\bf e}_p
ifaxial_mix = TRUE
. Ifaxial_mix = FALSE
, then only means with positive signs are considered.
The alternative "UAD"
generates a sample formed by
\lceil n/2\rceil
observations drawn uniformly on S^{p-1}
and the remaining observations drawn from a uniform spherical cap
distribution of angle \pi-\kappa
about each of the
\lceil n/2\rceil
observations (see unif_cap
). Hence,
kappa = 0
corresponds to a spherical cap covering the whole sphere and
kappa = pi
is a one-point degenerate spherical cap.
Much faster sampling for "SC"
, "W"
, "C"
, and "MC"
is achieved providing F_inv
; see examples.
Value
An array of size c(n, p, M)
with M
random
samples of size n
of non-uniformly-generated directions on
S^{p-1}
.
Examples
## Simulation with p = 2
p <- 2
n <- 50
kappa <- 20
nu <- 0.5
angle <- pi / 10
rho <- ((2 * kappa + 1) - sqrt(4 * kappa + 1)) / (2 * kappa)
F_inv_SC_2 <- F_inv_from_f(f = function(z) exp(-kappa * (z - nu)^2), p = 2)
F_inv_W_2 <- F_inv_from_f(f = function(z) exp(kappa * z^2), p = 2)
F_inv_C_2 <- F_inv_from_f(f = function(z) (1 - rho^2) /
(1 + rho^2 - 2 * rho * z)^(p / 2), p = 2)
x1 <- r_alt(n = n, p = p, alt = "vMF", kappa = kappa)[, , 1]
x2 <- r_alt(n = n, p = p, alt = "C", F_inv = F_inv_C_2)[, , 1]
x3 <- r_alt(n = n, p = p, alt = "SC", F_inv = F_inv_SC_2)[, , 1]
x4 <- r_alt(n = n, p = p, alt = "ACG", kappa = kappa)[, , 1]
x5 <- r_alt(n = n, p = p, alt = "W", F_inv = F_inv_W_2)[, , 1]
x6 <- r_alt(n = n, p = p, alt = "MvMF", kappa = kappa)[, , 1]
x7 <- r_alt(n = n, p = p, alt = "MC", kappa = kappa)[, , 1]
x8 <- r_alt(n = n, p = p, alt = "UAD", kappa = 1 - angle)[, , 1]
r <- runif(n, 0.95, 1.05) # Radius perturbation to improve visualization
plot(r * x1, pch = 16, xlim = c(-1.1, 1.1), ylim = c(-1.1, 1.1), col = 1)
points(r * x2, pch = 16, col = 2)
points(r * x3, pch = 16, col = 3)
plot(r * x4, pch = 16, xlim = c(-1.1, 1.1), ylim = c(-1.1, 1.1), col = 1)
points(r * x5, pch = 16, col = 2)
points(r * x6, pch = 16, col = 3)
points(r * x7, pch = 16, col = 4)
col <- rep(rainbow(n / 2), 2)
plot(r * x8, pch = 16, xlim = c(-1.1, 1.1), ylim = c(-1.1, 1.1), col = col)
for (i in seq(1, n, by = 2)) lines((r * x8)[i + 0:1, ], col = col[i])
## Simulation with p = 3
n <- 50
p <- 3
kappa <- 20
angle <- pi / 10
nu <- 0.5
rho <- ((2 * kappa + 1) - sqrt(4 * kappa + 1)) / (2 * kappa)
F_inv_SC_3 <- F_inv_from_f(f = function(z) exp(-kappa * (z - nu)^2), p = 3)
F_inv_W_3 <- F_inv_from_f(f = function(z) exp(kappa * z^2), p = 3)
F_inv_C_3 <- F_inv_from_f(f = function(z) (1 - rho^2) /
(1 + rho^2 - 2 * rho * z)^(p / 2), p = 3)
x1 <- r_alt(n = n, p = p, alt = "vMF", kappa = kappa)[, , 1]
x2 <- r_alt(n = n, p = p, alt = "C", F_inv = F_inv_C_3)[, , 1]
x3 <- r_alt(n = n, p = p, alt = "SC", F_inv = F_inv_SC_3)[, , 1]
x4 <- r_alt(n = n, p = p, alt = "ACG", kappa = kappa)[, , 1]
x5 <- r_alt(n = n, p = p, alt = "W", F_inv = F_inv_W_3)[, , 1]
x6 <- r_alt(n = n, p = p, alt = "MvMF", kappa = kappa)[, , 1]
x7 <- r_alt(n = n, p = p, alt = "MC", kappa = kappa)[, , 1]
x8 <- r_alt(n = n, p = p, alt = "UAD", kappa = 1 - angle)[, , 1]
s3d <- scatterplot3d::scatterplot3d(x1, pch = 16, xlim = c(-1.1, 1.1),
ylim = c(-1.1, 1.1), zlim = c(-1.1, 1.1))
s3d$points3d(x2, pch = 16, col = 2)
s3d$points3d(x3, pch = 16, col = 3)
s3d <- scatterplot3d::scatterplot3d(x4, pch = 16, xlim = c(-1.1, 1.1),
ylim = c(-1.1, 1.1), zlim = c(-1.1, 1.1))
s3d$points3d(x5, pch = 16, col = 2)
s3d$points3d(x6, pch = 16, col = 3)
s3d$points3d(x7, pch = 16, col = 4)
col <- rep(rainbow(n / 2), 2)
s3d <- scatterplot3d::scatterplot3d(x8, pch = 16, xlim = c(-1.1, 1.1),
ylim = c(-1.1, 1.1), zlim = c(-1.1, 1.1),
color = col)
for (i in seq(1, n, by = 2)) s3d$points3d(x8[i + 0:1, ], col = col[i],
type = "l")
Sample uniformly distributed circular and spherical data
Description
Simulation of the uniform distribution on [0, 2\pi)
and
S^{p-1}:=\{{\bf x}\in R^p:||{\bf x}||=1\}
, p\ge 2
.
Usage
r_unif_cir(n, M = 1L, sorted = FALSE)
r_unif_sph(n, p, M = 1L)
Arguments
n |
sample size. |
M |
number of samples of size |
sorted |
return each circular sample sorted? Defaults to |
p |
integer giving the dimension of the ambient space |
Value
-
r_unif_cir
: a matrix of sizec(n, M)
withM
random samples of sizen
of uniformly-generated circular data on[0, 2\pi)
. -
r_unif_sph
: an array of sizec(n, p, M)
withM
random samples of sizen
of uniformly-generated directions onS^{p-1}
.
Examples
# A sample on [0, 2*pi)
n <- 5
r_unif_cir(n = n)
# A sample on S^1
p <- 2
samp <- r_unif_sph(n = n, p = p)
samp
rowSums(samp^2)
# A sample on S^2
p <- 3
samp <- r_unif_sph(n = n, p = p)
samp
rowSums(samp^2)
Rhea craters from Hirata (2016)
Description
Craters on Rhea from Hirata (2016).
Usage
rhea
Format
A data frame with 3596 rows and 4 variables:
- name
name of the crater (if named).
- diameter
diameter of the crater (in km).
- theta
longitude angle
\theta \in [0, 2\pi)
of the crater center.- phi
latitude angle
\phi \in [-\pi/2, \pi/2]
of the crater center.
Details
The (\theta, \phi)
angles are such their associated planetocentric
coordinates are:
(\cos(\phi) \cos(\theta), \cos(\phi) \sin(\theta), \sin(\phi))',
with (0, 0, 1)'
denoting the north pole.
The script performing the data preprocessing is available at
rhea.R
.
Source
References
Hirata, N. (2016) Differential impact cratering of Saturn's satellites by heliocentric impactors. Journal of Geophysical Research: Planets, 121:111–117. doi:10.1002/2015JE004940
Examples
# Load data
data("rhea")
# Add Cartesian coordinates
rhea$X <- cbind(cos(rhea$theta) * cos(rhea$phi),
sin(rhea$theta) * cos(rhea$phi),
sin(rhea$phi))
# Tests
unif_test(data = rhea$X[rhea$diam > 15 & rhea$diam < 20, ],
type = c("PCvM", "PAD", "PRt"), p_value = "asymp")
Sort the columns of a matrix
Description
Convenience functions to sort the columns of a matrix in an increasing way.
Usage
sort_each_col(A)
sort_index_each_col(A)
Arguments
A |
a matrix of arbitrary dimensions. |
Value
A matrix with the same dimensions as A
such that each
column is sorted increasingly (for sort_each_col
) or contains the
sorting indexes (for sort_index_each_col
).
sort_index_each_col
.
Statistics for testing (hyper)spherical uniformity
Description
Low-level implementation of several statistics for assessing
uniformity on the (hyper)sphere
S^{p-1}:=\{{\bf x}\in R^p:||{\bf x}||=1\}
, p\ge 2
.
Usage
sph_stat_Rayleigh(X)
sph_stat_Bingham(X)
sph_stat_Ajne(X, Psi_in_X = FALSE)
sph_stat_Gine_Gn(X, Psi_in_X = FALSE, p = 0L)
sph_stat_Gine_Fn(X, Psi_in_X = FALSE, p = 0L)
sph_stat_Pycke(X, Psi_in_X = FALSE, p = 0L)
sph_stat_Bakshaev(X, Psi_in_X = FALSE, p = 0L)
sph_stat_Riesz(X, Psi_in_X = FALSE, p = 0L, s = 1)
sph_stat_PCvM(X, Psi_in_X = FALSE, p = 0L, N = 160L, L = 1000L)
sph_stat_PRt(X, Psi_in_X = FALSE, p = 0L, t = 1/3, N = 160L,
L = 1000L)
sph_stat_PAD(X, Psi_in_X = FALSE, p = 0L, N = 160L, L = 1000L)
sph_stat_Poisson(X, Psi_in_X = FALSE, p = 0L, rho = 0.5)
sph_stat_Softmax(X, Psi_in_X = FALSE, p = 0L, kappa = 1)
sph_stat_Stereo(X, Psi_in_X = FALSE, p = 0L, a = 0)
sph_stat_CCF09(X, dirs, K_CCF09 = 25L, original = FALSE)
sph_stat_Rayleigh_HD(X)
sph_stat_CJ12(X, Psi_in_X = FALSE, p = 0L, regime = 3L)
Arguments
X |
an array of size |
Psi_in_X |
does |
p |
integer giving the dimension of the ambient space |
s |
|
N |
number of points used in the
Gauss-Legendre quadrature. Defaults to
|
L |
number of discretization points to interpolate angular functions
that require evaluating an integral. Defaults to |
t |
|
rho |
|
kappa |
|
a |
either:
|
dirs |
a matrix of size |
K_CCF09 |
integer giving the truncation of the series present in the
asymptotic distribution of the Kolmogorov-Smirnov statistic. Defaults to
|
original |
return the CCF09 statistic as originally defined?
If |
regime |
type of asymptotic regime for the CJ12 test, either |
Details
Detailed descriptions and references of the statistics are available in García-Portugués and Verdebout (2018).
The Pycke and CJ12 statistics employ the scalar products matrix,
rather than the shortest angles matrix, when Psi_in_X = TRUE
. This
matrix is obtained by setting scalar_prod = TRUE
in
Psi_mat
.
Value
A matrix of size c(M, 1)
containing the statistics for each
of the M
samples.
Warning
Be careful on avoiding the next bad usages of the functions, which will produce spurious results:
The directions in
X
do not have unit norm.-
X
does not containPsi_mat(X)
whenX_in_Theta = TRUE
. The parameter
p
does not match with the dimension ofR^p
.-
Not passing the scalar products matrix to
sph_stat_CJ12
whenPsi_in_X = TRUE
. The directions in
dirs
do not have unit norm.
References
García-Portugués, E. and Verdebout, T. (2018) An overview of uniformity tests on the hypersphere. arXiv:1804.00286. doi:10.48550/arXiv.1804.00286.
Examples
## Sample uniform spherical data
M <- 2
n <- 100
p <- 3
set.seed(123456789)
X <- r_unif_sph(n = n, p = p, M = M)
## Sobolev tests
# Rayleigh
sph_stat_Rayleigh(X)
# Bingham
sph_stat_Bingham(X)
# Ajne
Psi <- Psi_mat(X)
dim(Psi) <- c(dim(Psi), 1)
sph_stat_Ajne(X)
sph_stat_Ajne(Psi, Psi_in_X = TRUE)
# Gine Gn
sph_stat_Gine_Gn(X)
sph_stat_Gine_Gn(Psi, Psi_in_X = TRUE, p = p)
# Gine Fn
sph_stat_Gine_Fn(X)
sph_stat_Gine_Fn(Psi, Psi_in_X = TRUE, p = p)
# Pycke
sph_stat_Pycke(X)
sph_stat_Pycke(Psi, Psi_in_X = TRUE, p = p)
# Bakshaev
sph_stat_Bakshaev(X)
sph_stat_Bakshaev(Psi, Psi_in_X = TRUE, p = p)
# Riesz
sph_stat_Riesz(X, s = 1)
sph_stat_Riesz(Psi, Psi_in_X = TRUE, p = p, s = 1)
# Projected Cramér-von Mises
sph_stat_PCvM(X)
sph_stat_PCvM(Psi, Psi_in_X = TRUE, p = p)
# Projected Rothman
sph_stat_PRt(X)
sph_stat_PRt(Psi, Psi_in_X = TRUE, p = p)
# Projected Anderson-Darling
sph_stat_PAD(X)
sph_stat_PAD(Psi, Psi_in_X = TRUE, p = p)
## Other tests
# CCF09
dirs <- r_unif_sph(n = 3, p = p, M = 1)[, , 1]
sph_stat_CCF09(X, dirs = dirs)
## High-dimensional tests
# Rayleigh HD-Standardized
sph_stat_Rayleigh_HD(X)
# CJ12
sph_stat_CJ12(X, regime = 1)
sph_stat_CJ12(Psi, regime = 1, Psi_in_X = TRUE, p = p)
sph_stat_CJ12(X, regime = 2)
sph_stat_CJ12(Psi, regime = 2, Psi_in_X = TRUE, p = p)
sph_stat_CJ12(X, regime = 3)
sph_stat_CJ12(Psi, regime = 3, Psi_in_X = TRUE, p = p)
Finite Sobolev statistics for testing (hyper)spherical uniformity
Description
Computes the finite Sobolev statistic
S_{n, p}(\{b_{k, p}\}_{k=1}^K) = \sum_{i, j = 1}^n
\sum_{k = 1}^K b_{k, p}C_k^(p / 2 - 1)(\cos^{-1}({\bf X}_i'{\bf X}_j)),
for a sequence \{b_{k, p}\}_{k = 1}^K
of non-negative weights. For
p = 2
, the Gegenbauer polynomials are replaced by Chebyshev ones.
Usage
sph_stat_Sobolev(X, Psi_in_X = FALSE, p = 0, vk2 = c(0, 0, 1))
cir_stat_Sobolev(Theta, Psi_in_Theta = FALSE, vk2 = c(0, 0, 1))
Arguments
X |
an array of size |
Psi_in_X |
does |
p |
integer giving the dimension of the ambient space |
vk2 |
weights for the finite Sobolev test. A non-negative vector or
matrix. Defaults to |
Theta |
a matrix of size |
Psi_in_Theta |
does |
Value
A matrix of size c(M, ncol(vk2))
containing the statistics for
each of the M
samples.
Uniform spherical cap distribution
Description
Density, simulation, and associated functions for a uniform
distribution within a spherical cap of angle 0\leq \alpha\leq \pi
about
a direction \boldsymbol{\mu}
on S^{p-1}:=\{\mathbf{x} \in
R^p:||\mathbf{x}||=1\}
, p \geq 2
. The
density at \mathbf{x} \in S^{p-1}
is given by
c_{p,r} 1_{[1 - r, 1]}(\mathbf{x}' \boldsymbol{\mu}) \quad
\mathrm{with}\quad c_{p,r} := \omega_{p}\left[1 - F_p(1 - r)\right],
where
r=\cos(\alpha)
is the projected radius of the spherical cap about
\boldsymbol{\mu}
, \omega_p
is the surface area of
S^{p-1}
, and F_p
is the projected uniform distribution (see
p_proj_unif
).
The angular function of the uniform cap distribution is
g(t):=1_{[1 - r, 1]}(t)
. The associated
projected density is \tilde{g}(t):=\omega_{p-1} c_{p,r}
(1 - t^2)^{(p - 3) / 2} 1_{[1 - r, 1]}(t)
.
Usage
d_unif_cap(x, mu, angle = pi/10)
c_unif_cap(p, angle = pi/10)
r_unif_cap(n, mu, angle = pi/10)
p_proj_unif_cap(x, p, angle = pi/10)
q_proj_unif_cap(u, p, angle = pi/10)
d_proj_unif_cap(x, p, angle = pi/10, scaled = TRUE)
r_proj_unif_cap(n, p, angle = pi/10)
Arguments
x |
locations to evaluate the density or distribution. For
|
mu |
the directional mean |
angle |
angle |
p |
integer giving the dimension of the ambient space |
n |
sample size, a positive integer. |
u |
vector of probabilities. |
scaled |
whether to scale the angular function by the normalizing
constant. Defaults to |
Value
Depending on the function:
-
d_unif_cap
: a vector of lengthnx
or1
with the evaluated density atx
. -
r_unif_cap
: a matrix of sizec(n, p)
with the random sample. -
c_unif_cap
: the normalizing constant. -
p_proj_unif_cap
: a vector of lengthx
with the evaluated distribution function atx
. -
q_proj_unif_cap
: a vector of lengthu
with the evaluated quantile function atu
. -
d_proj_unif_cap
: a vector of sizenx
with the evaluated angular function. -
r_proj_unif_cap
: a vector of lengthn
containing simulated values from the cosines density associated to the angular function.
Author(s)
Alberto Fernández-de-Marcos and Eduardo García-Portugués.
Examples
# Simulation and density evaluation for p = 2
mu <- c(0, 1)
angle <- pi / 5
n <- 1e2
x <- r_unif_cap(n = n, mu = mu, angle = angle)
col <- viridisLite::viridis(n)
r_noise <- runif(n, 0.95, 1.05) # Perturbation to improve visualization
color <- col[rank(d_unif_cap(x = x, mu = mu, angle = angle))]
plot(r_noise * x, pch = 16, col = color, xlim = c(-1, 1), ylim = c(-1, 1))
# Simulation and density evaluation for p = 3
mu <- c(0, 0, 1)
angle <- pi / 5
x <- r_unif_cap(n = n, mu = mu, angle = angle)
color <- col[rank(d_unif_cap(x = x, mu = mu, angle = angle))]
scatterplot3d::scatterplot3d(x, size = 5, xlim = c(-1, 1), ylim = c(-1, 1),
zlim = c(-1, 1), color = color)
# Simulated data from the cosines density
n <- 1e3
p <- 3
angle <- pi / 3
hist(r_proj_unif_cap(n = n, p = p, angle = angle),
breaks = seq(cos(angle), 1, l = 10), probability = TRUE,
main = "Simulated data from proj_unif_cap", xlab = "t", xlim = c(-1, 1))
t <- seq(-1, 1, by = 0.01)
lines(t, d_proj_unif_cap(x = t, p = p, angle = angle), col = "red")
Circular and (hyper)spherical uniformity statistics
Description
Implementation of several statistics for assessing uniformity
on the (hyper)sphere
S^{p-1} := \{{\bf x} \in R^p : ||{\bf x}|| = 1\}
, p\ge 2
, for a sample
{\bf X}_1,\ldots,{\bf X}_n\in S^{p-1}
.
unif_stat
receives a (several) sample(s) of directions in
Cartesian coordinates, except for the circular case (p=2
) in
which the sample(s) can be angles
\Theta_1,\ldots,\Theta_n\in [0, 2\pi)
.
unif_stat
allows to compute several statistics to several samples
within a single call, facilitating thus Monte Carlo experiments.
Usage
unif_stat(data, type = "all", data_sorted = FALSE, CCF09_dirs = NULL,
CJ12_reg = 3, cov_a = 2 * pi, Cressie_t = 1/3, K_CCF09 = 25,
Poisson_rho = 0.5, Pycke_q = 0.5, Rayleigh_m = 1, Riesz_s = 1,
Rothman_t = 1/3, Sobolev_vk2 = c(0, 0, 1), Softmax_kappa = 1,
Stereo_a = 0)
Arguments
data |
sample to compute the test statistic. An array of size
|
type |
type of test to be applied. A character vector containing any of
the following types of tests, depending on the dimension
If |
data_sorted |
is the circular data sorted? If |
CCF09_dirs |
a matrix of size |
CJ12_reg |
type of asymptotic regime for CJ12 test, either |
cov_a |
|
Cressie_t |
|
K_CCF09 |
integer giving the truncation of the series present in the
asymptotic distribution of the Kolmogorov-Smirnov statistic. Defaults to
|
Poisson_rho |
|
Pycke_q |
|
Rayleigh_m |
integer |
Riesz_s |
|
Rothman_t |
|
Sobolev_vk2 |
weights for the finite Sobolev test. A non-negative
vector or matrix. Defaults to |
Softmax_kappa |
|
Stereo_a |
|
Details
Except for CCF09_dirs
, K_CCF09
, and CJ12_reg
, all the
test-specific parameters are vectorized.
Descriptions and references for most of the statistics are available in García-Portugués and Verdebout (2018).
Value
A data frame of size c(M, length(type))
, with column names
given by type
, that contains the values of the test statistics.
References
García-Portugués, E. and Verdebout, T. (2018) An overview of uniformity tests on the hypersphere. arXiv:1804.00286. doi:10.48550/arXiv.1804.00286.
Examples
## Circular data
# Sample
n <- 10
M <- 2
Theta <- r_unif_cir(n = n, M = M)
# Matrix
unif_stat(data = Theta, type = "all")
# Array
unif_stat(data = array(Theta, dim = c(n, 1, M)), type = "all")
# Vector
unif_stat(data = Theta[, 1], type = "all")
## Spherical data
# Circular sample in Cartesian coordinates
n <- 10
M <- 2
X <- array(dim = c(n, 2, M))
for (i in 1:M) X[, , i] <- cbind(cos(Theta[, i]), sin(Theta[, i]))
# Array
unif_stat(data = X, type = "all")
# High-dimensional data
X <- r_unif_sph(n = n, p = 3, M = M)
unif_stat(data = X, type = "all")
## Specific arguments
# Rothman
unif_stat(data = Theta, type = "Rothman", Rothman_t = 0.5)
# CCF09
unif_stat(data = X, type = "CCF09", CCF09_dirs = X[, , 1])
unif_stat(data = X, type = "CCF09", CCF09_dirs = X[, , 1], K_CCF09 = 1)
# CJ12
unif_stat(data = X, type = "CJ12", CJ12_reg = 3)
unif_stat(data = X, type = "CJ12", CJ12_reg = 1)
Monte Carlo simulation of circular and (hyper)spherical uniformity statistics
Description
Utility for performing Monte Carlo simulation of several
statistics for assessing uniformity on the (hyper)sphere
S^{p-1}:=\{{\bf x}\in R^p:||{\bf x}||=1\}
, p\ge 2
.
unif_stat_MC
provides a convenient wrapper for parallel
evaluation of unif_stat
, the estimation of critical values under the
null distribution, and the computation of empirical powers under the
alternative.
Usage
unif_stat_MC(n, type = "all", p, M = 10000, r_H1 = NULL,
crit_val = NULL, alpha = c(0.1, 0.05, 0.01), return_stats = TRUE,
stats_sorted = FALSE, chunks = ceiling((n * M)/1e+05), cores = 1,
seeds = NULL, CCF09_dirs = NULL, CJ12_reg = 3, cov_a = 2 * pi,
Cressie_t = 1/3, K_CCF09 = 25, Poisson_rho = 0.5, Pycke_q = 0.5,
Rayleigh_m = 1, Riesz_s = 1, Rothman_t = 1/3, Sobolev_vk2 = c(0, 0,
1), Softmax_kappa = 1, Stereo_a = 0, ...)
Arguments
n |
sample size. |
type |
type of test to be applied. A character vector containing any of
the following types of tests, depending on the dimension
If |
p |
integer giving the dimension of the ambient space |
M |
number of Monte Carlo replications. Defaults to |
r_H1 |
if provided, the computation of empirical powers is
carried out for the alternative hypothesis sampled with |
crit_val |
if provided, must be the critical values as returned by
|
alpha |
vector with significance levels. Defaults to
|
return_stats |
return the Monte Carlo statistics? If only the critical
values or powers are desired, |
stats_sorted |
sort the returned Monte Carlo statistics? If
|
chunks |
number of chunks to split the |
cores |
number of cores to perform the simulation. Defaults to |
seeds |
if provided, a vector of size |
CCF09_dirs |
a matrix of size |
CJ12_reg |
type of asymptotic regime for CJ12 test, either |
cov_a |
|
Cressie_t |
|
K_CCF09 |
integer giving the truncation of the series present in the
asymptotic distribution of the Kolmogorov-Smirnov statistic. Defaults to
|
Poisson_rho |
|
Pycke_q |
|
Rayleigh_m |
integer |
Riesz_s |
|
Rothman_t |
|
Sobolev_vk2 |
weights for the finite Sobolev test. A non-negative
vector or matrix. Defaults to |
Softmax_kappa |
|
Stereo_a |
|
... |
optional arguments to be passed to the |
Details
It is possible to have a progress bar if unif_stat_MC
is wrapped with
progressr::with_progress
or if
progressr::handlers(global = TRUE)
is invoked (once) by the user.
See the examples below. The progress bar is updated with the number of
finished chunks.
All the tests reject for large values of the test statistic
(max_gap = TRUE
is assumed for the Range test), so the critical
values for the significance levels alpha
correspond to the
alpha
-upper quantiles of the null distribution of the test statistic.
The Monte Carlo simulation for the CCF09 test is made conditionally
on the choice of CCF09_dirs
. That is, all the Monte Carlo statistics
share the same random directions.
Except for CCF09_dirs
, K_CCF09
, and CJ12_reg
, all the
test-specific parameters are vectorized.
Value
A list with the following entries:
-
crit_val_MC
: a data frame of sizec(length(alpha), length(type))
, with column names given bytype
and rows corresponding to the significance levelsalpha
, that contains the estimated critical values of the tests. -
power_MC
: a data frame of sizec(nrow(crit_val), length(type))
, with column names given bytype
and rows corresponding to the significance levels ofcrit_val
, that contains the empirical powers of the tests.NA
ifcrit_val = NULL
. -
stats_MC
: a data frame of sizec(M, length(type))
, with column names given bytype
, that contains the Monte Carlo statistics.
Examples
## Critical values
# Single statistic, specific alpha
cir <- unif_stat_MC(n = 10, M = 1e2, type = "Ajne", p = 2, alpha = 0.15)
summary(cir$stats_MC)
cir$crit_val_MC
# All circular statistics
cir <- unif_stat_MC(n = 10, M = 1e2, p = 2)
head(cir$stats_MC)
cir$crit_val_MC
# All spherical statistics
sph <- unif_stat_MC(n = 10, M = 1e2, p = 3)
head(sph$stats_MC)
sph$crit_val_MC
## Using a progress bar
# Define a progress bar
require(progress)
require(progressr)
handlers(handler_progress(
format = paste("(:spin) [:bar] :percent Iter: :current/:total Rate:",
":tick_rate iter/sec ETA: :eta Elapsed: :elapsedfull"),
clear = FALSE))
# Call unif_stat_MC() within with_progress()
with_progress(unif_stat_MC(n = 10, M = 1e2, p = 3, chunks = 10))
# With several cores
with_progress(unif_stat_MC(n = 10, M = 1e2, p = 3, chunks = 10, cores = 2))
# Instead of using with_progress() each time, it is more practical to run
# handlers(global = TRUE)
# once to activate progress bars in your R session
## Power computation
# Single statistic
cir_pow <- unif_stat_MC(n = 10, M = 1e2, type = "Ajne", p = 2,
crit_val = cir$crit_val_MC)
cir_pow$crit_val_MC
cir_pow$power_MC
# All circular statistics
cir_pow <- unif_stat_MC(n = 10, M = 1e2, p = 2, crit_val = cir$crit_val_MC)
cir_pow$crit_val_MC
cir_pow$power_MC
# All spherical statistics
sph_pow <- unif_stat_MC(n = 10, M = 1e2, p = 3, crit_val = sph$crit_val_MC)
sph_pow$crit_val_MC
sph_pow$power_MC
## Custom r_H1
# Circular
r_H1 <- function(n, p, M, l = 0.05) {
stopifnot(p == 2)
Theta_to_X(matrix(runif(n * M, 0, (2 - l) * pi), n, M))
}
dirs <- r_unif_sph(n = 5, p = 2, M = 1)[, , 1]
cir <- unif_stat_MC(n = 50, M = 1e2, p = 2, CCF09_dirs = dirs)
cir_pow <- unif_stat_MC(n = 50, M = 1e2, p = 2, r_H1 = r_H1, l = 0.10,
crit_val = cir$crit_val_MC, CCF09_dirs = dirs)
cir_pow$crit_val_MC
cir_pow$power_MC
# Spherical
r_H1 <- function(n, p, M, l = 0.5) {
samp <- array(dim = c(n, p, M))
for (j in 1:M) {
samp[, , j] <- mvtnorm::rmvnorm(n = n, mean = c(l, rep(0, p - 1)),
sigma = diag(rep(1, p)))
samp[, , j] <- samp[, , j] / sqrt(rowSums(samp[, , j]^2))
}
return(samp)
}
dirs <- r_unif_sph(n = 5, p = 3, M = 1)[, , 1]
sph <- unif_stat_MC(n = 50, M = 1e2, p = 3, CCF09_dirs = dirs)
sph_pow <- unif_stat_MC(n = 50, M = 1e2, p = 3, r_H1 = r_H1, l = 0.5,
crit_val = sph$crit_val_MC, CCF09_dirs = dirs)
sph_pow$power_MC
## Pre-built r_H1
# Circular
dirs <- r_unif_sph(n = 5, p = 2, M = 1)[, , 1]
cir_pow <- unif_stat_MC(n = 50, M = 1e2, p = 2, r_H1 = r_alt, alt = "vMF",
kappa = 1, crit_val = cir$crit_val_MC,
CCF09_dirs = dirs)
cir_pow$power_MC
# Spherical
dirs <- r_unif_sph(n = 5, p = 3, M = 1)[, , 1]
sph_pow <- unif_stat_MC(n = 50, M = 1e2, p = 3, r_H1 = r_alt, alt = "vMF",
kappa = 1, crit_val = sph$crit_val_MC,
CCF09_dirs = dirs)
sph_pow$power_MC
Null distributions for circular and (hyper)spherical uniformity statistics
Description
Approximate computation of the null distributions of several
statistics for assessing uniformity on the (hyper)sphere
S^{p-1}:=\{{\bf x}\in R^p:||{\bf x}||=1\}
, p\ge 2
. The approximation is done either by
means of the asymptotic distribution or by Monte Carlo.
Usage
unif_stat_distr(x, type, p, n, approx = "asymp", M = 10000,
stats_MC = NULL, K_max = 10000, method = "I", Stephens = FALSE,
CCF09_dirs = NULL, CJ12_beta = 0, CJ12_reg = 3, cov_a = 2 * pi,
Cressie_t = 1/3, K_Ajne = 500, K_CCF09 = 25, K_Kuiper = 25,
K_Watson = 25, K_Watson_1976 = 5, Poisson_rho = 0.5, Pycke_q = 0.5,
Rayleigh_m = 1, Riesz_s = 1, Rothman_t = 1/3, Sobolev_vk2 = c(0, 0,
1), Softmax_kappa = 1, Stereo_a = 0, ...)
Arguments
x |
evaluation points for the null distribution(s). Either a vector of
size |
type |
type of test to be applied. A character vector containing any of
the following types of tests, depending on the dimension
If |
p |
integer giving the dimension of the ambient space |
n |
sample size employed for computing the statistic. |
approx |
type of approximation to the null distribution, either
|
M |
number of Monte Carlo replications for approximating the null
distribution when |
stats_MC |
a data frame of size |
K_max |
integer giving the truncation of the series that compute the
asymptotic p-value of a Sobolev test. Defaults to |
method |
method for approximating the density, distribution, or
quantile function of the weighted sum of chi squared random variables. Must
be |
Stephens |
compute Stephens (1970) modification so that the null distribution of the is less dependent on the sample size? The modification does not alter the test decision. |
CCF09_dirs |
a matrix of size |
CJ12_beta |
|
CJ12_reg |
type of asymptotic regime for CJ12 test, either |
cov_a |
|
Cressie_t |
|
K_CCF09 |
integer giving the truncation of the series present in the
asymptotic distribution of the Kolmogorov-Smirnov statistic. Defaults to
|
K_Kuiper , K_Watson , K_Watson_1976 , K_Ajne |
integer giving the truncation
of the series present in the null asymptotic distributions. For the
Kolmogorov-Smirnov-related series defaults to |
Poisson_rho |
|
Pycke_q |
|
Rayleigh_m |
integer |
Riesz_s |
|
Rothman_t |
|
Sobolev_vk2 |
weights for the finite Sobolev test. A non-negative
vector or matrix. Defaults to |
Softmax_kappa |
|
Stereo_a |
|
... |
if |
Details
When approx = "asymp"
, statistics that do not have an implemented or
known asymptotic are omitted, and a warning is generated.
For Sobolev tests, K_max = 1e4
produces probabilities uniformly
accurate with three digits for the "PCvM"
, "PAD"
, and
"PRt"
tests, for dimensions p \le 11
. With K_max = 5e4
,
these probabilities are uniformly accurate in the fourth digit. With
K_max = 1e3
, only two-digit uniform accuracy is obtained. Uniform
accuracy deteriorates when p
increases, e.g., a digit accuracy is lost
when p = 51
.
Descriptions and references on most of the asymptotic distributions are available in García-Portugués and Verdebout (2018).
Value
A data frame of size c(nx, length(type))
, with column names
given by type
, that contains the values of the null distributions of
the statistics evaluated at x
.
References
García-Portugués, E. and Verdebout, T. (2018) An overview of uniformity tests on the hypersphere. arXiv:1804.00286. doi:10.48550/arXiv.1804.00286.
Examples
## Asymptotic distribution
# Circular statistics
x <- seq(0, 1, l = 5)
unif_stat_distr(x = x, type = "Kuiper", p = 2, n = 10)
unif_stat_distr(x = x, type = c("Ajne", "Kuiper"), p = 2, n = 10)
unif_stat_distr(x = x, type = c("Ajne", "Kuiper"), p = 2, n = 10, K_Ajne = 5)
# All circular statistics
unif_stat_distr(x = x, type = avail_cir_tests, p = 2, n = 10, K_max = 1e3)
# Spherical statistics
unif_stat_distr(x = cbind(x, x + 1), type = c("Rayleigh", "Bingham"),
p = 3, n = 10)
unif_stat_distr(x = cbind(x, x + 1), type = c("Rayleigh", "Bingham"),
p = 3, n = 10, M = 100)
# All spherical statistics
unif_stat_distr(x = x, type = avail_sph_tests, p = 3, n = 10, K_max = 1e3)
## Monte Carlo distribution
# Circular statistics
x <- seq(0, 5, l = 10)
unif_stat_distr(x = x, type = avail_cir_tests, p = 2, n = 10, approx = "MC")
unif_stat_distr(x = x, type = "Kuiper", p = 2, n = 10, approx = "MC")
unif_stat_distr(x = x, type = c("Ajne", "Kuiper"), p = 2, n = 10,
approx = "MC")
# Spherical statistics
unif_stat_distr(x = x, type = avail_sph_tests, p = 3, n = 10,
approx = "MC")
unif_stat_distr(x = cbind(x, x + 1), type = c("Rayleigh", "Bingham"),
p = 3, n = 10, approx = "MC")
unif_stat_distr(x = cbind(x, x + 1), type = c("Rayleigh", "Bingham"),
p = 3, n = 10, approx = "MC")
## Specific arguments
# Rothman
unif_stat_distr(x = x, type = "Rothman", p = 2, n = 10, Rothman_t = 0.5,
approx = "MC")
# CCF09
dirs <- r_unif_sph(n = 5, p = 3, M = 1)[, , 1]
x <- seq(0, 1, l = 10)
unif_stat_distr(x = x, type = "CCF09", p = 3, n = 10, approx = "MC",
CCF09_dirs = dirs)
unif_stat_distr(x = x, type = "CCF09", p = 3, n = 10, approx = "MC")
# CJ12
unif_stat_distr(x = x, type = "CJ12", p = 3, n = 100, CJ12_reg = 3)
unif_stat_distr(x = x, type = "CJ12", p = 3, n = 100, CJ12_reg = 2,
CJ12_beta = 0.01)
unif_stat_distr(x = x, type = "CJ12", p = 3, n = 100, CJ12_reg = 1)
## Sobolev
x <- seq(0, 1, l = 10)
vk2 <- diag(1, nrow = 3)
unif_stat_distr(x = x, type = "Sobolev", approx = "asymp", p = 3, n = 100,
Sobolev_vk2 = vk2)
sapply(1:3, function(i)
unif_stat_distr(x = x, type = "Sobolev", approx = "asymp", p = 3, n = 100,
Sobolev_vk2 = vk2[i, ])$Sobolev)
sapply(1:3, function(i)
unif_stat_distr(x = x, type = "Sobolev", approx = "MC", p = 3, n = 100,
Sobolev_vk2 = vk2[i, ], M = 1e3)$Sobolev)
unif_stat_distr(x = x, type = "Sobolev", approx = "MC", p = 3, n = 100,
Sobolev_vk2 = vk2, M = 1e3)
Circular and (hyper)spherical uniformity tests
Description
Implementation of several uniformity tests on the (hyper)sphere
S^{p-1}:=\{{\bf x}\in R^p:||{\bf x}||=1\}
, p\ge 2
, with calibration either in
terms of their asymptotic/exact distributions, if available, or Monte Carlo.
unif_test
receives a sample of directions
{\bf X}_1,\ldots,{\bf X}_n\in S^{p-1}
in
Cartesian coordinates, except for the circular case (p=2
) in
which the sample can be represented in terms of angles
\Theta_1,\ldots,\Theta_n\in [0, 2\pi)
.
unif_test
allows to perform several tests within a single call,
facilitating thus the exploration of a dataset by applying several tests.
Usage
unif_test(data, type = "all", p_value = "asymp", alpha = c(0.1, 0.05,
0.01), M = 10000, stats_MC = NULL, crit_val = NULL,
data_sorted = FALSE, K_max = 10000, method = "I", CCF09_dirs = NULL,
CJ12_beta = 0, CJ12_reg = 3, cov_a = 2 * pi, Cressie_t = 1/3,
K_CCF09 = 25, Poisson_rho = 0.5, Pycke_q = 0.5, Rayleigh_m = 1,
Riesz_s = 1, Rothman_t = 1/3, Sobolev_vk2 = c(0, 0, 1),
Softmax_kappa = 1, Stereo_a = 0, ...)
Arguments
data |
sample to perform the test. A matrix of size |
type |
type of test to be applied. A character vector containing any of
the following types of tests, depending on the dimension
If |
p_value |
type of |
alpha |
vector with significance levels. Defaults to
|
M |
number of Monte Carlo replications for approximating the null
distribution when |
stats_MC |
a data frame of size |
crit_val |
table with critical values for the tests, to be used if
|
data_sorted |
is the circular data sorted? If |
K_max |
integer giving the truncation of the series that compute the
asymptotic p-value of a Sobolev test. Defaults to |
method |
method for approximating the density, distribution, or
quantile function of the weighted sum of chi squared random variables. Must
be |
CCF09_dirs |
a matrix of size |
CJ12_beta |
|
CJ12_reg |
type of asymptotic regime for CJ12 test, either |
cov_a |
|
Cressie_t |
|
K_CCF09 |
integer giving the truncation of the series present in the
asymptotic distribution of the Kolmogorov-Smirnov statistic. Defaults to
|
Poisson_rho |
|
Pycke_q |
|
Rayleigh_m |
integer |
Riesz_s |
|
Rothman_t |
|
Sobolev_vk2 |
weights for the finite Sobolev test. A non-negative
vector or matrix. Defaults to |
Softmax_kappa |
|
Stereo_a |
|
... |
If |
Details
All the tests reject for large values of the test statistic, so the critical
values for the significance levels alpha
correspond to the
alpha
-upper quantiles of the null distribution of the test statistic.
When p_value = "asymp"
, tests that do not have an implemented or
known asymptotic are omitted, and a warning is generated.
When p_value = "MC"
, it is possible to have a progress bar indicating
the Monte Carlo simulation progress if unif_test
is wrapped with
progressr::with_progress
or if
progressr::handlers(global = TRUE)
is invoked (once) by the user.
See the examples below. The progress bar is updated with the number of
finished chunks.
All the statistics are continuous random variables except the
Hodges–Ajne statistic ("Hodges_Ajne"
), the Cressie statistic
("Cressie"
), and the number of (different) uncovered spacings
("Num_uncover"
). These three statistics are discrete random variables.
The Monte Carlo calibration for the CCF09 test is made conditionally
on the choice of CCF09_dirs
. That is, all the Monte
Carlo statistics share the same random directions.
Except for CCF09_dirs
, K_CCF09
, and CJ12_reg
, all the
test-specific parameters are vectorized.
Descriptions and references for most of the tests are available in García-Portugués and Verdebout (2018).
Value
If only a single test is performed, a list with class
htest
containing the following components:
-
statistic
: the value of the test statistic. -
p.value
: the p-value of the test. Ifp_value = "crit_val"
, anNA
. -
alternative
: a character string describing the alternative hypothesis. -
method
: a character string indicating what type of test was performed. -
data.name
: a character string giving the name of the data. -
reject
: the rejection decision for the levels of significancealpha
. -
crit_val
: a vector with the critical values for the significance levelsalpha
used withp_value = "MC"
orp_value = "asymp"
. -
param
: parameter(s) used in the test (if any).
If several tests are performed, a type
-named list with
entries for each test given by the above list.
References
García-Portugués, E. and Verdebout, T. (2018) An overview of uniformity tests on the hypersphere. arXiv:1804.00286. doi:10.48550/arXiv.1804.00286.
Examples
## Asymptotic distribution
# Circular data
n <- 10
samp_cir <- r_unif_cir(n = n)
# Matrix
unif_test(data = samp_cir, type = "Ajne", p_value = "asymp")
# Vector
unif_test(data = samp_cir[, 1], type = "Ajne", p_value = "asymp")
# Array
unif_test(data = array(samp_cir, dim = c(n, 1, 1)), type = "Ajne",
p_value = "asymp")
# Several tests
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "asymp")
# Spherical data
n <- 10
samp_sph <- r_unif_sph(n = n, p = 3)
# Array
unif_test(data = samp_sph, type = "Bingham", p_value = "asymp")
# Matrix
unif_test(data = samp_sph[, , 1], type = "Bingham", p_value = "asymp")
# Several tests
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "asymp")
## Monte Carlo
# Circular data
unif_test(data = samp_cir, type = "Ajne", p_value = "MC")
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "MC")
# Spherical data
unif_test(data = samp_sph, type = "Bingham", p_value = "MC")
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC")
# Caching stats_MC
stats_MC_cir <- unif_stat_MC(n = nrow(samp_cir), p = 2)$stats_MC
stats_MC_sph <- unif_stat_MC(n = nrow(samp_sph), p = 3)$stats_MC
unif_test(data = samp_cir, type = avail_cir_tests,
p_value = "MC", stats_MC = stats_MC_cir)
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
stats_MC = stats_MC_sph)
## Critical values
# Circular data
unif_test(data = samp_cir, type = avail_cir_tests, p_value = "crit_val")
# Spherical data
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "crit_val")
# Caching crit_val
crit_val_cir <- unif_stat_MC(n = n, p = 2)$crit_val_MC
crit_val_sph <- unif_stat_MC(n = n, p = 3)$crit_val_MC
unif_test(data = samp_cir, type = avail_cir_tests,
p_value = "crit_val", crit_val = crit_val_cir)
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "crit_val",
crit_val = crit_val_sph)
## Specific arguments
# Rothman
unif_test(data = samp_cir, type = "Rothman", Rothman_t = 0.5)
# CCF09
unif_test(data = samp_sph, type = "CCF09", p_value = "MC",
CCF09_dirs = samp_sph[1:2, , 1])
unif_test(data = samp_sph, type = "CCF09", p_value = "MC",
CCF09_dirs = samp_sph[3:4, , 1])
## Using a progress bar when p_value = "MC"
# Define a progress bar
require(progress)
require(progressr)
handlers(handler_progress(
format = paste("(:spin) [:bar] :percent Iter: :current/:total Rate:",
":tick_rate iter/sec ETA: :eta Elapsed: :elapsedfull"),
clear = FALSE))
# Call unif_test() within with_progress()
with_progress(
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
chunks = 10, M = 1e3)
)
# With several cores
with_progress(
unif_test(data = samp_sph, type = avail_sph_tests, p_value = "MC",
cores = 2, chunks = 10, M = 1e3)
)
# Instead of using with_progress() each time, it is more practical to run
# handlers(global = TRUE)
# once to activate progress bars in your R session
Low-level utilities for sphunif
Description
Internal and undocumented low-level utilities for sphunif.
Usage
n_from_dist_vector(n_dist)
t_inv_sqrt_one(t)
Arguments
n_dist |
a positive integer |
t |
a vector to evaluate |
Venus craters
Description
Craters on Venus from the USGS Astrogeology Science Center.
Usage
venus
Format
A data frame with 967 rows and 4 variables:
- name
name of the crater (if named).
- diameter
diameter of the crater (in km).
- theta
longitude angle
\theta \in [0, 2\pi)
of the crater center.- phi
latitude angle
\phi \in [-\pi/2, \pi/2]
of the crater center.
Details
The (\theta, \phi)
angles are such their associated planetocentric
coordinates are:
(\cos(\phi) \cos(\theta), \cos(\phi) \sin(\theta), \sin(\phi))',
with (0, 0, 1)'
denoting the north pole.
The script performing the data preprocessing is available at
venus.R
.
Source
https://astrogeology.usgs.gov/search/map/Venus/venuscraters
Examples
# Load data
data("venus")
# Add Cartesian coordinates
venus$X <- cbind(cos(venus$theta) * cos(venus$phi),
sin(venus$theta) * cos(venus$phi),
sin(venus$phi))
# Tests
unif_test(data = venus$X, type = c("PCvM", "PAD", "PRt"), p_value = "asymp")
Weighted sums of non-central chi squared random variables
Description
Approximated density, distribution, and quantile functions for weighted sums of non-central chi squared random variables:
Q_K = \sum_{i = 1}^K w_i \chi^2_{d_i}(\lambda_i),
where w_1, \ldots, w_n
are positive weights, d_1, \ldots, d_n
are positive degrees of freedom, and \lambda_1, \ldots, \lambda_n
are non-negative non-centrality parameters. Also, simulation of Q_K
.
Usage
d_wschisq(x, weights, dfs, ncps = 0, method = c("I", "SW", "HBE")[1],
exact_chisq = TRUE, imhof_epsabs = 1e-06, imhof_epsrel = 1e-06,
imhof_limit = 10000, grad_method = "simple",
grad_method.args = list(eps = 1e-07))
p_wschisq(x, weights, dfs, ncps = 0, method = c("I", "SW", "HBE", "MC")[1],
exact_chisq = TRUE, imhof_epsabs = 1e-06, imhof_epsrel = 1e-06,
imhof_limit = 10000, M = 10000, MC_sample = NULL)
q_wschisq(u, weights, dfs, ncps = 0, method = c("I", "SW", "HBE", "MC")[1],
exact_chisq = TRUE, imhof_epsabs = 1e-06, imhof_epsrel = 1e-06,
imhof_limit = 10000, nlm_gradtol = 1e-06, nlm_iterlim = 1000,
M = 10000, MC_sample = NULL)
r_wschisq(n, weights, dfs, ncps = 0)
cutoff_wschisq(thre = 1e-04, weights, dfs, ncps = 0, log = FALSE,
x_tail = NULL)
Arguments
x |
vector of quantiles. |
weights |
vector with the positive weights of the sum. Must have the
same length as |
dfs |
vector with the positive degrees of freedom of the chi squared
random variables. Must have the same length as |
ncps |
non-centrality parameters. Either |
method |
method for approximating the density, distribution, or
quantile function. Must be |
exact_chisq |
if |
imhof_epsabs , imhof_epsrel , imhof_limit |
precision parameters passed to
|
grad_method , grad_method.args |
numerical differentiation parameters
passed to |
M |
number of Monte Carlo samples for approximating the distribution if
|
MC_sample |
if provided, it is employed when |
u |
vector of probabilities. |
nlm_gradtol , nlm_iterlim |
convergence control parameters passed to
|
n |
sample size. |
thre |
vector with the error thresholds of the tail probability and
mean/variance explained by the first terms of the series. Defaults to
|
log |
are |
x_tail |
scalar evaluation point for determining the upper tail
probability. If |
Details
Four methods are implemented for approximating the distribution of a weighted sum of chi squared random variables:
-
"I"
: Imhof's approximation (Imhof, 1961) for the evaluation of the distribution function. If this method is selected, the function is simply a wrapper toimhof
from theCompQuadForm
package (Duchesne and Lafaye De Micheaux, 2010). -
"SW"
: Satterthwaite–Welch (Satterthwaite, 1946; Welch, 1938) approximation, consisting in matching the first two moments ofQ_K
with a gamma distribution. -
"HBE"
: Hall–Buckley–Eagleson (Hall, 1983; Buckley and Eagleson, 1988) approximation, consisting in matching the first three moments ofQ_K
with a gamma distribution. -
"MC"
: Monte Carlo approximation using the empirical cumulative distribution function withM
simulated samples.
The Imhof method is exact up to the prescribed numerical accuracy. It is also the most time-consuming method. The density and quantile functions for this approximation are obtained by numerical differentiation and inversion, respectively, of the approximated distribution.
For the methods based on gamma matching, the GammaDist
density, distribution, and quantile functions are invoked. The
Hall–Buckley–Eagleson approximation tends to overperform the
Satterthwaite–Welch approximation.
The Monte Carlo method is relatively inaccurate and slow, but serves as an
unbiased reference of the true distribution function. The inversion of the
empirical cumulative distribution is done by quantile
.
An empirical comparison of these and other approximation methods is given in Bodenham and Adams (2016).
cutoff_wschisq
removes NA
s/NaN
s in weights
or
dfs
with a message. The threshold thre
ensures that the
tail probability of the truncated and whole series differ less than
thre
at x_tail
, or that thre
is the proportion of the
mean/variance of the whole series that is not retained. The (upper)
tail probabilities for evaluating truncation are computed using the
Hall–Buckley–Eagleson approximation at x_tail
.
Value
-
d_wschisq
: density function evaluated atx
, a vector. -
p_wschisq
: distribution function evaluated atx
, a vector. -
q_wschisq
: quantile function evaluated atu
, a vector. -
r_wschisq
: a vector of sizen
containing a random sample. -
cutoff_wschisq
: a data frame with the indexes up to which the truncated series explains the tail probability with absolute errorthre
, or the proportion of the mean/variance of the whole series that is not explained by the truncated series.
Author(s)
Eduardo García-Portugués and Paula Navarro-Esteban.
References
Bodenham, D. A. and Adams, N. M. (2016). A comparison of efficient approximations for a weighted sum of chi-squared random variables. Statistics and Computing, 26(4):917–928. doi:10.1007/s11222-015-9583-4
Buckley, M. J. and Eagleson, G. K. (1988). An approximation to the distribution of quadratic forms in normal random variables. Australian Journal of Statistics, 30(1):150–159. doi:10.1111/j.1467-842X.1988.tb00471.x
Duchesne, P. and Lafaye De Micheaux, P. (2010) Computing the distribution of quadratic forms: Further comparisons between the Liu–Tang–Zhang approximation and exact methods. Computational Statistics and Data Analysis, 54(4):858–862. doi:10.1016/j.csda.2009.11.025
Hall, P. (1983). Chi squared approximations to the distribution of a sum of independent random variables. Annals of Probability, 11(4):1028–1036. doi:10.1214/aop/1176993451
Imhof, J. P. (1961). Computing the distribution of quadratic forms in normal variables. Biometrika, 48(3/4):419–426. doi:10.2307/2332763
Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2(6):110–114. doi:10.2307/3002019
Welch, B. L. (1938). The significance of the difference between two means when the population variances are unequal. Biometrika, 29(3/4):350–362. doi:10.2307/2332010
Examples
# Plotting functions for the examples
add_approx_dens <- function(x, dfs, weights, ncps) {
lines(x, d_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "SW", exact_chisq = FALSE), col = 3)
lines(x, d_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "HBE", exact_chisq = FALSE), col = 4)
lines(x, d_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "I", exact_chisq = TRUE), col = 2)
legend("topright", legend = c("True", "SW", "HBE", "I"), lwd = 2,
col = c(1, 3:4, 2))
}
add_approx_distr <- function(x, dfs, weights, ncps, ...) {
lines(x, p_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "SW", exact_chisq = FALSE), col = 3)
lines(x, p_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "HBE", exact_chisq = FALSE), col = 4)
lines(x, p_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "MC", exact_chisq = FALSE), col = 5,
type = "s")
lines(x, p_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "I", exact_chisq = TRUE), col = 2)
legend("bottomright", legend = c("True", "SW", "HBE", "MC", "I"), lwd = 2,
col = c(1, 3:5, 2))
}
add_approx_quant <- function(u, dfs, weights, ncps, ...) {
lines(u, q_wschisq(u, weights = weights, dfs = dfs, ncps = ncps,
method = "SW", exact_chisq = FALSE), col = 3)
lines(u, q_wschisq(u, weights = weights, dfs = dfs, ncps = ncps,
method = "HBE", exact_chisq = FALSE), col = 4)
lines(u, q_wschisq(u, weights = weights, dfs = dfs, ncps = ncps,
method = "MC", exact_chisq = FALSE), col = 5,
type = "s")
lines(u, q_wschisq(u, weights = weights, dfs = dfs, ncps = ncps,
method = "I", exact_chisq = TRUE), col = 2)
legend("topleft", legend = c("True", "SW", "HBE", "MC", "I"), lwd = 2,
col = c(1, 3:5, 2))
}
# Validation plots for density, distribution, and quantile functions
u <- seq(0.01, 0.99, l = 100)
old_par <- par(mfrow = c(1, 3))
# Case 1: 1 * ChiSq_3(0) + 1 * ChiSq_3(0) = ChiSq_6(0)
weights <- c(1, 1)
dfs <- c(3, 3)
ncps <- 0
x <- seq(-1, 30, l = 100)
main <- expression(1 * chi[3]^2 * (0) + 1 * chi[3]^2 * (0))
plot(x, dchisq(x, df = 6), type = "l", main = main, ylab = "Density")
add_approx_dens(x = x, weights = weights, dfs = dfs, ncps = ncps)
plot(x, pchisq(x, df = 6), type = "l", main = main, ylab = "Distribution")
add_approx_distr(x = x, weights = weights, dfs = dfs, ncps = ncps)
plot(u, qchisq(u, df = 6), type = "l", main = main, ylab = "Quantile")
add_approx_quant(u = u, weights = weights, dfs = dfs, ncps = ncps)
# Case 2: 2 * ChiSq_3(1) + 1 * ChiSq_6(0.5) + 0.5 * ChiSq_12(0.25)
weights <- c(2, 1, 0.5)
dfs <- c(3, 6, 12)
ncps <- c(1, 0.5, 0.25)
x <- seq(0, 70, l = 100)
main <- expression(2 * chi[3]^2 * (1)+ 1 * chi[6]^2 * (0.5) +
0.5 * chi[12]^2 * (0.25))
samp <- r_wschisq(n = 1e4, weights = weights, dfs = dfs, ncps = ncps)
hist(samp, breaks = 50, freq = FALSE, main = main, ylab = "Density",
xlim = range(x), xlab = "x"); box()
add_approx_dens(x = x, weights = weights, dfs = dfs, ncps = ncps)
plot(x, ecdf(samp)(x), main = main, ylab = "Distribution", type = "s")
add_approx_distr(x = x, weights = weights, dfs = dfs, ncps = ncps)
plot(u, quantile(samp, probs = u), type = "s", main = main,
ylab = "Quantile")
add_approx_quant(u = u, weights = weights, dfs = dfs, ncps = ncps)
# Case 3: \sum_{k = 1}^K k^(-3) * ChiSq_{5k}(1 / k^2)
K <- 1e2
weights<- 1 / (1:K)^3
dfs <- 5 * 1:K
ncps <- 1 / (1:K)^2
x <- seq(0, 25, l = 100)
main <- substitute(sum(k^(-3) * chi[5 * k]^2 * (1 / k^2), k == 1, K),
list(K = K))
samp <- r_wschisq(n = 1e4, weights = weights, dfs = dfs, ncps = ncps)
hist(samp, breaks = 50, freq = FALSE, main = main, ylab = "Density",
xlim = range(x), xlab = "x"); box()
add_approx_dens(x = x, weights = weights, dfs = dfs, ncps = ncps)
plot(x, ecdf(samp)(x), main = main, ylab = "Distribution", type = "s")
add_approx_distr(x = x, weights = weights, dfs = dfs, ncps = ncps)
plot(u, quantile(samp, probs = u), type = "s", main = main,
ylab = "Quantile")
add_approx_quant(u = u, weights = weights, dfs = dfs, ncps = ncps)
par(old_par)
# Cutoffs for infinite series of the last example
K <- 1e7
log_weights<- -3 * log(1:K)
log_dfs <- log(5) + log(1:K)
(cutoff <- cutoff_wschisq(thre = 10^(-(1:4)), weights = log_weights,
dfs = log_dfs, log = TRUE))
# Approximation
x <- seq(0, 25, l = 100)
l <- length(cutoff$mean)
main <- expression(sum(k^(-3) * chi[5 * k]^2, k == 1, K))
col <- viridisLite::viridis(l)
plot(x, d_wschisq(x, weights = exp(log_weights[1:cutoff$mean[l]]),
dfs = exp(log_dfs[1:cutoff$mean[l]])), type = "l",
ylab = "Density", col = col[l], lwd = 3)
for(i in rev(seq_along(cutoff$mean)[-l])) {
lines(x, d_wschisq(x, weights = exp(log_weights[1:cutoff$mean[i]]),
dfs = exp(log_dfs[1:cutoff$mean[i]])), col = col[i])
}
legend("topright", legend = paste0(rownames(cutoff), " (", cutoff$mean, ")"),
lwd = 2, col = col)
Utilities for weighted sums of non-central chi squared random variables
Description
Simulation from a weighted sum of non-central chi squared random variables and Monte Carlo approximation of its distribution function.
Usage
r_wschisq_Cpp(n, weights, dfs, ncps)
p_wschisq_MC(x, weights = 0L, dfs = 0L, ncps = 0L, M = 10000L,
sample = 0L, use_sample = FALSE, x_sorted = FALSE)
Arguments
n |
sample size. |
weights |
vector with the positive weights of the sum. Must have the
same length as |
dfs |
vector with the positive degrees of freedom of the chi squared
random variables. Must have the same length as |
ncps |
non-negative non-centrality parameters. A vector with the same
length as |
x |
vector of quantiles. |
M |
number of Monte Carlo samples for approximating the distribution.
Defaults to |
sample |
if |
use_sample |
use the already computed |
Value
-
r_wschisq_Cpp
: a matrix of sizec(n, 1)
containing a random sample. -
p_wschisq_MC
: a matrix of sizec(nx, 1)
with the evaluation of the distribution function atx
.