These are helper functions included in the package.
The gen_bkgnoise() function allows users to generate
multivariate Gaussian noise to serve as background data in
high-dimensional spaces.
# Example: Generate 4D background noise
bkg_data <- gen_bkgnoise(n = 500, p = 4,
                         m = c(0, 0, 0, 0), s = c(2, 2, 2, 2))
head(bkg_data)
#> # A tibble: 6 × 4
#> x1 x2 x3 x4
#> <dbl> <dbl> <dbl> <dbl>
#> 1 2.54 2.83 -2.84 2.19
#> 2 1.81 -0.134 0.522 4.74
#> 3 -1.74 -3.47 0.744 3.26
#> 4 0.350 1.10 -0.416 -1.27
#> 5 2.61 3.86 2.04 0.964
#> 6 -2.34 1.65 -0.111 1.27
The generated data has independent dimensions with the specified means
(m) and standard deviations (s).
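As a rough illustration of what "independent dimensions with specified
means and standard deviations" means, the base R sketch below draws each
column separately with rnorm(). The name bkgnoise_sketch() is
hypothetical and this is not the package's implementation; it returns a
plain data frame rather than a tibble.

```r
# Hypothetical stand-in for illustration only (not gen_bkgnoise() itself).
bkgnoise_sketch <- function(n, p, m, s) {
  stopifnot(length(m) == p, length(s) == p)
  # One independent Gaussian column per dimension
  cols <- lapply(seq_len(p), function(j) rnorm(n, mean = m[j], sd = s[j]))
  names(cols) <- paste0("x", seq_len(p))
  as.data.frame(cols)
}

set.seed(123)
noise <- bkgnoise_sketch(n = 500, p = 4, m = rep(0, 4), s = rep(2, 4))
dim(noise)
```

Because every column is sampled on its own, the sample correlations
between columns should be close to zero for large n.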
randomize_rows() shuffles the rows of the input data into a random
order.
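Row shuffling of this kind can be sketched in base R by indexing with a
random permutation. The helper name shuffle_rows_sketch() is
hypothetical, and the package function may differ in details such as
the return type.

```r
# Hypothetical stand-in for illustration only (not randomize_rows() itself).
shuffle_rows_sketch <- function(data) {
  # sample.int(n) returns a random permutation of 1..n
  data[sample.int(nrow(data)), , drop = FALSE]
}

set.seed(1)
toy <- data.frame(id = 1:6, value = letters[1:6])
shuffled <- shuffle_rows_sketch(toy)
shuffled
```

Each row stays intact; only the order of the rows changes.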
relocate_clusters() allows users to translate clusters
along any dimension(s). Each cluster is first centered (its mean is
subtracted), and then the matching row of a provided matrix (vert_mat)
is added as a translation vector, so row i of vert_mat becomes the new
center of cluster i.
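The center-then-shift step can be sketched in base R as follows. The
function relocate_sketch() is hypothetical and assumes that the cluster
labels are the integers 1..k, that they index the rows of vert_mat, and
that the label column is named cluster; the package function may handle
these details differently.

```r
# Hypothetical stand-in for illustration only (not relocate_clusters() itself).
relocate_sketch <- function(df, vert_mat) {
  num_cols <- setdiff(names(df), "cluster")
  x <- as.matrix(df[, num_cols])
  for (i in sort(unique(df$cluster))) {
    rows <- which(df$cluster == i)
    mu <- colMeans(x[rows, , drop = FALSE])
    # Subtracting (mu - target) centers the cluster and shifts it in one step
    x[rows, ] <- sweep(x[rows, , drop = FALSE], 2, mu - vert_mat[i, ])
  }
  cbind(as.data.frame(x), cluster = df$cluster)
}

set.seed(1)
toy <- data.frame(x1 = rnorm(12), x2 = rnorm(12),
                  cluster = rep(1:3, each = 4))
targets <- matrix(c(5, 0,
                    0, 5,
                    -5, -5), nrow = 3, byrow = TRUE)
moved <- relocate_sketch(toy, targets)
# Per-cluster means now equal the rows of `targets`
aggregate(cbind(x1, x2) ~ cluster, data = moved, FUN = mean)
```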
df <- tibble::tibble(
  x1 = rnorm(12),
  x2 = rnorm(12),
  x3 = rnorm(12),
  x4 = rnorm(12),
  cluster = rep(1:3, each = 4)
)
vert_mat <- matrix(c(
  5, 0, 0, 0,
  0, 5, 0, 0,
  0, 0, 5, 0
), nrow = 3, byrow = TRUE)
relocated_df <- relocate_clusters(df, vert_mat)
head(relocated_df)
#> # A tibble: 6 × 5
#> x1 x2 x3 x4 cluster
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 0.533 -0.127 5.79 1.82 3
#> 2 -0.533 0.360 4.23 -1.08 3
#> 3 0.365 4.71 -0.483 -0.652 2
#> 4 0.928 0.307 3.71 -1.67 3
#> 5 -1.04 4.44 -0.122 -0.386 2
#> 6 -0.410 4.69 -0.407 2.13 2
The gen_rotation() function creates a rotation matrix in
high-dimensional space for given planes and angles (in degrees).
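One common way to build such a matrix is to multiply together plane
(Givens) rotations, each of which rotates two coordinates and leaves
the rest untouched. The sketch below assumes angles in degrees and a
left-to-right multiplication order; rotation_sketch() is a hypothetical
name, and the package may compose the rotations differently.

```r
# Hypothetical stand-in for illustration only (not gen_rotation() itself).
rotation_sketch <- function(p, planes_angles) {
  rot <- diag(p)
  for (pa in planes_angles) {
    i <- pa$plane[1]
    j <- pa$plane[2]
    theta <- pa$angle * pi / 180  # degrees to radians
    g <- diag(p)                  # identity outside the (i, j) plane
    g[i, i] <- cos(theta)
    g[i, j] <- -sin(theta)
    g[j, i] <- sin(theta)
    g[j, j] <- cos(theta)
    rot <- rot %*% g
  }
  rot
}

rot_sketch <- rotation_sketch(4, list(
  list(plane = c(1, 2), angle = 60),
  list(plane = c(3, 4), angle = 90)
))
# A rotation matrix is orthogonal: t(R) %*% R is the identity
round(crossprod(rot_sketch), 10)
```

When the planes share no coordinates, as here, the order of
multiplication does not matter; for overlapping planes it does.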
rotations_4d <- list(
  list(plane = c(1, 2), angle = 60),
  list(plane = c(3, 4), angle = 90)
)
rot_mat <- gen_rotation(p = 4, planes_angles = rotations_4d)
rot_mat
#> [,1] [,2] [,3] [,4]
#> [1,] 0.5000000 -0.8660254 0.000000e+00 0.000000e+00
#> [2,] 0.8660254 0.5000000 0.000000e+00 0.000000e+00
#> [3,] 0.0000000 0.0000000 6.123234e-17 -1.000000e+00
#> [4,] 0.0000000 0.0000000 1.000000e+00 6.123234e-17
When combining clusters or transforming data geometrically,
magnitudes can differ drastically. The normalize_data()
function rescales the entire dataset to fit within [-1, 1] by dividing
every value by the dataset's maximum absolute value.
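The rescaling itself is a single division by the largest absolute value
in the whole dataset, which can be sketched in base R as below. The
name normalize_sketch() is hypothetical, and the sketch assumes all
columns are numeric and at least one value is non-zero.

```r
# Hypothetical stand-in for illustration only (not normalize_data() itself).
normalize_sketch <- function(data) {
  x <- as.matrix(data)
  # One global scale factor, so relative magnitudes across columns are kept
  as.data.frame(x / max(abs(x)))
}

toy <- data.frame(x1 = c(2, -8, 1), x2 = c(0.5, 4, -2))
scaled <- normalize_sketch(toy)
scaled
```

Using one global factor, rather than scaling each column separately,
preserves the shape of the data: distances shrink uniformly in every
direction.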
norm_data <- normalize_data(bkg_data)
head(norm_data)
#> x1 x2 x3 x4
#> 1 0.32966791 0.36746895 -0.36831134 0.2838941
#> 2 0.23503031 -0.01744658 0.06767172 0.6148489
#> 3 -0.22513820 -0.45013699 0.09646890 0.4227887
#> 4 0.04540999 0.14280257 -0.05402839 -0.1647246
#> 5 0.33877160 0.50095126 0.26458171 0.1250784
#> 6 -0.30422734 0.21405969 -0.01444044 0.1646372
To place clusters in different positions, gen_clustloc()
generates points in a simplex-like arrangement, keeping each cluster
center as close to equidistant from the others as possible.
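For intuition about what an exactly equidistant arrangement looks like,
the sketch below builds the vertices of a regular simplex: k points in
k - 1 dimensions whose pairwise distances are all equal. This is a
textbook construction, not necessarily how gen_clustloc() works, and
simplex_sketch() is a hypothetical name. Here each row is one point.

```r
# Hypothetical illustration of a regular simplex (not gen_clustloc() itself).
simplex_sketch <- function(k) {
  # Centered standard basis vectors: k points in R^k, all sqrt(2) apart
  centered <- diag(k) - 1 / k
  # Orthonormal basis of the hyperplane orthogonal to the all-ones vector
  q <- qr.Q(qr(cbind(1, diag(k)[, -1, drop = FALSE])))
  basis <- q[, -1, drop = FALSE]
  # Express the points in that basis: k points in k - 1 dimensions
  centered %*% basis
}

verts <- simplex_sketch(5)
dim(verts)
# All pairwise distances coincide
range(dist(verts))
```

An exact arrangement like this needs at least k - 1 dimensions, which
is one reason a general-purpose function can only promise equidistance
"as much as possible".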
centers <- gen_clustloc(p = 4, k = 5)
head(centers)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -0.1871367 0.74955875 0.1813817 -0.7647630 0.02095925
#> [2,] 0.5307424 -0.04382745 0.4547328 -0.8621243 -0.07952343
#> [3,] -1.6393512 -1.59734549 0.0829876 1.4700328 1.68367622
#> [4,] -0.8793591 -0.05091480 0.6331562 0.1565104 0.14060736
Two helper functions, gen_nproduct() and
gen_nsum(), generate numeric vectors of positive integers
that match a user-specified target product (approximately) or sum
(exactly), respectively.
The function gen_nsum(n, k) divides a total sum
n into k positive integers. It first assigns
an equal base value to each element and then randomly distributes any
remainder, ensuring the elements sum exactly to n.
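The base-plus-remainder idea can be sketched in base R as follows. The
name nsum_sketch() is hypothetical, and the sketch assumes the
remainder is handed out one unit at a time to distinct, randomly chosen
elements; the package may distribute it differently.

```r
# Hypothetical stand-in for illustration only (not gen_nsum() itself).
nsum_sketch <- function(n, k) {
  stopifnot(n >= k)                  # every part must be at least 1
  parts <- rep(n %/% k, k)           # equal base value
  extra <- sample.int(k, n %% k)     # random elements receiving +1
  parts[extra] <- parts[extra] + 1
  parts
}

set.seed(42)
sizes <- nsum_sketch(n = 500, k = 3)
sizes
sum(sizes)
```

For n = 500 and k = 3 the base value is 166 and the remainder is 2, so
two of the three elements become 167.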
The function gen_nproduct(n, p) aims to produce
p positive integers whose product is approximately
n. It starts with all elements equal to the rounded \(p^{th}\) root of n and
iteratively adjusts elements up or down in a randomized manner until the
product is within a small tolerance of n. This accommodates
the fact that exact integer solutions for a given product are often
impossible.
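A minimal sketch of this adjust-until-close loop is shown below. The
name nproduct_sketch(), the 10% relative tolerance (tol) and the
iteration cap (max_iter) are assumptions made for illustration; the
package's tolerance and update rule are not stated here.

```r
# Hypothetical stand-in for illustration only (not gen_nproduct() itself).
nproduct_sketch <- function(n, p, tol = 0.1, max_iter = 1000) {
  # Start from the rounded p-th root, but never below 1
  dims <- rep(max(1, round(n^(1 / p))), p)
  for (iter in seq_len(max_iter)) {
    current <- prod(dims)
    if (abs(current - n) / n <= tol) break
    j <- sample.int(p, 1)            # pick one element at random
    if (current < n) {
      dims[j] <- dims[j] + 1         # product too small: step up
    } else if (dims[j] > 1) {
      dims[j] <- dims[j] - 1         # product too large: step down
    }
  }
  dims
}

set.seed(7)
grid_dims <- nproduct_sketch(n = 500, p = 4)
grid_dims
prod(grid_dims)
```

With n = 500 and p = 4, every element starts at 5 (product 625), and a
single decrement gives 5 × 5 × 5 × 4 = 500. The iteration cap
guarantees termination when no combination falls within the tolerance.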