1. Introduction to matrixCorr
Introduction
matrixCorr is organised around a simple idea:
correlation, agreement, and reliability workflows are often used
together, so they should not require completely different interfaces,
object classes, and inspection patterns.
The package therefore combines several estimator families behind a
common matrix-oriented interface for wide data and a shared long-format
workflow for repeated-measures designs. The purpose of this vignette is
not to describe every supported method in detail, but to show how the
package is structured, what kinds of inputs it expects, and how the
returned objects can be inspected.
Main workflow families
At a practical level, the package is centred on five workflow
families.
- Wide-data correlation matrices for numeric inputs.
- Robust and high-dimensional association estimation.
- Latent and mixed-scale correlation for binary, ordinal, and mixed
inputs.
- Agreement and reliability analysis for wide data.
- Repeated-measures correlation and repeated-measures agreement.
The same broad inspection pattern is used throughout to fit an
estimator, print it for a compact view, call summary() for
a longer digest, and use plot() when a graphical summary is
available.
Wide matrix workflow
The matrix-style functions accept a numeric matrix or data frame and
return a square result indexed by the original column names.
library(matrixCorr)
set.seed(1)
X <- as.data.frame(matrix(rnorm(120), ncol = 4))
names(X) <- paste0("V", 1:4)
fit_pearson <- pearson_corr(X, ci = TRUE)
fit_spearman <- spearman_rho(X, ci = TRUE)
print(fit_pearson, digits = 2)
#> Pearson correlation matrix
#> method : pearson
#> dimensions : 4 x 4
#> ci : yes
#>
#> V1 V2 V3 V4
#> V1 1.00 0.05 0.02 0.00
#> V2 0.05 1.00 0.45 -0.12
#> V3 0.02 0.45 1.00 0.00
#> V4 0.00 -0.12 0.00 1.00
summary(fit_spearman)
#> Spearman correlation summary
#> output : matrix
#> dimensions : 4 x 4
#> retained_pairs: 10
#> threshold : 0.0000
#> diag : included
#> estimate : -0.1524 to 1.0000
#> ci : 95%
#> ci_method : jackknife_euclidean_likelihood
#> ci_width : 0.644 to 0.908
#> cross_zero : 5 pair(s)
#>
#> item1 item2 estimate n_complete lwr upr fisher_z statistic p_value
#> V2 V3 0.4318 30 0.1077 0.7514 0.4621 2.4013 0.0163
#> V1 V3 -0.1524 30 -0.5819 0.2781 -0.1536 -0.7981 0.4248
#> V1 V4 -0.0861 30 -0.4636 0.2924 -0.0863 -0.4485 0.6538
#> V2 V4 -0.0345 30 -0.4660 0.3982 -0.0345 -0.1792 0.8577
#> V1 V2 0.0154 30 -0.3972 0.4276 0.0154 0.0798 0.9364
#> V3 V4 0.0087 30 -0.4458 0.4624 0.0087 0.0451 0.9640
#>
#> Strongest pairs by |estimate|
#>
#> item1 item2 estimate n_complete lwr upr fisher_z
#> V2 V3 0.43181313 30 0.1076611 0.7514310 0.46212324
#> V1 V3 -0.15239155 30 -0.5819059 0.2780522 -0.15358793
#> V1 V4 -0.08609566 30 -0.4635512 0.2923705 -0.08630934
#> V2 V4 -0.03448276 30 -0.4660429 0.3981963 -0.03449644
#> V1 V2 0.01535039 30 -0.3971767 0.4276203 0.01535160
#> statistic p_value
#> 2.40126277 0.0163386
#> -0.79806631 0.4248320
#> -0.44847649 0.6538094
#> -0.17924874 0.8577424
#> 0.07976923 0.9364208
This pattern extends to other wide-data estimators such as
kendall_tau(), dcor(), bicor(),
pbcor(), wincor(),
skipped_corr(), shrinkage_corr(),
pcorr(), ccc(), and icc().
Agreement and reliability workflow
Agreement methods answer a different question from ordinary
correlation. Correlation asks whether two variables move together.
Agreement asks whether two methods give sufficiently similar values on
the measurement scale itself.
set.seed(2)
ref <- rnorm(40, mean = 10, sd = 2)
alt <- ref + 0.3 + rnorm(40, sd = 0.8)
fit_ba <- ba(ref, alt)
print(fit_ba)
#> Bland-Altman preview:
#> based_on : 40
#> loa_rule : mean +/- 1.96 * SD
#> ci : 95%
#> sd_diff : 0.927
#> width : 3.632
#>
#> quantity estimate lwr upr
#> Mean difference -0.269 -0.566 0.027
#> Lower LoA -2.085 -2.599 -1.572
#> Upper LoA 1.547 1.034 2.060
wide_methods <- data.frame(
m1 = ref + rnorm(40, sd = 0.2),
m2 = ref + 0.2 + rnorm(40, sd = 0.3),
m3 = ref - 0.1 + rnorm(40, sd = 0.4)
)
fit_ccc <- ccc(wide_methods)
fit_icc <- icc(wide_methods, scope = "pairwise")
summary(fit_ccc)
#> Correlation summary
#> output : matrix
#> dimensions : 3 x 3
#> retained_pairs: 6
#> threshold : 0.0000
#> diag : included
#> estimate : 0.9567 to 1.0000
#>
#> item1 item2 estimate n_complete fisher_z
#> m1 m2 0.9813 40 2.3309
#> m1 m3 0.9803 40 2.3047
#> m2 m3 0.9567 40 1.9056
#>
#> Strongest pairs by |estimate|
#>
#> item1 item2 estimate n_complete fisher_z
#> m1 m2 0.9812785 40 2.330914
#> m1 m3 0.9802809 40 2.304703
#> m2 m3 0.9567105 40 1.905555
summary(fit_icc)
#> Intraclass correlation summary
#> method : Intraclass correlation (one-way, consistency, single)
#> dimensions : 3 x 3
#> pairs : 3
#> n_complete : 40
#> estimate : 0.9574 to 0.9817
#> most_negative: m2-m3 (0.9574)
#> most_positive: m1-m2 (0.9817)
#>
#> Strongest pairs by |estimate|
#>
#> item1 item2 estimate n_complete
#> m1 m2 0.9817 40
#> m1 m3 0.9807 40
#> m2 m3 0.9574 40
For ICC, scope = "pairwise" and
scope = "overall" answer different questions as well.
Pairwise ICC asks how reliable each specific method pair is. Overall ICC
asks how reliable the full set of columns is when analysed jointly.
fit_icc_overall <- icc(wide_methods, scope = "overall", ci = TRUE)
print(fit_icc_overall)
#> Overall intraclass correlation
#> method : Overall intraclass correlation table
#> subjects : 40
#> raters : 3
#> selected : ICC1
#>
#> Coefficient table
#>
#> coefficient label estimate ... upr selected
#> ICC1 Single absolute 0.9732 ... 0.9848 TRUE
#> ICC2 Single random 0.9733 ... 0.9876 FALSE
#> ICC3 Single fixed 0.9819 ... 0.9898 FALSE
#> ICC1k Average absolute 0.9909 ... 0.9949 FALSE
#> ICC2k Average random 0.9909 ... 0.9958 FALSE
#> ICC3k Average fixed 0.9939 ... 0.9966 FALSE
#> ... 5 more variables not shown (omitted)
#> Use as.data.frame()/tidy()/as.matrix() to inspect the full result.
summary(fit_icc_overall)
#> Overall intraclass correlation summary
#> selected : ICC1
#> subjects : 40
#> raters : 3
#> n_complete : 40
#> ms_subject : 14.4485
#> ms_rater : 1.8009
#> ms_error : 0.0884
#>
#> ICC coefficients
#>
#> coefficient label estimate statistic df1 df2 p_value lwr upr
#> ICC1 Single absolute 0.9732 110.1464 39 80 0 0.96 0.98
#> ICC2 Single random 0.9733 163.5137 39 78 0 0.93 0.99
#> ICC3 Single fixed 0.9819 163.5137 39 78 0 0.97 0.99
#> ICC1k Average absolute 0.9909 110.1464 39 80 0 0.98 0.99
#> ICC2k Average random 0.9909 163.5137 39 78 0 0.98 1.00
#> ICC3k Average fixed 0.9939 163.5137 39 78 0 0.99 1.00
#> selected
#> TRUE
#> FALSE
#> FALSE
#> FALSE
#> FALSE
#> FALSE
#>
#> ANOVA decomposition
#>
#> component df sum_sq mean_sq statistic p_value
#> subjects 39 563.4912 14.4485 110.1464 0
#> raters 2 3.6017 1.8009 20.3805 0
#> residual 78 6.8923 0.0884 NA NA
Repeated-measures workflow
Repeated-measures functions require long-format data and explicit
identifiers for subjects and, when relevant, methods and time.
set.seed(3)
n_subject <- 12
n_rep <- 3
subject <- rep(seq_len(n_subject), each = n_rep)
signal <- rnorm(n_subject * n_rep)
subject_x <- rnorm(n_subject, sd = 1.2)[subject]
subject_y <- rnorm(n_subject, sd = 1.0)[subject]
dat_rm <- data.frame(
id = subject,
x = subject_x + signal + rnorm(n_subject * n_rep, sd = 0.2),
y = subject_y + 0.7 * signal + rnorm(n_subject * n_rep, sd = 0.3),
z = subject_y - 0.4 * signal + rnorm(n_subject * n_rep, sd = 0.4)
)
fit_rmcorr <- rmcorr(dat_rm, response = c("x", "y", "z"), subject = "id")
print(fit_rmcorr, digits = 2)
#> Repeated-measures correlation matrix
#> method : rmcorr
#> dimensions : 3 x 3
#> p_values : yes
#>
#> x y z
#> x 1.00 0.93 -0.60
#> y 0.93 1.00 -0.64
#> z -0.60 -0.64 1.00
summary(fit_rmcorr)
#> Correlation summary
#> output : matrix
#> dimensions : 3 x 3
#> retained_pairs: 6
#> threshold : 0.0000
#> diag : included
#> estimate : -0.6408 to 1.0000
#>
#> item1 item2 estimate n_complete fisher_z
#> x y 0.9294 36 1.6541
#> y z -0.6408 36 -0.7596
#> x z -0.6018 36 -0.6960
#>
#> Strongest pairs by |estimate|
#>
#> item1 item2 estimate n_complete fisher_z
#> x y 0.9294245 36 1.6541468
#> y z -0.6408351 36 -0.7595894
#> x z -0.6018059 36 -0.6959737
Agreement and reliability in repeated designs use a different
long-format interface because they need method and often time
identifiers.
set.seed(4)
n_id <- 10
n_time <- 3
dat_agree <- expand.grid(
id = factor(seq_len(n_id)),
time = factor(seq_len(n_time)),
method = factor(c("A", "B"))
)
subj <- rnorm(n_id, sd = 1.0)[dat_agree$id]
subj_method <- rnorm(n_id * 2, sd = 0.2)
sm <- subj_method[(as.integer(dat_agree$id) - 1L) * 2L + as.integer(dat_agree$method)]
dat_agree$y <- subj + sm + 0.25 * (dat_agree$method == "B") +
rnorm(nrow(dat_agree), sd = 0.35)
fit_icc_rm <- icc_rm_reml(
dat_agree,
response = "y",
subject = "id",
method = "method",
time = "time",
type = "consistency"
)
summary(fit_icc_rm)
#>
#> Repeated-measures intraclass correlation (REML)
#>
#> ICC estimates
#>
#> item1 item2 estimate n_subjects n_obs se_icc residual_model
#> A B 0.9628 10 60 0.0069 iid
#>
#> Variance components
#>
#> sigma2_subject sigma2_subject_method sigma2_subject_time sigma2_error SB
#> 0.9355 0 0 0.1083 0.0066
#>
#> AR(1) diagnostics
#>
#> ar1_rho_lag1 ar1_rho_mom ar1_pairs ar1_pval use_ar1 ar1_recommend
#> -0.129 -0.129 40 0.4147 FALSE FALSE
Shared inspection methods
Most returned objects support at least print() and
summary(). Many also support plot().
Matrix-style objects are intentionally compact when printed, while
summary() returns a longer digest of the strongest or most
relevant pairs.
Display defaults can be controlled through ordinary R options. For
example:
options(
matrixCorr.print_max_rows = 20L,
matrixCorr.print_topn = 5L,
matrixCorr.print_max_vars = 10L,
matrixCorr.print_show_ci = "yes",
matrixCorr.summary_max_rows = 12L,
matrixCorr.summary_topn = 5L,
matrixCorr.summary_max_vars = 10L,
matrixCorr.summary_show_ci = "yes"
)
The print options control compact console previews returned by
print(). The summary options control the longer digest
returned by summary(). Current values can be inspected with
getOption("matrixCorr.print_max_rows") and the same pattern
for the remaining options.
This shared display layer is part of the package design. The goal is
that users can move across workflow families without relearning how
objects are inspected, while still being able to tune how much output is
shown by default.
Where to go next
The remaining vignettes focus on specific workflow families.
Specifically
vignette("v02-wide-correlation-workflows", package = "matrixCorr")
vignette("v03-robust-and-highdim-correlation", package = "matrixCorr")
vignette("v04-latent-and-mixed-scale-correlation", package = "matrixCorr")
vignette("v05-agreement-and-icc-wide", package = "matrixCorr")
vignette("v06-repeated-measures-workflows", package = "matrixCorr")