1. Introduction to matrixCorr

Main workflow families

At a practical level, the package is centred on five workflow families.

Wide-data correlation matrices for numeric inputs.
Robust and high-dimensional association estimation.
Latent and mixed-scale correlation for binary, ordinal, and mixed inputs.
Agreement and reliability analysis for wide data.
Repeated-measures correlation and repeated-measures agreement.

The same broad inspection pattern is used throughout to fit an estimator, print it for a compact view, call summary() for a longer digest, and use plot() when a graphical summary is available.

Wide matrix workflow

The matrix-style functions accept a numeric matrix or data frame and return a square result indexed by the original column names.

library(matrixCorr)

set.seed(1)
X <- as.data.frame(matrix(rnorm(120), ncol = 4))
names(X) <- paste0("V", 1:4)

fit_pearson <- pearson_corr(X, ci = TRUE)
fit_spearman <- spearman_rho(X, ci = TRUE)

print(fit_pearson, digits = 2)
#> Pearson correlation matrix
#>   method      : pearson
#>   dimensions  : 4 x 4
#>   ci          : yes
#> 
#>      V1    V2   V3    V4
#> V1 1.00  0.05 0.02  0.00
#> V2 0.05  1.00 0.45 -0.12
#> V3 0.02  0.45 1.00  0.00
#> V4 0.00 -0.12 0.00  1.00
summary(fit_spearman)
#> Spearman correlation summary
#>   output      : matrix
#>   dimensions  : 4 x 4
#>   retained_pairs: 10
#>   threshold   : 0.0000
#>   diag        : included
#>   estimate    : -0.1524 to 1.0000
#>   ci          : 95%
#>   ci_method   : jackknife_euclidean_likelihood
#>   ci_width    : 0.644 to 0.908
#>   cross_zero  : 5 pair(s)
#> 
#>  item1 item2 estimate n_complete lwr     upr    fisher_z statistic p_value
#>  V2    V3    0.4318   30         0.1077  0.7514 0.4621   2.4013    0.0163 
#>  V1    V3    -0.1524  30         -0.5819 0.2781 -0.1536  -0.7981   0.4248 
#>  V1    V4    -0.0861  30         -0.4636 0.2924 -0.0863  -0.4485   0.6538 
#>  V2    V4    -0.0345  30         -0.4660 0.3982 -0.0345  -0.1792   0.8577 
#>  V1    V2    0.0154   30         -0.3972 0.4276 0.0154   0.0798    0.9364 
#>  V3    V4    0.0087   30         -0.4458 0.4624 0.0087   0.0451    0.9640 
#> 
#> Strongest pairs by |estimate|
#> 
#>  item1 item2 estimate    n_complete lwr        upr       fisher_z   
#>  V2    V3     0.43181313 30          0.1076611 0.7514310  0.46212324
#>  V1    V3    -0.15239155 30         -0.5819059 0.2780522 -0.15358793
#>  V1    V4    -0.08609566 30         -0.4635512 0.2923705 -0.08630934
#>  V2    V4    -0.03448276 30         -0.4660429 0.3981963 -0.03449644
#>  V1    V2     0.01535039 30         -0.3971767 0.4276203  0.01535160
#>  statistic   p_value  
#>   2.40126277 0.0163386
#>  -0.79806631 0.4248320
#>  -0.44847649 0.6538094
#>  -0.17924874 0.8577424
#>   0.07976923 0.9364208

This pattern extends to other wide-data estimators such as kendall_tau(), dcor(), bicor(), pbcor(), wincor(), skipped_corr(), shrinkage_corr(), pcorr(), ccc(), and icc().

Agreement and reliability workflow

Agreement methods answer a different question from ordinary correlation. Correlation asks whether two variables move together. Agreement asks whether two methods give sufficiently similar values on the measurement scale itself.

set.seed(2)
ref <- rnorm(40, mean = 10, sd = 2)
alt <- ref + 0.3 + rnorm(40, sd = 0.8)

fit_ba <- ba(ref, alt)
print(fit_ba)
#> Bland-Altman preview:
#>   based_on    : 40
#>   loa_rule    : mean +/- 1.96 * SD
#>   ci          : 95%
#>   sd_diff     : 0.927
#>   width       : 3.632
#> 
#>  quantity        estimate lwr    upr   
#>  Mean difference -0.269   -0.566 0.027 
#>  Lower LoA       -2.085   -2.599 -1.572
#>  Upper LoA       1.547    1.034  2.060

wide_methods <- data.frame(
  m1 = ref + rnorm(40, sd = 0.2),
  m2 = ref + 0.2 + rnorm(40, sd = 0.3),
  m3 = ref - 0.1 + rnorm(40, sd = 0.4)
)

fit_ccc <- ccc(wide_methods)
fit_icc <- icc(wide_methods, scope = "pairwise")

summary(fit_ccc)
#> Correlation summary
#>   output      : matrix
#>   dimensions  : 3 x 3
#>   retained_pairs: 6
#>   threshold   : 0.0000
#>   diag        : included
#>   estimate    : 0.9567 to 1.0000
#> 
#>  item1 item2 estimate n_complete fisher_z
#>  m1    m2    0.9813   40         2.3309  
#>  m1    m3    0.9803   40         2.3047  
#>  m2    m3    0.9567   40         1.9056  
#> 
#> Strongest pairs by |estimate|
#> 
#>  item1 item2 estimate  n_complete fisher_z
#>  m1    m2    0.9812785 40         2.330914
#>  m1    m3    0.9802809 40         2.304703
#>  m2    m3    0.9567105 40         1.905555
summary(fit_icc)
#> Intraclass correlation summary
#>   method      : Intraclass correlation (one-way, consistency, single)
#>   dimensions  : 3 x 3
#>   pairs       : 3
#>   n_complete  : 40
#>   estimate    : 0.9574 to 0.9817
#>   most_negative: m2-m3 (0.9574)
#>   most_positive: m1-m2 (0.9817)
#> 
#> Strongest pairs by |estimate|
#> 
#>  item1 item2 estimate n_complete
#>  m1    m2    0.9817   40        
#>  m1    m3    0.9807   40        
#>  m2    m3    0.9574   40

For ICC, scope = "pairwise" and scope = "overall" answer different questions as well. Pairwise ICC asks how reliable each specific method pair is. Overall ICC asks how reliable the full set of columns is when analysed jointly.

fit_icc_overall <- icc(wide_methods, scope = "overall", ci = TRUE)
print(fit_icc_overall)
#> Overall intraclass correlation
#>   method      : Overall intraclass correlation table
#>   subjects    : 40
#>   raters      : 3
#>   selected    : ICC1
#> 
#> Coefficient table
#> 
#>  coefficient label            estimate ... upr    selected
#>  ICC1        Single absolute  0.9732   ... 0.9848  TRUE   
#>  ICC2        Single random    0.9733   ... 0.9876 FALSE   
#>  ICC3        Single fixed     0.9819   ... 0.9898 FALSE   
#>  ICC1k       Average absolute 0.9909   ... 0.9949 FALSE   
#>  ICC2k       Average random   0.9909   ... 0.9958 FALSE   
#>  ICC3k       Average fixed    0.9939   ... 0.9966 FALSE   
#> ... 5 more variables not shown (omitted)
#> Use as.data.frame()/tidy()/as.matrix() to inspect the full result.
summary(fit_icc_overall)
#> Overall intraclass correlation summary
#>   selected    : ICC1
#>   subjects    : 40
#>   raters      : 3
#>   n_complete  : 40
#>   ms_subject  : 14.4485
#>   ms_rater    : 1.8009
#>   ms_error    : 0.0884
#> 
#> ICC coefficients
#> 
#>  coefficient label            estimate statistic df1 df2 p_value lwr  upr 
#>  ICC1        Single absolute  0.9732   110.1464  39  80  0       0.96 0.98
#>  ICC2        Single random    0.9733   163.5137  39  78  0       0.93 0.99
#>  ICC3        Single fixed     0.9819   163.5137  39  78  0       0.97 0.99
#>  ICC1k       Average absolute 0.9909   110.1464  39  80  0       0.98 0.99
#>  ICC2k       Average random   0.9909   163.5137  39  78  0       0.98 1.00
#>  ICC3k       Average fixed    0.9939   163.5137  39  78  0       0.99 1.00
#>  selected
#>   TRUE   
#>  FALSE   
#>  FALSE   
#>  FALSE   
#>  FALSE   
#>  FALSE   
#> 
#> ANOVA decomposition
#> 
#>  component df sum_sq   mean_sq statistic p_value
#>  subjects  39 563.4912 14.4485 110.1464   0     
#>  raters     2   3.6017  1.8009  20.3805   0     
#>  residual  78   6.8923  0.0884       NA  NA

Repeated-measures workflow

Repeated-measures functions require long-format data and explicit identifiers for subjects and, when relevant, methods and time.

set.seed(3)
n_subject <- 12
n_rep <- 3

subject <- rep(seq_len(n_subject), each = n_rep)
signal <- rnorm(n_subject * n_rep)
subject_x <- rnorm(n_subject, sd = 1.2)[subject]
subject_y <- rnorm(n_subject, sd = 1.0)[subject]

dat_rm <- data.frame(
  id = subject,
  x = subject_x + signal + rnorm(n_subject * n_rep, sd = 0.2),
  y = subject_y + 0.7 * signal + rnorm(n_subject * n_rep, sd = 0.3),
  z = subject_y - 0.4 * signal + rnorm(n_subject * n_rep, sd = 0.4)
)

fit_rmcorr <- rmcorr(dat_rm, response = c("x", "y", "z"), subject = "id")
print(fit_rmcorr, digits = 2)
#> Repeated-measures correlation matrix
#>   method      : rmcorr
#>   dimensions  : 3 x 3
#>   p_values    : yes
#> 
#>       x     y     z
#> x  1.00  0.93 -0.60
#> y  0.93  1.00 -0.64
#> z -0.60 -0.64  1.00
summary(fit_rmcorr)
#> Correlation summary
#>   output      : matrix
#>   dimensions  : 3 x 3
#>   retained_pairs: 6
#>   threshold   : 0.0000
#>   diag        : included
#>   estimate    : -0.6408 to 1.0000
#> 
#>  item1 item2 estimate n_complete fisher_z
#>  x     y     0.9294   36         1.6541  
#>  y     z     -0.6408  36         -0.7596 
#>  x     z     -0.6018  36         -0.6960 
#> 
#> Strongest pairs by |estimate|
#> 
#>  item1 item2 estimate   n_complete fisher_z  
#>  x     y      0.9294245 36          1.6541468
#>  y     z     -0.6408351 36         -0.7595894
#>  x     z     -0.6018059 36         -0.6959737

Agreement and reliability in repeated designs use a different long-format interface because they need method and often time identifiers.

set.seed(4)
n_id <- 10
n_time <- 3

dat_agree <- expand.grid(
  id = factor(seq_len(n_id)),
  time = factor(seq_len(n_time)),
  method = factor(c("A", "B"))
)

subj <- rnorm(n_id, sd = 1.0)[dat_agree$id]
subj_method <- rnorm(n_id * 2, sd = 0.2)
sm <- subj_method[(as.integer(dat_agree$id) - 1L) * 2L + as.integer(dat_agree$method)]

dat_agree$y <- subj + sm + 0.25 * (dat_agree$method == "B") +
  rnorm(nrow(dat_agree), sd = 0.35)

fit_icc_rm <- icc_rm_reml(
  dat_agree,
  response = "y",
  subject = "id",
  method = "method",
  time = "time",
  type = "consistency"
)

summary(fit_icc_rm)
#> 
#> Repeated-measures intraclass correlation (REML)
#> 
#> ICC estimates
#> 
#>  item1 item2 estimate n_subjects n_obs se_icc residual_model
#>  A     B     0.9628   10         60    0.0069 iid           
#> 
#> Variance components
#> 
#>  sigma2_subject sigma2_subject_method sigma2_subject_time sigma2_error SB    
#>  0.9355         0                     0                   0.1083       0.0066
#> 
#> AR(1) diagnostics
#> 
#>  ar1_rho_lag1 ar1_rho_mom ar1_pairs ar1_pval use_ar1 ar1_recommend
#>  -0.129       -0.129      40        0.4147   FALSE   FALSE

Shared inspection methods

Most returned objects support at least print() and summary(). Many also support plot(). Matrix-style objects are intentionally compact when printed, while summary() returns a longer digest of the strongest or most relevant pairs.

Display defaults can be controlled through ordinary R options. For example:

options(
  matrixCorr.print_max_rows = 20L,
  matrixCorr.print_topn = 5L,
  matrixCorr.print_max_vars = 10L,
  matrixCorr.print_show_ci = "yes",
  matrixCorr.summary_max_rows = 12L,
  matrixCorr.summary_topn = 5L,
  matrixCorr.summary_max_vars = 10L,
  matrixCorr.summary_show_ci = "yes"
)

The print options control compact console previews returned by print(). The summary options control the longer digest returned by summary(). Current values can be inspected with getOption("matrixCorr.print_max_rows") and the same pattern for the remaining options.

This shared display layer is part of the package design. The goal is that users can move across workflow families without relearning how objects are inspected, while still being able to tune how much output is shown by default.

Where to go next

The remaining vignettes focus on specific workflow families. Specifically

vignette("v02-wide-correlation-workflows", package = "matrixCorr")
vignette("v03-robust-and-highdim-correlation", package = "matrixCorr")
vignette("v04-latent-and-mixed-scale-correlation", package = "matrixCorr")
vignette("v05-agreement-and-icc-wide", package = "matrixCorr")
vignette("v06-repeated-measures-workflows", package = "matrixCorr")