This vignette provides solutions to common issues encountered when using couplr. Each section describes a problem, explains why it occurs, and provides step-by-step solutions.
Quick reference:
| Issue | Jump to Section |
|---|---|
| No valid pairs found | Infeasible Problems |
| Poor balance after matching | Poor Balance |
| Matching is too slow | Performance Issues |
| Memory errors | Memory Errors |
| Different results with different methods | Non-Determinism |
| Unexpected NA in results | Missing Values |
| C++ compilation errors | Installation Issues |
# This fails: all assignments have Inf cost
cost <- matrix(c(1, Inf, Inf, Inf, Inf, Inf, Inf, 2, 3), nrow = 3, byrow = TRUE)
result <- lap_solve(cost)
#> Error in `lap_solve_bruteforce()`:
#> ! Infeasible given forbidden edges

Error message: “No feasible assignment found” or total cost is Inf.
A feasible assignment requires that every row can be assigned to at least one column with finite cost. When rows have all-Inf or all-NA entries, no valid assignment exists.
# Check for infeasible rows
check_feasibility <- function(cost_matrix) {
  finite_per_row <- rowSums(is.finite(cost_matrix))
  infeasible_rows <- which(finite_per_row == 0)
  if (length(infeasible_rows) > 0) {
    cat("Infeasible rows (no finite costs):", infeasible_rows, "\n")
    return(FALSE)
  }
  finite_per_col <- colSums(is.finite(cost_matrix))
  infeasible_cols <- which(finite_per_col == 0)
  if (length(infeasible_cols) > 0) {
    cat("Infeasible columns (no finite costs):", infeasible_cols, "\n")
    return(FALSE)
  }
  cat("Problem appears feasible\n")
  return(TRUE)
}
# Check the problematic matrix
cost <- matrix(c(1, Inf, Inf, Inf, Inf, Inf, Inf, 2, 3), nrow = 3, byrow = TRUE)
check_feasibility(cost)
#> Infeasible rows (no finite costs): 2
#> [1] FALSE

1. Remove infeasible rows/columns:
# Remove rows with no valid assignments
cost <- matrix(c(1, Inf, Inf, Inf, Inf, Inf, Inf, 2, 3), nrow = 3, byrow = TRUE)
valid_rows <- rowSums(is.finite(cost)) > 0
cost_valid <- cost[valid_rows, , drop = FALSE]
if (nrow(cost_valid) > 0) {
result <- lap_solve(cost_valid)
cat("Matched", nrow(result), "of", nrow(cost), "rows\n")
}
#> Matched 2 of 3 rows

2. Add fallback costs:
# Replace Inf with high (but finite) penalty
cost_with_fallback <- cost
cost_with_fallback[!is.finite(cost_with_fallback)] <- 1e6 # Large penalty
result <- lap_solve(cost_with_fallback)
print(result)
#> Assignment Result
#> =================
#>
#> # A tibble: 3 × 3
#> source target cost
#> <int> <int> <dbl>
#> 1 1 1 1
#> 2 2 3 1000000
#> 3 3 2 2
#>
#> Total cost: 1000003
#> Method: bruteforce

3. For matching problems, check covariate overlap:
# Simulate poor overlap scenario
set.seed(123)
left <- tibble(id = 1:50, age = rnorm(50, mean = 25, sd = 3))
right <- tibble(id = 1:50, age = rnorm(50, mean = 55, sd = 3))
# Visualize overlap
library(ggplot2)
combined <- bind_rows(
  left %>% mutate(group = "Left"),
  right %>% mutate(group = "Right")
)
ggplot(combined, aes(x = age, fill = group)) +
  geom_density(alpha = 0.5) +
  labs(title = "Poor Covariate Overlap",
       subtitle = "No overlap means no good matches possible") +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "transparent", color = NA),
        panel.background = element_rect(fill = "transparent", color = NA))

After matching, balance_diagnostics() shows |std_diff| > 0.25 for some variables.
# Example of poor balance
set.seed(456)
left <- tibble(
id = 1:100,
age = rnorm(100, 30, 5),
income = rnorm(100, 80000, 20000) # Very different from right
)
right <- tibble(
id = 1:100,
age = rnorm(100, 32, 5),
income = rnorm(100, 40000, 10000) # Very different income
)
result <- match_couples(left, right, vars = c("age", "income"), auto_scale = TRUE)
#> Auto-selected scaling method: standardize
balance <- balance_diagnostics(result, left, right, vars = c("age", "income"))
print(balance)
#>
#> Balance Diagnostics for Matched Pairs
#> ======================================
#>
#> Matching Summary:
#> Method: lap
#> Matched pairs: 100
#> Unmatched left: 0 (of 100)
#> Unmatched right: 0 (of 100)
#>
#> Variable-level Balance:
#> # A tibble: 2 × 7
#> Variable `Mean Left` `Mean Right` `Mean Diff` `Std Diff` `Var Ratio` `KS Stat`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 age 30.6 32.5 -1.89 -0.386 1.05 0.22
#> 2 income 77666. 41353. 36313. 2.40 2.15 0.81
#>
#> Overall Balance:
#> Mean |Std Diff|: 1.391 (Poor)
#> Max |Std Diff|: 2.397
#> Vars with |Std Diff| > 0.25: 100.0%
#>
#> Balance Interpretation:
#> |Std Diff| < 0.10: Excellent balance
#> |Std Diff| 0.10-0.25: Good balance
#> |Std Diff| 0.25-0.50: Acceptable balance
#> |Std Diff| > 0.50: Poor balance

1. Add more matching variables:
# Include additional relevant variables
result <- match_couples(
  left, right,
  vars = c("age", "income", "education", "region"),  # More variables (assumes these columns exist in your data)
  auto_scale = TRUE
)

2. Tighten the caliper (fewer but better matches):
result_strict <- match_couples(
left, right,
vars = c("age", "income"),
max_distance = 0.3, # Stricter caliper
auto_scale = TRUE
)
#> Auto-selected scaling method: standardize
cat("Original matches:", result$info$n_matched, "\n")
#> Original matches: 100
cat("With caliper:", result_strict$info$n_matched, "\n")
#> With caliper: 40
balance_strict <- balance_diagnostics(result_strict, left, right, vars = c("age", "income"))
print(balance_strict)
#>
#> Balance Diagnostics for Matched Pairs
#> ======================================
#>
#> Matching Summary:
#> Method: lap
#> Matched pairs: 40
#> Unmatched left: 0 (of 100)
#> Unmatched right: 60 (of 100)
#>
#> Variable-level Balance:
#> # A tibble: 2 × 7
#> Variable `Mean Left` `Mean Right` `Mean Diff` `Std Diff` `Var Ratio` `KS Stat`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 age 30.4 32.9 -2.52 -0.517 1.25 0.275
#> 2 income 75769. 41145. 34624. 2.42 2.41 0.825
#>
#> Overall Balance:
#> Mean |Std Diff|: 1.468 (Poor)
#> Max |Std Diff|: 2.419
#> Vars with |Std Diff| > 0.25: 100.0%
#>
#> Balance Interpretation:
#> |Std Diff| < 0.10: Excellent balance
#> |Std Diff| 0.10-0.25: Good balance
#> |Std Diff| 0.25-0.50: Acceptable balance
#> |Std Diff| > 0.50: Poor balance

3. Use blocking on the problematic variable:
# Block on income tertiles to ensure exact balance
left$income_cat <- cut(left$income, breaks = 3, labels = c("low", "mid", "high"))
right$income_cat <- cut(right$income, breaks = 3, labels = c("low", "mid", "high"))
blocks <- matchmaker(left, right, block_type = "group", block_by = "income_cat")
result_blocked <- match_couples(
blocks$left, blocks$right,
vars = c("age"), # Match on age within income blocks
block_id = "block_id"
)

4. Try different scaling:
# Compare scaling methods
for (scale_method in c("robust", "standardize", "range")) {
res <- match_couples(left, right, vars = c("age", "income"),
auto_scale = TRUE, scale = scale_method)
bal <- balance_diagnostics(res, left, right, vars = c("age", "income"))
cat(scale_method, "- max |std_diff|:",
round(bal$overall$max_abs_std_diff, 3), "\n")
}
#> robust - max |std_diff|: 2.397
#> standardize - max |std_diff|: 2.397
#> range - max |std_diff|: 2.397

match_couples() or lap_solve() takes too long or doesn’t complete.
Optimal matching has \(O(n^3)\) complexity. For n = 5,000, this means ~125 billion operations.
# Estimate runtime
estimate_runtime <- function(n, seconds_per_billion = 1) {
  ops <- n^3
  time_sec <- ops / 1e9 * seconds_per_billion
  if (time_sec < 60) {
    sprintf("%.1f seconds", time_sec)
  } else if (time_sec < 3600) {
    sprintf("%.1f minutes", time_sec / 60)
  } else {
    sprintf("%.1f hours", time_sec / 3600)
  }
}
cat("Estimated runtime for optimal matching:\n")
#> Estimated runtime for optimal matching:
for (n in c(100, 500, 1000, 3000, 5000, 10000)) {
cat(sprintf(" n = %5d: %s\n", n, estimate_runtime(n)))
}
#>  n =   100: 0.0 seconds
#>  n =   500: 0.1 seconds
#>  n =  1000: 1.0 seconds
#>  n =  3000: 27.0 seconds
#>  n =  5000: 2.1 minutes
#>  n = 10000: 16.7 minutes

1. Use greedy matching for large problems:
set.seed(789)
n <- 500
large_left <- tibble(id = 1:n, x1 = rnorm(n), x2 = rnorm(n))
large_right <- tibble(id = 1:n, x1 = rnorm(n), x2 = rnorm(n))
# Greedy is much faster
time_greedy <- system.time({
result_greedy <- greedy_couples(
large_left, large_right,
vars = c("x1", "x2"),
strategy = "row_best"
)
})
cat("Greedy matching (n=500):", round(time_greedy["elapsed"], 2), "seconds\n")
#> Greedy matching (n=500): 0.38 seconds
cat("Quality (mean distance):", round(mean(result_greedy$pairs$distance), 4), "\n")
#> Quality (mean distance): 0.2886

2. Use blocking to divide the problem:
# Create clusters to match within
blocks <- matchmaker(
  large_left, large_right,
  block_type = "cluster",
  block_vars = c("x1", "x2"),
  n_blocks = 10  # 10 blocks of ~50 units each
)
# Match within blocks (10 x O(50^3) << O(500^3))
result_blocked <- match_couples(
blocks$left, blocks$right,
vars = c("x1", "x2"),
block_id = "block_id"
)

3. Choose a faster algorithm:
# For n > 1000, auction algorithm often faster
result <- match_couples(
large_left, large_right,
vars = c("x1", "x2"),
method = "auction"
)
# For sparse problems (many forbidden pairs)
result <- match_couples(
left, right,
vars = vars,
max_distance = 0.5, # Creates sparsity
method = "sap" # Sparse algorithm
)

4. Pre-compute and cache distances:
# Compute once, reuse multiple times
dist_cache <- compute_distances(
large_left, large_right,
vars = c("x1", "x2"),
scale = "robust"
)
# Fast: reuse cached distances
result1 <- match_couples(dist_cache, max_distance = 0.3)
result2 <- match_couples(dist_cache, max_distance = 0.5)
result3 <- match_couples(dist_cache, max_distance = 1.0)
R crashes or shows: “Error: cannot allocate vector of size X GB”
A full n×n distance matrix of double-precision values requires 8n² bytes:
# Memory requirements
memory_needed <- function(n) {
  bytes <- 8 * n^2
  if (bytes < 1e6) {
    sprintf("%.1f KB", bytes / 1e3)
  } else if (bytes < 1e9) {
    sprintf("%.1f MB", bytes / 1e6)
  } else {
    sprintf("%.1f GB", bytes / 1e9)
  }
}
cat("Memory for full distance matrix:\n")
#> Memory for full distance matrix:
for (n in c(1000, 5000, 10000, 20000, 50000)) {
cat(sprintf(" n = %5d: %s\n", n, memory_needed(n)))
}
#>  n =  1000: 8.0 MB
#>  n =  5000: 200.0 MB
#>  n = 10000: 800.0 MB
#>  n = 20000: 3.2 GB
#>  n = 50000: 20.0 GB

1. Use greedy matching (avoids full matrix):
# Greedy computes distances on-the-fly
result <- greedy_couples(
left, right,
vars = covariates,
strategy = "row_best" # Most memory-efficient
)

2. Use blocking to create smaller subproblems:
# Each block is much smaller
blocks <- matchmaker(left, right, block_type = "cluster", n_blocks = 20)
result <- match_couples(blocks$left, blocks$right, vars = vars, block_id = "block_id")

3. Use a caliper to create a sparse matrix:
# Caliper excludes distant pairs (sparse representation)
result <- match_couples(
left, right,
vars = covariates,
max_distance = 0.5,
method = "sap" # Sparse-optimized algorithm
)

4. Increase R’s memory limit (Windows):
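The code for this step is not present in the extracted text. As a rough sketch (an assumption, not part of the couplr API): on Windows with R older than 4.2 the base memory.limit() function could raise the cap; from R 4.2 onward it is defunct and the limit is managed automatically.

# Windows only, R < 4.2: raise the memory cap (defunct in newer R versions)
if (.Platform$OS.type == "windows" && getRversion() < "4.2.0") {
  memory.limit(size = 16000)  # request roughly 16 GB
}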
Different algorithms return different assignments:
cost <- matrix(c(1, 2, 2, 2, 1, 2, 2, 2, 1), nrow = 3, byrow = TRUE)
result_jv <- lap_solve(cost, method = "jv")
result_hungarian <- lap_solve(cost, method = "hungarian")
cat("JV assignment: ", result_jv$target, "\n")
#> JV assignment: 1 2 3
cat("Hungarian assignment:", result_hungarian$target, "\n")
#> Hungarian assignment: 1 2 3
cat("JV total cost: ", get_total_cost(result_jv), "\n")
#> JV total cost: 3
cat("Hungarian total cost:", get_total_cost(result_hungarian), "\n")
#> Hungarian total cost: 3

When multiple optimal solutions exist (tied costs), different algorithms may find different ones. The total cost should be the same; if not, report a bug.
# Check for ties
check_ties <- function(cost_matrix) {
  n <- nrow(cost_matrix)
  # Check if diagonal dominates (trivial ties)
  diag_costs <- diag(cost_matrix)
  if (length(unique(diag_costs)) < n) {
    cat("Tied costs on diagonal - multiple optima likely\n")
  }
  # Check cost uniqueness
  unique_costs <- length(unique(as.vector(cost_matrix)))
  total_entries <- length(cost_matrix)
  if (unique_costs < total_entries * 0.5) {
    cat("Many repeated costs - ties possible\n")
  }
}
check_ties(cost)
#> Tied costs on diagonal - multiple optima likely
#> Many repeated costs - ties possible

1. Verify total costs match:
# Total cost should be identical
stopifnot(get_total_cost(result_jv) == get_total_cost(result_hungarian))
cat("Both methods found optimal solutions (same total cost)\n")
#> Both methods found optimal solutions (same total cost)

2. Use the Hungarian method for deterministic tie-breaking:
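No code follows this item in the extracted text; a minimal sketch, reusing the cost matrix and the method argument from the comparison above:

# Fix the algorithm so repeated runs break ties the same way
result_det <- lap_solve(cost, method = "hungarian")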
3. Add small noise to break ties:
set.seed(42)
cost_perturbed <- cost + matrix(rnorm(9, 0, 1e-10), 3, 3)
result <- lap_solve(cost_perturbed)

Matching fails or produces unexpected results due to NA values.
couplr requires complete cases for distance computation. NA values in matching variables cause issues.
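The left and right tibbles used in the solutions below are not shown in the extracted text; a hypothetical setup consistent with the reported output (five left units, one with a missing age) could be:

library(tibble)
# Hypothetical example data: one left unit has a missing age
left  <- tibble(id = 1:5, age = c(25, NA, 31, 38, 44))
right <- tibble(id = 1:5, age = c(26, 30, 33, 40, 45))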
1. Remove rows with NA (before matching):
left_clean <- left %>% filter(!is.na(age))
right_clean <- right %>% filter(!is.na(age))
result <- match_couples(left_clean, right_clean, vars = "age")
cat("Matched", result$info$n_matched, "pairs (excluded 1 left unit with NA)\n")
#> Matched 4 pairs (excluded 1 left unit with NA)

2. Impute missing values:
# Simple mean imputation
left_imputed <- left %>%
mutate(age = if_else(is.na(age), mean(age, na.rm = TRUE), age))
result <- match_couples(left_imputed, right, vars = "age")
cat("Matched", result$info$n_matched, "pairs (imputed 1 NA with mean)\n")
#> Matched 5 pairs (imputed 1 NA with mean)

3. Use preprocessing to diagnose:
health <- preprocess_matching_vars(
left, right,
vars = "age"
)
print(health)
#> Preprocessing Result
#> ====================
#>
#> Variables: 1
#> Scaling method: none

Error in .Call("_couplr_assignment_impl"): object not found
or
Error: package 'couplr' was built under R version X.X
The C++ code was not compiled properly, or the compiled object files are stale.
1. Clean and reinstall:
# In R:
remove.packages("couplr")
install.packages("couplr")
# Or from GitHub:
devtools::install_github("gcol33/couplr", force = TRUE)

2. Clean compiled files (development):
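The commands for this step are missing from the extracted text; a sketch assuming a local source checkout and devtools (the path is a placeholder):

# Remove stale compiled objects (src/*.o, src/*.so or *.dll) from the source tree
devtools::clean_dll("path/to/couplr")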
Then rebuild:
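Again a sketch under the same assumptions:

# Recompile and reinstall from the cleaned source directory
devtools::install("path/to/couplr")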
3. Check Rtools/compiler (Windows):
# Check if Rtools is properly configured
Sys.which("make")
#> make
#> "C:\\rtools45\\usr\\bin\\make.exe"
Sys.which("g++")
#> g++
#> "C:\\rtools45\\X86_64~1.POS\\bin\\G__~1.EXE"Make sure Rtools is installed and PATH is configured correctly.
Cause: Result object wasn’t created properly.
Solution: Use lap_solve() instead of assignment() for the tidy interface.
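For illustration, assuming a cost matrix cost as in the earlier examples, the tidy interface returns a tibble-style result:

result <- lap_solve(cost)   # one row per assignment: source, target, cost
result$target               # assigned column for each row
get_total_cost(result)      # total assignment cost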
Cause: NA or NaN values in cost matrix (not Inf).
Solution: Replace NA with Inf for forbidden assignments, or remove rows/cols.
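A one-line sketch of the fix, assuming cost is your cost matrix (is.na() also catches NaN):

cost[is.na(cost)] <- Inf  # treat missing entries as forbidden assignments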
Cause: Caliper is too strict; all potential matches exceed threshold.
Solution: Increase max_distance or check for overlap issues.
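A sketch of relaxing the caliper; left, right, and vars stand in for your own data and matching variables:

# A looser caliper keeps more candidate pairs within the threshold
result <- match_couples(left, right, vars = vars, max_distance = 1.0)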
If you encounter an issue not covered here, check the function documentation (?match_couples, ?lap_solve) and prepare a minimal reproducible example:

# Minimal example template
library(couplr)
# Minimal data that reproduces the issue
set.seed(123)
left <- tibble(id = 1:10, x = rnorm(10))
right <- tibble(id = 1:10, x = rnorm(10))
# Code that causes the error
result <- match_couples(left, right, vars = "x")
# Expected vs actual behavior
# Expected: ...
# Actual: [error message]
# Session info
sessionInfo()

See also:

- vignette("getting-started") - Basic usage
- vignette("matching-workflows") - Production matching pipelines
- vignette("algorithms") - Algorithm selection guide
- vignette("comparison") - Comparison with other packages