---
title: "Advanced Topics"
author: "Gilles Colling"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Advanced Topics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = FALSE,
  message = FALSE,
  warning = FALSE,
  fig.align = "center",
  out.width = "100%"
)
library(couplr)
library(knitr)

# Helper function to locate example assets - use local paths for vignettes
ext_demo <- function(...) {
  # Images live in vignettes/images/ for pkgdown compatibility: take nested
  # path arguments and return just the filename under images/
  args <- c(...)
  filename <- args[length(args)]  # the last element is the filename
  file.path("images", filename)
}
```
## Overview
This vignette serves three purposes:
1. **Build intuition**: Use pixel morphing as a visual analogy for assignment problems
2. **Scale up**: Demonstrate approximation strategies when exact LAP becomes infeasible
3. **Scientific applications**: Show how matching applies to ecology, physics, and chemistry
**This vignette is different**: Unlike the other couplr documentation, it emphasizes *understanding* over *doing*. If you're looking to solve a matching problem today, start with `vignette("getting-started")` or `vignette("matching-workflows")`. Come here when you want to understand *why* these algorithms work and *when* approximations are appropriate.
### Who This Vignette Is For
**Audience**: Advanced users, researchers, algorithm developers, curious minds
**Prerequisites**:
- Familiarity with `lap_solve()` (`vignette("getting-started")`)
- Basic complexity analysis (Big-O notation)
- Interest in algorithm design or scientific computing
**What You'll Learn**:
- Why exact LAP becomes infeasible for large n
- Three approximation strategies and their trade-offs
- How matching problems appear in ecology, physics, and chemistry
- Mathematical connections to optimal transport theory
**Time to complete**: 45-60 minutes (conceptual reading)
### Documentation Roadmap
| Vignette | Focus | Difficulty |
|----------|-------|------------|
| Getting Started | Basic LAP solving, API introduction | Beginner |
| Algorithms | Mathematical foundations, solver selection | Intermediate |
| Matching Workflows | Production matching pipelines | Intermediate |
| **Advanced Topics** | Pixel morphing, scientific applications, approximations | Advanced |
*You are here: Advanced Topics*
### Why Pixels?
Pixels provide an ideal testbed for understanding assignment problems:
- Each pixel is an **entity** with measurable properties
- **Color** = feature (what it looks like)
- **Position** = spatial location (where it is)
- The matching is **visually verifiable**: you can see if it worked
The same algorithms that morph images smoothly also track particles in physics, align molecules in chemistry, and match vegetation plots in ecology.
### Scientific Applications at a Glance
| Domain | Entities | Features | Spatial |
|--------|----------|----------|---------|
| **Ecology** | Vegetation plots | Species composition | Geographic location |
| **Physics** | Particles | Intensity, size | Predicted position |
| **Chemistry** | Atoms | Element type | 3D coordinates |
| **Images** | Pixels | RGB color | (x, y) position |
## The General Matching Problem
### Problem Formulation
Given two sets of entities $A = \{a_1, \ldots, a_n\}$ and $B = \{b_1, \ldots, b_n\}$, find the optimal one-to-one correspondence by minimizing
$$
\min_{\pi} \sum_{i=1}^{n} c_{i,\pi(i)}
$$
where $\pi$ ranges over permutations of $\{1, \ldots, n\}$ and the cost combines feature similarity and spatial proximity:
$$
c_{ij} = \alpha \, d_{\text{feature}}(a_i, b_j) + \beta \, d_{\text{spatial}}(\mathbf{x}_i, \mathbf{x}_j).
$$
**Feature distance** $d_{\text{feature}}$: domain-specific similarity
- Ecology: Bray-Curtis dissimilarity between species vectors
- Physics: difference in particle intensity or size
- Chemistry: penalty for mismatched atom types
- Images: Euclidean distance in RGB color space
**Spatial distance** $d_{\text{spatial}}$: physical proximity
- Ecology: geographic distance between plot centers
- Physics: Euclidean distance accounting for predicted motion
- Chemistry: 3D distance between atomic coordinates
- Images: 2D pixel position distance
**Weights** $\alpha, \beta \ge 0$ balance feature matching vs. spatial coherence.
### Computational Challenge
Exact solution: solve the full $n \times n$ LAP.
- Complexity: $O(n^3)$ using Jonker-Volgenant
- Feasible: up to $n \approx 1000$ (about $30 \times 30$ images, or $1000$ plots/particles/atoms)
- Prohibitive: for $n = 10\,000$ ($100 \times 100$ images), runtime and memory become expensive
Real applications often involve
- High-resolution images: $200 \times 200 = 40\,000$ pixels
- Large ecological surveys: $5000+$ plots
- Particle tracking: $10\,000+$ particles per frame
- Molecular dynamics: $100\,000+$ atoms
We therefore need approximations that are much faster but still produce high-quality matchings.
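A back-of-the-envelope calculation in R makes the memory side of this concrete (8 bytes per entry for R's double-precision storage):

```r
# Dense n x n cost matrices of doubles grow quadratically in memory:
n <- c(1e3, 1e4, 1e5)
n^2 * 8 / 1e9   # cost matrix size in GB: 0.008, 0.8, 80
```

At $n = 100\,000$ the cost matrix alone no longer fits in typical RAM, before any solver work begins.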
## Visual Illustration: Pixel Morphing
To make the abstract ideas concrete, we visualize them using image morphing where
- entities = pixels
- features = RGB color values
- spatial position = $(x, y)$ coordinates
We first show the static input images (all at $80 \times 80$ for display), then the animated morphs produced by different matching strategies.
```{r all-inputs, results='asis', echo=FALSE}
real_A <- ext_demo("work", "ImageA_80.png")
real_B <- ext_demo("work", "ImageB_80.png")
circle_A <- ext_demo("icons", "circleA_80.png")
circle_B <- ext_demo("icons", "circleB_80.png")
files <- c(real_A, real_B, circle_A, circle_B)
alts <- c("Source image A: photograph", "Target image B: photograph",
"Source image A: circle icon", "Target image B: circle icon")
cat('
\n')
for (i in seq_along(files)) {
cat(sprintf('

\n', files[i], alts[i]))
}
cat('
\n')
```
The first pair are real photographs, the second pair are simple geometric shapes. Internally, all matching is computed on logical $40 \times 40$ grids; we then upscale to $80 \times 80$ purely for clearer display.
### Exact Pixel Matching
The exact pixel morph uses a full LAP solution on a $1600 \times 1600$ cost matrix. For each pair of pixels $(i, j)$ we compute
$$
c_{ij} = \alpha \,\lVert \text{RGB}_i^A - \text{RGB}_j^B \rVert_2 +
\beta \,\lVert (x_i, y_i) - (x_j, y_j) \rVert_2,
$$
where color distances (RGB channels in $[0,1]$, so raw distances lie in $[0, \sqrt{3}]$) are normalized by $\sqrt{3}$ and spatial distances by the image diagonal, so both terms lie in $[0,1]$.
```{r exact-vis, results='asis', echo=FALSE}
gif_image_exact <- ext_demo("morphs", "image_exact.gif")
gif_circle_exact <- ext_demo("icons", "circle_exact.gif")
files <- c(gif_image_exact, gif_circle_exact)
alts <- c("Animated GIF showing exact pixel morphing between two photographs",
"Animated GIF showing exact pixel morphing between two circle icons")
cat('\n')
for (i in seq_along(files)) {
cat(sprintf('

\n', files[i], alts[i]))
}
cat('
\n')
```
This yields an optimal one-to-one assignment of pixels. The resulting animations are smooth and artifact-free but require solving the full LAP.
### Feature Quantization Morph (Strategy 1)
```{r color-walk-vis, results='asis', echo=FALSE}
gif_image_cw <- ext_demo("morphs", "image_color_walk.gif")
gif_circle_cw <- ext_demo("icons", "circle_color_walk.gif")
files <- c(gif_image_cw, gif_circle_cw)
alts <- c("Animated GIF showing feature quantization morphing between two photographs",
"Animated GIF showing feature quantization morphing between two circle icons")
cat('\n')
for (i in seq_along(files)) {
cat(sprintf('

\n', files[i], alts[i]))
}
cat('
\n')
```
In the feature quantization morph, similar colors are grouped, and groups are matched rather than individual pixels. Colors move as coherent “bands,” preserving global color structure but losing fine-grained per-pixel detail.
### Hierarchical Morph (Strategy 2)
```{r recursive-vis, results='asis', echo=FALSE}
gif_image_rec <- ext_demo("morphs", "image_recursive.gif")
gif_circle_rec <- ext_demo("icons", "circle_recursive.gif")
files <- c(gif_image_rec, gif_circle_rec)
alts <- c("Animated GIF showing hierarchical morphing between two photographs",
"Animated GIF showing hierarchical morphing between two circle icons")
cat('\n')
for (i in seq_along(files)) {
cat(sprintf('

\n', files[i], alts[i]))
}
cat('
\n')
```
The hierarchical morph first matches large patches, then refines within patches. The motion is locally coherent and scales well to large problems, at the price of potentially missing some globally optimal cross-patch matches.
## Three Approximation Strategies
We now describe the three approximation strategies in detail. The animations above correspond directly to these methods.
### Strategy 1: Feature Quantization
**Core idea**: reduce problem size by grouping entities with similar features, then match groups.
#### Mathematical Formulation
1. **Quantize features**
Map continuous feature space to a finite palette
$$
\text{quantize}: \mathbb{R}^d \to \{1, \ldots, k\},
$$
where $k \ll n$ (for example $k \approx 64$ for $n = 1600$).
2. **Group by palette**
Form groups
$$
G_A^{(c)} = \{ i : \text{quantize}(f_i) = c \}
$$
and similarly for $B$.
3. **Match groups**
Solve a $k \times k$ LAP between palette entries with costs
$$
c'_{ij} = \alpha \, d(p_i, p_j) + \beta \, d(\bar{\mathbf{x}}_i, \bar{\mathbf{x}}_j),
$$
where $p_i$ is the palette color and $\bar{\mathbf{x}}_i$ the centroid position of group $i$.
4. **Assign entities**
Every entity in $G_A^{(c)}$ is assigned according to the group-to-group match.
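A minimal R sketch of these four steps, under two loud assumptions: a shared k-means palette in which every entry is non-empty in both sets, and `lap_solve()` returning the assignment as an integer vector (the convention used by the pseudocode later in this vignette). All argument names are illustrative, not couplr API:

```r
# Strategy 1 sketch: quantize features, match k groups instead of n entities.
# feat_A/feat_B: n x d feature matrices; pos_A/pos_B: n x 2 position matrices.
feature_quantize_match <- function(feat_A, feat_B, pos_A, pos_B,
                                   k = 64, alpha = 0.5, beta = 0.5) {
  n_A <- nrow(feat_A)
  # Steps 1-2: shared palette via k-means on the pooled features
  km    <- kmeans(rbind(feat_A, feat_B), centers = k, nstart = 5)
  grp_A <- km$cluster[seq_len(n_A)]
  grp_B <- km$cluster[-seq_len(n_A)]
  # Spatial centroid of each group (k x 2; assumes all k groups occur)
  cen_A <- rowsum(pos_A, grp_A) / as.vector(table(grp_A))
  cen_B <- rowsum(pos_B, grp_B) / as.vector(table(grp_B))
  # Step 3: k x k LAP between palette entries
  pal_d <- as.matrix(dist(km$centers))                         # feature term
  spa_d <- as.matrix(dist(rbind(cen_A, cen_B)))[1:k, k + 1:k]  # spatial term
  grp_match <- lap_solve(alpha * pal_d + beta * spa_d)
  # Step 4: each entity of A inherits its group's matched group in B
  # (entity-level pairing within matched groups is a separate, smaller step)
  grp_match[grp_A]
}
```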
#### Complexity Reduction
- Original: $O(n^3)$ for an $n \times n$ LAP
- Quantized: $O(k^3 + n k)$ for the $k \times k$ LAP plus group assignment
- Speedup: approximately $(n/k)^3$
For example, with $n = 1600$ (a $40 \times 40$ image) and $k = 64$ you get
$$
\left(\frac{1600}{64}\right)^3 = 25^3 \approx 15\,000
$$
times fewer LAP operations.
#### Quality Trade-offs
**Advantages**
- Very large speedups for big $n$
- Preserves global structure (similar features stay together)
- Produces smooth, band-like motion without large jumps
**Disadvantages**
- Loses detail within each palette group
- Quantization artifacts when $k$ is too small
- May miss optimal local pairings between similar but distinct feature values
The corresponding GIFs are the *color walk* morphs shown earlier.
### Strategy 2: Hierarchical Decomposition
**Core idea**: split the domain into smaller subproblems by spatial partitioning, solve subproblems, and combine.
#### Mathematical Formulation
1. **Spatial partitioning**
Divide the domain into $m \times m$ patches (for example $m = 4$ so you get $16$ patches). Denote the subset of entities of $A$ in patch $k$ by
$$
P_A^{(k)} = \{ a_i : \mathbf{x}_i \in \text{Patch}_k \}.
$$
2. **Patch-level matching**
Form patch representatives: centroid position and mean features per patch. Solve an $m^2 \times m^2$ LAP between patches, with costs defined using the same feature and spatial distances but now at patch level.
3. **Recursive refinement**
Within each matched patch pair $(P_A^{(k)}, P_B^{(l)})$:
- If $\lvert P_A^{(k)} \rvert \le \tau$ (a threshold, e.g. $\tau = 50$) solve the subproblem exactly.
- Otherwise, partition that patch pair again and repeat.
4. **Combine solutions**
Concatenate assignments from all leaf subproblems to obtain the global matching.
#### Complexity (Sketch)
With $d$ levels of decomposition (each level splitting into four patches), the work can be made close to $O(n \log n)$ in practice, compared to $O(n^3)$ for a single full LAP. Intuitively, the LAPs near the leaves are very small, and the costly large LAP is replaced by a series of much smaller ones.
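To sketch why: with a fixed leaf threshold $\tau$ and 4-way splits, the recursion produces about $n/\tau$ leaf subproblems, each solved exactly in $O(\tau^3)$, while every internal node only solves a constant-size $4 \times 4$ patch LAP, so
$$
T(n) \approx \frac{n}{\tau}\,O(\tau^3) + O\!\left(\frac{n}{\tau}\right) + O(n \log n) = O(n \tau^2 + n \log n),
$$
where the $O(n \log n)$ term accounts for the recursive spatial partitioning itself. For fixed $\tau$, the partitioning term dominates asymptotically.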
#### Quality Trade-offs
**Advantages**
- Scales to very large $n$ (tens of thousands of entities)
- Preserves local structure: nearby entities tend to be matched within the same spatial patch
- No feature discretization, so feature precision is retained
**Disadvantages**
- May miss globally optimal cross-patch matches
- Quality depends on partitioning scheme and threshold $\tau$
- Possible boundary artifacts if important structure crosses patch boundaries
#### High-Level Algorithm
```
// Pseudocode for hierarchical LAP matching
FUNCTION match_hierarchical(region_A, region_B, threshold, level):
    // Base case: region small enough for exact LAP
    IF size(region_A) <= threshold THEN
        cost ← compute_cost_matrix(region_A, region_B, α, β)
        RETURN lap_solve(cost)
    END IF

    // Divide into 2×2 spatial grid (4 patches)
    patches_A ← spatial_partition(region_A, grid = 2×2)
    patches_B ← spatial_partition(region_B, grid = 2×2)

    // Compute patch representatives
    FOR each patch p DO
        centroid[p]     ← mean(positions in p)
        mean_feature[p] ← mean(features in p)
    END FOR

    // Match patches using a 4×4 LAP
    patch_cost ← matrix(4, 4)
    FOR i = 1 TO 4 DO
        FOR j = 1 TO 4 DO
            patch_cost[i, j] ← α·distance(mean_feature_A[i], mean_feature_B[j]) +
                               β·distance(centroid_A[i], centroid_B[j])
        END FOR
    END FOR
    patch_assignment ← lap_solve(patch_cost)

    // Recursively solve within matched patches
    assignments ← []
    FOR i = 1 TO 4 DO
        j ← patch_assignment[i]
        sub_assignment ← match_hierarchical(patches_A[i], patches_B[j],
                                            threshold, level + 1)
        assignments ← append(assignments, sub_assignment)
    END FOR
    RETURN concatenate(assignments)
END FUNCTION
```
The `couplr` implementation adds pragmatic details such as normalization of color and spatial distances, conversion between $(x, y)$ coordinates and raster indexing, and handling remainder patches when the grid does not divide evenly.
### Strategy 3: Resolution Reduction
**Core idea**: solve the LAP on a coarse grid, then lift/upscale the assignment to the full-resolution grid.
#### Mathematical Formulation
1. **Downscale**
Reduce spatial resolution by a factor $s$ (for example $s = 2$):
$$
A' = \text{downsample}(A, s), \qquad B' = \text{downsample}(B, s).
$$
Now $A'$ and $B'$ each have $n' = n / s^2$ entities.
2. **Solve at low resolution**
Compute an exact LAP solution on the $n' \times n'$ problem:
$$
\pi' = \arg\min_{\pi'} \sum_{i=1}^{n'} c'_{i,\pi'(i)}.
$$
3. **Upscale assignment**
Map the low-resolution assignment back to full resolution:
$$
\pi(i) = \text{upscale}\!\bigl(\pi'(\text{coarse\_index}(i)), s\bigr),
$$
where each full-resolution entity inherits the assignment of its coarse cell.
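A minimal R sketch for square $w \times w$ images stored in row-major order, assuming `cost_coarse` was built from block-averaged colors and positions, and that `lap_solve()` follows the integer-vector convention used elsewhere in this vignette:

```r
# Strategy 3 sketch: solve the LAP on the (w/s) x (w/s) coarse grid, then let
# every fine pixel inherit its coarse cell's assignment.
resolution_reduce_match <- function(cost_coarse, w, s) {
  stopifnot(w %% s == 0)
  pi_coarse <- lap_solve(cost_coarse)      # exact LAP at the coarse level
  wc <- w / s
  fine <- expand.grid(x = 1:w, y = 1:w)    # row-major: index = (y - 1) * w + x
  cell <- (ceiling(fine$y / s) - 1) * wc + ceiling(fine$x / s)
  # Fine pixel -> matched coarse cell; note this is many-to-one, not a
  # permutation at the fine level (see the trade-offs below)
  pi_coarse[cell]
}
```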
#### Complexity
- Original: $O(n^3)$
- Downscaled: $O\bigl((n/s^2)^3\bigr) = O(n^3 / s^6)$
- Speedup: $s^6$
For $s = 2$ this gives a $64\times$ reduction in LAP work.
#### Quality Trade-offs
**Advantages**
- Very simple to implement
- Exact LAP at the coarse level
- Large speedups for moderate $s$
**Disadvantages**
- Loss of fine detail and blocky artifacts
- Assignment is no longer a true permutation at pixel level (multiple fine pixels can map to the same coarse target)
- Quality deteriorates quickly for larger $s$
In practice, resolution reduction is most useful as a crude initialization step for very large problems ($n > 100\,000$).
### Strategy Comparison
| Approach | Speedup (vs. exact) | Quality | Best for |
|-----------------------|---------------------|------------------------|---------------------------------------|
| Exact LAP | $1\times$ | Optimal | $n \le 1000$ |
| Feature quantization | $(n/k)^3$ | Good global structure | Distinct feature groups |
| Hierarchical | $\approx n^2 / \log n$ | Good local structure | Large $n$, strong spatial structure |
| Resolution reduction | $s^6$ | Moderate | Very large $n$, rough initialization |
**Practical rules of thumb**
- $n < 1000$: use the exact LAP.
- $1000 < n < 5000$: feature quantization or a shallow hierarchy.
- $n > 5000$: hierarchical decomposition with 2-3 levels.
- $n > 50\,000$: combine $s = 2$ resolution reduction with a hierarchical method.
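As a toy illustration of these thresholds (a sketch only; `choose_strategy()` is not a couplr function):

```r
# Hypothetical dispatcher encoding the rules of thumb above
choose_strategy <- function(n) {
  if (n < 1000)       "exact LAP"
  else if (n < 5000)  "feature quantization or shallow hierarchy"
  else if (n < 50000) "hierarchical decomposition (2-3 levels)"
  else                "resolution reduction (s = 2) + hierarchical"
}
choose_strategy(1600)  # "feature quantization or shallow hierarchy"
```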
## Implementation Details of Exact Pixel Matching
We now spell out the exact LAP-based morph more concretely.
We again use the cost
$$
c_{ij} = \alpha \,\lVert \text{RGB}_i^A - \text{RGB}_j^B \rVert_2 +
\beta \,\lVert (x_i, y_i) - (x_j, y_j) \rVert_2.
$$
The algorithm:
```
// Pseudocode for exact pixel matching
// Step 1: Compute full cost matrix (normalized)
n_pixels ← height × width
cost ← matrix(0, n_pixels, n_pixels)
FOR i = 1 TO n_pixels DO
    FOR j = 1 TO n_pixels DO
        // RGB color distance (normalized to [0, 1]; raw range is [0, 255·sqrt(3)])
        color_dist ← sqrt((R_A[i] - R_B[j])^2 +
                          (G_A[i] - G_B[j])^2 +
                          (B_A[i] - B_B[j])^2) / (255 · sqrt(3))
        // Spatial distance (normalized to [0, 1] by the image diagonal)
        spatial_dist ← sqrt((x_A[i] - x_B[j])^2 +
                            (y_A[i] - y_B[j])^2) / diagonal_length
        // Combined cost
        cost[i, j] ← α · color_dist + β · spatial_dist
    END FOR
END FOR

// Step 2: Solve with Jonker-Volgenant
assignment ← lap_solve(cost, method = "jv")

// Step 3: Generate morph frames by linear interpolation
FOR frame_idx = 1 TO n_frames DO
    t ← frame_idx / n_frames          // Time parameter in (0, 1]
    FOR pixel_i = 1 TO n_pixels DO
        j ← assignment[pixel_i]       // Matched target pixel
        // Interpolate position
        x_new[pixel_i] ← (1 - t) · x_A[pixel_i] + t · x_B[j]
        y_new[pixel_i] ← (1 - t) · y_A[pixel_i] + t · y_B[j]
        // Keep source color (transport-only, no blending)
        RGB_new[pixel_i] ← RGB_A[pixel_i]
    END FOR
    frames[frame_idx] ← render(x_new, y_new, RGB_new)
END FOR
```
The `couplr` implementation handles indexing and raster layout, and displays or saves the resulting GIFs.
In practice, the exact LAP remains workable up to about $100 \times 100$ ($10\,000$ pixels) on typical hardware, though at that size runtime and memory costs are already substantial (see the computational challenge above).
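For reference, the double loop above collapses to a few vectorized lines in R. This is a sketch, not the package implementation: `pixels_A` and `pixels_B` are assumed data frames with columns `r`, `g`, `b` in $[0,1]$ and `x`, `y` in pixel units, and the pooled `dist()` call builds a $2n \times 2n$ matrix, so it only suits modest $n$:

```r
# Vectorized cost construction for the exact pixel morph (illustrative names)
build_pixel_cost <- function(pixels_A, pixels_B, width, height,
                             alpha = 0.5, beta = 0.5) {
  rgb_A <- as.matrix(pixels_A[, c("r", "g", "b")])
  rgb_B <- as.matrix(pixels_B[, c("r", "g", "b")])
  xy_A  <- as.matrix(pixels_A[, c("x", "y")])
  xy_B  <- as.matrix(pixels_B[, c("x", "y")])
  n <- nrow(rgb_A)
  # Cross-set Euclidean distances via pooling, then normalize to [0, 1]
  color_d   <- as.matrix(dist(rbind(rgb_A, rgb_B)))[1:n, n + 1:n] / sqrt(3)
  spatial_d <- as.matrix(dist(rbind(xy_A, xy_B)))[1:n, n + 1:n] /
    sqrt(width^2 + height^2)
  alpha * color_d + beta * spatial_d
}

assignment <- lap_solve(build_pixel_cost(pixels_A, pixels_B,
                                         width = 40, height = 40))
```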
## Application to Scientific Domains
We now return from pixel morphs to the scientific settings that motivated them.
### Ecology: Vegetation Plot Matching
**Problem**: match $n$ vegetation plots surveyed at time $t$ to $n$ plots at time $t + \Delta t$ to track community dynamics.
**Feature distance**: Bray-Curtis dissimilarity between species abundance vectors
$$
d_{\text{BC}}(a, b) =
\frac{\sum_s \lvert a_s - b_s \rvert}
{\sum_s (a_s + b_s)},
$$
where $a_s, b_s$ are abundances of species $s$ in plots $a$ and $b$.
**Spatial distance**: geographic distance (e.g. in kilometers) between plot centers.
Exact solution for small studies ($n < 100$):
```
// Pseudocode for ecological plot matching
FOR i = 1 TO n_plots_t DO
    FOR j = 1 TO n_plots_tplus DO
        // Bray-Curtis dissimilarity for species composition
        numerator   ← sum over species s of |abundance_t[i, s] - abundance_tplus[j, s]|
        denominator ← sum over species s of (abundance_t[i, s] + abundance_tplus[j, s])
        bc_distance ← numerator / denominator
        // Geographic distance (kilometers)
        geo_distance ← sqrt((x_t[i] - x_tplus[j])^2 +
                            (y_t[i] - y_tplus[j])^2)
        // Combined cost (α = 0.7 emphasizes species composition)
        cost[i, j] ← 0.7 · bc_distance + 0.3 · (geo_distance / max_distance)
    END FOR
END FOR
plot_correspondence ← lap_solve(cost)
```
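In R, the small-$n$ case is compact if you lean on `vegan::vegdist()` for Bray-Curtis (an assumption: vegan is not a couplr dependency, and `abund_t`, `abund_tplus`, `xy_t`, `xy_tplus` are illustrative names for plots-by-species and coordinate matrices sharing the same species columns):

```r
library(vegan)  # for vegdist(); an assumption, not a couplr dependency

n  <- nrow(abund_t)
bc <- as.matrix(vegdist(rbind(abund_t, abund_tplus), method = "bray"))
bc_cross <- bc[1:n, n + 1:n]                    # t vs t+Δt block
geo  <- as.matrix(dist(rbind(xy_t, xy_tplus)))[1:n, n + 1:n]
cost <- 0.7 * bc_cross + 0.3 * geo / max(geo)   # α = 0.7, β = 0.3
plot_correspondence <- lap_solve(cost)
```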
For large studies ($n > 1000$) a hierarchical approach by region is more practical:
```
// Hierarchical decomposition by geographic region
// 1. Divide landscape into spatial grid (e.g. 10 km × 10 km cells)
regions_t     ← spatial_partition(plots_t, grid_size = 10 km)
regions_tplus ← spatial_partition(plots_tplus, grid_size = 10 km)

// 2. Compute region representatives
FOR each region r DO
    mean_composition[r] ← average species vector across plots in r
    centroid[r]         ← geographic center of r
END FOR

// 3. Match regions (small LAP: ~100 regions)
region_cost ← compute_cost(mean_composition, centroids, α = 0.7, β = 0.3)
region_assignment ← lap_solve(region_cost)

// 4. Within matched regions, solve plot-level LAP
full_assignment ← []
FOR r = 1 TO n_regions DO
    r_matched ← region_assignment[r]
    plots_A ← plots in regions_t[r]
    plots_B ← plots in regions_tplus[r_matched]
    // Local LAP (smaller problem, e.g. 50 × 50)
    cost_local ← compute_plot_cost(plots_A, plots_B, α = 0.7, β = 0.3)
    local_assignment ← lap_solve(cost_local)
    full_assignment ← append(full_assignment, local_assignment)
END FOR
RETURN full_assignment
```
This allows tracking individual plot trajectories across time, distinguishing stable communities, successional trends, and invasion fronts.
### Physics: Particle Tracking
**Problem**: track $n$ particles between frame $t$ and $t + \Delta t$ in experimental video.
**Feature distance**: differences in intensity, size, or shape.
**Spatial distance**: displacement relative to predicted motion:
$$
d_{\text{spatial}}(i, j) =
\bigl\| \mathbf{x}_i + \mathbf{v}_i \Delta t - \mathbf{x}_j \bigr\|_2,
$$
where $\mathbf{v}_i$ is the estimated velocity from previous frames.
We also impose a maximum displacement $d_{\max}$ beyond which matches are physically implausible.
Exact solution (moderate $n$):
```
// Pseudocode for particle tracking with velocity prediction
// Initialize cost matrix as forbidden everywhere
cost ← matrix(Inf, n_particles_t, n_particles_tplus)
FOR i = 1 TO n_particles_t DO
    // Predict position using previous velocity
    x_predicted ← x_t[i] + v_x_t[i] · Δt
    y_predicted ← y_t[i] + v_y_t[i] · Δt
    FOR j = 1 TO n_particles_tplus DO
        // Distance from predicted position
        dx ← x_predicted - x_tplus[j]
        dy ← y_predicted - y_tplus[j]
        spatial_distance ← sqrt(dx^2 + dy^2)
        // Only consider physically plausible matches
        IF spatial_distance <= max_displacement THEN
            // Feature similarity (intensity, size, etc.)
            feature_distance ← |intensity_t[i] - intensity_tplus[j]|
            // Combined cost
            cost[i, j] ← α · feature_distance + β · spatial_distance
        END IF
    END FOR
END FOR

// Solve assignment (Inf entries are forbidden)
particle_tracks ← lap_solve(cost)

// Update velocities from assignments
FOR i = 1 TO n_particles_t DO
    j ← particle_tracks[i]
    velocity_new[i] ← (position_tplus[j] - position_t[i]) / Δt
END FOR
```
For dense tracking ($n > 5000$), we can first cluster particles:
```
// Two-stage: clustering then local matching
// Stage 1: spatial clustering
clusters_t     ← spatial_cluster(particles_t, radius = 2 · pixel_size)
clusters_tplus ← spatial_cluster(particles_tplus, radius = 2 · pixel_size)

// Compute cluster representatives
FOR each cluster c DO
    centroid[c]       ← mean position of particles in c
    mean_intensity[c] ← mean intensity
    mean_velocity[c]  ← mean velocity (if available)
END FOR

// Match clusters
cluster_cost ← compute_cluster_similarity(clusters_t, clusters_tplus)
cluster_tracks ← lap_solve(cluster_cost)

// Stage 2: within matched clusters, track individual particles
full_tracks ← []
FOR c = 1 TO n_clusters DO
    c_matched ← cluster_tracks[c]
    particles_A ← particles in clusters_t[c]
    particles_B ← particles in clusters_tplus[c_matched]
    cost_local ← compute_particle_distance(particles_A, particles_B,
                                           max_displacement = 5,
                                           α = 0.3, β = 0.7)
    local_tracks ← lap_solve(cost_local)
    full_tracks ← append(full_tracks, local_tracks)
END FOR
RETURN full_tracks
```
This yields efficient and robust trajectories even for very dense particle fields.
### Chemistry: Molecular Conformation Alignment
**Problem**: align two conformations of the same molecule (e.g. a protein) with $n$ atoms to compute RMSD and analyze structural change.
**Feature distance**: strict element matching
$$
d_{\text{element}}(i, j) =
\begin{cases}
0, & \text{if } \text{element}_i = \text{element}_j, \\
\infty, & \text{otherwise.}
\end{cases}
$$
**Spatial distance**: 3D Euclidean distance between atomic coordinates.
Exact LAP for small molecules:
```
// Pseudocode for molecular conformation alignment
n_atoms ← number of atoms in molecule
cost ← matrix(0, n_atoms, n_atoms)
FOR i = 1 TO n_atoms DO
    FOR j = 1 TO n_atoms DO
        // Enforce strict element type matching
        IF element_type_A[i] ≠ element_type_B[j] THEN
            cost[i, j] ← Inf
        ELSE
            dx ← x_A[i] - x_B[j]
            dy ← y_A[i] - y_B[j]
            dz ← z_A[i] - z_B[j]
            cost[i, j] ← sqrt(dx^2 + dy^2 + dz^2)
        END IF
    END FOR
END FOR

// Solve alignment
alignment ← lap_solve(cost)

// Compute RMSD
sum_sq_dist ← 0
FOR i = 1 TO n_atoms DO
    j ← alignment[i]
    sum_sq_dist ← sum_sq_dist + cost[i, j]^2
END FOR
rmsd ← sqrt(sum_sq_dist / n_atoms)
```
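A vectorized R sketch of the same computation, assuming `coords_A` and `coords_B` are $n \times 3$ coordinate matrices and `elem_A`, `elem_B` character vectors of element symbols (illustrative names), with `Inf` marking forbidden pairs as in the particle-tracking example:

```r
n  <- nrow(coords_A)
d3 <- as.matrix(dist(rbind(coords_A, coords_B)))[1:n, n + 1:n]
cost <- ifelse(outer(elem_A, elem_B, "=="), d3, Inf)  # element mismatch forbidden
alignment <- lap_solve(cost)
rmsd <- sqrt(mean(d3[cbind(1:n, alignment)]^2))
```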
For large biomolecules, we again use a hierarchical strategy, this time by secondary structure elements (helices, sheets, loops, etc.), aligning segments first and then atoms within matched segments.
## Implementation Notes
### Customizing Morph Duration
The morphing examples use default settings, but you can customize the number of frames and speed:
```r
# From inst/scripts/generate_examples.R
generate_morph <- function(assignment, pixels_A, pixels_B,
                           n_frames = 30,       # number of frames
                           frame_delay = 0.1) { # delay between frames (seconds)
  frames <- lapply(seq(0, 1, length.out = n_frames), function(t) {
    interpolate_frame(t, assignment, pixels_A, pixels_B)
  })
  save_gif(frames, delay = frame_delay)
}
```
Total animation duration is `n_frames * frame_delay` seconds (with the defaults above, $30 \times 0.1 = 3$ seconds).
### Using the Example Code
The morphing implementation is provided in `inst/scripts/generate_examples.R`:
```r
# View the source
example_script <- system.file("scripts", "generate_examples.R", package = "couplr")
file.show(example_script)
# Or source to use its helpers
source(example_script)
# Apply to your own data
my_cost <- build_cost_matrix(my_data_A, my_data_B)
my_assignment <- lap_solve(my_cost)
```
### Regenerating Examples
To regenerate all demo GIFs and PNGs:
```r
source("inst/scripts/generate_examples.R")
```
This will write the assets under `inst/extdata`.
## Mathematical Foundation: Optimal Transport
The matching problems discussed here are discrete instances of optimal transport.
### Monge Problem
The original Monge formulation (1781) seeks a transport map $T: A \to B$ minimizing
$$
\int_A c(\mathbf{x}, T(\mathbf{x})) \,\mathrm{d}\mu(\mathbf{x}).
$$
### Kantorovich Relaxation
Kantorovich (1942) relaxed this to a transport plan $\gamma$ on $A \times B$:
$$
\min_{\gamma} \int_{A \times B} c(\mathbf{x}, \mathbf{y}) \,\mathrm{d}\gamma(\mathbf{x}, \mathbf{y})
$$
subject to marginal constraints on $\gamma$.
### Discrete Linear Assignment
For discrete uniform distributions with $n$ points in $A$ and $B$ we obtain exactly the linear assignment problem:
$$
\min_{\pi \in S_n} \sum_{i=1}^n c_{i,\pi(i)},
$$
which is what `couplr` solves efficiently.
### Wasserstein Distance
With $c_{ij} = d(\mathbf{x}_i, \mathbf{x}_j)$ (often Euclidean distance) and uniform weight $1/n$ on each point, the optimal cost defines the $1$‑Wasserstein distance:
$$
W_1(\mu, \nu) = \frac{1}{n} \min_{\pi \in S_n} \sum_{i=1}^n c_{i,\pi(i)}.
$$
This appears in
- Earth mover’s distance for image retrieval
- Distributional similarity in statistics
- Generative modeling (e.g. Wasserstein GANs)
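A small self-contained R example, again assuming `lap_solve()` returns the assignment as an integer vector:

```r
# 1-Wasserstein distance between two 2-D point clouds with uniform weights
set.seed(42)
n <- 50
X <- matrix(rnorm(2 * n), ncol = 2)            # n samples from mu
Y <- matrix(rnorm(2 * n, mean = 1), ncol = 2)  # n samples from nu
cost   <- as.matrix(dist(rbind(X, Y)))[1:n, n + 1:n]
pi_hat <- lap_solve(cost)
W1 <- mean(cost[cbind(1:n, pi_hat)])  # mean() supplies the 1/n uniform weight
```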
## Further Reading
### Optimal Transport Theory
- Peyré, G., & Cuturi, M. (2019). *Computational Optimal Transport*. Foundations and Trends in Machine Learning.
- Villani, C. (2008). *Optimal Transport: Old and New*. Springer.
### Scientific Applications
**Ecology**
- Anderson, M. J. et al. (2011). Navigating the multiple meanings of beta diversity. *Ecology Letters*.
- Legendre, P., & Legendre, L. (2012). *Numerical Ecology*. Elsevier.
**Physics**
- Adrian, R. J., & Westerweel, J. (2011). *Particle Image Velocimetry*. Cambridge University Press.
- Crocker, J. C., & Grier, D. G. (1996). Methods of digital video microscopy. *Journal of Colloid and Interface Science*.
**Chemistry**
- Kabsch, W. (1976). A solution for the best rotation to relate two sets of vectors. *Acta Crystallographica*.
- Coutsias, E. A. et al. (2004). Using quaternions to calculate RMSD. *Journal of Computational Chemistry*.
### Assignment Algorithms
- Burkard, R., Dell'Amico, M., & Martello, S. (2009). *Assignment Problems*. SIAM.
- For implementation details in this package see `vignette("algorithms")`.
## Connection to couplr Workflows
The approximation strategies in this vignette become relevant when working with couplr's practical matching functions.
### When Do These Strategies Apply?
| Strategy | couplr Implementation | When to Use |
|----------|----------------------|-------------|
| Exact LAP | `match_couples(method = "jv")` | n < 3,000 |
| Feature quantization | Implicit via `scale = "robust"` | Reduces effective feature space |
| Hierarchical | `matchmaker(block_type = "cluster")` | n > 3,000, use blocking |
| Resolution reduction | Custom code | Very large n |
### Practical Recommendations
**For n < 3,000**: Use `match_couples()` with exact algorithms:
```r
result <- match_couples(left, right, vars = c("x", "y", "z"), auto_scale = TRUE)
```
**For 3,000 < n < 10,000**: Use blocking to create smaller subproblems:
```r
blocks <- matchmaker(left, right, block_type = "cluster", n_blocks = 10)
result <- match_couples(blocks$left, blocks$right, vars = vars, block_id = "block_id")
```
**For n > 10,000**: Use greedy matching:
```r
result <- greedy_couples(left, right, vars = vars, strategy = "sorted")
```
**For n > 50,000**: Combine strategies (blocking + greedy within blocks), or implement custom approximations using the techniques in this vignette.
### Limitations of Approximation Strategies
Each approximation trades accuracy for speed. Know the failure modes:
| Strategy | Works Well When | Fails When |
|----------|-----------------|------------|
| Feature quantization | Features cluster naturally | Continuous features, fine distinctions matter |
| Hierarchical | Spatial locality is meaningful | Optimal matches cross boundaries |
| Resolution reduction | Coarse structure suffices | Fine detail matters |
## Summary
This vignette explored optimal matching through the lens of pixel morphing and scientific applications.
**Key Ideas**:
1. **Assignment = matching**: LAP finds optimal correspondences between two sets
2. **Scalability matters**: $O(n^3)$ becomes prohibitive for $n > 3{,}000$
3. **Three approximations**: Feature quantization, hierarchical decomposition, resolution reduction
4. **Same math, different domains**: Pixels, particles, plots, and atoms all use the same algorithms
From pixels to particles, vegetation plots, and atoms, the same assignment machinery carries over unchanged. Together, the methods in `couplr` let you move between exact optimal matchings and principled approximations, depending on problem size and accuracy requirements.
---
## See Also
- `vignette("getting-started")` - Basic LAP solving
- `vignette("algorithms")` - Mathematical foundations
- `vignette("matching-workflows")` - Production matching pipelines
- `?lap_solve`, `?match_couples`, `?greedy_couples`