--- title: "Sinkhorn Algorithms" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Sinkhorn Algorithms} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Get Started with Sinkhorn Algorithm ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(rwig) |> suppressPackageStartupMessages() ``` Suppose we have two probability vectors **a** and **b**, for example, ```{r} a <- c(.3, .4, .1, .1, .1) b <- c(.4, .5, .1) ``` and the cost matrix **C** between them ```{r} C <- rbind( c(.1, .2, .3), c(.2, .3, .4), c(.4, .3, .2), c(.3, .2, .1), c(.5, .5, .4) ) reg <- .1 ``` and the regularization parameter `reg = .1`. We can then compute the optimal transport plan by the `sinkhorn()` algorithm: ```{r} sinkhorn( a, b, C, sinkhorn_control = list( reg = reg, verbose = 0 ) ) ``` The optimal coupling matrix (optimal transport plan) is **P**, and two scaling vectors **f**, **g** with the method being "log". ## Vanilla, Log, and Auto Sinkhorn Algorithms There are two different methods for the Sinkhorn algorithm implemented in the `wig` package for the computation of the regularized OT problem: the vanilla Sinkhorn and the log-stabilized Sinkhorn. The vanilla algorithm is usually faster, as it is inherently parallel and the computation can be done purely in matrix-vector and matrix-matrix operations. However, the vanilla algorithm is commonly known for its instability when the regularization parameter is very small, and the log-stabilized Sinkhorn will handle any arbitrarily small `reg`. But the cost of doing so is that the log algorithm is typically slower than the vanilla one, as it requires column-by-column and row-by-row soft-minimums. For example, we can have ```{r} sol_vanilla <- sinkhorn( a, b, C, sinkhorn_control = list( reg = reg, method = "vanilla", verbose = 0 ) ) sol_log <- sinkhorn( a, b, C, sinkhorn_control = list( reg = reg, method = "log", verbose = 0 ) ) ``` To speed up the log algorithm, I used multi-threading to speed up the computation. ```{r} sol_thread <- sinkhorn( a, b, C, sinkhorn_control = list( reg = reg, method = "log", n_threads = 4, verbose = 0 ) ) ``` Moreover, I also provided an "auto" method, trying to provide a heuristics for determining whether to use "vanilla" or "log" method automatically. This is also the default algorithm. Although you mostly likely should avoid doing so, you can also tweak the `threshold` argument to tune the "auto" method. The default value is `.1`: smaller means more likely to use "vanilla" method, bigger means more likely to use "log" method. If you know which algorithm to use, you should directly set `method` argument instead of doing this. ```{r} sol_auto <- sinkhorn( a, b, C, sinkhorn_control = list( reg = reg, threshold = .1, verbose = 0 ) ) ``` See also `vignette("barycenter")`. ## Gradient of Loss w.r.t. **a** As derived in Xie (2025), I also provide the gradient of the Sinkhorn algorithm w.r.t. to the source density **a**. The default for gradient calculation is `FALSE`, but you can turn it on by the following: ```{r} sol_grad <- sinkhorn( a, b, C, sinkhorn_control = list( reg = reg, with_grad = TRUE, verbose = 0 ) ) sol_grad$grad_a ``` See also `vignette("barycenter")`, `vignette("gradient")`. ## Other Control Parameters This is the default `sinkhorn_control` argument, and we have discussed `reg`, `with_grad`, `threshold` already. ```{r, eval=FALSE} sinkhorn_control = list( reg = .1, with_grad = FALSE, method = "auto", threshold = .1, max_iter = 1000L, zero_tol = 1e-6, verbose = 0L ) ``` The `max_iter` and `zero_tol` arguments are the parameters to determine maximum number of iterations and the convergence condition for the Sinkhorn algorithms. The `verbose` are useful for printing out some useful information during the computation. You can set it as an non-negative integer (0 meaning not verbose), and the algorithm will update info for every `verbose` steps. For example, ```{r} sol_verbose <- sinkhorn( a, b, C, sinkhorn_control = list( reg = reg, verbose = 10 ) ) ``` See also `vignette("barycenter")`. ## Reference Peyré, G., & Cuturi, M. (2019). Computational Optimal Transport: With Applications to Data Science. *Foundations and Trends® in Machine Learning*, 11(5–6), 355–607. https://doi.org/10.1561/2200000073 Xie, F. (2025). Deriving the Gradients of Some Popular Optimal Transport Algorithms (No. arXiv:2504.08722). *arXiv*. https://doi.org/10.48550/arXiv.2504.08722