Abstract
This vignette demonstrates how the choice of SEL weights for smoothing the likelihood affects the final result.
library(smoothemplik)
#> Smoothed Empirical Likelihood version 0.0.16 (2025-10-27).
#> Package under active development; core functions subject to change.
There are many approaches to choosing the smoothing weights for conditional empirical likelihood, and there is no consensus on the optimality of one or another weighting scheme.
Recall that in the SEL approach, the empirical likelihood of the data in the observed sample is maximised subject to the constraint \(\mathbb{E}(\rho(Z, \theta) \mid X) = 0\). Assume for simplicity that the dimension of the moment function \(\rho\) is 1. The optimisation problem is therefore finding the optimal discrete distribution that enforces a sample analogue of this conditional moment restriction for any parameter \(\theta\): \[ \max_{0 < p_{ij} < 1} \sum_{i=1}^n \sum_{j=1}^n w_{ij} \log p_{ij}, \] where \(\sum_j p_{ij} = 1\) for \(i=1,\ldots,n\) and \(\sum_j p_{ij}\rho(Z_j, \theta) = 0\) for \(i=1,\ldots,n\).
The solution to this problem has a closed form: \[ \hat p_{ij} = \frac{w_{ij}}{1 + \hat\lambda_i (\theta) \rho(Z_j, \theta)}, \] where the Lagrange multipliers \(\hat\lambda_i\) are the solution to the equation \(\sum_{j=1}^n \frac{w_{ij} \rho(Z_j, \theta)}{1 + \lambda_i(\theta) \rho(Z_j, \theta)} = 0\) (the first-order condition for maximising the weighted empirical likelihood), found numerically.
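To make the mechanics concrete, here is a minimal plain-R sketch (not the package's internal routine) that solves this first-order condition for a single \(\lambda_i\) by root finding and recovers the implied probabilities; the moment-function values and weights below are made up for illustration.
rho <- c(-1.2, -0.4, 0.3, 0.8, 1.5)  # Hypothetical rho(Z_j, theta) for one trial theta
w   <- c(0.1, 0.3, 0.3, 0.2, 0.1)    # Hypothetical kernel weights w_ij summing to one
foc <- function(l) sum(w * rho / (1 + l * rho))  # First-order condition in lambda_i
# Any admissible lambda keeps every denominator positive: 1 + lambda*rho_j > 0
lam <- uniroot(foc, c(-1/max(rho) + 1e-6, -1/min(rho) - 1e-6))$root
p   <- w / (1 + lam * rho)
c(sum(p), sum(p * rho))  # Should be (numerically) 1 and 0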
Typically, \(w_{ij}\) are kernel weights. They can be obtained in a plethora of ways; the four schemes compared in this vignette are:
1. A fixed bandwidth applied to the original conditioning variable.
2. A fixed bandwidth applied after a probability integral transform (PIT) of the conditioning variable.
3. Nearest-neighbour weights based on the ranks of the conditioning variable.
4. Adaptive (‘balloon’) bandwidths that guarantee a minimum number of neighbours for every observation.
Theoretically, approaches (3) and (4) should work best because they ensure that there are enough observations for smoothing and that there is neither over- nor under-smoothing. Both of the latter phenomena are a problem: over-smoothing blurs the conditioning information, while under-smoothing may leave so few effective observations that the spanning (convex-hull) condition fails.
Consider kernel weights of the form \[ w_{ij} := k\left( \frac{X_i - X_j}{b} \right), \] where \(k\) is a kernel function that is zero outside \([-1, 1]\), integrates to one, and is symmetric and non-negative within that interval. The most popular kernel functions are uniform, triangular (Bartlett), quadratic (Epanechnikov), quartic, and Gaussian (the last one having unbounded support).
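The following sketch builds such a weight matrix directly from the formula with a hand-coded Epanechnikov kernel; the toy points and bandwidth are arbitrary, and the package's kernelWeights() function (used later in this vignette) produces analogous matrices.
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)  # Epanechnikov kernel on [-1, 1]
x0 <- c(0.1, 0.5, 0.9, 2.0)
b0 <- 1
W0 <- outer(x0, x0, function(xi, xj) epan((xi - xj) / b0))
round(W0, 3)  # Row i holds the weights w_ij around x0[i]; the isolated point 2.0 only weights itself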
Using a fixed bandwidth for SEL smoothing is the simplest choice, yet it can be sub-optimal for many reasons. In regions where the observations are dense, the empirical likelihood may be over-smoothed, whereas in low-density regions, it may be under-smoothed, which creates problems for the spanning condition.
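A quick check of this trade-off with a skewed design (a sketch only; the chi-squared sample and the bandwidth value are purely illustrative):
set.seed(1)
x.fix <- sort(rchisq(100, df = 3))  # Skewed design: dense near the mode, sparse in the right tail
b.fix <- 0.5                        # One fixed bandwidth for every observation
n.neighbours <- sapply(x.fix, function(xi) sum(abs(x.fix - xi) <= b.fix) - 1)  # Exclude self
range(n.neighbours)                 # Dense and sparse regions differ sharply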
The obvious ‘crutch’ to fix the spanning-condition failure is some sort of extrapolation of the SELR function outside the convex hull. The present package offers three options for the EL() function: chull.fail = "taylor", which extrapolates its branches quadratically following a Taylor expansion after a cut-off point; chull.fail = "wald", which replaces the SELR function halfway between the outermost observations with a smooth transition to a Wald approximation; and chull.fail = "none", which returns -Inf. This functionality is experimental; further options, such as 2nd- or 4th-order log-approximations, are envisioned. Adjusted EL (AEL) and Balanced AEL (BAEL) can be invoked with chull.fail = "adjusted" (Chen et al. (2008)), chull.fail = "adjusted2" (the Liu and Chen (2010) improvement), and chull.fail = "balanced" (Emerson and Owen (2009)).
x <- c(-3, -2, 2, 3, 4)
ct <- c(10, 4:1)
grid.full <- c(seq(-3.5, 5.5, length.out = 201))
grid.keep <- grid.full <= -2.5 | grid.full >= 3.5
selr0 <- sapply(grid.full, function(m) -2*EL0(x, m, ct)$logelr)
selr1 <- -2*ExEL1(x, mu = grid.full, ct = ct, exel.control = list(xlim = c(-2.5, 3.5)))
selr2 <- -2*ExEL2(x, mu = grid.full, ct = ct, exel.control = list(xlim = c(-2.5, 3.5)))
plot(grid.full, selr0, ylim = c(0, 120), xlim = c(-3.5, 4.5), bty = "n",
main = "-2 * weighted log-EL", ylab = "", type = "l")
points(grid.full[grid.keep], selr1[grid.keep], col = 4, pch = 0, cex = 0.75)
points(grid.full[grid.keep], selr2[grid.keep], col = 2, pch = 2, cex = 0.75)
rug(x, lwd = 2)
abline(v = c(-2.5, 3.5), lty = 3)
legend("top", c("Taylor", "Wald"), title = "Extrapolation type", col = c(4, 2),
       pch = c(0, 2), bty = "n")
These extrapolations may certainly remedy the predicament of a log-ELR statistic equal to negative infinity, which most numerical optimisers do not accept, but they should not be relied upon. It is normal to have ‘bad’ values for ‘bad’ guesses during the optimisation process, but at the initial value and at the optimum, the spanning condition should hold.
This bandwidth is also known as the ‘balloon’, or sample-point adaptive, bandwidth. Define a function that picks a bandwidth such that each observation has ceiling(d*n) nearest neighbours within its window, where d is the desired span in (0, 1). It is assumed that the kernel in question has bounded support and is zero outside \([-1, 1]\). This version works in multiple dimensions as well.
pickMinBW <- function(X, d = 2 / sqrt(NROW(X)), tol = 1e-12) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  k <- ceiling(d * n)        # Desired number of nearest neighbours
  k <- max(1, min(k, n - 1))
  # has.ms <- requireNamespace("matrixStats", quietly = TRUE)
  b <- sapply(seq_len(n), function(i) {
    # L_inf distances from X_i to all observations
    A <- abs(sweep(X, 2, X[i, ], "-", check.margin = FALSE))
    # Di <- if (has.ms) matrixStats::rowMaxs(A) else do.call(pmax, as.data.frame(A))
    Di <- do.call(pmax, as.data.frame(A))
    Di[i] <- Inf  # Exclude self
    # Distance to the (k+1)-th nearest neighbour (or the farthest one if k = n-1)
    if (k < n - 1) {
      return(sort(Di, partial = k + 1)[k + 1])
    } else {
      return(sort(Di, partial = n - 1)[n - 1])
    }
  })
  # Shrink slightly so that, up to ties, exactly k neighbours fall strictly inside the window
  b * (1 - tol)
}
n <- 50
set.seed(1)
X <- sort(rchisq(n, df = 3))
Y <- 1 + X + (rchisq(n, df = 3) - 3) * (1 + X)
mod0 <- lm(Y ~ X)
vhat0 <- kernelSmooth(X, mod0$residuals^2, bw = max(diff(X))*1.2, kernel = "epanechnikov")
mod1 <- lm(Y ~ X, weights = 1 / vhat0)
plot(X, Y, bty = "n")
abline(c(1, 1), lty = 2)
abline(mod0, col = 1); abline(mod1, col = 2)
cbind(OLS = mod0$coefficients, WOLS = mod1$coefficients)
#>                   OLS      WOLS
#> (Intercept) 2.0887343 1.4469218
#> X           0.2858387 0.5054064
These values are relatively far away from the truth.
Define four smoothing matrices:
bw0 <- bw.CV(X, kernel = "epanechnikov")
bw0 <- max(bw0, max(diff(X))*1.1)
wF <- kernelWeights(X, bw = bw0, kernel = "epanechnikov")
# Assuming Gaussian CDF, which is not true
Xs <- scale(X)
XP <- pnorm(Xs)
wP <- kernelWeights(XP, bw = bw.CV(XP), kernel = "epanechnikov")
rowMeans(wP > 0) - 1/n
#> [1] 0.14 0.16 0.20 0.24 0.26 0.28 0.28 0.28 0.26 0.26 0.28 0.28 0.28 0.28 0.30
#> [16] 0.26 0.24 0.24 0.22 0.22 0.22 0.22 0.22 0.18 0.16 0.14 0.10 0.04 0.06 0.06
#> [31] 0.08 0.12 0.14 0.12 0.16 0.18 0.22 0.22 0.18 0.18 0.16 0.14 0.16 0.22 0.14
#> [46] 0.12 0.12 0.12 0.12 0.10
bw.adapt <- pickMinBW(X, d = 0.16)
plot(X, bw.adapt, bty = "n", main = "Adaptive bandwidth ensuring 15% non-zero weights",
     ylim = c(0, max(bw.adapt)))
wA <- kernelWeights(X, bw = bw.adapt, kernel = "epanechnikov")
rowMeans(wA > 0) - 1/n
#> [1] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [16] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [31] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [46] 0.16 0.16 0.16 0.16 0.16
rX <- rank(X)
wNN <- kernelWeights(rX/max(rX), bw = 0.09, kernel = "epanechnikov")
rowMeans(wNN > 0) - 1/n
#> [1] 0.08 0.10 0.12 0.14 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [16] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [31] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [46] 0.16 0.14 0.12 0.10 0.08g <- function(theta, ...) Y - theta[1] - theta[2]*X
wF <- wF / rowSums(wF)
wP <- wP / rowSums(wP)
wNN <- wNN / rowSums(wNN)
wA <- wA / rowSums(wA)
g1 <- function(theta) smoothEmplik(rho = g, theta = theta, data = NULL, sel.weights = wF, minus = TRUE, type = "EuL")
g2 <- function(theta) smoothEmplik(rho = g, theta = theta, data = NULL, sel.weights = wP, minus = TRUE, type = "EuL")
g3 <- function(theta) smoothEmplik(rho = g, theta = theta, data = NULL, sel.weights = wNN, minus = TRUE, type = "EuL")
g4 <- function(theta) smoothEmplik(rho = g, theta = theta, data = NULL, sel.weights = wA, minus = TRUE, type = "EuL")
th1 <- optim(mod1$coefficients, g1, method = "BFGS", control = list(ndeps = rep(1e-5, 2), reltol = 1e-5))
th2 <- optim(mod1$coefficients, g2, method = "BFGS", control = list(ndeps = rep(1e-5, 2), reltol = 1e-5))
th3 <- optim(mod1$coefficients, g3, method = "BFGS", control = list(ndeps = rep(1e-5, 2), reltol = 1e-5))
th4 <- optim(mod1$coefficients, g4, method = "BFGS", control = list(ndeps = rep(1e-5, 2), reltol = 1e-5))
cbind(WOLS = mod1$coefficients, Fixed = th1$par, PIT = th2$par, NNeighb = th3$par, Adapt = th4$par)
#>                  WOLS     Fixed       PIT   NNeighb     Adapt
#> (Intercept) 1.4469218 1.6509595 1.4592058 1.6426623 0.8895890
#> X           0.5054064 0.4119324 0.5326892 0.3026498 0.6117477
Now repeat the exercise with three chi-squared covariates and a larger sample:
n <- 200
set.seed(1)
X <- matrix(rchisq(n*3, df = 3), ncol = 3)
Y <- 1 + rowSums(X) + (rchisq(n, df = 3) - 3) * (1 + rowSums(X))
mod0 <- lm(Y ~ X)
vhat0 <- kernelSmooth(X, mod0$residuals^2, PIT = TRUE, bw = 0.2,
kernel = "epanechnikov", no.dedup = TRUE)
mod1 <- lm(Y ~ X, weights = 1 / vhat0)
cbind(OLS = mod0$coefficients, WOLS = mod1$coefficients)
#>                   OLS       WOLS
#> (Intercept) 4.1866397 0.16631928
#> X1          0.6251463 0.05875198
#> X2          0.7362969 0.99137344
#> X3          0.2928678 0.75248132
These values are also far away from the truth.
Construct similar smoothing matrices (the rank-based nearest-neighbour scheme is omitted in this multivariate example):
bw0 <- 1
atleast2 <- FALSE
while(!atleast2) {
wF <- kernelWeights(X, bw = bw0, kernel = "epanechnikov")
bw0 <- bw0 * 1.1
atleast2 <- min(rowSums(wF > 0)) > 1
}
rowMeans(wF > 0) - 1/n
#> [1] 0.535 0.390 0.070 0.570 0.070 0.665 0.260 0.645 0.555 0.630 0.555 0.635
#> [13] 0.075 0.685 0.700 0.245 0.665 0.175 0.340 0.225 0.430 0.580 0.595 0.035
#> [25] 0.255 0.360 0.565 0.575 0.430 0.430 0.280 0.330 0.545 0.585 0.525 0.575
#> [37] 0.730 0.145 0.645 0.095 0.280 0.585 0.560 0.600 0.565 0.490 0.640 0.600
#> [49] 0.270 0.545 0.375 0.545 0.055 0.790 0.605 0.685 0.675 0.660 0.675 0.550
#> [61] 0.395 0.025 0.820 0.595 0.295 0.760 0.310 0.565 0.660 0.435 0.465 0.460
#> [73] 0.825 0.365 0.695 0.035 0.580 0.500 0.675 0.585 0.620 0.575 0.115 0.640
#> [85] 0.675 0.180 0.195 0.095 0.035 0.220 0.515 0.570 0.755 0.015 0.555 0.715
#> [97] 0.330 0.635 0.120 0.600 0.680 0.595 0.545 0.650 0.655 0.610 0.590 0.470
#> [109] 0.530 0.665 0.645 0.045 0.670 0.025 0.135 0.680 0.260 0.680 0.650 0.355
#> [121] 0.175 0.445 0.555 0.665 0.115 0.260 0.460 0.680 0.455 0.065 0.495 0.090
#> [133] 0.585 0.625 0.635 0.005 0.010 0.515 0.700 0.215 0.405 0.180 0.030 0.190
#> [145] 0.555 0.300 0.565 0.040 0.520 0.640 0.580 0.640 0.215 0.300 0.640 0.285
#> [157] 0.660 0.485 0.380 0.325 0.710 0.495 0.500 0.480 0.555 0.625 0.345 0.690
#> [169] 0.330 0.615 0.400 0.665 0.185 0.665 0.545 0.325 0.465 0.645 0.555 0.170
#> [181] 0.645 0.065 0.625 0.495 0.620 0.535 0.480 0.130 0.600 0.735 0.060 0.120
#> [193] 0.280 0.535 0.670 0.260 0.550 0.595 0.410 0.310
min(rowSums(wF > 0))
#> [1] 2
Xs <- scale(X)
XP <- apply(Xs, 2, pnorm)
wP <- kernelWeights(XP, bw = bw.rot(XP)*2, kernel = "epanechnikov")
rowMeans(wP > 0) - 1/n
#> [1] 0.140 0.045 0.025 0.155 0.020 0.105 0.035 0.245 0.115 0.165 0.120 0.310
#> [13] 0.065 0.075 0.170 0.040 0.060 0.085 0.065 0.065 0.105 0.115 0.180 0.070
#> [25] 0.095 0.050 0.190 0.145 0.055 0.115 0.105 0.030 0.090 0.145 0.065 0.120
#> [37] 0.120 0.065 0.170 0.055 0.035 0.060 0.155 0.135 0.225 0.160 0.155 0.160
#> [49] 0.080 0.115 0.085 0.185 0.030 0.130 0.135 0.210 0.115 0.245 0.255 0.215
#> [61] 0.105 0.020 0.095 0.240 0.035 0.155 0.045 0.185 0.165 0.100 0.090 0.095
#> [73] 0.040 0.055 0.070 0.065 0.190 0.065 0.100 0.125 0.175 0.200 0.045 0.140
#> [85] 0.080 0.060 0.035 0.035 0.080 0.070 0.175 0.155 0.030 0.025 0.225 0.145
#> [97] 0.065 0.130 0.055 0.130 0.320 0.105 0.190 0.135 0.170 0.155 0.180 0.065
#> [109] 0.060 0.225 0.210 0.020 0.185 0.065 0.015 0.280 0.050 0.210 0.170 0.110
#> [121] 0.040 0.055 0.185 0.195 0.070 0.060 0.060 0.095 0.125 0.025 0.180 0.030
#> [133] 0.195 0.165 0.250 0.035 0.080 0.130 0.155 0.080 0.050 0.085 0.015 0.060
#> [145] 0.150 0.050 0.155 0.025 0.125 0.190 0.225 0.195 0.045 0.060 0.245 0.065
#> [157] 0.285 0.065 0.125 0.045 0.230 0.175 0.170 0.100 0.185 0.180 0.070 0.195
#> [169] 0.085 0.240 0.045 0.175 0.030 0.260 0.110 0.125 0.125 0.105 0.205 0.045
#> [181] 0.055 0.050 0.100 0.105 0.170 0.165 0.050 0.060 0.185 0.085 0.025 0.070
#> [193] 0.065 0.110 0.075 0.090 0.215 0.220 0.075 0.095
min(rowSums(wP > 0))
#> [1] 4
bw.adapt <- pickMinBW(X, d = 0.16)
plot(rowMeans(X), bw.adapt, bty = "n", main = "Adaptive bandwidth ensuring 15% non-zero weights",
     ylim = c(0, max(bw.adapt)))
wA <- kernelWeights(X, bw = bw.adapt, kernel = "epanechnikov")
rowMeans(wA > 0) - 1/n
#> [1] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [16] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [31] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [46] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [61] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [76] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [91] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [106] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [121] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [136] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [151] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [166] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [181] 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16 0.16
#> [196] 0.16 0.16 0.16 0.16 0.16
min(rowSums(wA > 0))
#> [1] 33
g <- function(theta, ...) Y - drop(cbind(1, X) %*% theta)
wF <- wF / rowSums(wF)
wP <- wP / rowSums(wP)
wA <- wA / rowSums(wA)
g1 <- function(theta) smoothEmplik(rho = g, theta = theta, data = NULL, sel.weights = wF,
minus = TRUE, type = "EuL")
g2 <- function(theta) smoothEmplik(rho = g, theta = theta, data = NULL, sel.weights = wP,
minus = TRUE, type = "EuL")
g4 <- function(theta) smoothEmplik(rho = g, theta = theta, data = NULL, sel.weights = wA,
                                   minus = TRUE, type = "EuL")
To save time (roughly 30 seconds of computation), the following fragment is not run; its results are reproduced below.
# th1 <- optim(mod1$coefficients, g1, method = "BFGS", control = list(ndeps = rep(1e-5, 4)))
# th2 <- optim(mod1$coefficients, g2, method = "BFGS", control = list(ndeps = rep(1e-5, 4)))
# th4 <- optim(mod1$coefficients, g4, method = "BFGS", control = list(ndeps = rep(1e-5, 4)))
# round(cbind(True = rep(1, 4), WOLS = mod1$coefficients, Fixed = th1$par, PIT = th2$par, Adapt = th4$par), 3)
#              True  WOLS Fixed    PIT Adapt
# (Intercept)     1 0.166 2.755  3.668 0.110
# X1              1 0.059 0.132  0.241 0.790
# X2              1 0.991 1.036  0.694 0.929
# X3              1 0.752 0.625 -0.161 0.863
This simulation emphasises the importance of careful bandwidth choice for efficient estimation via smoothing of the empirical or Euclidean likelihood. Nearest-neighbour methods work well, and adaptive-bandwidth methods are comparable to them while also enjoying attractive theoretical properties such as integrability.