In contrast to logistic regression in log-binomial regression, separation is not the only factor which decides if the maximum likelihood estimate (MLE) has infinite components. As we will see in the example below, for the log-binomial regression model (LBRM) there exist data configurations which are separated but the MLE still exists.
The MLE of the LBRM can be obtained by solving the following optimization problem. \[ \begin{equation} \begin{array}{rl} \underset{\beta}{\textrm{maximize}} & \ell(\beta) = \displaystyle\sum_{i = 1}^n y_i ~ X_{i*} \beta + (1 - y_i) ~ \log(1 - \exp(X_{i*} \beta)) \ \ \ \ \textrm{s.t.} & X \beta \leq 0. \end{array} \end{equation} \]
From the optimization problem we can already guess that the different behavior with regard to the existence of the MLE is caused by the linear constraint \(X \beta \leq 0\).
Let \(X^0\) be the submatrix of \(X\) obtained by keeping only the rows \(I^0 = \{i|y_i = 0\}\) and \(X^1\) the submatrix obtained by keeping only the rows \(I^1 = \{i|y_i = 1\}\). Schwendinger, Grün, and Hornik (2021) pointed out that the finiteness of the MLE can be checked by solving the following linear optimization problem.
\[
\begin{equation}
\begin{array}{rll}
  \underset{\beta}{\text{maximize}}~~  &
    - \sum_{i \in I^0} X_{i*} \beta \\
  \text{subject to}~~  &
    X^0 \beta \leq 0 \\
  & X^1 \beta = 0.
\end{array}
\end{equation}
\] The MLE has only finite components if the solution of this
linear program is a zero vector. If the MLE contains infinite
components, the linear programming problem is unbounded. The function
detect_infinite_estimates() from the
detectseparation implements the LP problem described
above and can therefore be used to detect infinite components in the MLE
of the LBRM.
library("detectseparation")To show the different effect of separation on the logistic regression model compared to the LBRM consider the following data.
data <- data.frame(a = c(1, 0, 3, 2, 3, 4),
                   b = c(2, 1, 1, 4, 6, 8),
                   y = c(0, 0, 0, 1, 1, 1))Clearly the data is separated, which can be verified by using the
detect_separation method (a detailed explanation of the
output can be found in Section 3).
glm(y ~ a + b, data = data, family = binomial("logit"), method = "detect_separation")
#> Implementation: ROI | Solver: lpsolve 
#> Separation: TRUE 
#> Existence of maximum likelihood estimates
#> (Intercept)           a           b 
#>        -Inf        -Inf         Inf 
#> 0: finite value, Inf: infinity, -Inf: -infinitySince separation is a property of the data, checking for separation gives the same result for the logistic regression and the LBRM.
glm(y ~ a + b, data = data, family = binomial("log"), method = "detect_separation")
#> Data separation in log-binomial models does not necessarily result in infinite estimates
#> Implementation: ROI | Solver: lpsolve 
#> Separation: TRUE 
#> Existence of maximum likelihood estimates
#> (Intercept)           a           b 
#>           0           0           0 
#> 0: finite value, Inf: infinity, -Inf: -infinityFor logistic regression separation is necessary and sufficient that the MLE contains infinite components.
glm(y ~ a + b, data = data, family = binomial("logit"), method = "detect_infinite_estimates")
#> Implementation: ROI | Solver: lpsolve 
#> Infinite estimates: TRUE 
#> Existence of maximum likelihood estimates
#> (Intercept)           a           b 
#>        -Inf        -Inf         Inf 
#> 0: finite value, Inf: infinity, -Inf: -infinityHowever, due to the linear constraint of the LBRM, there exists data allocations where the MLE does exist (i.e., has only finite components), despite the fact that the data is separated.
glm(y ~ a + b, data = data, family = binomial("log"), method = "detect_infinite_estimates")
#> Implementation: ROI | Solver: lpsolve 
#> Infinite estimates: FALSE 
#> Existence of maximum likelihood estimates
#> (Intercept)           a           b 
#>           0           0           0 
#> 0: finite value, Inf: infinity, -Inf: -infinityUsing glm to solve this problem we get the following
error message.
fit <- try(glm(y ~ a + b, data = data, family = binomial("log")))
#> Error : no valid set of coefficients has been found: please supply starting valuesThe error message means that we should provide starting values, a simple but reliable approach is to use \((-1, 0, ..., 0)\) as starting value.
Since in this example one of the constraints \(X_{i*} \beta \leq 0\) is by design binding,
the iteratively re-weighted least squares (IRLS) method used by the
glm function has convergence problems. The glm
function informs us about the convergence problems by issuing some
warnings. The warnings
#> Warning: step size truncated: out of bounds
#> Warning: glm.fit: algorithm stopped at boundary valuetell us that IRLS is not the best option for optimization problems with binding constraints. The warning
#> Warning: glm.fit: algorithm did not convergetell us that the algorithm did not converge. Practically in most cases this just means the default value for the maximum number of iterations should be increased.
args(glm.control)
#> function (epsilon = 1e-08, maxit = 25, trace = FALSE) 
#> NULLSince the default value for the maximum number of iterations is quite
low (maxit = 25). However, maxit = 25 is
typically high enough for unbounded optimization problems (almost all
models supported by glm), but is often to low for the LBRM.
For most data sets setting maxit = 10000 when estimating
LBRMs is high enough.
formula <- y ~ a + b
start <- c(-1, double(ncol(model.matrix(formula, data = data)) - 1L))
ctrl = glm.control(epsilon = 1e-8, maxit = 10000, trace = FALSE)
suppressWarnings(
  fit <- glm(formula, data = data, family = binomial("log"), start = start, control = ctrl)
)
summary(fit)
#> 
#> Call:
#> glm(formula = formula, family = binomial("log"), data = data, 
#>     start = start, control = ctrl)
#> 
#> Deviance Residuals: 
#>        1         2         3         4         5         6  
#> -0.82961  -0.83830  -0.40220   1.28265   0.90697   0.00003  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)
#> (Intercept)  -1.6452     1.0333  -1.592    0.111
#> a            -0.4462     1.2580  -0.355    0.723
#> b             0.4287     0.6421   0.668    0.504
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 8.3178  on 5  degrees of freedom
#> Residual deviance: 4.0205  on 3  degrees of freedom
#> AIC: 10.021
#> 
#> Number of Fisher Scoring iterations: 29We can verify that one of the constraints \(X_{i*} \beta \leq 0\) is binding (i.e., \(X_{i*} \beta = 0\) for at least one \(i\)) by multiplying the coefficients with the model matrix.
print(mm <- drop(model.matrix(formula, data) %*% coef(fit)))
#>             1             2             3             4             5 
#> -1.233885e+00 -1.216457e+00 -2.554908e+00 -8.225899e-01 -4.112950e-01 
#>             6 
#> -5.010219e-10
abs(drop(mm)) < 1e-6
#>     1     2     3     4     5     6 
#> FALSE FALSE FALSE FALSE FALSE  TRUEglm(y ~ a + b, data = data, family = binomial("logit"), method = "detect_separation")
#> Implementation: ROI | Solver: lpsolve 
#> Separation: TRUE 
#> Existence of maximum likelihood estimates
#> (Intercept)           a           b 
#>        -Inf        -Inf         Inf 
#> 0: finite value, Inf: infinity, -Inf: -infinityThe output above provides much information:
Separation: TRUE indicates that the data is
separated.Inf and -Inf indicate
that the underlying optimization problem is unbounded and therefore the
MLE does not exist for the logistic regression model.glm(y ~ a + b, data = data, family = binomial("log"), method = "detect_infinite_estimates")
#> Implementation: ROI | Solver: lpsolve 
#> Infinite estimates: FALSE 
#> Existence of maximum likelihood estimates
#> (Intercept)           a           b 
#>           0           0           0 
#> 0: finite value, Inf: infinity, -Inf: -infinityThe output above provides much information:
Infinite estimates: FALSE indicates that the
MLE has only finite components (the MLE exists).0 which again indicates that
the MLE has only finite components and therefore underlying optimization
problem is bounded.For the LBRM the glm function often requires the users
to specify starting values. Valid starting values have to reside in the
interior of the feasible region (\(X \beta
< 0\)). There have been different methods suggested for
finding valid starting values.
If an intercept is present in the estimation
(-1, 0, ..., 0) will always provide valid starting
values.
find_start_simple <- function(formula, data) {
  c(-1, double(ncol(model.matrix(formula, data = data)) - 1L))  
}
find_start_simple(formula, data)
#> [1] -1  0  0
max(model.matrix(formula, data = data) %*% find_start_simple(formula, data))
#> [1] -1Andrade and Leon Andrade (2018) suggest a hot start method, by using the modified estimation result of a Poisson model with log link as starting values. Again this method only works if an intercept is used.
find_start_poisson <- function(formula, data, delta = 1) {
  b0 <- coef(glm(formula, data, family = poisson(link = "log")))
  mX <- -model.matrix(formula, data = data)[, -1L, drop = FALSE]
  b0[1] <- min(mX %*% b0[-1]) - delta
  b0
}
find_start_poisson(formula, data)
#> (Intercept)           a           b 
#>  -3.5447461  -0.5364793   0.5863329
max(model.matrix(formula, data = data) %*% find_start_poisson(formula, data))
#> [1] -1One can also solve an LP to find valid start values or think of other
strategies. However, for the benchmark examples reported in Schwendinger, Grün, and Hornik (2021) we found no conclusive evidence
that one of these initialization methods outperforms the others.
Therefore, my personal favorite is the simple approach
(-1, 0, ..., 0).