---
title: "Optimization methods"
author: "Jorge Cabral"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 4
link-citations: yes
bibliography: references.bib
csl: american-medical-association-brackets.csl
description: |
  Different optimization options.
vignette: >
  %\VignetteIndexEntry{Optimization methods}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Optimization
In the ["Generalized Maximum Entropy framework"](V2_GME_framework.html)
vignette, a solution for the constrained optimization problem was
proposed to compute $\widehat{\boldsymbol{\beta}}^{GME}$. A more
efficient alternative is to solve the unconstrained dual formulation of
the problem, which is a function of $\hat{\boldsymbol\lambda}$ only. The
resulting function, $\mathcal{M}(\boldsymbol{\lambda})$, is referred to
as the minimal value function [@Golan1996].\
Let
\begin{equation}
\Omega_k \left( \boldsymbol{\lambda}\right) = \sum_{m=1}^M \exp\left(-z_{km}\sum_{n=1}^N \lambda_n x_{nk}\right)
\end{equation} and \begin{equation}
\Psi_n \left( \boldsymbol{\lambda}\right) = \sum_{j=1}^J \exp\left(-\lambda_n v_{nj}\right).
\end{equation}
The primal–dual relationship is
\begin{equation}
\underset{\mathbf{p},\mathbf{w}}{\operatorname{argmax}}
\left\{-\mathbf{p}' \ln \mathbf{p} - \mathbf{w}' \ln \mathbf{w} \right\} = \underset{\boldsymbol{\lambda}}{\operatorname{argmin}}
\left\{ \mathbf{y}'\boldsymbol{\lambda} + \sum_{k=1}^{K+1}\log\left( \Omega_k(\boldsymbol{\lambda})\right) + \sum_{n=1}^{N}\log\left( \Psi_n(\boldsymbol{\lambda})\right)\right\} \equiv \underset{\boldsymbol{\lambda}}{\operatorname{argmin}}\; \mathcal{M}(\boldsymbol{\lambda}).
\end{equation}
The function $\mathcal{M}(\boldsymbol{\lambda})$ may be interpreted as a
constrained expected log-likelihood function. Estimation is considerably
simplified by this dual version. The analytical gradient of the dual
problem is
\begin{equation}
\nabla_{\boldsymbol{\lambda}}\mathcal{M} = \mathbf{y}-\mathbf{XZp} - \mathbf{Vw}.
\end{equation}
The minimizer $\boldsymbol{\hat\lambda}$ is then used to compute
\begin{equation}
\hat p_{km} = \frac{\exp\left(-z_{km}\sum_{n=1}^N \hat\lambda_n x_{nk}\right)}{\Omega_k(\boldsymbol{\hat\lambda})}
\end{equation}
and
\begin{equation}
\hat w_{nj} = \frac{\exp\left(-\hat\lambda_n v_{nj}\right)}{\Psi_n(\boldsymbol{\hat\lambda})}.
\end{equation}
$\widehat{\boldsymbol{\beta}}^{GME}$ and $\boldsymbol{\hat \epsilon}$ are given by
\begin{equation}
\widehat{\boldsymbol{\beta}}^{GME}=\mathbf{Z\hat p}
\end{equation}
and
\begin{equation}
\boldsymbol{\hat \epsilon}=\mathbf{V\hat w}.
\end{equation}
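
The dual approach above can be sketched in a few lines of base R. The
code below is a minimal, self-contained illustration on simulated data,
not GCEstim's internal implementation; all object names (`M.fn`, `M.gr`,
etc.) and the choice of support points are for illustration only.

```{r,echo=TRUE,eval=FALSE}
set.seed(123)
N <- 50; K <- 2; M <- 5; J <- 3
X <- cbind(1, rnorm(N))                            # design matrix with intercept
y <- as.vector(X %*% c(1, 3) + rnorm(N))

z <- seq(-50, 50, length.out = M)                  # signal support points (same for all k)
v <- 3 * sd(y) * seq(-1, 1, length.out = J)        # noise support (3-sigma rule)

# Minimal value function: y'lambda + sum_k log(Omega_k) + sum_n log(Psi_n)
M.fn <- function(lambda) {
  eta <- as.vector(crossprod(X, lambda))           # eta_k = sum_n lambda_n x_nk
  Omega <- rowSums(exp(-outer(eta, z)))            # Omega_k, k = 1..K
  Psi <- rowSums(exp(-outer(lambda, v)))           # Psi_n, n = 1..N
  sum(y * lambda) + sum(log(Omega)) + sum(log(Psi))
}

# Analytical gradient: y - XZp - Vw
M.gr <- function(lambda) {
  eta <- as.vector(crossprod(X, lambda))
  p <- exp(-outer(eta, z)); p <- p / rowSums(p)    # K x M probabilities
  w <- exp(-outer(lambda, v)); w <- w / rowSums(w) # N x J probabilities
  as.vector(y - X %*% (p %*% z) - w %*% v)
}

fit <- optim(rep(0, N), M.fn, M.gr, method = "BFGS")
eta <- as.vector(crossprod(X, fit$par))
p.hat <- exp(-outer(eta, z)); p.hat <- p.hat / rowSums(p.hat)
beta.gme <- as.vector(p.hat %*% z)                 # beta = Zp; with wide supports,
                                                   # close to the OLS estimates
```

Minimizing $\mathcal{M}(\boldsymbol{\lambda})$ over $N$ variables,
instead of maximizing the entropy over $(K+1) \times M + N \times J$
probabilities under equality constraints, is what makes the dual
formulation more efficient.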

## Examples
Different numerical optimization options will be explained using
`dataGCE` (see ["Generalized Maximum Entropy
framework"](V2_GME_framework.html#Examples)).
```{r,echo=FALSE,eval=TRUE}
load("GCEstim_Optim.RData")
```
```{r,echo=TRUE,eval=TRUE}
library(GCEstim)
```
```{r,echo=FALSE,eval=TRUE}
coef.dataGCE <- c(1, 0, 0, 3, 6, 9)
```
Assume that $z_k^{upper}=50$, $k\in\left\lbrace 0,\dots,5\right\rbrace$,
is a "wide upper bound" for the signal support space, and set $M=5$ and
$J=3$.\
The primal version of the optimization problem previously defined can be
solved with one of the following optimization methods:

- [Rsolnp package](https://cran.r-project.org/package=Rsolnp) [@Ye1987; @Ghalanos2015]
  - `primal.solnp`, using Sequential Quadratic Programming (SQP);
- [NlcOptim package](https://CRAN.R-project.org/package=NlcOptim) [@Chen2019]
  - `primal.solnl`, using the augmented Lagrange multiplier method
    with an SQP interior algorithm.
```{r,echo=TRUE,eval=FALSE}
res.lmgce.50.primal.solnp <-
GCEstim::lmgce(
y ~ .,
data = dataGCE,
support.signal = c(-50, 50),
twosteps.n = 0,
method = "primal.solnp"
)
res.lmgce.50.primal.solnl <-
GCEstim::lmgce(
y ~ .,
data = dataGCE,
support.signal = c(-50, 50),
twosteps.n = 0,
method = "primal.solnl"
)
```
Both methods converged, which can be confirmed by the $0$ returned by
```{r,echo=TRUE,eval=TRUE}
res.lmgce.50.primal.solnp$convergence
res.lmgce.50.primal.solnl$convergence
```
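When a method fails to converge, one simple pattern is to refit with an
alternative optimizer. The following is an illustrative usage sketch,
not a built-in GCEstim feature:

```{r,echo=TRUE,eval=FALSE}
fit <- GCEstim::lmgce(
  y ~ .,
  data = dataGCE,
  support.signal = c(-50, 50),
  twosteps.n = 0,
  method = "primal.solnp"
)
# Fall back to the default dual.BFGS if convergence was not achieved
if (fit$convergence != 0) {
  fit <- GCEstim::lmgce(
    y ~ .,
    data = dataGCE,
    support.signal = c(-50, 50),
    twosteps.n = 0,
    method = "dual.BFGS"
  )
}
```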
The dual version of the optimization problem can be solved with one of
the following optimization methods:

- [optim
  package](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/optim.html)
  - `dual.BFGS`, **default**, using the Broyden-Fletcher-Goldfarb-Shanno
    (BFGS) quasi-Newton method [@Broyden1970; @Fletcher1970; @Goldfarb1970; @Shanno1970];
  - `dual.CG`, using a conjugate gradient method [@Fletcher1964];
  - `dual.L-BFGS-B`, using box-constrained optimization with a
    limited-memory modification of the BFGS quasi-Newton method [@Byrd1995];
- [optimx package](https://CRAN.R-project.org/package=optimx) [@Nash2011]
  - `dual.Rcgmin`, using conjugate gradient minimization of nonlinear
    functions with box constraints and the Dai/Yuan update [@Dai2001]
    (see [Rcgmin](https://rdrr.io/rforge/Rcgmin/));
  - `dual.bobyqa`, using derivative-free optimization by quadratic
    approximation [@Powell2009] (see
    [minqa](https://CRAN.R-project.org/package=minqa) [@Bates2024]);
  - `dual.newuoa`, using derivative-free optimization by quadratic
    approximation [@Powell2009] (see
    [minqa](https://CRAN.R-project.org/package=minqa) [@Bates2024]);
  - `dual.nlminb`, using unconstrained and box-constrained optimization
    via PORT routines [@Gay1990] (see
    [nlminb](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/nlminb.html));
  - `dual.nlm`, using a Newton-type algorithm [@Dennis1983; @Schnabel1985] (see
    [nlm](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/nlm.html));
- [lbfgs package](https://CRAN.R-project.org/package=lbfgs) [@Coppola2022]
  - `dual.lbfgs`, using the limited-memory
    Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm;
- [lbfgsb3c package](https://CRAN.R-project.org/package=lbfgsb3c) [@Fidler2024]
  - `dual.lbfgsb3c`, using L-BFGS-B implemented in Fortran with an Rcpp
    interface.
```{r,echo=TRUE,eval=FALSE}
res.lmgce.50.dual.BFGS <-
GCEstim::lmgce(
y ~ .,
data = dataGCE,
support.signal = c(-50, 50),
twosteps.n = 0,
method = "dual.BFGS"
)
res.lmgce.50.dual.CG <-
GCEstim::lmgce(
y ~ .,
data = dataGCE,
support.signal = c(-50, 50),
twosteps.n = 0,
method = "dual.CG"
)
```
Any error or warning produced by the optimization procedure results in a
returned value of $1$. Note that `dual.BFGS` converged while `dual.CG`
did not. Conjugate gradient methods are generally more fragile but may
succeed in much larger optimization problems.
```{r,echo=TRUE,eval=TRUE}
res.lmgce.50.dual.BFGS$convergence
res.lmgce.50.dual.CG$convergence
```
The values of $\boldsymbol{\hat\lambda}$, $\hat {\mathbf{p}}$ and
$\hat {\mathbf{w}}$ for `dual.BFGS` are, respectively,
```{r,echo=TRUE,eval=TRUE}
res.lmgce.50.dual.BFGS$lambda
```
```{r,echo=TRUE,eval=TRUE}
res.lmgce.50.dual.BFGS$p
```
```{r,echo=TRUE,eval=TRUE}
res.lmgce.50.dual.BFGS$w
```
The following table compares the efficiency of all the optimization
methods available. Note that, in the current version, the default
optimization parameters of each dependency package are used.
```{r,echo=TRUE,eval=FALSE}
method.opt <-
c(
"primal.solnl",
"primal.solnp",
"dual.BFGS",
"dual.CG",
"dual.L-BFGS-B",
"dual.Rcgmin",
"dual.bobyqa",
"dual.newuoa",
"dual.nlminb",
"dual.nlm",
"dual.lbfgs",
"dual.lbfgsb3c"
)
compare_methods <- data.frame(
method = method.opt,
time = NA,
r.squared = NA,
error.measure = NA,
error.measure.cv.mean = NA,
beta.error = NA,
nep = NA,
nep.cv.mean = NA,
convergence = NA)
```
```{r,echo=TRUE,eval=FALSE}
for (i in seq_along(method.opt)) {
start.time <- Sys.time()
res.method <-
lmgce(
y ~ .,
data = dataGCE,
support.signal = c(-50,50),
twosteps.n = 0,
method = method.opt[i]
)
compare_methods$time[i] <- difftime(Sys.time(),
start.time,
units = "secs")
compare_methods$r.squared[i] <- summary(res.method)$r.squared
compare_methods$error.measure[i] <- res.method$error.measure
compare_methods$error.measure.cv.mean[i] <- res.method$error.measure.cv.mean
compare_methods$beta.error[i] <- accmeasure(coef(res.method), coef.dataGCE)
compare_methods$nep[i] <- res.method$nep
compare_methods$nep.cv.mean[i] <- res.method$nep.cv.mean
compare_methods$convergence[i] <- res.method$convergence
}
compare_methods_ordered <- compare_methods[order(compare_methods$time),]
compare_methods_ordered$convergence <- factor(compare_methods_ordered$convergence,
levels = c(0,1),
labels = c(TRUE, FALSE))
```
```{r, echo=FALSE,eval=TRUE,results = 'asis'}
kableExtra::kable(
compare_methods_ordered[, c(1,2,4,5,6,9)],
digits = 3,
align = "rcccc",
col.names = c("optimization method",
"time (s)",
"$RMSE_{\\widehat{y}}$",
"$CV \\text{-} RMSE_{\\widehat{y}}$",
"$RMSE_{\\widehat{\\beta}}$",
"Convergence"),
row.names = FALSE,
booktabs = F)
```

## References
::: {#refs}
:::

## Acknowledgements
This work was supported by Fundação para a Ciência e Tecnologia (FCT)
through CIDMA and projects
and .