\documentclass[11pt]{article}


% \VignetteEngine{knitr::knitr}
%\VignetteIndexEntry{The nieve package: Yet Another Extreme Value package?}
%\VignetteEncoding{UTF-8}

%% ===========================================================================
%% Copyright Yves Deville <deville.yves@alpestat.com> 2022
%% ===========================================================================

\usepackage{amsmath,amssymb,amsthm}
\usepackage[english]{babel}
\usepackage{hyperref}
\usepackage[T1]{fontenc} 
\usepackage{fullpage}
\usepackage{natbib}
\setcitestyle{author}
\usepackage{hyperref}

\definecolor{MonVert}{rgb}{0.398,0.801,0.000} 
\definecolor{MonVertF}{rgb}{0.13,0.54,0.13}
\definecolor{MonBleu}{rgb}{0.000,0.602,0.801} 
\definecolor{SteelBlue2}{rgb}{0.359375,0.671875,0.9296875}
\definecolor{orange}{rgb}{1.0,0.6470,0.0}
\definecolor{SteelBlue4}{rgb}{0.212, 0.392, 0.545}
\definecolor{orange1}{rgb}{0.996,0.645,0}
\newcommand{\m}{\mathbf}   
\newcommand{\bs}{\boldsymbol}
\newcommand{\pkg}[1]{\textbf{#1}}
\newcommand{\code}[1]{\texttt{#1}}
\newcommand{\Gre}[1]{{\color{MonVertF}#1}}

\title{\bf \Large \textbf{nieve}: Yet Another Extreme Value package?}

\author{Yves Deville \href{mailto:deville.yves@alpestat.com}%%
  {deville.yves@alpestat.com} }

%% \date{}

\begin{document}
\maketitle{}
\tableofcontents{}

\section{Probability functions of Extreme-Value distributions}
  
The probability functions for the Generalized Pareto (GP) and
Generalized Extreme Value (GEV) distributions \citep{Coles_ISMEV} are
of ubiquitous use in Extreme Value (EV) analysis. These functions
depend smoothly on the parameters: they are infinitely differentiable
functions of the parameters. However, these functions are not
\textit{analytic} functions of the parameters and a singularity exists
for all of them when the shape parameter, say $\xi$, is zero.  In
practice, the functions are given with different formulas depending on
whether the shape parameter~$\xi$ is zero or not; the formulas for
$\xi = 0$ relate to the exponential and Gumbel distributions and
correspond to the limit for $\xi \to 0$ of the functions given by the
formulas for $\xi \neq 0$.  As an example, consider the quantile
function of the Generalized Pareto distribution with shape $\xi$ and
unit scale
\begin{equation}
\label{eq:Quant}
q(p) = \begin{cases}
  [(1 - p)^{-\xi} - 1] / \xi & \xi \neq 0 \\
  -\log(1 - p) & \xi = 0,
\end{cases} \qquad 0 < p < 1.
\end{equation}
It can be shown that for $\xi \approx 0$ whatever be $p$
$$
q \approx - \log(1-p), \qquad
\frac{\partial q}{\partial \xi} \approx \frac{1}{2}\, \log^2(1-p), \qquad
\frac{\partial^2 q}{\partial \xi^2} \approx -\frac{1}{3}\, \log^3(1-p).
$$
It is quite easy to obtain expressions for the derivatives
w.r.t. $\xi$ using the definition (\ref{eq:Quant}). We can even rely
on the symbolic differentiation method \verb@D@ available in R which,
as opposed to me and many other humans, never makes any mistake when
differentiating.

<<deriv, size= "footnotesize">>=
qEx <- function(p, xi) ((1 - p)^(-xi) - 1) / xi
dqEx <- D(expression(((1 - p)^(-xi) - 1) / xi), name = "xi")
d2qEx <- D(dqEx, name = "xi")
p <- 0.99; pBar <- 1 - p
xis <- c(1e-4, 1e-5, 1e-6, 1e-7, 1e-8, 1e-9)
for (xi in xis) {
    r <- rbind("ord 0" = c("lim" = -log(pBar), "der" = qEx(p = p, xi = xi)),
               "ord 1" = c("lim" = log(pBar)^2 / 2, "der" = eval(dqEx, list(p = p, xi = xi))),
               "ord 2" = c("lim" = -log(pBar)^3 / 3, "der" = eval(d2qEx, list(x = p, xi = xi))))
    cat("xi = ", xi, "\n")
    print(r)
}
@ 

\noindent
We see that the formula for the function works fine. However, the
formula for the 2-nd order derivative can be completely wrong when
$\xi$ is about $\text{1e-6}$ and the formula for the 1-st order
derivative can also be wrong when $\xi$ is about
$\text{1e-8}$. Although evaluated at a reasonably small value of
$\xi$, the derivative given in the column \code{der} differs much from
its limit in column~\code{lim}. The reason is that the formulas for
the derivatives involve difference and/or fractions or small
quantities because~$\xi$ or $\xi^2$ comes at the denominator. As a
general rule, the derivatives with higher order are more difficult to
evaluate numerically, since they involve more complex expressions.
Note that using a shape $\xi$ with $|\xi| \leqslant \text{1e-6}$ is
quite common in EV analysis because the values of $\xi$ used in
practice are often quite small, and moreover very small values of
$\xi$ are often used in the initialization of the Maximum-Likelihood
(ML) optimization.

Let us now see what \pkg{nieve} tells about the derivatives. These can
be found as attributes of the result returned by the \verb@qGPD2@
function, with names \verb@"gradient"@ and \verb@"hessian"@. If the
formal argument \code{p} is a vector with length \code{n}, the
attributes are arrays with dimension \code{c(n, p)} and \code{c(n, p,
  p)} where \code{p} stands for the number of parameters, here \code{p
  = 2}. The arrays of derivatives have suitable \verb@dimnames@ hence
can be indexed with characters as in \verb@H[1 , "scale", "shape"]@ if
\verb@H@ is the Hessian found in the corresponding attribute.

<<deriv2, size= "footnotesize">>=
library(nieve)
for (xi in xis) {
    x <- qGPD2(p, shape = xi, deriv = TRUE, hessian = TRUE)
    r <- rbind("ord 0" = c("lim" = -log(pBar), "der" = x),
               "ord 1" =  c("lim" = log(pBar)^2 / 2, "der" = attr(x, "gradient")[1, "shape"]),
               "ord 2" = c("lim" = -log(pBar)^3 / 3, "der" = attr(x, "hessian")[1, "shape", "shape"]))
    cat("xi = ", xi, "\n")
    print(r) 
}
@
\noindent
So, no more major departures from the theory can be seen. 

Although not yet widespread, the use of the exact formulas for the
derivatives w.r.t. the parameters can be of great help in the
optimization tasks required in EV analysis. These tasks of course
involve the ML estimation, but also profile likelihood inference for
models with covariates. Differential equations methods can be also
used to derive confidence intervals. Note that the use of formulas for
the derivatives is called \textit{symbolic} differentiation and
differs from \textit{automatic} differentiation as increasingly
available. However, from the previous analysis it transpires that the
cure might be worse than the disease.  Unless the derivatives are
evaluated with great care, they can merely ruin an optimization in
which they are involved, instead of improving it.

Our strategy consists in fixing a small $\epsilon >0$ and use the
formulas for $\xi \neq 0$ only when $|\xi| > \epsilon$. When
$|\xi| \leqslant \epsilon$, we use a few terms from the Taylor series
at $\xi =0$ e.g.,
$$
q(p;\,\xi) \approx q(p;\, 0) +
\left. \partial_{\xi} q(p;\, \xi)\right|_{\xi = 0} \times \xi
+ \frac{1}{2} \,
\left.\partial^2_{\xi,\xi} q(p;\, \xi)\right|_{\xi = 0}
\times \xi^2 + o(\xi^2).
$$
In order to maintain the consistency between the derivatives, it seems
good to use the same $\epsilon$ for all the derivatives and use
consistent Taylor approximations, so use the order $1$ for the
derivative~$\partial_{\xi} q$ and the order order~$0$
for~$\partial^2_{\xi,\xi} q$ or for a crossed derivative such
as~$\partial^2_{\sigma,\xi} q$. Since the $2$-nd order derivatives can
be required, we must take a value for $\epsilon$ which is not too
small: $\text{1e-4}$ or $\text{1e-5}$, not much smaller. Note
that~$\epsilon$ gives the level of error for the $2$-nd order
derivative; since the error on the function is~$O(\xi^3)$, using
$\epsilon = \text{1e-4}$ leads to an error of order~$\text{1e-12}$ on
the function, which seems acceptable in practice. This kind of
approximation is used in some codes of the \textbf{revdbayes} package
by Paul Northrop, see the code if the \code{dgev} and \code{pgev}
functions on
\href{https://github.com/paulnorthrop/revdbayes/blob/master/R/distributions.R}{GitHub
  repos}.

\section{Deriving the formulas}

The technical reports provided with \textbf{nieve}:
\href{https://github.com/yvesdeville/nieve/blob/main/inst/computing/GEV.pdf}{GEV.pdf},
\href{https://github.com/yvesdeville/nieve/blob/main/inst/computing/GPD2.pdf}{GPD2.pdf},
\href{https://github.com/yvesdeville/nieve/blob/main/inst/computing/PoisGP2PP.pdf}{PoisGP2PP.pdf}
and
\href{https://github.com/yvesdeville/nieve/blob/main/inst/computing/PP2PoisGP.pdf}{PP2PoisGP.pdf}
give the exact expressions for the first-order and the second-order
derivatives of the probability functions w.r.t. the parameters and
also provide workable approximations for the case $\xi \approx 0$. We
used the \href{https://maxima.sourceforge.io/}{\pkg{Maxima} Computer Algebra
  System}~\citep{Maxima} along with the
\href{https://maxima.sourceforge.io/contrib/maxiplot/maxiplot.sty}{\pkg{maxiplot}}
package for \LaTeX{}. The technical reports follow the following rules.

\begin{itemize}
\item The {\color{MonVertF} \bf raw expressions given by Maxima are
    reported in green}. The expressions can be regarded as exact, not
  being influenced by manual computations. However these formulas are
  usually difficult to use in a compiled code and require some manual
  transformation for this aim.

\item The {\color{red} \bf simplified expressions are reported in
    red}.  These expressions are derived by us from the raw
  expressions; they are influenced by manual computations hence could
  in principle contain errors, although they have been carefully
  checked. These formulas are used to write the compiled code.  They
  often use auxiliary variables that are shared across several
  formulas.
  
\end{itemize}

\section{Testing the derivatives}

The \textbf{nieve} package comes with a series of tests in the format
of the \textbf{testthat} package. The \textbf{numDeriv} package is
used to compute the derivatives by numeric differentiation up to the
order~2; these derivatives are compared to those provided by the
formulas.

A quite difficult task when checking derivatives is to give a
threshold used to decide if the difference between the numeric
derivative and the symbolic derivative, say the ``error'', is
acceptable or not. This error has two sources: one is the numerical
evaluation of the symbolic derivative or of its approximation for
$\xi \approx 0$ (see example above), and the other is the
approximation used in numeric differentiation where the limit defining
the derivative is replaced by a finite difference.  We should use
small values of $\xi$ with $|\xi| < \epsilon$ to check that the
approximation for small~$\xi$ is correct, although we can only test
the approximation at the first order by doing so. Then, with a good
choice of $\epsilon$ the error should be mainly due to the numeric
differentiation. But when the true derivative is small, the relative
error may be large (think of a true derivative which is exactly zero).
On the other hand, when a derivative is large in absolute value, the
absolute error may also be quite large.  Mind that the derivatives can
in practice be very small or very large, and also that a gradient
vector or a Hessian matrix often contain values that are not of the
same order of magnitude.

We check that either the \textit{absolute} error or the
\textit{relative} error is small. The idea is that none of these two
things can come by chance, and if one holds, the formula used must be
good even if the other criterion suggests an opposed conclusion. The
test is made \textit{elementwise}, meaning that the relative error is
computed for each element of a gradient vector or Hessian matrix
ignoring the other elements.

\section{Parameterizations for Peaks Over Threshold models}

Peaks Over Threshold (POT) models are very popular in EV
analysis. These models relate to marked Poisson Process: at each time
$T_i$ in a sequence or random times $T_1 < T_2 < \dots$ we observe a
random variable $Y_i$ called the \textit{mark}. In applications, the
time $T_i$ is typically for the occurrence a storm and the mark $Y_i$
can be an amount of precipitation or a sea level.  The following two
frameworks are commonly used, see~\cite{NorthropEtAl_NSExtremes} for
more details.

\begin{itemize}
\item The \textit{Poisson-GP} framework involves a given threshold $u$
  having the same physical dimension as the marks $Y_i$. It assumes
  that the $T_i$ form an homogeneous Poisson Process with rate
  $\lambda_u$, and that $Y_i$ are i.i.d. with a Generalized Pareto
  distribution $\text{GPD}(u, \, \sigma_u, \,\xi)$ or equivalently
  that the so-called \textit{excesses} over the threshold $Y_i -u$
  follow the two-parameter GP distribution with scale $\sigma >0$ and
  shape $\xi$. The marks $Y_i$ are further assumed to be independent
  of the event process $\{T_i\}_i$. The parameters form the vector
  $\bs{\theta}_u=[\lambda_u,\, \sigma_u,\, \xi]$.

  \begin{figure}
    \centering
    \label{BlockMaxima}
    \includegraphics[width=0.8\textwidth]{images/POTAggreg.pdf}
    \caption{\sf \small Block maxima by aggregation of the Poisson-GP.}
  \end{figure}
  
\item An alternative framework is often named \textit{Point Process}
  (PP) or \textit{Non-Homogeneous Poisson Process} (NHPP). It involves
  a reference duration $w>0$, usually chosen to be one year, and a
  vector of parameters
  $\bs{\theta}^\star := [\mu_w^\star,\,\sigma_w^\star,\, \xi^\star]$.
  The random observations $[T_i,\, Y_i]$ are given by a Poisson
  process on the $(t,\,y)$-plane with intensity
  $$
  \gamma^\star_w(t,\, y) :=
  \frac{1}{w} \times \frac{1}{\sigma^\star_w}\, \left[1 +
    \xi^\star \, \frac{y- \mu^\star_w}{\sigma^\star_w} \right]^{-1 / \xi^\star -1} \,
  1_{\mathcal{S}_{\bs{\theta}^\star_w}}(t,\, y),
  $$
  where $\mathcal{S}_{\bs{\theta}^\star_w}$ is the domain of the plane
  $$
  \mathcal{S}_{\bs{\theta}_w^\star} := \left\{ [t,\, y]: \:
    \sigma_w^\star + \xi^\star [y - \mu_w^\star] > 0 \right\}.
  $$
  This is a half-plane for $\xi^\star \neq 0$ and the whole plane for
  $\xi^\star = 0$. 
\end{itemize}

If we take the observation region for the PP model as being the
product of the time interval $(0, \, t^\dag)$ by the $y$-interval
$(u, \, \infty)$ where the threshold~$u$ is such that
$\sigma_w^\star + \xi^\star [u - \mu_w^\star] > 0$, we get the same
model as the Poisson-GP on $(0, \, t^\dag)$, up to a
re-parameterization. As an interesting feature of the PP parameter
$\bs{\theta}^\star_w$, it does not depend on the threshold~$u$. It
relates to the distribution of the maximum $M$ of the marks $Y_i$
corresponding to a time interval with duration~$w$, see
Figure~\ref{BlockMaxima}. This distribution is indeed essentially the
$\text{GEV}(\mu_w^\star, \, \sigma_w^\star,\,\xi^\star)$, up to an
atom corresponding to the possibility that no mark is observed during
the time interval. We can then define $M:= -\infty$ since this is
arguably the maximum of the empty set, and the probability of the
corresponding event is $\exp\{- \lambda_u w\}$. We may speak of
$\text{GEV}(\mu_w^\star, \, \sigma_w^\star,\,\xi^\star)$ as the
\textit{GEV reference distribution} in relation to~$w$, although this
is not exactly the distribution of a maximum~$M$ over the reference
duration.

For a given threshold $u$ and a given reference duration $w>0$, the
one-to-one correspondence between the vectors $\bs{\theta}_u$ and
$\bs{\theta}_w^\star$ is given by
\begin{equation}
  \label{eq:poisGP2PP}
  \left\{
    \begin{array}{c c l}
      \mu^\star_w &=& u + \frac{(\lambda_u w)^\xi - 1}{\xi} \, \sigma_u, \\
      \sigma^\star_w &=& (\lambda_u w)^\xi \, \sigma_u, \rule{0pt}{1em}\\
      \xi^\star &=& \xi, \rule{0pt}{1em}
    \end{array}
  \right.
\end{equation}
the fraction $[(\lambda_u w)^\xi - 1]/\xi$ of the first equation being to
be replaced for $\xi = 0$ by its limit $\log(\lambda_u w)$. The
reciprocal transformation is
\begin{equation}
  \label{eq:PP2poisGP}
  \left\{
    \begin{array}{c c l}
      \sigma_u &=& \sigma_w^\star + \xi^\star \left[ u - \mu_w^\star \right],\\
      \lambda_u &=& w^{-1} \, \left[\sigma_u / \sigma_w^\star \right]^{-1/ \xi^\star},
                    \rule{0pt}{1.1em}\\
      \xi &=& \xi^\star, \rule{0pt}{1.1em}  \\
    \end{array}
  \right.
\end{equation}
where the second equation becomes $\lambda_u = w^{-1}$ for
$\xi^\star = 0$.
 
\begin{figure}
  \centering
  \includegraphics[width=0.8\textwidth]{images/TwoParameterisationsC.pdf}
  \caption{\sf \small Two parameterizations for POT models.}
\end{figure}

When a vector $\m{x}$ of covariates can be used, two different kinds
of so-called \textit{non-stationary} POT models can be obtained by
relating or ``linking'' the three parameters to the covariates, either
parametrically or non-parametrically. For instance the Poisson-GP
scale or its logarithm can be specified as having the parametric form
$\m{x}^\top \bs{\beta}^\sigma$ where $\bs{\beta}^\sigma$ is a vector
of parameters. Different forms of models arise from the two
frameworks. Moreover, parametric Poisson-GP models relate to a
specific threshold since the same form of link can not persist when
the threshold is changed. Remind also that the threshold should in
general depend on the covariates.  Anyway, for non-stationary models
it is often useful to transform one of the two parameterizations into
the other. If a Poisson-GP model is used, we may be interested in the
GEV reference distribution conditional on a given value $\m{x}$. When
instead a PP model is used, it may be useful to investigate the
relation of the implied rate $\lambda_u$ with the covariates, and
possibly to compare the POT estimate with a non-parametric estimate.


The \textbf{nieve} package provides the two
transformations~(\ref{eq:poisGP2PP}) and~(\ref{eq:PP2poisGP}) required
for all these tasks --~along with their Jacobian, under the names
\verb@poisGP2PP@ and \verb@PP2poisGP@. As for the probability
functions, the singularity for $\xi = 0$ or $\xi^\star = 0$ is coped
with by using a second-order Taylor approximation. As often required
when coping with non-stationary POT models, the functions
\verb@poisGP2PP@ and \verb@PP2poisGP@ are vectorized w.r.t. their
arguments including \verb@threshold@ for the \verb@PP2poisGP@
transformation. Each element $i$ in the provided vector arguments
(such as \verb@lambda@) correspond to a value $\m{x}_i$ of the
covariates, and most often the threshold is then also chosen as
depending on the covariates, hence used via a vector with $n$ elements
$u_i = u(\m{x}_i)$. The condition
$$
    \sigma^\star_i + \xi^\star_i [u_i - \mu^\star_i] > 0, \qquad i=1,\, \dots,\, n
$$
should then hold. There does not seem to exist any motivation for
using a reference duration~$w$ depending on $i$, hence the function
\code{poisGP2PP} only accepts a length-one argument~\code{w}.

\section{EV distributions from other R packages}

The EV distributions are implemented in many R packages. A variety of
strategies regarding the problem $\xi = 0$ can be found. We now
describe these strategies and provide for each of them a ``code''
(shown as framed: \fbox{NT}, ...) that is used in Table~\ref{TabXi},
columns $\xi = 0$. Each strategy is briefly discussed.

\begin{enumerate}
  
\item \fbox{NT} Use only the formula for $\xi \neq 0$ i.e., {``do
    nothing''}.  In practice, an optimization or sampling algorithm
  will never come to the case $\xi = 0$ \textit{exactly} and this can
  only happen when the user gives this value e.g., as an initial
  value.  There will be some numerical problems when $\xi$ is very
  small, say $\xi = \text{1e-14}$ or less. These problems are not so
  crucial for the usual probability functions: we get some wiggling
  when plotting the curves and zooming. Mind however that the random
  generation functions\footnote{Usually given names such as
    \code{"rgev"} or \code{"rgpd"}.} will produce silly results with a
  very small $\xi$ if they are based on the corresponding
  quantile functions.
  
\item \fbox{$0.0$} Test the exact equality $\xi = 0$, and if this is true,
  \textit{switch to the exponential/Gumbel formula}.  This helps only
  when the user gives \code{xi = 0.0}, but we are essentially doing
  the same thing as in \fbox{NT}.

\item  \fbox{$\epsilon$/S}  Test the equality $ |\xi| \leqslant \epsilon$ where
  $\epsilon >0$ is very small, and if this is true, \textit{switch
    to the exponential/Gumbel formula}. So this produces a (very
  small) discontinuity.  E.g., \pkg{Renext} uses
  $\epsilon \approx \text{2e-14}$.
  
\item \fbox{$\epsilon$/AI} Test the equality
  $ |\xi| \leqslant \epsilon$ where $\epsilon >0$ is very small, and
  if this is true, use a dedicated \textit{approximation or
    interpolation}.  Several methods can be used including Taylor
  approximations. The discontinuity should then be undetectable.  Mind
  the probability functions although not being \textit{analytic}
  functions, are infinitely differentiable $C^\infty$ w.r.t. the
  parameters.
  
\end{enumerate}

Note that some of the cited packages are quite old:
\pkg{evir}~\citep{pack_evir}, \pkg{evd}~\citep{pack_evd},
\pkg{ismev}~\citep{pack_ismev}, \pkg{Renext}~\citep{pack_Renext},
\pkg{POT}~\citep{pack_POT} and
\pkg{SpatialExtremes}~\cite{pack_SpatialExtremes}. The packages
\pkg{revdbayes}~\citep{pack_revdbayes} and \pkg{mev}~\citep{pack_mev}
are more recent. See the CRAN Task View on Extreme Value Analysis
\citep{TV_ExtremeValue} for an extended list of packages devoted to
EV. Also it is worth mentioning that the \pkg{extRemes}
package~\citep{pack_extRemes} optionally uses the exact gradient of
the log-likelihood for models with GEV and GP margins but the
derivatives are coded (in R) only for internal use in optimization
tasks.

\begin{table}
  \centering \small
  \begin{tabular}{| l || c | c | c | c | c || c | c | c | c | c |}
    \hline
    \multicolumn{1}{|c||}{\raisebox{-0.3em}{Package}}
    & \multicolumn{5}{c||}{GEV}
    & \multicolumn{5}{c|}{GPD}\\ \cline{2-6} \cline{7-11}
    \multicolumn{1}{|c||}{}
    & \multicolumn{1}{c|}{Lang.}
    & \multicolumn{1}{c|}{Vec. $\bs{\theta}$}
    & \multicolumn{1}{c|}{Grad.}
    & \multicolumn{1}{c|}{Hess.}
    & \multicolumn{1}{c||}{$\xi=0$}
    & \multicolumn{1}{c|}{Lang.}
    & \multicolumn{1}{c|}{Vec. $\bs{\theta}$}
    & \multicolumn{1}{c|}{Grad.}
    & \multicolumn{1}{c|}{Hess.}
    & \multicolumn{1}{c|}{$\xi=0$} \\ \hline\hline
    \textbf{evir}
    & R & no & no & no & NT & R & no & no & no & NT\\ \hline
    \textbf{evd}
    & R & no & no & no & 0.0 & R & no & no & no & 0.0\\ \hline
    \textbf{ismev}
    & R$^\star$ & no & no & no & NT & R$^\star$ & no & no & no & NT\\ \hline
    \textbf{Renext}
    &  &  &  &  &  & R & no & yes & yes & $\epsilon$/S \\ \hline
    \textbf{POT}
    &  &  &  &  &  & R$^\star$ & yes & no & no & 0.0 \\ \hline
    \textbf{SpatialExtremes}
    & R  & yes & no & no & $0.0$ & R & yes & no & no & $0.0$ \\ \hline   
    \textbf{revdbayes}
    & R  & yes & no & no & $\epsilon$/AI & R & yes & no & no & $\epsilon$/AI \\ \hline
    \textbf{mev}
    & R  & yes & yes$^\star$ & yes$^\star$& NT & R & yes & yes$^\star$
    & yes$^\star$ & NT \\ \hline
    \Gre{\textbf{nieve}}
    & \Gre{C} & \Gre{yes} & \Gre{yes} & \Gre{yes} & \Gre{$\epsilon$/AI}   
    & \Gre{C} & \Gre{yes} & \Gre{yes} & \Gre{yes} & \Gre{$\epsilon$/AI} \\
    \hline
  \end{tabular}
  \caption{\label{TabXi}\small \sf Features of some CRAN
    packages. \textit{Lang.}: the implementation language,
    \textit{Vec.}  $\bs{\theta}$: vectorized w.r.t. the
    parameters. The columns \textit{Grad.} and the \textit{Hess.}
    indicate if the gradient and Hessian are provided, and the
    columns $\epsilon = 0$ indicate the strategy used to cope with a
    zero or small shape, as described in the text. A star $\star$
    means that the functions are not exported.}
\end{table}

\section*{Acknowledgments}

The \pkg{nieve} package was partly funded by the French
\href{https://www.irsn.fr/}{Institut de Radioprotection et Sûreté
  Nucléaire (IRSN)} and some of the code formerly was part of R
packages owned by IRSN/Behrig.

We are grateful to the authors and contributors of \textbf{Maxima} and
to the authors of the \textbf{maxiplot} \LaTeX{} package (J.M. Planas
and José Manuel Mira univ. de Murcia, Spain) which helped much for the
tedious computations required by the package.

\bibliographystyle{apalike}
\bibliography{nieve}

\end{document}