---
title: "Miscellaneous tools"
date: today
date-format: long
bibliography: ../inst/REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{Miscellaneous tools}
  %\VignetteEngine{quarto::pdf}
  %\VignetteEncoding{UTF-8}
---


# Select a subset of coefficients

Complex models may include a large set of coefficients. Consider for
example the estimation of a probit model with instrumental
variables. The exogenous covariates are denoted $x_1$, the endogenous
covariates $e$ and the external instruments $x_2$.  The density that
enters the log-likelihood function is:

$$
f(y_n \mid d_n) = \Phi\left(q_n\frac{\theta ^ \top u_n}{\sigma} \right) = 
\Phi\left(q_n\frac{\gamma ^ \top z_n + \sigma_{\epsilon\nu}
^ \top \Sigma_\nu ^ {-1}(e_n - \Pi w_n)}{\sqrt{\sigma_\epsilon ^ 2 -
\sigma_{\epsilon\nu} ^\top \Sigma_\nu ^ {-1} \sigma_{\epsilon\nu}}}\right)
$$

where $z^\top = (x_1 ^ \top, e ^ \top)$ are the covariates, $w^\top =
(x_1^\top, x_2^\top)$ the instruments, and $e_n - \Pi w_n$ the error
vector of the reduced form equations of the endogenous variables.

In this case, coefficients belong to four different groups:

- the coefficients associated with covariates (either exogenous or
  endogenous), which are the coefficients of main interest ($\gamma$),
- the coefficients associated with the errors of the reduced form
  equation ($\sigma_{\epsilon\nu}$),
- the coefficients of the reduced form equation ($\Pi$),
- the covariance matrix of the errors of the reduced form equations
  ($\Sigma_\nu$).
  
In practice, the Cholesky decomposition $\Sigma_\nu^{-1} = C ^ \top C$
is used, so that the second set of coefficients is $\rho ^ * = C
\sigma_{\epsilon\nu}$ and the fourth set consists of the elements of
$C$. The four sets are respectively denoted `"covariates"`, `"resid"`,
`"instruments"` and `"chol"`.
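As a worked step (derived here from the definitions above, not quoted from the package documentation), substituting $\Sigma_\nu^{-1} = C ^ \top C$ and $\rho ^ * = C \sigma_{\epsilon\nu}$ into the two quadratic terms of the density gives:

$$
\sigma_{\epsilon\nu} ^ \top \Sigma_\nu ^ {-1}(e_n - \Pi w_n) =
\sigma_{\epsilon\nu} ^ \top C ^ \top C (e_n - \Pi w_n) =
\rho ^ {*\top} C (e_n - \Pi w_n),
\qquad
\sigma_{\epsilon\nu} ^ \top \Sigma_\nu ^ {-1} \sigma_{\epsilon\nu} =
\rho ^ {*\top} \rho ^ *
$$

so that the argument of $\Phi$ can be written in terms of $\gamma$, $\rho ^ *$, $\Pi$ and $C$ only:

$$
q_n\frac{\gamma ^ \top z_n + \rho ^ {*\top} C (e_n - \Pi w_n)}
{\sqrt{\sigma_\epsilon ^ 2 - \rho ^ {*\top} \rho ^ *}}
$$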
  
We use as an example the `federiv` data set where the response is a
dummy for foreign exchange derivatives use. The model is fitted using
`binomreg` with a two-part formula which defines the covariates and
the instruments:

```{r }
#| warning: false
library(micsr)
bank <- binomreg(federiv ~ eqrat + optval +  mktbk +
                      perfor + dealdum | . - eqrat - optval +
                      no_emp + no_subs + no_off,
                  data = federiv, method = "ml")
```

The full set of coefficients is stored as a named vector in the
`coefficients` element of `bank`:


```{r }
names(bank$coefficients)
```

There are 25 coefficients in total. Note that some of them are
prefixed by `"rho_"`, others by `"instr_"`, and others are named as
the concatenation of two series names separated by `"|"`. They form
respectively the second, third, and fourth groups of parameters. This
structure is stored in the `npar` element of `bank`:

```{r }
bank$npar
```

This is a named vector: the names are those of the groups of
parameters and the values are the number of parameters in each group,
in their order of appearance in `bank$coefficients`. This vector has
an attribute called `"default"` which indicates which groups should be
considered by default when the model is printed. Therefore, when the
`coef`, `vcov` or `summary` methods are used, only the first two
groups are considered by default:

```{r }
coef(bank)
```

The `subset` argument can be used to select specific groups of
parameters instead:

```{r }
coef(bank, subset = c("chol", "resid"))
```

The default value of `subset` is `NA`, in which case the default set
of parameters is considered. If `subset` is set to `NULL` or `"all"`,
all the parameters are returned:

```{r }
#| eval: false
coef(bank, subset = NULL)
```
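The bookkeeping behind the groups can be sketched in base **R**. This is only an illustration: the group sizes are hypothetical (they are not the values of `bank$npar`), the `"default"` attribute is assumed to hold group names, and the code is not the one used internally by **micsr**.

```{r }
# Hypothetical group sizes; these are NOT the actual values of bank$npar
npar <- c(covariates = 3, resid = 1, instruments = 4, chol = 1)
# Assumption: the "default" attribute is a character vector of group names
attr(npar, "default") <- c("covariates", "resid")

# Label each coefficient position with the name of its group
group <- rep(names(npar), times = as.vector(npar))

# Positions of the coefficients that belong to the default groups
default_pos <- which(group %in% attr(npar, "default"))
default_pos
```

Since the groups are stored in their order of appearance, the sizes alone are enough to recover which positions of the coefficient vector belong to which group.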

Once the groups of parameters have been defined using `subset`, a more
precise selection can be made using either a vector of coefficient
names or a regular expression, using respectively the `coef` and the
`grep` arguments. For example, to select only the coefficients of
`mktbk` and `perfor`:

```{r }
coef(bank, coef = c("mktbk", "perfor"))
```

To select all the coefficients whose names contain `"eqrat"` or `"perfor"`:

```{r }
coef(bank, grep = "eqrat|perfor")
```

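Note that, unless `subset` is specified, only the parameters of the
default subset of coefficients are considered. The `subset` and `grep`
arguments can be combined to search other groups: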

```{r }
coef(bank, grep = "eqrat|perfor", subset = c("instruments", "chol"))
```

Finally, the `invert` argument (`FALSE` by default) can be used to
return the coefficients that do not match the regular expression. For
example, to return all the coefficients of the reduced form equations
(the `"instruments"` group) except the intercepts:

```{r }
coef(bank, subset = "instruments", grep = "Intercept", invert = TRUE)
```
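The matching logic can be illustrated with the base **R** `grep` function applied to a vector of names. The names below are hypothetical and only loosely follow the naming scheme described above, and whether **micsr** relies on `base::grep` internally is an assumption.

```{r }
# Hypothetical coefficient names, for illustration only
nms <- c("(Intercept)", "eqrat", "mktbk", "perfor",
         "instr_(Intercept)", "instr_perfor")

# Names that contain "eqrat" or "perfor"
matched <- grep("eqrat|perfor", nms, value = TRUE)
matched

# Names that do not contain "Intercept"
unmatched <- grep("Intercept", nms, value = TRUE, invert = TRUE)
unmatched
```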

# Covariance matrix and standard errors

The `vcov` method for `micsr` objects returns the covariance matrix. A
subset of coefficients can be selected using the same syntax as
previously:

```{r }
vcov(bank, subset = "resid")
```

The `vcov` argument selects one of several flavors of covariance
matrix:

- `"info"` uses the information matrix,
- `"hessian"` uses the Hessian,
- `"opg"` uses the outer product of the gradient,
- `"hc"` uses the heteroscedasticity-consistent estimator.

Not all of these estimators are available for every model. If the
`vcov` argument is left empty, the default behavior is to choose
`"info"` if it exists, then `"hessian"`, and finally `"opg"` if the
previous two estimators are not available. Using `vcov = "hc"` results
in an internal call to the `sandwich::vcovHC` function:

```{r }
vcov(bank, subset = "resid")
vcov(bank, subset = "resid", vcov = "hessian")
vcov(bank, subset = "resid", vcov = "opg")
vcov(bank, subset = "resid", vcov = "hc")
```

**micsr** provides the convenient `stder` function to extract the
standard errors of the coefficients. It can be used with the same
arguments as the `vcov` method:

```{r }
stder(bank, subset = "resid")
stder(bank, vcov = "hc", subset = "resid")
```
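To make the difference between these flavors concrete, here is a minimal base **R** sketch for a simple logit model fitted on simulated data. It uses the textbook formulas (inverse of minus the Hessian, inverse of the outer product of the gradient, and the basic sandwich form) and is not the code used by **micsr** or by `sandwich::vcovHC`; all object names are illustrative.

```{r }
# Simulated data for a simple logit model (purely illustrative)
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + x))
fit <- glm(y ~ x, family = binomial)

X <- model.matrix(fit)
p <- fitted(fit)

# Individual contributions to the gradient (one row per observation)
G <- (y - p) * X
# Minus the Hessian of the log-likelihood
H <- crossprod(X * sqrt(p * (1 - p)))

V_hessian <- solve(H)                                   # Hessian-based
V_opg <- solve(crossprod(G))                            # outer product of the gradient
V_sandwich <- V_hessian %*% crossprod(G) %*% V_hessian  # basic sandwich form

# Standard errors are the square roots of the diagonal elements
rbind(hessian = sqrt(diag(V_hessian)),
      opg = sqrt(diag(V_opg)),
      sandwich = sqrt(diag(V_sandwich)))
```

For the logit model, the information matrix and minus the Hessian coincide, so the first estimator should be close to the matrix returned by `vcov(fit)`.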

# Dummies

Categorical variables are stored in **R** as **factors** with a
limited set of **levels**. For the purpose of estimation, using
`model.matrix`, factors are translated into a set of contrasts. The
most common one is the **treatment** contrast, which consists of
generating a dummy variable for every level except the first. However,
it is sometimes easier to work directly with the dummy variables, and
the `micsr::dummy` function makes it simple to create them. Consider
for example the `charitable` data set that contains two factors,
`education` and `religion`. `dummy` can be used with a data frame as
first argument and the factors for which one wants to create dummies
as further unnamed arguments:

```{r }
charitable |> dummy(education, religion) |> head(2)
```

Two further named boolean arguments (`FALSE` by default) are
available:

- `keep` to indicate whether the factor should be kept in the
  resulting data frame,
- `ref` to indicate whether a dummy should be generated for the
  reference level.
  
```{r }
charitable |> dummy(religion, keep = TRUE) |> head(2)
charitable |> dummy(religion, ref = TRUE) |> head(2)
```
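For comparison, the treatment contrasts mentioned above can be inspected with the base **R** `model.matrix` function. The small factor below is hypothetical; the sketch only shows which dummies are generated with and without the reference level, and says nothing about how `dummy` is implemented.

```{r }
# A small hypothetical factor; "low" is the first, hence reference, level
f <- factor(c("low", "mid", "high", "mid"),
            levels = c("low", "mid", "high"))

# Treatment contrasts: an intercept plus one dummy per non-reference level
without_ref <- model.matrix(~ f)
colnames(without_ref)

# Removing the intercept yields one dummy per level, reference included
with_ref <- model.matrix(~ f - 1)
colnames(with_ref)
```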
  
# Short summary

**R** is not a verbose language, as every function returns an object
that can be stored by the user. Special `print` methods (or
`print.summary` methods) are then used to print part of the
results. However, the output is often bulky and takes too much space
to be used in a document. For this reason, **micsr** provides a `gaze`
function with several methods to print the results of a model or of a
test. Consider for example a t-test for charitable donations with two
subsamples defined by the `married` dummy variable:

```{r }
t.test(donation ~ married, charitable)
```

The result of `t.test` is an object of class `htest` and its `print`
method produces 12 lines, including 2 blank lines. `gaze` returns a
one-line result with the essential information:

```{r }
#| collapse: true
t.test(donation ~ married, charitable) |> gaze()
```
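The idea of a one-line summary can be sketched in base **R**, since the essential numbers are stored as elements of the `htest` object. The sketch uses the built-in `sleep` data set and an arbitrary format; it does not reproduce the output or the implementation of `gaze`.

```{r }
# Built-in sleep data, used here only to obtain an htest object
tt <- t.test(extra ~ group, data = sleep)

# Statistic, degrees of freedom and p-value are elements of the object
one_line <- sprintf("t = %.3f, df = %.2f, p-value = %.3f",
                    tt$statistic, tt$parameter, tt$p.value)
one_line
```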

For a fitted model, `gaze` returns the table of coefficients for all
the fitted parameters and the value of the log-likelihood:


```{r }
tobit1(log(donation / 25) ~ married, charitable) |> gaze()
```