---
title: "Multiseason models in flocker"
author: "Jacob Socolar"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Multiseason models in flocker}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
Multiseason occupancy models, also called dynamic occupancy models, are models
that assume [closure](https://jsocolar.github.io/closureOccupancy/) over units
that are linked by explicit colonization and extinction
processes.^[As we will see, this is true even of multiseason models
with autologistic occupancy dynamics] That is, sites are surveyed over a series
of more than one timestep,^[It's fine if some sites are surveyed for just one
timestep, but at least some sites are surveyed for multiple timesteps.] closure
is assumed to hold within (but not across) timesteps, and the latent occupancy
state is linked across timesteps within a site via models of local colonization
and extinction processes. Following the terminology introduced
[here](https://jsocolar.github.io/flocker/articles/flocker_tutorial.html), we
will refer to the units over which closure is assumed as *closure-units* or just
plain *units*. Groups of units that are linked by colonization-extinction
dynamics (i.e. units representing different timesteps within a site) are
*series*.
## Multiseason models as HMMs
In `flocker`, we implement these models using a hidden markov model (HMM)
approach to dealing with the unobserved occupancy state in each separate series.
Calling our approach a HMM doesn't fundamentally change the model; it serves
mainly to highlight the connection between multiseason occupancy models and
well-developed techniques for efficiently computing the likelihood for a
series.^[For the curious, we use the forward algorithm to compute the
likelihood, and the forward-backward algorithm to compute unit-wise posterior
distributions for the latent occupancy state, and the forward-filtering-backward-sampling
algorithm to draw valid posterior samples for the sequence of latent states]
The HMM conceptualizes the latent occupancy state as a Markov-like process (the
process need not be strictly Markovian, because the transition probabilities can
vary by timestep), where the transition probabilities between the two possible
states (occupied or unoccupied) are a colonization probability $\mathcal{C}$,
an extinction probability $\mathcal{E}$, and their complements $1 - \mathcal{C}$
and $1 - \mathcal{E}$. The complement of the extinction probability turns out to
be particularly important; we call it the *persistence* probability
$\mathcal{P} = 1 - \mathcal{E}$.
To specify the HMM, we need a model for the $\mathcal{C}$, a model for
$\mathcal{E}$, and a model for the initial occupancy probability $\mathcal{O}$ in
the first timestep. We also need a model for the probability of observing the data in
any particular timestep conditional on the true occupancy state in that
timestep (in the HMM literature, this probability is referred to as an
*emission probability*).
#### Emission probabilities
*Emission probabilities* are just fancy HMM terminology for the most familiar
part of multiseason occupancy model. They are the probability of observing the
data within a unit given the latent occupancy state, and they depend only on the
detection sub-model, which is no different from a single-season detection
sub-model. In `flocker`, we assume that the logit-scale detection probabilities
vary according to a suite of covariates that can vary across all levels of
organization, including down to the visit level. This is exactly the same way
that `flocker` treats the detection sub-model in a single-season model.
#### Colonization and extinction
In "colonization-extinction" models, logit colonization probabilities
$\mathcal{C}$ and logit extinction probabilities $\mathcal{E}$ are both assumed
to vary according to separate predictors based on covariates that may vary
across timesteps within series but never across visits within closure-units.
In "autologistic" models, the occupancy probability in timestep $t + 1$ is
assumed to depend on a suite of site characteristics plus a term that depends on
the latent true occupancy state at time $t$. Some treatments write the
autologistic model in terms of an "occupancy probability" that depends on
covariates plus the "autologistic term" that gets added on the logit scale when
the site was previously occupied. However, it is easier to understand this
model in terms of its colonization and extinction probabilities. In fact, the
so-called "occupancy probability" mentioned above is actually a colonization
probability, because it reflects the probability of occupancy conditional on
non-occupancy during the previous timestep. The logit-scale sum of the
colonization probability and the autologistic term gives the probability of
persistence $\mathcal{P}$.
Thus, the autologistic model differs from the colonization-extinction model in
that it assumes that colonization and extinction probabilities are related to
one another via an offset that relates colonization to persistence. If the
offset is modeled as constant, then we can think of this model as ascribing a
"suitability" to each site that defines the position of a pair of probabilities
$\mathcal{C}$ and $\mathcal{P}$ that differ by a constant on the logit scale. In
practice, $\mathcal{P}$ is almost always larger than $\mathcal{C}$, and so the
offset is positive.
In addition to the standard autologistic model that treats the offset as a
constant, `flocker` also affords the flexibility to model the offset via its
own covariate-based predictor, which reintroduces flexibility in the
relationship between $\mathcal{P}$ and $\mathcal{C}$, but tends to retain the
idea that these probabilities ought to be related to one another and ought to
tend to move in tandem.
#### Initial occupancy
Most examples of multiseason occupancy models in the literature include an
explicit model for the initial occupancy probability $\mathcal{O}$ via its own
covariate-based predictor, which is analogous to the occupancy predictor
in a single-season model. Thus, a colonization-extinction model would contain
predictors for detection, colonization, extinction, and initial
occupancy; an autologistic model would contain linear predictors for detection,
colonization, the colonization-persistence offset (often modeled as a constant),
and initial occupancy. `flocker` can fit either model.
Borrowing from the HMM literature, `flocker` also comes with optional
functionality to handle $\mathcal{O}$ differently. Any pair of time-invariant
colonization and extinction probabilities defines an equilibrium proportion of
sites that are occupied. If colonization and extinction remain constant for a
long time prior to the first timestep, the initial occupancy probability should
simply be this equilibrium frequency, eliminating the need to parameterize
and fit initial occupancy separately from colonization and extinction.
`flocker` is equipped to fit models without any separate predictor for initial
occupancy, and instead to begin with the equilibrium frequencies implied by the
colonization and extinction probabilities modeled for the first timestep.
Note that this approach is sometimes valid even if the colonization and
extinction probabilities begin to vary during the timeseries being modeled,
as long as they are invariant for a substantial period up to and including the
first year modeled. As a concrete example, consider a study system that begins
in an equilibrium condition, but in which some sites burn in a forest fire
partway through the study. If the impact of the burn on colonization and
extinction probabilities is modeled adequately,
then the modeled probabilities prior to the burn will reflect the equilibrium
colonization and extinction probabilities in the pre-burn state. It is for this
reason that, when using equilibrium frequencies as initial occupancy
probabilities, `flocker` always computes the equilibrium based on the
probabilities modeled for the first timestep.
As an aside, we somewhat regularly encounter confusion that conflates initial
occupancy probabilities $\mathcal{O}$ with colonization probabilities
$\mathcal{C}$. This confusion apparently arises from the notion that the
autologistic model consists of a single-season occupancy model plus an
autologistic term, which leads people to incorrectly surmise that the linear
predictor without the autologistic term gives an occupancy probability. It does
not; it gives a colonization probability, which is typically much lower. Any
modeling exercise that enforces equality between initial occupancy probabilities
and subsequent colonization probabilities should be viewed skeptically.
## Examples
Let's simulate some data that is valid for use with multiseason models. Here,
we will simulate data for three seasons with one unit covariate and one event
covariate. The data will be simulated under a colonization-extinction model with
explicit inits, but we will be able to fit other models (autologistic,
equilibrium inits) to the same data.^[note that `simulate_flocker_data()` can
also simulate directly from these other model types.]
```{r simulate, eval = FALSE}
library(flocker)
multi_data <- simulate_flocker_data(
n_season = 3,
n_pt = 300,
n_sp = 1,
multiseason = "colex",
multi_init = "explicit",
seed = 1
)
fd <- make_flocker_data(
multi_data$obs,
multi_data$unit_covs,
multi_data$event_covs,
type = "multi",
quiet = TRUE
)
```
Here's the colonization-extinction model with an explicit model for occupancy
in the first timestep. Depending on hardware, fitting this model might take 1-5
minutes.
```{r colex-explicit, eval = FALSE}
multi_colex <- flock(
f_occ = ~ uc1,
f_det = ~ uc1 + ec1,
f_col = ~ uc1,
f_ex = ~ uc1,
flocker_data = fd,
multiseason = "colex",
multi_init = "explicit",
backend = "cmdstanr",
cores = 4
)
```
Here's the colonization-extinction model using equilibrium occupancy
probabilities in the first timestep:
```{r colex-equilibrium, eval = FALSE}
multi_colex_eq <- flock(
f_det = ~ uc1 + ec1,
f_col = ~ uc1,
f_ex = ~ uc1,
flocker_data = fd,
multiseason = "colex",
multi_init = "equilibrium",
backend = "cmdstanr",
cores = 4
)
```
Here's the autologistic model with explicit occupancy probabilities in the
first timestep. To reflect the stereotypical autologistic model with a constant
logit-scale offset separating colonization and persistence probabilities, we
use the formula `f_auto = ~ 1`, but it is fine to relax this constraint and use,
e.g. `f_auto = ~ uc1`.
```{r auto-explicit, eval = FALSE}
multi_auto <- flock(
f_occ = ~ uc1,
f_det = ~ uc1 + ec1,
f_col = ~ uc1,
f_auto = ~ 1,
flocker_data = fd,
multiseason = "autologistic",
multi_init = "explicit",
backend = "cmdstanr",
cores = 4
)
```
And finally the autologistic model with equilibrium occupancy probabilities in
the first timestep:
```{r auto-equilirium, eval = FALSE}
multi_auto_eq <- flock(
f_det = ~ uc1 + ec1,
f_col = ~ uc1,
f_auto = ~ 1,
flocker_data = fd,
multiseason = "autologistic",
multi_init = "equilibrium",
backend = "cmdstanr",
cores = 4
)
```
We can reconstruct the occupancy probabilities at every unit using the
`get_Z` function:
```{r Z, eval = FALSE}
Z <- get_Z(multi_colex_eq)
```
We can generate posterior predictions:
```{r predict, eval = FALSE}
pp <- predict_flocker(multi_colex_eq)
```
We can compare the fit of these various models using approximate leave-one-out
cross-validation, where the holdouts consist of entire series
(leave-one-series-out cross-validation).
```{r loo, eval = FALSE}
loo_out <- loo_compare_flocker(
list(multi_colex, multi_colex_eq, multi_auto, multi_auto_eq)
)
loo_out
```
See the main tutorial vignette for an example of the resulting output (this
vignette does not actual run the models to save on computational resources).