---
title: "tidyindex"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{tidyindex}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r setup, echo = TRUE, message = FALSE}
library(tidyindex)
library(dplyr)
library(lubridate)
library(lmomco)
library(ggplot2)
library(tsibble)
```
The tidyindex package provides functionality to construct indexes in a data
pipeline, align with the tidyverse paradigm. The pipeline approach is
universally applicable to indexes of all kinds. It allows indexes to be broken
down into a set of defined building blocks (modules) and hence provides means
to standardise the workflow to construct, compare, and analyse indexes.
## Decomposing an index into steps
Here we present an example to calculate one of the most widely used drought
index: Standardised Precipitation Index (SPI). The index is composed to three
steps:
- step 1: aggregate the precipitation series in a rolling window
- step 2: fit a distribution (usually gamma), per month, to the aggregated
precipitation
- step 3: normalise the fitted values to a standard normal distribution as
the index
## Pipeline design
These three steps correspond to three modules in the tidyindex pipeline
(`temporal_aggregate()`, `distribution_fit()`, and `normalise()`). Each module
uses a tidyverse-mutate style to calculate a step within the module.
For example, the following code fits a gamma distribution to the variable
`.agg`. Different distributions are available and prefixed with `dist_*()`
and additional distribution can be added by the user following a similar style
to the existing `dist_*()` steps. The step `dist_*()` can also be evaluated
standalone and seen as a recipe of the step:
```{r eval = FALSE}
distribution_fit(.fit = dist_gamma(...))
```
```{r}
dist_gamma(var = ".agg")
```
## Standardised Precipitation Index (SPI): An example
Here we select a single station, Texas Post Office, where is heavily impacted
during the 2019/20 bushfire season, in Queensland, Australia, to demonstrate
the calculation.
```{r}
texas_post_office <- queensland %>%
filter(name == "TEXAS POST OFFICE") %>%
mutate(month = lubridate::month(ym))
dt <- texas_post_office |>
init(id = id, time = ym, group = month) |>
temporal_aggregate(.agg = temporal_rolling_window(prcp, scale = 24)) |>
distribution_fit(.fit = dist_gamma(var = ".agg")) |>
tidyindex::normalise(.index = norm_quantile(.fit))
dt
```
The results contain a summary of the steps used and the data with intermediate
variables (`.agg`, `.fit`, and `.fit_obj`) and the index (`.index`). We can plot
the result using `ggplot2` as:
```{r}
dt$data |>
ggplot(aes(x = ym, y = .index)) +
geom_hline(yintercept = -2, color = "red", linewidth = 1) +
geom_line() +
scale_x_yearmonth(name = "Year", date_break = "2 years", date_label = "%Y") +
theme_bw() +
facet_wrap(vars(name), ncol = 1) +
theme(panel.grid = element_blank(),
legend.position = "bottom") +
ylab("SPI")
```
# What's more
There are many different things you can do with the package, for example:
- to switch from SPI to Standardized Precipitation-Evapotranspiration Index
(SPEI), simply add an variable transformation step to compute
evapotranspiration from temperature data: `variable_trans(.pet = trans_thornthwaite(.tavg = tavg, .lat = lat))`
- a set of existing drought indexes are available as `idx_spi()`, `idx_spei()`,
`idx_edi()`, and `idx_rdi()`
- to compute multiple indexes at once, check `compute_indexes()`
- to calculate parameter uncertainty with the distribution fit, check the `.n_boot` argument in the `distribution_fit()`