---
title: "tidyindex"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{tidyindex}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, echo = TRUE, message = FALSE}
library(tidyindex)
library(dplyr)
library(lubridate)
library(lmomco)
library(ggplot2)
library(tsibble)
```

The tidyindex package provides functionality to construct indexes in a data 
pipeline, aligned with the tidyverse paradigm. The pipeline approach is 
applicable to indexes of all kinds. It allows an index to be broken 
down into a set of defined building blocks (modules) and hence provides a means 
to standardise the workflow for constructing, comparing, and analysing indexes.

## Decomposing an index into steps

Here we present an example that calculates one of the most widely used drought 
indexes: the Standardised Precipitation Index (SPI). The index is composed of 
three steps: 

  - step 1: aggregate the precipitation series in a rolling window
  - step 2: fit a distribution (usually gamma), per month, to the aggregated 
  precipitation
  - step 3: normalise the fitted values to a standard normal distribution as 
  the index
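
To make the three steps concrete, here is a minimal base-R sketch on simulated data. It does not show how tidyindex implements the steps: the data are simulated, the names (`sim_prcp`, `sim_month`, `k`, `agg`, `fit`, `spi`) are illustrative, and the gamma parameters are estimated by the method of moments purely to keep the example self-contained.

```{r}
# Simulated monthly precipitation for 30 years (illustrative, not real data)
set.seed(1)
n_years   <- 30
sim_prcp  <- rgamma(n_years * 12, shape = 2, rate = 0.05)
sim_month <- rep(1:12, times = n_years)
k         <- 3  # width of the rolling window, in months

# Step 1: rolling sum over the current month and the previous k - 1 months
# (stats::filter is namespaced because dplyr::filter masks it)
agg <- as.numeric(stats::filter(sim_prcp, rep(1, k), sides = 1))

# Step 2: fit a gamma distribution per calendar month (method of moments)
# and evaluate the fitted CDF at each aggregated value
fit <- rep(NA_real_, length(agg))
for (m in 1:12) {
  i <- which(sim_month == m & !is.na(agg))
  shape <- mean(agg[i])^2 / var(agg[i])
  rate  <- mean(agg[i]) / var(agg[i])
  fit[i] <- pgamma(agg[i], shape = shape, rate = rate)
}

# Step 3: map the fitted probabilities to standard normal quantiles
spi <- qnorm(fit)
```

The first `k - 1` values of `spi` are `NA` because the rolling window is incomplete there.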
  
## Pipeline design  

These three steps correspond to three modules in the tidyindex pipeline 
(`temporal_aggregate()`, `distribution_fit()`, and `normalise()`). Each module 
follows a tidyverse `mutate()` style, in which a step is specified inside the 
module. For example, the following code fits a gamma distribution to the 
variable `.agg`. Different distributions are available, prefixed with 
`dist_*()`, and additional distributions can be added by the user following a 
similar style to the existing `dist_*()` steps. A `dist_*()` step can also be 
evaluated on its own, in which case it can be seen as a recipe for the step: 

```{r eval = FALSE}
distribution_fit(.fit = dist_gamma(...))
```

```{r}
dist_gamma(var = ".agg")
```

## Standardised Precipitation Index (SPI): An example

Here we select a single station, Texas Post Office in Queensland, Australia, 
which was heavily impacted during the 2019/20 bushfire season, to demonstrate 
the calculation.

```{r}
texas_post_office <- queensland |> 
  filter(name == "TEXAS POST OFFICE") |> 
  mutate(month = lubridate::month(ym)) 

dt <- texas_post_office |>
  init(id = id, time = ym, group = month) |> 
  temporal_aggregate(.agg = temporal_rolling_window(prcp, scale = 24)) |> 
  distribution_fit(.fit = dist_gamma(var = ".agg")) |>
  tidyindex::normalise(.index = norm_quantile(.fit))
dt
```

The result contains a summary of the steps used and the data with intermediate 
variables (`.agg`, `.fit`, and `.fit_obj`) and the index (`.index`). We can plot 
the result using ggplot2:

```{r}
dt$data |> 
  ggplot(aes(x = ym, y = .index)) + 
  geom_hline(yintercept = -2, color = "red",  linewidth = 1) + 
  geom_line() + 
  scale_x_yearmonth(name = "Year", date_breaks = "2 years", date_labels = "%Y") +
  theme_bw() +
  facet_wrap(vars(name), ncol = 1) + 
  theme(panel.grid = element_blank(), 
        legend.position = "bottom") + 
  ylab("SPI")
```

## What's more

There are many other things you can do with the package, for example:

  - to switch from SPI to the Standardised Precipitation-Evapotranspiration Index 
  (SPEI), simply add a variable transformation step to compute potential 
  evapotranspiration from temperature data: `variable_trans(.pet = trans_thornthwaite(.tavg = tavg, .lat = lat))`
  - a set of existing drought indexes is available as `idx_spi()`, `idx_spei()`,
  `idx_edi()`, and `idx_rdi()`
  - to compute multiple indexes at once, check `compute_indexes()`
  - to calculate parameter uncertainty with the distribution fit, check the `.n_boot` argument of `distribution_fit()`
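
The idea behind bootstrapped parameter uncertainty can be sketched in base R as follows. This is an illustration under stated assumptions, not the procedure that `.n_boot` runs internally: the data are simulated, `fit_gamma_mom()` is a hypothetical helper using the method of moments, and the other names (`x`, `n_boot`, `boot_index`, `ci`) are illustrative.

```{r}
set.seed(2)
x      <- rgamma(40, shape = 3, rate = 0.02)  # simulated aggregated precipitation
n_boot <- 200                                 # number of bootstrap replicates

# Hypothetical helper: method-of-moments estimates of the gamma parameters
fit_gamma_mom <- function(v) {
  c(shape = mean(v)^2 / var(v), rate = mean(v) / var(v))
}

# Refit the distribution on each bootstrap resample, then evaluate the
# index at the original observations (one column per replicate)
boot_index <- sapply(seq_len(n_boot), function(b) {
  p <- fit_gamma_mom(sample(x, replace = TRUE))
  qnorm(pgamma(x, shape = p[["shape"]], rate = p[["rate"]]))
})

# 95% percentile interval of the index for each observation
ci <- apply(boot_index, 1, quantile, probs = c(0.025, 0.975))
```

Each column of `ci` gives the lower and upper bound for one observation, so wide intervals flag index values that are sensitive to the fitted parameters.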