---
title: "Real-world example"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Real-world example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, warning=FALSE, message=FALSE}
library(rosario)
library(dplyr)
library(lubridate)
```

## 1. Overview

The `rosario` package implements a null model analysis to quantify concurrent temporal niche overlap (i.e., activity or phenology) among biological identities (e.g., individuals, populations, or species) using the Rosario randomization algorithm (Castro-Arellano et al. 2010). It is designed for cyclical time data (e.g., hours of the day, months of the year) and can be used to assess temporal overlap among two or more biological identities.

This vignette provides a real-world example showing how to (1) use an external camera-trap dataset, (2) convert detections into fixed time bins, (3) build the matrix required by `rosario`, and (4) run a null model test of assemblage-wide temporal overlap.

## 2. Data Loading

We use a subset of the SIMDeer project (Foster et al. 2020) representing 85,929 capture events from three cervid species recorded in British Columbia, Canada (51,691 Mule deer, 19,395 Elk, and 14,843 White-tailed deer). The full dataset is available through Wildlife Insights (https://app.wildlifeinsights.org/explore).

The dataset used in this vignette is:

  - `Sim_data`: a CSV file containing capture events (timestamped detections) for three cervid species.

Note: To run this vignette locally with the full data set, download and load the external data using the code below. The data is hosted on Figshare. This step is not run automatically to comply with CRAN policies.

```{r set-working-directory, eval = FALSE}
options(timeout = 600)
download.file("https://ndownloader.figshare.com/files/61985230", destfile = "Sim_data.csv", mode = "wb")
```
```{r eval = FALSE}
Sim_dat <- read.csv("Sim_data.csv")
head(Sim_dat)
```

## 3. Organizing detections into time bins

The `rosario` workflow requires a matrix where:

- rows = biological identities (in this example: species),
- columns = time intervals across a cycle (in this example: 30-minute bins across a 24-hour day),
- values = counts (or proportions) of detections in each time interval.

We convert event-level timestamps into a single row of binned counts per species. Time intervals are treated as circular (e.g., hours of the day), so the cycle wraps from the last interval back to the first.

### 3.1. Helper function to bin detections for one species

The function below converts timestamps to date-time, assigns each detection to a 30-minute bin, and returns a one-row data frame where columns are time bins (formatted as `"HH:MM:SS"`) and values are detection counts.

```{r eval = FALSE}
bin_species <- function(dat, species_code, bin_mins = 30) {
  dat %>%
    filter(species == species_code) %>%
    mutate(
      timestamp = mdy_hm(timestamp),
      bin = floor_date(timestamp, "hour") +
        minutes(floor(minute(timestamp) / bin_mins) * bin_mins),
      hour_min_sec = format(as.POSIXct(bin), "%H:%M:%S")
    ) %>%
    count(hour_min_sec, name = "count") %>%
    tidyr::pivot_wider(
      names_from = hour_min_sec,
      values_from = count,
      values_fill = 0
    )
}
```

### 3.2. Create binned rows for each cervid species

Here we generate one binned row per species:

```{r eval = FALSE}
mule_deer <- bin_species(Sim_dat, "hemionus")     # Mule deer
elk      <- bin_species(Sim_dat, "canadensis")    # Elk
wtd      <- bin_species(Sim_dat, "virginianus")   # White-tailed deer
```

### 3.3. Combine species rows into the `rosario` input matrix

We bind the three species rows into a single matrix. Any missing time bins are filled with zeros by the binning function above, and we ensure all entries are numeric.

```{r eval = FALSE}
binned_df <- dplyr::bind_rows(
  MuleDeer = mule_deer,
  Elk      = elk,
  WTD      = wtd,
  .id = "species"
)

# Convert to a numeric matrix (rows = species; columns = time bins)
rownames(binned_df) <- binned_df$species
data_matrix <- binned_df %>%
  select(-species) %>%
  mutate(across(everything(), as.numeric)) %>%
  as.matrix()
rownames(data_matrix) <- binned_df$species

dim(data_matrix)
```

### 3.4. Optional: rescale counts to proportions

Users may analyze either counts or proportions (e.g., to compare species with very different sample sizes). The function `rescale_matrix()` rescales rows so each row sums to 1.

```{r eval = FALSE}
data_matrix_prop <- rescale_matrix(data_matrix)
rowSums(data_matrix_prop)
```

## 4. Running `rosario` on the real-world dataset

This section demonstrates how to run the `rosario` workflow using the matrix created above.

### 4.1. Create cyclic shifts and mirror images

The `rosario()` function creates the set of cyclic shifts and their mirror images (reverse order), preserving shape while changing location along the cycle and maintaining temporal autocorrelation. The suite of vectors and mirrors represents a complete set of possible distributions.

```{r eval = FALSE}
cervid_shifts <- rosario(data_matrix[1, ])  # example: generate shifts for Mule deer
head(cervid_shifts)
```

### 4.2. Visualize patterns generated by `rosario()` function

Use `plot_rosario()` to visualize hypothetical time-use distributions produced by `rosario()` for a single biological identity. The function plots the first 10 cyclic shifts and their mirror images. Each panel shows one hypothetical time-use distribution, with the cyclic shift in dark gray and its mirror image in dark red.

```{r eval = FALSE}
plot_rosario(data_matrix[1, ], cols = 5) # example: visualize shifts for Mule deer
```

### 4.3. Assemblage-wide temporal overlap

Use `temp_overlap()` to compute the mean of all pairwise overlaps among rows (species), using the chosen index: "pianka" or "czekanowski". This returns the observed assemblage-wide overlap among the cervid species.

```{r eval = FALSE}
Results_Pianka <-temp_overlap(data_matrix, method = "pianka") 
Results_Pianka

Results_Czekanowski <-temp_overlap(data_matrix, method = "czekanowski")
Results_Czekanowski
```

### 4.4. Null Model Test

To test whether the observed overlap differs from random expectations, `get_null_model()` generates a null distribution of assemblage-wide overlap by repeatedly randomizing the input matrix using `rosario_sample()` and recomputing mean pairwise overlap (via `temp_overlap()`).

`get_null_model()` can run simulations in parallel by setting `parallel = TRUE`. For CRAN checks and maximum portability, this vignette uses `parallel = FALSE`.

We recommend using `nsim = 1000` for applied analyses, but runtime depends on hardware and dataset size.

```{r eval = FALSE}
set.seed(1)
Null_Model_Pianka <- get_null_model(data_matrix, method = "pianka", nsim = 100, parallel = FALSE)
Null_Model_Pianka$p_value

Null_Model_Czekanowski <- get_null_model(data_matrix, method = "czekanowski", nsim = 100, parallel = FALSE)
Null_Model_Czekanowski$p_value
```

### 4.5. Visualizing the null distribution

Use `temp_overlap_plot()` to compare the observed overlap (from `temp_overlap()`) against the simulated null distribution (from `get_null_model()`). The plot shows a histogram of simulated overlap values and a vertical line indicating the observed overlap.

```{r eval = FALSE}
temp_overlap_plot(Null_Model_Pianka)

temp_overlap_plot(Null_Model_Czekanowski)
```

## 5. Conclusion

This vignette demonstrates the key steps needed to analyze cyclical temporal data using `rosario`: downloading event-level detections, binning them into time intervals, building a species-by-time matrix, computing observed assemblage-wide overlap, and comparing observed overlap against a null model.

For more details on each function, see the package documentation and the examples in each function’s help page.