---
title: "Exploring Mammal Diversity in Peru with perumammals"
author: "Paul Efren Santos Andrade"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Exploring Mammal Diversity in Peru with perumammals}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(perumammals)
library(dplyr)
library(ggplot2)
library(tidyr)
```

## Introduction

Peru is widely recognized as one of the world's megadiverse countries, ranking second in the Neotropics and the Americas for mammalian diversity, with **573 species** across 13 orders (Pacheco et al., 2021).
The `perumammals` package provides direct access to this updated taxonomic dataset, enabling researchers, students, and conservation practitioners to explore, validate, and analyze mammals occurring in Peru.

### Dataset Background

The updated mammalian checklist by Pacheco et al. (2021) compiles:

* **573 species**, **223 genera**, **51 families**, **13 orders**
* **87 endemic species** (15.2% of total)
* Presence across 9 ecoregions, from coastal deserts to Amazonian rainforests
* Taxonomic updates that reflect recent systematic revisions

The package adopts this taxonomic framework and includes tools for validating species names and handling synonyms or misspelled entries via fuzzy matching.

## Loading the Dataset

```{r load-data}
data("peru_mammals")

glimpse(peru_mammals)

```

## Taxonomic Diversity

### Orders and Species Richness

```{r orders-diversity, fig.width=8, fig.height=6, out.width="100%", fig.align="center", fig.alt="Horizontal bar chart of the number of mammal species in Peru by order."}
order_summary <- pm_list_orders() |> 
  dplyr::arrange(dplyr::desc(n_species)) |> 
  dplyr::mutate(percentage = round(n_species / sum(n_species) * 100, 1))

order_summary

ggplot(order_summary, 
       aes(x = reorder(order, n_species),
           y = n_species)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = n_species),
            hjust = -0.2, size = 3) +
  coord_flip() +
  labs(
    title = "Mammalian Diversity in Peru by Order",
    subtitle = "Based on Pacheco et al. (2021)",
    x = "Order",
    y = "Number of Species"
  ) +
  theme_minimal()
```

**Rodentia** (194 species) and **Chiroptera** (189 species) together account for nearly **67%** of Peru’s mammalian diversity.
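
The combined share follows directly from the counts quoted above. As a minimal check in base R, with the figures entered by hand rather than read from the package:

```{r top-orders-share}
# Species counts quoted in the text (typed in for illustration,
# not retrieved from pm_list_orders())
total_species <- 573
top_orders <- c(Rodentia = 194, Chiroptera = 189)

# Share of all species contributed by each order, in percent
order_share <- round(top_orders / total_species * 100, 1)
order_share

# Combined share of the two orders, in percent
combined_share <- round(sum(top_orders) / total_species * 100, 1)
combined_share
```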

## Endemic Species

```{r endemics}
pm_list_endemic(include_rate = TRUE)
```

**Rodentia** contains the largest number of endemic species (56), representing **64.4%** of all mammalian endemics in Peru.
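
Both endemism percentages cited in this vignette can be reproduced from the published counts. This sketch uses hand-entered values and hypothetical variable names, so it does not depend on the structure of the package output:

```{r endemic-share}
# Counts quoted in the text (entered by hand, not read from the package)
total_species <- 573
total_endemic <- 87
endemic_rodents <- 56

# Endemics as a share of all species, and rodents as a share of all endemics
endemic_share <- round(total_endemic / total_species * 100, 1)
rodent_endemic_share <- round(endemic_rodents / total_endemic * 100, 1)

c(endemic = endemic_share, rodent_endemics = rodent_endemic_share)
```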

## Geographic Distribution

### Ecoregional Diversity

```{r ecoregions, fig.width=10, fig.height=12, out.width="100%", fig.align="center", fig.alt="Waffle charts showing the percentage of endemic mammal species in each ecoregion of Peru."}
# Get ecoregion data
ecoregion_diversity <- pm_list_ecoregions(include_endemic = TRUE)

# Prepare data for the waffle-style chart
waffle_data <- ecoregion_diversity |> 
  select(ecoregion_label, pct_endemic) |> 
  mutate(
    total_pct = paste0(round(pct_endemic, 1), "% endemic"),
    ecoregion_full = paste0(ecoregion_label, "\n", total_pct)
  ) |> 
  rowwise() |> 
  mutate(
    grid_data = list(
      expand.grid(x = 1:10, y = 1:10) |> 
        mutate(
          id = row_number(),
          type = ifelse(id <= round(pct_endemic), "Endemic", "Common")
        )
    )
  ) |> 
  unnest(grid_data)

# Build the plot
ggplot(waffle_data, aes(x = x, y = y, fill = type)) +
  geom_tile(color = "white", linewidth = 0.4) +
  facet_wrap(~ecoregion_full, ncol = 2) +
  scale_fill_manual(
    name = NULL,
    values = c("Common" = "#CBD5E1", "Endemic" = "#E07A5F"),
    labels = c("Common species", "Endemic species")
  ) +
  coord_equal() +
  labs(
    title = "Mammal Endemism by Ecoregion in Peru",
    subtitle = "Each square represents 1% of species. The percentage indicates the proportion of endemic species"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(
      face = "bold", 
      size = 16, 
      hjust = 0.5, 
      margin = margin(b = 5)
    ),
    plot.subtitle = element_text(
      hjust = 0.5, 
      size = 10, 
      color = "gray30", 
      margin = margin(b = 20)
    ),
    legend.position = "bottom",
    legend.text = element_text(size = 10),
    legend.key.size = unit(0.7, "cm"),
    legend.margin = margin(t = 15),
    strip.text = element_text(
      size = 9,
      face = "bold", 
      lineheight = 1.1,
      margin = margin(t = 5, b = 5)  # Increase margins
    ),
    strip.clip = "off",  # Allow text to extend beyond the panel
    plot.background = element_rect(fill = "white", color = NA),
    panel.spacing.x = unit(2, "lines"),  # Increase horizontal spacing
    panel.spacing.y = unit(2.5, "lines"),  # Increase vertical spacing
    plot.margin = margin(15, 15, 15, 15)
  )

```

The **Selva Baja** contains the highest species richness, while the **Yungas** harbor the greatest number of endemic mammals.
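
In the waffle chart, each ecoregion's endemism percentage is rounded to a whole number of cells on a 10 x 10 grid. The allocation for a single panel can be sketched in base R as follows; the value `12.4` is a made-up percentage used only for illustration, not a figure from the dataset:

```{r waffle-sketch}
# Hypothetical endemism percentage, used only to illustrate the allocation
pct_example <- 12.4

# 10 x 10 grid: 100 cells, each standing for 1% of species
grid_example <- expand.grid(x = 1:10, y = 1:10)
grid_example$id <- seq_len(nrow(grid_example))

# The first round(pct_example) cells are flagged as endemic, the rest are not
grid_example$type <- ifelse(
  grid_example$id <= round(pct_example),
  "Endemic",
  "Common"
)

table(grid_example$type)
```

Because of the rounding, percentages below 0.5 produce no endemic cells, which is worth keeping in mind when reading panels for ecoregions with very low endemism.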

## Name Validation and Fuzzy Matching

```{r fuzzy-matching}
species_list <- c(
  "Tremarctos ornatos",
  "Leopardus pardalis",
  "Odocoileus virginanus",
  "Lagothrix flavicauda",
  "Alouatta seniculus",
  "Puma concolor"
)

validated <- validate_peru_mammals(
  species_list,
  quiet = TRUE
)

validated %>%
  select(Orig.Name, Matched.Name, Match.Level, valid_rank)
```

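Two of the inputs above, `Tremarctos ornatos` and `Odocoileus virginanus`, are misspelled. The fuzzy matching tool corrects common misspellings like these and returns the standardized scientific name in `Matched.Name` while preserving the original input in `Orig.Name`.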
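
To give a feel for the general idea behind fuzzy matching, the sketch below uses edit distance from base R (`utils::adist()`) against a hypothetical three-name reference list. It illustrates the technique only; the matching algorithm and thresholds used internally by `validate_peru_mammals()` may differ.

```{r fuzzy-sketch}
# Hypothetical mini reference list (the package uses its own full checklist)
reference <- c("Tremarctos ornatus", "Odocoileus virginianus", "Puma concolor")
queries <- c("Tremarctos ornatos", "Odocoileus virginanus", "Puma concolor")

# Levenshtein edit distance between every query (rows) and reference (columns)
d <- utils::adist(queries, reference)

# For each query, pick the closest reference name and record its distance
best <- apply(d, 1, which.min)
fuzzy_sketch <- data.frame(
  query = queries,
  closest = reference[best],
  distance = d[cbind(seq_along(queries), best)]
)

fuzzy_sketch
```

A distance of 0 marks an exact match, while small positive distances flag likely typos that a fuzzy matcher can resolve.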

## Practical Applications

### Conservation Assessment

```{r conservation-example}
endemic_primates <- peru_mammals %>%
  filter(order == "Primates", endemic == TRUE) %>%
  select(genus, species, common_name)

endemic_primates
```

### Research Data Cleaning

```{r data-cleaning-example}
field_data <- data.frame(
  site = rep(c("Site_A", "Site_B", "Site_C"), each = 3),
  species = c(
    "Tremarctos ornatos", "Mazama rufina", "Odocoileus virginianus",
    "Leopardus pardalis", "Puma concolor", "Leopardus jacobita",
    "Lagothrix flavicauda", "Ateles belzebuth", "Alouatta seniculus"
  ),
  count = c(2, 1, 3, 1, 2, 1, 5, 3, 2)
)

cleaned_data <- field_data %>%
  left_join(
    validate_peru_mammals(field_data$species),
    by = c("species" = "Orig.Name")
  ) %>%
  select(site, original = species, validated = Matched.Name, count, Match.Level)

cleaned_data
```