---
title: "Introduction to mispitools"
author: "Franco L. Marsico"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to mispitools}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Overview

The **mispitools** package provides tools for missing person identification
that combine genetic and non-genetic evidence within a Bayesian framework.
It implements likelihood ratio (LR) calculations for various types of
evidence commonly used in forensic identification.

### Key Features

- **Genetic LR simulations**: Simulate likelihood ratios from DNA profiles
  using pedigree-based kinship analysis
- **Non-genetic LR calculations**: Compute LRs for sex, age, hair color,
  and birthdate evidence
- **Decision analysis**: Evaluate error rates and optimal thresholds
- **Visualization**: Plot LR distributions, decision curves, and CPT heatmaps
- **Interactive apps**: Shiny apps for exploring parameters

## Installation

```{r eval=FALSE}
# Install from CRAN (when available)
install.packages("mispitools")

# Or install the development version from GitHub
# devtools::install_github("MarsicoFL/mispitools")
```

## The Bayesian Framework

In missing person identification, we compare two hypotheses:

- **H1**: The person of interest (POI) is the missing person (MP)
- **H2**: The POI is unrelated to the MP

The **likelihood ratio (LR)** quantifies the relative support for H1 vs H2:

$$LR = \frac{P(\text{Evidence} \mid H_1)}{P(\text{Evidence} \mid H_2)}$$

When the pieces of evidence are conditionally independent given each
hypothesis, their LRs can be combined by multiplication:

$$LR_{total} = LR_{genetic} \times LR_{sex} \times LR_{age} \times LR_{color}$$
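
As a minimal sketch of this arithmetic in base R, the chunk below multiplies
a set of LRs and applies Bayes' rule in odds form. The LR values and the
prior are made-up numbers chosen for illustration; they are not output from
mispitools.

```{r}
# Hypothetical LRs for one POI (illustrative values only)
lrs <- c(genetic = 50, sex = 1.9, age = 3, color = 2)

# Combined LR: the product of the individual LRs
# (valid only if the evidence types are conditionally independent)
lr_total <- prod(lrs)

# Assumed prior probability that the POI is the MP
prior <- 1 / 1000
prior_odds <- prior / (1 - prior)

# Posterior odds = prior odds x LR; convert odds back to a probability
posterior_odds <- prior_odds * lr_total
posterior <- posterior_odds / (1 + posterior_odds)

lr_total
posterior
```

Even a combined LR in the hundreds can leave the posterior probability
below 0.5 when the prior is small, which is why the prior matters in the
final decision.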

## Quick Start

### Non-Genetic Evidence

Calculate the LR for sex evidence when the missing person is female:

```{r}
library(mispitools)

# LR for sex evidence
# H1: MP is female, POI observed as female
# eps = probability of sex observation error
lr_sex(LR = TRUE, H = 1, eps = 0.05)
```
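
For intuition, consider a simple error model (an assumption made here for
illustration; see `?lr_sex` for the model the package actually implements).
Under H1 the POI's sex is recorded incorrectly with probability $\epsilon$,
and under H2 the POI is a random member of a population in which a fraction
$f$ shares the MP's sex. A matching observation then gives

$$LR_{sex} = \frac{1 - \epsilon}{f}$$

and a non-matching observation gives

$$LR_{sex} = \frac{\epsilon}{1 - f}$$

With $\epsilon = 0.05$ and $f = 0.5$, these evaluate to 1.9 and 0.1,
respectively.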

Calculate the LR for age evidence:

```{r}
# LR for age evidence
# MP age = 25, tolerance range = 5 years
# POI observed age falls within range
lr_age(LR = TRUE, H = 1, MPa = 25, MPr = 5, epa = 0.05)
```

### Conditional Probability Tables

The package uses Conditional Probability Tables (CPTs) to model evidence
under each hypothesis:

```{r}
# CPT under H2 (population hypothesis)
cpt_h2 <- cpt_population(
  propS = c(0.5, 0.5),  # 50% female, 50% male
  MPa = 30,             # MP age
  MPr = 5,              # Age range
  propC = c(0.3, 0.25, 0.2, 0.15, 0.1)  # Hair color proportions
)

# CPT under H1 (MP hypothesis)
cpt_h1 <- cpt_missing_person(
  MPs = 1,    # Female
  MPc = 2,    # Hair color 2
  eps = 0.05, # Sex error
  epa = 0.05, # Age error
  epc = error_matrix_hair()  # Hair color error matrix
)

# View dimensions
dim(cpt_h1)
```
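
To see how two such tables relate to an LR, here is a toy sketch in base R.
It assumes that the LR for each combination of observed traits is the H1
probability divided by the H2 probability, in line with the LR definition
above. The tables `toy_h1` and `toy_h2` and all of their values are
hypothetical and much smaller than the package's CPTs.

```{r}
# Toy tables with hypothetical values (not the package's CPTs):
# rows = observed sex, columns = observed age inside/outside the MP's range
toy_h1 <- outer(c(female = 0.95, male = 0.05),
                c(inside = 0.95, outside = 0.05))
toy_h2 <- outer(c(female = 0.5, male = 0.5),
                c(inside = 0.2, outside = 0.8))

# Each table is a probability distribution over the possible observations
stopifnot(isTRUE(all.equal(sum(toy_h1), 1)),
          isTRUE(all.equal(sum(toy_h2), 1)))

# Cell-wise LR: values above 1 support H1, values below 1 support H2
toy_lr <- toy_h1 / toy_h2
toy_lr
```

Building each table with `outer()` treats sex and age as independent within
a hypothesis, which is a simplification made for this sketch.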

### Visualizing CPTs

```{r fig.height=4}
# Visualize both CPTs and LR heatmap
plot_cpt(cpt_h2, cpt_h1)
```

## Available Functions

### LR Calculation Functions

| Function | Description |
|----------|-------------|
| `lr_sex()` | LR for biological sex evidence |
| `lr_age()` | LR for age evidence |
| `lr_hair_color()` | LR for hair color evidence |
| `lr_birthdate()` | LR for birthdate evidence (Dirichlet model) |
| `lr_pigmentation()` | LR for multiple pigmentation traits |

### Simulation Functions

| Function | Description |
|----------|-------------|
| `sim_lr_genetic()` | Simulate genetic LRs from pedigrees |
| `sim_lr_prelim()` | Simulate non-genetic LRs |
| `sim_posterior()` | Combine prior and LRs for posterior odds |
| `sim_poi_genetic()` | Generate random DNA profiles |
| `sim_poi_prelim()` | Generate random non-genetic data |

### Decision Analysis

| Function | Description |
|----------|-------------|
| `decision_threshold()` | Find optimal LR threshold |
| `threshold_rates()` | Calculate TPR/FPR at different thresholds |
| `lr_combine()` | Combine genetic and non-genetic LRs |
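
To illustrate what error rates at a threshold mean, the base R sketch below
computes the true positive rate (TPR) and false positive rate (FPR) from
two small vectors of log10(LR) values. The vectors and the helper
`toy_rates()` are hypothetical and written for this example; they do not
reproduce the implementation of `threshold_rates()`.

```{r}
# Hypothetical log10(LR) values, as if simulated under each hypothesis
log_lr_h1 <- c(-0.5, 0.8, 1.2, 2.1, 2.5, 3.0, 3.4, 4.2)      # POI is the MP
log_lr_h2 <- c(-3.1, -2.4, -2.0, -1.5, -0.9, -0.2, 0.4, 1.1) # POI unrelated

# Declare a match when log10(LR) exceeds the threshold
toy_rates <- function(threshold) {
  c(threshold = threshold,
    TPR = mean(log_lr_h1 > threshold),  # true MPs correctly declared a match
    FPR = mean(log_lr_h2 > threshold))  # unrelated POIs declared a match
}

# One row per candidate threshold
rates <- t(sapply(c(0, 1, 2), toy_rates))
rates
```

Raising the threshold lowers the FPR at the cost of a lower TPR; choosing a
threshold means weighing those two kinds of error against each other.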

### Visualization

| Function | Description |
|----------|-------------|
| `plot_lr_distribution()` | Plot LR distributions under H1/H2 |
| `plot_decision_curve()` | Plot ROC-like decision curves |
| `plot_cpt()` | Plot heatmaps of CPT values |

### Interactive Apps

Launch interactive Shiny applications:

```{r eval=FALSE}
# Basic CPT explorer
app_mispitools()

# Advanced LR comparison with ROC analysis
app_lr_comparison()
```

## Population Frequency Databases

The package includes allele frequency databases for genetic LR calculations:

```{r}
# Available databases
data(Argentina)
data(Europe)
data(USA)
data(Asia)
data(Austria)
data(BosniaHerz)
data(China)
data(Japan)

# View structure
dim(Argentina)
names(Argentina)[1:10]
```

## Next Steps

See the "Complete Workflow" vignette for a full example combining
genetic and non-genetic evidence in a missing person case.

## References

Marsico FL, Vigeland MD, Egeland T, Herrera Pinero F (2021). "Making
decisions in missing person identification cases with low statistical
power." *Forensic Science International: Genetics*, 52, 102519.
https://doi.org/10.1016/j.fsigen.2021.102519

Marsico FL, et al. (2023). "Likelihood ratios for non-genetic evidence
in missing person cases." *Forensic Science International: Genetics*,
66, 102891. https://doi.org/10.1016/j.fsigen.2023.102891