---
title: "Getting Started with smsncut"
author: "Renato de Paula, Helena Mouriño, Tiago Dias Domingues"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with smsncut}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  fig.width  = 6,
  fig.height = 4
)
library(smsncut)
```

## Overview

`smsncut` implements the parametric decision-theoretic framework for optimal
diagnostic cutoff selection under **scale mixtures of skew-normal (SMSN)**
distributions, described in:

> de Paula, R., Mouriño, H., and Dias Domingues, T. (2026).
> *Parametric ROC analysis and optimal cutoff selection under scale mixtures of
> skew-normal distributions: a decision-theoretic framework with asymptotic
> inference.* arXiv:2605.07829. <https://doi.org/10.48550/arXiv.2605.07829>.

The package supports the **skew-normal (SN)** and **skew-*t* (ST)** families
and provides tools for model fitting, ROC analysis, optimal cutoff estimation,
asymptotic inference, and Monte Carlo validation.

---

## 1. Unconstrained parametrisation

All functions use the unconstrained vector:

| Family | `theta` | Natural parameters |
|--------|-------|---------------------|
| SN | `c(xi, log(omega), alpha)` | location, scale, skewness |
| ST | `c(xi, log(omega), alpha, log(nu-2))` | + degrees of freedom |

```{r parametrisation}
theta_sn <- c(0, log(1), 1.5)
smsn_density(c(-1, 0, 1, 2), theta_sn, "SN")

theta_st <- c(0, log(1), 1.5, log(8 - 2))
smsn_density(c(-1, 0, 1, 2), theta_st, "ST")
```

---

## 2. Model fitting

`smsn_fit` uses BFGS for the MLE and **Richardson extrapolation** (via
`numDeriv::hessian`) for the observed Fisher information matrix, which is
critical for accurate variance estimation (de Paula et al., 2026, Section 6.2).

```{r fitting}
set.seed(42)
x0 <- sn::rst(300, xi = 0, omega = 1, alpha = 1, nu = 8)
x1 <- sn::rsn(300, xi = 2, omega = 1, alpha = 1.5)

fit0 <- smsn_fit(x0, family = "ST")
fit1 <- smsn_fit(x1, family = "SN")
cat("Group 0 (ST) MLE:", round(fit0$par, 3), "\n")
cat("Group 1 (SN) MLE:", round(fit1$par, 3), "\n")
```

---

## 3. Admissible interval and boundary conditions

```{r interval}
ab <- cutoff_admissible_interval(
  fit0$par, fit1$par, family0 = "ST", family1 = "SN"
)
cat("Interval: [", round(ab["a"], 3), ",", round(ab["b"], 3), "]\n")

bc <- cutoff_boundary_check(
  ab["a"], ab["b"], fit0$par, fit1$par, "ST", "SN",
  lambda0 = 1, lambda1 = 3, pi0 = 0.9, pi1 = 0.1
)
cat("Boundary conditions satisfied:", bc$satisfied, "\n")
```

---

## 4. Optimal cutoff

```{r cutoff}
res <- cutoff_optimal(
  fit0$par, fit1$par, "ST", "SN",
  lambda0 = 1, lambda1 = 3, pi0 = 0.9, pi1 = 0.1,
  interval = ab
)
cat("Optimal cutoff c*:", round(res$cutoff, 4), "\n")
cat("Risk at c*:       ", round(res$risk,   4), "\n")
cat("Youden cutoff:    ", round(cutoff_youden(fit0$par, fit1$par, "ST", "SN",
                                               interval = ab), 4), "\n")
```

---

## 5. Local identifiability diagnostic

```{r diagnostic}
diag_val <- cutoff_identifiability(
  res$cutoff, fit0$par, fit1$par, "ST", "SN",
  lambda0 = 1, lambda1 = 3, pi0 = 0.9, pi1 = 0.1
)
cat("|d phi/dc| at c* =", round(diag_val, 4), "\n")
```

---

## 6. Asymptotic variance and Wald confidence interval

```{r inference}
n0 <- length(x0); n1 <- length(x1)

V_hat <- cutoff_variance(
  res$cutoff, fit0$par, fit1$par, "ST", "SN",
  lambda0 = 1, lambda1 = 3, pi0 = 0.9, pi1 = 0.1,
  hess0 = fit0$hessian, hess1 = fit1$hessian, n0 = n0, n1 = n1
)

ci <- cutoff_ci(res$cutoff, V_hat, n = n0 + n1)
cat("95% Wald CI: [", round(ci["lower"], 4), ",", round(ci["upper"], 4), "]\n")
```

---

## 7. ROC curve and AUC

```{r roc}
auc_val <- smsn_auc(fit0$par, fit1$par, "ST", "SN")
cat("Estimated AUC:", round(auc_val, 4), "\n")

r_grid <- seq(0.01, 0.99, by = 0.01)
tpr    <- smsn_roc(r_grid, fit0$par, fit1$par, "ST", "SN")
plot(r_grid, tpr, type = "l", lwd = 2,
     xlab = "False positive rate", ylab = "True positive rate",
     main = paste0("Parametric ROC (AUC = ", round(auc_val, 3), ")"))
abline(0, 1, lty = 2, col = "grey60")
```

---

## 8. Reproducing Table 3 (Scenario SN1, n = 200)

```{r simulation, eval=FALSE}
set.seed(123)
result <- mc_simulate_scenario(
  theta0_true = c(0, log(1), 1),  theta1_true = c(2, log(1), 1.5),
  family0 = "SN", family1 = "SN", c_true = 1.6101,
  lambda0 = 1, lambda1 = 1, pi0 = 0.5, pi1 = 0.5,
  n0 = 100L, n1 = 100L, B = 2000L  # use B=2000 for paper results
)

cat("Success rate:", result$success_rate, "\n")
cat("Mean c_hat:  ", round(result$mean_c_hat, 4), "\n")
cat("Bias:        ", round(result$bias,       4), "\n")
cat("RMSE:        ", round(result$rmse,       4), "\n")
cat("Coverage:    ", round(result$coverage,   3), "\n")  
cat("Ratio:       ", round(result$ratio,      3), "\n")  
cat("Bracket rate:", round(result$bracket_rate, 3), "\n") 
```