--- title: "Getting Started with smsncut" author: "Renato de Paula, Helena Mouriño, Tiago Dias Domingues" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with smsncut} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4 ) library(smsncut) ``` ## Overview `smsncut` implements the parametric decision-theoretic framework for optimal diagnostic cutoff selection under **scale mixtures of skew-normal (SMSN)** distributions, described in: > de Paula, R., Mouriño, H., and Dias Domingues, T. (2026). > *Parametric ROC analysis and optimal cutoff selection under scale mixtures of > skew-normal distributions: a decision-theoretic framework with asymptotic > inference.* arXiv:2605.07829. . The package supports the **skew-normal (SN)** and **skew-*t* (ST)** families and provides tools for model fitting, ROC analysis, optimal cutoff estimation, asymptotic inference, and Monte Carlo validation. --- ## 1. Unconstrained parametrisation All functions use the unconstrained vector: | Family | `theta` | Natural parameters | |--------|-------|---------------------| | SN | `c(xi, log(omega), alpha)` | location, scale, skewness | | ST | `c(xi, log(omega), alpha, log(nu-2))` | + degrees of freedom | ```{r parametrisation} theta_sn <- c(0, log(1), 1.5) smsn_density(c(-1, 0, 1, 2), theta_sn, "SN") theta_st <- c(0, log(1), 1.5, log(8 - 2)) smsn_density(c(-1, 0, 1, 2), theta_st, "ST") ``` --- ## 2. Model fitting `smsn_fit` uses BFGS for the MLE and **Richardson extrapolation** (via `numDeriv::hessian`) for the observed Fisher information matrix, which is critical for accurate variance estimation (de Paula et al., 2026, Section 6.2). ```{r fitting} set.seed(42) x0 <- sn::rst(300, xi = 0, omega = 1, alpha = 1, nu = 8) x1 <- sn::rsn(300, xi = 2, omega = 1, alpha = 1.5) fit0 <- smsn_fit(x0, family = "ST") fit1 <- smsn_fit(x1, family = "SN") cat("Group 0 (ST) MLE:", round(fit0$par, 3), "\n") cat("Group 1 (SN) MLE:", round(fit1$par, 3), "\n") ``` --- ## 3. Admissible interval and boundary conditions ```{r interval} ab <- cutoff_admissible_interval( fit0$par, fit1$par, family0 = "ST", family1 = "SN" ) cat("Interval: [", round(ab["a"], 3), ",", round(ab["b"], 3), "]\n") bc <- cutoff_boundary_check( ab["a"], ab["b"], fit0$par, fit1$par, "ST", "SN", lambda0 = 1, lambda1 = 3, pi0 = 0.9, pi1 = 0.1 ) cat("Boundary conditions satisfied:", bc$satisfied, "\n") ``` --- ## 4. Optimal cutoff ```{r cutoff} res <- cutoff_optimal( fit0$par, fit1$par, "ST", "SN", lambda0 = 1, lambda1 = 3, pi0 = 0.9, pi1 = 0.1, interval = ab ) cat("Optimal cutoff c*:", round(res$cutoff, 4), "\n") cat("Risk at c*: ", round(res$risk, 4), "\n") cat("Youden cutoff: ", round(cutoff_youden(fit0$par, fit1$par, "ST", "SN", interval = ab), 4), "\n") ``` --- ## 5. Local identifiability diagnostic ```{r diagnostic} diag_val <- cutoff_identifiability( res$cutoff, fit0$par, fit1$par, "ST", "SN", lambda0 = 1, lambda1 = 3, pi0 = 0.9, pi1 = 0.1 ) cat("|d phi/dc| at c* =", round(diag_val, 4), "\n") ``` --- ## 6. Asymptotic variance and Wald confidence interval ```{r inference} n0 <- length(x0); n1 <- length(x1) V_hat <- cutoff_variance( res$cutoff, fit0$par, fit1$par, "ST", "SN", lambda0 = 1, lambda1 = 3, pi0 = 0.9, pi1 = 0.1, hess0 = fit0$hessian, hess1 = fit1$hessian, n0 = n0, n1 = n1 ) ci <- cutoff_ci(res$cutoff, V_hat, n = n0 + n1) cat("95% Wald CI: [", round(ci["lower"], 4), ",", round(ci["upper"], 4), "]\n") ``` --- ## 7. ROC curve and AUC ```{r roc} auc_val <- smsn_auc(fit0$par, fit1$par, "ST", "SN") cat("Estimated AUC:", round(auc_val, 4), "\n") r_grid <- seq(0.01, 0.99, by = 0.01) tpr <- smsn_roc(r_grid, fit0$par, fit1$par, "ST", "SN") plot(r_grid, tpr, type = "l", lwd = 2, xlab = "False positive rate", ylab = "True positive rate", main = paste0("Parametric ROC (AUC = ", round(auc_val, 3), ")")) abline(0, 1, lty = 2, col = "grey60") ``` --- ## 8. Reproducing Table 3 (Scenario SN1, n = 200) ```{r simulation, eval=FALSE} set.seed(123) result <- mc_simulate_scenario( theta0_true = c(0, log(1), 1), theta1_true = c(2, log(1), 1.5), family0 = "SN", family1 = "SN", c_true = 1.6101, lambda0 = 1, lambda1 = 1, pi0 = 0.5, pi1 = 0.5, n0 = 100L, n1 = 100L, B = 2000L # use B=2000 for paper results ) cat("Success rate:", result$success_rate, "\n") cat("Mean c_hat: ", round(result$mean_c_hat, 4), "\n") cat("Bias: ", round(result$bias, 4), "\n") cat("RMSE: ", round(result$rmse, 4), "\n") cat("Coverage: ", round(result$coverage, 3), "\n") cat("Ratio: ", round(result$ratio, 3), "\n") cat("Bracket rate:", round(result$bracket_rate, 3), "\n") ```