--- title: "Neo-normal Distribution Family for MCMC Models in 'bnrm', 'brms', and 'Stan' Code" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Neo-normal Distribution Family for MCMC Models in 'bnrm', 'brms', and 'Stan' Code} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, warning = FALSE, message = FALSE ) ``` ## Introduction The **neodistr** package provides neo-normal distribution functions for Bayesian modeling. This vignette explains how to use it with `bnrm`, `brms`, and `Stan` code, and demonstrates how to fit Bayesian models using the neo-normal distribution. **Note**: This vignette contains computationally intensive examples. To keep the vignette lightweight, we use pre-computed results for the examples # tambahkan penjelasan bahwa iterasi hanay 2000 supaya tdak besar dalam bahasa inggris . The initial examples demonstrate the workflow using pre-computed data and results, while subsequent sections show the code needed to run the models interactively. # Installation Install **neodistr** from CRAN or GitHub: ```{r eval=FALSE, echo=TRUE} # From CRAN install.packages("neodistr") # From GitHub #devtools::install_github("madsyair/neodistr") ``` ## Load Required Packages ```{r,message=FALSE, echo=TRUE,eval=TRUE} library(rstan) library(brms) library(bayesplot) library(loo) library(neodistr) # Set Stan options for faster computation options(mc.cores = min(2, parallel::detectCores())) rstan_options(auto_write = TRUE) ``` ```{r load_packages, message=FALSE,echo=FALSE} library(neodistr) # Load other packages conditionally library(rstan) library(brms) library(bayesplot) library(loo) # Set Stan options for faster computation options(mc.cores = min(2, parallel::detectCores())) rstan_options(auto_write = TRUE) ``` ## A. Parameter Estimation with the FOSSEP Distribution ### Data Simulation and Exploration We simulate a right‐skewed dataset of 100 observations from the FOSSEP distribution ($\mu=0$, $\sigma=1$, $\alpha=2$, and $\beta=2$) and perform basic exploratory analysis. In this vignette, pre-computed data are loaded to streamline the presentation. Pre-computed data are generated from the FOSSEP distribution using the `rfossep` function, which is part of the **neodistr** package. The simulated data is stored in an RDS file for easy access. The simulated data can be generated using the following code. The exploratory analysis includes summary statistics and a histogram of the simulated data. ```{r, eval=FALSE,echo=TRUE} #install.packages("R.utils") #library(R.utils) set.seed(400) precomp_dir <- system.file("data_precomputed", package = "neodistr") # Simulate data from FOSSEP distribution y <- rfossep(50, 0, 1, 2, 2) # Simulated data from fossep distribution df <- data.frame(y) #saveRDS(y, file.path(precomp_dir, "fossep_data.rds")) # Save pre-computed data #gzip(file.path(precomp_dir, "fossep_data.rds")) # Basic exploration summary(y) hist(y, main = "Simulated FOSSEP Data", xlab = "Y", col = "skyblue", breaks = 20) ``` ```{r,eval=TRUE,echo=FALSE} # this data simulation generated from the fossep distribution #y <- rfossep(100, 0, 1, 2, 2) # Simulated data from fossep distribution #df <- data.frame(y) precomp_dir <- system.file("data_precomputed", package = "neodistr") y <- readRDS(file.path(precomp_dir, "fossep_data.rds")) # Load pre-computed data df <- data.frame(y) # Basic exploration summary(y) hist(y, main = "Simulated FOSSEP Data", xlab = "Y", col = "skyblue", breaks = 20) ``` ### Model Specification We model `y` with an intercept-only formula. Priors are informed by the simulation parameters: * $\alpha$ ~ lognormal(0, 0.5) * $\beta$ ~ lognormal(log(2), 0.25) * $\sigma$ ∼ half-normal(0, 1) * $\mu$ ∼ normal(0, 1) ```{r} # Define formula for parameter estimation formula <- brms::bf(y ~ 1) # Choose priors prior <- c( set_prior("lognormal(0,0.5)", class = "alpha"), set_prior("lognormal(log(2),0.5)", class = "beta"), set_prior("normal(0,1)", class = "sigma"), set_prior("normal(0,1)", class = "Intercept") # mu ) ``` ### Prior Predictive Check The prior predictive check is performed to assess the plausibility of the priors by simulating data from the specified priors. This step helps to ensure that the priors are reasonable and that they can generate data similar to the observed data. To perform the prior predictive check, the `bnrm` function is used with the `sample_prior = "only"` argument. This will generate simulated datasets from the priors without fitting the model to the data. ```{r prior_check_fossep, eval=FALSE} # Run prior predictive check (computationally intensive) ppc_fit_fossep <- bnrm( formula = formula, data = df, family = fossep(), prior = prior, sample_prior = "only", chains = 2, # Reduced for demo cores = 1, # Single core for consistency iter = 2000, # Reduced iterations warmup = 1000, refresh = 0 # Suppress output ) ``` ```{r, prior_check_fossep_pre,eval=FALSE} # Show prior predictive check plot ppc_plot_fossep <- pp_check(ppc_fit_fossep, type = "dens_overlay") #saveRDS(ppc_plot_fossep, file.path(precomp_dir, "ppc_plot_fossep.rds")) # Save pre-computed plot file.path(precomp_dir, "fossep_data.rds") print(ppc_plot_fossep) ``` ```{r prior_check_pre_computed, eval=TRUE,echo=FALSE} # Load pre-computed prior predictive check results ppc_plot_fossep <- readRDS(file.path(precomp_dir, "ppc_plot_fossep.rds")) # Load pre-computed plot print(ppc_plot_fossep) # Show prior predictive check plot ``` The plot of the prior predictive check shows the distribution of the simulated data from the priors overlaid the observed data. This allows us to visually assess whether the priors are reasonable and can generate data similar to the observed data. ### Model Fitting We fit the model using the `bnrm` function, specifying the formula, data, family, and priors defined earlier. The fitting process uses MCMC sampling to estimate the posterior distributions of the parameters. ```{r fit_fossep_bnrm, eval=FALSE} # Fit the model (computationally intensive) fit_fossep_bnrm <- bnrm( formula = formula, data = df, family = fossep(), prior = prior, chains = 2, cores = 1, iter = 2000, warmup = 1000, seed = 123, refresh = 0 ) #saveRDS(fit_fossep_bnrm, file.path(precomp_dir, "fit_fossep_bnrm.rds")) # Save pre-computed fit ``` After fitting the model, we can summarize the results to see the estimated parameters and their posterior distributions. The summary will include the mean, standard deviation, and credible intervals for each parameter. The convergence diagnostics will also be checked to ensure that the MCMC chains have mixed well and converged to the posterior distribution. Posterior predictive checks will be performed to assess the model fit and the ability of the model to predict new data. ```{r show_fossep_results, eval=FALSE} summary(fit_fossep_bnrm) # Convergence diagnostics trace_plot <- mcmc_trace(fit_fossep_bnrm) print(trace_plot) # Posterior predictive check pp_plot <- pp_check(fit_fossep_bnrm) #saveRDS(pp_plot, file.path(precomp_dir, "pp_plot_fossep.rds")) # Save pre-computed plot print(pp_plot) ``` ```{r show_fossep_results_pre, eval=TRUE,echo=FALSE} # precomputed results fit_fossep_bnrm <- readRDS(file.path(precomp_dir, "fit_fossep_bnrm.rds")) # Load pre-computed fit summary(fit_fossep_bnrm) # Convergence diagnostics trace_plot <- mcmc_trace(fit_fossep_bnrm, facet_args = list(ncol = 2),pars = c("alpha", "beta", "sigma", "Intercept")) print(trace_plot) # Read pre-computed plot # Posterior predictive check pp_plot<-readRDS(file.path(precomp_dir,"pp_plot_fossep.rds")) # Load pre-computed plot print(pp_plot) ``` ## B. Regression with the JSEP Distribution ### Data Preparation In the following example, we demonstrate how to perform regression modeling using the JSEP distribution. The code provided below shows how to set up the regression model, specify the priors, and fit the model using the `bnrm` function. ### Data Simulation and Exploration ```{r,eval=FALSE} set.seed(123) x <- runif(50) e <- rjsep(50, 0, 0.1, 0.8,2) # Simulated errors from JSEP distribution y <- 0.5 + 0.8*x + e xy<-cbind(y,x) df_reg <- data.frame(y, x) # Basic exploration plot(x, y, main = "Regression Data", pch = 19, col = "steelblue") #save pre-computed data #saveRDS(xy, file.path(precomp_dir,"jsep_data.rds")) # Save pre-computed data ``` ```{r,eval=TRUE, echo=FALSE} xy <- readRDS(file.path(precomp_dir,"jsep_data.rds")) # Load pre-computed data df_reg<-data.frame(xy) x<- df_reg$x y<- df_reg$y # Basic exploration plot(x, y, main = "Regression Data", pch = 19, col = "steelblue") ``` ### Model Specification and Fitting We specify the regression model using the `brms` formula interface. The regression formula is defined as `y ~ x`, where `y` is the response variable and `x` is the predictor variable. The priors for the regression parameters are set as follows: Prior of $\alpha$ is set to log-normal(0, 0.5), $\beta$ is set to log-normal(1, 0.5), $\sigma$ is set to half-normal(0, 1), and the intercept and slope are set to normal(0, 1). The code syntax for defining the model and these priors is shown below. ```{r,eval=TRUE} # Define regression formula formula_reg <- brms::bf(y ~ x) # Priors for regression prior_reg <- c( set_prior("lognormal(log(2),0.25)", class = "alpha"), set_prior("lognormal(log(2),0.25)", class = "beta"), set_prior("normal(0,1)", class = "sigma"), set_prior("normal(0,1)", class = "Intercept"), set_prior("normal(0,1)", class = "b") ) ``` ### Prior Predictive Check We evaluate our chosen priors via a prior predictive check, which entails drawing simulated datasets solely from those priors to verify they can plausibly reproduce the characteristics of the actual data. In practice, this is done by calling `bnrm(..., sample_prior = "only")`, which generates the prior‐based simulations without fitting the model to the observed data. ```{r pc_check_jsep_reg, eval=FALSE} # Fit regression model fit_ppc_jsep <- bnrm( formula = formula_reg, data = df_reg, family = jsep(), prior = prior_reg, chains = 2, cores = 1, iter = 2000, sample_prior = "only", # Prior predictive check warmup = 1000, seed = 123, refresh = 0 ) # prior predictive check fir jesp ppc_plot_jsep<- pp_check(fit_ppc_jsep, type = "dens_overlay", nsamples = 200) #saveRDS(ppc_plot_jsep, file.path(precomp_dir,"ppc_plot_jsep.rds")) # Save pre-computed plot print(ppc_plot_jsep) # Show prior predictive check plot ``` ```{r ppc_plot_jsep, eval=TRUE,echo=FALSE,message=FALSE} ppc_plot_jsep <- readRDS(file.path(precomp_dir,"ppc_plot_jsep.rds")) # Load pre-computed plot print(ppc_plot_jsep) # Show prior predictive check plot ``` The prior predictive check plot overlays the data simulated from our priors onto the real observations, letting us visually judge whether those priors can plausibly reproduce the distribution of the actual data. ### Model Fitting We fit the regression model using the `bnrm` function, specifying the formula, data, family, and priors defined earlier. The fitting process uses MCMC sampling to estimate the posterior distributions of the parameters. ```{r fit_jsep_bnrm, eval=FALSE} fit_jsep_bnrm <- bnrm( formula = formula_reg, data = df_reg, family = jsep(), prior = prior_reg, chains = 2, cores = 1, iter = 2000, warmup = 1000, seed = 123, refresh = 0 ) #saveRDS(fit_jsep_bnrm, file.path(precomp_dir, "fit_jsep_bnrm.rds")) # Save pre-computed fit ``` After fitting the model, we can summarize the results to see the estimated parameters and their posterior distributions. The summary will include the mean, standard deviation, and credible intervals for each parameter. The convergence diagnostics will also be checked to ensure that the MCMC chains have mixed well and converged to the posterior distribution. Posterior predictive checks will be performed to assess the model fit and the ability of the model to predict new data. ```{r show_jsep_results, eval=FALSE} summary(fit_jsep_bnrm) # Convergence diagnostics trace_plot <- mcmc_trace(fit_jsep_bnrm, facet_args = list(ncol = 2),pars = c("alpha", "beta", "sigma", "Intercept","b_x")) print(trace_plot) # Posterior predictive check pp_plot <- pp_check(fit_jsep_bnrm) #saveRDS(pp_plot, file.path(precomp_dir,"pp_plot_jsep.rds")) # Save pre-computed plot # Model evaluation loo_result <- loo(fit_jsep_bnrm, moment_match = TRUE) #saveRDS(loo_result, file.path(precomp_dir,"loo_result_jsep.rds")) # Save pre-computed loo result print(loo_result) ``` ```{r show_jsep_results_pre, eval=TRUE, echo=FALSE} fit_jsep_bnrm <- readRDS(file.path(precomp_dir,"fit_jsep_bnrm.rds")) # Load pre-computed fit summary(fit_jsep_bnrm) # Convergence diagnostics trace_plot <- mcmc_trace(fit_jsep_bnrm, facet_args = list(ncol = 2),pars = c("alpha", "beta", "sigma", "Intercept","b_x")) print(trace_plot) # Posterior predictive check pp_plot <- readRDS(file.path(precomp_dir,"pp_plot_jsep.rds")) # Load pre-computed print(pp_plot) # Model evaluation loo_result <- readRDS(file.path(precomp_dir,"loo_result_jsep.rds")) # Load pre-computed loo result print(loo_result) ``` ## Usage Example with `brms` code The `brms` package provides a high-level interface for Bayesian regression modeling using Stan. It allows users to specify models using a formula interface similar to that of `lm` or `glm`, while also supporting custom families. The neo-normal distributions, such as FOSSEP and JSEP, can be used with `brms` by defining custom families. The `brms` interface requires custom family definitions. Fossep and JSEP distributions can be defined using the `brms_custom_family` function. The following code demonstrates how to set up the custom family for FOSSEP distributions, fit the model, and summarize the results. The following code demonstrates how to set up the custom family for FOSSEP distributions, fit the model, and summarize the results. The `brms_custom_family` function is used to define the custom family, and then the `brm` function is used to fit the model. The summary of the fitted model is displayed at the end. ```{r brms_demo, eval=FALSE} # Define custom family for brms fossep_custom <- brms_custom_family("fossep") # Fit model with brms fit_brms_demo <- brm( y ~ 1, data = df, family = fossep_custom$custom_family, stanvars = fossep_custom$stanvars_family, chains = 2, cores = 2, iter = 4000, warmup = 1000, seed = 123, refresh = 0 ) summary(fit_brms_demo) ``` ## Usage Example with Stan code The `stanf_fossep` function generates the Stan code for the FOSSEP distribution. The following code demonstrates how to prepare the data, define the Stan model, and fit it using the `stan` function. The Stan model includes the custom family function and the data specification. ```{r stan_demo, eval=FALSE} # Prepare data for Stan stan_data <- list( n = length(y), y = y ) # Simple Stan model (abbreviated for vignette) stan_code <- paste0( "functions { ", stanf_fossep(vectorize = TRUE), " }", "data { int n; vector[n] y; }", "parameters { real mu; real sigma; real alpha; real beta; }", "model { y ~ fossep(rep_vector(mu, n), sigma, alpha, beta); }" ) # Fit Stan model fit_stan_demo <- stan( model_code = stan_code, data = stan_data, chains = 2, cores = 1, iter = 1000, warmup = 500, refresh = 0 ) summary(fit_stan_demo, pars = c("mu", "sigma", "alpha", "beta")) ``` ## Practical Recommendations **For routine analysis:** - Use `bnrm()` for seamless integration with neo-normal distributions - Start with `iter = 2000, warmup = 1000` for initial exploration - Increase to `iter = 4000, warmup = 2000` for final results **For custom implementations:** - Use `brms_custom_family()` for maximum flexibility with brms - Direct Stan code offers most control but requires more setup **Performance tips:** - Use `cores = parallel::detectCores()` for parallel processing - Set `refresh = 0` to suppress output in production - Consider `adapt_delta = 0.95` if you encounter divergent transitions For production use, we recommend starting with the `bnrm` interface for its simplicity, then moving to `brms` or direct Stan code for more complex modeling needs.