---
title: "Machine Learning with AddiVortes: A Bayesian Alternative to BART"
author: "John Paul Gosling and Adam Stone"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Machine Learning with AddiVortes: A Bayesian Alternative to BART}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette provides a comprehensive example of how to use the `AddiVortes` package for machine learning regression tasks. AddiVortes offers a Bayesian alternative to BART (Bayesian Additive Regression Trees), using Voronoi tessellations for spatial partitioning. We will walk through loading data, training a regression model, and making predictions on a test set using this powerful non-parametric Bayesian approach.

### 1. Loading the Package and Data

First, we load the `AddiVortes` package. For this example, we will use the well-known Boston Housing dataset.

```{r, message=FALSE, warning=FALSE}
# Load the package
require(AddiVortes)

# Load the Boston Housing dataset from a URL
Boston <- read.csv(paste0("https://raw.githubusercontent.com/anonymous2738/",
                          "AddiVortesAlgorithm/DataSets/BostonHousing_Data.csv"))


# Separate predictors (X) and the response variable (Y)
X_Boston <- as.matrix(Boston[, 2:14])
Y_Boston <- as.numeric(as.matrix(Boston[, 15]))

# Clean up the environment
rm(Boston)
```

### 2. Preparing the Data

To evaluate the model's performance, we need to split the data into a training set and a testing set. We will use a standard 5/6 split for training and 1/6 for testing.

```{r}
n <- length(Y_Boston)

# Set a seed for reproducibility
set.seed(1025)

# Create a training set containing 5/6 of the data
TrainSet <- sort(sample.int(n, 5 * n / 6))

# The remaining data will be our test set
TestSet <- setdiff(1:n, TrainSet)
```

### 3. Training the Model

Now we can run the main `AddiVortes` function on our training data. We will specify several parameters for the algorithm, such as the number of iterations and trees.

```{r}
# Run the AddiVortes algorithm on the training data
results <- AddiVortes(y = Y_Boston[TrainSet], 
                      x = X_Boston[TrainSet, ],
                      m = 200, 
                      totalMCMCIter = 2000, 
                      mcmcBurnIn = 200, 
                      nu = 6, 
                      q = 0.85, 
                      k = 3, 
                      sd = 0.8, 
                      Omega = 3, 
                      LambdaRate = 25,
                      IntialSigma = "Linear",
                      showProgress = FALSE)
```

### 4. Making Predictions and Evaluating Performance

With a trained model object, we can now make predictions on our unseen test data. We will then calculate the Root Mean Squared Error (RMSE) to see how well the model performed.

```{r}
# Generate predictions on the test set
preds <- predict(results,
                 X_Boston[TestSet, ],
                 showProgress = FALSE)

# The RMSE is contained in the results object
cat("In-Sample RMSE:", results$inSampleRmse, "\n")

# Calculate the Root Mean Squared Error (RMSE) for the test set
rmse <- sqrt(mean((Y_Boston[TestSet] - preds)^2))
cat("Test Set RMSE:", rmse, "\n")
```

### 5. Visualising the Results

Finally, a good way to assess the model is to plot the predicted values against the true values. For a perfect model, all points would lie on the equality line (y = x). We will also plot the prediction intervals to visualise the model's uncertainty.

```{r, fig.width=6, fig.height=6, fig.align='center'}
# Plot true values vs. predicted values
plot(Y_Boston[TestSet],
     preds,
     xlab = "True Values",
     ylab = "Predicted Values",
     main = "AddiVortes Predictions vs True Values",
     xlim = range(c(Y_Boston[TestSet], preds)),
     ylim = range(c(Y_Boston[TestSet], preds)),
     pch = 19, col = "darkblue"
)

# Add the line of equality (y = x) for reference
abline(a = 0, b = 1, col = "darkred", lwd = 2)

# Get quantile predictions to create error bars/intervals
preds_quantile <- predict(results,
                          X_Boston[TestSet, ],
                          "quantile",
                          showProgress = FALSE)

# Add error segments for each prediction
for (i in 1:nrow(preds_quantile)) {
  segments(Y_Boston[TestSet][i], preds_quantile[i, 1],
           Y_Boston[TestSet][i], preds_quantile[i, 2],
           col = "darkblue", lwd = 1
  )
}
legend("topleft", legend=c("Prediction", "y=x Line", "95% Interval"), 
       col=c("darkblue", "darkred", "darkblue"),
       lty=1, pch=c(19, NA, NA), lwd=2)

```