---
title: "Basic Examples"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Basic Examples}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(nadir)
```

Let's start with an extremely simple example: a prediction problem on a
continuous outcome, where we want to use cross-validation to minimize the
expected risk/loss on held-out data across a few different models. We'll use
the `iris` dataset to do this.

`nadir::super_learner()` strives to keep the syntax simple, so the simplest
call to `super_learner()` might look something like this:

```{r}
super_learner(
  data = iris,
  formula = Petal.Width ~ Petal.Length + Sepal.Length + Sepal.Width,
  learners = list(lnr_lm, lnr_rf, lnr_earth, lnr_mean))
```

Notice what it returns: a fit super learner whose prediction function takes
`newdata`, predicts with each of the candidate learners, combines those
predictions according to the learned weights, and returns the ensemble
predictions. We can store that fit model and use it:

```{r}
# We recommend storing more complicated arguments used repeatedly to simplify
# the call to super_learner()
petal_formula <- Petal.Width ~ Petal.Length + Sepal.Length + Sepal.Width
learners <- list(lnr_lm, lnr_rf, lnr_earth, lnr_mean)

sl_model <- super_learner(
  data = iris,
  formula = petal_formula,
  learners = learners)
```

In particular, we can use it to predict on the same dataset,

```{r}
predict(sl_model, iris) |> head()
```

on a random sample of it,

```{r}
predict(sl_model, iris[sample.int(size = 10, n = nrow(iris)), ]) |> head()
```

or on completely new data.

```{r}
fake_iris_data <- data.frame(
  Sepal.Length = rnorm(
    n = 6,
    mean = mean(iris$Sepal.Length),
    sd = sd(iris$Sepal.Length)
  ),
  Sepal.Width = rnorm(
    n = 6,
    mean = mean(iris$Sepal.Width),
    sd = sd(iris$Sepal.Width)
  ),
  Petal.Length = rnorm(
    n = 6,
    mean = mean(iris$Petal.Length),
    sd = sd(iris$Petal.Length)
  )
)

predict(sl_model, fake_iris_data) |> head()
```

## Getting More Information Out

If we want to know a lot more about the `super_learner()` process, such as how
it weighted the candidate learners and what the candidate learners predicted on
the held-out data, then we will want to look at the other metadata contained in
the `nadir_sl_model` object it produces:

```{r}
sl_model_iris <- super_learner(
  data = iris,
  formula = petal_formula,
  learners = learners)

str(sl_model_iris, max.level = 2)
```

To put some description to what's contained in the output from
`super_learner()`:

* A prediction function, `$predict()`, that takes `newdata`.
* Some character fields, like `$y_variable` and `$outcome_type`, that provide
  some context on the learning task that was performed.
* `$learner_weights`, which indicate what weight the different candidate
  learners were given.
* `$holdout_predictions`: a data.frame of predictions from each of the
  candidate learners, along with the actual outcome from the held-out data.

We can call `compare_learners()` on the output from `super_learner()` if we
want to assess how the different candidate learners performed. We can also call
`cv_super_learner()` with the same arguments as `super_learner()` to wrap the
`super_learner()` call in another layer of cross-validation and assess how
`super_learner()` itself performs on held-out data.
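
Before running those comparisons, note that the learner weights and holdout
predictions described above can be pulled straight out of the fit object. The
chunk below is a minimal sketch that relies only on the `$learner_weights` and
`$holdout_predictions` fields listed above; exactly how they print may differ
across versions of `nadir`.

```{r}
# Weight given to each candidate learner in the ensemble
sl_model_iris$learner_weights

# Held-out predictions from each candidate learner, alongside the observed outcome
head(sl_model_iris$holdout_predictions)
```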
```{r}
compare_learners(sl_model_iris)

cv_super_learner(
  data = iris,
  formula = petal_formula,
  learners = learners)$cv_loss
```

We can, of course, do anything with a super learned model that we would do with
a conventional prediction model, such as calculating performance statistics
like $R^2$. Since the outcome modeled above is `Petal.Width`, we compute $R^2$
against it:

```{r}
var_residuals <- var(iris$Petal.Width - predict(sl_model_iris, iris))
total_variance <- var(iris$Petal.Width)
variance_explained <- total_variance - var_residuals
rsquared <- variance_explained / total_variance
print(rsquared)
```
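
Other familiar performance summaries work the same way. As one more small
sketch, the chunk below computes a root mean squared error (RMSE) in base R,
using only the `predict()` method and the `Petal.Width` outcome from the fit
above.

```{r}
# Root mean squared error of the ensemble predictions on the training data
rmse <- sqrt(mean((iris$Petal.Width - predict(sl_model_iris, iris))^2))
print(rmse)
```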