--- title: "Model fitting and producing short-term forecasts using daily data with EpiEstim" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Model fitting and producing short-term forecasts using daily data with EpiEstim} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r} library(dplyr) library(ViroReportR) library(ggplot2) library(purrr) library(tidyr) ``` ## Overview This vignette provides a short demonstration of the model fitting functions in `ViroReportR` which extend the functionality and wrap the `estimate_R` function from the `EpiEstim` package to produce short-term forecasts. ## Loading data For this short demonstration, we will simulate daily data for Influenza A, RSV and Covid-19, using the function `simulate_data` which is provided with the `ViroReportR` package. The input dataset must have two columns: `date` and `confirm` in accordance to format accepted by the model fitting functions. ```{r } data <- simulate_data(days=365, #days spanning simulation peaks = c("flu_a"=90,"rsv"=110,"sars_cov2"=160), #peak day for each disease amplitudes=c("flu_a"=50,"rsv"=40,"sars_cov2"=20), #amplitude of peak for each disease scales = c("flu_a"=-0.004,"rsv"=-0.005,"sars_cov2"=-0.001), # spread of peak for each disease time_offset = 0, #number of days to offset start of simulation noise_sd = 5, #noise term start_date = "2024-01-07" #starting day (Sunday) ) ``` ```{r} data$date <- lubridate::ymd(data$date) vri_data <- data %>% pivot_longer(cols = -date, # all columns except 'date' names_to = "disease_type", values_to = "confirm") # <-- need ot be called confirm # VRI data set-up vri_name_list <- vri_data %>% dplyr::group_by(disease_type) %>% dplyr::group_keys() %>% dplyr::pull() vri_data_list <- vri_data %>% dplyr::group_by(disease_type) %>% dplyr::group_map(~.x) names(vri_data_list) <- vri_name_list ``` ## Model fitting and forecasting The code below estimates the reproduction number using `EpiEstim` through the `generate_forecast()` function for each disease type via the `purrr::map2()` function. The `generate_forecast()` function prepares the data, estimates the reproduction number, and produces short-term forecasts of daily confirmed cases for an `n_days` forecast horizon. ```{r} #parameters set-up start_date <- min(vri_data$date) + 13 forecast_horizon <- 7 smooth <- FALSE validate_window_size <- 7 forecasts_results <- tibble( vri_data_list, forecasts = map2( vri_data_list, vri_name_list, ~ generate_forecast( data = .x, smooth_data = smooth, type = .y, n_days = forecast_horizon, start_date = start_date ) ) ) names(forecasts_results$forecasts) <- vri_name_list names(forecasts_results$vri_data_list) <- vri_name_list ``` The function supports forecasting for SARS-CoV2, RSV and Influenza A primarily but also provides functionality to forecast other respiratory diseases. Any other respiratory disease can be specified by setting `type = other`. If `type = other` is set, a user-specified `mean_si` and `std_si` must be passed to the function as well. ## Validation The package provides validation functionality to support predictive performance evaluation. The forecasting model is trained on historical data and used to predict daily cases r `forecast_horizon` days ahead. Validation compares the model’s forecasted values with observed historical data, where overlap between prediction intervals and actual cases indicates good model performance. A rolling window approach is applied to the historical data to generate multiple validation datasets. ```{r validation-plots, echo=FALSE, results='asis'} validation_results <- tibble( vri_data_list, validation = map2( vri_data_list, vri_name_list, ~ generate_validation( data = .x, type = .y, n_days = forecast_horizon, start_date = min(.x$date, na.rm = T), validate_window_size = validate_window_size, window_size = 7, smooth_data = smooth, smoothing_cutoff = 10 ) ) ) names(validation_results$validation) <- vri_name_list names(validation_results$vri_data_list) <- vri_name_list for (vri in vri_name_list) { cat("\n\n") # spacing cat(glue::glue("#### {vri}\n\n")) # subheading for each disease print( plot_validation( data = vri_data_list[[vri]], validation_results$validation[[vri]], pred_plot = "ribbon" ) ) cat("\n\n") } ``` ## Forecast Report Finally, the `ViroReportR` package can conveniently generate an automated report for the current season for all supported viral respiratory diseases (Influenza-A, RSV and SARS-CoV2) using the `generate_forecast_report` function, which renders an HTML report. ```{r eval = FALSE} tmp_dir <- tempdir() # Save the simulated data data_path <- file.path(tmp_dir, "simulated_data.csv") write.csv(vri_data, data_path, row.names = FALSE) output_path <- tempdir() generate_forecast_report(input_dir = data_path, output_dir = output_path, n_days = 7, validate_window_size = 7, smooth = TRUE) cat("Report saved to:", output_path, "\n") if (interactive() && file.exists(output_path)) { utils::browseURL(output_path) } ```