---
title: "Introduction to bayesiansurpriser"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to bayesiansurpriser}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.alt = "Map of North Carolina counties shaded by Bayesian surprise values."
)
```

## Overview

The `bayesiansurpriser` package implements the Bayesian Surprise methodology for de-biasing thematic maps and data visualizations. The technique, based on Correll & Heer's "Surprise! Bayesian Weighting for De-Biasing Thematic Maps" (IEEE InfoVis 2016), helps identify genuinely surprising patterns in data by accounting for common cognitive biases.

## The Problem: Cognitive Biases in Data Visualization

When viewing thematic maps or other data visualizations, viewers often fall prey to three cognitive biases:

1. **Base rate bias**: Ignoring that larger populations naturally produce more events
2. **Sampling error bias**: Treating small-sample estimates as being as reliable as large-sample ones
3. **Renormalization bias**: Difficulty comparing rates across different scales

## The Solution: Bayesian Surprise

Bayesian Surprise measures how much our beliefs change after observing data. Mathematically, it is the Kullback-Leibler (KL) divergence of the posterior distribution from the prior distribution over a space of models:

$$\text{Surprise} = D_{KL}(P(M|D) \| P(M)) = \sum_i P(M_i|D) \log \frac{P(M_i|D)}{P(M_i)}$$

Higher surprise indicates observations that substantially change our beliefs.
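To make the formula concrete, here is a minimal base R sketch of the update for a single observation. It assumes a discrete model space with explicit prior weights and likelihoods $P(D|M_i)$; the function name `bayesian_surprise()`, the model labels, and the likelihood values are hypothetical and are not taken from the package's implementation. Using `log2` reports surprise in bits.

```r
# Hypothetical helper: Bayes' rule over a discrete model space,
# then KL(posterior || prior) measured in bits.
bayesian_surprise <- function(prior, likelihood) {
  stopifnot(length(prior) == length(likelihood),
            isTRUE(all.equal(sum(prior), 1)))
  # Posterior P(M_i | D) is proportional to P(D | M_i) * P(M_i)
  posterior <- prior * likelihood / sum(prior * likelihood)
  # Drop zero-posterior terms: 0 * log(0 / p) is taken as 0
  keep <- posterior > 0
  sum(posterior[keep] * log2(posterior[keep] / prior[keep]))
}

# Hypothetical model space with equal prior belief in each model
prior <- c(uniform = 1/3, base_rate = 1/3, funnel = 1/3)

# Hypothetical likelihoods P(D | M_i) for one region's observation
expected_data <- c(0.30, 0.32, 0.31)  # all models explain the data about equally well
odd_data      <- c(0.05, 0.60, 0.10)  # one model explains the data much better

s_expected <- bayesian_surprise(prior, expected_data)
s_odd      <- bayesian_surprise(prior, odd_data)
```

Here `s_odd` (about 0.68 bits) is far larger than `s_expected` (under 0.001 bits): data that every model explains equally well barely moves the posterior, while data that favors one model shifts belief and registers as surprise.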
## Quick Start

```{r setup}
library(bayesiansurpriser)
library(sf)
library(ggplot2)
```

### Basic Usage with sf Objects

```{r basic-sf}
# Load the North Carolina SIDS data shipped with sf
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Compute surprise: observed SIDS cases (SID74) against births (BIR74)
# as the expected baseline
result <- surprise(nc, observed = SID74, expected = BIR74)

# View the result
print(result)
```

### Plotting Results

```{r plot-ggplot}
# Plot surprise values with ggplot2
ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise() +
  labs(title = "Bayesian Surprise: NC SIDS Data (1974)")
```

```{r plot-signed}
# Plot signed surprise (shows the direction of the deviation)
ggplot(result) +
  geom_sf(aes(fill = signed_surprise)) +
  scale_fill_surprise_diverging() +
  labs(title = "Signed Surprise: Over/Under-representation")
```

## Understanding the Output

The `surprise()` function returns an object containing:

- **surprise**: Non-negative values indicating the magnitude of surprise (in bits)
- **signed_surprise**: Positive for over-representation, negative for under-representation
- **model_space**: The models used and their posterior weights
- **posteriors**: Per-observation posterior distributions

```{r output-details}
# Access surprise values directly (first five counties)
get_surprise(result, "surprise")[1:5]

# Access the model space
get_model_space(result)
```

## Customizing Models

By default, `surprise()` uses three models:

- Uniform: All regions are equally likely
- Base Rate: Regions are weighted in proportion to their expected values
- de Moivre Funnel: Accounts for sampling variance

You can supply your own model space and prior weights:

```{r custom-models}
# Create a custom model space
custom_space <- model_space(
  bs_model_uniform(),
  bs_model_baserate(nc$BIR74),
  bs_model_gaussian(),
  prior = c(0.2, 0.5, 0.3)  # Custom prior weights, one per model
)

result_custom <- surprise(nc, observed = SID74, expected = BIR74,
                          models = custom_space)
print(result_custom)
```

## Next Steps

- See `vignette("model-types")` for details on all five model types
- See `vignette("sf-workflow")` for advanced spatial workflows
- See `vignette("ggplot2-visualization")` for visualization options
- See `vignette("temporal-analysis")` for time series and streaming data