--- title: "Core Concepts and Architecture" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Core Concepts and Architecture} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5 ) ``` ## Overview SafeMapper achieves fault tolerance through three core mechanisms: 1. **Fingerprinting** - Uniquely identifies each computational task 2. **Checkpointing** - Periodically saves intermediate results 3. **Auto-Recovery** - Seamlessly resumes interrupted tasks This article provides an in-depth explanation of how these mechanisms work. ```{r} library(SafeMapper) ``` ## Architecture Overview ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ SafeMapper System Architecture │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ User Code Layer │ │ ┌──────────────────────────────────────────────────────────────────┐ │ │ │ s_map() s_map2() s_pmap() s_future_map() s_walk() ... │ │ │ └─────────────────────────────┬────────────────────────────────────┘ │ │ │ │ │ Core Engine Layer ▼ │ │ ┌──────────────────────────────────────────────────────────────────┐ │ │ │ .safe_execute() │ │ │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ │ │ │Fingerprint │ │ Checkpoint │ │ Batch │ │ Error │ │ │ │ │ │ Generator │ │ Manager │ │ Processor │ │ Retry │ │ │ │ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │ │ └─────────────────────────────┬────────────────────────────────────┘ │ │ │ │ │ Storage Layer ▼ │ │ ┌──────────────────────────────────────────────────────────────────┐ │ │ │ R User Cache Directory (~/.cache/R/SafeMapper/) │ │ │ │ └── checkpoints/ │ │ │ │ ├── session_abc123.rds │ │ │ │ └── session_def456.rds │ │ │ └──────────────────────────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ## 1. Fingerprinting Mechanism ### What is a Fingerprint? A fingerprint is a unique identifier that identifies a specific computational task. SafeMapper automatically generates fingerprints by analyzing input data characteristics. ### Fingerprint Generation Flow ``` ┌─────────────────────────────────────────────────────────────────┐ │ Fingerprint Generation │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ Input Data │ │ [1, 2, 3, ..., 1000] │ │ │ │ │ ▼ │ │ ┌─────────────────────────────────────────────┐ │ │ │ Extract Features │ │ │ │ ┌─────────────────────────────────────┐ │ │ │ │ │ mode: "map" │ │ │ │ │ │ length: 1000 │ │ │ │ │ │ class: "numeric" │ │ │ │ │ │ first: 1 │ │ │ │ │ │ last: 1000 │ │ │ │ │ └─────────────────────────────────────┘ │ │ │ └───────────────────┬─────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌─────────────────────────────────────────────┐ │ │ │ xxhash64 Hash Calculation │ │ │ └───────────────────┬─────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌─────────────────────────────────────────────┐ │ │ │ Fingerprint: "map_7a3b9c2d1e8f4a5b" │ │ │ └─────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────┘ ``` ### Feature Selection Rationale ```{r} # These two calls will generate the same fingerprint data <- 1:100 result1 <- s_map(data, ~ .x^2) # If re-run immediately, same fingerprint will be detected # This call generates a different fingerprint (different data) data2 <- 1:200 result2 <- s_map(data2, ~ .x^2) ``` **Why not hash the entire dataset?** - Complete hashing of large datasets is very time-consuming - Using features (length, first/last elements, type) enables fast, stable fingerprinting - In practice, this approach is sufficient to distinguish different tasks ### Custom Session IDs For more precise control, you can manually specify a session ID: ```{r} # Use custom session ID result <- s_map(1:20, ~ .x^2, .session_id = "my_custom_session") # This ensures that even tasks with similar data features won't conflict ``` ## 2. Checkpointing Mechanism ### Checkpoint Data Structure ``` ┌─────────────────────────────────────────────────────────────────┐ │ Checkpoint File Structure │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ checkpoint_file.rds │ │ │ │ │ ├── results: list() │ │ │ └── [1], [2], [3], ..., [completed items] │ │ │ │ │ └── metadata: list() │ │ ├── session_id: "map_7a3b9c2d..." │ │ ├── total_items: 1000 │ │ ├── completed_items: 500 │ │ ├── mode: "map" │ │ ├── created: "2026-01-23 10:30:00" │ │ └── last_updated: "2026-01-23 10:35:00" │ │ │ └─────────────────────────────────────────────────────────────────┘ ``` ### Checkpoint Save Timing ``` ┌─────────────────────────────────────────────────────────────────┐ │ Batch Processing & Checkpoints │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ Data: [1, 2, 3, ..., 1000] Batch Size: 100 │ │ │ │ ┌─────────────────────────────────────────────────────────┐ │ │ │ Batch 1: [1-100] │ │ │ │ │ │ │ │ │ ├── Complete ──────────► 💾 Save checkpoint (100) │ │ │ │ ▼ │ │ │ │ Batch 2: [101-200] │ │ │ │ │ │ │ │ │ ├── Complete ──────────► 💾 Save checkpoint (200) │ │ │ │ ▼ │ │ │ │ Batch 3: [201-300] │ │ │ │ │ │ │ │ │ └── ❌ INTERRUPTED! │ │ │ │ │ │ │ │ 💾 Checkpoint saved: 200 items completed │ │ │ └─────────────────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────┘ ``` ### Configuring Batch Size ```{r} # Default batch size is 100 # For fast operations, increase batch size to reduce I/O s_configure(batch_size = 200) # For slow operations (like API calls), decrease batch size for more frequent saves s_configure(batch_size = 10) # Reset to defaults s_configure(batch_size = 100) ``` ## 3. Auto-Recovery Mechanism ### Recovery Flow ``` ┌─────────────────────────────────────────────────────────────────┐ │ Auto-Recovery Flow │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ s_map(data, func) is called │ │ │ │ │ ▼ │ │ ┌─────────────────────┐ │ │ │ Generate Fingerprint│ │ │ │ "map_7a3b9c2d..." │ │ │ └──────────┬──────────┘ │ │ │ │ │ ▼ │ │ ┌─────────────────────┐ ┌─────────────────────┐ │ │ │ Checkpoint Exists? │── Yes ─►│ Validate Checkpoint │ │ │ └──────────┬──────────┘ │ - Data length match?│ │ │ │ │ - File not corrupt? │ │ │ │ No └──────────┬──────────┘ │ │ │ │ │ │ ▼ │ Valid │ │ ┌─────────────────────┐ │ │ │ │ Start Fresh │ ▼ │ │ │ start_idx = 1 │ ┌─────────────────────┐ │ │ └──────────┬──────────┘ │ Resume Progress │ │ │ │ │ "Resuming from 200" │ │ │ │ │ start_idx = 201 │ │ │ │ └──────────┬──────────┘ │ │ │ │ │ │ └───────────────┬───────────────┘ │ │ │ │ │ ▼ │ │ ┌─────────────────────┐ │ │ │ Continue Batch │ │ │ │ Processing │ │ │ └──────────┬──────────┘ │ │ │ │ │ ▼ │ │ ┌─────────────────────┐ │ │ │ Delete Checkpoint │ │ │ │ Return Results │ │ │ └─────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────┘ ``` ### Recovery Demo ```{r} # Simulate a task that might be interrupted simulate_task <- function(x) { Sys.sleep(0.01) x^2 } # First run result <- s_map(1:30, simulate_task, .session_id = "recovery_demo") # If task is interrupted, simply re-run the same code: # result <- s_map(1:30, simulate_task, .session_id = "recovery_demo") # Output: "Resuming from item XX/30" ``` ## 4. Error Retry Mechanism ### Retry Flow ``` ┌─────────────────────────────────────────────────────────────────┐ │ Error Retry Mechanism │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ Processing Batch [101-200] │ │ │ │ │ ▼ │ │ ┌─────────────────────┐ │ │ │ Attempt 1 │ │ │ │ ❌ Error: Timeout │ │ │ └──────────┬──────────┘ │ │ │ │ │ ▼ Wait 1 second │ │ ┌─────────────────────┐ │ │ │ Attempt 2 │ │ │ │ ❌ Error: Server Busy│ │ │ └──────────┬──────────┘ │ │ │ │ │ ▼ Wait 1 second │ │ ┌─────────────────────┐ │ │ │ Attempt 3 │ │ │ │ ✅ Success! │ │ │ └──────────┬──────────┘ │ │ │ │ │ ▼ │ │ ┌─────────────────────┐ │ │ │ Save Checkpoint │ │ │ │ Continue Next Batch │ │ │ └─────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────┘ ``` ### Configure Retry Attempts ```{r} # For unstable network environments, increase retry attempts s_configure(retry_attempts = 5) # For stable local computations, reduce retry attempts s_configure(retry_attempts = 1) # Reset to default s_configure(retry_attempts = 3) ``` ## 5. Storage Location SafeMapper uses R's standard user cache directory to store checkpoints: ```{r} # Checkpoint storage location (varies by system) # Linux: ~/.cache/R/SafeMapper/checkpoints/ # macOS: ~/Library/Caches/org.R-project.R/R/SafeMapper/checkpoints/ # Windows: %LOCALAPPDATA%/R/cache/R/SafeMapper/checkpoints/ ``` ## Complete Execution Flow ``` ┌─────────────────────────────────────────────────────────────────────────────┐ │ SafeMapper Complete Execution Flow │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌───────────────────────────────────────────────────────────────────┐ │ │ │ User Invocation │ │ │ │ s_map(data, func, .session_id = NULL) │ │ │ └───────────────────────────────┬───────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌───────────────────────────────────────────────────────────────────┐ │ │ │ 1. Generate Fingerprint │ │ │ │ session_id = .make_fingerprint(data, "map") │ │ │ └───────────────────────────────┬───────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌───────────────────────────────────────────────────────────────────┐ │ │ │ 2. Try Recovery │ │ │ │ restored = .try_restore(session_id, length(data)) │ │ │ │ ├── Has checkpoint ──► results = restored$results │ │ │ │ │ start_idx = completed_items + 1 │ │ │ │ └── No checkpoint ───► results = vector("list", n) │ │ │ │ start_idx = 1 │ │ │ └───────────────────────────────┬───────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌───────────────────────────────────────────────────────────────────┐ │ │ │ 3. Batch Processing Loop │ │ │ │ for batch in batches(start_idx:n): │ │ │ │ │ │ │ │ │ ├── Show progress: "[XX%] Processing items X-Y of N" │ │ │ │ │ │ │ │ │ ├── Execute batch (with retry) │ │ │ │ │ batch_results = .execute_batch_with_retry(...) │ │ │ │ │ │ │ │ │ ├── Store results │ │ │ │ │ results[batch_indices] = batch_results │ │ │ │ │ │ │ │ │ └── Save checkpoint │ │ │ │ .save_checkpoint(session_id, results, ...) │ │ │ └───────────────────────────────┬───────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌───────────────────────────────────────────────────────────────────┐ │ │ │ 4. Completion │ │ │ │ .cleanup_checkpoint(session_id) # Delete checkpoint │ │ │ │ message("Completed N items") │ │ │ │ return(.format_output(results, output_type)) │ │ │ └───────────────────────────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ ``` ## Design Principles SafeMapper follows these design principles: ``` ┌─────────────────────────────────────────────────────────────────┐ │ Design Principles │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ 1. Zero Configuration First │ │ ├── Works out of the box, no setup required │ │ └── All configuration is optional │ │ │ │ 2. Non-Invasive │ │ ├── API fully compatible with purrr/furrr │ │ ├── Just change function name, no code restructuring │ │ └── Can switch back to native purrr anytime │ │ │ │ 3. Transparent Operation │ │ ├── Checkpoints managed automatically │ │ ├── Users don't need to worry about details │ │ └── Automatic cleanup after success │ │ │ │ 4. Safe and Reliable │ │ ├── Save per batch, minimize data loss │ │ ├── Automatic error retry │ │ └── Corrupted checkpoints safely ignored │ │ │ └─────────────────────────────────────────────────────────────────┘ ``` ## Next Steps - 🔧 [Map Functions](map-functions.html) - Learn all mapping functions in detail - ⚡ [Parallel Processing](parallel-processing.html) - Speed up with future - 🛡️ [Error Handling](error-handling.html) - Build more robust code