--- title: "Synchronization Protocol" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Synchronization Protocol} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r} #| include: false knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r} #| label: setup library(automerge) ``` ## Overview Automerge's synchronization protocol enables multiple peers to exchange changes and converge to the same document state. The protocol is efficient, handling only missing changes between peers, and works over any transport (files, HTTP, WebSockets, etc.). ## High-Level Sync (Recommended) For most use cases, use the high-level sync helpers that handle the protocol automatically. ### Bidirectional Sync The simplest way to synchronize two documents: ```{r} # Create two peers with different data peer1 <- am_create() peer1[["edited_by"]] <- "peer1" peer1[["data1"]] <- 100 am_commit(peer1, "Peer1 changes") peer2 <- am_create() peer2[["edited_by"]] <- "peer2" peer2[["data2"]] <- 200 am_commit(peer2, "Peer2 changes") # Automatic bidirectional sync (documents modified in place) rounds <- am_sync(peer1, peer2) rounds # Both documents now have all data peer1[["data1"]] peer1[["data2"]] peer2[["data1"]] peer2[["data2"]] ``` **When to use**: Simple peer-to-peer synchronization where both sides need all changes. ### One-Way Merge For pulling changes from one document to another without sending changes back: ```{r} # Create source document source <- am_create() source[["version"]] <- "2.0" source[["features"]] <- list("auth", "api") am_commit(source) # Create target document target <- am_create() target[["version"]] <- "1.0" target[["development"]] <- TRUE am_commit(target) # Pull changes from source to target am_merge(target, source) # Target now has source's changes target[["version"]] # Source is unchanged names(source) ``` **When to use**: Client pulling updates from server, importing changes from another peer. ## Low-Level Sync Protocol For network protocols or fine-grained control, use the low-level API. 
### Basic Protocol Flow

```{r}
# Create two peers
peer3 <- am_create()
peer3[["source"]] <- "peer3"
am_commit(peer3)

peer4 <- am_create()
peer4[["source"]] <- "peer4"
am_commit(peer4)

# Each peer maintains its own sync state
sync3 <- am_sync_state_new()
sync4 <- am_sync_state_new()

# Exchange messages until converged
round <- 0
repeat {
  round <- round + 1

  # Peer 3 generates a message
  msg3 <- am_sync_encode(peer3, sync3)
  if (is.null(msg3)) break # No more messages

  # Peer 4 receives and processes it
  am_sync_decode(peer4, sync4, msg3)

  # Peer 4 generates a response
  msg4 <- am_sync_encode(peer4, sync4)
  if (is.null(msg4)) break # No more messages

  # Peer 3 receives and processes it
  am_sync_decode(peer3, sync3, msg4)

  if (round > 100) {
    warning("Sync did not converge in 100 rounds")
    break
  }
}
round

peer3[["source"]]
peer4[["source"]]
```

### Protocol Components

**Sync State** (`am_sync_state_new()`):

- Tracks what each peer knows
- Document-independent (can be reused across sync rounds)
- Should be preserved across sessions for efficiency

**Sync Encode** (`am_sync_encode(doc, sync_state)`):

- Generates a sync message to send to the peer
- Returns `NULL` when there are no more changes to send
- The message is binary (a raw vector)

**Sync Decode** (`am_sync_decode(doc, sync_state, message)`):

- Receives a sync message from the peer
- Updates the document and sync state
- Returns the document invisibly for chaining

## Change-Based Synchronization

For scenarios where you want explicit control over individual changes:

### Exporting and Importing Changes

```{r}
# Create a base document that will be shared
base_doc <- am_create()
base_doc[["v1"]] <- "first"
am_commit(base_doc, "Version 1")

# Fork to create two peers with shared history
peer_a <- am_fork(base_doc)
peer_b <- am_fork(base_doc)

# Remember the fork point
fork_heads <- am_get_heads(peer_a)

# Peer A makes changes
peer_a[["v2"]] <- "second"
am_commit(peer_a, "Version 2")
peer_a[["v3"]] <- "third"
am_commit(peer_a, "Version 3")

# Export the changes made since the fork point
changes <- am_get_changes(peer_a, fork_heads)
length(changes)

# Peer B applies the changes
am_apply_changes(peer_b, changes)

# Verify both peers are in sync
peer_b[["v2"]]
peer_b[["v3"]]
```

### Saving Changes to Files

```{r}
# Save individual changes to files
temp_dir <- tempdir()
for (i in seq_along(changes)) {
  change_file <- file.path(temp_dir, sprintf("change_%03d.bin", i))
  writeBin(changes[[i]], change_file)
}

# Later: load and apply the changes
loaded_changes <- list()
for (i in seq_along(changes)) {
  change_file <- file.path(temp_dir, sprintf("change_%03d.bin", i))
  loaded_changes[[i]] <- readBin(change_file, "raw", file.size(change_file))
}

# Apply the loaded changes to a new peer
peer_c <- am_fork(base_doc)
am_apply_changes(peer_c, loaded_changes)

# Verify
peer_c[["v3"]]
```

**When to use**: Git-like workflows, change logs, selective sync. The sketch below illustrates the change-log idea.
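Here is a sketch of an append-only change log that exports each commit's changes to numbered files as they are made. The helper `append_to_log()` and the file layout are illustrative, not part of the package; only `am_get_heads()` and `am_get_changes()` do the real work, and the sketch assumes `am_get_heads()` on a fresh document returns an empty head set, so the first export captures everything.

```{r}
# Sketch: an append-only change log on disk.
# append_to_log() is an illustrative helper, not a package function.
log_dir <- file.path(tempdir(), "change_log")
dir.create(log_dir, showWarnings = FALSE)

doc_log <- am_create()
last_heads <- am_get_heads(doc_log)

# Export everything committed since `since_heads` to numbered files,
# returning the new heads to use as the next sync point
append_to_log <- function(doc, since_heads, log_dir) {
  new_changes <- am_get_changes(doc, since_heads)
  offset <- length(list.files(log_dir))
  for (i in seq_along(new_changes)) {
    path <- file.path(log_dir, sprintf("%05d.bin", offset + i))
    writeBin(new_changes[[i]], path)
  }
  am_get_heads(doc)
}

doc_log[["entry"]] <- "first"
am_commit(doc_log, "First entry")
last_heads <- append_to_log(doc_log, last_heads, log_dir)

doc_log[["entry"]] <- "second"
am_commit(doc_log, "Second entry")
last_heads <- append_to_log(doc_log, last_heads, log_dir)

# One file per change; replay them later with am_apply_changes()
length(list.files(log_dir))
```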
## Document Heads and History

### Understanding Heads

**Heads** are the most recent commits in a document's history. They represent the current frontier of the document's change graph.

```{r}
# Create a document and make changes
doc_main <- am_create()
doc_main[["version"]] <- "1.0"
am_commit(doc_main, "Initial version")

# Get the heads (a fingerprint of the current state)
heads_v1 <- am_get_heads(doc_main)
length(heads_v1)

# Make more changes
doc_main[["version"]] <- "2.0"
doc_main[["feature"]] <- "new"
am_commit(doc_main, "Version 2")

heads_v2 <- am_get_heads(doc_main)
identical(heads_v1, heads_v2)
```

Heads are useful for:

- Tracking which changes you've already seen
- Computing incremental updates
- Detecting whether documents have diverged

### Working with History

```{r}
# Get the full change history
history <- am_get_history(doc_main)
length(history)

# History includes all commits ever made;
# each entry contains metadata about a commit

# Get the changes made since a point in history
changes_since_v1 <- am_get_changes(doc_main, heads_v1)
str(changes_since_v1)
```

### Incremental Sync Pattern

A common pattern is to track the last sync point and send only the new changes:

```{r}
# Server document that accumulates changes
server <- am_create()
server[["users"]] <- 0L
am_commit(server, "Initialize")

# Client syncs and remembers the server state
client <- am_fork(server)
last_sync_heads <- am_get_heads(server)

# Client makes local changes
client[["users"]] <- 1L
client[["local_cache"]] <- TRUE
am_commit(client, "Client updates")

# Server receives changes from other clients
server[["users"]] <- 5L
server[["server_config"]] <- "production"
am_commit(server, "Server updates")

# Client pulls only the new server changes since the last sync
new_changes <- am_get_changes(server, last_sync_heads)
length(new_changes)
am_apply_changes(client, new_changes)

# Update the sync point
last_sync_heads <- am_get_heads(server)

# Client now has the server's changes
client[["server_config"]]

# The server can also pull the client's changes
client_changes <- am_get_changes(client, am_get_heads(server))
am_apply_changes(server, client_changes)
server[["local_cache"]]
```

### Detecting Divergence

```{r}
# Create two peers that will diverge
peer_x <- am_create()
peer_x[["id"]] <- "x"
am_commit(peer_x)

peer_y <- am_fork(peer_x)

# Remember the common state
common_heads <- am_get_heads(peer_x)

# Both make different changes
peer_x[["data_x"]] <- "from X"
am_commit(peer_x, "X's change")

peer_y[["data_y"]] <- "from Y"
am_commit(peer_y, "Y's change")

# Check whether they have diverged
x_heads <- am_get_heads(peer_x)
y_heads <- am_get_heads(peer_y)
identical(x_heads, y_heads)

# Get each peer's changes since the common state
x_changes <- am_get_changes(peer_x, common_heads)
y_changes <- am_get_changes(peer_y, common_heads)
str(x_changes)
str(y_changes)

# Sync to merge the divergent histories
rounds <- am_sync(peer_x, peer_y)
rounds

# After syncing, the heads are identical again
identical(am_get_heads(peer_x), am_get_heads(peer_y))
```

## Concurrent Edits

Understanding how concurrent edits merge:

```{r}
# Create a shared starting point
base <- am_create()
base[["counter"]] <- am_counter(0)
base[["status"]] <- "draft"
am_commit(base)

# Fork to two editors
editor1 <- am_fork(base)
editor2 <- am_fork(base)

# Concurrent edits
# Editor 1: increment the counter, change the status
am_counter_increment(editor1, AM_ROOT, "counter", 5)
editor1[["status"]] <- "review"
am_commit(editor1, "Editor 1 changes")

# Editor 2: increment the counter differently, change the status differently
am_counter_increment(editor2, AM_ROOT, "counter", 3)
editor2[["status"]] <- "published"
am_commit(editor2, "Editor 2 changes")

# Sync the editors
rounds <- am_sync(editor1, editor2)

# Counter: both increments are summed (CRDT counter semantics)
editor1[["counter"]]

# Status: deterministic conflict resolution (one value wins)
editor1[["status"]]
```
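After syncing, both replicas report the same merged state, whichever one you inspect. A quick convergence check using the editors from the chunk above:

```{r}
# Both editors agree on the merged values and on the heads
identical(editor1[["counter"]], editor2[["counter"]])
identical(editor1[["status"]], editor2[["status"]])
identical(am_get_heads(editor1), am_get_heads(editor2))
```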
## Sync Performance

### The Sync Frequency Tradeoff

There is a fundamental tradeoff between *collaboration latency* and *efficiency*:

```{r}
# Strategy A: frequent sync (lower latency, more overhead)
doc_frequent <- am_create()
peer_frequent <- am_create()

sync_count_frequent <- 0
for (i in 1:10) {
  doc_frequent[[paste0("k", i)]] <- i
  am_commit(doc_frequent, paste("Change", i))
  rounds <- am_sync(doc_frequent, peer_frequent)
  sync_count_frequent <- sync_count_frequent + 1
}

# 10 separate commits, 10 sync operations
sync_count_frequent

# Strategy B: batched commits (higher latency, less overhead)
doc_batched <- am_create()
peer_batched <- am_create()

# Batch all changes into one commit
for (i in 1:10) {
  doc_batched[[paste0("k", i)]] <- i
}
am_commit(doc_batched, "Batch of 10 changes")
rounds <- am_sync(doc_batched, peer_batched)

# 1 commit, 1 sync operation
rounds

# Compare document sizes
length(am_save(doc_frequent))
length(am_save(doc_batched))
```

**Batching creates smaller documents** because:

- There are fewer commit metadata entries
- Related changes compress better
- There is less overhead from commit boundaries

### Choosing a Strategy

**Use frequent syncs when:**

- Multiple users are editing simultaneously
- Low-latency collaboration is critical (real-time editing)
- Users need to see each other's changes quickly
- You are working over reliable, fast networks

**Use batched commits when:**

- A single user is making many changes
- You are optimizing for storage or bandwidth
- You are syncing over slow or metered connections
- Changes are logically related (e.g., a form submission)

**Hybrid approach** (recommended for most cases):

```{r}
# Batch related changes, sync periodically
doc_hybrid <- am_create()

# User fills out a form (batch these)
doc_hybrid[["name"]] <- "Alice"
doc_hybrid[["email"]] <- "alice@example.com"
doc_hybrid[["preferences"]] <- list("theme" = "dark")
am_commit(doc_hybrid, "Update user profile")

# Sync after a logical unit of work
# result <- sync_with_server(doc_hybrid)

# Later: another logical unit
doc_hybrid[["last_login"]] <- Sys.time()
am_commit(doc_hybrid, "Record login")

# Sync again
# result <- sync_with_server(doc_hybrid)
```

### Reusing Sync State

Sync state tracks what each peer has seen, enabling efficient incremental sync:

```{r}
# Without reusing sync state (inefficient)
doc_no_reuse <- am_create()
doc_no_reuse[["v1"]] <- 1
am_commit(doc_no_reuse)

peer_no_reuse <- am_create()

# First sync
rounds1 <- am_sync(doc_no_reuse, peer_no_reuse)
rounds1

# Add more changes
doc_no_reuse[["v2"]] <- 2
am_commit(doc_no_reuse)

# Second sync (creates new sync state, may resend some data)
rounds2 <- am_sync(doc_no_reuse, peer_no_reuse)
rounds2

# With reused sync state (efficient)
doc_reuse <- am_create()
doc_reuse[["v1"]] <- 1
am_commit(doc_reuse)

peer_reuse <- am_create()

sync_state_local <- am_sync_state_new()
sync_state_peer <- am_sync_state_new()

# Manual sync with persistent state
msg1 <- am_sync_encode(doc_reuse, sync_state_local)
am_sync_decode(peer_reuse, sync_state_peer, msg1)
msg2 <- am_sync_encode(peer_reuse, sync_state_peer)
if (!is.null(msg2)) {
  am_sync_decode(doc_reuse, sync_state_local, msg2)
}

# Add more changes
doc_reuse[["v2"]] <- 2
am_commit(doc_reuse)

# The second sync reuses the state (only new changes are sent)
msg3 <- am_sync_encode(doc_reuse, sync_state_local)
am_sync_decode(peer_reuse, sync_state_peer, msg3)

# The sync state remembers what was already exchanged
```

**When to persist sync state:**

- Long-lived connections (WebSocket servers)
- Periodic sync jobs (cron, scheduled tasks)
- Client apps that reconnect frequently

In all of these cases, reusing sync state significantly reduces redundant data transfer; one way to hold on to that state is sketched below.
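One way to keep sync state alive in a long-running R session is to capture it in a closure around the exchange loop shown earlier. `make_connection()` below is an illustrative helper, not a package function; it simply reuses the same two sync states across every call.

```{r}
# Sketch: a "connection" that keeps its sync state between rounds.
# make_connection() is illustrative, not a package function.
make_connection <- function(local_doc, remote_doc) {
  local_state <- am_sync_state_new()
  remote_state <- am_sync_state_new()
  function() {
    for (round in 1:100) { # guard against non-convergence
      msg <- am_sync_encode(local_doc, local_state)
      if (is.null(msg)) break
      am_sync_decode(remote_doc, remote_state, msg)

      reply <- am_sync_encode(remote_doc, remote_state)
      if (is.null(reply)) break
      am_sync_decode(local_doc, local_state, reply)
    }
    invisible(NULL)
  }
}

conn_doc <- am_create()
conn_peer <- am_create()
sync_once <- make_connection(conn_doc, conn_peer)

conn_doc[["round1"]] <- 1
am_commit(conn_doc)
sync_once() # first contact: full exchange

conn_doc[["round2"]] <- 2
am_commit(conn_doc)
sync_once() # later rounds reuse the state, sending only new changes

conn_peer[["round2"]]
```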
### Measuring Performance

```{r}
# Compare the efficiency of different batching strategies
measure_sync <- function(n_changes, batch_size) {
  doc <- am_create()
  peer <- am_create()

  changes_made <- 0
  syncs_performed <- 0

  for (i in seq_len(n_changes)) {
    doc[[paste0("k", i)]] <- i
    changes_made <- changes_made + 1

    # Commit every batch_size changes
    if (changes_made %% batch_size == 0) {
      am_commit(doc, sprintf("Batch at %d", changes_made))
      rounds <- am_sync(doc, peer)
      syncs_performed <- syncs_performed + 1
    }
  }

  # Final commit if needed
  if (changes_made %% batch_size != 0) {
    am_commit(doc, "Final batch")
    rounds <- am_sync(doc, peer)
    syncs_performed <- syncs_performed + 1
  }

  list(
    changes = changes_made,
    syncs = syncs_performed,
    size = length(am_save(doc))
  )
}

# No batching: commit and sync after every change
result_no_batch <- measure_sync(20, 1)

# Medium batching: commit every 5 changes
result_medium <- measure_sync(20, 5)

# Full batching: a single commit at the end
result_full <- measure_sync(20, 20)

# Compare sync counts
result_no_batch$syncs
result_medium$syncs
result_full$syncs

# Compare document sizes
result_no_batch$size
result_medium$size
result_full$size
```

## Next Steps

- Learn about [CRDT Concepts](crdt-concepts.html) to understand merge behavior
- Check the [Quick Reference](quick-reference.html) for all sync functions

## Further Reading

- [Local-First Software](https://www.inkandswitch.com/local-first/)