--- title: "Understanding CRDTs in Automerge" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Understanding CRDTs in Automerge} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r} #| include: false knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r} #| label: setup library(automerge) ``` ## What are CRDTs? Conflict-free Replicated Data Types (CRDTs) are data structures that can be replicated across multiple computers and modified independently without coordination. The key property is that when all changes are eventually synchronized, all replicas converge to the same state automatically - without manual conflict resolution. ### The Problem CRDTs Solve Consider a simple example without CRDTs: ``` Initial state: counter = 0 Alice's device (offline): counter = counter + 1 → counter = 1 Bob's device (offline): counter = counter + 2 → counter = 2 When they sync: - If we use "last write wins": counter = 2 (Alice's change lost!) - If we use "first write wins": counter = 1 (Bob's change lost!) - Manual conflict resolution: Ask user to choose (annoying!) ``` With a CRDT counter: ``` Initial state: counter = 0 Alice's device: increment(1) Bob's device: increment(2) When they sync: - CRDT automatically merges: counter = 3 (both changes preserved!) ``` ### CRDT Properties All CRDTs have these properties: 1. **Associativity**: The order of merging doesn't matter - `merge(merge(A, B), C) = merge(A, merge(B, C))` 2. **Commutativity**: Who merges with whom doesn't matter - `merge(A, B) = merge(B, A)` 3. **Idempotency**: Merging the same change multiple times has no additional effect - `merge(A, A) = A` These properties ensure **strong eventual consistency**: all devices that see the same set of changes will converge to the same state, regardless of network delays, message reordering, or duplicates. ## Automerge CRDT Types Automerge implements several CRDT types, each optimized for different use cases. ### Register Values (Strings, Numbers, Booleans) The simplest type: when two replicas concurrently write different values to the same key, one value wins. Automerge uses a deterministic conflict resolution algorithm (based on Lamport timestamps and actor IDs) to ensure all replicas converge to the same value. ```{r} doc1 <- am_create() doc1[["name"]] <- "Alice" doc1[["score"]] <- 100 am_commit(doc1) doc2 <- am_fork(doc1) # Concurrent edits doc1[["name"]] <- "Alice Smith" doc2[["name"]] <- "Alice Johnson" # Merge am_merge(doc1, doc2) # One value wins (deterministic, all replicas agree) doc1[["name"]] ``` **When to use**: Simple values where automatic conflict resolution is acceptable (metadata, settings, labels). **Trade-off**: One concurrent change is lost, but resolution is deterministic and all replicas converge to the same state. ### Maps (Nested Objects) Maps are collections of key-value pairs. Each key uses deterministic conflict resolution (one value wins), but different keys can be edited concurrently without conflict. ```{r} doc3 <- am_create() doc3[["user"]] <- list(name = "Alice", age = 30L, city = "Boston") am_commit(doc3) doc4 <- am_fork(doc3) # Concurrent edits to different keys user3 <- am_get(doc3, AM_ROOT, "user") am_put(doc3, user3, "age", 31L) user4 <- am_get(doc4, AM_ROOT, "user") am_put(doc4, user4, "city", "New York") # Merge - both changes preserved am_merge(doc3, doc4) # Both edits are present user_final <- am_get(doc3, AM_ROOT, "user") am_get(doc3, user_final, "age") am_get(doc3, user_final, "city") ``` **When to use**: Structured data with multiple independent fields (user profiles, configuration objects). **Merge behavior**: Changes to different keys merge cleanly. Changes to the same key use deterministic conflict resolution. ### Lists (Ordered Sequences) Lists use a sophisticated CRDT algorithm that preserves insertion order and handles concurrent insertions elegantly. ```{r} doc5 <- am_create() am_put(doc5, AM_ROOT, "items", AM_OBJ_TYPE_LIST) items5 <- am_get(doc5, AM_ROOT, "items") am_put(doc5, items5, "end", "A") am_put(doc5, items5, "end", "C") am_commit(doc5) doc6 <- am_fork(doc5) # Concurrent insertions at same position items6 <- am_get(doc6, AM_ROOT, "items") am_insert(doc5, items5, 2, "B1") # Insert between A and C am_insert(doc6, items6, 2, "B2") # Insert between A and C # Merge am_merge(doc5, doc6) # Both insertions preserved with deterministic ordering for (i in seq_len(am_length(doc5, items5))) { print(am_get(doc5, items5, i)) } ``` **When to use**: Ordered collections (task lists, sequences, timelines). **Merge behavior**: Concurrent insertions at the same position are both preserved. The algorithm ensures deterministic ordering based on actor IDs. **Limitation**: There are cases where insertion order is not perfectly preserved, primarily when inserting elements in reverse order. However, the algorithm performs well in most common scenarios. ### Text CRDTs (Collaborative Text Editing) Text objects are optimized for character-level collaborative editing. They handle concurrent insertions, deletions, and edits gracefully. ```{r} doc7 <- am_create() am_put(doc7, AM_ROOT, "document", am_text("The quick fox jumps")) am_commit(doc7) doc8 <- am_fork(doc7) text7 <- am_get(doc7, AM_ROOT, "document") text8 <- am_get(doc8, AM_ROOT, "document") # Concurrent edits (0-based inter-character positions) am_text_splice(text7, 10, 0, "brown ") # Insert "brown " at position 10 am_text_splice(text8, 19, 0, " high") # Insert " high" at position 19 (end) # Merge am_merge(doc7, doc8) # Both edits preserved am_text_content(text7) ``` **When to use**: Collaborative documents, shared notes, code editors, chat messages. **Merge behavior**: Character-level operations merge intelligently. Positions are tracked using stable identifiers, so concurrent edits at different positions don't interfere. **Important**: Regular strings (`AM_VAL_TYPE_STR`) use deterministic conflict resolution (one value wins). Only text objects (`AM_OBJ_TYPE_TEXT`) provide character-level CRDT merging. ```{r} # String (deterministic conflict resolution) doc9 <- am_create() doc9[["title"]] <- "Document" doc10 <- am_fork(doc9) doc9[["title"]] <- "My Document" doc10[["title"]] <- "Our Document" am_merge(doc9, doc10) doc9[["title"]] # One value wins deterministically # Text object (CRDT) doc11 <- am_create() am_put(doc11, AM_ROOT, "content", am_text("Hello")) doc12 <- am_fork(doc11) text11 <- am_get(doc11, AM_ROOT, "content") text12 <- am_get(doc12, AM_ROOT, "content") am_text_splice(text11, 5, 0, " World") am_text_splice(text12, 5, 0, " Everyone") am_merge(doc11, doc12) am_text_content(text11) ``` ### Counters Counters are a classic CRDT that adds up increments from all replicas. ```{r} doc13 <- am_create() am_put(doc13, AM_ROOT, "likes", am_counter(0)) am_commit(doc13) doc14 <- am_fork(doc13) # Concurrent increments am_counter_increment(doc13, AM_ROOT, "likes", 3) am_counter_increment(doc14, AM_ROOT, "likes", 5) am_counter_increment(doc14, AM_ROOT, "likes", -1) # Merge am_merge(doc13, doc14) # Sum of all increments doc13[["likes"]] ``` **When to use**: Vote counts, like counts, inventory quantities, collaborative counters. **Merge behavior**: All increments (positive or negative) are summed automatically. ### Timestamps Timestamps record when a value was set. They use deterministic conflict resolution semantics (one value wins). ```{r} doc15 <- am_create() doc15[["created_at"]] <- Sys.time() am_commit(doc15) Sys.sleep(0.1) doc16 <- am_fork(doc15) doc15[["updated_at"]] <- Sys.time() doc16[["updated_at"]] <- Sys.time() am_merge(doc15, doc16) doc15[["created_at"]] doc15[["updated_at"]] ``` **When to use**: Audit trails, modification times, temporal metadata. ## Advanced Text Features ### Cursors (Stable Position Tracking) Cursors maintain stable positions in text even as the document is edited. ```{r} doc17 <- am_create() am_put(doc17, AM_ROOT, "text", am_text("Hello World")) text17 <- am_get(doc17, AM_ROOT, "text") # Create cursor at position 6 (0-based: after "Hello ") cursor <- am_cursor(text17, 6) # Insert text before cursor am_text_splice(text17, 0, 0, "Hi ") # Cursor automatically adjusts new_pos <- am_cursor_position(cursor) new_pos # Cursor moved with text from original position 6 ``` **When to use**: Text editors (cursor tracking), collaborative commenting (position references). ### Marks (Text Formatting) Marks attach metadata to text ranges, useful for formatting and annotations. ```{r} doc18 <- am_create() am_put(doc18, AM_ROOT, "text", am_text("Hello World")) text18 <- am_get(doc18, AM_ROOT, "text") # Mark "Hello" as bold (positions 0-4, 0-based) am_mark(text18, 0, 5, "bold", TRUE, expand = "none") # Mark "World" as italic (positions 6-10) am_mark(text18, 6, 11, "italic", TRUE, expand = "none") # Query marks marks <- am_marks(text18) str(marks) # Marks at specific position marks_at_pos <- am_marks_at(text18, 2) # Position 2 (in "Hello") str(marks_at_pos) ``` **When to use**: Rich text editors, collaborative annotations, syntax highlighting. **Mark expansion**: Controls how marks behave when text is inserted at boundaries: - `"none"`: Mark doesn't expand - `"before"`: Expands to include text inserted before start - `"after"`: Expands to include text inserted after end - `"both"`: Expands at both boundaries ```{r} doc19 <- am_create() am_put(doc19, AM_ROOT, "text", am_text("Hello")) text19 <- am_get(doc19, AM_ROOT, "text") # Mark with expansion am_mark(text19, 0, 5, "bold", TRUE, expand = "after") # Insert at end of mark am_text_splice(text19, 5, 0, " World") # Mark expands to include "World" marks <- am_marks(text19) str(marks) ``` ## CRDT Design Trade-offs ### Storage Growth CRDTs preserve operation history to enable merging. This means document size grows with the number of operations, not just current content size. ```{r} doc20 <- am_create() # Make many edits for (i in 1:100) { doc20[[paste0("key", i)]] <- i } am_commit(doc20) # Size includes all history length(am_save(doc20)) ``` **Mitigation strategies**: 1. Commit less frequently for bulk changes 2. Compact history periodically (future feature) 3. Use appropriate granularity (don't make objects too fine-grained) ### Deletions and Concurrent Operations Deleted items leave tombstones to ensure consistent deletion across replicas. When a delete operation conflicts with an update operation, Automerge has specific resolution rules: **Maps and Lists**: When one replica deletes a key/element while another updates it concurrently, the update takes precedence. The updated value is retained in the merged document. ```{r} # Map: Deletion vs concurrent update - update wins doc21 <- am_create() doc21[["temp"]] <- "value" am_commit(doc21) doc22 <- am_fork(doc21) am_delete(doc21, AM_ROOT, "temp") am_commit(doc21) doc22[["temp"]] <- "updated" am_commit(doc22) am_merge(doc21, doc22) doc21[["temp"]] # Update takes precedence over delete ``` **Double deletion**: When both replicas delete the same key/element, the key/element is removed from the merged document. ```{r} # List: Delete and insert at same position - both operations apply doc23 <- am_create() am_put(doc23, AM_ROOT, "items", AM_OBJ_TYPE_LIST) items23 <- am_get(doc23, AM_ROOT, "items") am_put(doc23, items23, "end", "A") am_put(doc23, items23, "end", "B") am_put(doc23, items23, "end", "C") am_commit(doc23) doc24 <- am_fork(doc23) items24 <- am_get(doc24, AM_ROOT, "items") am_delete(doc23, items23, 2) am_insert(doc24, items24, 2, "X") am_merge(doc23, doc24) for (i in seq_len(am_length(doc23, items23))) { print(am_get(doc23, items23, i)) } ``` ## Best Practices ### 1. Choose the Right CRDT Type - **Simple values**: Use regular types (strings, numbers, booleans) - **Structured data**: Use maps for objects with independent fields - **Ordered collections**: Use lists for sequences - **Collaborative text**: Use text objects, not strings - **Counters**: Use counter CRDT, not integers ### 2. Design for Commutativity Structure your data so concurrent operations naturally merge well: ```{r} # Good: Independent counters per user doc_good <- am_create() doc_good[["votes"]] <- list( alice = am_counter(0), bob = am_counter(0) ) # Better than: Single counter for all votes doc_bad <- am_create() doc_bad[["total_votes"]] <- am_counter(0) # Loses attribution ``` ### 3. Commit at Logical Boundaries Group related changes in commits: ```{r} doc25 <- am_create() # Good: Atomic transaction doc25[["user"]] <- list(name = "Alice", age = 30L, city = "Boston") am_commit(doc25, "Add user Alice") # Bad: Many micro-commits (increases storage) doc25[["status"]] <- "active" am_commit(doc25, "Set status") doc25[["role"]] <- "admin" am_commit(doc25, "Set role") ``` ### 4. Handle Concurrent Conflicts Gracefully Understand that some conflicts are inevitable: ```{r} doc26 <- am_create() doc26[["status"]] <- "draft" am_commit(doc26) doc27 <- am_fork(doc26) doc26[["status"]] <- "published" doc27[["status"]] <- "archived" am_merge(doc26, doc27) # One will win - application should handle both states sensibly doc26[["status"]] # Should be prepared for either 'published' or 'archived' ``` ## Further Reading - [Automerge Documentation](https://automerge.org/) - [CRDT Research Papers](https://crdt.tech/) - [Conflict-Free Replicated Data Types (Paper)](https://arxiv.org/abs/1805.06358) ## Next Steps - Learn about [Synchronization Patterns](sync-protocol.html) to use CRDTs in practice - Check the [Quick Reference](quick-reference.html) for all available functions