---
title: "MCP Server: Design and Implementation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{MCP Server: Design and Implementation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## The Problem

The original MCP server shipped 9 tools that mostly serialized raw data rows as JSON and sent them to Claude. This created three compounding problems:

1. **Size**: A single `get_dataset` call on a moderately large table (e.g., Grade Results with 50K rows) would hit the MCP protocol's 1MB response limit, forcing artificial row caps.
2. **Waste**: Claude received thousands of JSON rows it couldn't meaningfully compute on. Every "what's the average grade?" question required the LLM to mentally sum and divide across a JSON array.
3. **Rigidity**: Dedicated tools like `student_summary` and `course_summary` hard-coded specific analysis patterns. Any question outside those patterns required a new tool or awkward workarounds.

The redesign inverts the architecture: Claude writes R code, the server executes it in a persistent workspace, and returns compact results -- text summaries and file-based visualizations.

## Design Principles

### Compute where compute belongs

R is purpose-built for data manipulation. Claude is purpose-built for reasoning about what manipulation to do. The `execute_r` tool bridges these: Claude writes a dplyr pipeline, ggplot visualization, or statistical test; the server evaluates it and returns the result. No raw data crosses the wire unless explicitly requested.

### Visualization: interactive HTML first, static PNG fallback

The preferred visualization path uses Chart.js injected into self-contained HTML files:

1. `execute_r` aggregates data in R (counts, means, distributions)
2. Claude builds an HTML string with Chart.js loaded from CDN
3. `writeLines()` saves it to the output directory
4. The user opens the HTML file in their browser

```
execute_r → aggregate data → paste0(Chart.js HTML) → writeLines()
                                                          ↓
                Interactive chart in browser (tooltips, hover, responsive)
```

As a fallback, for cases where the user explicitly requests a PNG or a chart type Chart.js cannot handle, `execute_r` supports returning ggplot objects directly. When one is detected, the server renders the plot to a PNG file and generates an HTML viewer wrapper. Both files are saved to the configured output directory.

### Outcome-oriented tools

The old server organized tools around API operations: `get_dataset`, `join_datasets`, `student_summary`. The redesign organizes them around user goals:

| Old (9 tools) | New (7 tools) | Why |
|---|---|---|
| `get_dataset` | `execute_r` | Claude writes `bs_get_dataset("Users") %>% head(10)` |
| `student_summary` | `execute_r` | Claude writes the exact summary needed |
| `course_summary` | `execute_r` | Claude writes the exact aggregation |
| `join_datasets` | `execute_r` | Claude calls `bs_join()` directly |
| `describe_dataset` | `describe_dataset` (enhanced) | Now returns column stats, not sample rows |
| -- | `get_data_summary` | Quick filter/group stats without writing R |

### Progressive disclosure

Every response from `describe_dataset` and `get_data_summary` ends with a hint:

> "Use execute_r for custom analysis. Available: bs_get_dataset(), bs_join(),
> dplyr verbs."

This teaches the model (and the user reading the transcript) that a more powerful tool is available when the pre-built summaries aren't enough.

### Server instructions

The MCP `initialize` handshake includes an `instructions` field -- a block of text that tells Claude the recommended workflow before any tools are called:

```
1. Use list_datasets or search_datasets to discover data
2. Use describe_dataset to understand columns and distributions
3. Use get_data_summary with filter_by/group_by for quick stats
4. Use execute_r for custom analysis
```

The instructions also include performance guidelines, the Chart.js HTML pattern, the available packages (dplyr, tidyr, ggplot2, lubridate, scales), and the date-axis gotcha for ggplot2.

## Architecture

### Persistent workspace

The server creates a `.mcp_workspace` environment at startup with `dplyr`, `tidyr`, `ggplot2`, `lubridate`, and `scales` pre-loaded. Every `execute_r` call evaluates code in this environment, so variables persist between calls:

```r
# Call 1: load and filter
users <- bs_get_dataset("Users") %>%
  filter(role_name == "Student")

# Call 2: uses `users` from the previous call
users %>% count(org_unit_id, sort = TRUE)
```

The workspace also exposes `output_dir` -- the path to the configured output directory -- so Chart.js HTML files can be written there directly.

### Output directory

Visualizations are written to a configurable output directory. Resolution order:

1. `BRIGHTSPACER_OUTPUT_DIR` environment variable (if set)
2. `/brightspaceR_output` (development mode)
3. `/brightspaceR_output` (fallback)
4. `/brightspaceR_output` (last resort if directory creation fails)

The path is normalized with `normalizePath()` and logged at startup. If creation fails, the server falls back gracefully and logs a warning.

### Smart result formatting

`handle_execute_r` inspects the result type and formats accordingly:

| Result type | Formatting |
|---|---|
| `data.frame` with <=50 rows | Full text table via `print()` |
| `data.frame` with >50 rows | `head(20)` + "... N more rows" hint |
| `gg`/`ggplot` object | Render to 900x600 PNG + HTML viewer, return file paths |
| Character ending in `.html`/`.png`/`.pdf`/`.csv` | Return as file path |
| Anything else | `capture.output(print(x))` |
| Error | `isError: TRUE` with the error message |

### Column summary helper

`summarize_column()` is a shared function used by both `describe_dataset` and `get_data_summary`.
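The helper's source isn't reproduced in this vignette, but a minimal base-R sketch of such a type-dispatching summarizer might look like the following (the function name matches the text; the body is illustrative, not the server's actual implementation):

```r
# Hypothetical sketch: dispatch on column type and return compact stats.
summarize_column <- function(x) {
  n_missing <- sum(is.na(x))
  if (is.numeric(x)) {
    list(type = "numeric", min = min(x, na.rm = TRUE),
         max = max(x, na.rm = TRUE), mean = mean(x, na.rm = TRUE),
         n_missing = n_missing)
  } else if (is.logical(x)) {
    list(type = "logical", n_true = sum(x, na.rm = TRUE),
         n_false = sum(!x, na.rm = TRUE), n_missing = n_missing)
  } else if (inherits(x, c("Date", "POSIXt"))) {
    list(type = "date", min = min(x, na.rm = TRUE),
         max = max(x, na.rm = TRUE), n_missing = n_missing)
  } else {
    # Characters and factors: unique count plus the most common values
    tab <- sort(table(as.character(x)), decreasing = TRUE)
    list(type = "categorical", n_unique = length(tab),
         top = utils::head(tab, 3), n_missing = n_missing)
  }
}
```

A handful of scalars per column is all Claude needs to reason about a distribution, and it costs a few hundred bytes instead of kilobytes of sample rows.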
It produces type-appropriate stats:

- **Numeric**: min, max, mean, n_missing
- **Character/Factor**: n_unique, top 3 values with counts, n_missing
- **Logical**: n_true, n_false, n_missing
- **Date/POSIXt**: min, max, n_missing

This replaces the old approach of sending 5 sample rows, which told Claude almost nothing about the data distribution.

### Response helpers

Composable functions build MCP-compliant responses:

- `mcp_text(text, audience)` -- Creates a `TextContent` block with an optional `audience` annotation (`["user"]` for clean output, `["assistant"]` for verbose diagnostics like row counts and join warnings).
- `mcp_result(contents, is_error)` -- Wraps a list of content blocks into a tool result. Handles size truncation at ~800KB.

### Size guard

The `mcp_result()` function checks the serialized JSON size of every response. If it exceeds 800KB, it finds the first text content block and truncates it with a clear message:

```
... [TRUNCATED: response exceeded size limit. Use head()/filter() to
narrow results in execute_r.]
```

This is a safety net, not the primary size control. The real size control is that `execute_r` returns computed results (a count, a mean, a chart file path) rather than raw data.

### Defensive execution

`execute_r` includes several layers of protection against runaway queries:

**30-second timeout**: Every eval is wrapped in `setTimeLimit(elapsed = 30)`. If a Cartesian join or an unfiltered aggregation over millions of rows hangs, the server kills it and returns a clear error: *"Execution timed out after 30 seconds. Try filtering data earlier or breaking into smaller steps."*

**Row-count reporting**: The workspace's `bs_get_dataset()` wrapper emits a message like `[Grade Results: 523,041 rows x 12 cols]` every time it's called.
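A hypothetical sketch of such a reporting wrapper is shown below; the real `bs_get_dataset()` fetches Brightspace data, so `fetch` here is a stand-in for that underlying loader:

```r
# Hypothetical sketch: wrap a data loader so every call reports its size.
# `fetch` stands in for the real Brightspace loader; the 50K threshold
# mirrors the large-dataset warning described in the text.
make_reporting_wrapper <- function(fetch, warn_rows = 50000) {
  function(name, ...) {
    df <- fetch(name, ...)
    msg <- sprintf("[%s: %s rows x %d cols]",
                   name, format(nrow(df), big.mark = ","), ncol(df))
    if (nrow(df) > warn_rows) {
      msg <- paste(msg,
        "WARNING: large dataset -- filter early to avoid slow operations")
    }
    message(msg)  # emitted as a condition, so the server can capture it
    df
  }
}
```

Using `message()` rather than `cat()` matters: conditions can be intercepted and routed separately from the evaluated result.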
These messages are captured and returned to Claude as assistant-facing context (using `audience: ["assistant"]`), so Claude always knows how big the data it just loaded is -- and can course-correct before the next operation.

**Large-dataset warnings**: If a dataset exceeds 50K rows, the message includes an explicit warning: *"WARNING: large dataset -- filter early to avoid slow operations"*. The same applies to `bs_join()` when either input exceeds 50K rows.

**Server instructions**: The `initialize` response tells Claude to always call `describe_dataset` first to check row counts, to filter early on large tables, and to never return raw unfiltered data frames.

These defenses are layered: instructions guide Claude to do the right thing, row-count messages let it self-correct mid-execution, and the timeout is the hard backstop.

## Tool Reference

### Discovery tools

**`list_datasets`** -- Returns all available BDS dataset names and descriptions. No parameters.

**`search_datasets`** -- Case-insensitive keyword search across dataset names and descriptions.

**`auth_status`** -- Reports whether the server has a valid Brightspace OAuth token.

**`list_schemas`** -- Lists registered dataset schemas and their key columns (used by `bs_join()`).

### Analysis tools

**`describe_dataset(name)`** -- Loads a dataset and returns per-column summary statistics. The footer suggests using `execute_r` for custom queries.

**`get_data_summary(dataset, filter_by?, group_by?)`** -- The "quick stats" tool. Loads a dataset, optionally filters by column-value pairs, optionally groups. Without grouping, returns per-column stats. With grouping, returns group counts and numeric column means per group. Always suggests `execute_r` for anything more complex.

**`execute_r(code)`** -- The core tool. Claude writes R code as a string, the server evaluates it in the persistent workspace, and returns the result. Handles data frames, ggplot objects (saved as PNG + HTML), file paths, and arbitrary R objects.
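The result handling behind `execute_r` can be condensed into a sketch like the one below. It is illustrative only: the helper name `format_result` is invented for this example, and the ggplot branch is stubbed to a placeholder string rather than actually rendering a PNG:

```r
# Hypothetical sketch of execute_r's result dispatch. Branch order mirrors
# the formatting rules: ggplot objects, large data frames, file paths,
# then a generic print fallback.
format_result <- function(x, max_rows = 50) {
  if (inherits(x, "ggplot")) {
    "<render to 900x600 PNG + HTML viewer, return file paths>"  # stubbed
  } else if (is.data.frame(x) && nrow(x) > max_rows) {
    paste(c(capture.output(print(utils::head(x, 20))),
            sprintf("... %d more rows", nrow(x) - 20)),
          collapse = "\n")
  } else if (is.character(x) && length(x) == 1 &&
             grepl("\\.(html|png|pdf|csv)$", x)) {
    sprintf("File saved: %s", x)
  } else {
    # Small data frames and everything else: plain printed representation
    paste(capture.output(print(x)), collapse = "\n")
  }
}
```

The point of the dispatch is that every branch returns compact text or a file path -- never a full serialized table.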
## Types of Analysis

The MCP server is designed for the kinds of questions LMS administrators and institutional researchers typically ask. Here are the main categories with example prompts.

### Enrollment analytics

- "How many students are enrolled this semester vs last semester?"
- "Show me enrollment trends by month for the last 2 years"
- "Which courses have the highest drop rates?"
- "Break down enrollments by role -- how many students, instructors, TAs?"

### Grade performance

- "What's the average grade in STAT101?"
- "Show me the grade distribution across all courses as a histogram"
- "Which courses have the lowest pass rates (below 50%)?"
- "Compare grade distributions between online and in-person sections"

### Student engagement

- "How many users haven't logged in for 90 days?"
- "What's the average content completion rate per course?"
- "Which discussion forums have the most activity?"
- "Show me quiz attempt patterns -- when do students take quizzes?"

### Course health

- "Which courses have no grade items set up?"
- "List courses that haven't been accessed in 6 months"
- "Show me assignment submission rates by course"
- "Which courses have the most content objects?"

### Cross-dataset joins

- "Join users and enrollments to find students enrolled in more than 5 courses"
- "Correlate discussion post frequency with final grades"
- "Find users with attendance below 80% whose grades are above 70%"

### Reporting and dashboards

- "Build me a dashboard showing KPIs for this semester"
- "Generate a grade report for all courses in the Business faculty"
- "Create a chart showing monthly active users over the past year"

Each of these can be handled through the MCP workflow: discover datasets, check column distributions, then write R code that aggregates, joins, and visualizes.

## Typical Conversation Flow

```
User: "How are students performing in STAT101?"

Claude thinks: I need to find the course, get grade data, and summarize it.

1. search_datasets(keyword="grade")
   -> Finds "Grade Results"

2. describe_dataset(name="Grade Results")
   -> Sees columns: user_id, org_unit_id, points_numerator, etc.
   -> Sees row count: 523,041

3. execute_r(code="""
     grades <- bs_get_dataset("Grade Results")
     orgs <- bs_get_dataset("Org Units")
     stat101 <- orgs %>% filter(grepl("STAT101", name))
     grades %>%
       filter(org_unit_id %in% stat101$org_unit_id) %>%
       summarise(
         n_students = n_distinct(user_id),
         mean_grade = mean(points_numerator, na.rm = TRUE),
         median_grade = median(points_numerator, na.rm = TRUE)
       )
   """)
   -> Returns: tibble with 1 row of summary stats

4. execute_r(code="""
     # Interactive chart
     grade_dist <- grades %>%
       filter(org_unit_id %in% stat101$org_unit_id,
              !is.na(points_numerator)) %>%
       mutate(bracket = cut(points_numerator, breaks = seq(0, 100, 10),
                            include.lowest = TRUE)) %>%
       count(bracket)
     html <- paste0('......')
     writeLines(html, file.path(output_dir, 'stat101_grades.html'))
     browseURL(file.path(output_dir, 'stat101_grades.html'))
   """)
   -> Opens interactive bar chart in browser
```

Note how calls 3 and 4 share variables (`grades`, `orgs`, `stat101`) via the persistent workspace. Claude computes with R, then builds Chart.js HTML for interactive rendering.

## Security Considerations

`execute_r` evaluates arbitrary R code. This is intentional -- the server runs locally, authenticated with the user's own Brightspace credentials. The trust boundary is the same as opening an R console.

The workspace is sandboxed only in the sense that it's a child environment of `globalenv()`, not a separate process. For production deployments where multiple users share a server, `execute_r` would need process-level isolation (e.g., callr). That's out of scope for the current single-user design.
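For reference, a minimal sketch of what callr-based isolation could look like, assuming the callr package is available. Note what it gives up: each call pays a fresh-process startup cost and loses the persistent workspace that the current design depends on.

```r
# Hypothetical sketch: evaluate untrusted code in a throwaway child process
# via the callr package, with a hard timeout. Illustrative only -- the
# current server deliberately evaluates in-process instead.
run_isolated <- function(code, timeout = 30) {
  callr::r(
    function(code) eval(parse(text = code), envir = new.env()),
    args = list(code = code),
    timeout = timeout
  )
}
```

A crash or `quit()` inside the child can no longer take down the server, and OS-level resource limits could be applied to the child process, at the cost of re-loading data on every call.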