---
title: "Getting Started with glyrepr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with glyrepr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Welcome to the world of glycan analysis! If you've ever tried to work with glycans computationally, you know the struggle: these tree-like molecules are notoriously difficult to represent and analyze compared to their linear cousins like proteins or DNA.

That's where `glyrepr` comes to the rescue. Think of `glyrepr` as your **glycan translator**: it teaches your computer how to "speak glycan" fluently. Whether you're dealing with compositions (what's in the glycan) or structures (how it's connected), this package has got you covered.

```{r setup}
library(glyrepr)
```

## Quick Start: What Are We Talking About?

Before we dive in, let's establish our vocabulary. Don't worry, it's simpler than it sounds!

| Term | What It Means | Example |
|------|---------------|---------|
| **Composition** | The "ingredients list": how many of each sugar | `Hex(5)HexNAc(2)` |
| **Structure** | The "blueprint": how sugars are connected | `Man(a1-3)Man(b1-4)GlcNAc` |
| **Monosaccharide** | A single sugar unit (the building blocks) | `Gal`, `Man`, `Hex` |
| **Linkage** | The "glue" between sugars | `a1-3`, `b1-4` |
| **Substitution** | Chemical decorations on sugars | `6Ac`, `3Me` |

🔍 **Pro tip**: We distinguish between **generic** sugars (like mystery boxes labeled "Hex") and **concrete** sugars (like specific boxes labeled "Galactose").

## Part 1: Compositions, the Easy Start

Let's start with something straightforward: glycan compositions. Think of these as ingredient lists for your favorite recipes.
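
To make the "ingredient list" idea concrete before touching the package, here is a minimal base R sketch. It models a composition as a named integer vector and prints it in the `Hex(5)HexNAc(2)` style from the table above. This is an illustration only: `toy_comp` and `format_toy_comp()` are hypothetical names invented for this example, and nothing here is meant to reflect how `glyrepr` stores compositions internally.

```{r}
# Illustration only: a composition modeled as a named integer vector.
# This is an assumption for teaching purposes, not glyrepr's representation.
toy_comp <- c(Hex = 5L, HexNAc = 2L)

# Hypothetical helper: render the counts in the "Hex(5)HexNAc(2)" style.
format_toy_comp <- function(x) {
  paste0(names(x), "(", x, ")", collapse = "")
}

format_toy_comp(toy_comp)

# Total number of residues in the "ingredient list"
sum(toy_comp)
```

The point is simply that a composition carries counts and no connectivity. The `glyrepr` constructors below add validation and printing on top of that idea.
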
### Creating Your First Compositions

There are three ways to create compositions, each with its own superpower:

**Method 1: The Direct Approach**

```{r}
# Just tell R what you have
glycan_composition(c(Hex = 5, HexNAc = 2), c(Gal = 1, GalNAc = 1))
```

**Method 2: The Programmatic Way**

```{r}
# Perfect when you're processing data from files or databases
comp_list <- list(c(Hex = 5, HexNAc = 2), c(Gal = 1, GalNAc = 1))
as_glycan_composition(comp_list)
```

**Method 3: The Parser**

```{r}
# Copy-paste from your mass spec software? No problem!
as_glycan_composition(c("Hex(5)HexNAc(2)", "Gal(1)GalNAc(1)"))
```

### The Magic of Colors 🌈

Here's something cool: when you run these examples in your R console, you'll see the concrete monosaccharides (like Gal and GalNAc) displayed in beautiful colors! These follow the [SNFG standard](https://www.ncbi.nlm.nih.gov/glycans/snfg.html), the universal "color code" for glycans. Think of it as the glycan rainbow 🌈.

### Smart Counting with `count_mono()`

Now here's where `glyrepr` shows its intelligence:

```{r}
comp <- glycan_composition(
  c(Hex = 5, HexNAc = 2),          # generic sugars
  c(Gal = 1, Man = 1, GalNAc = 1)  # concrete sugars
)

# How many galactose residues?
count_mono(comp, "Gal")

# How many hexose residues? (This includes Gal and Man!)
count_mono(comp, "Hex")
```

Notice how `count_mono()` is smart enough to know that galactose and mannose are both hexoses? That's the power of understanding glycan hierarchies!

## Part 2: Structures, Where the Magic Happens

Compositions are nice, but structures are where `glyrepr` truly shines. This is like going from knowing the ingredients to understanding the actual recipe and cooking method.

### Your First Glycan Structures

Let's work with some real glycan structures. The strings below use the "IUPAC-condensed" text representation of glycans. They might look cryptic, but they're actually quite readable once you get the hang of it.
To learn more about the notation, check out [this article](https://glycoverse.github.io/glyrepr/articles/iupac.html).

```{r}
iupacs <- c(
  "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc(b1-",  # The famous N-glycan core
  "Gal(b1-3)GalNAc(a1-",                                   # O-glycan core 1
  "Gal(b1-3)[GlcNAc(b1-6)]GalNAc(a1-",                     # O-glycan core 2
  "Man(a1-3)[Man(a1-6)]Man(a1-3)[Man(a1-6)]Man(a1-",       # A branched mannose tree
  "GlcNAc6Ac(b1-4)Glc3Me(a1-"                              # With some decorations
)

struc <- as_glycan_structure(iupacs)
struc
```

### The Secret Sauce: Unique Structure Optimization

Here's where `glyrepr` gets really clever. Notice the "# Unique structures: 5" line in the output? This isn't just informational: it's the key to lightning-fast performance.

Let's see this optimization in action:

```{r}
# Create a big dataset with lots of repetition
large_struc <- rep(struc, 1000)  # 5,000 structures total
large_struc
```

Still showing "# Unique structures: 5"! This means `glyrepr` is storing only 5 unique graphs internally, not 5,000. This is like having a smart library system that stores only one copy of each book, no matter how many people want to read it.

### Performance That Will Blow Your Mind 🚀

Let's put this to the test:

```{r}
library(tictoc)

tic("Converting 5 structures")
result_small <- convert_to_generic(struc)
toc()

tic("Converting 5,000 structures")
result_large <- convert_to_generic(large_struc)
toc()
```

**Mind = blown!** 🤯 The two timings are nearly identical because `glyrepr` only processes each unique structure once, then cleverly expands the results to the full vector.

### Structure Manipulation Tools

`glyrepr` comes with several handy tools for structure manipulation:

**Strip away the connections:**

```{r}
remove_linkages(struc)
```

**Remove the decorations:**

```{r}
# Let's look at our decorated structure first
struc[5]

# Now remove the decorations (6Ac and 3Me)
remove_substituents(struc[5])
```

## Part 3: Conversions and Integrations

### From Structure to Composition

Ever wondered what's actually in those complex structures?
Easy:

```{r}
comp <- as_glycan_composition(struc)
comp
```

### Back to Strings

Need to export your data or use it elsewhere?

```{r}
# Get the original string representations
as.character(struc)
as.character(comp)
```

### Playing Nice with the Tidyverse

`glyrepr` objects are first-class citizens in the tidyverse:

```{r}
suppressPackageStartupMessages(library(tibble))
suppressPackageStartupMessages(library(dplyr))

df <- tibble(
  id = seq_along(struc),
  structures = struc,
  names = c("N-glycan core", "Core 1", "Core 2", "Branched Man", "Decorated")
)

df %>%
  mutate(n_man = count_mono(structures, "Man")) %>%
  filter(n_man > 1)
```

## What's Next?

Congratulations! You've just learned the fundamentals of glycan representation in R. Here's what you can explore next:

- 🔬 **Advanced analysis**: Check out the "Power User Guide: Efficient Glycan Manipulation" vignette
- 🧬 **Motif searching**: Try the `glymotif` package for finding patterns in glycan structures
- 📊 **Visualization**: Explore glycan visualization packages in the glycoverse

The glycoverse is your oyster! 🦪

## Session Information

```{r}
sessionInfo()
```