---
title: "tlsR Workflow: From Raw Imaging Data to TLS Characterisation"
author: "Ali Amiryousefi"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{tlsR Workflow: From Raw Imaging Data to TLS Characterisation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse   = TRUE,
  comment    = "#>",
  fig.width  = 7,
  fig.height = 6,
  eval       = TRUE
)
```

## Introduction

Tertiary lymphoid structures (TLS) are ectopic lymphoid organs that form in
non-lymphoid tissues -- most notably in tumors -- and are associated with
improved patient outcomes and immunotherapy response. **tlsR** provides a fast,
reproducible pipeline for detecting TLS and characterizing their spatial
organisation in multiplexed tissue imaging data (e.g. mIHC, CODEX, IMC).

The core pipeline is:

```
Raw ldata list
     |
     v
 detect_TLS()               <- KNN-based B+T co-localisation
     |
     +--> scan_clustering() <- Sliding-window Ripley's L clustering map
     |
     +--> calc_icat()       <- ICAT spatial-spread score per TLS
     |
     +--> detect_tic()      <- T-cell clusters outside TLS
     |
     +--> summarize_TLS()   <- Tidy summary table
     |
     +--> plot_TLS()        <- Publication-ready spatial plot
```

---

## Data Format

`tlsR` expects a **named list of data frames** (`ldata`), one element per
tissue sample. Each data frame must contain at minimum:

| Column      | Type      | Description                                      |
|-------------|-----------|--------------------------------------------------|
| `x`         | numeric   | X coordinate in microns                          |
| `y`         | numeric   | Y coordinate in microns                          |
| `phenotype` | character | Cell label; must contain `"B cell"` / `"T cell"` |

Additional columns (e.g. cell area, marker intensities) are silently ignored.

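If your data arrive as one long table rather than a list, the layout described above can be assembled in base R. The following is a minimal sketch only: the input column names (`sample_id`, `X_um`, `Y_um`, `cell_type`) and the simulated values are hypothetical stand-ins for whatever your segmentation export provides, and the object name `ldata_custom` is likewise an illustration.

```r
# Hypothetical single-table export; all column names and values are assumptions.
set.seed(1)
cells <- data.frame(
  sample_id = rep(c("S1", "S2"), each = 50),
  X_um      = runif(100, 0, 2000),
  Y_um      = runif(100, 0, 2000),
  cell_type = sample(c("B cell", "T cell", "Other"), 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# Rename to the three columns the Data Format table asks for.
names(cells)[match(c("X_um", "Y_um", "cell_type"), names(cells))] <-
  c("x", "y", "phenotype")

# One data frame per sample, named by sample identifier.
ldata_custom <- split(cells[, c("x", "y", "phenotype")], cells$sample_id)

# Check each element against the required columns and types.
required <- c("x", "y", "phenotype")
ok <- vapply(
  ldata_custom,
  function(df) {
    all(required %in% names(df)) &&
      is.numeric(df$x) && is.numeric(df$y) && is.character(df$phenotype)
  },
  logical(1L)
)
stopifnot(all(ok))
```

The names of `ldata_custom` are what you would later pass as the sample identifier to the functions in this vignette.
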
```{r load-data}
library(tlsR)
data(toy_ldata)

# Structure of the built-in example dataset
str(toy_ldata)
table(toy_ldata[["ToySample"]]$phenotype)
```

---

## Step 1 -- Detect TLS with `detect_TLS()`

`detect_TLS()` identifies B-cell-rich regions with sufficient T-cell
co-localisation using a KNN density approach.

```{r detect-tls}
data(toy_ldata)

ldata <- detect_TLS(
  LSP                     = "ToySample",
  k                       = 10,    # neighbours for density estimation
  bcell_density_threshold = 17,    # min avg 1/k-distance (um)
  min_B_cells             = 100,   # min B cells per candidate TLS
  min_T_cells_nearby      = 5,     # min T cells within max_distance_T
  max_distance_T          = 50,    # search radius (um)
  expand_distance         = 100,   # expanding radius
  ldata                   = toy_ldata
)

table(ldata[["ToySample"]]$tls_id_knn)
```

The new column `tls_id_knn` is `0` for non-TLS cells and a positive integer
(1, 2, 3, and so on) identifying the TLS to which each remaining cell is assigned.

### Quick base-R check plot

```{r base-plot, fig.alt="Scatter plot of ToySample cells coloured by TLS membership"}
df <- ldata[["ToySample"]]

plot(df$x[df$tls_id_knn == 0], df$y[df$tls_id_knn == 0],
     col = "grey80", pch = 19, cex = 0.3,
     xlab = "x (um)", ylab = "y (um)",
     main = "Detected TLS -- ToySample")
points(df$x[df$tls_id_knn > 0], df$y[df$tls_id_knn > 0],
       col = "#0072B2", pch = 19, cex = 0.4)
legend("bottomright",
       legend = c("Background", "TLS"),
       col    = c("grey80", "#0072B2"),
       pch = 19, pt.cex = 1.2, bty = "n")
```

---

## Step 2 -- Local Ripley's L Map with `scan_clustering()`

`scan_clustering()` slides a square window across the tissue and computes the
**K-integral** clustering index in each window -- the mean positive excess of
the observed Ripley's L over the theoretical CSR value.

When `plot = TRUE` (the default) a spatial map is produced showing:

- All cells as small light-grey points.
- Phenotype cells coloured green (T cells) or red (B cells).
- A navy dashed grid marking window boundaries.
- A LOESS-smoothed L-excess curve overlaid inside each qualifying window.
- A bold numeric clustering-intensity (CI) label centred in each window.
- A legend identifying all point and curve colours.

### Single-phenotype map

```{r scan-B, eval = FALSE}
# eval=FALSE because this can take ~10--30 s on real data
L_B <- scan_clustering(
  ws             = 1000,          # window side (um)
  sample         = "ToySample",
  phenotype      = "B cells",
  plot           = TRUE,
  creep          = 1L,
  min_cells      = 10L,
  min_phen_cells = 5L,
  label_cex      = 1.1,           # increase if CI labels look small
  ldata          = ldata
)
cat("B-cell windows analysed:", length(L_B$B), "\n")
```

```{r scan-T, eval = FALSE}
L_T <- scan_clustering(
  ws        = 500,
  sample    = "ToySample",
  phenotype = "T cells",
  plot      = TRUE,
  ldata     = ldata
)
cat("T-cell windows analysed:", length(L_T$T), "\n")
```

### Side-by-side B and T cell panels

When `phenotype = "Both"` two panels are drawn side by side -- one for B cells
and one for T cells -- with a shared super-title, making it easy to compare
clustering intensity across compartments.

```{r scan-both, eval = FALSE}
L_both <- scan_clustering(
  ws        = 3000,
  sample    = "ToySample",
  phenotype = "Both",
  plot      = TRUE,
  ldata     = ldata
)
cat("B windows:", length(L_both$B),
    " | T windows:", length(L_both$T), "\n")
```

The returned list has named elements `$B` and `$T`, each containing `Lest`
objects for the qualifying windows of that phenotype. Individual L curves can
be inspected or plotted directly from these objects.

---

## Step 3 -- ICAT Score with `calc_icat()`

The **ICAT (Immune Cell Arrangement Trace)** index quantifies the spatial
spread and linear organisation of cells within a TLS. A higher value indicates
a more spatially extended, structured cluster.

### How it works

`calc_icat()` applies FastICA to the centred (x, y) coordinates of TLS cells,
reconstructs the data as \( \hat{X} = S A^T + \mu \), and computes the
normalised trace-standard-deviation:

\[
\text{ICAT} = 100 \times \frac{\sqrt{v_1 + v_2 + 2\sqrt{v_1 v_2}}}{\text{nrow}(X)}
\]

where \(v_1, v_2\) are the marginal variances of \(\hat{X}\).

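The arithmetic of the formula can be checked by hand in base R. The sketch below is not the package implementation: it skips the FastICA step entirely and, as a stated assumption, substitutes simulated raw coordinates for \(\hat{X}\). It only illustrates that the numerator \(\sqrt{v_1 + v_2 + 2\sqrt{v_1 v_2}}\) equals \(\sqrt{v_1} + \sqrt{v_2}\), the sum of the two marginal standard deviations.

```r
# Hypothetical TLS coordinates in microns (simulated; stands in for X-hat).
set.seed(42)
X <- cbind(x = rnorm(200, mean = 500, sd = 40),
           y = rnorm(200, mean = 800, sd = 25))

# Marginal variances of the two coordinate columns.
v1 <- var(X[, "x"])
v2 <- var(X[, "y"])

# The formula exactly as written above.
icat_formula <- 100 * sqrt(v1 + v2 + 2 * sqrt(v1 * v2)) / nrow(X)

# Equivalent form: the numerator is (sqrt(v1) + sqrt(v2)).
icat_sd_form <- 100 * (sd(X[, "x"]) + sd(X[, "y"])) / nrow(X)

# Both forms agree up to floating-point tolerance.
all.equal(icat_formula, icat_sd_form)
```

Because the numerator is a sum of standard deviations, the score computed this way cannot be negative.
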
This formulation is **always non-negative** -- it reflects average spatial
spread per cell in microns, rather than the signed trace of the raw mixing
matrix which can be negative due to ICA sign ambiguity.

```{r icat}
n_tls <- max(ldata[["ToySample"]]$tls_id_knn, na.rm = TRUE)

if (n_tls >= 1L) {
  icat_scores <- vapply(
    seq_len(n_tls),
    function(id) calc_icat("ToySample", tlsID = id, ldata = ldata),
    numeric(1L)
  )
  names(icat_scores) <- paste0("TLS", seq_len(n_tls))
  print(icat_scores)
}
```

`calc_icat()` returns `NA` (with a message) if a TLS has too few cells or if
FastICA fails to converge -- no errors are thrown.

---

## Step 4 -- Detect T-cell Clusters with `detect_tic()`

T-cell clusters (TIC) that lie *outside* TLS are identified with HDBSCAN. The
`min_pts` and `min_cluster_size` arguments let you control sensitivity.

```{r detect-tic}
ldata <- detect_tic(
  sample           = "ToySample",
  min_pts          = 20,    # HDBSCAN minPts
  min_cluster_size = 100,   # drop clusters smaller than this
  ldata            = ldata
)

table(
  ldata[["ToySample"]]$tcell_cluster_hdbscan[
    ldata[["ToySample"]]$tcell_cluster_hdbscan != 0
  ],
  useNA = "ifany"
)
```

---

## Step 5 -- Summary Table with `summarize_TLS()`

`summarize_TLS()` produces a tidy one-row-per-sample summary -- convenient for
downstream statistical analysis.

```{r summary}
sumtbl <- summarize_TLS(ldata, calc_icat_scores = FALSE)
print(sumtbl)
```

With `calc_icat_scores = TRUE` a list-column `icat_scores` is appended
containing named numeric vectors of per-TLS ICAT values (always non-negative).

---

## Step 6 -- Visualise with `plot_TLS()`

`plot_TLS()` produces a ggplot2 scatter plot with TLS and TIC coloured
distinctly using a colourblind-friendly palette.

### Rendering improvements

Two aesthetics have been tuned for clarity:

- **Background cells** are drawn with `bg_alpha = 0.25` (more transparent than
  before), so the foreground TLS and TIC structure is immediately visible.
- **TIC cells** are drawn at `point_size * tic_size_mult` (default multiplier
  `1.8`), making them slightly larger than TLS cells without dominating the
  plot.

Both parameters are exposed as function arguments so you can tune them to
your data density.

```{r plot-tls, fig.alt="ggplot2 spatial map of ToySample with TLS and TIC highlighted"}
p <- plot_TLS(
  sample        = "ToySample",
  ldata         = ldata,
  show_tic      = TRUE,
  point_size    = 0.5,
  alpha         = 0.7,    # TLS / TIC cells
  bg_alpha      = 0.25,   # background cells (more transparent)
  tic_size_mult = 1.8     # TIC cells drawn 1.8x larger
)
```

The returned `ggplot` object can be further customised with standard ggplot2
functions:

```{r plot-custom, fig.alt="Customised TLS plot with additional title"}
library(ggplot2)
p + labs(title = "ToySample -- Your custom title")
```

---

## Multi-Sample Workflow

`tlsR` is designed to scale to many samples. Pass your full `ldata` list and
iterate over the sample names:

```{r multi-sample, eval = FALSE}
samples <- names(ldata)

# Thread the growing ldata list through each sample in turn
ldata <- Reduce(function(ld, s) detect_TLS(s, ldata = ld), samples, ldata)
ldata <- Reduce(function(ld, s) detect_tic(s, ldata = ld), samples, ldata)

summary_all <- summarize_TLS(ldata)
print(summary_all)
```

For `scan_clustering()` across many samples:

```{r multi-scan, eval = FALSE}
# Generate one spatial map per sample (side-by-side B and T panels)
for (s in names(ldata)) {
  scan_clustering(
    ws        = 500,
    sample    = s,
    phenotype = "Both",   # two-panel plot: B cells | T cells
    plot      = TRUE,
    label_cex = 1.2,      # slightly larger CI labels for presentation
    ldata     = ldata
  )
}
```

---

## Session Info

```{r session}
sessionInfo()
```