---
title: "Sequence Plots: heatmap, index, and distribution"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Sequence Plots: heatmap, index, and distribution}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>",
                      fig.width = 7, fig.height = 5.5,
                      out.width = "100%", dev = "png", dpi = 72)
```

```{r setup, message = FALSE}
library(Nestimate)
```

## The dataset

`trajectories` ships with Nestimate: 138 student sequences × 15 weeks, three states (`Active`, `Average`, `Disengaged`) plus `NA` for missed weeks.

```{r data}
data(trajectories)
dim(trajectories)
head(trajectories[, 1:8])
sort(unique(as.vector(trajectories)), na.last = NA)
```

`sequence_plot()` is the single entry point for three views of this data:

| `type` | What it shows | Uses dendrogram? | Facets? |
|---|---|---|---|
| `"heatmap"` *(default)* | dense carpet, rows sorted by a distance/dendrogram | yes | no |
| `"index"` | carpet without dendrogram, row-gap optional | no | yes |
| `"distribution"` | stacked area / bar of state proportions over time | no | yes |

Defaults: `legend = "right"`, `frame = FALSE`.

## 1. `type = "heatmap"` — clustered carpet

### 1.1 Default — LCS distance, ward.D2 dendrogram

```{r h-default}
sequence_plot(trajectories)
```

### 1.2 Switch the sort strategy

```{r h-freq}
sequence_plot(trajectories, sort = "frequency",
              main = "sort = 'frequency'")
```

```{r h-hamming}
sequence_plot(trajectories, sort = "hamming",
              main = "sort = 'hamming'")
```

```{r h-start}
sequence_plot(trajectories, sort = "start",
              main = "sort = 'start' (no dendrogram)")
```

Available sorts: `lcs` (default), `frequency`, `start`, `end`, plus any `build_clusters()` distance — `hamming`, `osa`, `lv`, `dl`, `qgram`, `cosine`, `jaccard`, `jw`.

### 1.3 Cluster separators with `k`

Cut the dendrogram into `k` groups and overlay thin horizontal lines at the cluster boundaries in the ordered rows.
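
To make the idea of cluster boundaries in the ordered rows concrete, here is a minimal base-R sketch. It is not Nestimate's implementation: the `toy` matrix, the hand-rolled Hamming distance, and all variable names are assumptions made for illustration, and only `as.dist()`, `hclust()`, and `cutree()` from the stats package are used.

```{r k-sketch}
# Toy data (hypothetical, not the `trajectories` dataset):
# 6 sequences observed at 4 time points.
toy <- rbind(
  c("Active",     "Active",     "Active",     "Average"),
  c("Active",     "Active",     "Average",    "Average"),
  c("Disengaged", "Disengaged", "Disengaged", "Average"),
  c("Disengaged", "Disengaged", "Disengaged", "Disengaged"),
  c("Average",    "Average",    "Average",    "Average"),
  c("Average",    "Average",    "Active",     "Average")
)

# Hamming distance: number of positions at which two sequences differ.
n <- nrow(toy)
d <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- sum(toy[i, ] != toy[j, ])
  }
}

# Build the dendrogram and cut it into k = 3 groups.
hc <- hclust(as.dist(d), method = "ward.D2")
groups <- cutree(hc, k = 3)

# Group labels in dendrogram (plotting) order; a boundary sits after
# every row whose successor belongs to a different group.
ordered_groups <- groups[hc$order]
boundaries <- which(diff(ordered_groups) != 0)
boundaries
```

Because the members of each cut group are adjacent in the dendrogram order, `k` groups give `k - 1` boundaries; `sequence_plot()` overlays its separator lines at boundaries of this kind.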
Tune with `k_color` and `k_line_width`.

```{r h-k3}
sequence_plot(trajectories, k = 3,
              main = "k = 3 — white separators")
```

```{r h-k5-black}
sequence_plot(trajectories, k = 5,
              k_color = "black", k_line_width = 1.2,
              main = "k = 5 — thin black")
```

### 1.4 Legend position, custom palette, title

```{r h-legend}
sequence_plot(trajectories,
              legend = "bottom",
              legend_title = "Engagement",
              state_colors = c("#2a9d8f", "#e9c46a", "#e76f51"),
              main = "Custom palette + bottom legend")
```

### 1.5 Cell borders + `tick` thinning

```{r h-borders}
sequence_plot(trajectories, cell_border = "grey60", tick = 3,
              main = "Cell grid + every-3rd tick")
```

### 1.6 `frame = TRUE` brings back the outer box

```{r h-frame}
sequence_plot(trajectories, frame = TRUE,
              main = "frame = TRUE")
```

## 2. `type = "index"` — gap-ready carpet with facets

No dendrogram. Rows are sorted within each panel by `sort`. Supports `group` (vector or auto from a `net_clustering`) plus `ncol` / `nrow` facet grids.

### 2.1 Single panel

```{r i-default}
sequence_plot(trajectories, type = "index",
              main = "index — single panel")
```

### 2.2 Visible row gaps

```{r i-gap}
sequence_plot(trajectories, type = "index", row_gap = 0.25,
              main = "index with row_gap = 0.25")
```

### 2.3 Faceted by `net_clustering` (auto 2×2 for k = 3)

```{r i-cluster}
cl <- build_clusters(as.data.frame(trajectories), k = 3L,
                     dissimilarity = "hamming", method = "ward.D2")
sequence_plot(cl, type = "index",
              main = "index faceted by build_clusters(k = 3)")
```

### 2.4 Force a 1×3 row

```{r i-row, fig.width=9, fig.height=4}
sequence_plot(cl, type = "index", ncol = 3, nrow = 1,
              main = "index — ncol = 3, nrow = 1")
```

## 3. `type = "distribution"` — state proportions over time

Stacked area or bar chart of state frequencies per time column.

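
As a rough illustration of what "state frequencies per time column" means, the base-R sketch below tabulates a small matrix column by column and converts the counts to proportions. The `toy` matrix and the variable names are hypothetical, and dropping `NA` from the denominators is an assumption of this sketch, not a statement about how `sequence_plot()` computes its values.

```{r d-sketch}
# Toy data (hypothetical, not the `trajectories` dataset):
# 4 sequences observed at 3 time points, with one missed observation.
toy <- rbind(
  c("Active",     "Active",     "Average"),
  c("Active",     "Average",    NA),
  c("Disengaged", "Average",    "Average"),
  c("Active",     "Disengaged", "Disengaged")
)
states <- c("Active", "Average", "Disengaged")

# Count how often each state occurs in each time column (NA is ignored).
counts <- matrix(0, nrow = length(states), ncol = ncol(toy),
                 dimnames = list(states, paste0("t", seq_len(ncol(toy)))))
for (j in seq_len(ncol(toy))) {
  for (s in states) {
    counts[s, j] <- sum(toy[, j] == s, na.rm = TRUE)
  }
}

# Divide each column by its total so the proportions sum to 1 per column.
props <- sweep(counts, 2, colSums(counts), "/")
props

# Base-graphics stacked bars: one bar per time column, one segment per state.
barplot(props, col = c("#2a9d8f", "#e9c46a", "#e76f51"),
        ylab = "Proportion")
```

The `counts` matrix corresponds to a count scale and `props` to a proportion scale; a stacked chart simply draws one of these matrices column by column.
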

### 3.1 Default stacked area

```{r d-area}
sequence_plot(trajectories, type = "distribution",
              main = "distribution — stacked area")
```

### 3.2 Stacked bars, count scale

```{r d-bar-count}
sequence_plot(trajectories, type = "distribution",
              geom = "bar", scale = "count",
              main = "distribution — bars, count scale")
```

### 3.3 NA band on/off

```{r d-na-true}
sequence_plot(trajectories, type = "distribution", na = TRUE,
              main = "na = TRUE")
```

```{r d-na-false}
sequence_plot(trajectories, type = "distribution", na = FALSE,
              main = "na = FALSE")
```

### 3.4 Faceted by cluster

```{r d-cluster}
sequence_plot(cl, type = "distribution",
              main = "distribution by cluster (k = 3)")
```

## Cheat sheet

```{r cheatsheet, eval = FALSE}
# Always explore first with the default:
sequence_plot(trajectories)

# Zoom in on cluster structure:
sequence_plot(trajectories, k = 3)
sequence_plot(trajectories, sort = "hamming", k = 4)

# Compare cluster compositions:
cl <- build_clusters(as.data.frame(trajectories), k = 3,
                     dissimilarity = "hamming", method = "ward.D2")
sequence_plot(cl, type = "index")
sequence_plot(cl, type = "distribution")

# Polish for a paper:
sequence_plot(trajectories,
              k = 3,
              state_colors = c("#2a9d8f", "#e9c46a", "#e76f51"),
              legend_title = "Engagement",
              legend = "bottom",
              cell_border = "grey70",
              main = "Student engagement trajectories")
```