The tidyverse, ggplot2, and destiny are a great fit!
suppressPackageStartupMessages({
library(destiny)
library(tidyverse)
library(forcats) # not in the default tidyverse loadout
})ggplot has a peculiar method to set default scales: You just have to define certain variables.
When working mainly with dimension reductions, I suggest to hide the (useless) ticks:
Let’s load our dataset
Of course you could use tidyr::gather() to tidy or transform the data now, but the data is already in the right form for destiny, and R for Data Science is a better resource for it than this vignette. The long form of a single cell ExpressionSet would look like:
| Cell | num_cells | Gene | Expression |
|---|---|---|---|
| <fct> | <int> | <chr> | <dbl> |
| 2C 1.1 | 2 | Actb | -0.575 |
| 2C 1.2 | 2 | Actb | -0.435 |
| 2C 2.1 | 2 | Actb | 0.460 |
| 2C 2.2 | 2 | Actb | 0.610 |
| 2C 3.1 | 2 | Actb | 1.970 |
| 2C 3.2 | 2 | Actb | 2.285 |
| 2C 4.1 | 2 | Actb | 1.785 |
| 2C 5.1 | 2 | Actb | 1.495 |
| 2C 5.2 | 2 | Actb | 1.935 |
| 2C 6.1 | 2 | Actb | -0.820 |
| 2C 6.2 | 2 | Actb | -1.335 |
| 2C 7.1 | 2 | Actb | -0.595 |
| 2C 7.2 | 2 | Actb | -0.600 |
| 2C 8.1 | 2 | Actb | 2.660 |
| 2C 8.2 | 2 | Actb | 1.205 |
| 2C 9.1 | 2 | Actb | 1.425 |
| 2C 9.2 | 2 | Actb | 0.815 |
| 2C 10.1 | 2 | Actb | 2.075 |
| 2C 10.2 | 2 | Actb | -0.520 |
| 4C 1.1 | 4 | Actb | -4.245 |
| 4C 1.2 | 4 | Actb | 0.570 |
| 4C 1.3 | 4 | Actb | -0.085 |
| 4C 1.4 | 4 | Actb | 2.785 |
| 4C 2.1 | 4 | Actb | 1.880 |
| 4C 2.2 | 4 | Actb | 2.170 |
| 4C 2.3 | 4 | Actb | 1.390 |
| 4C 3.1 | 4 | Actb | 1.260 |
| 4C 3.2 | 4 | Actb | 0.835 |
| 4C 3.3 | 4 | Actb | -0.745 |
| 4C 3.4 | 4 | Actb | -0.280 |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 64C 5.9 | 64 | Tspan8 | 10.255 |
| 64C 5.10 | 64 | Tspan8 | 12.120 |
| 64C 5.11 | 64 | Tspan8 | -0.155 |
| 64C 5.12 | 64 | Tspan8 | 0.260 |
| 64C 5.13 | 64 | Tspan8 | 0.445 |
| 64C 5.14 | 64 | Tspan8 | -0.810 |
| 64C 5.15 | 64 | Tspan8 | 6.710 |
| 64C 5.16 | 64 | Tspan8 | 4.455 |
| 64C 6.1 | 64 | Tspan8 | 7.070 |
| 64C 6.2 | 64 | Tspan8 | 9.895 |
| 64C 6.3 | 64 | Tspan8 | 14.020 |
| 64C 6.4 | 64 | Tspan8 | 9.720 |
| 64C 6.5 | 64 | Tspan8 | 11.630 |
| 64C 6.6 | 64 | Tspan8 | 7.795 |
| 64C 6.7 | 64 | Tspan8 | 7.680 |
| 64C 6.8 | 64 | Tspan8 | 9.740 |
| 64C 7.1 | 64 | Tspan8 | 14.150 |
| 64C 7.2 | 64 | Tspan8 | 12.285 |
| 64C 7.3 | 64 | Tspan8 | 14.010 |
| 64C 7.4 | 64 | Tspan8 | 13.320 |
| 64C 7.5 | 64 | Tspan8 | 12.545 |
| 64C 7.6 | 64 | Tspan8 | 11.290 |
| 64C 7.7 | 64 | Tspan8 | 11.895 |
| 64C 7.8 | 64 | Tspan8 | 7.090 |
| 64C 7.9 | 64 | Tspan8 | 14.735 |
| 64C 7.10 | 64 | Tspan8 | 3.220 |
| 64C 7.11 | 64 | Tspan8 | 3.415 |
| 64C 7.12 | 64 | Tspan8 | 4.540 |
| 64C 7.13 | 64 | Tspan8 | 5.315 |
| 64C 7.14 | 64 | Tspan8 | 2.865 |
But destiny doesn’t use long form data as input, since all single cell data has always a more compact structure of genes×cells, with a certain number of per-sample covariates (The structure of ExpressionSet).
names(dm) shows what names can be used in dm$<name>, as.data.frame(dm)$<name>, or ggplot(dm, aes(<name>)):
## [1] "DC1" "DC2" "DC3" "DC4" "DC5" "DC6"
## [7] "DC7" "DC8" "DC9" "DC10" "DC11" "DC12"
## [13] "DC13" "DC14" "DC15" "DC16" "DC17" "DC18"
## [19] "DC19" "DC20" "Actb" "Ahcy" "Aqp3" "Atp12a"
## [25] "Bmp4" "Cdx2" "Creb312" "Cebpa" "Dab2" "DppaI"
## [31] "Eomes" "Esrrb" "Fgf4" "Fgfr2" "Fn1" "Gapdh"
## [37] "Gata3" "Gata4" "Gata6" "Grhl1" "Grhl2" "Hand1"
## [43] "Hnf4a" "Id2" "Klf2" "Klf4" "Klf5" "Krt8"
## [49] "Lcp1" "Mbnl3" "Msc" "Msx2" "Nanog" "Pdgfa"
## [55] "Pdgfra" "Pecam1" "Pou5f1" "Runx1" "Sox2" "Sall4"
## [61] "Sox17" "Snail" "Sox13" "Tcfap2a" "Tcfap2c" "Tcf23"
## [67] "Utf1" "Tspan8" "Cell" "num_cells"
Due to the fortify method (which here just means as.data.frame) being defined on DiffusionMap objects, ggplot directly accepts DiffusionMap objects:
When you want to use a Diffusion Map in a dplyr pipeline, you need to call fortify/as.data.frame directly:
fortify(dm) %>%
mutate(
EmbryoState = factor(num_cells) %>%
lvls_revalue(paste(levels(.), 'cell state'))
) %>%
ggplot(aes(DC1, DC2, colour = EmbryoState)) +
geom_point()The Diffusion Components of a converted Diffusion Map, similar to the genes in the input ExpressionSet, are individual variables instead of two columns in a long-form data frame, but sometimes it can be useful to “tidy” them:
fortify(dm) %>%
gather(DC, OtherDC, num_range('DC', 2:5)) %>%
ggplot(aes(DC1, OtherDC, colour = factor(num_cells))) +
geom_point() +
facet_wrap(~ DC)Another tip: To reduce overplotting, use sample_frac(., 1.0, replace = FALSE) (the default) in a pipeline.
Adding a constant alpha improves this even more, and also helps you see density:
fortify(dm) %>%
sample_frac() %>%
ggplot(aes(DC1, DC2, colour = factor(num_cells))) +
geom_point(alpha = .3)