tidyverse and ggplot integration with destiny

Interaction with the tidyverse and ggplot2

The tidyverse, ggplot2, and destiny are a great fit!

suppressPackageStartupMessages({
    library(destiny)
    library(tidyverse)
    library(forcats)  # not in the default tidyverse loadout
})

ggplot has a peculiar method to set default scales: You just have to define certain variables.

scale_colour_continuous <- scale_color_viridis_c

When working mainly with dimension reductions, I suggest to hide the (useless) ticks:

theme_set(theme_gray() + theme(
    axis.ticks = element_blank(),
    axis.text  = element_blank()))

Let’s load our dataset

data(guo_norm)

Of course you could use tidyr::gather() to tidy or transform the data now, but the data is already in the right form for destiny, and R for Data Science is a better resource for it than this vignette. The long form of a single cell ExpressionSet would look like:

guo_norm %>%
    as('data.frame') %>%
    gather(Gene, Expression, one_of(featureNames(guo_norm)))
A data.frame: 20544 × 4
Cell num_cells Gene Expression
<fct> <int> <chr> <dbl>
2C 1.1 2 Actb -0.575
2C 1.2 2 Actb -0.435
2C 2.1 2 Actb 0.460
2C 2.2 2 Actb 0.610
2C 3.1 2 Actb 1.970
2C 3.2 2 Actb 2.285
2C 4.1 2 Actb 1.785
2C 5.1 2 Actb 1.495
2C 5.2 2 Actb 1.935
2C 6.1 2 Actb -0.820
2C 6.2 2 Actb -1.335
2C 7.1 2 Actb -0.595
2C 7.2 2 Actb -0.600
2C 8.1 2 Actb 2.660
2C 8.2 2 Actb 1.205
2C 9.1 2 Actb 1.425
2C 9.2 2 Actb 0.815
2C 10.1 2 Actb 2.075
2C 10.2 2 Actb -0.520
4C 1.1 4 Actb -4.245
4C 1.2 4 Actb 0.570
4C 1.3 4 Actb -0.085
4C 1.4 4 Actb 2.785
4C 2.1 4 Actb 1.880
4C 2.2 4 Actb 2.170
4C 2.3 4 Actb 1.390
4C 3.1 4 Actb 1.260
4C 3.2 4 Actb 0.835
4C 3.3 4 Actb -0.745
4C 3.4 4 Actb -0.280
64C 5.9 64 Tspan8 10.255
64C 5.10 64 Tspan8 12.120
64C 5.11 64 Tspan8 -0.155
64C 5.12 64 Tspan8 0.260
64C 5.13 64 Tspan8 0.445
64C 5.14 64 Tspan8 -0.810
64C 5.15 64 Tspan8 6.710
64C 5.16 64 Tspan8 4.455
64C 6.1 64 Tspan8 7.070
64C 6.2 64 Tspan8 9.895
64C 6.3 64 Tspan8 14.020
64C 6.4 64 Tspan8 9.720
64C 6.5 64 Tspan8 11.630
64C 6.6 64 Tspan8 7.795
64C 6.7 64 Tspan8 7.680
64C 6.8 64 Tspan8 9.740
64C 7.1 64 Tspan8 14.150
64C 7.2 64 Tspan8 12.285
64C 7.3 64 Tspan8 14.010
64C 7.4 64 Tspan8 13.320
64C 7.5 64 Tspan8 12.545
64C 7.6 64 Tspan8 11.290
64C 7.7 64 Tspan8 11.895
64C 7.8 64 Tspan8 7.090
64C 7.9 64 Tspan8 14.735
64C 7.10 64 Tspan8 3.220
64C 7.11 64 Tspan8 3.415
64C 7.12 64 Tspan8 4.540
64C 7.13 64 Tspan8 5.315
64C 7.14 64 Tspan8 2.865

But destiny doesn’t use long form data as input, since all single cell data has always a more compact structure of genes×cells, with a certain number of per-sample covariates (The structure of ExpressionSet).

dm <- DiffusionMap(guo_norm)

names(dm) shows what names can be used in dm$<name>, as.data.frame(dm)$<name>, or ggplot(dm, aes(<name>)):

names(dm)  # namely: Diffusion Components, Genes, and Covariates
##  [1] "DC1"       "DC2"       "DC3"       "DC4"       "DC5"       "DC6"      
##  [7] "DC7"       "DC8"       "DC9"       "DC10"      "DC11"      "DC12"     
## [13] "DC13"      "DC14"      "DC15"      "DC16"      "DC17"      "DC18"     
## [19] "DC19"      "DC20"      "Actb"      "Ahcy"      "Aqp3"      "Atp12a"   
## [25] "Bmp4"      "Cdx2"      "Creb312"   "Cebpa"     "Dab2"      "DppaI"    
## [31] "Eomes"     "Esrrb"     "Fgf4"      "Fgfr2"     "Fn1"       "Gapdh"    
## [37] "Gata3"     "Gata4"     "Gata6"     "Grhl1"     "Grhl2"     "Hand1"    
## [43] "Hnf4a"     "Id2"       "Klf2"      "Klf4"      "Klf5"      "Krt8"     
## [49] "Lcp1"      "Mbnl3"     "Msc"       "Msx2"      "Nanog"     "Pdgfa"    
## [55] "Pdgfra"    "Pecam1"    "Pou5f1"    "Runx1"     "Sox2"      "Sall4"    
## [61] "Sox17"     "Snail"     "Sox13"     "Tcfap2a"   "Tcfap2c"   "Tcf23"    
## [67] "Utf1"      "Tspan8"    "Cell"      "num_cells"

Due to the fortify method (which here just means as.data.frame) being defined on DiffusionMap objects, ggplot directly accepts DiffusionMap objects:

ggplot(dm, aes(DC1, DC2, colour = Klf2)) +
    geom_point()

When you want to use a Diffusion Map in a dplyr pipeline, you need to call fortify/as.data.frame directly:

fortify(dm) %>%
    mutate(
        EmbryoState = factor(num_cells) %>%
            lvls_revalue(paste(levels(.), 'cell state'))
    ) %>%
    ggplot(aes(DC1, DC2, colour = EmbryoState)) +
        geom_point()

The Diffusion Components of a converted Diffusion Map, similar to the genes in the input ExpressionSet, are individual variables instead of two columns in a long-form data frame, but sometimes it can be useful to “tidy” them:

fortify(dm) %>%
    gather(DC, OtherDC, num_range('DC', 2:5)) %>%
    ggplot(aes(DC1, OtherDC, colour = factor(num_cells))) +
        geom_point() +
        facet_wrap(~ DC)

Another tip: To reduce overplotting, use sample_frac(., 1.0, replace = FALSE) (the default) in a pipeline.

Adding a constant alpha improves this even more, and also helps you see density:

fortify(dm) %>%
    sample_frac() %>%
    ggplot(aes(DC1, DC2, colour = factor(num_cells))) +
        geom_point(alpha = .3)