---
title: "Introduction to the *GraphExperiment* class"
author: 
  - name: Fabricio Almeida-Silva
    affiliation: |
      VIB-UGent Center for Plant Systems Biology, Ghent University, 
      Ghent, Belgium
  - name: Yves Van de Peer
    affiliation: |
      VIB-UGent Center for Plant Systems Biology, Ghent University, 
      Ghent, Belgium
output: 
  BiocStyle::html_document:
    toc: true
    number_sections: yes
bibliography: bibliography.bib
vignette: >
  %\VignetteIndexEntry{Introduction to the GraphExperiment class}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL
)
```

# Introduction

Networks (or graphs) are widely used data representations in biology, 
as they can efficiently encode node-node interactions and neighborhoods.
For high-throughput, quantitative omics data (e.g., transcriptomics, proteomics,
metabolomics, and epigenomics), commonly used network representations include
gene coexpression, protein-protein interaction, gene regulatory, and 
co-abundance networks. While data structures to store quantitative data and
associated metadata exist (e.g., `SummarizedExperiment`, `SingleCellExperiment`,
and `SpatialExperiment`), they lack dedicated support for networks describing
how features relate to each other. `GraphExperiment` is an S4 class that 
extends `SingleCellExperiment` [@sce] with an additional container 
for networks associated with **assay features** (graphs representing columns,
such as samples and cells, are not supported by this package).

Of note, trees are an alternative way of representing how assay features are 
related to each other. Users interested in tree representations of 
assay rows/columns can use the `r BiocStyle::Biocpkg("TreeSummarizedExperiment")`
package. Trees are essentially *a kind of graph* (i.e., all trees are graphs,
but not all graphs are trees). Here, we chose a more
general graph representation (namely `igraph` objects) to give users and 
developers more flexibility.
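
To make the tree-versus-graph distinction concrete, here is a minimal sketch using `igraph` directly. All object names (`toy_tree`, `toy_cyclic`) and the helper `looks_like_tree()` are hypothetical and only serve as illustration. The helper relies on the standard definition of an undirected tree: a connected graph with exactly one edge fewer than it has nodes.

```{r sketch_tree_vs_graph}
# Hypothetical helper: an undirected graph is a tree if it is connected
# and has exactly (number of nodes - 1) edges
looks_like_tree <- function(g) {
    igraph::is_connected(g) && igraph::ecount(g) == igraph::vcount(g) - 1
}

# A small tree: 7 nodes, each parent has 2 children
toy_tree <- igraph::make_tree(7, children = 2, mode = "undirected")
looks_like_tree(toy_tree)

# Adding an edge between two sibling leaves creates a cycle, so the
# result is still a valid graph, but no longer a tree
toy_cyclic <- igraph::add_edges(toy_tree, c(4, 5))
looks_like_tree(toy_cyclic)
```

Networks such as coexpression networks typically contain cycles, which is why a general graph container is needed to store them.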

# Installation

`GraphExperiment` can be installed from Bioconductor with the following code:

```{r installation, eval=FALSE}
if(!requireNamespace('BiocManager', quietly = TRUE))
  install.packages('BiocManager')

BiocManager::install("GraphExperiment")
```

```{r load_package, message=FALSE}
# Load package after installation
library(GraphExperiment)

set.seed(777) # for reproducibility
```

# Anatomy of a `GraphExperiment` object

Since the `GraphExperiment` class extends the `SingleCellExperiment` class,
all `SingleCellExperiment` slots are present in `GraphExperiment`, including:

- `assays`: list of matrices with primary (e.g., counts) and transformed (e.g.,
log-normalized counts, TPM, etc.) data, with features in rows and observations
in columns.
- `colData`: a data frame with column (observation) metadata, such
as sample ID, condition, batch ID, genotype, etc.
- `rowData`: a data frame with row (feature) metadata, such as gene ID,
genomic coordinates, functional annotation, etc.
- `reducedDims`: list of matrices with reduced dimensions, such as PCA,
t-SNE, and UMAP embeddings.

Compared to `SingleCellExperiment` objects, `GraphExperiment` 
provides an additional container:

- `graphs`: list of `igraph` objects, optionally with node and edge 
attributes. Graphs are used to represent how features
(rows, not columns) relate to each other.[^1]

```{r fig, echo=FALSE, out.width = "100%", fig.cap="The GraphExperiment class."}
knitr::include_graphics("GraphExperiment.png")
```

[^1]: **Note on software design:** if you're familiar with 
`SingleCellExperiment` objects, you probably know that they offer a `rowPairs` 
slot to store pairwise relationships between rows of assays. 
In theory, some of the data stored in `graphs` (of a 
`GraphExperiment` object) could be stored in `rowPairs` (of a 
`SingleCellExperiment` object). However, we chose to implement a dedicated slot
with `igraph` objects to guarantee (i) seamless interoperability with other
packages, given that `igraph` is the de facto standard class for graphs in R; 
and (ii) convenience in methods (e.g., subsetting, integration with `rowData`, 
integration across multiple graphs, etc.).


The `igraph` class from the `r BiocStyle::CRANpkg("igraph")` package
is the standard data structure for graph representation in R. If you are
unfamiliar with `igraph` objects, you can learn more about them by reading 
the `r BiocStyle::CRANpkg("igraph")` vignettes.
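
As a quick orientation, the sketch below builds a small `igraph` object with one node attribute and one edge attribute, and then inspects both. The gene names, attribute values, and object names (`toy_edges`, `toy_nodes`, `toy_g`) are made up for illustration; only `igraph` functions are used.

```{r sketch_igraph_basics}
# Toy edge list with an edge attribute (`weight`)
toy_edges <- data.frame(
    from = c("geneA", "geneA", "geneB"),
    to = c("geneB", "geneC", "geneC"),
    weight = c(0.9, 0.5, 0.7)
)

# Toy node table with a node attribute (`pathway`)
toy_nodes <- data.frame(
    name = c("geneA", "geneB", "geneC"),
    pathway = c("P1", "P1", "P2")
)

# Build an undirected graph from the two tables
toy_g <- igraph::graph_from_data_frame(
    toy_edges, directed = FALSE, vertices = toy_nodes
)
toy_g

# Inspect node and edge attributes as data frames
igraph::as_data_frame(toy_g, what = "vertices")
igraph::as_data_frame(toy_g, what = "edges")
```

Node names (the `name` attribute) are what link graph nodes to assay features, so they are the most important attribute to get right.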


# Building a `GraphExperiment` object

`GraphExperiment` objects can be created from scratch using the constructor
function `GraphExperiment()`. Below we will simulate a scRNA-seq count matrix 
with some gene (row) and cell (column) metadata, and create a graph based 
on gene-gene correlations.

```{r simulate_slots, message=FALSE}
# Simulate parts of a `GraphExperiment` object
## Assays
gene_ids <- paste0("gene", seq_len(200))
cell_ids <- paste0("cell", seq_len(100))
mat <- matrix(rpois(20000, 5), ncol = 100, dimnames = list(gene_ids, cell_ids))
mat[1:5, 1:5]

## rowData
rdata <- data.frame(
    row.names = gene_ids,
    pathway = sample(c("P1", "P2"), size = length(gene_ids), replace = TRUE),
    coding = sample(c(TRUE, FALSE), size = length(gene_ids), replace = TRUE)
)
head(rdata)

## colData
cdata <- data.frame(
    row.names = cell_ids, 
    cell_type = sample(c("ct1", "ct2"), size = length(cell_ids), replace = TRUE)
)
head(cdata)

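## Graph (with node attribute `degree`, computed as the weighted degree)
## `diag = FALSE` ignores the diagonal of the correlation matrix (self-loops)
g <- graph_from_adjacency_matrix(
    cor(t(mat)), mode = "undirected", weighted = TRUE, diag = FALSE
)
g <- set_vertex_attr(g, "degree", value = strength(g))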
g
```

To create a `GraphExperiment` object from the constructor function, you would
run:

```{r create_ge}
# Create a `GraphExperiment` object
ge <- GraphExperiment(
    assays = list(counts = mat),
    rowData = rdata,
    colData = cdata,
    graphs = list(cor = g)
)
ge
```

If you're familiar with `SummarizedExperiment` and `SingleCellExperiment` 
objects, you will recognize nearly everything you see in `ge`.
Compared to `SingleCellExperiment` objects, the only difference is
in the last line of the output, which indicates that this object contains 
a graph named 'cor'. 

Importantly, since nodes of graphs are always in sync with `rownames`, 
**feature IDs in rownames and graph node names need to be the same**.
For example, attempting to create a `GraphExperiment` object with a graph
that lacks some of the features in `rownames` leads to an error:

```{r error_missing_from_graph, error = TRUE}
# Remove 'gene1' to 'gene10' from the graph and try to recreate object
g2 <- delete_vertices(g, paste0("gene", 1:10))
GraphExperiment(
    assays = list(counts = mat),
    rowData = rdata,
    colData = cdata,
    graphs = list(cor = g2)
)
```
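
Before calling the constructor, it can help to check which features are absent from a graph. The sketch below does this with base R and `igraph` on hypothetical toy inputs (`toy_features`, `toy_g2`), and shows one possible remedy: adding the missing features as isolated nodes. Whether isolated nodes are appropriate for your analysis, and whether the constructor also expects a particular node order, are assumptions you should verify against the package documentation.

```{r sketch_check_node_names}
# Hypothetical toy inputs: three features, but the graph has only two nodes
toy_features <- c("geneA", "geneB", "geneC")
toy_g2 <- igraph::graph_from_data_frame(
    data.frame(from = "geneA", to = "geneB"), directed = FALSE
)

# Features that are missing from the graph
missing_nodes <- setdiff(toy_features, igraph::vertex_attr(toy_g2, "name"))
missing_nodes

# One possible fix: add the missing features as isolated nodes
toy_g2_fixed <- igraph::add_vertices(
    toy_g2, nv = length(missing_nodes), name = missing_nodes
)

# Node names and feature IDs now contain the same elements
setequal(toy_features, igraph::vertex_attr(toy_g2_fixed, "name"))
```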

Alternatively, you can create a `GraphExperiment` object by coercing from
an existing `(Ranged)SummarizedExperiment` or `SingleCellExperiment` object.
For example:

```{r coerce_se}
# Coercing from `SummarizedExperiment`
se <- SummarizedExperiment(list(counts = mat))
ge1 <- as(se, "GraphExperiment")
ge1
```

Note that the `graphs` container is still there, but it is empty. To access the 
names of all graphs, you can use the `graphNames()` function.

```{r graphNames}
# Get graph names
graphNames(ge)      # 'cor'
graphNames(ge1)     # empty (NULL)
```

# Accessing `graphs` and `rowData` (a.k.a. 'getters')

To access graphs in `graphs`, you can use one of two getter functions:

- `graphs(x)`: retrieves **all** graphs as a simple list of `igraph` objects.
- `graph(x, i)`: retrieves only graph $i$ from the list. Note that $i$
can be a numeric scalar (index) or a character scalar (name).

The design here is equivalent to `assays()` versus `assay()` for 
`SummarizedExperiment` objects.

```{r getters}
# Get graphs
graphs(ge)

# Get first graph by index
graph(ge, 1)

# Get first graph by index (alternative)
graphs(ge)[[1]]

# Get graph by name
graph(ge, "cor")
```

Careful readers will notice that this `igraph` object has node attributes that
were not present in the original graph: 'pathway' and 'coding'. This is 
because `graphs()`/`graph()` automatically extract `rowData` variables (if any) 
and add them as node attributes. The same happens in the other direction:
the `rowData()` method for `GraphExperiment` objects automatically adds
node attributes (if any) as `rowData` variables.

```{r rowdata_getter}
# `graphs` and `rowData` are always in sync!
rowData(ge)
```

Variables 'pathway' and 'coding' were in the original data frame we used as
`rowData`, but variable 'cor__degree' was added by extracting the *degree*
attribute of nodes in graph `cor`.
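
To illustrate the naming pattern seen above (graph name, two underscores, attribute name), the sketch below reproduces it by hand on toy data using `igraph` and base R. This is only a conceptual illustration under our own assumptions, not the package's internal implementation, and all object names (`toy_net`, `toy_rdata`, `toy_combined`) are hypothetical.

```{r sketch_rowdata_sync}
# Toy graph with a node attribute named `degree`
toy_net <- igraph::graph_from_data_frame(
    data.frame(from = c("geneA", "geneA"), to = c("geneB", "geneC")),
    directed = FALSE
)
toy_net <- igraph::set_vertex_attr(
    toy_net, "degree", value = igraph::degree(toy_net)
)

# Toy feature metadata
toy_rdata <- data.frame(
    row.names = c("geneA", "geneB", "geneC"),
    pathway = c("P1", "P1", "P2")
)

# Extract node attributes (except `name`), ordered as in the metadata
node_df <- igraph::as_data_frame(toy_net, what = "vertices")
attr_cols <- setdiff(names(node_df), "name")
node_attrs <- node_df[rownames(toy_rdata), attr_cols, drop = FALSE]

# Prefix attribute names with the (assumed) graph name 'cor'
names(node_attrs) <- paste0("cor__", names(node_attrs))

# Combine feature metadata and prefixed node attributes
toy_combined <- cbind(toy_rdata, node_attrs)
toy_combined
```

Prefixing with the graph name keeps attributes from different graphs apart when an object stores more than one graph.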

# Modifying `GraphExperiment` objects (a.k.a. 'setters')

As in the `SummarizedExperiment` and `SingleCellExperiment` classes,
all getter methods specific to `GraphExperiment` objects have a corresponding
setter method. Setters allow users to modify elements by adding `<-`
after the getter call. For example, to add or replace a particular graph,
you would use the `graph<-` method as follows:

```{r graph_setter}
# Create a new graph without edges whose absolute correlation is below 0.4
fg <- graph(ge, "cor") |> 
    delete_vertex_attr("pathway") |>
    delete_vertex_attr("degree") |>
    delete_vertex_attr("coding")

todelete <- abs(E(fg)$weight) < 0.4
fg <- delete_edges(fg, which(todelete))
fg

# Add filtered graph as a new graph named `fcor`
graph(ge, "fcor") <- fg
ge
```

If you'd like to replace all graphs at once, you can use the `graphs<-`
setter. For example, let's add the graphs stored in `ge` to the 
`GraphExperiment` object we created earlier by coercing from 
`SummarizedExperiment`:

```{r graphs_setter}
# Taking a quick look (note: nothing in `graphs`)
ge1

# Adding graphs from `ge`
graphs(ge1) <- graphs(ge)
ge1
```

Lastly, you can also rename graphs by updating `graphNames` as follows:

```{r graphNames_setter}
# Rename graphs
graphNames(ge1) <- c("correlations", "correlations_filtered_0.4")
ge1
```

# Subsetting `GraphExperiment` objects

In `SummarizedExperiment` objects, subsetting rows and columns 
(using square brackets, `[`) automatically subsets `rowData` and `colData`
in addition to the assays. `SingleCellExperiment` objects behave similarly:
subsetting columns automatically subsets `colData` and `reducedDims`.

Since graphs in `GraphExperiment` objects are linked to rows, subsetting
rows of a `GraphExperiment` object automatically subsets rows of the
`assays`, `rowData`, and all graphs in `graphs`. For example:

```{r subset}
# Subsetting `GraphExperiment` object
ge_subset <- ge[1:10, ]

ge_subset
graph(ge_subset, "cor")
```
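
Conceptually, subsetting a graph by rows resembles taking an *induced subgraph*: only the retained nodes are kept, together with the edges that connect them. The sketch below shows this with `igraph::induced_subgraph()` on a hypothetical toy graph (`toy_full`). We assume this mirrors what happens to each graph when rows are subset; the package's internal implementation may differ.

```{r sketch_induced_subgraph}
# Toy graph with four features
toy_full <- igraph::graph_from_data_frame(
    data.frame(
        from = c("geneA", "geneA", "geneB", "geneC"),
        to = c("geneB", "geneC", "geneC", "geneD")
    ),
    directed = FALSE
)

# Keep only the first two features, analogous to `x[1:2, ]`
keep <- c("geneA", "geneB")
toy_sub <- igraph::induced_subgraph(toy_full, vids = keep)

# Only edges with both endpoints among the retained nodes survive
igraph::as_data_frame(toy_sub, what = "edges")
```

Note that edges to removed features are dropped, so node-level statistics computed on the full graph (such as degree) may no longer match the subset graph.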

# Session information {.unnumbered}

This document was created under the following conditions:

```{r session_info}
sessioninfo::session_info()
```

# References {.unnumbered}