Rollups and Transitivity

Shawn T O’Neil

Vignette updated: May-04-2026

First, let’s load some required libraries, and instantiate a session-cached monarch_engine() for querying.

library(monarchr)
library(tidygraph)
library(dplyr)

e <- monarch_engine()

Knowledge graphs frequently incorporate ontologies, which include complex hierarchies of classes and sub-classes. Let’s visualize a few levels of this hierarchy around the phenotype "leg phenotype" (dropping the "leg phenotype" node itself, which makes the next examples clearer).

phenos <- e |>
    fetch_nodes(name == "leg phenotype") |>
    expand_n(
        predicates = "biolink:subclass_of",
        categories = "biolink:PhenotypicFeature",
        direction = "out",
        n = 4
    ) |>
    activate(nodes) |>
    filter(name != "leg phenotype")

plot(phenos)

It is not uncommon for data like this to come with additional information; if these were a set of disease diagnoses, we might have patient counts associated with each. Since patients receive diagnoses of varying specificity, there may be counts on any subtype.

Hypothesizing these phenotypes as diagnoses, we’ll simulate some count information, plotting it in the node labels:

set.seed(42)

num_nodes <- nrow(nodes(phenos))
phenos_counted <- phenos |>
    activate(nodes) |>
    mutate(count = rpois(num_nodes, lambda = 5))

plot(phenos_counted,
    node_label = paste0(name, " || count: ", count)
)

A “rollup” might thus ask: how many patients are associated with each phenotype if we include all of its descendants? For example, “lower limb segment phenotype” (8 patients) is a subclass of “limb segment phenotype” (4 patients), so the total number of “limb segment phenotype” patients includes both (12 patients).

The roll_up() function allows us to compute this information. It is designed to work with dplyr’s mutate() on node data: we provide the column specifying information to aggregate, a function to apply over the values (amongst all descendants), and whether each node should include its own value in the aggregation.

phenos_counted_rolled <- phenos_counted |>
    activate(nodes) |>
    mutate(total = roll_up(count, fun = sum, include_self = TRUE))

plot(phenos_counted_rolled,
    node_label = paste0(
        name,
        " || count: ", count,
        " || total: ", total
    )
)

The corresponding roll_down() aggregates in the opposite direction (not shown).
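To make the aggregation concrete, here is a minimal base-R sketch of the same idea on a toy hierarchy. It is not monarchr's implementation: `toy_descendants()` and `toy_roll_up()` are hypothetical helpers written for this illustration, the counts 8 and 4 mirror the example above, and the third node and its count are made up.

```r
# Toy hierarchy: each row is a child -> parent "subclass_of" edge.
# The top node is hypothetical and exists only to give a second level.
edges <- data.frame(
    child = c("lower limb segment phenotype", "limb segment phenotype"),
    parent = c("limb segment phenotype", "hypothetical parent phenotype"),
    stringsAsFactors = FALSE
)
counts <- c(
    "lower limb segment phenotype" = 8,
    "limb segment phenotype" = 4,
    "hypothetical parent phenotype" = 3
)

# Collect every descendant of `node` by repeatedly following
# child -> parent edges backwards, one level per iteration.
toy_descendants <- function(node, edges) {
    found <- character(0)
    frontier <- node
    while (length(frontier) > 0) {
        children <- edges$child[edges$parent %in% frontier]
        frontier <- setdiff(children, found)
        found <- c(found, frontier)
    }
    found
}

# Hypothetical stand-in for roll_up(): apply `fun` to the values of each
# node's descendants, optionally including the node's own value.
toy_roll_up <- function(values, edges, fun, include_self = TRUE) {
    sapply(names(values), function(node) {
        members <- toy_descendants(node, edges)
        if (include_self) members <- c(node, members)
        fun(values[members])
    })
}

totals <- toy_roll_up(counts, edges, sum, include_self = TRUE)
totals_excl <- toy_roll_up(counts, edges, sum, include_self = FALSE)
print(totals)
print(totals_excl)
```

With `include_self = TRUE`, “limb segment phenotype” receives 4 + 8 = 12, matching the worked example; with `include_self = FALSE` it receives only the 8 from its descendant, and a leaf node receives the sum of an empty set, which is 0.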

Other aggregations, transferring information

When performing a rollup, each node receives the specified column, indexed to include only its descendants (and itself, if include_self is set). This is then passed to the aggregating function fun.

To see how this can be useful, we’ll start by introducing another function, transfer(). Much like roll_up(), this function is designed to be used with mutate() on node data; its purpose is to transfer information across edges, usually from nodes of one kind to another. We’ll start by fetching all of the subtypes of Niemann-Pick disease, and all known causal genes.

npc_genes <- e |>
    fetch_nodes(name == "Niemann-Pick disease") |>
    descendants() |>
    expand(predicates = "biolink:causes")

plot(npc_genes)

Now, we might wish for disease nodes to have an attribute reflecting their causal genes. This information is captured in the graph, but not as a part of the nodes. The transfer() function ‘pulls’ information across edges:

npc_genes_causal <- npc_genes |>
    activate(nodes) |>
    mutate(caused_by = transfer(name, over = "biolink:causes", direction = "out"))

plot(npc_genes_causal,
    node_label = paste0(name, " || caused by: ", caused_by)
)
npc_genes_causal

Here, transfer is moving information ‘over’ (or across) "biolink:causes" edges in an outward direction, along the direction of the edge. The transferred information is being drawn from source nodes’ name, resulting in a new caused_by column in the node table.

nodes(npc_genes_causal) |>
    select(name, caused_by)

In cases where a transfer would result in multiple values being collected at the destination node, the result will be a list column.
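The mechanics of an outward transfer can be sketched in base R with plain node and edge tables. This is only an illustration under stated assumptions: `toy_transfer_out()` is a hypothetical helper, the gene and disease names are placeholders rather than Monarch data, and unlike the behavior described above this sketch always returns a list, even when every node receives a single value.

```r
# Toy node and edge tables; all names are placeholders.
nodes <- data.frame(
    name = c("gene_A", "gene_B", "disease_X", "disease_Y"),
    stringsAsFactors = FALSE
)
edges <- data.frame(
    from = c("gene_A", "gene_B", "gene_A"),
    to = c("disease_X", "disease_X", "disease_Y"),
    predicate = "biolink:causes",
    stringsAsFactors = FALSE
)

# Hypothetical stand-in for an outward transfer of `name`: every edge with
# the chosen predicate carries its source node's name to its target node.
# Nodes with no matching incoming edge receive NA.
toy_transfer_out <- function(nodes, edges, over) {
    used <- edges[edges$predicate == over, ]
    lapply(nodes$name, function(target) {
        incoming <- used$from[used$to == target]
        if (length(incoming) == 0) NA_character_ else incoming
    })
}

# Two genes point at disease_X, so its entry holds two values.
nodes$caused_by <- toy_transfer_out(nodes, edges, over = "biolink:causes")
print(nodes)
```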

To finish this example, we use roll_up() to collect, for each disease, the set of genes that cause it or any of its subtypes.

npc_genes_causal_rolled <- npc_genes_causal |>
    activate(nodes) |>
    mutate(any_caused_by = roll_up(caused_by,
        fun = unique,
        include_self = TRUE,
        predicates = "biolink:subclass_of"
    ))

plot(npc_genes_causal_rolled,
    node_label = paste0(name, " || any caused by: ", any_caused_by)
)

npc_genes_causal_rolled

The inclusion of NA values may not be desired (it signals that at least one of the rolled nodes had a caused_by of NA). We could write an aggregating function that removes NA and supply that; this would also be a good use case for purrr’s compose() (fun = compose(unique, na.omit)).
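One base-R way to write such an NA-dropping aggregator is sketched below. The helper name `unique_no_na()` and the placeholder values are made up for illustration, and the sketch assumes the aggregating function receives an atomic vector; if the rolled column is a list column, the values may need flattening first.

```r
# Drop NA first, then de-duplicate. This is the same order that
# purrr::compose(unique, na.omit) uses, since compose() applies its
# functions from right to left by default.
unique_no_na <- function(x) unique(x[!is.na(x)])

# Placeholder values, as might be gathered from a node and its descendants.
gathered <- c("gene_A", NA, "gene_B", "gene_A", NA)

with_na <- unique(gathered)
without_na <- unique_no_na(gathered)
print(with_na)
print(without_na)
```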

Transitive closures and reductions

Let’s return to the patient-count example, using the rolled-up data:

plot(phenos_counted_rolled,
    node_label = paste0(
        name,
        " || count: ", count,
        " || total: ", total
    )
)

To protect patient privacy (again, pretending these phenotypes are disease diagnoses associated with patients), we may want to remove nodes that have a count less than 6. If we do so, however, we lose connectivity:

censored <- phenos_counted_rolled |>
    activate(nodes) |>
    filter(count >= 6)

plot(censored,
    node_label = paste0(
        name,
        " || count: ", count,
        " || total: ", total
    )
)

To fix this, we can first compute the transitive_closure() of the graph, with respect to an edge predicate we want to treat as transitive (defaulting to biolink:subclass_of). We color edges by primary_knowledge_source to highlight that newly created transitive edges are given knowledge source transitive_<predicate>, but use the same predicate. The result is busy, and in general the number of transitive edges can be \(O(n^2)\) in the number of nodes.

phenos_closed <- phenos_counted_rolled |>
    transitive_closure(predicate = "biolink:subclass_of")

plot(phenos_closed,
    node_label = paste0(
        name,
        " || count: ", count,
        " || total: ", total
    ),
    edge_color = primary_knowledge_source,
    edge_linetype = predicate
)
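As a rough illustration of what a transitive closure adds, the following base-R sketch applies Warshall's algorithm to a toy four-node chain. It is not how transitive_closure() is implemented; `toy_closure()` and the node names are assumptions made for this example.

```r
# Boolean adjacency matrix for a toy chain a -> b -> c -> d, where each
# edge stands for child "subclass_of" parent.
node_names <- c("a", "b", "c", "d")
adj <- matrix(FALSE, 4, 4, dimnames = list(node_names, node_names))
adj["a", "b"] <- TRUE
adj["b", "c"] <- TRUE
adj["c", "d"] <- TRUE

# Warshall's algorithm: after processing intermediate node k,
# closed[i, j] is TRUE when j is reachable from i using only the
# intermediate nodes processed so far.
toy_closure <- function(adj) {
    closed <- adj
    for (k in seq_len(nrow(closed))) {
        closed <- closed | outer(closed[, k], closed[k, ], "&")
    }
    closed
}

closed <- toy_closure(adj)
print(sum(adj))    # number of direct edges
print(sum(closed)) # number of direct plus implied edges
```

A chain of n nodes has n - 1 direct edges but n(n - 1)/2 edges after closure, which is where the quadratic growth comes from.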

Now we can try our removal:

closed_censored <- phenos_closed |>
    activate(nodes) |>
    filter(count >= 6)

plot(closed_censored,
    node_label = paste0(
        name,
        " || count: ", count,
        " || total: ", total
    ),
    edge_color = primary_knowledge_source,
    edge_linetype = predicate
)

The graph retains its connectivity, but has redundant edges. The transitive_reduction() function removes these, again according to a specified transitive predicate (defaulting again to biolink:subclass_of).

phenos_final <- closed_censored |>
    transitive_reduction(predicate = "biolink:subclass_of")

plot(phenos_final,
    node_label = paste0(
        name,
        " || count: ", count,
        " || total: ", total
    ),
    edge_color = primary_knowledge_source,
    edge_linetype = predicate
)
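The whole close, censor, reduce sequence can be traced by hand on a toy chain. The base-R sketch below is only an illustration: `toy_reduction()` is a hypothetical helper that assumes its input is an acyclic, transitively closed adjacency matrix, and the node names are placeholders.

```r
node_names <- c("a", "b", "c", "d")

# Direct edges of the chain a -> b -> c -> d.
adj <- matrix(FALSE, 4, 4, dimnames = list(node_names, node_names))
adj[cbind(1:3, 2:4)] <- TRUE

# For a chain, the transitive closure is every pair (i, j) with i before j,
# which is exactly the strict upper triangle of the matrix.
closed <- upper.tri(adj)
dimnames(closed) <- dimnames(adj)

keep <- c("a", "c", "d") # pretend "b" was censored

# Filtering the original graph strands "a"; filtering the closed graph
# keeps it attached through the implied edges.
naive <- adj[keep, keep]
closed_kept <- closed[keep, keep]

# In a transitively closed DAG, edge u -> v is redundant exactly when some
# w has u -> w and w -> v; the matrix product counts such w.
toy_reduction <- function(closed) {
    m <- closed * 1
    closed & !((m %*% m) > 0)
}

reduced <- toy_reduction(closed_kept)
print(which(reduced, arr.ind = TRUE))
```

After the reduction only a -> c and c -> d remain: the shortcut a -> d is dropped because it is implied by the other two, while the connection that used to run through the censored node "b" survives as a -> c.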




Session Info

sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.6.0 RC (2026-04-17 r89917)
##  os       Ubuntu 24.04.4 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  C
##  ctype    en_US.UTF-8
##  tz       America/New_York
##  date     2026-05-04
##  pandoc   2.7.3 @ /usr/bin/ (via rmarkdown)
##  quarto   1.8.25 @ /usr/local/bin/quarto
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package      * version   date (UTC) lib source
##  archive        1.1.13    2026-04-12 [3] CRAN (R 4.6.0)
##  assertthat     0.2.1     2019-03-21 [3] CRAN (R 4.6.0)
##  bslib          0.10.0    2026-01-26 [3] CRAN (R 4.6.0)
##  cachem         1.1.0     2024-05-16 [3] CRAN (R 4.6.0)
##  cli            3.6.6     2026-04-09 [3] CRAN (R 4.6.0)
##  dichromat      2.0-0.1   2022-05-02 [3] CRAN (R 4.6.0)
##  digest         0.6.39    2025-11-19 [3] CRAN (R 4.6.0)
##  dplyr        * 1.2.1     2026-04-03 [3] CRAN (R 4.6.0)
##  evaluate       1.0.5     2025-08-27 [3] CRAN (R 4.6.0)
##  farver         2.1.2     2024-05-13 [3] CRAN (R 4.6.0)
##  fastmap        1.2.0     2024-05-15 [3] CRAN (R 4.6.0)
##  generics       0.1.4     2025-05-09 [3] CRAN (R 4.6.0)
##  ggforce        0.5.0     2025-06-18 [3] CRAN (R 4.6.0)
##  ggplot2        4.0.3     2026-04-22 [3] CRAN (R 4.6.0)
##  ggraph         2.2.2     2025-08-24 [3] CRAN (R 4.6.0)
##  ggrepel        0.9.8     2026-03-17 [3] CRAN (R 4.6.0)
##  glue           1.8.1     2026-04-17 [3] CRAN (R 4.6.0)
##  graphlayouts   1.2.3     2026-02-21 [3] CRAN (R 4.6.0)
##  gridExtra      2.3       2017-09-09 [3] CRAN (R 4.6.0)
##  gtable         0.3.6     2024-10-25 [3] CRAN (R 4.6.0)
##  hms            1.1.4     2025-10-17 [3] CRAN (R 4.6.0)
##  htmltools      0.5.9     2025-12-04 [3] CRAN (R 4.6.0)
##  igraph         2.3.0     2026-04-21 [3] CRAN (R 4.6.0)
##  jquerylib      0.1.4     2021-04-26 [3] CRAN (R 4.6.0)
##  jsonlite       2.0.0     2025-03-27 [3] CRAN (R 4.6.0)
##  kableExtra     1.4.0     2024-01-24 [3] CRAN (R 4.6.0)
##  knitr          1.51      2025-12-20 [3] CRAN (R 4.6.0)
##  lifecycle      1.0.5     2026-01-08 [3] CRAN (R 4.6.0)
##  magrittr       2.0.5     2026-04-04 [3] CRAN (R 4.6.0)
##  MASS           7.3-65    2025-02-28 [4] CRAN (R 4.6.0)
##  memoise        2.0.1     2021-11-26 [3] CRAN (R 4.6.0)
##  monarchr     * 2.99.0    2026-05-04 [1] Bioconductor
##  neo2R          2.4.2     2024-01-18 [2] CRAN (R 4.6.0)
##  otel           0.2.0     2025-08-29 [3] CRAN (R 4.6.0)
##  pillar         1.11.1    2025-09-17 [3] CRAN (R 4.6.0)
##  pkgconfig      2.0.3     2019-09-22 [3] CRAN (R 4.6.0)
##  polyclip       1.10-7    2024-07-23 [3] CRAN (R 4.6.0)
##  purrr          1.2.2     2026-04-10 [3] CRAN (R 4.6.0)
##  R.methodsS3    1.8.2     2022-06-13 [3] CRAN (R 4.6.0)
##  R.oo           1.27.1    2025-05-02 [3] CRAN (R 4.6.0)
##  R.utils        2.13.0    2025-02-24 [3] CRAN (R 4.6.0)
##  R6             2.6.1     2025-02-15 [3] CRAN (R 4.6.0)
##  RColorBrewer   1.1-3     2022-04-03 [3] CRAN (R 4.6.0)
##  Rcpp           1.1.1-1.1 2026-04-24 [3] CRAN (R 4.6.0)
##  readr          2.2.0     2026-02-19 [3] CRAN (R 4.6.0)
##  rlang          1.2.0     2026-04-06 [3] CRAN (R 4.6.0)
##  rmarkdown      2.31      2026-03-26 [3] CRAN (R 4.6.0)
##  rstudioapi     0.18.0    2026-01-16 [3] CRAN (R 4.6.0)
##  S7             0.2.2     2026-04-22 [3] CRAN (R 4.6.0)
##  sass           0.4.10    2025-04-11 [3] CRAN (R 4.6.0)
##  scales         1.4.0     2025-04-24 [3] CRAN (R 4.6.0)
##  sessioninfo    1.2.3     2025-02-05 [3] CRAN (R 4.6.0)
##  stringi        1.8.7     2025-03-27 [3] CRAN (R 4.6.0)
##  stringr        1.6.0     2025-11-04 [3] CRAN (R 4.6.0)
##  svglite        2.2.2     2025-10-21 [3] CRAN (R 4.6.0)
##  systemfonts    1.3.2     2026-03-05 [3] CRAN (R 4.6.0)
##  textshaping    1.0.5     2026-03-06 [3] CRAN (R 4.6.0)
##  tibble         3.3.1     2026-01-11 [3] CRAN (R 4.6.0)
##  tidygraph    * 1.3.1     2024-01-30 [3] CRAN (R 4.6.0)
##  tidyr          1.3.2     2025-12-19 [3] CRAN (R 4.6.0)
##  tidyselect     1.2.1     2024-03-11 [3] CRAN (R 4.6.0)
##  tweenr         2.0.3     2024-02-26 [3] CRAN (R 4.6.0)
##  tzdb           0.5.0     2025-03-15 [3] CRAN (R 4.6.0)
##  vctrs          0.7.3     2026-04-11 [3] CRAN (R 4.6.0)
##  viridis        0.6.5     2024-01-29 [3] CRAN (R 4.6.0)
##  viridisLite    0.4.3     2026-02-04 [3] CRAN (R 4.6.0)
##  withr          3.0.2     2024-10-28 [3] CRAN (R 4.6.0)
##  xfun           0.57      2026-03-20 [3] CRAN (R 4.6.0)
##  xml2           1.5.2     2026-01-17 [3] CRAN (R 4.6.0)
##  yaml           2.3.12    2025-12-10 [2] CRAN (R 4.6.0)
## 
##  [1] /tmp/Rtmp8k0xO5/Rinst22b1e5645fbe01
##  [2] /home/pkgbuild/packagebuilder/workers/jobs/4139/R-libs
##  [3] /home/biocbuild/bbs-3.23-bioc/R/site-library
##  [4] /home/biocbuild/bbs-3.23-bioc/R/library
##  * ── Packages attached to the search path.
## 
## ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────