---
title: "TreeHarp S4 class"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{TreeHarp S4 class}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(autoharp)
```

# Introduction

Within `autoharp`, the TreeHarp S4 class is used to represent an R expression. A TreeHarp object can then be manipulated in several ways in order to perform static code analysis of student submissions. The `lintr` package does an excellent job of parsing R code, but it provides more detail than the simpler tasks that `autoharp` carries out require. It is still used in the constructor, but some of the parsed output from `lintr` is dropped. For instance, the `(` parentheses are dropped.

To understand the elements in a TreeHarp, let us consider a simple expression. Suppose we fit a linear model to variables in a dataset. To create a TreeHarp object from an expression, we provide the quoted expression together with the `quote = TRUE` argument. This is important because method dispatch is performed based on that second argument, not the first! If we were to dispatch on the first, R would evaluate the expression in order to check its class, thus destroying the expression we intended to capture.

```{r th-example-1, echo=TRUE}
tree1 <- TreeHarp(quote(lm(y ~ x1 + x2, data=mydata)), TRUE)
```

TreeHarp objects have an associated plot method for visualising the expression. This method relies on the plotting functions from the `igraph` package, and the full set of `igraph` plotting parameters can be used when plotting TreeHarp objects. Figure 1 displays the visualisation of the `tree1` object created earlier.

```{r th-example-1-plot, echo=TRUE, fig.align='center', collapse=TRUE, fig.cap='Example TreeHarp object'}
opar <- par(mar=c(0,0,0,0))
plot(tree1, vertex.size=25, asp=0.6, vertex.color="gray", vertex.frame.color=NA)
par(opar)
```

# Slots

There are four slots in a TreeHarp object. The only one required for valid instantiation is `adjList`.

## adjList

```{r th1-adjlist}
slot(tree1, "adjList")
```

This slot contains an adjacency list that represents the tree structure of the code. Nodes in a tree are labelled in Breadth-First Search (BFS) order. Thus the root node has id 1 and, because it has no parent, its id never appears among the children in the adjacency list.

To avoid redundancy, the TreeHarp convention is to list each edge only once, as a child. Here's what we mean: node 2 in the example above is a neighbour of nodes 1 and 4, but it only appears under node 1. It does not appear as an adjacent node of node 4 because it is not a child of node 4. Terminal nodes (leaves) have a NULL entry in the list.

## nodeTypes

If the TreeHarp object was constructed from an R language object, this slot will be automatically populated. To identify node types, functions from [rlang](https://cran.r-project.org/package=rlang) are applied recursively to the sub-expressions defined by the nodes. Each node is then identified as one of:

* a function call,
* a formal argument, or
* an actual argument (a symbol representing an R object).

The `nodeTypes` slot stores the information in a data frame with one row per node. The columns are:

1. *id* (node id). The root node has id 1.
2. *name*. The name of the node.
3. *call_status*. A TRUE/FALSE column indicating whether the node is a call.
4. *formal_arg*. If the node is not a call, this column indicates whether it is a formal argument. If it is neither a call nor a formal argument, it is a symbol representing an R object; we call this an actual argument.
5. *depth*. The depth of the node in the tree. The root of the tree has depth 1.

`autoharp` provides a getter function to retrieve the node types easily.

```{r th1-nodetypes, echo=TRUE}
get_node_types(tree1)
```

## call

The call slot stores the original expression that was used to construct the TreeHarp object, in case it needs to be evaluated later.

```{r ex1_call}
slot(tree1, "call")
```

## repr

The repr slot contains a string representation of the object. If the original TreeHarp object has been modified, it may no longer correspond to a proper R expression, so this slot stores the best available representation of it. This slot is used when the object is printed, or when `show` is called on the S4 object.

```{r ex1_repr}
tree1
```

# TreeHarp Methods

As we have already demonstrated, a plot method exists for this class. It relies on the tree layout from the `igraph` package, and additional arguments can be used to customise the plot. For instance, we could use colour to distinguish between call and non-call nodes:

```{r th-example-2-plot, echo=TRUE, fig.align='center', fig.cap='TreeHarp object with colored nodes'}
opar <- par(mar=c(0,0,0,0))
plot(tree1, vertex.size=25, asp=0.6, vertex.color=tree1@nodeTypes$call_status)
par(opar)
```

These are the other S4 methods defined for the TreeHarp class:

* `length`: returns the number of nodes.
* `names`: returns the node names.
* `get_parent_id`: returns the parent id of a node.
* `get_child_ids`: returns the ids of the children of a node.
* `get_node_types`: returns the nodeTypes slot from a TreeHarp object.
* `get_adj_list`: returns the adjacency list slot from a TreeHarp object.
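
To make the parent and child lookups concrete, here is a minimal base-R sketch that works on a hand-written adjacency list following the conventions described earlier (BFS ids, each edge listed once under its parent, NULL for leaves). The list `adj` and the helpers `child_ids_sketch` and `parent_id_sketch` are hypothetical names introduced only for illustration. They are not part of `autoharp` and are not meant to reflect how the package implements `get_child_ids` or `get_parent_id`.

```{r th-adjlist-sketch}
# Hypothetical hand-written adjacency list for lm(y ~ x1 + x2, data = mydata).
# Entry i holds the ids of the children of node i; leaves are NULL.
adj <- list(
  lm     = c(2L, 3L),  # node 1 (root): children `~` and data
  `~`    = c(4L, 5L),  # node 2: children y and `+`
  data   = 6L,         # node 3: child mydata
  y      = NULL,       # node 4: leaf
  `+`    = c(7L, 8L),  # node 5: children x1 and x2
  mydata = NULL,       # node 6: leaf
  x1     = NULL,       # node 7: leaf
  x2     = NULL        # node 8: leaf
)

# Children of a node: read the entry at that position.
child_ids_sketch <- function(adj, node_id) {
  adj[[node_id]]
}

# Parent of a node: find the single entry that lists it as a child.
# Returning NA for the root is a choice made for this sketch only.
parent_id_sketch <- function(adj, node_id) {
  is_parent <- vapply(adj, function(children) node_id %in% children, logical(1))
  parent <- which(is_parent)
  if (length(parent) == 0L) NA_integer_ else unname(parent)
}

length(adj)                 # number of nodes
names(adj)                  # node names in BFS order
child_ids_sketch(adj, 1L)   # children of the root
parent_id_sketch(adj, 5L)   # parent of the `+` node
```

Because each edge is stored only under its parent, finding children is a direct lookup, whereas finding a parent requires scanning the list.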