---
title: "Getting Started"
format: html
vignette: >
%\VignetteIndexEntry{Getting Started}
%\VignetteEngine{quarto::html}
%\VignetteEncoding{UTF-8}
---
```{r}
#| include: false
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
# parsermd
The goal of parsermd is to extract the content of an R Markdown file so that its contents (such as code chunks and markdown text) can be worked with programmatically. The package aims to capture the fundamental structure of the document, so it does not attempt to parse every detail of the Rmd. Specifically, the YAML front matter, markdown text, and R code are read as lines of text, allowing them to be processed using other tools.
The package supports both traditional chunk options (specified in the chunk header) and YAML-style chunk options (specified as special comments within chunks). When both formats are used for the same option, YAML options take precedence.
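As a minimal sketch of the two option styles, the snippet below parses a small document supplied as a string rather than a file. The chunk label `demo` and the conflicting `echo` values are hypothetical, chosen only for illustration, and the exact shape of the value returned by `rmd_node_options()` may vary between package versions.

```r
library(parsermd)

# A hypothetical one-chunk document that sets `echo` in both styles:
# once in the chunk header and once as a YAML-style `#|` comment.
src = paste(c(
  "```{r demo, echo=FALSE}",
  "#| echo: true",
  "1 + 1",
  "```",
  ""
), collapse = "\n")

# parse_rmd() accepts document text as well as a file path
doc = parse_rmd(src)

# Inspect the options recorded for the chunk
opts = rmd_node_options(rmd_select(doc, has_type("rmd_chunk")))
opts
```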
## Installation
`parsermd` can be installed from CRAN with:
```r
install.packages("parsermd")
```
You can install the latest development version of `parsermd` from [GitHub](https://github.com/rundel/parsermd) with:
```r
remotes::install_github("rundel/parsermd")
```
```{r}
library(parsermd)
```
## Parsing Rmds
This basic example shows the abstract syntax tree (AST) that results from parsing a simple Rmd file:
```{r}
#| label: example
rmd = parsermd::parse_rmd(system.file("examples/minimal.Rmd", package = "parsermd"))
```
The R Markdown document is parsed and stored in a flat, ordered list object containing tagged elements. By default the package will present a hierarchical view of the document where chunks and markdown text are nested within headings, which is shown by the default print method for `rmd_ast` objects.
```{r}
#| label: tree
print(rmd)
```
If you would prefer to see the underlying flat structure, pass `flat = TRUE` to `print()`.
```{r}
#| label: no_headings
print(rmd, flat = TRUE)
```
Additionally, to ease manipulation of the AST, the package supports transforming the object into a tidy tibble with `as_tibble()` or `as.data.frame()` (both return a tibble).
```{r}
#| label: tibble
as_tibble(rmd)
```
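One reason the tibble form is convenient is that ordinary data frame tools apply to it. The sketch below assumes the tibble has a `type` column holding each node's type, as in the printed output, and uses base R to count the nodes of each type.

```r
library(parsermd)

rmd = parse_rmd(system.file("examples/minimal.Rmd", package = "parsermd"))
tbl = as_tibble(rmd)

# Count how many nodes of each type the document contains
# (assumes a `type` column, as shown in the printed tibble)
type_counts = table(tbl$type)
type_counts
```

The same idea extends to filtering or grouping rows with any data frame tooling you already use.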
It is also possible to convert these data frames back into an `rmd_ast`.
```{r}
#| label: as_ast
as_ast( as_tibble(rmd) )
```
Finally, we can also convert the `rmd_ast` back into an R Markdown document via `as_document()`:
```{r}
#| label: as_doc
cat(
as_document(rmd),
sep = "\n"
)
```
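Because `as_document()` returns the document as lines of text, the result can also be written to disk and parsed again. The following is a minimal round-trip sketch; the temporary file path is only for illustration.

```r
library(parsermd)

rmd = parse_rmd(system.file("examples/minimal.Rmd", package = "parsermd"))

# Write the regenerated document to a temporary file
path = tempfile(fileext = ".Rmd")
writeLines(as_document(rmd), path)

# Parse the written file again to confirm it is still a readable Rmd
rmd2 = parse_rmd(path)
print(rmd2)
```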
## Working with the AST
Once we have parsed an R Markdown document, there are a variety of things that we can do with our new abstract syntax tree (AST). Below we demonstrate some of the basic functionality within `parsermd` for manipulating and editing these objects, as well as checking their properties.
```{r}
rmd = parse_rmd(system.file("examples/hw01-student.Rmd", package="parsermd"))
rmd
```
Say we were interested in examining the solution a student entered for Exercise 1 - we can get access to this using the `rmd_select` function and its selection helper functions, specifically the `by_section` helper.
```{r}
rmd_select(rmd, by_section( c("Exercise 1", "Solution") ))
```
To view the content instead of the AST, we can use the `as_document()` function:
```{r}
rmd_select(rmd, by_section( c("Exercise 1", "Solution") )) |>
as_document()
```
Note that this gives us the *Exercise 1* and *Solution* headings as well as the markdown text they contain. If we only want the markdown text, we can refine our selector to include only nodes with the type `rmd_markdown` via the `has_type` helper.
```{r}
rmd_select(rmd, by_section(c("Exercise 1", "Solution")) & has_type("rmd_markdown")) |>
as_document()
```
This approach uses the tidyselect `&` operator within the selection to find the intersection of the selectors `by_section(c("Exercise 1", "Solution"))` and `has_type("rmd_markdown")`. Alternatively, the same result can be achieved by chaining multiple `rmd_select()` calls together:
```{r}
rmd_select(rmd, by_section(c("Exercise 1", "Solution"))) |>
rmd_select(has_type("rmd_markdown")) |>
as_document()
```
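Selection pairs naturally with the package's node accessor functions when the goal is to check properties rather than print nodes. The sketch below selects the code chunks and pulls out their labels and code with `rmd_node_label()` and `rmd_node_code()`. Depending on the package version, the selection may also retain the YAML node, in which case its label would be reported as missing.

```r
library(parsermd)

rmd = parse_rmd(system.file("examples/hw01-student.Rmd", package = "parsermd"))

# Keep only the code chunks
chunks = rmd_select(rmd, has_type("rmd_chunk"))

# One label per selected node (NA where a node has no label)
labels = rmd_node_label(chunks)
labels

# The code lines stored in each selected node
code = rmd_node_code(chunks)
code
```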
### Wildcards
One useful feature of the `by_section()` and `has_label()` selection helpers is that they support [glob](https://en.wikipedia.org/wiki/Glob_(programming)) style pattern matching. As such we can do the following to extract all of the solutions from our document:
```{r}
rmd_select(rmd, by_section(c("Exercise *", "Solution")))
```
Similarly, if we want to extract only the chunks that involve plotting, we can match chunk labels with a "plot" prefix:
```{r}
rmd_select(rmd, has_label("plot*"))
```
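To build intuition for how glob patterns behave, the sketch below uses base R's `utils::glob2rx()` to translate a glob into a regular expression and applies it to a few chunk labels. The labels are assumed for illustration, and this only shows the general matching rule; how `parsermd` implements matching internally may differ.

```r
# Illustrative chunk labels (assumed, not read from the document)
labels = c("load-packages", "plot-dino", "plot-star")

# Translate the glob into a regular expression anchored at the start
pattern = utils::glob2rx("plot*")

# Keep the labels that match the translated pattern
matches = labels[grepl(pattern, labels)]
matches
```

Only labels that begin with `plot` are kept, so `load-packages` is dropped.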