--- title: "Creating an `ontology_index`" author: "Daniel Greene" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Creating an ontology_index} %\VignetteEngine{knitr::rmarkdown} %\usepackage[utf8]{inputenc} --- An `ontology_index` can be obtained by loading a pre-existing one - for example by calling `data(hpo)`, reading ontologies encoded in OBO format into R using the function `get_ontology`, or by calling the function `ontology_index` explicitly. An `ontology_index` is a named `list` of properties for each term, where each property is represented by a `list` or `vector`. Each of these property lists is named by term, facilitating simple lookups of properties by term name. All valid `ontology_index` objects contain `id`, `name`, `parents`, `children` and `ancestors` properties for each term. Additional properties can be added to an `ontology_index`, although they are not required by functions in the package. For details on how to use an `ontology_index`, see the 'Introduction to ontologyX' vignette. ## Reading in an OBO file The function `get_ontology` can read ontologies encoded in OBO format into R as `ontology_index` objects. By default, the properties `id`, `name`, `obsolete`, `parents`, `children` and `ancestors` are populated. ```{r echo=FALSE} library(ontologyIndex) ``` To call the function: ```{r eval=FALSE} ontology <- get_ontology(file) ``` The properties `parents`, `children` and `ancestors` are determined by a given set of relations between terms: the `propagate_relationships` argument ("is_a" by default). Thus the `parents` of a term are set of terms to which it is related by any type of relation contained in `propagate_relationships`; the `children` are those terms related by the inverse relations and `ancestors` are those obtained by propagating the `propagate_relationships` relations (note: the resulting set includes the term itself). ```{r eval=FALSE} ontology <- get_ontology(file, propagate_relationships=c("is_a", "part_of")) ``` The relations given in the `propagate_relationships` argument should be named as they are labelled in the OBO file. In order to see a complete list of relations used in an OBO file, pass the file's path to the function `get_relation_names`. E.g. for the gene ontology: ```{r eval=FALSE} get_relation_names("go.obo") ``` ```{r eval=TRUE, echo=FALSE} c("is_a", "regulates", "part_of", "has_part", "happens_during", "negatively_regulates", "positively_regulates", "occurs_in", "ends_during") ``` Additional information is often present in the original file - for example definitions, labelled by the `def` tag in OBO format. `get_ontology` decides which properties to export based on the `extract_tags` argument. By default `extract_tags="minimal"`, resulting in only the properties `id`, `name`, `obsolete`, `parents`, `children` and `ancestors` being exported. It is possible to include all properties given in the file by setting `extract_tags="everything"`. The names of the properties included in the returned `ontology_index` are then the same as the names of the tags in OBO format. ```{r eval=FALSE} ontology <- get_ontology(file, extract_tags="everything") ``` All properties are stored in the returned `ontology_index` as lists, except for the following, which are coerced to `character` or `logical` vectors as appropriate: `"id", "name", "def", "comment", "obsolete", "created_by", "creation_date"`. Further properties can be mapped to vectors if required, modifying the returned `ontology_index` as a list, e.g. ```{r eval=FALSE} ontology$property <- simplify2array(ontology$property) ``` ## Adding term properties Modifying an existing `ontology_index` to add term properties is the same as adding to a `list` or `data.frame`. In the example below, we add the number of children for each term: ```{r eval=FALSE} ontology$number_of_children <- sapply(ontology$children, length) ``` In the same manner, a valid `ontology_index` can be built up from scratch as a list, of course requiring that the standard properties are included for use with functions in `ontologyIndex`. ## Converting from OWL to OBO format In order to read in ontologies in OWL syntax, it is recommended to first convert to OBO format, for example using the ROBOT command line tool https://github.com/ontodev/robot. ## Term equivalence If the option `merge_equivalent_terms` in `get_ontology`/`get_OBO` is set to `TRUE` (the default), then terms marked `equivalent_to` target terms are merged and properties aggregated (except for those listed above coerced to vectors, in which case the values that would be assigned to the target term are used). ## Creating an `ontology_index` explicitly The function `ontology_index` can be used to create an object with class `ontology_index`. This could be useful for instance if the user wished to convert a directed acyclic graph (DAG) with edges representing sub/super-class relationships into an `ontology_index`. It is similar to the function `data.frame`: it accepts a variable number of arguments corresponding to properties for ontological terms, which must each be a vector or list of the same length (except the `version` argument, which can be any object and should contain any information about the version of the ontology). The only mandatory argument is the `parents` argument, and should be a `list` of `character` vectors giving the IDs of the 'parents'/'superclasses' of each term. The term IDs can either be supplied as the `names` attribute of the `parents` or as a separate `id` argument of the same length as `parents`. The human-readable term names can be passed as the `names` argument (defaults to the same as `id`). As usual the `children` and `ancestors` properties are derived from the `parents`. Warnings are generated if any IDs given in the `parents` argument are not in the `id` argument. A simple invocation: ```{r eval=TRUE} animal_superclasses <- list(animal=character(0), mammal="animal", cat="mammal", fish="animal") animal_ontology <- ontology_index(parents=animal_superclasses) unclass(animal_ontology) ``` For more details, see the help page for the function, `?ontology_index`.