Title: | Fetch and Explore the Cornell Lab of Ornithology Open Tree of Life Avian Phylogeny |
Version: | 0.1.1 |
Maintainer: | Eliot Miller <clootlmaintainers@gmail.com> |
URL: | https://github.com/eliotmiller/clootl |
BugReports: | https://github.com/eliotmiller/clootl/issues |
Depends: | R (≥ 4.3.0), ape |
Imports: | stats, dplyr, RCurl, jsonlite |
LazyData: | true |
LazyDataCompression: | xz |
Description: | Fetches the Cornell Lab of Ornithology Open Tree of Life (clootl) tree in a specified taxonomy. Optionally prune it to a given set of study taxa. Provide a recommended citation list for the studies that informed the extracted tree. Tree generated as described in McTavish et al. (2024) <doi:10.1101/2024.05.20.595017>. |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Suggests: | rmarkdown, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-04-09 19:28:43 UTC; luna |
Author: | Eliot Miller [aut, cre], Emily Jane McTavish [aut], Luna L. Sanchez Reyes [ctb, aut] |
Repository: | CRAN |
Date/Publication: | 2025-04-10 14:40:02 UTC |
A complex data store used in the package.
Description
Taxonomy files and phylogenies.
Usage
clootl_data
Format
A list holding taxonomy tables (read from csv files) and phylogenies (read from Newick and Nexus files).
Details
clootl_data = list()
fullTree2021 <- treeGet("1.4","2021", data_path="~/projects/otapi/AvesData")
fullTree2022 <- treeGet("1.4","2022", data_path="~/projects/otapi/AvesData")
fullTree2023 <- treeGet("1.4","2023", data_path="~/projects/otapi/AvesData")
clootl_data$trees$Aves_1.4$summary.trees$year2021 <- fullTree2021
clootl_data$trees$Aves_1.4$summary.trees$year2022 <- fullTree2022
clootl_data$trees$Aves_1.4$summary.trees$year2023 <- fullTree2023
clootl_data$versions <- c('1.2','1.3','1.4')
tax2021 <- taxonomyGet(2021, data_path="~/projects/otapi/AvesData")
tax2022 <- taxonomyGet(2022, data_path="~/projects/otapi/AvesData")
tax2023 <- taxonomyGet(2023, data_path="~/projects/otapi/AvesData")
clootl_data$taxonomy.files$Year2021 <- tax2021
clootl_data$taxonomy.files$Year2022 <- tax2022
clootl_data$taxonomy.files$Year2023 <- tax2023
clootl_data$tax_years <- c("2021","2022","2023")
annot_filename <- "~/projects/otapi/AvesData/Tree_versions/Aves_1.4/OpenTreeSynth/annotated_supertree/annotations.json"
all_nodes <- jsonlite::fromJSON(txt=annot_filename)
clootl_data$trees$Aves_1.4$annotations <- all_nodes
# This part pre-processes all citations for studies in the tree
# so we don't need to do any API calls later.
studies <- c()
for (inputs in all_nodes$source_id_map) {
  studies <- c(studies, inputs$study_id)
}
studies <- unique(studies)
study_info <- clootl:::api_studies_lookup(studies)
clootl_data$study_info <- study_info
save(clootl_data, file="~/projects/otapi/clootl/data/clootl_data.rda", compress="xz")
Source
https://github.com/eliotmiller/clootl
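The construction script in Details implies a nested list layout. The following is a minimal sketch of navigating that layout, using a mock list that only mirrors the element names from the script. The mock contents and the helper name summarize_store are hypothetical and not part of the package; with clootl installed, data(clootl_data) would supply the real object.

```r
# Mock object mirroring the element names used in the construction script.
# The contents are placeholders, not real trees or taxonomies.
mock_store <- list(
  versions  = c("1.2", "1.3", "1.4"),
  tax_years = c("2021", "2022", "2023"),
  trees = list(
    Aves_1.4 = list(
      summary.trees = list(year2021 = "tree-2021",
                           year2022 = "tree-2022",
                           year2023 = "tree-2023"),
      annotations   = list()
    )
  ),
  taxonomy.files = list(Year2021 = data.frame(),
                        Year2022 = data.frame(),
                        Year2023 = data.frame())
)

# Hypothetical helper: report which tree versions, tree years and
# taxonomy years are stored in an object with this layout.
summarize_store <- function(store) {
  list(
    tree_versions  = names(store$trees),
    tree_years     = lapply(store$trees, function(v) names(v$summary.trees)),
    taxonomy_years = names(store$taxonomy.files)
  )
}

info <- summarize_store(mock_store)
info$tree_versions
info$tree_years$Aves_1.4
```

Note that, in the script, tree elements are named with a lowercase prefix (year2021) while taxonomy elements use an uppercase one (Year2021).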
Extract a tree from the complete Avian Phylogeny for a set of species
Description
Extract a tree from the complete Avian Phylogeny for a set of species
Usage
extractTree(
  species = "all_species",
  label_type = "scientific",
  taxonomy_year = 2023,
  version = "1.4",
  data_path = FALSE
)
Arguments
species |
A character vector either of scientific names (directly as they come out of the eBird taxonomy, i.e. without underscores) or of six-letter eBird species codes. Any elements of the species vector that do not match a species-level taxon in the specified eBird taxonomy will result in an error. Default is set to "all_species". |
label_type |
Either "scientific" or "code". Default is set to "scientific". |
taxonomy_year |
The eBird taxonomy year the tree should be output in. Current options are 2021, 2022, and 2023. Both numeric and character inputs are accepted. Any other value will result in an error. Default is set to 2023. |
version |
The desired version of the tree. Defaults to the most recent version of the tree. Available versions are '1.2', '1.3', and '1.4'; the value can be passed as a character string or as numeric. |
data_path |
Defaults to FALSE. |
Details
This function first checks that every requested species matches a species-level taxon in the requested eBird taxonomy, and stops with an error if any does not. The onus is on the user to ensure the requested taxa are valid. This check guards against surprises later in an analysis: you don't want to find out many steps later that your dataset doesn't match your phylogeny. As of March 2025, the eBird database uses the 2024 taxonomy. Trees in 2024 taxonomy will be available by June 2025. The 2025 taxonomy will be released to the public in October or November 2025, and the intention is to release a tree in 2025 taxonomy concurrently with the publication of the taxonomy itself.
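Because unmatched names cause an error, it can help to check a species vector against the taxonomy before extracting a tree. The following is a minimal sketch of such a pre-check, not the function's own validation code. It uses a toy data frame in place of a real taxonomy table; SCI_NAME and CATEGORY are column names that appear in the getCitations example, while the rows and the helper name unmatched_species are hypothetical.

```r
# Toy stand-in for an eBird taxonomy table; the rows are illustrative only.
toy_tax <- data.frame(
  SCI_NAME = c("Turdus migratorius", "Setophaga dominica",
               "Sitta canadensis", "Setophaga coronata coronata"),
  CATEGORY = c("species", "species", "species", "issf"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the requested names that have no
# species-level match in the taxonomy table.
unmatched_species <- function(species, tax) {
  valid <- tax$SCI_NAME[tax$CATEGORY == "species"]
  setdiff(species, valid)
}

# The third name uses an underscore, so it does not match.
requested <- c("Turdus migratorius", "Sitta canadensis", "Turdus_migratorius")
not_found <- unmatched_species(requested, toy_tax)
not_found
```

With the package installed, a table returned by taxonomyGet could presumably be passed in place of toy_tax.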
Value
A phylogeny of the specified taxa in the specified eBird taxonomy version and clootl tree version.
Author(s)
Eliot Miller, Luna Sanchez Reyes, Emily Jane McTavish
Examples
ex1 <- extractTree(species=c("amerob", "canwar", "reevir1", "yerwar", "gockin"),
                   label_type="code")
ex2 <- extractTree(species=c("Turdus migratorius",
                             "Setophaga dominica",
                             "Setophaga ruticilla",
                             "Sitta canadensis"),
                   label_type="scientific",
                   taxonomy_year="2021",
                   version="1.4")
Get the DOIs and quantify the contribution of published studies
Description
Standing on the shoulders of giants
Usage
getCitations(tree, version = "1.4", data_path = FALSE)
Arguments
tree |
A phylogeny obtained from extractTree (see details). |
version |
The desired version of the tree. Defaults to the most recent version of the tree. Available versions are '0.1', '1.0', '1.2', '1.3', and '1.4'; the value can be passed as a character string or as numeric. |
data_path |
Defaults to FALSE. |
Details
Importantly, an internet connection is required for this function to work, as it relies on Open Tree of Life APIs. The function determines what proportion of nodes in your phylogeny (possibly, but not necessarily, pruned to a set of study taxa) are supported by each study that went into creating the final clootl tree. In any resulting publication, you should always cite the clootl tree, and you should also "always" cite all the trees/DOIs that contributed to your phylogeny. That said, we are well aware of the citation and word count limits that plague modern publishing, and for this reason we quantify the contribution of each study; depending on your phylogeny, it is very possible that one or two studies contributed the majority of the information. Currently, this function assumes your output tree matches the taxonomy of the corresponding tree on the OpenTree server. Because the function uses the named internal nodes for the API query, and these should not be lost between tree versions and taxonomies, this should not matter, but it has not yet been tested.
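The arithmetic behind "percent of internal nodes supported by a study" can be illustrated with a toy example. This is a sketch under stated assumptions, not the package's implementation: the node labels, study identifiers, and output column names (study, percent) are made up, and the real function obtains its node-to-study mapping from the tree annotations and Open Tree of Life APIs.

```r
# Toy mapping from internal node labels to the studies supporting each node.
node_support <- list(
  node1 = c("study_A"),
  node2 = c("study_A", "study_B"),
  node3 = c("study_A"),
  node4 = character(0)   # no supporting study in this toy example
)

# Count each study at most once per node, then divide by the number of nodes.
study_hits <- table(unlist(lapply(node_support, unique)))
support <- data.frame(
  study   = names(study_hits),
  percent = 100 * as.numeric(study_hits) / length(node_support),
  stringsAsFactors = FALSE
)

# Largest contributors first.
support <- support[order(-support$percent), ]
support
```

Because a node can be supported by more than one study, the percentages in such a table need not sum to 100.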
Value
A data frame giving the percent of internal nodes supported by each study, along with the DOI of that study. The proportion of taxa in the tree supported by taxonomic addition only is also included in the data frame.
Author(s)
Eliot Miller, Emily Jane McTavish
Examples
#pull the taxonomy file out
data(clootl_data)
tax <- clootl_data$taxonomy.files$Year2021
ls(tax)
#subset to species only
# TODO: this step seems no longer necessary, is it??
# tax <- tax[tax$CATEGORY=="species",]
#simulate extracting a tree for a particular family
temp <- tax[tax$FAMILY=="Rhinocryptidae (Tapaculos)",]
spp <- temp$SCI_NAME
#get your tree
prunedTree <- extractTree(species=spp, label_type="scientific",
taxonomy_year=2021, version="1.4")
#get your citation DF
yourCitations <- getCitations(tree=prunedTree)
Pull down full AvesData repository to a working directory
Description
Pull down full AvesData repository to a working directory
Usage
get_avesdata_repo(path, overwrite = FALSE)
Arguments
path |
Path to download data zipfile to, and where it will be unpacked. To download into your working directory, use "." |
overwrite |
Defaults to FALSE. |
Value
No return value, called to download the Aves Data repository.
Extract a cloud of trees from the complete Avian Phylogeny for a set of species
Description
Extract a cloud of trees from the complete Avian Phylogeny for a set of species
Usage
sampleTrees(
  species = "all_species",
  label_type = "scientific",
  taxonomy_year = 2023,
  version = "1.4",
  count = 100,
  data_path = FALSE
)
Arguments
species |
A character vector either of scientific names (directly as they come out of the eBird taxonomy, i.e. without underscores) or of six-letter eBird species codes. Any elements of the species vector that do not match a species-level taxon in the specified eBird taxonomy will result in an error. Default is set to "all_species". |
label_type |
Either "scientific" or "code". Default is set to "scientific". |
taxonomy_year |
The eBird taxonomy year the tree should be output in. Current options are 2021, 2022, and 2023. Both numeric and character inputs are accepted. Any other value will result in an error. Default is set to 2023. |
version |
The desired version of the tree. Defaults to the most recent version of the tree. Available versions are '1.2', '1.3', and '1.4'; the value can be passed as a character string or as numeric. |
count |
The desired number of sampled trees. Work in progress: only 100 trees can be sampled for now. |
data_path |
Defaults to FALSE. |
Details
This function first checks that every requested species matches a species-level taxon in the requested eBird taxonomy, and stops with an error if any does not. The onus is on the user to ensure the requested taxa are valid. This check guards against surprises later in an analysis: you don't want to find out many steps later that your dataset doesn't match your phylogeny. As of March 2025, the eBird database uses the 2024 taxonomy. Trees in 2024 taxonomy will be available by June 2025. The 2025 taxonomy will be released to the public in October or November 2025, and the intention is to release a tree in 2025 taxonomy concurrently with the publication of the taxonomy itself.
Value
A set of phylogenies, of the number given in count, of the specified taxa in the specified eBird taxonomy version and clootl tree version.
Author(s)
Eliot Miller, Luna Sanchez Reyes, Emily Jane McTavish
Examples
if (Sys.getenv("AVESDATA_PATH") != "") {
  ex2 <- sampleTrees(species=c("Turdus migratorius",
                               "Setophaga dominica",
                               "Setophaga ruticilla",
                               "Sitta canadensis"))
}
Set the path to an Aves Data folder that already exists on your computer
Description
Set the path to an Aves Data folder that already exists on your computer. Based on https://github.com/CornellLabofOrnithology/auk/blob/main/R/auk-set-ebd-path.r
Usage
set_avesdata_repo_path(path, overwrite = FALSE)
Arguments
path |
A character vector with the path to the Aves Data folder. |
overwrite |
Boolean; defaults to FALSE. |
Value
No return value, called to set the path to the Aves Data folder.
Examples
## Not run:
set_avesdata_repo_path("/home/ejmctavish/AvesData")
## End(Not run)
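The sampleTrees example in this manual runs only when the AVESDATA_PATH environment variable is non-empty. The following is a minimal sketch of a slightly stricter guard that also checks the folder exists. It assumes, without confirmation from this manual, that set_avesdata_repo_path() records the folder in AVESDATA_PATH; the helper name avesdata_ready is hypothetical.

```r
# Hypothetical helper: TRUE when AVESDATA_PATH is set and points to an
# existing directory.
avesdata_ready <- function() {
  path <- Sys.getenv("AVESDATA_PATH")
  nzchar(path) && dir.exists(path)
}

# Demonstration, with a temporary directory standing in for the real folder.
old <- Sys.getenv("AVESDATA_PATH", unset = NA)
Sys.setenv(AVESDATA_PATH = tempdir())
ready_with_path <- avesdata_ready()

Sys.unsetenv("AVESDATA_PATH")
ready_without_path <- avesdata_ready()

# Restore the previous value, if there was one.
if (!is.na(old)) Sys.setenv(AVESDATA_PATH = old)
```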
Load a bird taxonomy into the R environment
Description
taxonomyGet either reads a taxonomy file and loads it as a data frame, or loads the default taxonomy data object.
Usage
taxonomyGet(taxonomy_year, data_path = FALSE)
Arguments
taxonomy_year |
The eBird taxonomy year the tree should be output in. Current options are 2021, 2022, and 2023. Both numeric and character inputs are accepted. Any other value will result in an error. Default is set to 2023. |
data_path |
Defaults to FALSE. |
Details
This will return a data object that has the taxonomy of the requested year.
Value
A data.frame with 17 columns of taxonomic information: order, species code, taxon concept, common name, scientific name, family, OpenTree Taxonomy data, etc.
Load a bird tree into the R environment
Description
treeGet either reads a tree file and loads it as a phylo object, or loads the default tree data object.
Usage
treeGet(version, taxonomy_year, data_path = FALSE)
Arguments
version |
The desired version of the tree. Defaults to the most recent version of the tree. Available versions are '1.2', '1.3', and '1.4'; the value can be passed as a character string or as numeric. |
taxonomy_year |
The eBird taxonomy year the tree should be output in. Current options are 2021, 2022, and 2023. Both numeric and character inputs are accepted. Any other value will result in an error. Default is set to 2023. |
data_path |
Defaults to FALSE. |
Details
This will return a data object that has the requested tree.
Value
A phylo object with the requested version and taxonomy.
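The version and taxonomy_year arguments accept either character or numeric input and reject unlisted values. The following is a minimal sketch of how such input could be normalized before calling treeGet or taxonomyGet. It is an illustration only, not the package's internal check; the helper name normalize_choice is hypothetical, and the allowed values are those listed in the argument descriptions.

```r
# Values listed in this manual as currently available.
known_versions <- c("1.2", "1.3", "1.4")
known_years    <- c("2021", "2022", "2023")

# Hypothetical helper: coerce a single value to character and stop with an
# informative message when it is not among the allowed values.
normalize_choice <- function(x, allowed, what) {
  x <- as.character(x)
  if (length(x) != 1 || !(x %in% allowed)) {
    stop(what, " must be one of: ", paste(allowed, collapse = ", "))
  }
  x
}

v <- normalize_choice(1.4, known_versions, "version")
y <- normalize_choice("2022", known_years, "taxonomy_year")

# An unlisted year produces an error; here the message is captured instead.
bad <- tryCatch(normalize_choice(2024, known_years, "taxonomy_year"),
                error = function(e) conditionMessage(e))
```

One caveat with numeric input in general: as.character(1.0) gives "1" rather than "1.0", so passing versions as character strings is the safer choice.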