--- title: "Introduction to babelgene" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to babelgene} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## Overview Genomic analysis of model organisms frequently requires the use of databases based on human data or making comparisons to patient-derived resources. This requires harmonization of gene names into the same gene space. The babelgene R package helps to simplify the conversion process. It provides gene orthologs/homologs: * for multiple frequently studied model organisms, such as mouse, rat, fly, and zebrafish * sourced from multiple databases * as gene symbols, NCBI Entrez, and Ensembl IDs * without accessing external resources and requiring an active internet connection * in an R-friendly "[tidy](https://r4ds.had.co.nz/tidy-data.html)" format with one gene pair per row ## Usage You can install the babelgene R package from [CRAN](https://cran.r-project.org/package=babelgene). ```{r install-package, eval=FALSE} install.packages("babelgene") ``` Load babelgene. ```{r load-package, message=FALSE} library(babelgene) ``` The main functionality is accessed via the `orthologs()` function which takes one or more genes and outputs a data frame of predicted ortholog/homolog pairs. The output data frame contains gene symbols and IDs for human and the specified species. Several examples are provided below. Get mouse equivalents for a set of human genes. ```{r human-input} orthologs(genes = c("TP53", "EGFR", "IL6", "TGFB1", "CD4"), species = "mouse") ``` Input genes are assumed to be human by default. You can specify if the input genes are human with the `human` parameter. ```{r fly-input} orthologs(genes = "Pu", species = "fruit fly", human = FALSE) ``` It is possible to search by NCBI Entrez or Ensembl IDs instead of gene symbols. ```{r ensembl-input} orthologs(genes = "ENSG00000111640", species = "mouse", human = TRUE) ``` The `orthologs()` function requires the `species` parameter to be set (both scientific and common names are acceptable). You can check all the species that can be queried with the help of the `species()` function. ```{r species} species() ``` ## Details The package is based on the data provided by the Human Genome Organization (HUGO) Gene Nomenclature Committee (HGNC) at the European Bioinformatics Institute. The HGNC Comparison of Orthology Predictions (HCOP) integrates the orthology assertions predicted for human genes by eggNOG, Ensembl Compara, HGNC, HomoloGene, Inparanoid, NCBI Gene Orthology, OMA, OrthoDB, OrthoMCL, Panther, PhylomeDB, TreeFam and ZFIN. The name babelgene is derived from the [Babel Fish](https://en.wikipedia.org/wiki/Babel_fish), a fictional species of fish that could translate for humans.