---
title: "Using ChIPDBData with TFEA.ChIP"
author: "Yosra Berrouayel and Luis del Peso"
date: "`r Sys.Date()`"
output: BiocStyle::html_document
vignette: >
  %\VignetteIndexEntry{Using ChIPDBData}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
library(TFEA.ChIP)
```

# Overview

The `ChIPDBData` package provides curated ChIP-seq transcription factor target databases designed for use with `r Biocpkg("TFEA.ChIP")`. Each dataset contains a collection of ChIP-seq experiments (e.g., from ENCODE) along with their associated gene targets.

These datasets are structured as **`ChIPDB` list objects**, and can be accessed either manually or via the `getChIPDB()` function.

**Important:** When loading any dataset, make sure it is assigned to an object named `ChIPDB`. This is crucial, as `TFEA.ChIP` looks for a globally defined object called `ChIPDB` and will **not recognize** it under any other name.

# Installation

To install the package, start R and enter:

```{r lib-install, message=FALSE, eval=FALSE}
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("ChIPDBData")
```

Once `ChIPDBData` is installed, it can be loaded with the following command:

```{r load-lib, message=FALSE, warning=FALSE, cache=FALSE}
library(ChIPDBData)
```

# Available Datasets

The following datasets are currently available in the `ChIPDBData` package:

- ENCODE rE2G (complete)
- ENCODE rE2G subsets filtered by:
    - Score thresholds: 0.25, 0.5, 0.75
    - Depth thresholds: 50, 100, 200, 300
- CREdb
- GeneHancer

These can be accessed via the **ExperimentHub** interface:

```{r load-chipdb-eh}
library(ExperimentHub)
eh <- ExperimentHub()
dbs <- query(eh, "ChIPDBData")
dbs

# Example: Load ENCODE rE2G300d
ChIPDB <- dbs[["EH9854"]]  # IMPORTANT: Assign to 'ChIPDB'
```

Alternatively, you can retrieve datasets programmatically using `getChIPDB()` with any of the following
identifiers: "ENCODE_rE2G", "ENCODE_rE2G_25score", "ENCODE_rE2G_50score", "ENCODE_rE2G_75score", "ENCODE_rE2G_50depth", "ENCODE_rE2G_100depth", "ENCODE_rE2G_200depth", "ENCODE_rE2G_300depth", "CREdb", or "GeneHancer". For example:

```{r load-chipdb}
# Load the ENCODE dataset filtered by depth >= 300
ChIPDB <- getChIPDB("ENCODE_rE2G_300depth")
```

A `ChIPDB` object is a named list with two main components:

1. A **character vector of Entrez Gene IDs**, representing the universe of possible targets.
2. A **named list of ChIP-seq experiments**, where each element is a vector of integer indices pointing to the genes in component 1. Each entry represents the gene targets of a transcription factor in a specific experiment.

Exploring the structure:

```{r explore-db}
# List names of the top-level elements
names(ChIPDB)

# Preview the first few Entrez IDs
ChIPDB[[1]][1:5]

# View names of ChIP-seq experiments
names(ChIPDB[[2]])[1:3]

# Show gene indices for the first experiment
ChIPDB[[2]][[1]][1:5]

# Get actual gene IDs from those indices
ChIPDB[[1]][ChIPDB[[2]][[1]][1:5]]
```

# Integration with `TFEA.ChIP`

To perform transcription factor enrichment analysis, start by loading your differential expression data and defining the regulated and control gene sets.

Ensure that your ChIP-seq database is loaded and assigned to `ChIPDB`. The `TFEA.ChIP` functions will automatically use this object for analysis.

**Important:** Make sure to load `ChIPDB` after running `library(TFEA.ChIP)`. Otherwise, the package's default database (a limited subset from GeneHancer) will overwrite it.

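
Before running the analysis, it can be useful to confirm that the object you assigned to `ChIPDB` really has the two-component layout described earlier. The following is only a minimal sketch in base R: the toy object, its element names (`genes`, `experiments`), the gene IDs, the experiment names, and the helper `is_chipdb_like()` are all hypothetical and are not part of `ChIPDBData` or `TFEA.ChIP`. The block is shown as static code so that it does not touch the real `ChIPDB` object.

```r
# Toy object mimicking the layout described above: a character vector of
# gene IDs plus a named list of integer indices into that vector.
# All IDs and names here are invented for illustration only.
toy_db <- list(
    genes = c("1", "10", "100", "1000", "10000"),
    experiments = list(
        "EXP1.TF_A" = c(1L, 3L, 5L),
        "EXP2.TF_B" = c(2L, 3L)
    )
)

# Hypothetical helper (not a ChIPDBData/TFEA.ChIP function): returns TRUE
# when `db` is a two-element list whose indices all fall inside the gene vector.
is_chipdb_like <- function(db) {
    if (!is.list(db) || length(db) != 2L) return(FALSE)
    genes <- db[[1]]
    targets <- db[[2]]
    if (!is.character(genes) || !is.list(targets)) return(FALSE)
    all(vapply(targets, function(idx) {
        is.numeric(idx) && all(idx >= 1 & idx <= length(genes))
    }, logical(1)))
}

is_chipdb_like(toy_db)

# Translate the indices of every toy experiment back into gene IDs
target_ids <- lapply(toy_db[[2]], function(idx) toy_db[[1]][idx])
target_ids[["EXP1.TF_A"]]
```

The same check could be applied to a real database with `is_chipdb_like(ChIPDB)`, assuming the loaded object follows the structure described in this vignette.
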
```{r eval=TRUE}
# Load and preprocess differential expression table
data("hypoxia_DESeq")
hypoxia_table <- preprocessInputData(hypoxia_DESeq)

# Define gene sets
Genes.Upreg <- Select_genes(hypoxia_table, min_LFC = 1)
Genes.Control <- Select_genes(hypoxia_table,
    min_pval = 0.5, max_pval = 1,
    min_LFC = -0.25, max_LFC = 0.25
)

# Run TF enrichment
CM_list <- contingency_matrix(Genes.Upreg, Genes.Control)
results <- getCMstats(CM_list)

# Display results
head(results)
```

# Session Info

```{r session-info}
sessionInfo()
```