---
title: "fourSynergy"
output:
  BiocStyle::html_document:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{fourSynergy_vignette}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
bibliography: references.bib
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.width = 7, # Width of plots in inches
    fig.height = 5 # Height of plots in inches
)
```

# Introduction

Circular Chromosome Conformation Capture Sequencing (4C-seq) is a
sequencing technique that enables the identification of chromatin
interactions. Currently, several tools are available for 4C-seq
analysis:

- `r BiocStyle::Biocpkg('r3Cseq')`
- fourSig ()
- FourCSeq () (deprecated)
- peakC ()
- R.4Cker ()

A benchmark of these tools revealed that none of them performed
adequately in all evaluated tasks [@Walter2019]. To address this, we
developed *fourSynergy*, an ensemble algorithm that leverages synergies
between the base tools *r3Cseq*, *fourSig*, *peakC*, and *R.4Cker*. The
ensemble requires the results of all of these tools, which can be
generated with the *fourSynergy* pipeline. This pipeline, written in
*snakemake*, includes quality control, interaction calling, and
post-processing steps that prepare the output of the base tools for the
ensemble algorithm.

# Requirements

**If you want to quickly start testing the package, you can jump to the
section [R analysis](#r-analysis) and run the *fourSynergy* R package on
test data.**

## Pipeline

- snakemake installation

We recommend executing the pipeline in a Docker container.

# Running fourSynergy

## Installation

*fourSynergy* can be installed as follows:

```{r install, eval=FALSE}
# Install BiocManager first if it is not available yet
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("fourSynergy")
```

## Requirements

To run *fourSynergy*, you need R version 4.5 or higher.

## Getting started

The following sections demonstrate a basic 4C-seq workflow on an example
dataset. First, we need to run the *fourSynergy* pipeline.
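Before running the pipeline, a quick check from within R confirms that
the R version requirement stated above is met and whether *fourSynergy*
is already installed. This is a small base-R sketch, not a function
provided by *fourSynergy*:

```{r check_requirements}
# fourSynergy requires R version 4.5 or higher
r_version_ok <- getRversion() >= "4.5.0"
# TRUE if fourSynergy is installed; loads its namespace without attaching it
fourSynergy_installed <- requireNamespace("fourSynergy", quietly = TRUE)
c(r_version_ok = r_version_ok, fourSynergy_installed = fourSynergy_installed)
```

If either value is `FALSE`, update R or run the installation chunk above
before continuing.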
## fourSynergy Pipeline

Before starting the R analysis, you have to run the *fourSynergy*
snakemake pipeline, which is available at ().

### Input

The *fourSynergy* pipeline takes the following inputs:

- 4C-seq FASTQ files
- config.yaml
- Reference genome

#### config

The config file is essential for running *fourSynergy*: it stores all
relevant experimental information. An exemplary config file is provided
at `inst/extdata/Datasets/Demo/info.yaml`.

### Output

The pipeline creates a `results` directory in the directory where it is
executed. Inside `results`, it creates a subdirectory named after the
`author` provided in the config file. All results of the base tools and
of *fourSynergy* are written to this subdirectory.

```
├── Datasets
├── ...
└── results
    └── author
        ├── alignment  -> alignment results
        ├── basic4Cseq -> Basic4Cseq results
        ├── foursig    -> fourSig results
        ├── peakC      -> peakC results
        ├── qc         -> FastQC results
        ├── r3cseq     -> r3Cseq results
        ├── r4cker     -> R.4Cker results
        └── sia        -> standardized base tool results
```

# R analysis {#r-analysis}

After the pipeline has been executed, we can start the analysis in R by
loading the *fourSynergy* R package.

```{r libs, message=FALSE}
library(fourSynergy)
```

## Load data

Then, the pipeline results can be loaded into a *fourSynergy* object. In
the provided example, the `config` file and the result paths are stored
in the package's `extdata` directory. In a real-world scenario,
`res_path` would typically be located in `./results/[DatasetName]` and
`tracks` in `./results/[DatasetName]/alignment`.
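For a real run, the three paths point into the pipeline's working
directory instead of the package's `extdata`. The following is a minimal
sketch, assuming the pipeline was executed in the current working
directory. `"MyDataset"` is a hypothetical placeholder for the dataset
(author) directory written by your run, and the location of `info.yaml`
under `Datasets/` is an assumption modeled on the demo layout. The chunk
only builds the paths and reports which ones exist, so it is safe to run
before calling `createIa()`.

```{r real_world_paths, eval=FALSE}
# Hypothetical dataset name; replace with the directory written by your run
dataset <- "MyDataset"

# Result paths following the layout described above
# (assumption: the pipeline was executed in the current working directory)
res_path <- file.path("results", dataset)
tracks <- file.path(res_path, "alignment")
# Assumed config location, mirroring inst/extdata/Datasets/Demo/info.yaml
config <- file.path("Datasets", dataset, "info.yaml")

# Report which inputs for createIa() are present
present <- c(
    config = file.exists(config),
    res_path = dir.exists(res_path),
    tracks = dir.exists(tracks)
)
present
```

The chunk below instead loads the demo data shipped with the package.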
```{r data, eval=TRUE}
# Path to the config file
config <- system.file("extdata", "Datasets", "Demo", "info.yaml",
    package = "fourSynergy"
)
# Path to the results directory
res_path <- system.file("extdata", "results", "Demo",
    package = "fourSynergy"
)
# Path to the directory containing the .bedGraph tracks
tracks <- system.file("extdata", "results", "Demo", "alignment",
    package = "fourSynergy"
)
# Create the fourSynergy object from the pipeline results
sia <- createIa(
    res_path = res_path,
    config = config,
    tracks = tracks
)
```

The *fourSynergy* object stores the interactions called by the
individual base tools for the experiment and the control, as well as
metadata.

| Slot             | Data                           |
|------------------|--------------------------------|
| metadata         | Metadata from config           |
| expInteractions  | Base tool results - experiment |
| ctrlInteractions | Base tool results - control    |
| expConsensus     | Ensemble results - experiment  |
| ctrlConsensus    | Ensemble results - control     |
| vp               | Viewpoint position             |
| vfl              | Virtual fragment library (VFL) |
| tracks           | Path of .bedGraphs             |
| differential     | Differential ensemble results  |

## Base tool results

The base tool results are stored in the slots `expInteractions` and
`ctrlInteractions` of the *fourSynergy* object.

```{r slots, eval=TRUE}
slotNames(sia)
```

They can be visualized using `plotIaIndiviualTools()`. Optionally, genes
of interest can be passed via `genes_of_interest`; these are then shown
in the karyoplot.

```{r plot_ia, message=FALSE, eval=TRUE, warning=FALSE}
plotIaIndiviualTools(sia, genes_of_interest = c("Ldlrad4", "Cep76"))
```

Further, the base tool results can be visualized at higher resolution
using `plotBaseTracks()`.

```{r plot_tracks_base, message=FALSE, eval=TRUE, warning=FALSE}
plotBaseTracks(sia)
```

## Ensemble results

To perform ensemble calling, you need to specify whether to use the F1
or the AUPRC model.

```{r ens}
sia <- consensusIa(sia, model = "AUPRC")
```

The ensemble results can then be visualized.
```{r plot_ens, warning=FALSE}
plotConsensusIa(ia = sia)
```

The consensus results can be visualized at higher resolution using
`plotConsensusTracks()`.

```{r plot_ens_tracks, message=FALSE, eval=TRUE, warning=FALSE}
plotConsensusTracks(sia)
```

## Differential interaction analysis

*fourSynergy* provides a differential interaction analysis to compare
interactions across conditions. It is based on DESeq2, which models the
variability of the count data with a negative binomial distribution. For
the example dataset, `fitType` is set to `"mean"`; the default value is
`"local"`. *(The plots do not look representative here because the BAM
file size has been drastically reduced for this example.)*

```{r diff}
sia <- differentialAnalysis(ia = sia, fitType = "mean")
```

The results of the differential analysis are stored in the
`@differential` slot of the `sia` object and can be visualized using
`plotDiffIa()`.

```{r diffplot, message=FALSE}
plotDiffIa(ia = sia, genes_of_interest = c("Ldlrad4", "Cep76"))
```

```{r session}
sessionInfo()
```