--- title: "Introduction to derfinderData" author: - name: Leonardo Collado-Torres affiliation: - &libd Lieber Institute for Brain Development, Johns Hopkins Medical Campus - &ccb Center for Computational Biology, Johns Hopkins University email: lcolladotor@gmail.com output: BiocStyle::html_document: self_contained: yes toc: true toc_float: true toc_depth: 2 code_folding: show date: "`r doc_date()`" package: "`r pkg_ver('derfinderData')`" vignette: > %\VignetteIndexEntry{Introduction to derfinderData} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r vignetteSetup, echo=FALSE, message=FALSE, warning = FALSE} ## Track time spent on making the vignette startTime <- Sys.time() ## Bib setup library("RefManageR") ## Write bibliography information bib <- c( BiocStyle = citation("BiocStyle"), brainspan = RefManageR::BibEntry(bibtype = "Unpublished", key = "brainspan", title = "Atlas of the Developing Human Brain [Internet]. Funded by ARRA Awards 1RC2MH089921-01, 1RC2MH090047-01, and 1RC2MH089929-01.", author = "BrainSpan", year = 2011, url = "http://developinghumanbrain.org"), RefManageR = citation("RefManageR")[1], knitr = citation("knitr")[3], rmarkdown = citation("rmarkdown")[1] ) ``` # Overview `derfinderData` is a small data package with information extracted from _BrainSpan_ (see [here](http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/bigwig/)) `r Citep(bib[['brainspan']])` for 24 samples restricted to chromosome 21. The BigWig files in this package can then be used by other packages for examples, such as in `derfinder` and `derfinderPlot`. While you could download the data from _BrainSpan_ `r Citep(bib[['brainspan']])`, this package is helpful for scenarios where you might encounter some difficulties such as the one described in this [thread](https://stat.ethz.ch/pipermail/bioc-devel/2014-September/006329.html). # Data The following code builds the phenotype table included in `derfinderData`. For two randomly selected structures, 12 samples were chosen with 6 of them being fetal samples and the other 6 coming from adult individuals. For the fetal samples, the age in PCW is transformed into age in years by age_in_years = (age_in_PCW - 40) / 52 In other data sets you might want to subtract 42 instead of 40 if some observations have PCW up to 42. ```{r 'pheno'} ## Construct brainspanPheno table brainspanPheno <- data.frame( gender = c("F", "M", "M", "M", "F", "F", "F", "M", "F", "M", "M", "F", "M", "M", "M", "M", "F", "F", "F", "M", "F", "M", "M", "F"), lab = c("HSB97.AMY", "HSB92.AMY", "HSB178.AMY", "HSB159.AMY", "HSB153.AMY", "HSB113.AMY", "HSB130.AMY", "HSB136.AMY", "HSB126.AMY", "HSB145.AMY", "HSB123.AMY", "HSB135.AMY", "HSB114.A1C", "HSB103.A1C", "HSB178.A1C", "HSB154.A1C", "HSB150.A1C", "HSB149.A1C", "HSB130.A1C", "HSB136.A1C", "HSB126.A1C", "HSB145.A1C", "HSB123.A1C", "HSB135.A1C"), Age = c(-0.442307692307693, -0.365384615384615, -0.461538461538461, -0.307692307692308, -0.538461538461539, -0.538461538461539, 21, 23, 30, 36, 37, 40, -0.519230769230769, -0.519230769230769, -0.461538461538461, -0.461538461538461, -0.538461538461539, -0.519230769230769, 21, 23, 30, 36, 37, 40) ) brainspanPheno$structure_acronym <- rep(c("AMY", "A1C"), each = 12) brainspanPheno$structure_name <- rep(c("amygdaloid complex", "primary auditory cortex (core)"), each = 12) brainspanPheno$file <- paste0("http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/bigwig/", brainspanPheno$lab, ".bw") brainspanPheno$group <- factor(ifelse(brainspanPheno$Age < 0, "fetal", "adult"), levels = c("fetal", "adult")) ``` We can then save the phenotype information, which is included in `derfinderData`. ```{r 'savePheno', eval = FALSE} ## Save pheno table save(brainspanPheno, file = "brainspanPheno.RData") ``` Here is how the data looks like: ```{r 'explorePheno', results = 'asis'} library("knitr") ## Explore pheno p <- brainspanPheno[, -which(colnames(brainspanPheno) %in% c("structure_acronym", "structure_name", "file"))] kable(p, format = "html", row.names = TRUE) ``` We can verify that this is indeed the information included in `derfinderData`. ```{r 'verifyPheno'} ## Rename our newly created pheno data newPheno <- brainspanPheno ## Load the included data library("derfinderData") ## Verify identical(newPheno, brainspanPheno) ``` Using the phenotype information, you can use `derfinder` to extract the base-level coverage information for chromosome 21 from these samples. Then, you can export the data to BigWig files. ```{r 'getData', eval = FALSE} library("derfinder") ## Determine the files to use and fix the names files <- brainspanPheno$file names(files) <- gsub(".AMY|.A1C", "", brainspanPheno$lab) ## Load the data system.time(fullCovAMY <- fullCoverage( files = files[brainspanPheno$structure_acronym == "AMY"], chrs = "chr21" )) # user system elapsed # 4.505 0.178 37.676 system.time(fullCovA1C <- fullCoverage( files = files[brainspanPheno$structure_acronym == "A1C"], chrs = "chr21" )) # user system elapsed # 2.968 0.139 27.704 ## Write BigWig files dir.create("AMY") system.time(createBw(fullCovAMY, path = "AMY", keepGR = FALSE)) # user system elapsed # 5.749 0.332 6.045 dir.create("A1C") system.time(createBw(fullCovA1C, path = "A1C", keepGR = FALSE)) # user system elapsed # 5.025 0.299 5.323 ## Check that 12 files were created in each directory all(c(length(dir("AMY")), length(dir("A1C"))) == 12) # TRUE ## Save data for examples running on Windows save(fullCovAMY, file = "fullCovAMY.RData") save(fullCovA1C, file = "fullCovA1C.RData") ``` These BigWig files are available under _extdata_ as shown below: ```{r 'findData'} ## Find AMY BigWigs dir(system.file("extdata", "AMY", package = "derfinderData")) ## Find A1C BigWigs dir(system.file("extdata", "A1C", package = "derfinderData")) ``` # Reproducibility Code for creating the vignette ```{r createVignette, eval=FALSE} ## Create the vignette library("rmarkdown") system.time(render("derfinderData.Rmd", "BiocStyle::html_document")) ## Extract the R code library("knitr") knit("derfinderData.Rmd", tangle = TRUE) ``` Date the vignette was generated. ```{r reproducibility1, echo=FALSE} ## Date the vignette was generated Sys.time() ``` Wallclock time spent generating the vignette. ```{r reproducibility2, echo=FALSE} ## Processing time in seconds totalTime <- diff(c(startTime, Sys.time())) round(totalTime, digits = 3) ``` `R` session information. ```{r reproducibility3, echo=FALSE} ## Session info library("sessioninfo") session_info() ``` # Bibliography This vignette was generated using `BiocStyle` `r Citep(bib[['BiocStyle']])` with `knitr` `r Citep(bib[['knitr']])` and `rmarkdown` `r Citep(bib[['rmarkdown']])` running behind the scenes. Citations made with `knitcitations` `r Citep(bib[['knitcitations']])`. ```{r vignetteBiblio, results='asis', echo=FALSE} ## Print bibliography PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html")) ```