--- title: "HubPub: Help with publication of Hub packages" author: - name: Kayla Interdonato affiliation: Roswell Park Comprehensive Cancer Center, Buffalo, NY - name: Martin Morgan affiliation: Roswell Park Comprehensive Cancer Center, Buffalo, NY date: "`r Sys.Date()`" output: BiocStyle::html_document: toc: true toc_float: true package: HubPub vignette: > %\VignetteIndexEntry{HubPub: Help with publication of Hub packages} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction `HubPub` provides users with functionality to help with the _Bioconductor_ Hub structures. The package provides the ability to create a skeleton of a Hub style package that the user can then populate with the necessary information. There are also functions to help add resources to the Hub pacakge metadata files as well as publish data to the _Bioconductor_ S3 bucket. # Installation Install the most recent version from Bioconductor: ```{r bioconductor, eval = FALSE} if(!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("HubPub") ``` Then load `HubPub`: ```{r load, message = FALSE} library(HubPub) ``` # HubPub ## Creating a Hub styled package The `create_pkg()` function creates the skeleton of a package that follows the guidelines for a _Bioconductor_ Hub type package. More information about what are the requirements and content for a Hub style package the developer can look at the "Creating A Hub Package" vignette from this package. `create_pkg()` requires a path to where the packages is to be created and the type of package that should be created ("AnnotationHub" or "ExperimentHub"). There is also a variable `use_git` that indicates if the package should be set up with git (default is `TRUE`). NOTE: This function is intended for a developer that has not created the package yet. If the package has already been created, then this function will not benefit the developer. There are a couple other functions in this package that deal with resources that might be helpful, more on these later in the vignette. ```{r create} fl <- tempdir() create_pkg(file.path(fl, "examplePkg"), "ExperimentHub") ``` Once the package is created the developer can go through and make any changes to the package. For example, the DESCRIPTON file contains very basic requirements but the developer should go back and fill in the 'Title:' and 'Description:' fields. ## Adding a resource to the metadata file Another useful function in `HubPub` is `add_resource()`. This function can be useful for developers who are creating a new Hub related package or for developers who want to add a new resource to an existing Hub package. The purpose of this function is to add a hub resource to the package metadata.csv file. The function requires the name of the package (or the path to the newly created package) and a named list with the data to be added to the resource. To get the elements and content for this list look at `?hub_metadata`. There is also information in the "Creating A Hub Package" vignette from this package. ```{r resource} metadata <- hub_metadata( Title = "ENCODE", Description = "a test entry", BiocVersion = "4.1", Genome = NA_character_, SourceType = "JSON", SourceUrl = "http://www.encodeproject.org", SourceVersion = "x.y.z", Species = NA_character_, TaxonomyId = as.integer(9606), Coordinate_1_based = NA, DataProvider = "ENCODE Project", Maintainer = "tst person ", RDataClass = "Rda", DispatchClass = "Rda", Location_Prefix = "s3://experimenthub/", RDataPath = "ENCODExplorerData/encode_df_lite.rda", Tags = "ENCODE:Homo sapiens" ) add_resource(file.path(fl, "examplePkg"), metadata) ``` Then if you want to see what the metadata file looks like you can read in the csv file like the following. ```{r read_metadata} resource <- file.path(fl, "examplePkg", "inst", "extdata", "metadata.csv") tst <- read.csv(resource) tst ``` ## Publishing the resource to AWS S3 The final function in `HubPub` helps the developer with publishing data resources to an Bioconductor AWS S3. The function utilizes functions for the `aws.s3` package to place files or directories on S3. The developer should have already contacted the Bioconductor hubs maintainers to get the necessary credentials to access the bucket. Once the credentials are received the developer should declare them in the system environment before running this function. The function requires a path to the file or name of the directory to be added to the bucket and a name for how the object should be named on the bucket. If adding a directory be sure there are no nested directories and only files. The below code chunk demonstrates the use of the function using a dummy dataset. It will only work if the necessary global environments have been declared with the hub credentials. ```{r publish} ## For publishing directories with multiple files fl <- tempdir() utils::write.csv(mtcars, file = file.path(fl, "mtcars1.csv")) utils::write.csv(mtcars, file = file.path(fl, "mtcars2.csv")) publish_resource(fl, "test_dir") ## For publishing a single file utils::write.csv(mtcars, file = file.path(fl, "mtcars3.csv")) publish_resource(file.path(fl, "mtcars3.csv"), "test_dir") ``` # Session Information ```{r session_info} sessionInfo() ```