---
title: "HubPub: Help with publication of Hub packages"
author:
- name: Kayla Interdonato
  affiliation: Roswell Park Comprehensive Cancer Center, Buffalo, NY
- name: Martin Morgan
  affiliation: Roswell Park Comprehensive Cancer Center, Buffalo, NY
date: "`r Sys.Date()`"
output:
    BiocStyle::html_document:
        toc: true
        toc_float: true
package: HubPub
vignette: >
  %\VignetteIndexEntry{HubPub: Help with publication of Hub packages}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
```

# Introduction

`HubPub` helps developers build and maintain _Bioconductor_ Hub packages. It
can create the skeleton of a Hub-style package that the developer then
populates with the necessary information. It also provides functions to add
resources to the Hub package's metadata file and to publish data to the
_Bioconductor_ S3 bucket.

# Installation
Install the most recent version from Bioconductor:

```{r bioconductor, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("HubPub")
```

Then load `HubPub`:

```{r load, message = FALSE}
library(HubPub)
```

# HubPub

## Creating a Hub styled package

The `create_pkg()` function creates the skeleton of a package that follows the
guidelines for a _Bioconductor_ Hub package. For more information about the
requirements and content of a Hub-style package, see the "Creating A Hub
Package" vignette from this package.

`create_pkg()` requires the path where the package is to be created and the
type of package to create ("AnnotationHub" or "ExperimentHub"). The `use_git`
argument indicates whether the package should be set up with git (default is
`TRUE`).

NOTE: This function is intended for developers who have not yet created their
package. If the package already exists, `create_pkg()` offers no benefit.
Other functions in this package deal with resources and may still be helpful;
they are described later in this vignette.

```{r create}
fl <- tempdir()
create_pkg(file.path(fl, "examplePkg"), "ExperimentHub")
```
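
As a further sketch, the `use_git` argument can be set to `FALSE` to skip the
git setup. The package name `exampleAnnotationPkg` below is hypothetical and
used only for illustration; `list.files()` then shows which files the skeleton
contains.

```{r create_no_git}
## Hypothetical second package, created without git (illustration only)
no_git_pkg <- file.path(tempdir(), "exampleAnnotationPkg")
create_pkg(no_git_pkg, "AnnotationHub", use_git = FALSE)

## List the files that make up the skeleton
list.files(no_git_pkg, recursive = TRUE)
```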

Once the package is created, the developer can go through it and make any
changes. For example, the DESCRIPTION file contains only the basic
requirements, so the developer should go back and fill in the 'Title:' and
'Description:' fields.
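
One way to see what those fields currently hold is to read the DESCRIPTION file
with base R's `read.dcf()`. This is a minimal sketch that assumes the `create`
chunk above has been run in the same session, so the example package exists
under `tempdir()`.

```{r check_description}
## Path to the DESCRIPTION file of the example package created above
desc_file <- file.path(tempdir(), "examplePkg", "DESCRIPTION")

## Show the two fields the developer is expected to fill in
if (file.exists(desc_file)) {
    read.dcf(desc_file, fields = c("Title", "Description"))
}
```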

## Adding a resource to the metadata file

Another useful function in `HubPub` is `add_resource()`. It is useful both for
developers who are creating a new Hub package and for developers who want to
add a new resource to an existing Hub package. The function adds a hub
resource to the package's metadata.csv file. It requires the name of the
package (or the path to the newly created package) and a named list with the
data to be added for the resource. See `?hub_metadata` for the elements and
content of this list. There is also information in the "Creating A Hub
Package" vignette from this package.

```{r resource}
metadata <- hub_metadata(
    Title = "ENCODE",
    Description = "a test entry",
    BiocVersion = "4.1",
    Genome = NA_character_,
    SourceType = "JSON",
    SourceUrl = "http://www.encodeproject.org",
    SourceVersion = "x.y.z",
    Species = NA_character_,
    TaxonomyId = as.integer(9606),
    Coordinate_1_based = NA,
    DataProvider = "ENCODE Project",
    Maintainer = "tst person <tst@email.com>",
    RDataClass = "Rda",
    DispatchClass = "Rda",
    Location_Prefix = "s3://experimenthub/",
    RDataPath = "ENCODExplorerData/encode_df_lite.rda",
    Tags = "ENCODE:Homo sapiens"
)

add_resource(file.path(fl, "examplePkg"), metadata)
```

To see what the metadata file looks like, read in the csv file as follows.

```{r read_metadata}
resource <- file.path(fl, "examplePkg", "inst", "extdata", "metadata.csv")
tst <- read.csv(resource)
tst
```
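
A quick base R check of the same file can help spot fields that were left
empty. This is only a sketch, assuming the `resource` chunk above has been run;
whether a missing value is acceptable depends on the field, so consult
`?hub_metadata` before treating any of them as an error.

```{r inspect_metadata}
## Same file as in the chunk above (assumes `add_resource()` has been run)
metadata_file <- file.path(
    tempdir(), "examplePkg", "inst", "extdata", "metadata.csv"
)

if (file.exists(metadata_file)) {
    md <- read.csv(metadata_file)
    ## Number of resources (rows) and metadata fields (columns)
    print(dim(md))
    ## Fields that contain a missing value in at least one resource
    print(names(md)[colSums(is.na(md)) > 0])
}
```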

## Publishing the resource to AWS S3

The final function in `HubPub`, `publish_resource()`, helps the developer
publish data resources to the _Bioconductor_ AWS S3 bucket. It uses functions
from the `aws.s3` package to place files or directories on S3. Before using it,
the developer should contact the Bioconductor hubs maintainers to obtain the
credentials needed to access the bucket, then declare those credentials as
system environment variables. The function requires the path to the file or
directory to be added to the bucket and the name the object should have on the
bucket. When adding a directory, be sure it contains only files and no nested
directories.
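
The following sketch checks whether credentials are visible to the R session
without printing their values. It assumes the standard AWS variable names that
`aws.s3` reads from the environment; confirm the exact variables to set with
the hubs maintainers.

```{r check_credentials}
## Standard AWS environment variable names (an assumption; confirm with the
## hubs maintainers)
aws_vars <- c(
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"
)

## TRUE for each variable that is set to a non-empty value
is_set <- nzchar(Sys.getenv(aws_vars))
names(is_set) <- aws_vars
is_set
```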

The code chunk below demonstrates the function using a dummy dataset. It will
only work if the necessary environment variables have been set with the hub
credentials.

```{r publish}
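## For publishing directories with multiple files
## Use a dedicated directory: tempdir() itself already holds examplePkg/,
## and a directory to publish must not contain nested directories
fl <- file.path(tempdir(), "publish_example")
dir.create(fl, showWarnings = FALSE)
utils::write.csv(mtcars, file = file.path(fl, "mtcars1.csv"))
utils::write.csv(mtcars, file = file.path(fl, "mtcars2.csv"))
publish_resource(fl, "test_dir")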

## For publishing a single file
utils::write.csv(mtcars, file = file.path(fl, "mtcars3.csv"))
publish_resource(file.path(fl, "mtcars3.csv"), "test_dir")
```
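
Because a directory to be published must not contain nested directories, it can
be worth checking for them first. The helper `has_subdirs()` below is
hypothetical and not part of `HubPub`; it is a minimal sketch using base R's
`list.dirs()`.

```{r check_flat}
## Hypothetical helper (not part of HubPub): TRUE if `path` contains
## at least one immediate subdirectory
has_subdirs <- function(path) {
    length(list.dirs(path, recursive = FALSE)) > 0
}

## Illustration on a fresh directory that holds a single file
flat_dir <- file.path(tempdir(), "hubpub_flat_check")
dir.create(flat_dir, showWarnings = FALSE)
utils::write.csv(mtcars, file = file.path(flat_dir, "mtcars.csv"))
has_subdirs(flat_dir)
```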

# Session Information

```{r session_info}
sessionInfo()
```