---
title: "The Rega User Guide"
bibliography: Rega-refs.bib
date: "Last edited: `r format(Sys.time(), '%d %B, %Y')`"
author:
  - name: "Igor Cervenka"
    affiliation:
      - Department of Biomedicine, University of Basel, Basel, Switzerland
      - Swiss Institute of Bioinformatics, Basel, Switzerland
    email: igor.cervenka@unibas.ch
  - name: "Athimed El Taher"
    affiliation:
      - Department of Biomedicine, University of Basel, Basel, Switzerland
      - Swiss Institute of Bioinformatics, Basel, Switzerland
    email: athimed.eltaher@unibas.ch
  - name: "Robert Ivánek"
    affiliation:
      - Department of Biomedicine, University of Basel, Basel, Switzerland
      - Swiss Institute of Bioinformatics, Basel, Switzerland
    email: robert.ivanek@unibas.ch
package: "`r BiocStyle::pkg_ver('Rega')`"
output:
  BiocStyle::html_document:
    toc_float: true
  BiocStyle::pdf_document: default
abstract: "Rega provides an interface to the European Genome-Phenome Archive (EGA)."
vignette: >
  %\VignetteIndexEntry{The Rega User Guide}
  %\VignetteKeywords{API}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
  markdown:
    wrap: 72
---

```{r options, include=FALSE}
library(knitr)
opts_chunk$set(
  echo = TRUE,
  eval = TRUE,
  message = FALSE,
  warning = FALSE,
  collapse = FALSE,
  comment = NA,
  prompt = FALSE,
  tidy = FALSE,
  dev = "png",
  out.width = "100%"
)

## check the output type
out_type <- opts_knit$get("rmarkdown.pandoc.to")
if (is.null(out_type)) {
  out_type <- "html"
}

## add styling
if (out_type == "html") {
  BiocStyle::markdown()
} else if (out_type == "latex") {
  BiocStyle::latex()
}
```

# Load packages

```{r load-package}
library(Rega)
```

# Setting Up Secure Credentials

The Rega package follows security best practices: sensitive information
(such as API keys or passwords) is kept in the operating system credential
store or in environment variables rather than hard-coded in your scripts.
To keep your credentials secure, Rega offers two options (described in
detail below):

- Using the operating system credential store
- Using environment variables, with a secret key to encrypt and decrypt
  the password

## Using the operating system credential store

You can add an entry to your operating system credential store with the
`keyring` package. By default, `Rega` looks for the `REGA_EGA` service
name. Also specify your username, so that you do not have to type it every
time you connect to the API. Avoid registering more than one user for this
service: for simplicity, `Rega` only retrieves the first username.

```{r add-credential, eval=FALSE}
# Put your EGA username between the quotes; you will be prompted for your
# password
keyring::key_set(
  service = "REGA_EGA",
  username = ""
)
```

## Using environment variables with an `httr2` secret

### Create and Store a Master Secret Key

```{r generate-secret}
# Run this in your R console to generate a key
httr2::secret_make_key()
```

To make this key available every time you open R, store it in your
user-level `.Renviron` file:

- Run `usethis::edit_r_environ()` to open the file.
- Add the following line, pasting the key you just generated between the
  quotes: `REGA_KEY=""`
- Save and close the file.

Important: restart R after saving so that the variable is loaded into your
environment.

### Encrypt your EGA password

Next, use your master key (`REGA_KEY`) to encrypt your actual EGA password.
This keeps the plain-text password out of your `.Renviron` file. Note that
anyone who can read both `REGA_KEY` and the encrypted string can still
decrypt it, so keep the file private. Run
`httr2::secret_encrypt("", "REGA_KEY")` with your EGA password as the first
argument, and copy the encrypted string it returns.

### Store the encrypted password

Finally, store the encrypted string (not your plain-text password) in your
`.Renviron` file:

- Open your `.Renviron` again with `usethis::edit_r_environ()`.
- Add the encrypted string as a new variable: `REGA_EGA_PASSWORD=""`
- Save and close the file.

### Store your username

- Run `usethis::edit_r_environ()` to open the file.
- Add the following line, with your EGA username between the quotes:
  `REGA_EGA_USERNAME=""`
- Save and close the file.
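Once the credentials are in place and R has been restarted, a quick check
can confirm that R can see them. The chunk below is a minimal sketch, not a
Rega function: it only uses `keyring::key_list()`, `Sys.getenv()` and
`httr2::secret_decrypt()`, and the names `vars`, `env_ok` and `pw` are
illustrative. Run only the part that matches the option you chose.

```{r check-credentials, eval=FALSE}
# Option 1 (credential store): list the usernames stored for the REGA_EGA
# service; Rega only uses the first one
keyring::key_list(service = "REGA_EGA")

# Option 2 (environment variables): check that all three variables are set
vars <- c("REGA_KEY", "REGA_EGA_USERNAME", "REGA_EGA_PASSWORD")
env_ok <- nzchar(Sys.getenv(vars))
env_ok

# If they are, decrypting the stored string with the key named REGA_KEY
# should return a non-empty password; avoid printing the password itself
if (all(env_ok)) {
  pw <- httr2::secret_decrypt(Sys.getenv("REGA_EGA_PASSWORD"), "REGA_KEY")
  nzchar(pw)
}
```

If `env_ok` contains `FALSE`, the corresponding line is missing from
`.Renviron` or R has not been restarted since it was added.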
### Restart R

Restart R so that the variables added to `.Renviron` are loaded into your
session.

# Fill in the submission template

Copy the empty MS Excel template bundled with the package and fill it in
according to the instructions in its 'Instructions' tab. In the package
sources the template is `inst/extdata/ega_full_template_v3.xlsx`; in an
installed package,
`system.file("extdata/ega_full_template_v3.xlsx", package = "Rega")`
returns its path.

# Data submission

## Metadata parsing

The default parser is pre-configured to handle the bundled xlsx template
(`inst/extdata/ega_full_template_v3.xlsx`). As long as the template is
filled out according to the provided instructions, the default parameters
work as-is and no manual adjustments are required.

If you need to customize the parser's behavior, for example to toggle the
`c4gh` file extension, you can change its settings through the YAML
configuration. Create a local copy of
`inst/extdata/default_parser_params.yaml`, adjust the values as needed, and
pass the path of the new file to the `param_file` argument of
`default_parser()`.

```{r parse-metadata}
metadata_file <- system.file(
  "extdata/submission_example.xlsx",
  package = "Rega"
)
parsed_metadata <- default_parser(metadata_file)
head(parsed_metadata)
```

## Metadata validation

The package includes a client-side validation layer that cross-references
your metadata against the schema requirements of both the EGA API and the
underlying target database. Address all flagged validation failures and
errors before submitting, so that the submission goes through smoothly.

```{r validate-metadata}
validation_summary <- default_validator(parsed_metadata)
validation_summary
```

## Running the `new_submission` workflow

```{r run-workflow, eval=FALSE}
responses <- new_submission(parsed_metadata, logfile = "log.yaml")
```

# Manual client creation

If you encounter errors during metadata submission and would like more
details, you can create a client with verbose logging. Extract the EGA API
from the bundled YAML specification and create a client that uses the
built-in `httr2` OAuth authentication (the default), with increased
verbosity.
```{r client-with-oauth}
api <- extract_api()
ega <- create_client(api, verbosity = 3)
```

Run the `new_submission` workflow with the custom client.

```{r run-workflow-verbose, eval=FALSE}
responses <- new_submission(parsed_metadata, client = ega)
```

This creates your metadata submission in EGA and fills in all provided
information. However, this workflow does not finalize the submission. To
finalize it, either use the EGA Submitter Portal web interface or run
`finalise_submission("", "")`. The release date should ideally be about two
weeks after the metadata submission, to leave time for review by the EGA
team.

# Other workflows

Several other workflows are available:

- `get_submission`
- `get_entry_by_title`: retrieve entries whose `title` contains a given
  string
- `delete_submission_contents`: delete the entire metadata contents of a
  submission
- `delete_submission`: delete an entire submission
- `rollback_submission`

Please see the corresponding help pages for more details.

## Examples

The `get_entry_by_title` function retrieves detailed data from individual
tables (`submissions`, `studies`, `samples`, `experiments`, `runs`,
`analyses` and `datasets`) for entries whose `title` column contains a
given string.

```{r get-entry-by-title, eval=FALSE}
# check all tables
resp <- get_entry_by_title("RNASeq")

# check only samples and studies, and log the responses
resp <- get_entry_by_title(
  "RNASeq",
  type = c("samples", "studies"),
  logfile = "log.yaml"
)
```

You can also delete the entire metadata contents of the current submission
with the `delete_submission_contents` workflow, or delete the whole
submission with the `delete_submission` workflow.

```{r delete-submission-contents, eval=FALSE}
# 00001 stands for your submission ID; `ega` is the client created above
# delete all metadata contents of the submission
resp <- delete_submission_contents(00001, ega)
# delete the submission itself
resp <- delete_submission(00001, ega)
```

# Utilities

If you wish to create your own templates for EGA submissions, the package
provides a few functions that retrieve properties and enums through the API
and save them in text files. We will use the API and the client created
above.
Relevant functions include:

- `get_schemas()`
- `get_properties()`

# Notes

## Bearer token authentication

For testing, debugging and prototyping purposes, you can pass a generated
bearer token directly to `create_client()`. It is then your responsibility
to track the token's validity and refresh it as necessary.

```{r client-with-token, eval=FALSE}
bt <- ega_token()
ega <- create_client(api, bt$access_token)
ega$get__enums()
```

# Issues

A workflow for updating submission metadata with the `PUT` method is not
available. For this use case, create the client with
`ega <- create_client(extract_api())` and use the individual functions
prefixed with `put__`, e.g. `ega$put__samples__accession_id`, to update the
submission metadata.

# Session Info

```{r session-info}
sessionInfo()
```