Type: Package
Title: R Interface to the 'NIH RePORTER Project' API
Version: 0.1.4
Description: Methods to easily build requests in the non-standard JSON schema required by the National Institutes of Health (NIH)'s 'RePORTER Project API' <https://api.reporter.nih.gov/#/Search/post_v2_projects_search>. Also retrieve and process result sets as either a ragged or flattened 'tibble'.
License: MIT + file LICENSE
URL: https://github.com/bikeactuary/repoRter.nih
BugReports: https://github.com/bikeactuary/repoRter.nih/issues
Depends: R (≥ 4.0.0)
Imports: assertthat (≥ 0.2.1), crayon (≥ 1.4.1), dplyr (≥ 1.0.7), httr (≥ 1.4.2), janitor (≥ 2.1.0), jsonlite (≥ 1.7.2), lubridate (≥ 1.7.10), magrittr (≥ 2.0.1), purrr (≥ 0.3.4), tibble (≥ 3.1.3)
Suggests: devtools, ggplot2, ggrepel, knitr, tinytex, rmarkdown, scales, tufte, spelling
VignetteBuilder: knitr
Encoding: UTF-8
Language: en-US
LazyData: true
RoxygenNote: 7.2.3
NeedsCompilation: no
Packaged: 2023-01-14 02:23:49 UTC; mbarr
Author: Michael Barr, ACAS, MAAA, CPCU [cre, aut]
Maintainer: "Michael Barr, ACAS, MAAA, CPCU" <mike@bikeactuary.com>
Repository: CRAN
Date/Publication: 2023-01-15 17:10:02 UTC

covid_response_code translation

Description

A tibble containing name translations between covid_response_code and the funding source(s)

Usage

data("covid_response_codes")

Format

A tibble with 6 rows and 3 columns:

covid_response_code

the name for a data element when specified in the payload criteria of a request; NA indicates that this is not available as payload criteria (cannot be searched or filtered on)

funding_source

the name of the funding source, often some federal legislation

fund_src

a short name for the funding source

References

NIH RePORTER API Documentation


get_nih_data

Description

Send a pre-made JSON request to the NIH RePORTER Project API, then retrieve and process the results

Usage

get_nih_data(
  query,
  max_pages = NULL,
  flatten_result = FALSE,
  return_meta = FALSE
)

Arguments

query

A valid JSON request formatted for the RePORTER Project API, as returned by the make_req method

max_pages

numeric(1); default: NULL; An integer specifying to only fetch (up to) the first max_pages number of pages from the result set. Useful for testing your query/obtaining schema information. Default behavior is to fetch all pages.

flatten_result

(default: FALSE) If TRUE, flatten nested data frames and collapse nested vectors to a single character column with elements delimited by a semicolon

return_meta

(default: FALSE) If TRUE, return a list containing your result set as well as the metadata. The metadata includes a count of total projects matching your query and can be useful for programming.

Details

A request to the RePORTER Project API requires retrieving paginated results, combining them, and often flattening the combined ragged data.frame to a familiar flat format which we can use in analyses. This method handles all of that for you.
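The paging arithmetic implied above can be sketched in base R. This is only an illustration under stated assumptions: page_offsets is a hypothetical helper, not part of the package, and total is a made-up match count. The page size of 500 and the 10,000 record cap are the values documented for make_req; how get_nih_data pages internally may differ.

```r
## Sketch only: zero-indexed start offsets for each page of a result set.
## `page_offsets` is a hypothetical helper, not a repoRter.nih function.
page_offsets <- function(total, limit = 500, max_records = 10000,
                         max_pages = NULL) {
  n <- min(total, max_records)              # records actually reachable
  if (n <= 0) return(numeric(0))
  offsets <- seq(0, n - 1, by = limit)      # start index of each page
  if (!is.null(max_pages)) offsets <- utils::head(offsets, max_pages)
  offsets
}

page_offsets(1234)                  # three pages: 0, 500, 1000
page_offsets(1234, max_pages = 1)   # first page only
length(page_offsets(25000))         # capped at 20 pages of 500
```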

Value

When return_meta = FALSE: a tibble containing your result set (up to the API maximum of 10,000 records). When return_meta = TRUE: a named list containing the result set and the metadata from the initial API response.

If an API error occurs, this method will print an informative message and return NA.
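Because the return value may be a tibble, a list, or NA, calling code may want to check for the failure case before using the result. A minimal sketch follows; is_api_failure is a hypothetical helper, not part of the package, and the element names of the list returned with return_meta = TRUE are not assumed here.

```r
## Hypothetical helper (not part of repoRter.nih): TRUE only for the
## bare NA that signals an API error.
is_api_failure <- function(x) {
  is.atomic(x) && length(x) == 1 && is.na(x)
}

is_api_failure(NA)                   # TRUE
is_api_failure(data.frame(a = 1))    # FALSE
is_api_failure(list(a = 1, b = 2))   # FALSE

## Not run:
## res <- get_nih_data(req, max_pages = 1, return_meta = TRUE)
## if (is_api_failure(res)) stop("RePORTER request failed")
## str(res, max.level = 1)  # inspect the names of the returned list
## End(Not run)
```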

Examples


library(repoRter.nih)

## make the usual request
req <- make_req(criteria = 
                    list(advanced_text_search = 
                        list(operator = "Or",
                             search_field = "all",
                             search_text = "sarcoidosis lupus") ),
                 message = FALSE)

## get the data ragged
## Not run: 
res <- get_nih_data(req,
                    max_pages = 1)

## get the data flattened
res_flattened <- get_nih_data(req,
                              flatten_result = TRUE,
                              max_pages = 1)

## End(Not run)


make_req

Description

Easily generate a JSON request with the correct schema to be passed to the NIH RePORTER Project API

Usage

make_req(
  criteria = list(fiscal_years = lubridate::year(Sys.Date())),
  include_fields = NULL,
  exclude_fields = NULL,
  offset = 0,
  limit = 500,
  sort_field = NULL,
  sort_order = NULL,
  message = TRUE
)

Arguments

criteria

list(); the RePORTER Project API query criteria used to filter results (projects). See Details for schema and other spec rules.

include_fields

character(); optional; use to return only the specified fields from the result. See Details for valid return field names

exclude_fields

character(); optional; use to exclude specified fields from the result.

offset

integer(1); optional; default: 0; usually not explicitly passed by user. Used to set the start index of the results to be retrieved (indexed from 0). See Details.

limit

integer(1); optional; default: 500; restrict the number of project records returned per page/request inside the calling function. Defaults to the maximum allowed value of 500. Reducing this may help with bandwidth/timeout issues.

sort_field

character(1); optional; use to sort the result by the specified field. May be useful in retrieving complete result sets above the API maximum of 10K (but below 2x the max = 20K)

sort_order

character(1); optional; one of "asc" or "desc"; sort_field must be specified.

message

logical(1); default: TRUE; print a message with the JSON to console/stdout. You may want to suppress this at times.
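The sort_field note above says sorting may help with result sets between 10K and 20K records. One reading of that note, simulated here with made-up ids rather than real API calls, is to fetch the same query once ascending and once descending, then combine and de-duplicate. In practice this would mean two make_req calls that differ only in sort_order; that workflow is an assumption, not confirmed package behavior.

```r
## Simulated result set of 15,000 application ids (made-up numbers).
ids <- 1:15000
api_max <- 10000

## Records reachable with sort_order = "asc" and "desc" respectively
first_asc  <- head(sort(ids), api_max)
first_desc <- head(sort(ids, decreasing = TRUE), api_max)

## The overlap is dropped; together the two halves cover the full set
combined <- union(first_asc, first_desc)
length(combined)          # 15000
setequal(combined, ids)   # TRUE
```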

Details

The maximum number of records that can be returned from any result set is 10,000. Also, the maximum record index in the result set that can be returned is 9,999, corresponding to the 10,000th record in the set. Because of these NIH API constraints, the offset argument cannot be used, as intuition might suggest, to return records beyond this 10K limit. If you need to do this, you have two options:

criteria must be specified as a list and may include any of the following (all optional) top level elements:

Field Names

Full listing of available field names which can be specified in include_fields, exclude_fields, and sort_field is located here

Value

A standard JSON (jsonlite flavor) object containing the valid JSON request string which can be passed to get_nih_data or elsewhere
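To show what a jsonlite "json" object looks like, the sketch below builds a small payload by hand with jsonlite::toJSON. It is illustrative only: the top-level names mirror the criteria, offset, and limit arguments, but make_req may arrange or box fields differently to satisfy the API schema, so prefer make_req for real requests.

```r
library(jsonlite)

## Illustrative payload only; not necessarily identical to make_req output.
payload <- list(
  criteria = list(fiscal_years = 2019:2021),
  offset = 0,
  limit = 500
)
req_sketch <- toJSON(payload, auto_unbox = TRUE)

class(req_sketch)       # "json"
validate(req_sketch)    # checks that the string is well-formed JSON
fromJSON(req_sketch)$criteria$fiscal_years
```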

Examples

library(repoRter.nih)

## all projects funded in the current (fiscal) year
req <- make_req() 

## projects funded in 2019 through 2021
req <- make_req(criteria = list(fiscal_years = 2019:2021))

## projects funded in 2021 where the principal investigator first name is
## "Michael" or begins with "Jo" 
req <- make_req(criteria = 
                    list(fiscal_years = 2021,
                         pi_names = 
                             list(first_name = c("Michael", "Jo*"),
                                  last_name = c(""), # must specify
                                  any_name = character(1) # same here
                                  )
                         )
                )

## all covid-related projects except those funded by American Rescue Plan
## and specify the fields to return, sorting ascending on ApplId column
req <- make_req(criteria = 
                    list(covid_response = c("Reg-CV", "CV", "C3", "C4", "C5")
                    ),
                include_fields = 
                    c("ApplId", "SubprojectId", "FiscalYear", "Organization",
                      "AwardAmount", "CongDist", "CovidResponse",
                      "ProjectDetailUrl"),
                sort_field = "ApplId",
                sort_order = "asc")
                
## using advanced_text_search with boolean search string

string <- "(head AND trauma) OR \"brain damage\" AND NOT \"psychological\""
req <- make_req(criteria = 
                    list(advanced_text_search =
                         list(operator = "advanced",
                              search_field = c("terms", "abstract"),
                              search_text = string
                              )
                         )
                )


NIH RePORTER Field Translation

Description

A tibble containing name translations between payload criteria, column selection/sorting arguments, and the result set.

Usage

data("nih_fields")

Format

A tibble with 43 rows and 5 columns:

payload_name

the name for a data element when specified in the payload criteria of a request; NA indicates that this is not available as payload criteria (cannot be searched or filtered on).

response_name

the name of the field returned by RePORTER (and what you will see in all cases when flatten_result = FALSE).

include_name

the name of the field when specified in the include_fields, exclude_fields, and sort_field arguments.

return_class

the class of the corresponding column in a tibble returned by get_nih_data(). The tibble contains nested data frames and lists of variable length vectors.

Note: when flatten_result = TRUE, the original field name will prefix the names of the new flattened columns. See: jsonlite::flatten.
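The prefixing behavior can be seen on a small made-up data frame with jsonlite::flatten. The column names and values below are invented for illustration, and the exact names produced by get_nih_data may differ. The semicolon collapse of the list column follows the flatten_result description but is written by hand here.

```r
library(jsonlite)

## Made-up ragged data standing in for an unflattened result.
ragged <- data.frame(appl_id = c(1L, 2L))
ragged$organization <- data.frame(org_name = c("A", "B"),
                                  org_city = c("X", "Y"))
ragged$terms <- list(c("lupus", "sarcoidosis"), "lupus")

## Nested data frame columns become top-level columns whose names are
## prefixed with the original field name.
flat <- flatten(ragged)

## Collapse the list column to one semicolon-delimited string per row.
flat$terms <- vapply(flat$terms, paste, character(1), collapse = ";")

names(flat)
flat$terms
```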

References

NIH RePORTER API Documentation