Help for package neonOS

Title:

Basic Data Wrangling for NEON Observational Data

Version:

1.1.0

Date:

2024-06-14

Description:

NEON observational data are provided via the NEON Data Portal <https://www.neonscience.org> and NEON API, and can be downloaded and reformatted by the 'neonUtilities' package. NEON observational data (human-observed measurements, and analyses derived from human-collected samples, such as tree diameters and algal chemistry) are published in a format consisting of one or more tabular data files. This package provides tools for performing common operations on NEON observational data, including checking for duplicates and joining tables.

Depends:

R (≥ 4.0)

Imports:

utils, data.table, httr, curl, jsonlite

Suggests:

testthat, neonUtilities

License:

AGPL-3

URL:

https://github.com/NEONScience/NEON-OS-data-processing

BugReports:

https://github.com/NEONScience/NEON-OS-data-processing/issues

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.3.1

NeedsCompilation:

Packaged:

2024-06-14 19:50:09 UTC; clunch

Author:

Claire Lunch [aut, cre, ctb], Eric Sokol [aut, ctb], Natalie Robinson [aut, ctb], NEON (National Ecological Observatory Network) [aut]

Maintainer:

Claire Lunch <clunch@battelleecology.org>

Repository:

CRAN

Date/Publication:

2024-06-14 20:50:02 UTC

Count data table from Breeding landbird point counts (DP1.10003.001)

Description

An example set of NEON observational data. Contains the individual bird observations from Niwot Ridge (NIWO) in 2019, as published in RELEASE-2021.

Usage

brd_countdata

Format

A data frame with 472 rows and 26 columns

uid: Unique ID within NEON database; an identifier for the record
namedLocation: Name of the measurement location in the NEON database
domainID: Unique identifier of the NEON domain
siteID: NEON site code
plotID: Plot identifier (NEON site code_XXX)
plotType: NEON plot type in which sampling occurred: tower, distributed or gradient
pointID: Identifier for a point location
startDate: The start date-time or interval during which an event occurred
eventID: An identifier for the set of information associated with the event, which includes information about the place and time of the event
pointCountMinute: The minute of sampling within the point count period
targetTaxaPresent: Indicator of whether the sample contained individuals of the target taxa
taxonID: Species code, based on one or more sources
scientificName: Scientific name, associated with the taxonID. This is the name of the lowest level taxonomic rank that can be determined
taxonRank: The lowest level taxonomic rank that can be determined for the individual or specimen
vernacularName: A common or vernacular name
family: The scientific name of the family in which the taxon is classified
nativeStatusCode: The process by which the taxon became established in the location
observerDistance: Radial distance between the observer and the individual(s) being observed
detectionMethod: How the individual(s) was (were) first detected by the observer
visualConfirmation: Whether the individual(s) was (were) seen after the initial detection
sexOrAge: Sex of individual if detectable; age of individual if the individual cannot be sexed
clusterSize: Number of individuals in a cluster (a group of individuals of the same species)
clusterCode: Alphabetic code (A-Z) linked to clusters (groups of individuals of the same species) spanning multiple records
identifiedBy: An identifier for the technician who identified the specimen
publicationDate: Date of data publication on the NEON data portal
release: Identifier for data release

Source

https://data.neonscience.org/api/v0/products/DP1.10003.001

Per-point data table from Breeding landbird point counts (DP1.10003.001)

Description

An example set of NEON observational data. Contains the point metadata associated with bird observations from Niwot Ridge (NIWO) in 2019, as published in RELEASE-2021.

Usage

brd_perpoint

Format

A data frame with 54 rows and 31 columns

uid: Unique ID within NEON database; an identifier for the record
namedLocation: Name of the measurement location in the NEON database
domainID: Unique identifier of the NEON domain
siteID: NEON site code
plotID: Plot identifier (NEON site code_XXX)
plotType: NEON plot type in which sampling occurred: tower, distributed or gradient
pointID: Identifier for a point location
nlcdClass: National Land Cover Database Vegetation Type Name
decimalLatitude: The geographic latitude (in decimal degrees, WGS84) of the geographic center of the reference area
decimalLongitude: The geographic longitude (in decimal degrees, WGS84) of the geographic center of the reference area
geodeticDatum: Model used to measure horizontal position on the earth
coordinateUncertainty: The horizontal distance (in meters) from the given decimalLatitude and decimalLongitude describing the smallest circle containing the whole of the Location. Zero is not a valid value for this term
elevation: Elevation (in meters) above sea level
elevationUncertainty: Uncertainty in elevation values (in meters)
startDate: The start date-time or interval during which an event occurred
samplingImpracticalRemarks: Technician notes; free text comments accompanying the sampling impractical record
samplingImpractical: Samples and/or measurements were not collected due to the indicated circumstance
eventID: An identifier for the set of information associated with the event, which includes information about the place and time of the event
startCloudCoverPercentage: Observer estimate of percent cloud cover at start of sampling
endCloudCoverPercentage: Observer estimate of percent cloud cover at end of sampling
startRH: Relative humidity as measured by handheld weather meter at the start of sampling
endRH: Relative humidity as measured by handheld weather meter at the end of sampling
observedHabitat: Observer assessment of dominant habitat at the sampling point at sampling time
observedAirTemp: The air temperature measured with a handheld weather meter
kmPerHourObservedWindSpeed: The average wind speed measured with a handheld weather meter, in kilometers per hour
laboratoryName: Name of the laboratory or facility that is processing the sample
samplingProtocolVersion: The NEON document number and version where detailed information regarding the sampling method used is available; format NEON.DOC.######vX
remarks: Technician notes; free text comments accompanying the record
measuredBy: An identifier for the technician who measured or collected the data
publicationDate: Date of data publication on the NEON data portal
release: Identifier for data release

Source

https://data.neonscience.org/api/v0/products/DP1.10003.001

Lignin data table from Plant foliar traits (DP1.10026.001)

Description

An example set of NEON observational data, containing duplicate records. NOT APPROPRIATE FOR ANALYTICAL USE. Contains foliar lignin data from Moab and Toolik in 2017, with artificial duplicates introduced to demonstrate the removeDups() function.

Usage

cfc_lignin_test_dups

Format

A data frame with 26 rows and 25 columns

The variable names, descriptions, and units can be found in the cfc_lignin_variables table

Source

https://data.neonscience.org/api/v0/products/DP1.10026.001

Variables file, subset to lignin table, from Plant foliar traits (DP1.10026.001)

Description

The foliar lignin table's variables file from NEON observational data. Example to illustrate use of removeDups().

Usage

cfc_lignin_variables

Format

A data frame with 26 rows and 8 columns

table: The table name of the NEON data table
fieldName: Field name within the table; corresponds to column names in cfc_lignin_test_dups
description: Description for each field name
dataType: Type of data for each field name
units: Units for each field name
downloadPkg: Is the field published in the basic or expanded data package?
pubFormat: Publication formatting, e.g. date format or rounding
primaryKey: Fields indicated by Y, when combined, should identify a unique record. Used by removeDups() to identify duplicate records.
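
As a minimal sketch of how the primaryKey column can be read, the base R code below pulls the key fields for one table out of a variables-style data frame. The toy table contents and the helper name getPrimaryKey() are made up for illustration and are not part of neonOS.

```r
# Toy stand-in for a NEON variables file; rows and field names are illustrative only
variables <- data.frame(
  table      = c("cfc_lignin", "cfc_lignin", "cfc_lignin", "other_table"),
  fieldName  = c("sampleID", "repNumber", "value", "sampleID"),
  primaryKey = c("Y", "Y", "N", "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a neonOS function): fields flagged Y for one table.
# %in% is used instead of == so that NA entries are treated as "not a key".
getPrimaryKey <- function(variables, table) {
  variables$fieldName[variables$table == table & variables$primaryKey %in% "Y"]
}

key_fields <- getPrimaryKey(variables, "cfc_lignin")
print(key_fields)
```

Records that share the same values in all of these fields would be treated as candidate duplicates.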

Source

https://data.neonscience.org/api/v0/products/DP1.10026.001

Helper function to remove duplicates from a data table; assumes input data are a duplicated set.

Description

Helper function to carry out duplicate removal on a data table of duplicates.

Usage

dupProcess(data, data.dup, table)

Arguments

data

A data frame containing original duplicated data. [data frame]

data.dup

A data frame containing lowercase conversion of the duplicated data. [character]

table

The table name for the input data frame

Details

Helper function to carry out the flagging and removal for removeDups().

Value

A modified data frame with resolvable duplicates removed and a flag field added and populated.

Author(s)

Claire Lunch clunch@battelleecology.org

References

License: GNU AFFERO GENERAL PUBLIC LICENSE Version 3, 19 November 2007

Get the data from API

Description

Accesses the NEON API, optionally using a user-specific API token generated within a data.neonscience.org user account.

Usage

getAPI(apiURL, token = NA_character_)

Arguments

apiURL

The API endpoint URL

token

User-specific API token (generated within a data.neonscience.org user account). Optional.
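
The sketch below illustrates one way an optional token could be attached to a request. It is not the source of getAPI(): the helper names are hypothetical, and the header name X-API-Token is an assumption that does not come from this manual. The request wrapper needs the 'httr' package and network access, so only the header construction is run here.

```r
# Hypothetical helper: build request headers, adding the token only if one is supplied.
# The header name 'X-API-Token' is an assumption, not taken from this manual.
buildHeaders <- function(token = NA_character_) {
  headers <- c(accept = "application/json")
  if (!is.na(token) && nzchar(token)) {
    headers <- c(headers, "X-API-Token" = token)
  }
  headers
}

# Hypothetical wrapper around httr::GET(); defined but not called here,
# since calling it requires the 'httr' package and network access
requestWithToken <- function(apiURL, token = NA_character_) {
  httr::GET(apiURL, httr::add_headers(.headers = buildHeaders(token)))
}

print(names(buildHeaders()))
print(names(buildHeaders("my-token")))
```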

Author(s)

Claire Lunch clunch@battelleecology.org

References

License: GNU AFFERO GENERAL PUBLIC LICENSE Version 3, 19 November 2007

Find all relatives (parents, children, and outward) of a given sample.

Description

Find all samples in the sample tree of a given sample.

Usage

getSampleTree(
  sampleNode,
  idType = "tag",
  sampleClass = NA_character_,
  token = NA_character_
)

Arguments

sampleNode

A NEON sample identifier. [character]

idType

Is sampleNode a tag, barcode, or guid? Defaults to tag. [character]

sampleClass

The NEON sampleClass of sampleNode. Required if sampleNode is a tag and there are multiple valid classes. [character]

token

User-specific API token (generated within a data.neonscience.org user account). Optional. [character]

Details

Related NEON samples can be connected to each other in a parent-child hierarchy. Parents can have one or many children, and children can have one or many parents. Sample hierarchies can be simple or complex - for example, particulate mass samples (dust filters) have no parents or children, whereas water chemistry samples can be subsampled for dissolved gas, isotope, and microbial measurements. This function finds all ancestors and descendants of the focal sample (the sampleNode), and all of their relatives, and so on recursively, to provide the entire hierarchy. See documentation for each data product for more specific information.
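
To illustrate the idea of collecting an entire hierarchy, the sketch below walks a toy table of parent-child links outward from a focal sample. It uses a simple breadth-first traversal in base R rather than the package's own implementation; the sample names and the helper findRelatives() are made up for illustration.

```r
# Toy parent-child links; sample names are made up for illustration
links <- data.frame(
  parent = c("soil1", "soil1", "sub1", "water1"),
  child  = c("sub1",  "sub2",  "dna1", "gas1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a neonOS function): collect every sample reachable
# from the focal sample by following parent and child links in either direction
findRelatives <- function(links, sampleNode) {
  found <- character(0)
  queue <- sampleNode
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current %in% found) next
    found <- c(found, current)
    neighbors <- c(links$child[links$parent == current],
                   links$parent[links$child == current])
    queue <- c(queue, setdiff(neighbors, found))
  }
  found
}

relatives <- findRelatives(links, "sub1")
print(sort(relatives))
```

Starting from sub1, the traversal reaches its parent soil1, its child dna1, and its sibling sub2, but not the unrelated water1 and gas1 samples.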

Value

A table of sample identifiers, their classes, and their parent samples.

Author(s)

Claire Lunch clunch@battelleecology.org

References

License: GNU AFFERO GENERAL PUBLIC LICENSE Version 3, 19 November 2007

Examples

# Find related samples for a soil nitrogen transformation sample
## Not run: 
soil_samp <- getSampleTree(sampleNode="B00000123538", idType="barcode")

## End(Not run)

Get NEON taxon table

Description

Retrieves a taxon table from the NEON data portal for a given taxon type and provides it in a tractable format.

Usage

getTaxonList(
  taxonType = NA,
  recordReturnLimit = NA,
  stream = "true",
  verbose = "false",
  token = NA
)

Arguments

taxonType

The taxonTypeCode to access. Must be one of ALGAE, BEETLE, BIRD, FISH, HERPETOLOGY, MACROINVERTEBRATE, MOSQUITO, MOSQUITO_PATHOGENS, SMALL_MAMMAL, PLANT, TICK [character]

recordReturnLimit

The maximum number of records to return. If NA (the default), returns either the first 100 records or all records in the table, depending on the value of 'stream'. Use stream='true' to get all records. [integer]

stream

Whether to obtain the results as a stream: 'true' or 'false'. Use for large requests. Note that the value is a lowercase character string, not a logical. [character]

verbose

Whether to include all possible taxonomic parameters: 'true' or 'false'. Defaults to 'false', which returns only essential parameters. Note that the value is a lowercase character string, not a logical. [character]

token

User-specific API token (generated within a data.neonscience.org user account) [character]
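
Because stream and verbose take lowercase strings rather than logicals, code that holds these settings as TRUE or FALSE needs to convert them first. A minimal sketch, using a hypothetical helper asApiFlag() that is not part of neonOS:

```r
# Hypothetical helper: convert a single R logical to the lowercase string form
asApiFlag <- function(x) {
  stopifnot(is.logical(x), length(x) == 1, !is.na(x))
  tolower(as.character(x))
}

stream_flag  <- asApiFlag(TRUE)
verbose_flag <- asApiFlag(FALSE)
print(c(stream = stream_flag, verbose = verbose_flag))

# The strings could then be passed on (requires neonOS and network access), e.g.
# getTaxonList("FISH", stream = stream_flag, verbose = verbose_flag)
```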

Value

Data frame with selected NEON taxonomic data

Author(s)

Eric R. Sokol esokol@battelleecology.org

References

License: GNU AFFERO GENERAL PUBLIC LICENSE Version 3, 19 November 2007

Examples

# taxonTypeCode must be one of
# ALGAE, BEETLE, BIRD, FISH,
# HERPETOLOGY, MACROINVERTEBRATE,
# MOSQUITO, MOSQUITO_PATHOGENS,
# SMALL_MAMMAL, PLANT, TICK
#################################
# get the first 4 fish taxa
taxa_table <- getTaxonList('FISH', recordReturnLimit = 4)

Reformat sample data to indicate the parents of a focal sample.

Description

Reformat table of sample identifiers to include parent sample identifiers. Used in getSampleTree().

Usage

idSampleParents(sampleUuid, token = NA_character_)

Arguments

sampleUuid

A NEON sample UUID. [character]

token

User-specific API token (generated within a data.neonscience.org user account). Optional. [character]

Value

A table of sample identifiers, their classes, and their parent samples.

Author(s)

Claire Lunch clunch@battelleecology.org

References

License: GNU AFFERO GENERAL PUBLIC LICENSE Version 3, 19 November 2007

Join two data tables from NEON Observational System

Description

NEON observational data are published in multiple tables, usually corresponding to activities performed in different times or places. This function uses the fields identified in NEON Quick Start Guides to join tables containing related data.

Usage

joinTableNEON(
  table1,
  table2,
  name1 = NA_character_,
  name2 = NA_character_,
  location.fields = NA,
  left.join = NA
)

Arguments

table1

A data frame containing data from a NEON observational data table [data frame]

table2

A second data frame containing data from a NEON observational data table [data frame]

name1

The name of the first table. Defaults to the object name of table1. [character]

name2

The name of the second table. Defaults to the object name of table2. [character]

location.fields

Should standard location fields be included in the list of linking variables, to avoid duplicating those fields? For most data products, these fields are redundant, but there are a few exceptions. This parameter defaults to NA, in which case the Quick Start Guide is consulted. If QSG indicates location fields shouldn't be included, value is updated to FALSE, otherwise to TRUE. Enter TRUE or FALSE to override QSG defaults. [logical]

left.join

Should the tables be joined in a left join? This parameter defaults to NA, in which case the Quick Start Guide is consulted. If the QSG does not specify, a full join is performed. Enter TRUE or FALSE to override the default behavior, including using FALSE to force a full join when a left join is specified in the QSG. Forcing a left join is not generally recommended; remember you will likely be discarding data from the second table. [logical]

Details

The "Table joining" section of NEON Quick Start Guides (QSGs) provides the field names of the linking variables between related NEON data tables. This function uses the QSG information to join tables. Tables are joined using a full join unless the QSG specifies otherwise. If you need to remove duplicates as well as join tables, run removeDups() before running joinTableNEON().

Tables that don't appear together in QSG instructions can't be joined here. Some tables may not be straightforwardly joinable, such as tables of analytical standards run as unknowns. Theoretically, these data could be joined to analytical results by a combination of laboratory and date, but in general, a table join is not the best way to analyze this type of data. If a pair of tables that you expected to find is omitted from the QSG instructions, contact NEON.
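
The difference between a full join and a left join can be sketched with base R's merge() on toy data. This is an illustration of the join types only, not of how joinTableNEON() looks up linking variables; the table contents are made up, and joining on eventID alone is an assumption of the example.

```r
# Toy tables; values are made up for illustration
perpoint <- data.frame(
  eventID = c("E1", "E2", "E3"),
  observedHabitat = c("forest", "tundra", "meadow"),
  stringsAsFactors = FALSE
)
countdata <- data.frame(
  eventID = c("E1", "E1", "E2", "E4"),
  taxonID = c("AMRO", "DEJU", "AMRO", "WCSP"),
  stringsAsFactors = FALSE
)

# Full join: keep every row from both tables, filling gaps with NA
full <- merge(perpoint, countdata, by = "eventID", all = TRUE)

# Left join: keep every row of the first table, drop unmatched rows of the second
left <- merge(perpoint, countdata, by = "eventID", all.x = TRUE)

print(full)
print(left)
```

In the left join, the E4 record from the second table is discarded because it has no match in the first table; the full join keeps it with NA in the first table's columns.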

Value

A single data frame created by joining table1 and table2 on the fields identified in the quick start guide.

Author(s)

Claire Lunch clunch@battelleecology.org

References

License: GNU AFFERO GENERAL PUBLIC LICENSE Version 3, 19 November 2007

Examples

# Join metadata from the point level to individual observations, for NEON bird data
all_bird <- joinTableNEON(table1=brd_perpoint, table2=brd_countdata)

Remove duplicates from a data table based on a provided primary key; flag duplicates that can't be removed.

Description

NEON observational data may contain duplicates; this function removes exact duplicates, attempts to resolve non-exact duplicates, and flags duplicates that can't be resolved.

Usage

removeDups(data, variables, table = NA_character_, ncores = 1)

Arguments

data

A data frame containing data from a NEON observational data table [data frame]

variables

The NEON variables file containing metadata about the data table in question [data frame]

table

The name of the table. Must match one of the table names in 'variables' [character]

ncores

The maximum number of cores to use for parallel processing. Defaults to 1. [numeric]

Details

Duplicates are identified based on exact matches in the values of the primary key. For records with identical keys, these steps are followed, in order:

(1) If records are identical except for NA or empty string values, the non-empty values are kept.
(2) If records are identical except for uid, remarks, and/or personnel (xxxxBy) fields, unique values are concatenated within each field, and the merged version is kept.
(3) For records that are identical following steps 1 and 2, one record is kept and flagged with duplicateRecordQF=1.
(4) Records that can't be resolved by steps 1-3 are flagged with duplicateRecordQF=2.

Note that in a set of three or more duplicates, some records may be resolvable and some may not; if two or more records are left after steps 1-3, all remaining records are flagged with duplicateRecordQF=2. In some limited cases, duplicates can't be unambiguously identified, and these records are flagged with duplicateRecordQF=-1.
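
The steps above can be sketched in simplified form with base R. This is not the removeDups() implementation: it covers only steps 1, 3, and 4, omits the concatenation in step 2 and the -1 flag, and assumes a flag of 0 for records that have no duplicates. The toy data and the helper resolveGroup() are hypothetical.

```r
# Toy data: 'sampleID' plays the role of the primary key (made-up values)
dat <- data.frame(
  sampleID = c("A", "A", "B", "B", "C"),
  value    = c("1.2", NA, "3.4", "3.9", "5.0"),
  remarks  = c("", "checked", "", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: resolve one group of records that share a key
resolveGroup <- function(grp) {
  if (nrow(grp) == 1) {
    grp$duplicateRecordQF <- 0   # assumption: 0 marks a record with no duplicates
    return(grp)
  }
  # Unique non-empty values in each column
  vals <- lapply(grp, function(col) unique(col[!is.na(col) & col != ""]))
  if (all(lengths(vals) <= 1)) {
    # Records agree wherever they are populated: keep one record with the
    # non-empty values filled in, and flag it as a resolved duplicate
    merged <- grp[1, , drop = FALSE]
    for (nm in names(vals)) {
      if (length(vals[[nm]]) == 1) merged[[nm]] <- vals[[nm]]
    }
    merged$duplicateRecordQF <- 1
    return(merged)
  }
  # Conflicting values: keep all records and flag them as unresolved
  grp$duplicateRecordQF <- 2
  grp
}

flagged <- do.call(rbind, lapply(split(dat, dat$sampleID), resolveGroup))
rownames(flagged) <- NULL
print(flagged)
```

Here the two A records differ only in empty values, so they collapse to one record; the two B records carry conflicting values, so both are kept and flagged as unresolved.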

Value

A modified data frame with resolvable duplicates removed and a flag field added and populated.

Author(s)

Claire Lunch clunch@battelleecology.org

References

License: GNU AFFERO GENERAL PUBLIC LICENSE Version 3, 19 November 2007

Examples

# Resolve and flag duplicates in a test dataset of foliar lignin
lig_dup <- removeDups(data=cfc_lignin_test_dups, 
                      variables=cfc_lignin_variables,
                      table="cfc_lignin")