Help for package PFW

Title:

Filtering and Processing Data from Project FeederWatch

Version:

0.1.0

Description:

Provides tools to import, clean, filter, and prepare Project FeederWatch data for analysis. Includes functions for taxonomic rollup, easy filtering, zerofilling, merging in site metadata, and more. Project FeederWatch data comes from https://feederwatch.org/explore/raw-dataset-requests/.

License:

GPL-3

Encoding:

UTF-8

RoxygenNote:

7.3.2

Imports:

dplyr, lubridate, httr2, xml2, stats, utils, curl, stringdist

Suggests:

testthat (≥ 3.0.0), purrr, withr, knitr, rmarkdown

Config/testthat/edition:

Depends:

R (≥ 4.1.0)

URL:

https://github.com/ropensci/PFW, https://ropensci.github.io/PFW/

BugReports:

https://github.com/ropensci/PFW/issues

VignetteBuilder:

knitr

LazyData:

true

NeedsCompilation:

Packaged:

2025-07-04 22:58:39 UTC; mmaro

Author:

Mason W. Maron [aut, cre], Sunny Tseng [rev], Paul Carteron [rev]

Maintainer:

Mason W. Maron <mwmaron2@illinois.edu>

Repository:

CRAN

Date/Publication:

2025-07-09 10:30:20 UTC

View Filter Attributes on Manipulated Project FeederWatch Data

Description

This function allows users to view all filters they've applied to a filtered Project FeederWatch dataset by printing its recorded filter attributes in a readable format.

Usage

pfw_attr(data)

Arguments

data

A filtered Project FeederWatch dataset.

Value

A named list of applied filters.

Examples


# Download/load example dataset
data <- pfw_example

# Filter for Dark-eyed Junco
filtered_data <- pfw_species(data, "Dark-eyed Junco")

# View filters applied to your active data
pfw_attr(filtered_data)
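
The idea of recording applied filters as attributes on a dataset can be sketched in base R. This is only an illustration and not the package's implementation: the attribute name "pfw_filters" and the helper record_filter() are hypothetical.

```r
# Hypothetical sketch: keep a log of applied filters as an attribute.
record_filter <- function(data, name, value) {
  filters <- attr(data, "pfw_filters")   # NULL if nothing is recorded yet
  if (is.null(filters)) filters <- list()
  filters[[name]] <- value
  attr(data, "pfw_filters") <- filters
  data
}

toy <- data.frame(
  SPECIES_CODE = c("daejun", "sonspa"),
  HOW_MANY = c(3, 1),
  stringsAsFactors = FALSE
)

# Filter first, then record what was done
toy <- toy[toy$SPECIES_CODE == "daejun", , drop = FALSE]
toy <- record_filter(toy, "species", "daejun")

# Inspect the recorded filters as a named list
str(attr(toy, "pfw_filters"))
```

The sketch records the filter after subsetting so that it does not depend on whether row subsetting keeps custom attributes.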

Filter Project FeederWatch Data by Month and/or Year

Description

This function filters Project FeederWatch data by year and/or month. It supports ranges of years and spans of months that wrap around the new year.

Usage

pfw_date(data, year = NULL, month = NULL)

Arguments

data

A Project FeederWatch dataset.

year

Optional. Integer or vector of years (e.g. 2010 or 2010:2015).

month

Optional. Integer or vector of months (1–12). Supports wrapping across the new year (e.g. 11:2 = Nov–Feb).

Value

A filtered dataset with date filter attributes.

Examples


# Download/load example dataset
data <- pfw_example

# Filter by a single year
data_2021 <- pfw_date(data, year = 2021)

# Filter by multiple years
data_2123 <- pfw_date(data, year = 2021:2023)

# Filter by a single month
data_feb <- pfw_date(data, month = 2)

# Filter by a span of months
data_winter <- pfw_date(data, month = 11:2)

# Filter by both year and month
data_filtered <- pfw_date(data, year = 2021:2023, month = 11:2)
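
Month wrapping deserves a short illustration, because base R itself reads 11:2 as a descending sequence (11, 10, ..., 2) rather than November through February. The sketch below shows one way such input could be reinterpreted as a wrapped span. The helper wrap_months() is hypothetical and may differ from what pfw_date() does internally.

```r
# Base R treats 11:2 as a descending sequence, not a wrapped season
11:2

# Hypothetical helper: use the first and last elements as span endpoints
wrap_months <- function(month) {
  start <- month[1]
  end <- month[length(month)]
  if (length(month) > 1 && start > end) {
    c(start:12, 1:end)   # wrap across the new year
  } else {
    month                # ordinary ascending span or single month
  }
}

wrap_months(11:2)

# Apply the wrapped span to a toy dataset with a Month column
toy <- data.frame(Month = c(1, 6, 11, 12), HOW_MANY = c(2, 5, 1, 4))
winter <- toy[toy$Month %in% wrap_months(11:2), ]
winter
```

A helper like this cannot tell a deliberately descending vector apart from a wrapped span, which is a trade-off of accepting 11:2 as shorthand.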

Look Up Definitions from the Project FeederWatch Data Dictionary

Description

This function helps users explore the FeederWatch dataset by viewing the full data dictionary or searching for definitions for specific variables.

Usage

pfw_dictionary(variable = NULL)

Arguments

variable

(Optional) A variable name (e.g., "LOC_ID") to look up. If NULL, prints the full dictionary.

Value

A printed description (for a variable) or the full dictionary.

Examples

# View the whole data dictionary
pfw_dictionary()

# View the data dictionary entry for location ID ("LOC_ID")
pfw_dictionary("LOC_ID")

Download Raw Project FeederWatch Data by Year

Description

This function downloads raw data for selected years from the Project FeederWatch website. It unzips the downloaded data and saves the .csv files into a local folder (default: "data-raw/"), removing the zip files afterward. It will download all files required to cover the user-selected years.

Usage

pfw_download(years, folder = NULL)

Arguments

years

Integer or vector of years (e.g., 2001, 2001:2023, c(1997, 2001, 2023)). Data is available from 1998 to present.

folder

The folder where Project FeederWatch data is stored. Default is "data-raw/" in a local directory.

Value

Invisibly returns the downloaded files.

Examples


# Download data from 2001-2006 into the default folder
pfw_download(years = 2001:2006)

Example Project FeederWatch Dataset

Description

A sample dataset for demonstration and testing purposes. This dataset includes data from 2020 - May 2024 from Washington and Oregon.

Usage

pfw_example

Format

A data frame with 556,814 rows and 24 columns.

Source

Created using pfw_download() and pfw_import() in data-raw/pfw_example.R

Examples

# Load the example data into the environment
data(pfw_example)

# Assign the example dataset
testing_data <- pfw_example

Apply Multiple Filters to Project FeederWatch Data

Description

This function filters Project FeederWatch data by species, region, year, month, data validity, and review status in a single call, with optional taxonomic rollup.

Usage

pfw_filter(
  data,
  species = NULL,
  region = NULL,
  year = NULL,
  month = NULL,
  valid = TRUE,
  reviewed = NULL,
  rollup = TRUE
)

Arguments

data

A Project FeederWatch dataset.

species

(Optional) A character vector of species names (common or scientific) or six-letter species codes.

region

(Optional) A character vector of region names (e.g., "Washington", "British Columbia") or region codes (e.g., "US-WA").

year

(Optional) Integer or vector of years (e.g., 2010 or 2010:2015).

month

(Optional) Integer or vector of months (1–12). Supports wrapping (e.g., 11:2 = Nov–Feb).

valid

(Optional, default = TRUE) Filter out invalid data. Removes rows where VALID == 0.

reviewed

(Optional) If specified, filters by review status (TRUE for reviewed, FALSE for unreviewed).

rollup

(Optional, default = TRUE) Automatically roll up subspecies to species level and remove spuhs, slashes, and hybrids.

Value

A filtered dataset.

Examples


# Download/load example dataset
data <- pfw_example

# Filter for Dark-eyed Junco, Song Sparrow, and Spotted Towhee in Washington in 2023
data_masonsyard <- pfw_filter(
  data,
  species = c("daejun", "sonspa", "spotow"),
  region = "US-WA",
  year = 2023
)

# Filter for all data from Washington, Oregon, or California from November
# through February for 2021 through 2023
data_westcoastwinter <- pfw_filter(
  data,
  region = c("Washington", "Oregon", "California"),
  year = 2021:2023,
  month = 11:2
)

# Filter for Greater Roadrunner in California, keeping only reviewed
# records and disabling taxonomic rollup
data_GRRO_CA <- pfw_filter(
  data,
  species = "Greater Roadrunner",
  region = "California",
  reviewed = TRUE,
  rollup = FALSE
)

# Filter for Fox Sparrow with rollup
rollFOSP <- pfw_filter(pfw_example, species = "Fox Sparrow", rollup = TRUE)
# Taxonomic rollup complete. 116 ambiguous records removed.
# 1 species successfully filtered.
# Filtering complete. 8070 records remaining.

# Filter for Fox Sparrow without rollup
norollFOSP <- pfw_filter(pfw_example, species = "Fox Sparrow", rollup = FALSE)
# 1 species successfully filtered.
# Filtering complete. 7745 records remaining.

# 116 records were identified to subspecies (e.g. "Fox Sparrow (Sooty)",
# listed as 'foxsp2' in SPECIES_CODE)
# These records are merged into the parent "Fox Sparrow" total with rollup,
# but excluded in favor of records only identified exactly as
# "Fox Sparrow" (no subspecies, only SPECIES_CODE = 'foxspa') if rollup = FALSE.

Import Project FeederWatch Data

Description

This function reads all .csv files downloaded from the Project FeederWatch website, either from the default "data-raw/" folder created by pfw_download() or from a user-specified folder. Optionally, it can apply filters such as region, species, and year. Files for import can be downloaded via pfw_download() or manually from the Project FeederWatch website.

Usage

pfw_import(folder = NULL, filter = FALSE, ...)

Arguments

folder

The folder where Project FeederWatch data is stored. Default is "data-raw/" in a local directory.

filter

Logical. If TRUE, applies filters using pfw_filter(). Default is FALSE.

...

Additional arguments passed to pfw_filter() for filtering (e.g., region, species, year).

Value

A combined and optionally filtered dataset containing all Project FeederWatch data.

Examples

## Not run: 
# This example cannot be run without user-downloaded data! This data can
# be downloaded manually or with pfw_download().

# Import all downloaded data from the default folder ("data-raw")
data <- pfw_import()

# Import and filter for Washington checklists from 2023
data_filtered <- pfw_import(filter = TRUE, region = "Washington", year = 2023)

## End(Not run)
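
The core idea of reading every .csv file in a folder and combining the results can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code: the folder name and the two tiny files are made up for the example, and the real import may do additional work.

```r
# Write two small CSVs to a temporary folder to stand in for downloads
folder <- file.path(tempdir(), "pfw-sketch")
dir.create(folder, showWarnings = FALSE)
write.csv(data.frame(LOC_ID = "L1", HOW_MANY = 2),
          file.path(folder, "a.csv"), row.names = FALSE)
write.csv(data.frame(LOC_ID = "L2", HOW_MANY = 5),
          file.path(folder, "b.csv"), row.names = FALSE)

# Find every .csv file in the folder and stack them row-wise
files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read.csv))

nrow(combined)
```

Stacking with rbind() requires every file to share the same columns, which is why this sketch uses identically shaped toy files.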

Filter Project FeederWatch Data by Region

Description

This function filters Project FeederWatch data to include only specified states, provinces, or countries.

Usage

pfw_region(data, regions)

Arguments

data

A Project FeederWatch dataset.

regions

A character vector of regions (e.g., "Washington", "United States").

Value

A filtered dataset containing only the selected regions.

Examples


# Download/load example dataset
data <- pfw_example

# Filter for data only from Washington using the state name
data_WA <- pfw_region(data, "Washington")

# Filter for data only from Washington using the state code
data_WA <- pfw_region(data, "US-WA")

# Filter for data from Washington, Oregon,
# and California using the state name
data_westcoastbestcoast <- pfw_region(data, c("Washington", "Oregon", "California"))

Do Taxonomic Rollup on Project FeederWatch Data

Description

This function removes spuhs, hybrids, and slashes and "demotes" subspecies/subspecies intergrades to their parent species.

Usage

pfw_rollup(data)

Arguments

data

A Project FeederWatch dataset.

Value

A cleaned dataset with only species-level codes and a rollup attribute.

Examples

# Download/load example dataset
data <- pfw_example

# Apply taxonomic rollup to an active PFW dataset
rolled_data <- pfw_rollup(data)
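
Conceptually, a rollup drops ambiguous taxa and relabels subspecies with their parent species code. The sketch below illustrates this on toy data. The taxonomy table, its column names, and the 'sparro' code are hypothetical. Only 'foxspa' and 'foxsp2' come from the examples in this manual, and the real function may also handle cases this sketch ignores.

```r
# Hypothetical taxonomy table; categories and parent links are illustrative
taxonomy <- data.frame(
  code     = c("foxspa", "foxsp2", "sparro"),
  category = c("species", "subspecies", "spuh"),
  parent   = c("foxspa", "foxspa", NA),
  stringsAsFactors = FALSE
)

obs <- data.frame(
  SPECIES_CODE = c("foxspa", "foxsp2", "sparro"),
  HOW_MANY = c(1, 2, 4),
  stringsAsFactors = FALSE
)

# Look up each observation in the taxonomy table
idx <- match(obs$SPECIES_CODE, taxonomy$code)

# Keep species and subspecies; drop spuhs (and, by extension, slashes/hybrids)
keep <- taxonomy$category[idx] %in% c("species", "subspecies")
rolled <- obs[keep, ]

# Relabel the kept rows with their parent species code
rolled$SPECIES_CODE <- taxonomy$parent[idx[keep]]
rolled
```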

Merge Site Metadata into Project FeederWatch Data

Description

This function joins habitat and site metadata into Project FeederWatch observation data using the site description file. If the site metadata file is not found, it is downloaded automatically to the designated path, or to "data-raw" if no path is given.

Usage

pfw_sitedata(data, path)

Arguments

data

A Project FeederWatch dataset.

path

File path to the site description .csv from https://feederwatch.org/explore/raw-dataset-requests/. If not specified, defaults to "data-raw/sitedata.csv".

Value

The original dataset with site metadata merged in.

Examples


# Download/load example dataset
data <- pfw_example

# Merge site metadata into example observation data
data_sites <- pfw_sitedata(data, "data-raw/site_data.csv")
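
The merge itself resembles a left join of observations onto a site table. The sketch below shows that pattern in base R on toy data. It assumes the join key is LOC_ID alone and uses a made-up "habitat" column. The actual site description file has its own columns and may require additional keys.

```r
obs <- data.frame(
  LOC_ID = c("L1", "L2", "L3"),
  HOW_MANY = c(2, 5, 1),
  stringsAsFactors = FALSE
)

# Hypothetical site table; "habitat" is an illustrative column name
sites <- data.frame(
  LOC_ID = c("L1", "L2"),
  habitat = c("suburban", "rural"),
  stringsAsFactors = FALSE
)

# Left join: keep every observation, even when no site record exists
merged <- merge(obs, sites, by = "LOC_ID", all.x = TRUE)
merged
```

With all.x = TRUE, observations from sites missing in the metadata keep their rows and receive NA in the site columns.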

Filter Project FeederWatch Data by Species

Description

This function filters Project FeederWatch data to include only selected species, matched by common name, scientific name, or six-letter species code via the species translation table.

Usage

pfw_species(data, species, suppress_ambiguous = FALSE)

Arguments

data

The Project FeederWatch dataset.

species

A character vector of species names (common, scientific, or six-letter species code).

suppress_ambiguous

(Optional, default = FALSE) Logical. Controls whether missing subspecies are included in the warning. This exists mainly as a silencer for pfw_filter().

Value

A filtered dataset containing only the selected species.

Examples


# Download/load example dataset
data <- pfw_example

# Filter for only Greater Roadrunner using the common name
data_GRRO <- pfw_species(data, "Greater Roadrunner")

# Filter for Lesser Goldfinch and American Goldfinch using scientific names
data_goldfinches <- pfw_species(data, c("Spinus psaltria", "Spinus tristis"))

# Filter for Dark-eyed Junco, Song Sparrow, and Spotted Towhee using species codes
data_masonsyard <- pfw_species(data, c("daejun", "sonspa", "spotow"))

# Filter with a pre-existing species list
species_list <- c("daejun", "sonspa", "spotow")
data_masonsyard <- pfw_species(data, species_list)
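
Resolving mixed inputs (common names, scientific names, and codes) to species codes can be pictured as a lookup against a translation table. The sketch below is an assumption-laden illustration: the table layout, its column names, and the helper resolve_species() are hypothetical, and the real translation table is maintained by Project FeederWatch.

```r
# Hypothetical, tiny stand-in for the species translation table
translation <- data.frame(
  code       = c("daejun", "sonspa", "spotow"),
  common     = c("Dark-eyed Junco", "Song Sparrow", "Spotted Towhee"),
  scientific = c("Junco hyemalis", "Melospiza melodia", "Pipilo maculatus"),
  stringsAsFactors = FALSE
)

# Match each table row against the input, ignoring case, in any column
resolve_species <- function(x, table) {
  x <- tolower(x)
  hit <- tolower(table$code) %in% x |
    tolower(table$common) %in% x |
    tolower(table$scientific) %in% x
  table$code[hit]
}

codes <- resolve_species(
  c("Song Sparrow", "Junco hyemalis", "spotow"),
  translation
)
codes
```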

Filter Project FeederWatch Data to "Standard" Seasonal Window

Description

Project FeederWatch's Data Users Guide (https://birdscanada.github.io/BirdsCanada_PFW/Start2.html) suggests that data should be truncated by date to avoid biases from years where the Project FeederWatch survey season was extended. This function filters data to include only observations within the typical FeederWatch season: after November 8 and before April 3.

Usage

pfw_truncate(data)

Arguments

data

A Project FeederWatch dataset with Year, Month, and Day columns.

Value

A filtered dataset limited to Nov 8 – Apr 3 across years.

Examples

# Download/load example dataset
data <- pfw_example

# Truncate an active PFW dataset to November 8 - April 3
truncated_data <- pfw_truncate(data)
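
A seasonal window that spans the new year can be expressed by encoding month and day as a single number. The sketch below shows this on toy data with Year, Month, and Day columns. Treating both boundary dates as inclusive is an assumption here, so the comparisons should be adjusted if the boundaries are meant to be exclusive.

```r
toy <- data.frame(
  Year  = c(2022, 2022, 2023, 2023),
  Month = c(11, 11, 4, 4),
  Day   = c(5, 20, 2, 10)
)

# Encode month-day as MMDD, e.g. Nov 8 -> 1108 and Apr 3 -> 403
md <- toy$Month * 100 + toy$Day

# The window wraps the new year: on/after Nov 8 OR on/before Apr 3
in_window <- md >= 1108 | md <= 403
truncated <- toy[in_window, ]
truncated
```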

Zerofill Species not Detected in each Survey Instance for Analysis

Description

This function adds zeros for checklists where selected species were absent, setting HOW_MANY = 0 for presence/absence-based analyses. Note that zerofilling entire, unfiltered datasets from Project FeederWatch will take a long time!

Usage

pfw_zerofill(data)

Arguments

data

A Project FeederWatch dataset, optionally filtered for species.

Value

A dataset with zerofilled values included for each species.

Examples

## Not run: 
# This example cannot be run because it relies on a cached version of the
# data which is created upon using pfw_import(). Storing a version of this
# for the example dataset would be too large for CRAN!

# Zerofill a PFW dataset
data_zf <- pfw_zerofill(data)

## End(Not run)
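
The zerofilling idea can be sketched in base R as building every checklist-by-species combination and filling the missing counts with zero. This is a simplified illustration and not the package's method: the "checklist_id" column is a hypothetical identifier, and unlike the real function this sketch only knows about checklists that already appear in the toy data.

```r
obs <- data.frame(
  checklist_id = c("S1", "S1", "S2"),      # hypothetical checklist identifier
  SPECIES_CODE = c("daejun", "sonspa", "daejun"),
  HOW_MANY = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Every combination of checklist and species
grid <- expand.grid(
  checklist_id = unique(obs$checklist_id),
  SPECIES_CODE = unique(obs$SPECIES_CODE),
  stringsAsFactors = FALSE
)

# Left join the observed counts onto the full grid
zf <- merge(grid, obs, by = c("checklist_id", "SPECIES_CODE"), all.x = TRUE)

# Combinations with no observation become explicit zeros
zf$HOW_MANY[is.na(zf$HOW_MANY)] <- 0
zf
```

The grid grows with the product of checklists and species, which is consistent with the warning above that zerofilling a large, unfiltered dataset is slow.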

Region Lookup Table

Description

A lookup table that maps SUBNATIONAL1_CODE values to region "common" names.

Usage

region_lookup

Format

A data frame with 2 columns:

Code: Region code (e.g., "US-WA")
Area: Full area name (e.g., "Washington")
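
A table with this shape lets region names and region codes be used interchangeably. The sketch below uses a two-row stand-in with the same Code and Area columns. The helper to_code() is hypothetical and only illustrates the lookup.

```r
# Two-row stand-in with the same columns as region_lookup
region_lookup_toy <- data.frame(
  Code = c("US-WA", "US-OR"),
  Area = c("Washington", "Oregon"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate names to codes, pass codes through unchanged
to_code <- function(x, lookup) {
  by_name <- lookup$Code[match(x, lookup$Area)]
  ifelse(is.na(by_name), x, by_name)
}

region_codes <- to_code(c("Washington", "US-OR"), region_lookup_toy)
region_codes
```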

Update the Project FeederWatch Species Translation Table

Description

This function downloads the latest species translation table from the Project FeederWatch website and saves it to a local directory. If a previous version exists in the local directory, the user is asked for confirmation before it is overwritten. This makes it easy to keep the taxonomy up to date each year, since the table is otherwise only updated manually on the PFW website.

Usage

update_taxonomy(user_dir = tools::R_user_dir("PFW", "data"))

Arguments

user_dir

Optional. A custom directory to write the translation table to. Using the default local directory is highly recommended.

Value

A message confirming whether the update was successful.

Examples


# Prompt a species translation table taxonomy update
update_taxonomy()