---
title: "Working With NYC Wetlands Data"
author: "Shannon Joyce"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Working With NYC Wetlands Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, message = FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(nycOpenData)
library(ggplot2)
library(dplyr)
library(knitr)
```

## Introduction

New York City is home to many wetland features. To raise awareness of their existence and number, [this dataset](https://data.cityofnewyork.us/dataset/NYC-Wetlands/p48c-iqtu/about_data) containing the geographic locations and descriptions of wetland features was created. In R, the `nycOpenData` package can be used to pull this data directly.

The `nycOpenData` package provides a streamlined interface for accessing New York City's vast open data resources. It connects directly to the NYC Open Data Portal. It is currently used as a primary tool for teaching data acquisition in [Reproducible Research Using R](https://martinezc1-reproducible-research-using-r.share.connect.posit.cloud/), helping students bridge the gap between raw city APIs and tidy data analysis.

By using the `nyc_wetlands()` function, we can gather the most recently listed wetland features in New York City and filter on any of the columns in the dataset.

> Note: `nyc_wetlands()` automatically sorts in descending order by the `verificationstatusyear` column. With this ordering, the first rows returned are `Unverified`, and those rows have no `verificationstatusyear` value.

## Pulling a Small Sample

To start, let's pull a small sample to see what the data looks like. By default, the function pulls in the *10,000 most recent* additions; let's change that to see only the latest 3. To do this, we can set `limit = 3`.
```{r small-sample}
small_sample <- nyc_wetlands(limit = 3)
small_sample

# Seeing what columns are in the dataset
colnames(small_sample)
```

Fantastic! We successfully pulled wetlands data from the NYC Open Data Portal.

Let's now pull a larger set of rows to work with:

## Pulling a Larger Dataset

```{r full-data}
# limit = 100 caps the pull at 100 rows; raise it to retrieve more
wetlands_data <- nyc_wetlands(limit = 100)

# Let's take a look at what this dataset looks like
head(wetlands_data)
```

In our small sample, the `verificationstatus` of the first few rows was `Unverified`. Let's see what the other values in that column are:

```{r ver-status}
unique(wetlands_data$verificationstatus)
```

Now that we see the different values in the `verificationstatus` column, let's filter *out* all of the unverified wetland features:

```{r filter-verified}
# Creating the dataset
verified_wetlands <- wetlands_data %>%
  filter(verificationstatus != "Unverified")

# Quick check to make sure our filtering worked
nrow(verified_wetlands)
unique(verified_wetlands$verificationstatus)
```

Success! Now that we have our list of verified wetland features in NYC, let's take a look at some descriptive statistics.

## Mini Analysis

Let's create a summary table showing how many wetland features were verified each year:

```{r year-summary}
# count() groups by the column and tallies rows in one step
verified_per_year <- verified_wetlands %>%
  count(verificationstatusyear)

verified_per_year %>%
  kable(caption = "Verified Wetland Features Per Year")
```

Let's create a bar graph to see how many verified wetland features fall into each classification!

```{r classification-bar, fig.width=7, fig.height=4}
ggplot(data = verified_wetlands, aes(x = classname)) +
  geom_bar(fill = "forestgreen") +
  labs(title = "Total Number of Wetland Features By Classification",
       x = "Classification Name",
       y = "Total Count") +
  theme_minimal()
```

Although this vignette demonstrates only a simple use of this function, the dataset's geospatial data also allows users to map these wetland features using the provided multipolygon coordinates.
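As a rough illustration of that mapping idea, here is a minimal sketch using the `sf` package, which this vignette does not otherwise load. It assumes the geometry is available as well-known text (WKT) multipolygon strings; the column name and encoding actually returned by `nyc_wetlands()` may differ, so the toy data frame below (including its made-up `wkt` column, classification labels, and coordinates) is purely hypothetical.

```r
library(sf)
library(ggplot2)

# Hypothetical stand-in for two wetland features. The `wkt` column name,
# the classification labels, and the coordinates are invented for
# illustration and are NOT taken from the real dataset.
toy_wetlands <- data.frame(
  classname = c("Toy class A", "Toy class B"),
  wkt = c(
    "MULTIPOLYGON (((0 0, 1 0, 1 1, 0 1, 0 0)))",
    "MULTIPOLYGON (((2 0, 3 0, 3 2, 2 2, 2 0)))"
  )
)

# Parse the WKT strings into a geometry column (assuming WGS 84, EPSG:4326)
geom <- st_as_sfc(toy_wetlands$wkt, crs = 4326)

# Combine the attribute column and the geometry into an sf object
toy_sf <- st_sf(toy_wetlands["classname"], geometry = geom)

# Draw each feature, filled by its classification
map_plot <- ggplot(toy_sf) +
  geom_sf(aes(fill = classname)) +
  labs(title = "Toy Wetland Features", fill = "Classification Name") +
  theme_minimal()

map_plot
```

If the real geometry column turns out to be stored in another form (for example, nested GeoJSON rather than WKT), the parsing step would need to change, but the `st_sf()` and `geom_sf()` steps should carry over.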
## Summary

The `nycOpenData` package serves as an interface to the NYC Open Data Portal, streamlining the path from raw city APIs to analysis. By abstracting away data-acquisition details such as pagination, type-casting, and complex filtering, it allows users to focus on analysis rather than data engineering. As demonstrated in this vignette, the package supports a workflow of targeted data retrieval, filtering, and rapid visualization.

## How to Cite

If you use this package for research or educational purposes, please cite it as follows:

Martinez C (2026). nycOpenData: Convenient Access to NYC Open Data API Endpoints. R package version 0.1.6, https://martinezc1.github.io/nycOpenData/.