---
title: "Getting Started with nycOpenData"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Working with NYC 311 Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(nycOpenData)
library(ggplot2)
```

## Introduction

New York City has a population of almost 8.5 million people. By calling 311, residents can submit comments, inquiries, complaints, and requests to city agencies. These service requests are published in the 311 Service Requests dataset, [found here](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2020-to-Present/erm2-nwe9/about_data). In R, the `nycOpenData` package can be used to pull this data directly.

The `nycOpenData` package provides a streamlined interface for accessing New York City's open data resources by connecting directly to the NYC Open Data Portal. It is currently used as a primary tool for teaching data acquisition in [Reproducible Research Using R](https://martinezc1-textbook--reproducible-research-using-r.share.connect.posit.cloud), helping students bridge the gap between raw city APIs and tidy data analysis.

By using the `nyc_311()` function, we can gather the most recent 311 service requests in New York City and filter on any of the columns in the dataset.

> Note: `nyc_311()` automatically sorts in descending order based on the `created_date` column, so the newest requests come first.

## Pulling a Small Sample

To start, let's pull a small sample to see what the data looks like. By default, the function pulls the *10,000 most recent* requests. Let's change that to see only the latest 3 requests by setting `limit = 3`.

```{r small-sample}
small_sample <- nyc_311(limit = 3)
small_sample

# Seeing what columns are in the dataset
colnames(small_sample)
```

Fantastic! We successfully pulled 311 data from the NYC Open Data Portal.

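
The note in the introduction says that results arrive newest first. One way to check this on any pull is to compare consecutive `created_date` values. The sketch below is only an illustration under assumptions: `toy_sample` is a hypothetical stand-in for `small_sample` with made-up timestamps, and it assumes `created_date` arrives as ISO 8601 text, which may not match what `nyc_311()` actually returns.

```r
# Hypothetical stand-in for `small_sample`; the timestamps are made up,
# and the text format of `created_date` is an assumption.
toy_sample <- data.frame(
  created_date = c(
    "2025-01-03T10:15:00",
    "2025-01-02T08:00:00",
    "2025-01-01T23:59:00"
  ),
  stringsAsFactors = FALSE
)

# Parse the text timestamps so they can be compared as date-times
created <- as.POSIXct(
  toy_sample$created_date,
  format = "%Y-%m-%dT%H:%M:%S",
  tz = "UTC"
)

# Newest first means each timestamp is no later than the one before it
is_newest_first <- all(diff(as.numeric(created)) <= 0)
is_newest_first
```

With a real pull, the same check would apply to `small_sample$created_date`, adjusting the parsing step if the column is already stored as a date-time.
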
Let's now pull the last 3 requests from the borough of Brooklyn. The `nyc_311()` function can filter on any of the columns in the dataset. To filter, we add `filters = list()` and put whatever filters we would like inside. From our earlier `colnames()` call, we know that there is a column called `borough`, which we can use to accomplish this.

```{r filter-brooklyn}
brooklyn_311 <- nyc_311(limit = 3, filters = list(borough = "BROOKLYN"))
brooklyn_311

# Checking to see the filtering worked
unique(brooklyn_311$borough)
```

Success! Printing `brooklyn_311` shows that there are only 3 rows of data, and the `unique()` call shows that the only borough in our dataset is BROOKLYN.

One of the strongest qualities of this function is its ability to filter on multiple columns at once. Let's put everything together and get a dataset of the last *50* 311 requests handled by the New York Police Department in Brooklyn.

```{r filter-brooklyn-nypd}
# Creating the dataset
brooklyn_nypd <- nyc_311(limit = 50, filters = list(agency = "NYPD", borough = "BROOKLYN"))

# Calling head of our new dataset
head(brooklyn_nypd)

# Quick check to make sure our filtering worked
nrow(brooklyn_nypd)
unique(brooklyn_nypd$agency)
unique(brooklyn_nypd$borough)
```

We successfully created a dataset that contains the 50 most recent requests handled by the NYPD in the borough of Brooklyn.

## Mini Analysis

Now that we have successfully pulled the data and have it in R, let's do a mini analysis using the `complaint_type` column to figure out what residents in Brooklyn are reporting to the NYPD. To do this, we will create a bar graph of the complaint types.
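
Before drawing the chart, the same frequencies can be inspected as a plain table. The sketch below is a minimal illustration rather than part of the package: `toy_requests` is a hypothetical stand-in for `brooklyn_nypd`, and its `complaint_type` values are made up for the example.

```r
# Hypothetical stand-in for `brooklyn_nypd`; only the `complaint_type`
# column matters here, and the values are made up for illustration.
toy_requests <- data.frame(
  complaint_type = c(
    "Illegal Parking", "Noise - Residential", "Illegal Parking",
    "Blocked Driveway", "Illegal Parking", "Noise - Residential"
  ),
  stringsAsFactors = FALSE
)

# Count each complaint type and sort from most to least frequent
complaint_counts <- sort(table(toy_requests$complaint_type), decreasing = TRUE)
complaint_counts
```

Running the same two lines on `brooklyn_nypd$complaint_type` would give the counts that the bar graph displays.
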
```{r complaint-type-graph, fig.alt="Bar chart showing the frequency of NYPD-related 311 complaint types in Brooklyn from the 50 most recent service requests.", fig.cap="Bar chart showing the frequency of NYPD-related 311 complaint types in Brooklyn from the 50 most recent service requests.", fig.height=5, fig.width=7}
# Visualizing the distribution, ordered by frequency
ggplot(brooklyn_nypd, aes(y = reorder(complaint_type, complaint_type, length))) +
  geom_bar(fill = "steelblue") +
  theme_minimal() +
  labs(
    title = "Most Recent NYPD 311 Complaints (Brooklyn)",
    subtitle = "50 most recent service requests",
    x = "Number of Complaints",
    y = "Type of Complaint"
  )
```

This graph shows us not only *which* complaints were made, but also *how many* of each were made.

## Summary

The `nycOpenData` package serves as an interface to the NYC Open Data portal, streamlining the path from raw city APIs to analysis-ready data. By abstracting the complexities of data acquisition, such as pagination, type-casting, and complex filtering, it allows users to focus on analysis rather than data engineering. As demonstrated in this vignette, the package supports a short workflow of targeted data retrieval, filtering, and rapid visualization.

## How to Cite

If you use this package for research or educational purposes, please cite it as follows:

Martinez C (2026). nycOpenData: Convenient Access to NYC Open Data API Endpoints. R package version 0.1.4, https://martinezc1.github.io/nycOpenData/.