---
title: 'Extended vignette for **EEAaq: Handle Air Quality Data from the European Environment Agency Data Portal**'
author:
- Paolo Maranzano, University of Milano-Bicocca, Italy, paolo.maranzano@unimib.it
- Riccardo Borgoni, University of Milano-Bicocca, Italy, riccardo.borgoni@unimib.it
- Agostino Tassan Mazzocco, University of Milano-Bicocca, Italy
- Samir Doghmi, University of Milano-Bicocca, Italy
date: "`r format(Sys.time(), '%d %B %Y')`"
output:
  html_document:
    df_print: paged
  pdf_document:
    fig_caption: true
    keep_md: true
subtitle: Extended vignette
vignette: >
  %\VignetteIndexEntry{Extended vignette for EEAaq}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

The **EEAaq package** allows users to retrieve air quality data for multiple geographical zones, pollutants, and time periods in a single request. Queries are submitted as lists, which makes it easy to specify combinations of parameters.

```{r setup}
library(EEAaq)
`%>%` <- dplyr::`%>%`
```

# EEAaq_get_data

Below we demonstrate how to query the data portal using user-defined arguments.

### Retrieve NO$_2$ and PM$_{10}$ data for the specified station IDs

```{r}
# Download NO2 and PM10 data for the stations located in the province (NUTS-3)
# of Milano (Italy) from January 1st, 2024 to January 31st, 2025
IDstations <- EEAaq_get_stations(byStation = TRUE, complete = FALSE)
IDstations <- IDstations %>%
  dplyr::filter(NUTS3 %in% c("Milano")) %>%
  dplyr::pull(AirQualityStationEoICode) %>%
  unique()

data <- EEAaq_get_data(IDstations = IDstations,
                       pollutants = c("PM10", "NO2"),
                       from = "2024-01-01", to = "2025-01-31",
                       verbose = TRUE)
```

```{r}
# Preview the first few rows of the dataset
head(data)
```

```{r}
# List the monitoring stations included in the downloaded data
unique(data$AirQualityStationEoICode)
```

\
**Note 1**: If the stations passed in `IDstations` are associated with a valid *CITY_NAME* (i.e., a non-NULL value in the dataset), the function returns the corresponding data.
If no valid *CITY_NAME* is associated with the stations in `IDstations`, the function attempts to retrieve all available data for the entire country and subsequently filters them for the specified stations.

\
**Note 2**: If the parameters used in the query include *polygon* or *quadrant*, the function returns an `EEAaq_df_sfc` object. Otherwise, it returns an `EEAaq_df` object, which is a tibble data frame.

\

# EEAaq map stations

`EEAaq_map_stations` generates a static or dynamic map of user-defined monitoring stations. The function accepts as input either an object of the `EEAaq_df` class (the default output of the `EEAaq_get_data` function) or all the other parameters specifying the area and the pollutants.

### Map the stations using an `EEAaq_df` object containing NO$_2$ and PM$_{10}$ data for Milano

```{r}
# Static map of the stations included in `data`,
# with NUTS-2 external borders and LAU inner borders
EEAaq_map_stations(data = data,
                   NUTS_extborder = "NUTS2", NUTS_intborder = "LAU",
                   color = TRUE, dynamic = FALSE)
```

\
**Note**: The external borders are given by the union of the available regions (NUTS-2), while municipalities (LAUs) are used as inner borders.

### Dynamic (interactive) map of all the NO$_2$ and PM$_{10}$ monitoring stations in Milano

```{r}
EEAaq_map_stations(data = data,
                   NUTS_extborder = "NUTS2", NUTS_intborder = "NUTS3",
                   color = TRUE, dynamic = TRUE)
```

# EEAaq summary

This function describes a previously imported dataset both at a global level, i.e. considering the complete set of time stamps and monitoring stations in the dataset, and at a station-specific level, where summary statistics and information are grouped by monitoring station.
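To make the distinction between the two levels concrete, the chunk below is a minimal base-R sketch on a small hypothetical data frame. The `toy` object, its `station` and `PM10` columns, and the `max_gap()` helper are illustrative assumptions, not part of EEAaq, and this is not the implementation of `EEAaq_summary`: it only contrasts a pooled (global) summary with per-station summaries, using the longest run of consecutive missing values as a simple notion of gap length.

```{r}
# Minimal sketch, NOT the EEAaq implementation: it only illustrates the
# difference between global and station-specific summaries.
# The data frame, its column names and its values are hypothetical.
toy <- data.frame(
  station = rep(c("ST1", "ST2"), each = 4),
  PM10    = c(20, NA, NA, 30, 40, 50, NA, 60)
)

# Global level: pool all time stamps and all stations together
global_summary <- data.frame(
  mean      = mean(toy$PM10, na.rm = TRUE),
  sd        = sd(toy$PM10, na.rm = TRUE),
  n_missing = sum(is.na(toy$PM10))
)

# Hypothetical helper: length of the longest run of consecutive
# missing values in a vector (0 if there are none)
max_gap <- function(x) {
  runs <- rle(is.na(x))
  if (any(runs$values)) max(runs$lengths[runs$values]) else 0L
}

# Station-specific level: the same kind of statistics, one row per station
by_station <- do.call(rbind, lapply(split(toy$PM10, toy$station), function(x) {
  data.frame(mean = mean(x, na.rm = TRUE), max_gap = max_gap(x))
}))

global_summary
by_station
```

With these toy values the global mean is 40, while station ST1 has a mean of 25 and a longest gap of two consecutive missing values, and station ST2 has a mean of 50 and a longest gap of one.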
\
In addition to basic exploratory descriptive statistics (e.g., average pollutant concentration, variability, skewness and kurtosis), the function provides information about gap lengths (i.e., runs of missing observations) and, if at least two pollutants are considered simultaneously, about the correlation between pollutants.

\
The `EEAaq_summary` function receives as input an `EEAaq_df` object, i.e. the output of the `EEAaq_get_data` function.

### Compute the descriptive statistics

```{r}
summ <- EEAaq_summary(data = data)
```

### Print the global statistics

```{r}
summ$Summary
```

### Print the station-specific statistics

```{r}
summ$Summary_byStat$Mean_byStat
```

### Print the linear correlation matrix

```{r}
summ$Corr_Matrix
```

# EEAaq time aggregate

Most pollutants are monitored by the EEA on an hourly or daily basis, which makes the raw series hard to interpret and represent. The `EEAaq_time_aggregate` function simplifies this by aggregating data into annual, monthly, weekly, daily, or hourly intervals, generating summary statistics for each station in an `EEAaq_taggr_df` object.

### Get the station-specific monthly minimum, maximum, average and median concentrations of NO$_2$ and PM$_{10}$ in Milano

```{r}
t_aggr <- EEAaq_time_aggregate(data = data,
                               frequency = "monthly",
                               aggr_fun = c("min", "max", "mean", "median"))
```

### Print the aggregated (monthly) data

```{r}
t_aggr$TimeAggr
```

### Print the PM$_{10}$ aggregated data only

```{r}
t_aggr$TimeAggr_byPollutant$PM10
```

# EEAaq_idw_map

To enable quick and intuitive visual analysis, the `EEAaq_idw_map` function provides spatial interpolation maps using the Inverse Distance Weighting (IDW) method (Shepard, 1968). This technique estimates the value of a variable at unobserved locations as a weighted average of the known values, with weights inversely proportional to the distance from the known points.
Closer points contribute more heavily to the estimate, making IDW a practical approach for interpolating geolocated air quality data.

### Generate IDW interpolated maps of monthly average concentrations of NO$_2$ in Milano

```{r}
EEAaq::EEAaq_idw_map(data = t_aggr, pollutant = "NO2", aggr_fun = "mean",
                     distinct = TRUE, gradient = FALSE, dynamic = FALSE,
                     NUTS_filler = "NUTS3", NUTS_extborder = "NUTS2")
```

```{r}
# Monthly aggregated NO2 data used for the interpolation
t_aggr$TimeAggr_byPollutant$NO2
```

### Generate IDW interpolated maps of the maximum monthly concentrations of NO$_2$ in January and February 2024 in Milano

```{r}
EEAaq::EEAaq_idw_map(
  data = t_aggr$TimeAggr_byPollutant$NO2 %>%
    dplyr::filter(Date %in% c("2024-01-01", "2024-02-01")),
  pollutant = "NO2", aggr_fun = "max",
  distinct = TRUE, gradient = TRUE, idp = 2,
  NUTS_extborder = "NUTS2", NUTS_filler = "LAU"
)
```
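In the usual formulation of Shepard's method, the estimate at a target location $s_0$ from $n$ stations with observed values $z_1, \dots, z_n$ is $\hat{z}(s_0) = \sum_{i=1}^{n} w_i z_i \big/ \sum_{i=1}^{n} w_i$, with weights $w_i = d(s_0, s_i)^{-p}$, where $d$ is the distance and $p > 0$ is a power parameter. We assume that the `idp` argument used above plays the role of $p$ (its name mirrors the `idp` argument of `gstat::idw`), but this is an assumption. The chunk below is a minimal base-R sketch of the formula; the function `idw_estimate()` and the station coordinates and values are hypothetical, and planar Euclidean distances are used, which need not match what `EEAaq_idw_map` does internally.

```{r}
# Minimal sketch of Shepard's inverse distance weighting (IDW).
# NOT the code used by EEAaq_idw_map: idw_estimate() is a hypothetical helper,
# and distances are planar Euclidean distances between (x, y) coordinates.
idw_estimate <- function(x0, y0, x, y, z, p = 2) {
  d <- sqrt((x - x0)^2 + (y - y0)^2)  # distances from the target to each station
  if (any(d == 0)) {
    return(z[which(d == 0)[1]])        # target coincides with a station
  }
  w <- 1 / d^p                         # weights decrease with distance
  sum(w * z) / sum(w)                  # weighted average of the known values
}

# Three hypothetical stations with NO2 concentrations
x <- c(0, 1, 0)
y <- c(0, 0, 1)
z <- c(10, 20, 30)

idw_estimate(0.5, 0.5, x, y, z, p = 2)  # equidistant from all three stations
idw_estimate(0.1, 0.0, x, y, z, p = 2)  # very close to the first station
```

At $(0.5, 0.5)$, which is equidistant from the three stations, all weights are equal and the estimate is the plain mean of the observed values, 20. At $(0.1, 0)$ the estimate is pulled towards 10, the value of the nearest station. Larger values of `p` make the interpolated surface depend more strongly on the closest stations.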