---
title: "'geeLite' R Package"
subtitle: "Authors:"
author: |
Marcell T. Kurbucz
(*[email](mailto:m.kurbucz@ucl.ac.uk)*)
Bo Pieter Johannes Andree (*[email](mailto:bandree@worldbank.org)*)
**Contents**
output:
rmarkdown::html_document:
df_print: paged
toc: true
toc_depth: 3
toc_float: false
theme: cerulean
highlight: tango
code_folding: hide
number_sections: false
fig_caption: true
css: styles.css
vignette: >
%\VignetteIndexEntry{geeLite}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r global-options, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r install-packages, include = FALSE}
# Check that required packages are available without installing them
required_pkgs <- c("dplyr", "leaflet", "withr", "sf")
missing_pkgs <- required_pkgs[!vapply(
required_pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
stop(paste0("Missing packages: ", paste(missing_pkgs, collapse = ", "),
".\nPlease install them manually before running this vignette."))
}
# Define a temporary path for SQLite database
path_to_db <- withr::local_tempdir()
```
```{r setup, include = FALSE}
# Load the necessary packages
library(geeLite)
library(withr)
# Set chunk evaluation flag based on GEE authentication status
run_chunks <- geeLite:::check_rgee_ready()
# Set the time zone to UTC temporarily during vignette execution
withr::local_timezone("UTC")
```
## 1. Introduction {#introduction}
`geeLite` simplifies the process of building, managing, and updating local SQLite databases containing geospatial features extracted from Google Earth Engine (GEE). This vignette covers the installation, configuration, data collection, and analysis workflows using the `geeLite` package.
For more detailed information and updates, visit the [geeLite GitHub repository](https://github.com/mtkurbucz/geeLite/).
## 2. Installation {#installation}
To install `geeLite` from GitHub, use the following commands:
```{r, eval = FALSE}
# Install devtools if not installed
# install.packages("devtools")
# Install geeLite from GitHub
devtools::install_github("mtkurbucz/geeLite")
# Install Python dependencies and setup rgee
geeLite::gee_install()
```
## 3. Workflow {#workflow}
The workflow for setting up and using `geeLite` includes configuration, data collection, modification, and data analysis.
### 3.1 Configuration {#configuration}
The first step in using `geeLite` is setting up a configuration file. This file specifies the regions of interest, the datasets to collect from GEE, and other parameters for data collection.
```{r config-setup, warning = FALSE, eval = FALSE}
# Define the path for the SQLite database
path <- path_to_db
# Create a configuration for Somalia (SO) and Yemen (YE) to collect NDVI data
set_config(
path = path,
regions = c("SO", "YE"),
source = list(
"MODIS/061/MOD13A2" = list(
"NDVI" = c("mean", "sd")
)
),
resol = 3,
start = "2020-01-01"
)
```
This configuration will create a JSON file at the specified path that defines the parameters for collecting data from GEE.
### 3.2 Data Collection {#data-collection}
Once the configuration is set, you can collect data from GEE using the `run_geelite()` function. This function retrieves data based on the configuration file and stores it in a local SQLite database.
```{r data-collection, eval = FALSE}
# Collect the data and store it in the SQLite database
run_geelite(path = path)
```
The function will store the collected geospatial data in the SQLite database and log the progress.
### 3.3 Modifying Configuration {#modifying-configuration}
If you want to modify the configuration (e.g., to add new statistics or datasets), you can use the `modify_config()` function to make updates without rebuilding the entire database.
```{r config-modification, eval = FALSE}
# Add more statistics to the NDVI band and include EVI data
modify_config(
path = path,
keys = list(
c("source", "MODIS/061/MOD13A2", "NDVI"),
c("source", "MODIS/061/MOD13A2", "EVI")
),
new_values = list(
c("mean", "min", "max"),
c("mean", "sd")
)
)
```
After modifying the configuration, run `run_geelite()` again to update the database with the new settings.
### 3.4 Reading and Analyzing Data {#reading-and-analyzing-data}
Once the data has been collected and stored in the database, you can read and analyze it using the `read_db()` function. This function allows you to aggregate data at different frequencies (e.g., daily, monthly) and apply preprocessing functions.
```{r reading-analyzing, eval = FALSE}
# Read the data from the database, aggregate to monthly frequency by default
db <- read_db(path = path)
# Read the data with custom aggregation functions
db <- read_db(path = path, aggr_funs = list(
function(x) mean(x, na.rm = TRUE),
function(x) sd(x, na.rm = TRUE))
)
```
## 4. Usage Example {#usage-example}
This example demonstrates how to use `geeLite` to gather NDVI data for Somalia and Yemen, aggregate it monthly, and visualize the results using the `leaflet` package.
### 4.1. Collecting the Data {#collecting-data}
```{r collecting-data-example, warning = FALSE, eval = run_chunks}
# Define the path for the SQLite database
path <- path_to_db
# Set the configuration file for NDVI data collection
set_config(
path = path,
regions = c("SO", "YE"),
source = list(
"MODIS/061/MOD13A2" = list(
"NDVI" = c("mean", "sd")
)
),
resol = 3,
start = "2020-01-01"
)
# Collect the data
run_geelite(path = path)
# Read the data from the database
db <- read_db(path = path, freq = "month")
```
### 4.2. Visualizing the Data {#visualizing-data}
Once the data is collected, you can visualize it using the `leaflet` package. The following code shows how to plot the mean NDVI values for each region.
```{r visualize-ndvi-leaflet, eval = run_chunks}
# Load necessary packages
library(leaflet)
library(dplyr)
library(sf)
# Read database and merge grid with MODIS data
sf <- merge(db$grid, db$`MODIS/061/MOD13A2/NDVI/mean`, by = "id")
# Select the date to visualize
ndvi <- sf$`2020-01-01`
# Create a color palette function based on the values
color_pal <- colorNumeric(palette = "viridis", domain = ndvi)
# Create the leaflet map
leaflet(data = sf) %>%
addTiles() %>% # Add base tiles
addPolygons(
fillColor = color_pal(ndvi), # Fill color
color = "#BDBDC3", # Border color
weight = 1, # Border weight
opacity = 1, # Border opacity
fillOpacity = 0.9 # Fill opacity
) %>%
addScaleBar(position = "bottomleft") %>% # Add scale bar
addLegend(
pal = color_pal, # Color palette
values = ndvi, # Data values to map
title = "Mean NDVI", # Legend title
position = "bottomright" # Legend position
)
```
## 5. Extras {#extras}
Additional features of the `geeLite` package include a `drive` mode for efficiently handling large data requests, as well as command-line interface (CLI) support for automation and integration with job scheduling systems like `cron`.
### 5.1 Drive Mode {#drive}
To efficiently handle large data requests, `drive` mode exports data in parallel batches to Google Drive before importing it into your local SQLite database. Ensure that adequate Google Drive storage is available before using `drive` mode.
```{r drive-mode, eval = FALSE}
# Collect and store data using drive mode
run_geelite(path = path, mode = "drive")
```
### 5.2 Command-Line Interface (CLI) {#cli}
`geeLite` provides a CLI that allows you to run the main functions of the package directly from the command line. This is useful for automating workflows or integrating `geeLite` into larger systems.
```{bash cli-usage, eval = FALSE}
# Setting the CLI files
Rscript /path/to/geeLite/cli/set_cli.R --path "path/to/db"
# Change directory to where the database will be generated
cd "path/to/db"
# Set up the configuration via CLI
Rscript cli/set_config.R --regions "SO YE" --source "list('MODIS/061/MOD13A2' = list('NDVI' = c('mean', 'min')))" --resol 3 --start "2020-01-01"
# Collecting GEE data via CLI
Rscript cli/run_geelite.R
# Modifying the configuration via CLI
Rscript cli/modify_config.R --keys "list(c('source', 'MODIS/061/MOD13A2', 'NDVI'), c('source', 'MODIS/061/MOD13A2', 'EVI'))" --new_values "list(c('mean', 'min', 'max'), c('mean', 'sd'))"
# Updating the database via CLI
Rscript cli/run_geelite.R
```
### 5.3 Automating Data Collection with Cron {#cron}
You can automate database updates using the Linux `cron` job scheduler. Here’s an example cron job that updates the database monthly:
```{bash cron-job, eval = FALSE}
# Open the cron jobs file
crontab -e
# Add the following line to schedule monthly updates (runs on the 1st of every month)
0 0 1 * * Rscript /path/to/db/cli/run_geelite.R
```