Title: | Wrangler for Emergency Events Database |
Version: | 1.1.2 |
Maintainer: | Ram Kripa <ram.m.kripa@berkeley.edu> |
Description: | Makes research involving EM-DAT and related datasets easier. These datasets are filled in manually and have several formatting and compatibility issues, which weed aims to resolve with its functions. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.1.1 |
Imports: | readxl, dplyr, magrittr, tidytext, stringr, tibble, geonames, countrycode, purrr, tidyr, forcats, ggplot2, sf, here |
URL: | https://github.com/rammkripa/weed |
BugReports: | https://github.com/rammkripa/weed/issues |
NeedsCompilation: | no |
Packaged: | 2023-10-16 21:49:32 UTC; ramkripa |
Author: | Ram Kripa [aut, cre] |
Repository: | CRAN |
Date/Publication: | 2023-10-16 22:20:02 UTC |
Geocodes text locations using the GeoNames API
Description
Uses the location_word and Country columns of the data frame to query the GeoNames API and geocode the locations in the dataset.
Note:
The GeoNames API limits free accounts to 1000 queries an hour.
You need a GeoNames username to make queries; accounts are registered on the GeoNames website.
Usage
geocode(., n_results = 1, unwrap = FALSE, geonames_username)
Arguments
. |
a data frame which has been locationized (see |
n_results |
number of lat/longs to get |
unwrap |
if TRUE, returns lat1, lat2, lng1, lng2, etc. as separate columns; otherwise returns one lat column and one lng column |
geonames_username |
Username for geonames API. More about getting one is in the note above. |
Value
the same data frame with one or more lat columns and one or more lng columns
Examples
df <- tibble::tribble(
~value, ~location_word, ~Country,
"mumbai region, district of seattle, sichuan province", "mumbai","India",
"mumbai region, district of seattle, sichuan province", "seattle", "USA"
)
geocode(df, n_results = 1, unwrap = TRUE, geonames_username = "rammkripa")
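The unwrap argument describes a reshaping from several candidate coordinates per location into numbered columns (lat1, lat2, ...). The base R sketch below illustrates that idea only; unwrap_column is a hypothetical helper, not a function from weed, and the actual column layout returned by geocode may differ.

```r
# Hypothetical helper (not part of weed): spread a list of coordinate
# vectors into numbered columns such as lat1, lat2, ...
unwrap_column <- function(values, prefix, n_results) {
  # Indexing past the end of a vector pads with NA, so every row has n_results entries
  padded <- lapply(values, function(v) v[seq_len(n_results)])
  out <- as.data.frame(do.call(rbind, padded))
  names(out) <- paste0(prefix, seq_len(n_results))
  out
}

# Two candidate latitudes for each of two locations
lat <- list(c(40.71427, 40.6501), c(10.41667, 8.4855))
wide_lat <- unwrap_column(lat, prefix = "lat", n_results = 2)
wide_lat
```

With n_results larger than the number of available candidates, the extra columns are filled with NA.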
Geocode in batches
Description
Geocode in batches
Usage
geocode_batches(
.,
batch_size = 990,
wait_time = 4800,
n_results = 1,
unwrap = FALSE,
geonames_username
)
Arguments
. |
data frame |
batch_size |
size of each batch to geocode |
wait_time |
time to wait between batches, in seconds. Note: the default batch_size and wait_time were chosen to complete the geocoding task efficiently within the limits of GeoNames free access |
n_results |
same as geocode |
unwrap |
as in geocode |
geonames_username |
as in geocode |
Value
the geocoded data frame
Examples
df <- tibble::tribble(
~value, ~location_word, ~Country,
"mumbai region, district of seattle, sichuan province", "mumbai","India",
"mumbai region, district of seattle, sichuan province", "seattle", "USA",
"mumbai region, district of seattle, sichuan province", "sichuan", "China, People's Republic"
)
geocode_batches(df, batch_size = 2, wait_time = 0.4, geonames_username = "rammkripa")
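To get a feel for how batch_size and wait_time interact with the free-tier limit of 1000 queries an hour, the following base R sketch assigns rows to batches and estimates the total pause time. plan_batches is a hypothetical helper, and it assumes one query per row and a pause only between consecutive batches; geocode_batches itself may behave differently.

```r
# Hypothetical helper (not part of weed): plan batches and estimate waiting time
plan_batches <- function(n_rows, batch_size = 990, wait_time = 4800) {
  stopifnot(n_rows >= 0, batch_size >= 1, wait_time >= 0)
  # Row i goes to batch ceiling(i / batch_size): 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_len(n_rows) / batch_size)
  n_batches <- if (n_rows == 0) 0 else max(batch_id)
  # Assumption: the pause happens only between consecutive batches
  total_wait <- max(n_batches - 1, 0) * wait_time
  list(batch_id = batch_id,
       n_batches = n_batches,
       total_wait_seconds = total_wait)
}

plan <- plan_batches(n_rows = 2500)
plan$n_batches            # 3 batches: 990 + 990 + 520 rows
plan$total_wait_seconds   # 2 pauses of 4800 s = 9600 s
```

Under these assumptions, a data frame of 2500 rows spends 160 minutes waiting, which is worth knowing before starting a long run.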
Locations In the Box
Description
Creates a new column (in_box) that tells whether the lat/long is in a certain box or not.
Usage
located_in_box(
.,
lat_column = "lat",
lng_column = "lng",
top_left_lat,
top_left_lng,
bottom_right_lat,
bottom_right_lng
)
Arguments
. |
Data Frame that has been locationized. see |
lat_column |
Name of column containing Latitude data |
lng_column |
Name of column containing Longitude data |
top_left_lat |
Latitude at top left corner of box |
top_left_lng |
Longitude at top left corner of box |
bottom_right_lat |
Latitude at bottom right corner of box |
bottom_right_lng |
Longitude at bottom right corner of box |
Value
A data frame that contains the original data plus the in_box column
Examples
d <- tibble::tribble(
~value, ~location_word, ~Country, ~lat, ~lng,
"city of new york", "new york", "USA", 40.71427, -74.00597,
"kerala, chennai municipality, and san francisco", "kerala", "India", 10.41667, 76.5,
"kerala, chennai municipality, and san francisco", "chennai", "India", 13.08784, 80.27847)
located_in_box(d, lat_column = "lat",
lng_column = "lng",
top_left_lat = 45,
bottom_right_lat = 12,
top_left_lng = -80,
bottom_right_lng = 90)
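The box test amounts to four comparisons per point. The base R sketch below reproduces it for the coordinates in the example above; point_in_box is a hypothetical helper, and it assumes a box that does not cross the antimeridian. It is not weed's implementation.

```r
# Hypothetical helper (not part of weed): is each point inside the box?
point_in_box <- function(lat, lng,
                         top_left_lat, top_left_lng,
                         bottom_right_lat, bottom_right_lng) {
  # Latitude must lie between the bottom and top edges,
  # longitude between the left and right edges
  lat <= top_left_lat & lat >= bottom_right_lat &
    lng >= top_left_lng & lng <= bottom_right_lng
}

lat <- c(40.71427, 10.41667, 13.08784)   # new york, kerala, chennai
lng <- c(-74.00597, 76.5, 80.27847)

in_box <- point_in_box(lat, lng,
                       top_left_lat = 45, top_left_lng = -80,
                       bottom_right_lat = 12, bottom_right_lng = 90)
in_box   # TRUE FALSE TRUE: kerala lies south of the bottom edge (10.4 < 12)
```

Points with a missing latitude or longitude yield NA rather than FALSE in this sketch.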
Locations In the Shapefile
Description
Creates a new column (in_shape) that tells whether the lat/long is in a certain shapefile.
Usage
located_in_shapefile(
.,
lat_column = "lat",
lng_column = "lng",
shapefile = NA,
shapefile_name = NA
)
Arguments
. |
Data Frame that has been locationized. see |
lat_column |
Name of column containing Latitude data |
lng_column |
Name of column containing Longitude data |
shapefile |
The shapefile itself (either shapefile or shapefile_name must be provided) |
shapefile_name |
FileName/Path to shapefile (either shapefile or shapefile_name must be provided) |
Value
Data Frame with the shapefile data as well as the previous data
Examples
## Not run:
d <- tibble::tribble(
~value, ~location_word, ~Country, ~lat, ~lng,
"city of new york", "new york", "USA", 40.71427, -74.00597,
"kerala, chennai municipality, and san francisco", "kerala", "India", 10.41667, 76.5,
"kerala, chennai municipality, and san francisco", "chennai", "India", 13.08784, 80.27847)
located_in_shapefile(d,
lat_column = "lat",
lng_column = "lng",
shapefile_name = "~/dummy_name")
## End(Not run)
Nest Location Data into a column of Tibbles
Description
Nest Location Data into a column of Tibbles
Usage
nest_locations(
.,
key_column = "Dis No",
columns_to_nest = c("location_word", "lat", "lng"),
keep_nested_cols = FALSE
)
Arguments
. |
Locationized data frame (see |
key_column |
Column name for Column that uniquely IDs each observation |
columns_to_nest |
Column names for Columns to nest inside the mini-dataframes |
keep_nested_cols |
Boolean: whether to also keep the nested columns outside the nested data frames. |
Value
Data frame with a column of data frames
Examples
d <- tibble::tribble(
~value, ~location_word, ~Country, ~lat, ~lng,
"city of new york","new york","USA", c(40.71427, 40.6501), c(-74.00597, -73.94958),
"kerala", "kerala", "India",c(10.41667, 8.4855), c(76.5, 76.94924),
"chennai municipality","chennai","India", c(13.08784, 12.98833),c(80.27847, 80.16578),
"san francisco", "san francisco","USA", c(37.77493, 37.33939), c(-122.41942, -121.89496))
nest_locations(d, key_column = "value")
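Nesting here means collapsing all rows that share a key into one row whose location data lives in a small data frame. The base R sketch below shows the idea with split(); nest_by_key and the locations column name are hypothetical, and weed's own output may be organised differently.

```r
# Hypothetical helper (not part of weed): one row per key, with the
# chosen columns stored as a list column of small data frames
nest_by_key <- function(df, key_column, columns_to_nest) {
  # One small data frame per distinct key value
  pieces <- split(df[, columns_to_nest, drop = FALSE], df[[key_column]])
  out <- data.frame(key = names(pieces), stringsAsFactors = FALSE)
  names(out) <- key_column
  out$locations <- unname(pieces)   # list column, one element per row
  out
}

d <- data.frame(
  `Dis No` = c("1", "2", "2"),
  location_word = c("new york", "kerala", "chennai"),
  lat = c(40.71427, 10.41667, 13.08784),
  lng = c(-74.00597, 76.5, 80.27847),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

nested <- nest_by_key(d, key_column = "Dis No",
                      columns_to_nest = c("location_word", "lat", "lng"))
nested$locations[[2]]   # the two locations recorded for disaster "2"
```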
Percent of Disasters Successfully Geocoded
Description
Tells us how successful the geocoding is.
How many of the disasters in this data frame have non NA coordinates?
Usage
percent_located_disasters(
.,
how = "any",
lat_column = "lat",
lng_column = "lng",
plot_result = TRUE
)
Arguments
. |
Data Frame that has been locationized. see |
how |
a function, "any", or "all", determining when a disaster counts as geocoded. If "any", at least one of its locations must be geocoded. If "all", every location must have a lat/lng. If a function, it must take a logical vector and return a single logical. |
lat_column |
Name of column containing Latitude data |
lng_column |
Name of column containing Longitude data |
plot_result |
Determines output type (Plot or Summarized Data Frame) |
Value
The percent and number of disasters that have been geocoded (see plot_result for the type of output)
Examples
d <- tibble::tribble(
~`Dis No`, ~value, ~location_word, ~Country, ~lat, ~lng,
1, "city of new york", "new york", "USA", 40.71427, -74.00597,
2, "kerala, chennai municipality, and san francisco", "kerala", "India", 10.41667, 76.5,
2, "kerala, chennai municipality, and san francisco", "chennai", "India", 13.08784, 80.27847)
percent_located_disasters(d,
how = "any",
lat_column = "lat",
lng_column = "lng",
plot_result = FALSE)
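The effect of how is easiest to see on a small case where one disaster is only partly geocoded. The base R sketch below is an illustration under assumed data; percent_located is a hypothetical helper, not weed's implementation.

```r
# Hypothetical helper (not part of weed): percent of disasters counted as geocoded
percent_located <- function(dis_no, lat, lng, how = "any") {
  # A location counts as geocoded when both coordinates are present
  located <- !is.na(lat) & !is.na(lng)
  # "any" and "all" resolve to the base functions; a custom function passes through
  agg <- match.fun(how)
  per_disaster <- tapply(located, dis_no, agg)
  100 * mean(per_disaster)
}

dis_no <- c(1, 2, 2, 3)
lat <- c(40.71427, 10.41667, NA, NA)
lng <- c(-74.00597, 76.5, NA, NA)

percent_located(dis_no, lat, lng, how = "any")   # disasters 1 and 2 count: 2 of 3
percent_located(dis_no, lat, lng, how = "all")   # only disaster 1 counts: 1 of 3
# Custom rule: at least half of the locations must be geocoded
percent_located(dis_no, lat, lng, how = function(x) mean(x) >= 0.5)
```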
Percent of Locations Successfully Geocoded
Description
Tells us how successful the geocoding is.
How many of the locations in this data frame have non NA coordinates?
Usage
percent_located_locations(
.,
lat_column = "lat",
lng_column = "lng",
plot_result = TRUE
)
Arguments
. |
Data Frame that has been locationized. see |
lat_column |
Name of column containing Latitude data |
lng_column |
Name of column containing Longitude data |
plot_result |
Determines output type (Plot or Summarized Data Frame) |
Value
The percent and number of locations that have been geocoded (see plot_result for the type of output)
Examples
d <- tibble::tribble(
~value, ~location_word, ~Country, ~lat, ~lng,
"city of new york", "new york", "USA", 40.71427, -74.00597,
"kerala, chennai municipality, and san francisco", "kerala", "India", 10.41667, 76.5,
"kerala, chennai municipality, and san francisco", "chennai", "India", 13.08784, 80.27847)
percent_located_locations(d,
lat_column = "lat",
lng_column = "lng",
plot_result = FALSE)
Reads Excel Files obtained from EM-DAT Database
Description
Reads Excel files downloaded from the EM-DAT database.
Usage
read_emdat(path_to_file, file_data = TRUE)
Arguments
path_to_file |
A String, the Path to the file downloaded. |
file_data |
A Boolean: whether to also return information about the file and how it was created. |
Value
Returns a list containing one or two tibbles, one for the Disaster Data, and one for File Metadata.
Examples
## Not run:
read_emdat(path_to_file = "~/dummy", file_data = TRUE)
## End(Not run)
Splits string of manually entered locations into one row for each location
Description
Changes the unit of analysis from a disaster to a disaster-location. This is useful as preprocessing before geocoding each disaster-location pair.
Can be used in piped operations, in keeping with tidy workflows.
Usage
split_locations(
.,
column_name = "locations",
dummy_words = c("cities", "states", "provinces", "districts", "municipalities",
"regions", "villages", "city", "state", "province", "district", "municipality",
"region", "township", "village", "near", "department"),
joiner_regex = ",|\\(|\\)|;|\\+|( and )|( of )"
)
Arguments
. |
data frame of disaster data |
column_name |
name of the column containing the locations |
dummy_words |
a vector of words that we don't want in our final output. |
joiner_regex |
a regex that tells us how to split the locations |
Value
The same data frame with a location_word column added, as well as a column called uncertain_location_specificity for cases where the same location could be referred to at varying levels of specificity
Examples
locs <- c("city of new york", "kerala, chennai municipality, and san francisco",
"mumbai region, district of seattle, sichuan province")
d <- tibble::tibble(value = locs)
split_locations(d, column_name = "value")
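The roles of joiner_regex and dummy_words can be sketched in base R: split each string on the joiner pattern, delete the dummy words, and keep what remains. split_one is a hypothetical helper using the default arguments shown in Usage; it only approximates what split_locations does and does not produce the uncertain_location_specificity column.

```r
dummy_words <- c("cities", "states", "provinces", "districts", "municipalities",
                 "regions", "villages", "city", "state", "province", "district",
                 "municipality", "region", "township", "village", "near",
                 "department")
joiner_regex <- ",|\\(|\\)|;|\\+|( and )|( of )"

# Hypothetical helper (not part of weed): split one location string into words
split_one <- function(text) {
  pieces <- strsplit(text, joiner_regex, perl = TRUE)[[1]]
  # Remove dummy words only when they appear as whole words
  dummy_pattern <- paste0("\\b(", paste(dummy_words, collapse = "|"), ")\\b")
  pieces <- trimws(gsub(dummy_pattern, "", pieces, perl = TRUE))
  pieces[nzchar(pieces)]   # drop pieces that are now empty
}

locs <- c("city of new york", "kerala, chennai municipality, and san francisco",
          "mumbai region, district of seattle, sichuan province")
words <- lapply(locs, split_one)

# One row per disaster-location pair
long <- data.frame(value = rep(locs, lengths(words)),
                   location_word = unlist(words),
                   stringsAsFactors = FALSE)
long$location_word
```

The three input strings expand to seven rows, one per location.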