sfhotspot is a package for understanding patterns in data that represent points in space. You can use it to count points in different places, estimate the density of points, show changes in the distribution of points over time, identify places with more points than would be expected by chance, and classify areas based on the number of points in them during different periods.
The specific motivation for this package was to analyse the locations of crimes, but the functions should be useful for understanding patterns in points representing other features or events. The package is called sfhotspot because it works with (and – where relevant – produces) SF objects as produced by the sf package. sfhotspot also produces data that is tidy, making it easy to use functions from packages such as dplyr to filter the results, etc.
All the functions in sfhotspot work on an SF data frame or tibble in which each row represents a single point (e.g. the location of an event). In this introduction we will use the built-in `memphis_robberies` dataset to show how each function in the `hotspot_*()` family works. `memphis_robberies` contains details of 2,245 robberies in Memphis, Tennessee, in 2019.
uid | offense_type | date | geometry |
---|---|---|---|
15213800 | personal robbery | 2019-01-01 01:30:00 | POINT (-89.942 35.149) |
15214030 | personal robbery | 2019-01-01 20:00:00 | POINT (-89.86 35.059) |
15214042 | personal robbery | 2019-01-01 21:58:00 | POINT (-89.929 35.058) |
15214050 | personal robbery | 2019-01-01 22:30:00 | POINT (-90.018 35.201) |
15214118 | personal robbery | 2019-01-02 09:38:00 | POINT (-89.96 35.14) |
15214242 | personal robbery | 2019-01-02 18:50:00 | POINT (-89.953 35.159) |
15214290 | personal robbery | 2019-01-02 23:30:00 | POINT (-89.95 35.026) |
15214295 | personal robbery | 2019-01-03 00:00:00 | POINT (-89.932 35.076) |
15214319 | personal robbery | 2019-01-03 03:00:00 | POINT (-90.021 35.033) |
15214428 | personal robbery | 2019-01-03 14:45:00 | POINT (-90.032 35.165) |
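To inspect the data yourself, a minimal sketch (assuming the sfhotspot and sf packages are installed) is to load the package and print the first rows. The table above shows the kind of output to expect.

```r
library(sf)
library(sfhotspot)

# memphis_robberies ships with sfhotspot: one row per robbery, with the
# location of each robbery stored as a POINT in the geometry column
head(memphis_robberies, n = 10)

# Confirm the geometry type of every row and the coordinate reference system
table(st_geometry_type(memphis_robberies))
st_crs(memphis_robberies)
```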
We can plot this raw data, but the resulting plot is not very informative (even with the points made semi-transparent), since there are too many points to see clear patterns.
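A sketch of such a plot, assuming ggplot2 is available, draws every robbery as a semi-transparent point. The `alpha` value of 0.1 is an illustrative choice, not a value taken from the package.

```r
library(ggplot2)
library(sf)
library(sfhotspot)

# Draw each robbery as a semi-transparent point; with 2,245 points, the
# overlapping markers still hide most of the spatial pattern
raw_plot <- ggplot() +
  geom_sf(data = memphis_robberies, alpha = 0.1) +
  theme_void()

raw_plot
```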
The `hotspot_count()` function produces an SF object with counts of the number of points in (by default) each cell in a grid of cells. As with all the functions in the package, this can be customised in various ways (see Common arguments, below).
```r
library(sfhotspot)

point_counts <- hotspot_count(memphis_robberies)
#> Cell size set to 0.00524 degrees automatically
point_counts
#> Simple feature collection with 2926 features and 1 field
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -90.1261 ymin: 34.99475 xmax: -89.72786 ymax: 35.26199
#> Geodetic CRS: WGS 84
#> # A tibble: 2,926 × 2
#> n geometry
#> * <dbl> <POLYGON [°]>
#> 1 0 ((-90.08418 34.99475, -90.07894 34.99475, -90.07894 34.99999, -90.0841…
#> 2 0 ((-90.07894 34.99475, -90.0737 34.99475, -90.0737 34.99999, -90.07894 …
#> 3 0 ((-90.0737 34.99475, -90.06846 34.99475, -90.06846 34.99999, -90.0737 …
#> 4 0 ((-90.06846 34.99475, -90.06322 34.99475, -90.06322 34.99999, -90.0684…
#> 5 0 ((-90.06322 34.99475, -90.05798 34.99475, -90.05798 34.99999, -90.0632…
#> 6 0 ((-90.05798 34.99475, -90.05274 34.99475, -90.05274 34.99999, -90.0579…
#> 7 0 ((-90.05274 34.99475, -90.0475 34.99475, -90.0475 34.99999, -90.05274 …
#> 8 0 ((-90.0475 34.99475, -90.04226 34.99475, -90.04226 34.99999, -90.0475 …
#> 9 0 ((-90.04226 34.99475, -90.03702 34.99475, -90.03702 34.99999, -90.0422…
#> 10 0 ((-90.03702 34.99475, -90.03178 34.99475, -90.03178 34.99999, -90.0370…
#> # ℹ 2,916 more rows
```
We can then plot that grid of cells.
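One way to do this, sketched here under the assumption that ggplot2 is used for mapping, is to fill each cell according to its count in the `n` column. The counts are recomputed with `quiet = TRUE` so the block runs on its own; the colour scale is an illustrative choice.

```r
library(ggplot2)
library(sf)
library(sfhotspot)

# Count robberies in each cell of an automatically sized grid
point_counts <- hotspot_count(memphis_robberies, quiet = TRUE)

# Fill each grid cell by its count; cell borders are hidden with colour = NA
count_plot <- ggplot() +
  geom_sf(mapping = aes(fill = n), data = point_counts, colour = NA) +
  scale_fill_distiller(direction = 1)

count_plot
```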
The `hotspot_kde()` function can be used to calculate kernel density estimates for each cell in a grid. The kernel density estimation (KDE) can be customised using the `bandwidth` and `bandwidth_adjust` arguments. This function also accepts the arguments explained in the Common arguments section, below.
If you do not specify any optional arguments, `hotspot_kde()` will try to choose reasonable default values.
The KDE algorithm requires projected co-ordinates (i.e. not longitudes
and latitudes), so we must first transform the data to use an
appropriate local projected co-ordinate system.
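A sketch of this step follows. It assumes the EPSG:2843 code (NAD83(HARN) / Tennessee, in metres), chosen because it matches the projected CRS reported in the output below; any suitable local projected CRS should work. The result is stored as `robbery_kde` so it can be plotted afterwards.

```r
library(sf)
library(sfhotspot)

# Transform from longitude/latitude (WGS 84) to a projected CRS in metres.
# ASSUMPTION: EPSG:2843 is used because it matches the CRS name in the output.
robbery_kde <- memphis_robberies |>
  st_transform("EPSG:2843") |>
  # quiet = TRUE suppresses messages about automatically chosen parameters
  hotspot_kde(quiet = TRUE)

robbery_kde
```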
```
#> Simple feature collection with 3301 features and 2 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: 223497.9 ymin: 80772.86 xmax: 260497.9 ymax: 110272.9
#> Projected CRS: NAD83(HARN) / Tennessee
#> # A tibble: 3,301 × 3
#> n kde geometry
#> * <dbl> <dbl> <POLYGON [m]>
#> 1 0 20.7 ((229497.9 80772.86, 229497.9 81272.86, 229997.9 81272.86, 22999…
#> 2 0 25.1 ((229997.9 80772.86, 229997.9 81272.86, 230497.9 81272.86, 23049…
#> 3 0 30.2 ((230497.9 80772.86, 230497.9 81272.86, 230997.9 81272.86, 23099…
#> 4 0 35.6 ((230997.9 80772.86, 230997.9 81272.86, 231497.9 81272.86, 23149…
#> 5 0 40.7 ((231497.9 80772.86, 231497.9 81272.86, 231997.9 81272.86, 23199…
#> 6 0 45.2 ((231997.9 80772.86, 231997.9 81272.86, 232497.9 81272.86, 23249…
#> 7 0 48.4 ((232497.9 80772.86, 232497.9 81272.86, 232997.9 81272.86, 23299…
#> 8 0 50.1 ((232997.9 80772.86, 232997.9 81272.86, 233497.9 81272.86, 23349…
#> 9 0 50.2 ((233497.9 80772.86, 233497.9 81272.86, 233997.9 81272.86, 23399…
#> 10 1 48.6 ((233997.9 80772.86, 233997.9 81272.86, 234497.9 81272.86, 23449…
#> # ℹ 3,291 more rows
```
Again, we can plot the result.
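```r
library(ggplot2)

# Fill each cell by its kernel density estimate, semi-transparent and
# without cell borders
ggplot() +
  geom_sf(
    mapping = aes(fill = kde),
    data = robbery_kde,
    alpha = 0.75,
    colour = NA
  ) +
  scale_fill_distiller(direction = 1)
```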
We can adjust the appearance of the KDE layer on this map by specifying optional arguments to `hotspot_kde()`. In particular, the `bandwidth_adjust` argument is useful for controlling the level of detail visible in the density layer: use values of `bandwidth_adjust` below 1 to show more detail, and values above 1 to show a smoother density surface.
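For example, a sketch comparing a more detailed and a smoother surface might look like this. The values 0.5 and 2 are illustrative, and the EPSG:2843 projection is an assumption matching the CRS shown in the KDE output above.

```r
library(sf)
library(sfhotspot)

# ASSUMPTION: EPSG:2843 (NAD83(HARN) / Tennessee, metres) as the projected CRS
robberies_projected <- st_transform(memphis_robberies, "EPSG:2843")

# bandwidth_adjust below 1 shrinks the default bandwidth: more local detail
kde_detailed <- hotspot_kde(robberies_projected, bandwidth_adjust = 0.5, quiet = TRUE)

# bandwidth_adjust above 1 widens the default bandwidth: a smoother surface
kde_smooth <- hotspot_kde(robberies_projected, bandwidth_adjust = 2, quiet = TRUE)
```

Each result can then be mapped with the same `ggplot()` code used for `robbery_kde` to compare the two surfaces side by side.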
All the functions in this package work on a grid of cells, which can be customised using one or more of these common arguments:

- `cell_size` specifies the size of each equally spaced grid cell, using the same units (metres, degrees, etc.) as used in the sf data frame given in the `data` argument. Ignored if `grid` is not `NULL`. If this argument and `grid` are both `NULL` (the default), the cell size will be calculated automatically.
- `grid_type` specifies whether the grid should be made up of squares (`"rect"`, the default) or hexagons (`"hex"`). Ignored if `grid` is not `NULL`.
- `grid` specifies an sf data frame containing polygons, which will be used as the grid for which counts are made.
- `quiet` specifies whether to suppress the messages reporting the values of any parameters (such as `cell_size`) that have been set automatically.

If `grid` and `cell_size` are both `NULL`, the cell size will be set so that there are 50 cells on the shorter side of the grid. If the `data` SF object is projected in metres or feet, the number of cells will be adjusted upwards so that the cell size is a multiple of 100.
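The sketch below shows these arguments with `hotspot_count()`: first a hexagonal grid with an explicit cell size, then a ready-made grid passed through `grid`. The 1,000 m and 2,000 m sizes, the EPSG:2843 projection and the `my_grid` object are illustrative assumptions, not package defaults.

```r
library(sf)
library(sfhotspot)

# ASSUMPTION: EPSG:2843 (metres), so cell_size is interpreted in metres
robberies_projected <- st_transform(memphis_robberies, "EPSG:2843")

# Hexagonal cells with an explicit 1,000 m cell size; quiet = TRUE
# suppresses messages about automatically chosen parameters
hex_counts <- hotspot_count(
  robberies_projected,
  cell_size = 1000,
  grid_type = "hex",
  quiet = TRUE
)

# A ready-made grid of 2,000 m squares (my_grid is a hypothetical example);
# when grid is supplied, cell_size and grid_type are ignored
my_grid <- st_sf(geometry = st_make_grid(robberies_projected, cellsize = 2000))
grid_counts <- hotspot_count(robberies_projected, grid = my_grid, quiet = TRUE)
```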