| Title: | Spatial Data Science Complementary Features | 
| Version: | 0.8.1 | 
| Description: | Wrapping and supplementing commonly used functions in the R ecosystem related to spatial data science, while serving as a basis for other packages maintained by Wenbo Lv. | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| URL: | https://stscl.github.io/sdsfun/, https://github.com/stscl/sdsfun | 
| BugReports: | https://github.com/stscl/sdsfun/issues | 
| RoxygenNote: | 7.3.3 | 
| Depends: | R (≥ 4.1.0) | 
| LinkingTo: | Rcpp, RcppArmadillo | 
| Imports: | dplyr, geosphere, magrittr, pander, purrr, sf, spdep, stats, tibble, utils | 
| Suggests: | ggplot2, Rcpp, RcppArmadillo, terra, testthat (≥ 3.0.0) | 
| Config/testthat/edition: | 3 | 
| NeedsCompilation: | yes | 
| Packaged: | 2025-09-22 09:05:25 UTC; 31809 | 
| Author: | Wenbo Lv | 
| Maintainer: | Wenbo Lv <lyu.geosocial@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-09-22 10:30:02 UTC | 
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Value
NULL (this is the magrittr pipe operator)
check for NA values in a tibble
Description
check for NA values in a tibble
Usage
check_tbl_na(tbl)
Arguments
| tbl | A tibble. | 
Value
A logical value.
Examples
demotbl = tibble::tibble(x = c(1,2,3,NA,1),
                         y = c(NA,NA,1:3),
                         z = 1:5)
demotbl
check_tbl_na(demotbl)
(partial) correlation test
Description
(partial) correlation test
Usage
cor_test(x, y, z = NULL, level = 0.05)
Arguments
| x | A numeric vector representing the first variable. | 
| y | A numeric vector representing the second variable. | 
| z | An optional numeric vector or matrix of control variables. If provided, partial correlation is computed. | 
| level | (optional) Significance level. Default is 0.05. | 
Value
A numeric vector
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
cor_test(gzma$PS_Score,gzma$EL_Score)
cor_test(gzma$PS_Score,gzma$EL_Score,gzma$OH_Score)
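As a rough illustration of what supplying z means, a partial correlation can be obtained by correlating the residuals of x and y after regressing each on the control variables. The base R sketch below assumes that textbook definition; `pcor_sketch` is a hypothetical helper and not the package's implementation, and it returns only the coefficient.

```r
# Hypothetical helper: partial correlation of x and y given z,
# computed from the residuals of two linear regressions.
pcor_sketch = function(x, y, z) {
  rx = stats::residuals(stats::lm(x ~ z))  # part of x not explained by z
  ry = stats::residuals(stats::lm(y ~ z))  # part of y not explained by z
  stats::cor(rx, ry)
}

set.seed(1)
z = rnorm(50)
x = z + rnorm(50)
y = z + rnorm(50)
# The plain correlation is inflated by the shared dependence on z
c(plain = cor(x, y), partial = pcor_sketch(x, y, z))
```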
discretization
Description
discretization
Usage
discretize_vector(
  x,
  n,
  method = "natural",
  breakpoint = NULL,
  sampleprob = 0.15,
  thr = 0.4,
  seed = 123456789
)
Arguments
| x | A continuous numeric vector. | 
| n | (optional) The number of discretized classes. | 
| method | (optional) The method of discretization, default is "natural". | 
| breakpoint | (optional) Break points for manually splitting data. Default is NULL. | 
| sampleprob | (optional) Sampling proportion, used when the data size exceeds a set threshold. Default is 0.15. | 
| thr | (optional) Threshold for controlling iteration, applicable only to headtails breaks. Default is 0.4. | 
| seed | (optional) Random seed number, default is 123456789. | 
Value
A discretized integer vector
Examples
xvar = c(22361, 9573, 4836, 5309, 10384, 4359, 11016, 4414, 3327, 3408,
         17816, 6909, 6936, 7990, 3758, 3569, 21965, 3605, 2181, 1892,
         2459, 2934, 6399, 8578, 8537, 4840, 12132, 3734, 4372, 9073,
         7508, 5203)
discretize_vector(xvar, n = 5, method = 'natural')
transforming a category tibble into the corresponding dummy variable tibble
Description
transforming a category tibble into the corresponding dummy variable tibble
Usage
dummy_tbl(tbl)
Arguments
| tbl | A tibble. | 
Value
A tibble
Examples
a = tibble::tibble(x = 1:3,y = 4:6)
dummy_tbl(a)
transforming a categorical variable into dummy variables
Description
transforming a categorical variable into dummy variables
Usage
dummy_vec(x)
Arguments
| x | An integer vector, or a vector that can be converted into one. | 
Value
A matrix.
Examples
dummy_vec(c(1,1,3,2,4,6))
get variable names in a formula and data
Description
get variable names in a formula and data
Usage
formula_varname(formula, data)
Arguments
| formula | A formula. | 
| data | A data.frame, tibble or sf object. | 
Value
A list.
- yname: Dependent variable name
- xname: Independent variable names
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
formula_varname(PS_Score ~ EL_Score + OH_Score, gzma)
formula_varname(PS_Score ~ ., gzma)
spatial fuzzy overlay
Description
spatial fuzzy overlay
Usage
fuzzyoverlay(formula, data, method = "and")
Arguments
| formula | A formula of spatial fuzzy overlay. | 
| data | A data.frame or tibble of discretized data. | 
| method | (optional) Overlay method, either "and" or "or". Default is "and". | 
Value
A numeric vector.
Note
Independent variables in the data provided to fuzzyoverlay() must be discretized
variables, and the dependent variable must be a continuous variable.
Examples
set.seed(42)
sim = tibble::tibble(y = stats::runif(7,0,10),
                     x1 = c(1,rep(2,3),rep(3,3)),
                     x2 = c(rep(1,2),rep(2,2),rep(3,3)))
fo1 = fuzzyoverlay(y~x1+x2,data = sim, method = 'and')
fo1
fo2 = fuzzyoverlay(y~x1+x2,data = sim, method = 'or')
fo2
generate subsets of a set
Description
generate subsets of a set
Usage
generate_subsets(set, empty = TRUE, self = TRUE)
Arguments
| set | A vector. | 
| empty | (optional) Whether to include the empty set. Default is TRUE. | 
| self | (optional) Whether to include the set itself. Default is TRUE. | 
Value
A list.
Examples
generate_subsets(letters[1:3])
generate_subsets(letters[1:3],empty = FALSE)
generate_subsets(letters[1:3],self = FALSE)
generate_subsets(letters[1:3],empty = FALSE,self = FALSE)
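To make the effect of empty and self concrete, the sketch below enumerates subsets by size with utils::combn(). It is an illustration only: `subsets_sketch` is a hypothetical helper, and the ordering of the result and the representation of the empty set are assumptions that may differ from what generate_subsets() returns.

```r
# Hypothetical helper: enumerate all subsets of a vector, size by size.
subsets_sketch = function(set, empty = TRUE, self = TRUE) {
  n = length(set)
  sizes = seq_len(n)
  if (!self) sizes = sizes[sizes < n]        # drop the full set
  res = unlist(
    lapply(sizes, function(k) utils::combn(set, k, simplify = FALSE)),
    recursive = FALSE
  )
  if (empty) res = c(list(set[0]), res)      # prepend the empty set
  res
}

length(subsets_sketch(letters[1:3]))                              # 2^3 = 8 subsets
length(subsets_sketch(letters[1:3], empty = FALSE, self = FALSE)) # 6 proper non-empty subsets
```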
only geodetector q-value
Description
only geodetector q-value
Usage
geodetector_q(y, hs)
Arguments
| y | Dependent variable | 
| hs | Independent variable | 
Value
A numeric value
Examples
geodetector_q(y = 1:7, hs = c('x',rep('y',3),rep('z',3)))
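For orientation, the geodetector q statistic is usually defined as one minus the ratio of the within-strata sum of squares to the total sum of squares. The base R sketch below assumes that standard definition; `q_sketch` is a hypothetical helper, not the package's compiled implementation.

```r
# Hypothetical helper: q = 1 - SSW / SST under the usual geodetector definition.
q_sketch = function(y, hs) {
  sst = sum((y - mean(y))^2)                                  # total sum of squares
  ssw = sum(tapply(y, hs, function(v) sum((v - mean(v))^2)))  # within-strata sum of squares
  1 - ssw / sst
}

# Same toy input as the example above: SST = 28, SSW = 0 + 2 + 2 = 4
q_sketch(y = 1:7, hs = c('x', rep('y', 3), rep('z', 3)))  # 1 - 4/28 under this definition
```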
hierarchical clustering with spatial soft constraints
Description
hierarchical clustering with spatial soft constraints
Usage
hclustgeo_disc(
  data,
  n,
  alpha = 0.5,
  D1 = NULL,
  hclustm = "ward.D2",
  scale = TRUE,
  wt = NULL,
  ...
)
Arguments
| data | An sf object. | 
| n | The number of hierarchical clustering classes, which can be a numeric value or vector. | 
| alpha | (optional) A positive value between 0 and 1. Default is 0.5. | 
| D1 | (optional) A  | 
| hclustm | (optional) The agglomeration method to be used, default is "ward.D2". | 
| scale | (optional) Whether to scale the dissimilarity matrices, default is TRUE. | 
| wt | (optional) Vector with the weights of the observations. Default is NULL. | 
| ... | (optional) Other arguments passed to  | 
Value
The grouped membership: a vector if n is a scalar, a matrix (columns correspond to elements
of n) if not.
Note
This is a C++ enhanced implementation of the hclustgeo function in the ClustGeo package.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
gzma$group = hclustgeo_disc(gzma,5,alpha = 0.75)
plot(gzma["group"])
construct inverse distance weight
Description
Function for constructing inverse distance weight.
Usage
inverse_distance_swm(sfj, power = 1, bandwidth = NULL)
Arguments
| sfj | Vector object that can be converted to an sf object. | 
| power | (optional) Default is 1. Set to 2 for gravity weights. | 
| bandwidth | (optional) When the distance is greater than bandwidth, the corresponding part of the weight matrix is set to 0. Default is NULL. | 
Details
The inverse distance weight formula is
w_{ij} = 1 / d_{ij}^\alpha
where d_{ij} is the distance between observations i and j, and \alpha is the power argument.
Value
An inverse distance weight matrix of class matrix.
Examples
library(sf)
pts = read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
wt = inverse_distance_swm(pts)
wt[1:5,1:5]
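The formula in Details can be tried out on a handful of planar coordinates without any spatial classes. The sketch below is an illustration under stated assumptions: `idw_sketch` is a hypothetical helper, distances are plain Euclidean, and the diagonal is set to 0; the package may compute distances and handle self-weights differently.

```r
# Hypothetical helper: w_ij = 1 / d_ij^power from a coordinate matrix.
idw_sketch = function(coords, power = 1, bandwidth = NULL) {
  d = as.matrix(stats::dist(coords))      # Euclidean distances (assumption)
  w = 1 / d^power
  diag(w) = 0                             # no self-weight (assumption)
  if (!is.null(bandwidth)) w[d > bandwidth] = 0  # cut off distant pairs
  w
}

# Three points forming a 3-4-5 right triangle
xy = cbind(x = c(0, 3, 0), y = c(0, 0, 4))
idw_sketch(xy)                              # weights 1/3, 1/4 and 1/5
idw_sketch(xy, power = 2, bandwidth = 4.5)  # the pair at distance 5 gets weight 0
```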
determine optimal spatial data discretization for individual variables
Description
Function for determining optimal spatial data discretization for individual variables based on locally estimated scatterplot smoothing (LOESS) model.
Usage
loess_optnum(qvec, discnumvec, increase_rate = 0.05)
Arguments
| qvec | A numeric vector of q statistics. | 
| discnumvec | A numeric vector of break numbers corresponding to qvec. | 
| increase_rate | (optional) The critical increase rate of the number of discretization. Default is 0.05. | 
Value
A two-element numeric vector.
- discnum: optimal number of spatial data discretization
- increase_rate: the critical increase rate of the number of discretization
Note
When increase_rate is not satisfied by the calculation, the discrete number corresponding
to the highest q statistic is returned.
Note that sdsfun sorts discnumvec from smallest to largest and keeps qvec in
one-to-one correspondence with discnumvec.
Examples
qv = c(0.26045642,0.64120405,0.43938704,0.95165535,0.46347836,
       0.25385338,0.78778726,0.95938330,0.83247885,0.09285196)
loess_optnum(qv,3:12)
global spatial autocorrelation test
Description
global spatial autocorrelation test
Usage
moran_test(sfj, wt = NULL, alternative = "greater", symmetrize = FALSE)
Arguments
| sfj | An sf object. | 
| wt | (optional) Spatial weight matrix. Must be a matrix. Default is NULL. | 
| alternative | (optional) Specification of the alternative hypothesis. Default is "greater". | 
| symmetrize | (optional) Whether or not to symmetrize the asymmetrical spatial weight matrix wt by: 1/2 * (wt + wt'). Default is FALSE. | 
Value
A list utilizing a result tibble to store the following information for each variable:
- MoranI: observed value of the Moran coefficient
- EI: expected value of Moran's I
- VarI: variance of Moran's I (under normality)
- ZI: standardized Moran coefficient
- PI: p-value of the test statistic
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
moran_test(gzma)
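The MoranI and EI entries follow the usual definition of Moran's I, which can be reproduced by hand on a small example. The sketch below assumes the textbook formulas; `moran_sketch` is a hypothetical helper that handles a single variable and omits the variance, z-score and p-value reported by moran_test().

```r
# Hypothetical helper: Moran's I and its expected value for one variable.
moran_sketch = function(x, wt, symmetrize = FALSE) {
  if (symmetrize) wt = 0.5 * (wt + t(wt))   # 1/2 * (wt + wt')
  n = length(x)
  z = x - mean(x)                           # deviations from the mean
  I = (n / sum(wt)) * sum(wt * outer(z, z)) / sum(z^2)
  c(MoranI = I, EI = -1 / (n - 1))
}

# Four locations on a line, binary weights between adjacent locations
wt = matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
moran_sketch(c(1, 2, 3, 4), wt)  # MoranI = 1/3, EI = -1/3
```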
normalization
Description
normalization
Usage
normalize_vector(x, to_left = 0, to_right = 1)
Arguments
| x | A continuous numeric vector. | 
| to_left | (optional) Specified minimum. Default is 0. | 
| to_right | (optional) Specified maximum. Default is 1. | 
Value
A normalized continuous vector.
Examples
normalize_vector(c(-5,1,5,0.01,0.99))
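A common way to rescale a vector onto a target interval is min-max scaling. The sketch below assumes that is what to_left and to_right control; `normalize_sketch` is a hypothetical helper, and edge cases such as constant or missing input are not handled.

```r
# Hypothetical helper: min-max scaling onto [to_left, to_right].
normalize_sketch = function(x, to_left = 0, to_right = 1) {
  (x - min(x)) / (max(x) - min(x)) * (to_right - to_left) + to_left
}

x = c(-5, 1, 5, 0.01, 0.99)
range(normalize_sketch(x))          # smallest value maps to 0, largest to 1
range(normalize_sketch(x, -1, 1))   # same data rescaled onto [-1, 1]
```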
remove variable linear trend based on covariate
Description
remove variable linear trend based on covariate
Usage
rm_lineartrend(formula, data, method = c("cpp", "r"))
Arguments
| formula | A formula. | 
| data | The observation data. | 
| method | (optional) The implementation to use, either "cpp" or "r". Default is "cpp". | 
Value
A numeric vector.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
rm_lineartrend(PS_Score ~ ., gzma)
rm_lineartrend(PS_Score ~ ., gzma, method = "r")
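One plausible reading of removing a linear trend is to keep the ordinary least squares residuals of the response on the covariates. The sketch below assumes that reading; `detrend_sketch` is a hypothetical helper, and whether the package returns raw residuals or a shifted version is not stated here.

```r
# Hypothetical helper: residuals of a linear model as the detrended variable.
detrend_sketch = function(formula, data) {
  stats::residuals(stats::lm(formula, data = data))
}

set.seed(42)
df = data.frame(x = 1:20)
df$y = 2 * df$x + rnorm(20)      # linear trend in x plus noise
r = detrend_sketch(y ~ x, df)
cor(r, df$x)                     # numerically zero: the linear trend is gone
```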
extract locations
Description
Extract locations of sf objects.
Usage
sf_coordinates(sfj)
Arguments
| sfj | An sf object. | 
Value
A matrix.
Examples
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
sf_coordinates(pts)
generates distance matrix
Description
Generates distance matrix for sf object
Usage
sf_distance_matrix(sfj)
Arguments
| sfj | An sf object. | 
Value
A matrix.
Examples
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
pts_distm = sf_distance_matrix(pts)
pts_distm[1:5,1:5]
sf object geometry column name
Description
Get the geometry column name of an sf object
Usage
sf_geometry_name(sfj)
Arguments
| sfj | An sf object. | 
Value
A character.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
sf_geometry_name(gzma)
sf object geometry type
Description
Get the geometry type of an sf object
Usage
sf_geometry_type(sfj)
Arguments
| sfj | An sf object. | 
Value
A lowercase character vector
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
sf_geometry_type(gzma)
generates cgcs2000 Gauss-Kruger projection epsg coding character
Description
Generates the EPSG code, as a character string, of the Gauss-Kruger projection that corresponds
to an sf object under the CGCS2000 spatial reference.
Usage
sf_gk_proj_cgcs2000(sfj, degree = 6L)
Arguments
| sfj | An sf object. | 
| degree | (optional) Zone width of the Gauss-Kruger projection in degrees, 3 or 6. Default is 6. | 
Value
A character.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun')) |>
  sf::st_transform(4490)
sf_gk_proj_cgcs2000(gzma,3)
sf_gk_proj_cgcs2000(gzma,6)
generates wgs84 utm projection epsg coding character
Description
Generates the EPSG code, as a character string, of the UTM projection that corresponds
to an sf object under the WGS84 spatial reference.
Usage
sf_utm_proj_wgs84(sfj)
Arguments
| sfj | An sf object. | 
Value
A character.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
sf_utm_proj_wgs84(gzma)
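The mapping from a location to a WGS84 UTM EPSG code follows a simple rule: zones are 6 degrees wide starting at 180°W, and codes are 32600 plus the zone number in the northern hemisphere and 32700 plus the zone number in the southern one. The sketch below applies that rule to a single longitude/latitude pair; `utm_epsg_sketch` is a hypothetical helper, the "EPSG:" prefix is an assumption about the output format, and the special zones around Norway and Svalbard are ignored.

```r
# Hypothetical helper: WGS84 UTM EPSG code for one lon/lat pair (degrees).
utm_epsg_sketch = function(lon, lat) {
  zone = floor((lon + 180) / 6) + 1          # 6-degree zones, numbered from 180W
  base = if (lat >= 0) 32600 else 32700      # northern vs southern hemisphere
  paste0("EPSG:", base + zone)
}

utm_epsg_sketch(113.3, 23.1)   # zone 49, northern hemisphere: "EPSG:32649"
utm_epsg_sketch(-70, -33)      # zone 19, southern hemisphere: "EPSG:32719"
```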
generates voronoi diagram
Description
Generates Voronoi diagram (Thiessen polygons) for sf object
Usage
sf_voronoi_diagram(sfj)
Arguments
| sfj | An sf object. | 
Value
An sf object of polygon geometry type, or an object that can be converted to one by sf::st_as_sf().
Note
Only sf objects of (multi-)point type are supported for generating a Voronoi diagram. The returned result includes only the geometry column.
Examples
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
pts_v = sf_voronoi_diagram(pts)
library(ggplot2)
ggplot() +
  geom_sf(data = pts_v, color = 'red',
          fill = 'transparent') +
  geom_sf(data = pts, color = 'blue', size = 1.25) +
  theme_void()
only spade power of spatial determinant
Description
only spade power of spatial determinant
Usage
spade_psd(y, hs, wt)
Arguments
| y | Dependent variable | 
| hs | Independent variable | 
| wt | Spatial weight matrix | 
Value
A numeric value
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
wt1 = inverse_distance_swm(gzma)
spade_psd(y = gzma$PS_Score,
          hs = discretize_vector(gzma$PS_Score,5),
          wt = wt1)
constructs spatial weight matrices based on contiguity
Description
Constructs spatial weight matrices based on contiguity via spdep package.
Usage
spdep_contiguity_swm(
  sfj,
  queen = TRUE,
  k = NULL,
  order = 1L,
  cumulate = TRUE,
  style = "W",
  zero.policy = TRUE
)
Arguments
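| sfj | An sf object. | 
| queen | (optional) If TRUE, use queen contiguity; if FALSE, use rook contiguity. Default is TRUE. | 
| k | (optional) The number of nearest neighbours. Ignore this parameter when not using distance based neighbours to construct spatial weight matrices. | 
| order | (optional) The order of the adjacency object. Default is 1. | 
| cumulate | (optional) Whether to accumulate adjacency objects. Default is TRUE. | 
| style | (optional) Spatial weights coding style, such as "W" (row-standardised) or "B" (binary). Default is "W". | 
| zero.policy | (optional) If TRUE, permit observations with no neighbours; if FALSE, stop with an error for any empty neighbour set. Default is TRUE. | 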
Value
A matrix
Note
When k is set to a positive value, K-nearest-neighbour weights are used.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
wt1 = spdep_contiguity_swm(gzma, k = 6, style = 'B')
wt2 = spdep_contiguity_swm(gzma, queen = TRUE, style = 'B')
wt3 = spdep_contiguity_swm(gzma, queen = FALSE, order = 2, style = 'B')
constructs spatial weight matrices based on distance
Description
Constructs spatial weight matrices based on distance via spdep package.
Usage
spdep_distance_swm(
  sfj,
  kernel = NULL,
  k = NULL,
  bandwidth = NULL,
  power = 1,
  style = "W",
  zero.policy = TRUE
)
Arguments
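| sfj | An sf object. | 
| kernel | (optional) The kernel function, can be one of uniform, triangular, quadratic (epanechnikov), quartic and gaussian. Default is NULL. | 
| k | (optional) The number of nearest neighbours. Default is NULL. | 
| bandwidth | (optional) The bandwidth, default is NULL. | 
| power | (optional) Default is 1. | 
| style | (optional) Spatial weights coding style, such as "W" (row-standardised) or "B" (binary). Default is "W". | 
| zero.policy | (optional) If TRUE, permit observations with no neighbours; if FALSE, stop with an error for any empty neighbour set. Default is TRUE. | 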
Details
Five different kernel weight functions are available:
- uniform: K_{(z)} = 1/2, for \lvert z \rvert < 1
- triangular: K_{(z)} = 1 - \lvert z \rvert, for \lvert z \rvert < 1
- quadratic (epanechnikov): K_{(z)} = \frac{3}{4} \left( 1 - z^2 \right), for \lvert z \rvert < 1
- quartic: K_{(z)} = \frac{15}{16} {\left( 1 - z^2 \right)}^2, for \lvert z \rvert < 1
- gaussian: K_{(z)} = \frac{1}{\sqrt{2 \pi}} e^{- \frac{z^2}{2}}
For the equations above, z = d_{ij} / h_i, where h_i is the bandwidth.
Value
A matrix
Note
When kernel is set, distance weights based on the kernel function are used; otherwise
the inverse distance weight is used.
Examples
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
wt1 = spdep_distance_swm(pts, style = 'B')
wt2 = spdep_distance_swm(pts, kernel = 'gaussian')
wt3 = spdep_distance_swm(pts, k = 3, kernel = 'gaussian')
wt4 = spdep_distance_swm(pts, k = 3, kernel = 'gaussian', bandwidth = 10000)
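The five kernel functions listed in Details are straightforward to evaluate directly, which helps when choosing a bandwidth. The sketch below transcribes those formulas; `kernel_sketch` and its kernel labels are hypothetical, and the compactly supported kernels are taken to be 0 outside \lvert z \rvert < 1, which is an assumption about how the package treats distances beyond the bandwidth.

```r
# Hypothetical helper: evaluate a kernel at scaled distances z = d / h.
kernel_sketch = function(z, kernel = "gaussian") {
  inside = abs(z) < 1                        # support of the bounded kernels
  switch(kernel,
    uniform    = 0.5 * inside,
    triangular = (1 - abs(z)) * inside,
    quadratic  = 0.75 * (1 - z^2) * inside,
    quartic    = 15 / 16 * (1 - z^2)^2 * inside,
    gaussian   = exp(-z^2 / 2) / sqrt(2 * pi),
    stop("unknown kernel")
  )
}

d = c(0, 250, 500, 1500)   # distances
h = 1000                   # bandwidth
kernels = c("uniform", "triangular", "quadratic", "quartic", "gaussian")
# One column per kernel, one row per distance
sapply(kernels, function(k) kernel_sketch(d / h, k))
```

Only the gaussian kernel assigns a non-zero weight to the pair at distance 1500, because its support is unbounded.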
spatial linear models selection
Description
spatial linear models selection
Usage
spdep_lmtest(formula, data, listw = NULL)
Arguments
| formula | A formula for linear regression model. | 
| data | An sf object. | 
| listw | (optional) A listw object. Default is NULL. | 
Value
A list
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
spdep_lmtest(PS_Score ~ ., gzma)
construct neighbours list
Description
construct neighbours list
Usage
spdep_nb(sfj, queen = TRUE, k = NULL, order = 1L, cumulate = TRUE)
Arguments
| sfj | An sf object. | 
| queen | (optional) If TRUE, use queen contiguity; if FALSE, use rook contiguity. Default is TRUE. | 
| k | (optional) The number of nearest neighbours. Ignore this parameter when not using distance based neighbours. | 
| order | (optional) The order of the adjacency object. Default is 1. | 
| cumulate | (optional) Whether to accumulate adjacency objects. Default is TRUE. | 
Value
A neighbours list with class nb
Note
When k is set to a positive value, K-nearest neighbours are used.
Examples
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
nb1 = spdep_nb(pts, k = 6)
nb2 = spdep_nb(pts, queen = TRUE)
nb3 = spdep_nb(pts, queen = FALSE, order = 2)
spatial c(k)luster analysis by tree edge removal
Description
SKATER forms clusters by spatially partitioning data that has similar values for features of interest.
Usage
spdep_skater(sfj, k = 6, nb = NULL, ini = 5, ...)
Arguments
| sfj | An sf object. | 
| k | (optional) The number of clusters. Default is 6. | 
| nb | (optional) A neighbours list with class nb. Default is NULL. | 
| ini | (optional) The initial node in the minimal spanning tree. Default is 5. | 
| ... | (optional) Other parameters passed to spdep::skater(). | 
Value
A numeric vector of clusters.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
gzma_c = spdep_skater(gzma,8)
gzma$group = gzma_c
plot(gzma["group"])
spatial variance
Description
spatial variance
Usage
spvar(x, wt, method = c("cpp", "r"))
Arguments
| x | A numeric vector. | 
| wt | The spatial weight matrix. | 
| method | (optional) The method for calculating spatial variance, which can be chosen as either "cpp" or "r". Default is "cpp". | 
Details
The spatial variance formula is
\Gamma = \frac{\sum_i \sum_{j \neq i} \omega_{ij}\frac{(y_i-y_j)^2}{2}}{\sum_i \sum_{j \neq i} \omega_{ij}}
where y_i is the i-th element of x and \omega_{ij} is the element of wt for the pair (i, j).
Value
A numerical value.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
wt1 = inverse_distance_swm(gzma)
spvar(gzma$PS_Score,wt1)
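The formula in Details translates almost directly into base R. The sketch below is a plain transcription for illustration; `spvar_sketch` is a hypothetical helper and makes no claim about how the package's "cpp" or "r" methods are written.

```r
# Hypothetical helper: weighted mean of half squared differences over pairs i != j.
spvar_sketch = function(x, wt) {
  diag(wt) = 0                               # restrict both sums to j != i
  sum(wt * outer(x, x, "-")^2 / 2) / sum(wt)
}

# With equal weights for every pair, the result matches the sample variance
x = c(1, 2, 3)
spvar_sketch(x, matrix(1, 3, 3))
var(x)
```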
spatial stratified heterogeneity test
Description
spatial stratified heterogeneity test
Usage
ssh_test(y, hs)
Arguments
| y | Variable Y, continuous numeric vector. | 
| hs | Spatial stratification or classification of each explanatory variable. | 
Value
A tibble
Examples
ssh_test(y = 1:7, hs = c('x',rep('y',3),rep('z',3)))
standardization
Description
To calculate the Z-score using variance normalization, the formula is as follows:
Z = \frac{(x - mean(x))}{sd(x)}
Usage
standardize_vector(x)
Arguments
| x | A numeric vector | 
Value
A standardized numeric vector
Examples
standardize_vector(1:10)
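The Z-score formula in the Description can be checked by hand in base R. This is only a direct transcription of that formula, using stats::sd() (the sample standard deviation) as an assumption about which sd is meant.

```r
x = 1:10
z = (x - mean(x)) / sd(x)   # Z = (x - mean(x)) / sd(x)

# A standardized vector has mean 0 and standard deviation 1
round(c(mean = mean(z), sd = sd(z)), 10)
```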
convert discrete variables in a tibble to integers
Description
convert discrete variables in a tibble to integers
Usage
tbl_all2int(tbl)
Arguments
| tbl | A tibble, data.frame or sf object. | 
Value
A converted tibble, data.frame or sf object.
Examples
demotbl = tibble::tibble(x = c(1,2,3,3,1),
                         y = letters[1:5],
                         z = c(1L,1L,2L,2L,3L),
                         m = factor(letters[1:5],levels = letters[5:1]))
tbl_all2int(demotbl)
convert xyz tbl to matrix
Description
convert xyz tbl to matrix
Usage
tbl_xyz2mat(tbl, x = 1, y = 2, z = 3)
Arguments
| tbl | A tibble or data.frame. | 
| x | (optional) The x-axis coordinates column number, default is 1. | 
| y | (optional) The y-axis coordinates column number, default is 2. | 
| z | (optional) The z (attribute) coordinates column number, default is 3. | 
Value
A list.
- z_attrs_matrix: A matrix with attribute information.
- x_coords_matrix: A matrix with the x-axis coordinates.
- y_coords_matrix: A matrix with the y-axis coordinates.
Examples
set.seed(42)
lon = rep(1:3,each = 3)
lat = rep(1:3,times = 3)
zattr = rnorm(9, mean = 10, sd = 1)
demodf = data.frame(x = lon, y = lat, z = zattr)
demodf
tbl_xyz2mat(demodf)