Version: | 1.2-2 |
Date: | 2020-05-08 |
Title: | Fitting the Centered Autologistic and Sparse Spatial Generalized Linear Mixed Models for Areal Data |
Type: | Package |
Author: | John Hughes <drjphughesjr@gmail.com> and Xiaohui Cui |
Maintainer: | John Hughes <drjphughesjr@gmail.com> |
Depends: | Rcpp, batchmeans |
Suggests: | parallel, pbapply |
Description: | Provides tools for analyzing spatial data, especially non- Gaussian areal data. The current version supports the sparse restricted spatial regression model of Hughes and Haran (2013) <doi:10.1111/j.1467-9868.2012.01041.x>, the centered autologistic model of Caragea and Kaiser (2009) <doi:10.1198/jabes.2009.07032>, and the Bayesian spatial filtering model of Hughes (2017) <doi:10.48550/arXiv.1706.04651>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
LinkingTo: | Rcpp, RcppArmadillo |
RcppModules: | ngspatialmod |
RoxygenNote: | 5.0.1 |
NeedsCompilation: | yes |
Packaged: | 2020-05-08 13:33:18 UTC; jphughesjr |
Repository: | CRAN |
Date/Publication: | 2020-05-08 16:10:03 UTC |
Adjacency matrix for the infant mortality data.
Description
A contains the adjacency matrix for the infant mortality data analyzed in (Hughes and Haran, 2013).
Usage
data(A)
Source
The data were obtained from the 2008 Area Resource File (ARF), a county-level database maintained by the Bureau of Health Professions, Health Resources and Services Administration, US Department of Health and Human Services.
Examples
data(A)
Return an adjacency matrix for a square lattice.
Description
Return an adjacency matrix for a square lattice.
Usage
adjacency.matrix(m, n = NULL)
Arguments
m |
the number of rows in the lattice. |
n |
the number of columns in the lattice. Defaults to |
Details
This function builds the adjacency matrix for the m
by n
square lattice.
Value
A matrix A
of 0s and 1s, where A_{ij}
is equal to 1 iff vertices i
and j
are adjacent.
Fit a centered autologistic model using maximum pseudolikelihood estimation or MCMC for Bayesian inference.
Description
Fit a centered autologistic model using maximum pseudolikelihood estimation or MCMC for Bayesian inference.
Usage
autologistic(formula, data, A, method = c("PL", "Bayes"), model = TRUE,
x = FALSE, y = FALSE, verbose = FALSE, control = list())
Arguments
formula |
an object of class |
data |
an optional data frame, list, or environment (or object coercible by |
A |
the adjacency matrix for the underlying graph. The matrix need not be binary, but it must be numeric and symmetric. |
method |
the method to use for inference. “ |
model |
a logical value indicating whether the model frame should be included as a component of the returned value. |
x |
a logical value indicating whether the model matrix used in the fitting process should be returned as a component of the returned value. |
y |
a logical value indicating whether the response vector used in the fitting process should be returned as a component of the returned value. |
verbose |
a logical value indicating whether to print various messages to the screen, including progress updates when |
control |
a list of the following control parameters.
|
Details
This function fits the centered autologistic model of Caragea and Kaiser (2009) using maximum pseudolikelihood estimation or MCMC for Bayesian inference. The joint distribution for the centered autologistic model is
\pi(Z\mid\theta)=c(\theta)^{-1}\exp\left(Z^\prime X\beta - \eta Z^\prime A\mu + \frac{\eta}{2}Z^\prime AZ\right),
where \theta = (\beta^\prime, \eta)^\prime
is the parameter vector, c(\theta)
is an intractable normalizing function, Z
is the response vector, X
is the design matrix,
\beta
is a (p-1)
-vector of regression coefficients, A
is the adjacency matrix for the underlying graph, \mu
is the vector of independence expectations,
and \eta
is the spatial dependence parameter.
Maximum pseudolikelihood estimation sidesteps the intractability of c(\theta)
by maximizing the product of the conditional likelihoods.
Confidence intervals are then obtained using a parametric bootstrap or a normal approximation, i.e., sandwich estimation. The bootstrap datasets are generated by perfect sampling (rautologistic
).
The bootstrap samples can be generated in parallel using the parallel package.
Bayesian inference is obtained using the auxiliary variable algorithm of Moller et al. (2006).
The auxiliary variables are generated by perfect sampling.
The prior distributions are (1) zero-mean normal with independent coordinates for \beta
, and (2) uniform for \eta
.
The common standard deviation for the normal prior can be supplied by the user as control parameter sigma
. The default is 1,000. The uniform prior has support [0, 2] by default, but the right endpoint can be supplied (as control parameter eta.max
) by the user.
The posterior covariance matrix of \theta
is estimated using samples obtained during a training run. The default number of iterations for the training run is 100,000, but this can be controlled by the user (via parameter trainit
). The estimated covariance matrix is then used as the proposal variance for a Metropolis-Hastings random walk. The proposal distribution is normal. The posterior samples obtained during the second run are used for inference. The length of the run can be controlled by the user via parameters minit
, maxit
, and tol
. The first determines the minimum number of iterations. If minit
has been reached, the sampler will terminate when maxit
is reached or all Monte Carlo standard errors are smaller than tol
, whichever happens first. The default values for minit
, maxit
, and tol
are 100,000; 1,000,000; and 0.01, respectively.
Value
autologistic
returns an object of class “autologistic
”, which is a list containing the following components.
coefficients |
the point estimate of |
fitted.values |
the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. |
linear.predictors |
the linear fit on link scale. |
residuals |
the response residuals. |
iter |
the size of the bootstrap/posterior sample. |
sample |
(where relevant) an |
mcse |
(where relevant) a |
S |
(where relevant) the estimated sandwich matrix. |
accept |
(Bayes) the acceptance rate for the MCMC sampler. |
y |
if requested (the default), the |
X |
if requested, the model matrix. |
model |
if requested (the default), the model frame. |
call |
the matched call. |
formula |
the formula supplied. |
method |
the method used for inference. |
convergence |
the integer code returned by |
message |
a character string to go along with |
terms |
the |
data |
the |
xlevels |
(where relevant) a record of the levels of the factors used in fitting. |
control |
a list containing the names and values of the control parameters. |
value |
the value of the negative log pseudolikelihood at its minimum. |
References
Caragea, P. and Kaiser, M. (2009) Autologistic models with interpretable parameters. Journal of Agricultural, Biological, and Environmental Statistics, 14(3), 281–300.
Hughes, J., Haran, M. and Caragea, P. C. (2011) Autologistic models for binary data on a lattice. Environmetrics, 22(7), 857–871.
Moller, J., Pettitt, A., Berthelsen, K., and Reeves, R. (2006) An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93(2), 451–458.
See Also
rautologistic
, residuals.autologistic
, summary.autologistic
, vcov.autologistic
Examples
# Use the 20 x 20 square lattice as the underlying graph.
n = 20
A = adjacency.matrix(n)
# Assign coordinates to each vertex such that the coordinates are restricted to the unit square
# centered at the origin.
x = rep(0:(n - 1) / (n - 1), times = n) - 0.5
y = rep(0:(n - 1) / (n - 1), each = n) - 0.5
X = cbind(x, y) # Use the vertex locations as spatial covariates.
beta = c(2, 2) # These are the regression coefficients.
col1 = "white"
col2 = "black"
colfunc = colorRampPalette(c(col1, col2))
# Simulate a dataset with the above mentioned regression component and eta equal to 0.6. This
# value of eta corresponds to dependence that is moderate in strength.
theta = c(beta, eta = 0.6)
set.seed(123456)
Z = rautologistic(X, A, theta)
# Find the MPLE, and do not compute confidence intervals.
fit = autologistic(Z ~ X - 1, A = A, control = list(confint = "none"))
summary(fit)
# The following examples are not executed by default since the computation is time consuming.
# Compute confidence intervals based on the normal approximation. Estimate the "filling" in the
# sandwich matrix using a parallel parametric bootstrap, where the computation is distributed
# across six cores on the host workstation.
# set.seed(123456)
# fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE,
# control = list(confint = "sandwich", nodes = 6))
# summary(fit)
# Compute confidence intervals based on a parallel parametric bootstrap. Use a bootstrap sample
# of size 500, and distribute the computation across six cores on the host workstation.
# set.seed(123456)
# fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE,
# control = list(confint = "bootstrap", bootit = 500, nodes = 6))
# summary(fit)
# Do MCMC for Bayesian inference. The length of the training run will be 10,000, and
# the length of the subsequent inferential run will be at least 10,000.
# set.seed(123456)
# fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE, method = "Bayes",
# control = list(trainit = 10000, minit = 10000, sigma = 1000))
# summary(fit)
Infant mortality data.
Description
infant contains the infant mortality data analyzed in (Hughes and Haran, 2013).
Usage
data(infant)
Source
The data were obtained from the 2008 Area Resource File (ARF), a county-level database maintained by the Bureau of Health Professions, Health Resources and Services Administration, US Department of Health and Human Services.
Examples
data(infant)
Family function for negative binomial GLMs.
Description
Provides the information required to apply a sparse SGLMM to conditionally negative binomial outcomes.
Usage
negbinomial(theta = stop("'theta' must be specified."), link = "log")
Arguments
theta |
the dispersion parameter (must be positive). |
link |
the link function, as a character string, name, or one-element character vector, specifying one of |
Value
An object of class “family
”, a list of functions and expressions needed to fit a negative binomial GLM.
Return a perfect sample from a centered autologistic model.
Description
Return a perfect sample from a centered autologistic model.
Usage
rautologistic(X, A, theta)
Arguments
X |
the design matrix. |
A |
the adjacency matrix for the underlying graph. The matrix need not be binary, but it must be numeric and symmetric. |
theta |
the vector of parameter values: |
Details
This function implements a perfect sampler for the centered autologistic model. The sampler employs coupling from the past.
Value
A vector that is distributed exactly according to the centered autologistic model with the given design matrix and parameter values.
References
Moller, J. (1999) Perfect simulation of conditionally specified models. Journal of the Royal Statistical Society, Series B, Methodological, 61, 251–264.
Propp, J. G. and Wilson, D. B. (1996) Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms, 9(1-2), 223–252.
Examples
# Use the 20 x 20 square lattice as the underlying graph.
n = 20
A = adjacency.matrix(n)
# Assign coordinates to each vertex such that the coordinates are restricted to the unit square
# centered at the origin.
x = rep(0:(n - 1) / (n - 1), times = n) - 0.5
y = rep(0:(n - 1) / (n - 1), each = n) - 0.5
X = cbind(x, y) # Use the vertex locations as spatial covariates.
beta = c(2, 2) # These are the regression coefficients.
# Simulate a dataset with the above mentioned regression component and eta equal to 0.6. This
# value of eta corresponds to dependence that is moderate in strength.
theta = c(beta, eta = 0.6)
set.seed(123456)
Z = rautologistic(X, A, theta)
Extract model residuals.
Description
Extract model residuals.
Usage
## S3 method for class 'autologistic'
residuals(object, type = c("deviance", "pearson",
"response"), ...)
Arguments
object |
an object of class |
type |
the type of residuals that should be returned. The alternatives are “ |
... |
additional arguments. |
Value
A vector of residuals.
See Also
Extract model residuals.
Description
Extract model residuals.
Usage
## S3 method for class 'sparse.sglmm'
residuals(object, type = c("deviance", "pearson",
"response"), ...)
Arguments
object |
an object of class |
type |
the type of residuals that should be returned. The alternatives are “ |
... |
additional arguments. |
Value
A vector of residuals.
See Also
Fit a sparse SGLMM.
Description
Fit a sparse SGLMM.
Usage
sparse.sglmm(formula, family = gaussian, data, offset, A, method = c("BSF",
"RSR"), attractive = 50, repulsive = 0, tol = 0.01, minit = 10000,
maxit = 1e+06, tune = list(), hyper = list(), model = TRUE,
x = FALSE, y = FALSE, verbose = FALSE, parallel = FALSE)
Arguments
formula |
an object of class |
family |
a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. (See |
data |
an optional data frame, list, or environment (or object coercible by |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be |
A |
the adjacency matrix for the underlying graph. |
method |
the basis to use. The options are Bayesian spatial filtering (“ |
attractive |
the number of attractive Moran eigenvectors to use. The default is 50. See ‘Details’ for more information. |
repulsive |
the number of repulsive Moran eigenvectors to use. The default is 0. See ‘Details’ for more information. |
tol |
a tolerance. If all Monte Carlo standard errors are smaller than |
minit |
the minimum sample size. This should be large enough to permit accurate estimation of Monte Carlo standard errors. The default is 10,000. |
maxit |
the maximum sample size. Sampling from the posterior terminates when all Monte Carlo standard errors are smaller than |
tune |
(where relevant) a list containing |
hyper |
a list containing |
model |
a logical value indicating whether the model frame should be included as a component of the returned value. |
x |
a logical value indicating whether the model matrix used in the fitting process should be returned as a component of the returned value. |
y |
a logical value indicating whether the response vector used in the fitting process should be returned as a component of the returned value. |
verbose |
a logical value indicating whether to print MCMC progress to the screen. Defaults to |
parallel |
(for parallelized computation of the Moran operator) a list containing |
Details
This function fits the sparse restricted spatial regression model of Hughes and Haran (2013), or the Bayesian spatial filtering model of Hughes (2017). The first stage of the model is
g(\mu_i)=x_i^\prime\beta+m_i^\prime\gamma\hspace{1 cm}(i=1,\dots,n)
or, in vectorized form,
g(\mu)=X\beta+M\gamma,
where X
is the design matrix, \beta
is a p
-vector of regression coefficients, the columns of M
are q
eigenvectors of the Moran operator, and \gamma
are spatial random effects. Arguments attractive
and repulsive
can be used to control the number of eigenvectors used. The default values are 50 and 0, respectively, which corresponds to pure spatial smoothing. Inclusion of some repulsive eigenvectors can be advantageous in certain applications.
The second stage, i.e., the prior for \gamma
, is
p(\gamma\mid\tau_s)\propto\tau_s^{q/2}\exp\left(-\frac{\tau_s}{2}\gamma^\prime M^\prime QM\gamma\right),
where \tau_s
is a smoothing parameter and Q
is the graph Laplacian.
The prior for \beta
is spherical p
-variate normal with mean zero and common standard deviation sigma.b
, which defaults to 1,000. The prior for \tau_s
is gamma with parameters 0.5 and 2,000. The same prior is used for \theta
(when family is negbinomial
).
When the response is normally distributed, the identity link is assumed, in which case the first stage is
\mu=X\beta+M\gamma+M\delta,
where \delta
are heterogeneity random effects. When the response is Poisson distributed, heterogeneity random effects are optional. In any case, the prior on \delta
is spherical q
-variate normal with mean zero and common variance 1/\tau_h
. The prior for \tau_h
is gamma with parameters a_h
and b_h
, the values of which are controlled by the user through argument hyper
.
If the response is Bernoulli, negative binomial, or Poisson, \beta
and \gamma
are updated using Metropolis-Hastings random walks with normal proposals. The proposal covariance matrix for \beta
is the estimated asymptotic covariance matrix from a glm
fit to the data (see vcov
). The proposal for \gamma
is spherical normal with common standard deviation sigma.s
.
The updates for \tau_s
and \tau_h
are Gibbs updates irrespective of the response distribution.
If the response is Poisson distributed and heterogeneity random effects are included, those random effects are updated using a Metropolis-Hastings random walk with a spherical normal proposal. The common standard deviation is sigma.h
.
If the response is normally distributed, all updates are Gibbs updates.
If the response is negative binomial, the dispersion parameter \theta
is updated using a Metropolis-Hastings random walk with a normal proposal. Said proposal has standard deviation sigma.theta
, which can be provided by the user as an element of argument tune
.
Value
sparse.sglmm
returns an object of class “sparse.sglmm
”, which is a list containing the following components.
coefficients |
the estimated regression coefficients. |
fitted.values |
the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. |
linear.predictors |
the linear fit on link scale. |
residuals |
the response residuals. |
iter |
the size of the posterior sample. |
beta.sample |
an |
gamma.sample |
an |
delta.sample |
(where relevant) an |
theta.sample |
(where relevant) a vector containing the posterior samples for |
tau.s.sample |
a vector containing the posterior samples for |
tau.h.sample |
(where relevant) a vector containing the posterior samples for |
gamma.est |
the estimate of |
delta.est |
(where relevant) the estimate of |
tau.s.est |
the estimate of |
tau.h.est |
(where relevant) the estimate of |
theta.est |
(where relevant) the estimate of |
beta.mcse |
the Monte Carlo standard errors for |
gamma.mcse |
the Monte Carlo standard errors for |
delta.mcse |
(where relevant) the Monte Carlo standard errors for |
tau.s.mcse |
the Monte Carlo standard error for |
tau.h.mcse |
(where relevant) the Monte Carlo standard error for |
theta.mcse |
(where relevant) the Monte Carlo standard error for |
D.bar |
the goodness of fit component of the DIC. |
pD |
the penalty component of the DIC. |
dic |
the deviance information criterion. |
beta.accept |
the acceptance rate for |
gamma.accept |
the acceptance rate for |
delta.accept |
(where relevant) the acceptance rate for |
theta.accept |
(where relevant) the acceptance rate for |
y |
if requested (the default), the |
X |
if requested, the model matrix. |
M |
if requested, the matrix of Moran eigenvectors. |
eigen.values |
if requested, the spectrum of the Moran operator. |
hyper |
a list containing the names and values of the hyperparameters. |
tune |
a list containing the names and values of the tuning parameters. |
model |
if requested (the default), the model frame. |
call |
the matched call. |
formula |
the formula supplied. |
terms |
the |
data |
the |
offset |
the offset vector used. |
xlevels |
(where relevant) a record of the levels of the factors used in fitting. |
References
Hughes, J. and Haran, M. (2013) Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. Journal of the Royal Statistical Society, Series B, 75(1), 139–159.
See Also
residuals.sparse.sglmm
, summary.sparse.sglmm
, vcov.sparse.sglmm
Examples
## Not run:
The following code duplicates the analysis described in (Hughes and Haran, 2013). The data are
infant mortality data for 3,071 US counties. We do a spatial Poisson regression with offset.
data(infant)
infant$low_weight = infant$low_weight / infant$births
attach(infant)
Z = deaths
X = cbind(1, low_weight, black, hispanic, gini, affluence, stability)
data(A)
set.seed(123456)
fit = sparse.sglmm(Z ~ X - 1 + offset(log(births)), family = poisson, A = A, method = "RSR",
tune = list(sigma.s = 0.02), verbose = TRUE)
summary(fit)
## End(Not run)
Print a summary of a centered autologistic model fit.
Description
Print a summary of a centered autologistic model fit.
Usage
## S3 method for class 'autologistic'
summary(object, alpha = 0.05, digits = 4, ...)
Arguments
object |
an object of class |
alpha |
the significance level for the quantile/HPD intervals. The default is 0.05. |
digits |
the number of significant digits to display. The default is 4. |
... |
additional arguments. |
Details
This function displays (1) the call to autologistic
, (2) the values of the control parameters, (3) a table of estimates, and (4) the size of the bootstrap/posterior sample. Each row of the table of estimates shows a parameter estimate, the confidence or HPD interval for the parameter, and, where applicable, the Monte Carlo standard error.
See Also
Print a summary of a sparse SGLMM fit.
Description
Print a summary of a sparse SGLMM fit.
Usage
## S3 method for class 'sparse.sglmm'
summary(object, alpha = 0.05, digits = 4, ...)
Arguments
object |
an object of class |
alpha |
the significance level used to compute the HPD intervals. The default is 0.05. |
digits |
the number of significant digits to display. The default is 4. |
... |
additional arguments. |
Details
This function displays (1) the call to sparse.sglmm
, (2) the values of the hyperparameters and tuning parameters, (3) a table of estimates, (4) the DIC value for the fit, and (5) the number of posterior samples. Each row of the table of estimates shows an estimated regression coefficient, the HPD interval for the coefficient, and the Monte Carlo standard error.
See Also
Return the estimated covariance matrix for an autologistic
model object.
Description
Return the estimated covariance matrix for an autologistic
model object.
Usage
## S3 method for class 'autologistic'
vcov(object, ...)
Arguments
object |
a fitted |
... |
additional arguments. |
Value
An estimate of the covariance matrix of the parameters (in a Bayesian setting), or an estimate of the covariance matrix of the maximum pseudolikelihood estimator of the parameters.
Return the covariance matrix of the regression parameters of a sparse.sglmm
model object.
Description
Return the covariance matrix of the regression parameters of a sparse.sglmm
model object.
Usage
## S3 method for class 'sparse.sglmm'
vcov(object, ...)
Arguments
object |
a fitted |
... |
additional arguments. |
Value
An estimate of the posterior covariance matrix of the regression coefficients.