Title: | Goodness of Fit Tests Based on Empirical Distribution Functions |
Version: | 1.0.0 |
Description: | Routines that allow the user to run goodness of fit tests based on empirical distribution functions for formal model evaluation in a general likelihood model. In addition, functions are provided to test if a sample follows Normal or Gamma distributions, validate the normality assumptions in a linear model, and examine the appropriateness of a Gamma distribution in generalized linear models with various link functions. Michael Arthur Stephens (1976) http://www.jstor.org/stable/2958206. |
License: | GPL (≥ 3) |
URL: | https://github.com/pnickchi/gofedf |
BugReports: | https://github.com/pnickchi/gofedf/issues |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 2.10) |
Imports: | stats, CompQuadForm, MASS, glm2, statmod |
NeedsCompilation: | no |
Author: | Richard Lockhart [aut], Payman Nickchi [aut, cre] |
Maintainer: | Payman Nickchi <payman.nickchi@gmail.com> |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
VignetteBuilder: | knitr |
Packaged: | 2025-05-29 07:10:10 UTC; payman |
Repository: | CRAN |
Date/Publication: | 2025-05-29 14:50:06 UTC |
Compute the maximum likelihood estimate of parameters in Inverse Gaussian distribution with weighted observations.
Description
This function is used in the examples for testYourModel.
Usage
IGMLE(obs, ...)
Arguments
obs |
a numeric vector of sample observations. |
... |
a list of additional parameters to define the likelihood. |
Value
The function computes the MLE of the parameters of the Inverse Gaussian distribution and returns a numeric vector of estimates. The first and second elements of the vector are the MLEs of the mean and shape, respectively.
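As a rough illustration of what such an estimate looks like, the sketch below assumes the model X_i ~ IG(mean = mu, shape = lambda * w_i) with known weights w_i, which matches how the data are simulated in the testYourModel examples. Under that assumption the MLE has a closed form. The helper name ig_mle_sketch is hypothetical and this is not necessarily how IGMLE is implemented.

```r
# Hypothetical helper (not the package's IGMLE): closed-form MLE assuming
# X_i ~ IG(mean = mu, shape = lambda * w_i) with known weights w_i.
ig_mle_sketch <- function(obs, w = rep(1, length(obs))) {
  stopifnot(length(obs) == length(w), all(obs > 0), all(w > 0))
  n <- length(obs)
  mu_hat <- sum(w * obs) / sum(w)                     # weighted sample mean
  lambda_hat <- n / sum(w * (1 / obs - 1 / mu_hat))   # shape estimate
  c(mean = mu_hat, shape = lambda_hat)
}

obs <- c(0.8, 1.5, 2.2, 3.1, 0.6, 1.9)
w   <- c(1, 2, 1, 2, 1, 2)
est <- ig_mle_sketch(obs, w)
print(est)
```

With unit weights this reduces to the familiar Inverse Gaussian estimates: the sample mean, and n divided by the sum of (1/x_i - 1/mean).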
Compute the probability integral transformed values for a sample from Inverse Gaussian distribution.
Description
This function is used in the examples for testYourModel.
Usage
IGPIT(obs, ...)
Arguments
obs |
A numeric vector of sample observations. |
... |
A list of additional parameters to define the likelihood. |
Value
A numeric vector of probability integral transformed values of sample observations.
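The probability integral transform is the fitted CDF evaluated at each observation. A minimal sketch for the Inverse Gaussian case, written with pnorm() only and assuming a per-observation shape of lambda * w_i, is shown below. The helper name ig_pit_sketch and its arguments are hypothetical, and IGPIT itself may be implemented differently.

```r
# Hypothetical helper (not the package's IGPIT): Inverse Gaussian CDF
# expressed through the standard normal CDF, with shape lambda * w_i.
ig_pit_sketch <- function(obs, mu, lambda, w = rep(1, length(obs))) {
  shape <- lambda * w
  a <- sqrt(shape / obs)
  pnorm(a * (obs / mu - 1)) +
    exp(2 * shape / mu) * pnorm(-a * (obs / mu + 1))
}

obs <- c(0.5, 1, 2, 4, 8)
pit <- ig_pit_sketch(obs, mu = 2, lambda = 2, w = rep(1.5, 5))
print(pit)
```

The exp() factor can overflow when shape / mu is very large, so a production version would work on the log scale; the sketch keeps the textbook form for readability.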
Compute the score function of the Inverse Gaussian distribution based on a sample.
Description
This function is used in the examples for testYourModel.
Usage
IGScore(obs, ...)
Arguments
obs |
a numeric vector of sample observations. |
... |
a list of additional parameters to define the likelihood. |
Value
The score matrix with n rows (number of sample observations) and 2 columns (mean and shape).
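To make the shape of this matrix concrete, the sketch below builds a score matrix under the assumed model X_i ~ IG(mean = mu, shape = lambda * w_i) by differentiating the log density with respect to mu and lambda. The helper ig_score_sketch is hypothetical and is not taken from the package source.

```r
# Hypothetical helper (not the package's IGScore): per-observation score,
# assuming X_i ~ IG(mean = mu, shape = lambda * w_i).
ig_score_sketch <- function(obs, mu, lambda, w = rep(1, length(obs))) {
  cbind(
    mean  = lambda * w * (obs - mu) / mu^3,
    shape = 1 / (2 * lambda) - w * (obs - mu)^2 / (2 * mu^2 * obs)
  )
}

obs <- c(0.8, 1.5, 2.2, 3.1, 0.6, 1.9)
w   <- c(1, 2, 1, 2, 1, 2)

# Closed-form MLE under the same assumed model.
mu_hat     <- sum(w * obs) / sum(w)
lambda_hat <- length(obs) / sum(w * (1 / obs - 1 / mu_hat))

S <- ig_score_sketch(obs, mu_hat, lambda_hat, w)
print(dim(S))      # n rows, 2 columns
print(colSums(S))  # both sums vanish (up to rounding) at the MLE
```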
Apply Goodness of Fit Test for Exponential Distribution
Description
Performs the goodness-of-fit test based on the empirical distribution function to check whether an i.i.d. sample follows an Exponential distribution.
Usage
testExponential(
x,
discretize = FALSE,
ngrid = length(x),
gridpit = FALSE,
hessian = FALSE,
method = "cvm"
)
Arguments
x |
a non-empty numeric vector of sample data. |
discretize |
If |
ngrid |
the number of equally spaced points to discretize the (0,1) interval for computing the covariance function. |
gridpit |
logical. If |
hessian |
logical. If |
method |
a character string indicating which goodness-of-fit statistic is to be computed. The default value is 'cvm' for the Cramer-von-Mises statistic. Other options include 'ad' for the Anderson-Darling statistic, and 'both' to compute both cvm and ad. |
Value
A list of two containing the following components:
Statistic: the value of goodness-of-fit statistic.
p-value: the approximate p-value for the goodness-of-fit test. If method = 'cvm' or method = 'ad', the statistic and p-value are each a single numeric value. If method = 'both', each is a numeric vector with two elements, one for each statistic.
Examples
set.seed(123)
n <- 50
sim_data <- rexp(n, rate = 2)
testExponential(x = sim_data)
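For intuition about the 'cvm' statistic, the sketch below computes the Cramer-von-Mises statistic by hand from the probability integral transformed values of an Exponential sample, using the standard computing formula. The helper cvm_stat_sketch is hypothetical; it only reproduces the statistic, not the p-value approximation that testExponential provides.

```r
# Hypothetical helper: Cramer-von-Mises statistic from PIT values,
# W^2 = 1/(12 n) + sum_i (u_(i) - (2 i - 1) / (2 n))^2.
cvm_stat_sketch <- function(pit) {
  n <- length(pit)
  u <- sort(pit)
  sum((u - (2 * seq_len(n) - 1) / (2 * n))^2) + 1 / (12 * n)
}

set.seed(123)
x <- rexp(50, rate = 2)
rate_hat <- 1 / mean(x)          # MLE of the Exponential rate
pit <- pexp(x, rate = rate_hat)  # PIT values under the fitted model
w2 <- cvm_stat_sketch(pit)
print(w2)
```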
Apply Goodness of Fit Test to the Residuals of a Generalized Linear Model with Gamma Link Function
Description
testGLMGamma is used to check the validity of the Gamma assumption for the response variable when fitting a generalized linear model. Common link functions in glm can be used here.
Usage
testGLMGamma(
x,
y,
fit = NULL,
l = "log",
discretize = FALSE,
ngrid = length(y),
gridpit = TRUE,
hessian = FALSE,
start.value = NULL,
control = NULL,
method = "cvm"
)
Arguments
x |
is either a numeric vector or a design matrix. In the design matrix, rows indicate observations and columns represent covariates. |
y |
is a vector of numeric values with the same number of observations or number of rows as x. |
fit |
is an object of class |
l |
a character vector indicating the link function that should be used
for Gamma family. Acceptable link functions for Gamma family are inverse,
identity and log. For more details see |
discretize |
If |
ngrid |
the number of equally spaced points to discretize the (0,1) interval for computing the covariance function. |
gridpit |
logical. If |
hessian |
logical. If |
start.value |
a numeric value or vector. This is the same as |
control |
a list of parameters to control the fitting process in
|
method |
a character string indicating which goodness-of-fit statistic is to be computed. The default value is 'cvm' for the Cramer-von-Mises statistic. Other options include 'ad' for the Anderson-Darling statistic, and 'both' to compute both cvm and ad. |
Value
A list containing the following components:
Statistic: the value of goodness-of-fit statistic.
p-value: the approximate p-value for the goodness-of-fit test. If method = 'cvm' or method = 'ad', the statistic and p-value are each a single numeric value. If method = 'both', each is a numeric vector with two elements, one for each statistic.
converged: a logical value indicating whether the IWLS algorithm has converged.
Examples
set.seed(123)
n <- 50
p <- 5
x <- matrix( rnorm(n*p, mean = 10, sd = 0.1), nrow = n, ncol = p)
b <- runif(p)
e <- rgamma(n, shape = 3)
y <- exp(x %*% b) * e
testGLMGamma(x, y, l = 'log')
myfit <- glm(y ~ x, family = Gamma('log'), x = TRUE, y = TRUE)
testGLMGamma(fit = myfit)
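The test operates on probability integral transformed responses under the fitted Gamma model. The sketch below shows one way such values could be obtained from a glm fit with a log link. It uses a moment-based shape estimate (the reciprocal of the Pearson dispersion), which is an assumption made for brevity; testGLMGamma may estimate the shape differently, for example by maximum likelihood.

```r
set.seed(123)
n <- 50
x <- rnorm(n)
# Gamma response with mean exp(1 + 0.5 * x) and shape 3.
y <- exp(1 + 0.5 * x) * rgamma(n, shape = 3) / 3

fit <- glm(y ~ x, family = Gamma(link = "log"))
mu_hat <- fitted(fit)

# Assumption: moment-based shape estimate from the Pearson dispersion.
shape_hat <- 1 / summary(fit)$dispersion

# PIT values: Gamma CDF with mean mu_hat, i.e. rate = shape / mean.
pit <- pgamma(y, shape = shape_hat, rate = shape_hat / mu_hat)
print(summary(pit))
```

If the Gamma assumption holds, these values should look roughly uniform on (0,1), which is what the EDF statistics measure.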
Apply Goodness of Fit Test for Gamma Distribution
Description
Performs the goodness-of-fit test based on the empirical distribution function to check whether an i.i.d. sample follows a Gamma distribution.
Usage
testGamma(
x,
discretize = FALSE,
ngrid = length(x),
gridpit = FALSE,
hessian = FALSE,
rate = TRUE,
method = "cvm"
)
Arguments
x |
a non-empty numeric vector of sample data. |
discretize |
If |
ngrid |
the number of equally spaced points to discretize the (0,1) interval for computing the covariance function. |
gridpit |
logical. If |
hessian |
logical. If |
rate |
logical. If |
method |
a character string indicating which goodness-of-fit statistic is to be computed. The default value is 'cvm' for the Cramer-von-Mises statistic. Other options include 'ad' for the Anderson-Darling statistic, and 'both' to compute both cvm and ad. |
Value
A list of two containing the following components:
Statistic: the value of goodness-of-fit statistic.
p-value: the approximate p-value for the goodness-of-fit test. If method = 'cvm' or method = 'ad', the statistic and p-value are each a single numeric value. If method = 'both', each is a numeric vector with two elements, one for each statistic.
Examples
set.seed(123)
sim_data <- rgamma(n = 50, shape = 3)
testGamma(x = sim_data)
sim_data <- runif(n = 50)
testGamma(x = sim_data)
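The ingredients behind a Gamma test of this kind (MLE, score, and PIT values) can be sketched with base R only. The code below uses the shape and rate parameterization and solves the profile likelihood equation for the shape with uniroot(); the search interval and tolerance are illustrative choices, and none of this is taken from the package source.

```r
set.seed(123)
x <- rgamma(50, shape = 3, rate = 2)

# Profile equation for the shape: log(a) - digamma(a) = log(mean(x)) - mean(log(x)).
s <- log(mean(x)) - mean(log(x))   # positive by Jensen's inequality
shape_hat <- uniroot(function(a) log(a) - digamma(a) - s,
                     interval = c(1e-3, 1e3), tol = 1e-12)$root
rate_hat <- shape_hat / mean(x)

# Per-observation score with respect to (shape, rate).
score <- cbind(
  shape = log(rate_hat) - digamma(shape_hat) + log(x),
  rate  = shape_hat / rate_hat - x
)

# PIT values under the fitted Gamma model.
pit <- pgamma(x, shape = shape_hat, rate = rate_hat)
print(colSums(score))  # both close to zero at the MLE
```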
Apply Goodness of Fit Test to Residuals of a Linear Model
Description
testLMNormal is used to check the normality assumption of residuals in a linear model. This function can take the response variable and design matrix, fit a linear model, and apply the goodness-of-fit test. Alternatively, it can take an object of class "lm" and apply the goodness-of-fit test directly. The function returns a goodness-of-fit statistic along with an approximate p-value.
Usage
testLMNormal(
x,
y,
fit = NULL,
discretize = FALSE,
ngrid = length(y),
gridpit = TRUE,
hessian = FALSE,
method = "cvm"
)
Arguments
x |
is either a numeric vector or a design matrix. In the design matrix, rows indicate observations and columns represent covariates. |
y |
is a vector of numeric values with the same number of observations or number of rows as x. |
fit |
an object of class "lm" returned by |
discretize |
If |
ngrid |
the number of equally spaced points to discretize the (0,1) interval for computing the covariance function. |
gridpit |
logical. If |
hessian |
logical. If |
method |
a character string indicating which goodness-of-fit statistic is to be computed. The default value is 'cvm' for the Cramer-von-Mises statistic. Other options include 'ad' for the Anderson-Darling statistic, and 'both' to compute both cvm and ad. |
Value
A list of two containing the following components:
Statistic: the value of goodness-of-fit statistic.
p-value: the approximate p-value for the goodness-of-fit test. If method = 'cvm' or method = 'ad', the statistic and p-value are each a single numeric value. If method = 'both', each is a numeric vector with two elements, one for each statistic.
Examples
set.seed(123)
n <- 50
p <- 5
x <- matrix( runif(n*p), nrow = n, ncol = p)
e <- rnorm(n)
b <- runif(p)
y <- x %*% b + e
testLMNormal(x, y)
# Or pass lm.fit object directly:
lm.fit <- lm(y ~ x, x = TRUE, y = TRUE)
testLMNormal(fit = lm.fit)
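As a rough picture of what is being tested, the sketch below computes probability integral transformed residuals from an lm fit by hand. It assumes the error standard deviation is estimated by maximum likelihood (dividing the residual sum of squares by n rather than by the residual degrees of freedom); whether testLMNormal uses exactly this estimate is not confirmed here.

```r
set.seed(123)
n <- 50
x <- runif(n)
y <- 1 + 2 * x + rnorm(n)

fit <- lm(y ~ x)
res <- residuals(fit)

# Assumption: MLE of the error standard deviation (divides by n).
sigma_hat <- sqrt(sum(res^2) / n)

# PIT values of the residuals under the fitted Normal error model.
pit <- pnorm(res / sigma_hat)
print(summary(pit))
```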
Apply Goodness of Fit Test for Normal Distribution
Description
Performs the goodness-of-fit test based on the empirical distribution function to check whether an i.i.d. sample follows a Normal distribution.
Usage
testNormal(
x,
discretize = FALSE,
ngrid = length(x),
gridpit = TRUE,
hessian = FALSE,
method = "cvm"
)
Arguments
x |
a non-empty numeric vector of sample data. |
discretize |
If |
ngrid |
the number of equally spaced points to discretize the (0,1) interval for computing the covariance function. |
gridpit |
logical. If |
hessian |
logical. If |
method |
a character string indicating which goodness-of-fit statistic is to be computed. The default value is 'cvm' for the Cramer-von-Mises statistic. Other options include 'ad' for the Anderson-Darling statistic, and 'both' to compute both cvm and ad. |
Value
A list of two containing the following components:
Statistic: the value of goodness-of-fit statistic.
p-value: the approximate p-value for the goodness-of-fit test. If method = 'cvm' or method = 'ad', the statistic and p-value are each a single numeric value. If method = 'both', each is a numeric vector with two elements, one for each statistic.
Examples
set.seed(123)
sim_data <- rnorm(n = 50)
testNormal(x = sim_data)
sim_data <- rgamma(n = 50, shape = 3)
testNormal(x = sim_data)
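For intuition about the 'ad' option, the sketch below computes the Anderson-Darling statistic by hand from the probability integral transformed values of a Normal sample, using the standard computing formula. The helper ad_stat_sketch is hypothetical; it reproduces only the statistic, not the p-value approximation.

```r
# Hypothetical helper: Anderson-Darling statistic from PIT values,
# A^2 = -n - (1/n) * sum_i (2 i - 1) * [log(u_(i)) + log(1 - u_(n + 1 - i))].
ad_stat_sketch <- function(pit) {
  n <- length(pit)
  u <- sort(pit)
  i <- seq_len(n)
  -n - mean((2 * i - 1) * (log(u) + log1p(-rev(u))))
}

set.seed(123)
x <- rnorm(50)
mu_hat <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))   # MLE divides by n
pit <- pnorm(x, mean = mu_hat, sd = sigma_hat)
a2 <- ad_stat_sketch(pit)
print(a2)
```

Compared with the Cramer-von-Mises statistic, the logarithmic terms give more weight to PIT values near 0 and 1, so this statistic is more sensitive to departures in the tails.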
Apply the Goodness of Fit Test Based on Empirical Distribution Function to Any Likelihood Model.
Description
This function applies the goodness-of-fit test based on the empirical distribution function. The required inputs depend on whether the model involves parameter estimation. If the model is fully specified and no parameters are estimated, the function requires only the probability integral transformed (PIT) values of the sample, supplied as a numeric vector. If parameters are estimated, the function additionally requires the score as a matrix with n rows and p columns, where n is the sample size and p is the number of estimated parameters. The function checks that the column sums of the score are near zero at the estimated parameter (which is assumed to be the maximum likelihood estimate).
Usage
testYourModel(
pit,
score = NULL,
discretize = FALSE,
ngrid = length(pit),
gridpit = TRUE,
precision = 1e-09,
method = "cvm"
)
Arguments
pit |
The probability integral transformed (PIT) values of the sample, supplied as a numeric vector. |
score |
The default value is NULL, which corresponds to the case with no parameter estimation. If parameters are estimated, the score must be a matrix with n rows and p columns, where n is the sample size and p is the number of estimated parameters. |
discretize |
If |
ngrid |
The number of equally spaced points to discretize the (0,1) interval for computing the covariance function. |
gridpit |
logical. If |
precision |
The theory behind the goodness-of-fit test based on the empirical distribution function (EDF) works well if the MLE is indeed a root of the derivative of the log-likelihood function. A precision of 1e-9 (the default value) is used to check this. A warning message is generated if the score evaluated at the MLE is not close enough to zero. |
method |
a character string indicating which goodness-of-fit statistic is to be computed. The default value is 'cvm' for the Cramer-von-Mises statistic. Other options include 'ad' for the Anderson-Darling statistic, and 'both' to compute both cvm and ad. |
Value
A list of two containing the following components:
Statistic: the value of goodness-of-fit statistic.
p-value: the approximate p-value for the goodness-of-fit test. If method = 'cvm' or method = 'ad', the statistic and p-value are each a single numeric value. If method = 'both', each is a numeric vector with two elements, one for each statistic.
Examples
# Example: Inverse Gaussian (IG) distribution with weights
# Set the seed to reproduce example.
set.seed(123)
# Set the sample size
n <- 50
# Assign weights
weights <- rep(1.5,n)
# Set mean and shape parameters for IG distribution.
mio <- 2
lambda <- 2
# Generate a random sample from IG distribution with weighted shape.
sim_data <- statmod::rinvgauss(n, mean = mio, shape = lambda * weights)
# Compute MLE of parameters, score matrix, and pit values.
theta_hat <- IGMLE(obs = sim_data, w = weights)
ScoreMatrix <- IGScore(obs = sim_data, w = weights, mle = theta_hat)
pitvalues <- IGPIT(obs = sim_data , w = weights, mle = theta_hat)
# Apply the goodness-of-fit test.
testYourModel(pit = pitvalues, score = ScoreMatrix)
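The same recipe carries over to other likelihood models. The sketch below prepares the two inputs for a one-parameter Rayleigh model, chosen only for illustration, using base R, and mimics the score check described above. The comparison used here (absolute column sums against the precision) is an assumption; the exact check inside testYourModel may differ. The final call is commented out because it requires the package to be installed.

```r
set.seed(123)
n <- 50
sigma <- 2
# Rayleigh sample via inverse CDF: F(x) = 1 - exp(-x^2 / (2 sigma^2)).
x <- sigma * sqrt(-2 * log(runif(n)))

# MLE of sigma: sigma^2 = sum(x^2) / (2 n).
sigma_hat <- sqrt(sum(x^2) / (2 * n))

# Score: derivative of log f(x; sigma) = log(x) - 2 log(sigma) - x^2 / (2 sigma^2).
score <- matrix(-2 / sigma_hat + x^2 / sigma_hat^3, ncol = 1)

# PIT values under the fitted model.
pit <- 1 - exp(-x^2 / (2 * sigma_hat^2))

# Assumed form of the check: column sums of the score close to zero.
precision <- 1e-9
score_ok <- all(abs(colSums(score)) < precision)
print(score_ok)

# With gofedf installed, the inputs could then be passed on:
# testYourModel(pit = pit, score = score)
```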