Type: | Package |
Title: | Kernel Density Estimation with Parametric Starts and Asymmetric Kernels |
Version: | 1.1.1 |
Description: | Handles univariate non-parametric density estimation with parametric starts and asymmetric kernels in a simple and flexible way. Kernel density estimation with parametric starts involves fitting a parametric density to the data before making a correction with kernel density estimation, see Hjort & Glad (1995) <doi:10.1214/aos/1176324627>. Asymmetric kernels make kernel density estimation more efficient on bounded intervals such as (0, 1) and the positive half-line. Supported asymmetric kernels are the gamma kernel of Chen (2000) <doi:10.1023/A:1004165218295>, the beta kernel of Chen (1999) <doi:10.1016/S0167-9473(99)00010-9>, and the copula kernel of Jones & Henderson (2007) <doi:10.1093/biomet/asm068>. User-supplied kernels, parametric starts, and bandwidths are supported. |
License: | MIT + file LICENSE |
URL: | https://github.com/JonasMoss/kdensity |
BugReports: | https://github.com/JonasMoss/kdensity/issues |
Encoding: | UTF-8 |
Suggests: | extraDistr, SkewHyperbolic, testthat, covr, knitr, rmarkdown |
Imports: | assertthat, univariateML, EQL |
RoxygenNote: | 7.3.2 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-03-04 11:59:23 UTC; jmoss |
Author: | Jonas Moss |
Maintainer: | Jonas Moss <jonas.gjertsen@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-03-04 11:50:04 UTC |
Add a new bw to bw_environment.
Description
Add a new bw to bw_environment.
Usage
add_bw(bw_str, bw)
Arguments
bw_str |
A string giving the name of the bandwidth selector. |
bw |
The bw function. |
Value
None.
Add a new kernel to kernels_environment.
Description
Add a new kernel to kernels_environment.
Usage
add_kernel(kernel_str, kernel)
Arguments
kernel_str |
A string giving the name of the kernel. |
kernel |
The kernel function. |
Value
None.
Add a new parametric start to starts_environment.
Description
Add a new parametric start to starts_environment.
Usage
add_start(start_str, start)
Arguments
start_str |
A string giving the name of the density. |
start |
The parametric start function. |
Value
None.
Bandwidth Selectors
Description
The available options for bandwidth selectors, passed as the bw argument to kdensity.
Arguments
x |
The input data. |
kernel_str |
A string specifying the kernel, e.g. "gaussian". |
start_str |
A string specifying the parametric start, e.g. "normal". |
support |
The domain of definition for the kernel. (-Inf, Inf) for symmetric kernels. |
Details
The bandwidth functions are not exported. They are members of the environment bw_environments, and can be accessed by kdensity:::bw_environments.
Bandwidth selectors
"nrd0", "nrd", "bcv", "SJ": Bandwidth selectors from stats, documented in stats::bandwidth. "nrd0" is the standard bandwidth selector for symmetric kernels with constant parametric starts.
"ucv": Unbiased cross-validation. The standard option for asymmetric kernels.
"RHE": Selector for parametric starts with a symmetric kernel, based on a reference rule with Hermite polynomials. Described in Hjort & Glad (1995). The default method in kdensity when a parametric start is supplied and the kernel is symmetric.
"JH": Selector for the Gaussian copula kernel, based on a normal reference rule. Described in Jones & Henderson (2007). The default method when the gcopula kernel is used in kdensity.
Structure
The bandwidth selector is a function of four arguments: the data x, a kernel string kernel, a start string start, and a support vector support. To obtain the functions associated with these strings, use get_kernel and get_start. The function should return a double.
References
Jones, M. C., and D. A. Henderson. "Kernel-type density estimation on the unit interval." Biometrika 94.4 (2007): 977-984.
Hjort, Nils Lid, and Ingrid K. Glad. "Nonparametric density estimation with a parametric start." The Annals of Statistics (1995): 882-904.
See Also
kdensity(), stats::bandwidth.kernel() for the bandwidth selectors of stats::density(). In addition, kernels(); parametric_starts()
Examples
## Not a serious bandwidth function.
silly_width <- function(x, kernel = NULL, start = NULL, support = NULL) {
  rexp(1)
}
kdensity(mtcars$mpg, start = "gumbel", bw = silly_width)
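A less silly selector with the same four-argument structure might look like the sketch below. The name reference_width is hypothetical, and the rule is the familiar normal reference rule (0.9 times the smaller of the standard deviation and IQR / 1.34, times n^(-1/5)); it is only meant to show the expected signature and return type, and it ignores the kernel, start, and support arguments.

```r
# Hypothetical selector with the four-argument signature described above.
# The kernel, start, and support arguments are accepted but ignored here.
reference_width <- function(x, kernel = NULL, start = NULL, support = NULL) {
  spread <- min(stats::sd(x), stats::IQR(x) / 1.34)
  0.9 * spread * length(x)^(-1 / 5)
}

# Returns a single double, as a bandwidth selector should.
h <- reference_width(datasets::mtcars$mpg)
```

Such a function could presumably be passed to kdensity in the same way as silly_width, via the bw argument.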
Get bandwidth functions from string.
Description
Get bandwidth functions from string.
Usage
get_bw(bw_str)
Arguments
bw_str |
a string specifying the bandwidth selector of interest. |
Value
a bandwidth function.
Helper function that gets a kernel function for kdensity.
Description
Helper function that gets a kernel function for kdensity.
Usage
get_kernel(kernel_str)
Arguments
kernel_str |
a string specifying which kernel to use. |
Value
a kernel function of the format k(u) with integral normalized to 1.
Fill in missing kernel, start or support given the supplied values.
Description
This function takes the supplied values of kernel, start, and support and fills in the non-supplied ones. It also handles inconsistencies, such as providing a support on (-Inf, Inf) but a kernel on (0, Inf).
Usage
get_kernel_start_support(kernel, start, support)
Arguments
kernel |
Supplied kernel; string or list. |
start |
Supplied parametric start; string or list. |
support |
Numeric vector of length two. |
Details
The kernel and start parameters are either strings or lists adhering to the kernel/start list structure. support is a numeric vector of length two.
Value
a list with members kernel, kernel_str, start, start_str, and support.
Supplies a plotting range from a kdensity object.
Description
Supplies a plotting range from a kdensity object.
Usage
get_range(obj)
Arguments
obj |
A kdensity object. |
Value
A vector of length 1000, used for plotting.
Get a bandwidth string when 'bw' is unspecified.
Description
Get a bandwidth string when 'bw' is unspecified.
Usage
get_standard_bw(kernel_str, start_str, support)
Arguments
kernel_str |
a kernel string |
start_str |
a parametric start string. |
support |
the support. |
Value
a bandwidth string.
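The defaults described under Bandwidth Selectors can be summarised in a small sketch. The function default_bw_sketch and its list of symmetric kernel names are illustrative assumptions drawn from this manual's text, not the package's actual implementation, and the support argument is left out for brevity.

```r
# Illustrative only: mirrors the documented defaults, not the package source.
default_bw_sketch <- function(kernel_str, start_str) {
  symmetric <- c(
    "gaussian", "normal", "epanechnikov", "rectangular", "uniform",
    "triangular", "biweight", "cosine", "optcosine", "tricube",
    "triweight", "laplace"
  )
  # The Gaussian copula kernel has its own selector.
  if (kernel_str == "gcopula") return("JH")
  # Other asymmetric kernels use unbiased cross-validation.
  if (!(kernel_str %in% symmetric)) return("ucv")
  # Symmetric kernels: "nrd0" without a parametric start, "RHE" with one.
  if (start_str %in% c("uniform", "constant")) "nrd0" else "RHE"
}

default_bw_sketch("gaussian", "uniform")
default_bw_sketch("gaussian", "gamma")
default_bw_sketch("gamma", "gamma")
```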
Get densities and estimators from strings.
Description
Get densities and estimators from strings.
Usage
get_start(start_str)
Arguments
start_str |
A string specifying the density of interest. |
Value
A list of two functions.
Parametrically guided kernel density estimation
Description
kdensity computes a parametrically guided kernel density estimate for univariate data. It supports asymmetric kernels and parametric starts through the kernel and start arguments.
Usage
kdensity(
x,
bw = NULL,
adjust = 1,
kernel = NULL,
start = NULL,
support = NULL,
na.rm = FALSE,
normalized = TRUE,
tolerance = 0.01
)
Arguments
x |
Numeric vector containing the data. |
bw |
A bandwidth function. Can be either a string, a custom-made
function, or a double. The supported bandwidth functions are documented
in |
adjust |
An adjustment constant, so that |
kernel |
The kernel function. Can be chosen from the list of built-in
kernels or be custom-made. See |
start |
Parametric start. Can be chosen from the list of built-in
parametric starts or be custom-made. See |
support |
The support of the data. Must be compatible with the supplied
|
na.rm |
Logical; if |
normalized |
Logical; if |
tolerance |
Numeric; the relative error to tolerate in normalization. |
Details
The default values for bw, kernel, start, and support are interdependent, and are chosen to make sense. E.g., the default value for support when start = beta is c(0, 1).
The start argument defaults to uniform, which corresponds to ordinary kernel density estimation. The typical default value for kernel is gaussian.
If normalized is FALSE and start != "uniform", the resulting density will not integrate to 1 in general.
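To see why normalization matters, the following base R sketch evaluates the unnormalized estimator of Hjort & Glad (1995) with a Gaussian kernel: the fitted start density times the kernel average of the data weighted by one over the start density. The helper hjort_glad and the choice of a normal start fitted by maximum likelihood are assumptions made for illustration; this is not the package's internal code.

```r
# Unnormalized Hjort-Glad estimator with a Gaussian kernel (illustrative only).
hjort_glad <- function(y, x, h, density, estimates) {
  f0 <- function(z) do.call(density, c(list(z), as.list(estimates)))
  vapply(y, function(yi) {
    f0(yi) * mean(stats::dnorm((yi - x) / h) / (h * f0(x)))
  }, numeric(1))
}

x <- datasets::precip
h <- stats::bw.nrd0(x)
lower <- min(x) - 10 * h
upper <- max(x) + 10 * h

# Normal start with maximum likelihood estimates.
normal_est <- c(mean = mean(x), sd = sqrt(mean((x - mean(x))^2)))
mass_normal <- stats::integrate(
  function(y) hjort_glad(y, x, h, stats::dnorm, normal_est), lower, upper
)$value

# Constant start: reduces to the ordinary kernel density estimate.
constant <- function(z) rep(1, length(z))
mass_constant <- stats::integrate(
  function(y) hjort_glad(y, x, h, constant, NULL), lower, upper
)$value
```

With the constant start the total mass is 1 up to integration error, while with the normal start it generally is not, which is what the normalized argument corrects for.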
Value
kdensity returns an S3 function object of base::class() "kdensity". This is a callable function with the following elements, accessible by '$':
x: The data supplied in x.
bw_str, bw, adjust, h: The bandwidth function, the resulting bandwidth, the adjust argument, and the adjusted bandwidth.
kernel_str, kernel, start_str, start, support: Name of the kernel, the kernel object, name of the parametric start, the start object, and the support of the density.
data.name, n, range, has.na, na.rm, normalized: Name of the data, number of observations, the range of the data, whether the data x contained NA values, whether na.rm is TRUE or not, and whether the density is normalized.
call: The call to kdensity.
estimates: Named numeric vector containing the parameter estimates from the parametric start.
logLik: The log-likelihood of the parametric start. Is NA for the uniform start.
References
Hjort, Nils Lid, and Ingrid K. Glad. "Nonparametric density estimation with a parametric start." The Annals of Statistics (1995): 882-904.
Jones, M. C., and D. A. Henderson. "Kernel-type density estimation on the unit interval." Biometrika 94.4 (2007): 977-984.
Chen, Song Xi. "Probability density function estimation using gamma kernels." Annals of the Institute of Statistical Mathematics 52.3 (2000): 471-480.
Silverman, Bernard W. Density estimation for statistics and data analysis. Vol. 26. CRC press, 1986.
See Also
The stats package function stats::density().
Examples
## Use gamma kernels to model positive data, the concentration of
## theophylline
concentration <- Theoph$conc + 0.001
plot(kdensity(concentration, start = "gamma", kernel = "gamma", adjust = 1 / 3),
  ylim = c(0, 0.15), lwd = 2, main = "Concentration of theophylline"
)
lines(kdensity(concentration, start = "gamma", kernel = "gaussian"),
  lty = 2, col = "grey", lwd = 2
)
lines(kdensity(concentration, start = "gaussian", kernel = "gaussian"),
  lty = 3, col = "blue", lwd = 2
)
lines(kdensity(concentration, start = "gaussian", kernel = "gamma", adjust = 1 / 3),
  lty = 4, col = "red", lwd = 2
)
rug(concentration)
## Using a density and an estimator from another package.
skew_hyperbolic <- list(
  density = SkewHyperbolic::dskewhyp,
  estimator = function(x) SkewHyperbolic::skewhypFit(x, printOut = FALSE)$param,
  support = c(-Inf, Inf)
)
kde <- kdensity(diff(LakeHuron), start = skew_hyperbolic)
plot(kde,
  lwd = 2, col = "blue",
  main = "Annual differences in water level (ft) of Lake Huron, 1875 - 1972"
)
lines(kde, plot_start = TRUE, lty = 2, lwd = 2) # Plots the skew hyperbolic density.
rug(diff(LakeHuron))
kde$estimates # Also: coef(kde)
# Displays the parameter estimates:
#        mu     delta      beta        nu
# -1.140713  3.301112  2.551657 26.462469
Kernel functions
Description
Kernel functions are an important part of kdensity. This document lists the available built-in kernels and describes their structure. Any kernel in the list can be used in kdensity by passing its name as a string to the kernel argument, e.g. kernel = "gaussian".
Details
Be careful combining kernels with compact support with parametric starts, as the normalizing integral typically fails to converge. Use gaussian instead.
Symmetric kernels
gaussian, normal: The Gaussian kernel. The default argument when start is supported on R.
epanechnikov, rectangular (uniform), triangular, biweight, cosine, optcosine: Standard symmetric kernels, also used in stats::density().
tricube, triweight: Standard symmetric kernels. Not supported by stats::density().
laplace: Uses the Laplace density, also known as the double exponential density.
Asymmetric kernels
gamma, gamma_biased: The gamma kernel of Chen (2000). For use on the positive half-line. gamma is the recommended bias-corrected kernel.
gcopula: The Gaussian copula kernel of Jones & Henderson (2007). For use on the unit interval.
beta, beta_biased: The beta kernel of Chen (1999). For use on the unit interval. beta is the recommended bias-corrected kernel.
Structure
A kernel is a list containing two mandatory elements and one optional element. The mandatory element 'kernel' is the kernel function. It takes arguments y, x, h, where x is the data supplied to kdensity and y is the point of evaluation. h is the bandwidth. Internally, the kernel function is evaluated as 1/h*kernel(y, x, h). It should be vectorized in x, but vectorization in y is not needed.
The second mandatory element is support, stating the domain of definition for the kernel. This is used to distinguish kernels on the unit interval / positive half-line from kernels on R.
The optional element sd is used for symmetric kernels, and states the standard deviation of the kernel. This is used to make kernels comparable to the Gaussian kernel when calculating bandwidths.
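The role of the three arguments can be illustrated with a short base R sketch. It assumes that the estimate at a point y is the average of 1/h*kernel(y, x, h) over the data, which is the ordinary kernel density estimate; the helper kde_at is hypothetical and is not part of the package.

```r
# A kernel list in the structure described above.
gaussian <- list(
  kernel = function(y, x, h) stats::dnorm((y - x) / h),
  sd = 1,
  support = c(-Inf, Inf)
)

# Hypothetical helper: y is a single evaluation point, x is the whole data
# vector, so the kernel only has to be vectorized in x.
kde_at <- function(y, x, h, kernel) mean(1 / h * kernel$kernel(y, x, h))

x <- datasets::precip
h <- stats::bw.nrd0(x)
grid <- seq(min(x) - 3 * h, max(x) + 3 * h, length.out = 512)
fhat <- vapply(grid, function(y) kde_at(y, x, h, gaussian), numeric(1))
```

Looping over the evaluation points one at a time is why vectorization in y is not required of the kernel function.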
References
Chen, Song Xi. "Probability density function estimation using gamma kernels." Annals of the Institute of Statistical Mathematics 52.3 (2000): 471-480.
Jones, M. C., and D. A. Henderson. "Kernel-type density estimation on the unit interval." Biometrika 94.4 (2007): 977-984.
Chen, Song Xi. "Beta kernel estimators for density functions." Computational Statistics & Data Analysis 31.2 (1999): 131-145.
See Also
kdensity(); parametric_starts(); bandwidths().
Examples
gaussian <- list(
  kernel = function(y, x, h) stats::dnorm((y - x) / h),
  sd = 1,
  support = c(-Inf, Inf)
)
gcopula <- list(
  kernel = function(y, x, h) {
    rho <- 1 - h^2
    inside <- rho^2 * (qnorm(y)^2 + qnorm(x)^2) - 2 * rho * qnorm(y) * qnorm(x)
    exp(-inside / (2 * (1 - rho^2)))
  },
  support = c(0, 1)
)
Merges two lists.
Description
Merges two lists.
Usage
listmerge(x, y, type = c("merge", "template"))
Arguments
x |
A list of default arguments. |
y |
A list of supplied arguments |
type |
If |
Value
A merged list where conflicts are resolved in favour of y. Does not preserve ordering.
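The "merge" behaviour resembles what utils::modifyList does in base R, which the sketch below uses as a stand-in. This is an analogy under that assumption rather than a description of listmerge itself; unlike listmerge, modifyList keeps the ordering of its first argument.

```r
# Defaults and user-supplied values with two conflicting names.
defaults <- list(lwd = 1, lty = 1, col = "black")
supplied <- list(col = "blue", lwd = 2)

# Conflicts are resolved in favour of the second list.
merged <- utils::modifyList(defaults, supplied)
```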
Parametric starts
Description
A parametric start is a density function with an associated estimator which is used as a starting point in kdensity. Several parametric starts are implemented, all with maximum likelihood estimation. Custom-made parametric starts are possible, see the Structure section.
Structure
The parametric start contains three elements: the density function, an estimation function, and the support of the density. The parameters of the density function must partially match the parameters of the estimator function. The estimator function takes one argument, a numeric vector, which is passed from kdensity.
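How the three elements fit together can be sketched in base R. The helper evaluate_start and the log-normal start below are hypothetical illustrations; the sketch assumes that the named estimates are matched against the density's arguments by name, which is one reading of the requirement above.

```r
# A log-normal start in the three-element structure described above.
start_lognormal <- list(
  density = stats::dlnorm,
  estimator = function(data) {
    logs <- log(data)
    c(meanlog = mean(logs), sdlog = sqrt(mean((logs - mean(logs))^2)))
  },
  support = c(0, Inf)
)

# Hypothetical helper: fit the start, then evaluate its density on a grid.
evaluate_start <- function(start, data, grid) {
  estimates <- start$estimator(data)
  # The names of the estimates are matched to the density's arguments.
  do.call(start$density, c(list(grid), as.list(estimates)))
}

data <- c(0.3, 1.2, 0.7, 2.5, 0.1)
grid <- c(0.5, 1, 2)
values <- evaluate_start(start_lognormal, data, grid)
```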
Supported parametric starts
kdensity supports more than 20 built-in starts from the univariateML package, see univariateML::univariateML_models for a list. Densities with variable support (power) are not supported. The pareto density has its support fixed to (1, Inf). The options uniform and constant make kdensity estimate a kernel density without a parametric start.
See Also
kdensity(); kernels(); bandwidths()
Examples
start_exponential <- list(
  density = stats::dexp,
  estimator = function(data) {
    c(rate = 1 / mean(data))
  },
  support = c(0, Inf)
)
start_inverse_gaussian <- list(
  density = extraDistr::dwald,
  estimator = function(data) {
    c(
      mu = mean(data),
      # Maximum likelihood: 1 / lambda is the mean of 1 / x - 1 / mean(x).
      lambda = 1 / mean(1 / data - 1 / mean(data))
    )
  },
  support = c(0, Inf)
)
Plot, Lines and Points Methods for Kernel Density Estimation
Description
The plot method for kdensity objects.
Usage
## S3 method for class 'kdensity'
plot(x, range = NULL, plot_start = FALSE, zero_line = TRUE, ...)
## S3 method for class 'kdensity'
lines(x, range = NULL, plot_start = FALSE, zero_line = TRUE, ...)
## S3 method for class 'kdensity'
points(x, range = NULL, plot_start = FALSE, zero_line = TRUE, ...)
Arguments
x |
a |
range |
range of x values. |
plot_start |
logical; if |
zero_line |
logical; if |
... |
further plotting parameters. |
Value
None.
See Also
Examples
## Using the data set "precip" to eye-ball the similarity between
## a kernel fit, a parametric fit, and a kernel with parametric start fit.
kde_gamma <- kdensity(precip, kernel = "gaussian", start = "gamma")
kde <- kdensity(precip, kernel = "gaussian", start = "uniform")
plot(kde_gamma, main = "Annual Precipitation in US Cities")
lines(kde_gamma, plot_start = TRUE, lty = 2)
lines(kde, lty = 3)
rug(precip)
Helper function for the plot methods.
Description
A helper function for the plot methods that does most of the work under the hood.
Usage
plot_helper(
x,
range = NULL,
plot_start = FALSE,
zero_line = TRUE,
ptype = c("plot", "lines", "points"),
...
)
Arguments
x |
A |
range |
An optional range vector; like |
plot_start |
Logical; if |
zero_line |
Logical; if |
ptype |
The kind of plot to make |
... |
Passed to plot.default. |
Value
None.
Recycles arguments.
Description
Recycles arguments.
Usage
recycle(..., prototype)
Arguments
... |
A list of arguments to be recycled. |
prototype |
an optional argument. If given, repeats all arguments up to the length of the prototype. If an element of the list has the name, it is used. If not, the variable itself is used. |
Details
Recycles arguments so that all vectors are equally long. If a prototype is given, each vector will have the same size as the prototype.
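The recycling rule can be sketched with base::rep_len. The function recycle_sketch is a hypothetical stand-in that only handles a prototype given as a vector; it does not reproduce the name lookup described for the prototype argument.

```r
# Hypothetical stand-in for the recycling rule described above.
recycle_sketch <- function(..., prototype = NULL) {
  args <- list(...)
  # Target length: the longest argument, or the prototype's length if given.
  target <- if (is.null(prototype)) max(lengths(args)) else length(prototype)
  lapply(args, function(v) rep_len(v, target))
}

out <- recycle_sketch(a = 1:2, b = 5, c = 1:6)
out_proto <- recycle_sketch(a = 1:2, b = 5, prototype = 1:4)
```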
Checks compatibility between supports.
Description
The supplied support must never be larger than the support of the parametric start / kernel.
Usage
support_compatible(kernel, start, support)
Arguments
kernel, start, support |
The kernel, start and support to check. |
Value
None.
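The compatibility rule amounts to an interval containment check, sketched below in base R. The helpers is_within and check_support are hypothetical and work directly on length-two support vectors; the package function takes kernel and start objects and returns nothing, so this only illustrates the rule itself.

```r
# TRUE when the interval inner lies inside the interval outer.
is_within <- function(inner, outer) {
  inner[1] >= outer[1] && inner[2] <= outer[2]
}

# The supplied support must lie inside both the kernel and start supports.
check_support <- function(kernel_support, start_support, support) {
  is_within(support, kernel_support) && is_within(support, start_support)
}

check_support(c(0, Inf), c(0, Inf), c(0, 1)) # TRUE
check_support(c(0, 1), c(-Inf, Inf), c(0, Inf)) # FALSE
```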