Tweedie distributions are exponential dispersion models, with a mean \(\mu\) and a variance \(\phi \mu^\xi\), for some dispersion parameter \(\phi > 0\) and a variance-power index \(\xi\) (Tweedie 1984), that is sometimes called \(p\), that uniquely defines the distribution within the Tweedie family (for all real values of \(\xi\) not between 0 and 1).
Special cases of the Tweedie distributions are:
For all other values of \(\xi\), the probability functions and distribution functions have no closed forms.
| Variance power, \(\xi\) | Distribution | Support |
|---|---|---|
| \(\xi = 0\) | Normal | \((-\infty, \infty)\) |
| \(0 < \xi < 1\) | Do not exist | |
| \(\xi = 1\) (with \(\phi = 1)\) | Poisson | Discrete: \((0, 1, 2, \dots)\) |
| \(1 < \xi < 2\) | Poisson–gamma | \([ 0, \infty)\) |
| \(\xi = 2\) | Gamma | \((0, \infty)\) |
| \(\xi > 2\) | Skewed right | \((0, \infty\)) |
| \(\xi = 3\) | inverse Gaussian | \((0, \infty)\) |
For \(\xi < 1\), applications are limited (non-existent so far?), but have support on the entire real line and \(\mu > 0\).
For \(1 < \xi < 2\), Tweedie distributions can be represented as a Poisson sum of gamma distributions. These distributions are continuous for \(Y > 0\) but have a discrete mass at \(Y = 0\).
Likelihood computations are not need to fit a Tweedie generalized
linear model, using the tweedie() family from the statmod
package (Smyth et al. (2025), Dunn and Smyth (2018)):
library(tweedie)
library(statmod) # For the tweedie() family function, for use in glm()
set.seed(96)
N <- 25
# Mean of the Poisson (lambda) and Gamma (shape/scale)
lambda <- 1.5
# Generating Compound Poisson-Gamma data manually
y <- replicate(N, {
n_events <- rpois(1, lambda = lambda)
if (n_events == 0) 0 else sum(rgamma(n_events, shape = 2, scale = 1))
})
mod.tw <- glm(y ~ 1,
family = statmod::tweedie(var.power = 1.5, link.power = 0) )
# link.power = 0 means the log-linkHowever, likelihood computations are necessary in other situations. Since likelihoods cannot be computed analytically apart from special cases, the tweedie package computes Tweedie densities using numerical methods (Dunn and Smyth (2005), Dunn and Smyth (2008)).
Random numbers from a Tweedie distribution are generated using rtweedie():
Accurate density functions are generated using dtweedie():
y <- seq(0, 2, length = 100)
xi <- 1.1
mu <- 0.5
phi <- 0.4
twden <- tweedie::dtweedie(y, xi = xi, mu = mu, phi = phi)
twdtn <- tweedie::ptweedie(y, xi = xi, mu = mu, phi = phi)
plot( twden[y > 0] ~ y[y > 0],
xlab = expression(italic(y)),
ylab = "Density function",
type ="l",
lwd = 2,
las = 1)
points(twden[y==0] ~ y[y == 0],
lwd = 2,
pch = 19)Accurate probability functions are generated using ptweedie():
plot(twdtn ~ y,
xlab = expression(italic(y)),
ylab = "Distribution function",
type = "l",
las = 1,
lwd = 2,
las = 1,
ylim = c(0, 1) )
points(twdtn[y==0] ~ y[y == 0],
lwd = 2,
pch = 19)However, the function tweedie_plot() is sometimes more convenient (especially for \(1 < \xi \ < 2\), when a probability mass at \(Y = 0\) is present):
tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi,
ylab = "Density function",
xlab = expression(italic(y)),
las = 1,
lwd = 2)tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi,
ylab = "Distribution function",
xlab = expression(italic(y)),
las = 1,
lwd = 2,
ylim = c(0, 1),
type = "cdf")Quantile residuals (Dunn and Smyth 1996) can be computed to assess the fit of a glm. The function qresid() is in the statmod package (Smyth et al. (2025), Dunn and Smyth (2018)):
The quantile residuals suggest the model may not be adequate.
To use a Tweedie distribution in a glm, the value of \(\xi\) is needed. To find the most suitable value of \(\xi\), tweedie_profile() can be used (Dunn and Smyth 2018):
out <- tweedie::tweedie_profile(y ~ 1,
xi.vec = seq(1.2, 1.8, by = 0.05),
do.plot = TRUE)
#> ========================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================Consider the poison data in the GLMsData
package (Dunn and Smyth 2022), which
records the survival times of animals under various treatments and
poisons (used in Dunn and Smyth (2018),
Sect. 12.4.2). For convenience, a copy is included with this
package:
poison <- read.csv(system.file("extdata", "poison.csv", package = "tweedie"))
# Convert to factors:
poison$Psn <- factor(poison$Psn)
poison$Trmt <- factor(poison$Trmt)
head(poison)
#> Psn Trmt Time
#> 1 I A 0.31
#> 2 I A 0.45
#> 3 I A 0.46
#> 4 I A 0.43
#> 5 I B 0.82
#> 6 I B 1.10The survival time is likely related to the poison type and type of treatment:
plot(Time ~ Psn,
data = poison,
xlab = "Poison type",
ylab = "Survival time (tens of hours)",
las = 1)plot(Time ~ Trmt,
data = poison,
xlab = "Treatment type",
ylab = "Survival time (tens of hours)",
las = 1)In both cases, the variance of the survival times seems to increase with the mean survival time. Perhaps a Tweedie glm would be appropriate. To fit a Tweedie glm, the value of \(\xi\) is needed; since no exact zeros are present (or possible) for the survival times, very likely \(\xi \ge 2\):
PSNPrf <- tweedie_profile(Time ~ Trmt + Psn, data = poison,
do.plot = TRUE, xi.vec = seq(2.5, 5.5, length = 11) )
#> ========================================================================================================================================================================================================================================================================================================================================================================================================================================================Then, \(\xi\approx 4\):
So then the Tweedie glm can be fitted:
PSNMod <- glm(Time ~ Trmt + Psn,
data = poison,
family=statmod::tweedie(var = 4, link.power = 0))
anova(PSNMod, test = "F")
#> Analysis of Deviance Table
#>
#> Model: Tweedie, link: mu^0
#>
#> Response: Time
#>
#> Terms added sequentially (first to last)
#>
#>
#> Df Deviance Resid. Df Resid. Dev F Pr(>F)
#> NULL 47 62.239
#> Trmt 3 19.620 44 42.619 24.623 2.365e-09 ***
#> Psn 2 32.221 42 10.398 60.657 4.119e-13 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1Quantile residuals can be used to assess the model:
The quantile residuals suggest that this model looks fine.