Tweedie likelihood computations

Tweedie distributions

Tweedie distributions are exponential dispersion models, with a mean \(\mu\) and a variance \(\phi \mu^\xi\), for some dispersion parameter \(\phi > 0\) and a variance-power index \(\xi\) (Tweedie 1984), that is sometimes called \(p\), that uniquely defines the distribution within the Tweedie family (for all real values of \(\xi\) not between 0 and 1).

Special cases of the Tweedie distributions are:

the normal distribution, with \(\xi = 0\) (i.e., the variance is \(\phi\) and not related to the mean);
the Poisson distribution, with \(\xi = 1\) and \(\phi = 1\) (i.e., the variance is the same as the mean);
the gamma distribution, with \(\xi = 2\); and
the inverse Gaussian distribution, with \(\xi = 3\).

For all other values of \(\xi\), the probability functions and distribution functions have no closed forms.

Variance power, \(\xi\)	Distribution	Support
\(\xi = 0\)	Normal	\((-\infty, \infty)\)
\(0 < \xi < 1\)	Do not exist
\(\xi = 1\) (with \(\phi = 1)\)	Poisson	Discrete: \((0, 1, 2, \dots)\)
\(1 < \xi < 2\)	Poisson–gamma	\([ 0, \infty)\)
\(\xi = 2\)	Gamma	\((0, \infty)\)
\(\xi > 2\)	Skewed right	\((0, \infty\))
\(\xi = 3\)	inverse Gaussian	\((0, \infty)\)

For \(\xi < 1\), applications are limited (non-existent so far?), but have support on the entire real line and \(\mu > 0\).

For \(1 < \xi < 2\), Tweedie distributions can be represented as a Poisson sum of gamma distributions. These distributions are continuous for \(Y > 0\) but have a discrete mass at \(Y = 0\).

Use cases

Likelihood computations are not need to fit a Tweedie generalized linear model, using the tweedie() family from the statmod package (Smyth et al. (2025), Dunn and Smyth (2018)):

library(tweedie)
library(statmod)   # For the tweedie() family function, for use in glm()

set.seed(96)
N <- 25
# Mean of the Poisson (lambda) and Gamma (shape/scale)
lambda <- 1.5
# Generating Compound Poisson-Gamma data manually
y <- replicate(N, {
  n_events <- rpois(1, lambda = lambda)
  if (n_events == 0) 0 else sum(rgamma(n_events, shape = 2, scale = 1))
})

mod.tw <- glm(y ~ 1, 
              family = statmod::tweedie(var.power = 1.5, link.power = 0) )
              # link.power = 0  means the log-link

However, likelihood computations are necessary in other situations. Since likelihoods cannot be computed analytically apart from special cases, the tweedie package computes Tweedie densities using numerical methods (Dunn and Smyth (2005), Dunn and Smyth (2008)).

Generating random numbers

Random numbers from a Tweedie distribution are generated using rtweedie():

tweedie::rtweedie(10, xi = 1.1, mu = 2, phi = 1)
#>  [1] 1.536587 2.520597 4.624115 1.999447 2.532301 4.259932 1.725366 0.000000
#>  [9] 0.000000 2.545023

Plotting density and probability functions

Accurate density functions are generated using dtweedie():

y <- seq(0, 2, length = 100)
xi <- 1.1
mu <- 0.5
phi <- 0.4

twden <- tweedie::dtweedie(y, xi = xi, mu = mu, phi = phi)  
twdtn <- tweedie::ptweedie(y, xi = xi, mu = mu, phi = phi)

plot( twden[y > 0] ~ y[y > 0], 
      xlab = expression(italic(y)),
      ylab = "Density function",
      type ="l",
      lwd = 2,
      las = 1)
points(twden[y==0] ~ y[y == 0],
      lwd = 2,
      pch = 19)

Accurate probability functions are generated using ptweedie():

plot(twdtn ~ y,
     xlab = expression(italic(y)),
     ylab = "Distribution function",
     type = "l",
     las = 1,
     lwd = 2,
     las = 1,
     ylim = c(0, 1) )
points(twdtn[y==0] ~ y[y == 0],
      lwd = 2,
      pch = 19)

However, the function tweedie_plot() is sometimes more convenient (especially for \(1 < \xi \ < 2\), when a probability mass at \(Y = 0\) is present):

tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi,
                      ylab = "Density function",
                      xlab = expression(italic(y)),
                      las = 1,
                      lwd = 2)

tweedie::tweedie_plot(y, xi = xi, mu = mu, phi = phi, 
                      ylab = "Distribution function",
                      xlab = expression(italic(y)),
                      las = 1, 
                      lwd = 2, 
                      ylim = c(0, 1), 
                      type = "cdf")

Computing the quantile residuals

Quantile residuals (Dunn and Smyth 1996) can be computed to assess the fit of a glm. The function qresid() is in the statmod package (Smyth et al. (2025), Dunn and Smyth (2018)):

library(tweedie)

qqnorm( qr <- statmod::qresid(mod.tw),
        las = 1)
qqline(qr,
       col = "grey")

The quantile residuals suggest the model may not be adequate.

Estimating \(\xi\)

To use a Tweedie distribution in a glm, the value of \(\xi\) is needed. To find the most suitable value of \(\xi\), tweedie_profile() can be used (Dunn and Smyth 2018):

out <- tweedie::tweedie_profile(y ~ 1, 
                                xi.vec = seq(1.2, 1.8, by = 0.05), 
                                do.plot = TRUE)
#> ========================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================


# The estimated power index:
out$xi.max
#> [1] 1.432653

Example

Consider the poison data in the GLMsData package (Dunn and Smyth 2022), which records the survival times of animals under various treatments and poisons (used in Dunn and Smyth (2018), Sect. 12.4.2). For convenience, a copy is included with this package:

poison <- read.csv(system.file("extdata", "poison.csv", package = "tweedie"))

# Convert to factors:
poison$Psn <- factor(poison$Psn)
poison$Trmt <- factor(poison$Trmt)

head(poison)
#>   Psn Trmt Time
#> 1   I    A 0.31
#> 2   I    A 0.45
#> 3   I    A 0.46
#> 4   I    A 0.43
#> 5   I    B 0.82
#> 6   I    B 1.10

The survival time is likely related to the poison type and type of treatment:

plot(Time ~ Psn,
     data = poison,
     xlab = "Poison type",
     ylab = "Survival time (tens of hours)",
     las = 1)

plot(Time ~ Trmt,
     data = poison,
     xlab = "Treatment type",
     ylab = "Survival time (tens of hours)",
     las = 1)

In both cases, the variance of the survival times seems to increase with the mean survival time. Perhaps a Tweedie glm would be appropriate. To fit a Tweedie glm, the value of \(\xi\) is needed; since no exact zeros are present (or possible) for the survival times, very likely \(\xi \ge 2\):

PSNPrf <- tweedie_profile(Time ~ Trmt + Psn, data = poison, 
                          do.plot = TRUE, xi.vec = seq(2.5, 5.5, length = 11) )
#> ========================================================================================================================================================================================================================================================================================================================================================================================================================================================

Then, \(\xi\approx 4\):

PSNPrf$xi.max
#> [1] 4.091837

So then the Tweedie glm can be fitted:

PSNMod <- glm(Time ~ Trmt + Psn, 
              data = poison,
              family=statmod::tweedie(var = 4, link.power = 0))
anova(PSNMod, test = "F")
#> Analysis of Deviance Table
#> 
#> Model: Tweedie, link: mu^0
#> 
#> Response: Time
#> 
#> Terms added sequentially (first to last)
#> 
#> 
#>      Df Deviance Resid. Df Resid. Dev      F    Pr(>F)    
#> NULL                    47     62.239                     
#> Trmt  3   19.620        44     42.619 24.623 2.365e-09 ***
#> Psn   2   32.221        42     10.398 60.657 4.119e-13 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Quantile residuals can be used to assess the model:

qr <- statmod::qresid(PSNMod)
qqnorm(qr,
       las = 1)
qqline(qr,
       col = "grey")

The quantile residuals suggest that this model looks fine.

References

Dunn, Peter K., and Gordon K. Smyth. 1996. “Randomized Quantile Residuals.” Journal of Computational and Graphical Statistics 5 (3): 236–44.

———. 2005. “Series Evaluation of Tweedie Exponential Dispersion Model Densities.” Statistics and Computing 15 (4): 267–80.

———. 2008. “Evaluation of Tweedie Exponential Dispersion Model Densities by Fourier Inversion.” Statistics and Computing 18 (1): 73–86.

———. 2018. Generalized Linear Models with Examples in R. Springer.

———. 2022. GLMsData: Generalized Linear Model Data Sets. https://doi.org/10.32614/CRAN.package.GLMsData.

Smyth, Gordon K., Lizhong Chen, Yifang Hu, Peter K. Dunn, Belinda Phipson, and Yunshun Chen. 2025. “Statmod: Statistical Modeling.”

Tweedie, M. C. K. 1984. “An Index Which Distinguishes Between Some Important Exponential Families.” Statistics: Applications and New Directions, 579–604.