Tweedie Likelihood (compound Poisson-gamma) in Stan

I’ve been trying to fit a Tweedie likelihood model in Stan that allows for inflated zeroes. In case it is useful to anyone, I have reproduced what I have done here. I think the efficiency of the code could probably be improved, but otherwise it appears to work.

I have implemented the R “tweedie” package functions (Dunn 2017) which work, but are quite slow. The strategy is to split the data into 3 groups: i) y==0, ii) low y (dependent variable) values for which the series sum can be used and iii) higher y values which use a inverse Fourier approximation. This just follows what the R “tweedie” package density function does. The Tweedie density is parameterised with a mean (mu), dispersion (phi) and index ( p ) parameters. My approach has been to allow the index parameter p to be fitted. If this parameter is fixed, the function could be made much more efficient, but I was interested in taking into account the uncertainty associated with this parameter. The parameter controls the form of the function, so calculation methods also vary dependent on this parameter’s range. I don’t think this will be a problem in practice as the estimate of p should not be varying enormously, although there could be a problem if close to a boundary for the inverse Fourier method. In this latter case, a constant matrix grid is supplied to the function and you need to know the range of p to use the correct one. Switching between grids could affect the differentiability of the function at the boundaries. In my case, I bounded p within the appropriate range for 1.5-2.0 and this worked fine as p did not move outside 1.65-1.7. Parameters p and phi seemed well-behaved in the MCMC.

There are three log probability density functions for each of: zero y values, the series approximation and the inverse Fourier approximation.

  real cpgz_lpdf(vector lmu, real phi, real p) {
    //Compound Poisson-gamma (Tweedie) log density for y==0
    //Returns the sum of log-likelihood for zeroes based on Tweedie 
    //  parameters exp(lmu), phi and p
    int  N  = rows(lmu);
    real p2 = 2.0 - p;
    real lscale = 1.0/(phi * p2);
    vector[N] lambda = exp(lmu*p2)*lscale;
    return -sum(lambda);
  }  //cpgz_lpdf

real cpgs_lpdf(vector ly, vector lmu, real phi, real p, vector jlohi, vector lgam_jlohi) {
    //Returns compound Poisson-gamma (Tweedie) log-density for series calculation y>0
    //ly is the log dependent variable vector. Some values are pre-calculated for speed:
    //  jlohi is the vector sequence of integers Nmin:Nmax for the series sum 
    //  and lgam_jlohi=lgamma(jlohi + 1).
    //Only works for 1 < p < 2: No check is made that p is in range
    int  N  = rows(ly);
    int  Nj = rows(jlohi);
    real p1 = 1.0 - p;
    real p2 = 2.0 - p;
    real a = (2.0 - p)/(1.0 - p);
    real a1 = 1.0 - a;
    real r_p1phi = 1.0/(p1 * phi);
    real r_p2phi = 1.0/(p2 * phi);
    real rconst = a * log(-p1) - a1 * log(phi) - log(p2);
    vector[N]  r = -a * ly + rconst;
    vector[N]  lambda = r_p2phi * exp(p2*lmu);
    vector[N]  r_tau  = r_p1phi * exp(ly + p1*lmu);
    vector[N]  logw;
    vector[Nj] lGj = lgam_jlohi + lgamma(-a * jlohi);

    for (i in 1:N) 
      logw[i] = log_sum_exp(r[i]*jlohi - lGj);

    return sum(r_tau - lambda - ly + logw);
  }  //cpgs_lpdf
 
  real cpgf_lpdf(vector ly, vector lmu, real phi, real p, matrix grid) {
    //Compound Poisson Gamma (Tweedie) log density using  
    //Fourier Inversion method of Dunn and Smythe 2008 for y>0
    // 1.5 < p < 2.0 - dependent on pre-calculated grid matrix. 
    //   Different grids cover different ranges of p. No check is made that p is in range
    //ly and lmu are log values
    int   N = rows(ly);
    int   nx = rows(grid);  //nx=26
    int   np = cols(grid);   //np=16
    real  ln_sqrt_2pi = 0.5*log(2 * pi());  //constant could be dropped
    real  p1 = 1 - p;
    real  p2 = 2 - p;
    real  pmid = (1.5 + 2.0)*0.5;
    real  prange = 2.0 - 1.5;
    real  xixrange = 0.9;
    real  xixmid   = 0.9*0.5;
    vector[nx] jt  = cumulative_sum(rep_vector(1, nx)) - 1; //0:nx-1
    vector[nx] ts  = 0.5*cos((2 * jt + 1)/(2.0 * nx ) * pi())* xixrange + xixmid;
    vector[np] jp  = cumulative_sum(rep_vector(1, np)) - 1; //0:np-1
    vector[np] ps  = 0.5*cos((2 * jp + 1)/(2.0 * np ) * pi()) * prange + pmid;
    vector[nx] dd1;
    vector[N]  lxi = -ly * p2 + log(phi);
    vector[N]  xi = exp(lxi);
    vector[N]  xix = xi ./ (1.0 + xi);
    vector[N]  rho;
    vector[N]  dev;
    vector[N]  ll;

    for (i in 1:nx) {
      dd1[i] = grid[i, np];
      for (j in 1:(np-1)) {
        int k = np-j;
        dd1[i] = dd1[i] * (p - ps[k]) + grid[i, k];
      }
    }

    rho = rep_vector(dd1[nx], N);
    for (i in 1:(nx-1)) {
      int k = nx-i;
      rho = rho .* (xix - ts[k]) + dd1[k];
    }

    dev = exp(ly*p2) - p2*exp(ly + lmu*p1) + p1*exp(lmu*p2);  //deviance
    ll = -dev/(p1 * p2 * phi) - ly - ln_sqrt_2pi - 0.5*lxi + log(rho); 
    return sum(ll);
  }  //cpgf_lpdf

The dependent variable y is used to determine which method is used. It is index-sorted to provide effectively three vectors, one for each function. Choosing between the series sum and inverse Fourier is a little more complicated as the break-point is determined by phi and p, so a function is used to calculate that point and adjust where the switch occurs each iteration.

  real ly_series_break(real phi, real p) {
    //returns log(y) benchmark for switching between series and Fourier inversion methods
    return -log(0.9/(0.1*phi))/(2.0-p);
  }  //ly_series_break

Finally, whereas the number of calculations in the inverse Fourier estimate of the density is fixed, the series sum terms vary dependent on the precision required. In my case, the lower bound for the series sum was always 1, so I didn’t bother calculating it. This is the function for the upper bound which was calculated once on each interation:

  int calc_jhi(real phi, real p)  {
    //Calculates the upper bound of series terms to reach the required precision 
    //Here the reference level to switch y=(0.9/(0.1*phi))^(-1/(2-p)) relevant for 1.5 < p < 2.0 
    //  as used by Dunn 2017: Code ported from R tweedie package  
    real drop = 35;   //Required precision (log)
    real p2 = (2 - p);
    real a = p2/(1 - p);
    real a1 = 1 - a;
    real ly = -log(9/phi)/p2;
    real lp2 = log(p2);
    real j_max =  1 / (9*p2);
    real cc = a * (lp2-ly) - a1 * log(phi) - lp2 + a1;
    real wmax = a1 * j_max - drop;
    real estlogw = a1 * j_max;
    real rj = 1;
    int  j = 1;

    while (estlogw > wmax) {
      j += 2;
      rj += 2;
      estlogw = rj * (cc - a1 * log(rj));
    }
    return j;
  }  //calc_jhi

The link (hopefully) provides a simple RStudio project with a fixed mu, demonstrating a practical example:

Tweedie Example Project

Any comments, corrections and improvements most welcome.

References etc.
Dunn, P. K. (2017). Tweedie: Evaluation of Tweedie exponential family models. R package version 2.3.
which I believe is mostly based on:
Dunn, P. K. and Smyth, G. K. (2005). Series evaluation of Tweedie exponential dispersion models. Statistics and Computing, 15(4), 267-280.
Dunn, P. K. and Smyth, G. K. (2008). Evaluation of Tweedie exponential dispersion models using Fourier inversion. Statistics and Computing, 18(1), 73-86.
Thanks also for some early pointers from Bob Carpenter.

7 Likes

Thank you so much for the example. What should I do if I want to relax the bound for p to be between 1 and 2?

Hi. It gets messy. If you check the dtweedie package code in R, you can see 1-1.1 uses the series method only. Then you need separate grids within each range 1.1, 1.2, 1.3, 1.4, 1.5, 2.0. I only need the last range fortunately.

The problem with switching between grids is that there may be a discontinuity which may cause the autodiff to mess up as the estimate takes small steps across a boundary - but it may not as the interpolation functions are very similar. I would be interested to know whether it works. I would think that you need to add the grids as arrays and then add Stan code to switch between them dependent on the p parameter. To save you some time (with no guarantees!) the link is for all the grids in the R tweedie package in a function:

Stored grids in R

1 Like

Hi Paul,

Thanks for supplying your code and a working example, it has been very helpful! How would you generate a function of the equivalent of *_rng for the generated quantities section? It would be great to use to assess residuals and conduct posterior predictive checks.

Thanks!

I followed the implementation at Tweedie distribution in Stan · GitHub for powers \in [1, 2]. Which is just summing over the poisson and gamma parts. I followed the rng implementation in the tweedie R package.

functions {
  int num_non_zero_fun(array[] real y) {
    int A = 0;
    int N = num_elements(y);
    
    for (n in 1 : N) {
      if (y[n] != 0) {
        A += 1;
      }
    }
    return A;
  }
  
  array[] int non_zero_index_fun(array[] real y, int A) {
    int N = num_elements(y);
    array[A] int non_zero_index;
    int counter = 0;
    for (n in 1 : N) {
      if (y[n] != 0) {
        counter += 1;
        non_zero_index[counter] = n;
      }
    }
    return non_zero_index;
  }
  
  array[] int zero_index_fun(array[] real y, int Z) {
    int N = num_elements(y);
    array[Z] int zero_index;
    int counter = 0;
    for (n in 1 : N) {
      if (y[n] == 0) {
        counter += 1;
        zero_index[counter] = n;
      }
    }
    return zero_index;
  }
  
  void check_tweedie(real mu, real phi, real theta) {
    if (mu < 0) {
      reject("mu must be >= 0; found mu =", mu);
    }
    if (phi < 0) {
      reject("phi must be >= 0; found phi =", phi);
    }
    if (theta < 1 || theta > 2) {
      reject("theta must be in [1, 2]; found theta =", theta);
    }
  }
  
  void check_tweedie(vector mu, real phi, real theta) {
    int N = num_elements(mu);
    if (phi < 0) {
      reject("phi must be >= 0; found phi =", phi);
    }
    if (theta < 1 || theta > 2) {
      reject("theta must be in [1, 2]; found theta =", theta);
    }
    
    for (n in 1 : N) {
      if (mu[n] < 0) {
        reject("mu must be >= 0; found mu =", mu[n], "on element", n);
      }
    }
  }
  
  real tweedie_lpdf(array[] real y, array[] int non_zero_index, int M, real mu, real phi, real theta) {
    check_tweedie(mu, phi, theta);
    int N = num_elements(y);
    int A = num_elements(non_zero_index);
    int NmA = N - A;
    real lambda = 1 / phi * mu ^ (2 - theta) / (2 - theta);
    real alpha = (2 - theta) / (theta - 1);
    real beta = 1 / phi * mu ^ (1 - theta) / (theta - 1);
    real lp = -NmA * lambda;
    
    for (n in 1 : A) {
      vector[M] ps;
      for (m in 1 : M) {
        ps[m] = poisson_lpmf(m | lambda) + gamma_lpdf(y[non_zero_index[n]] | m * alpha, beta);
      }
      lp += log_sum_exp(ps);
    }
    return lp;
  }
  
  // vector version for mu is untested
  real tweedie_lpdf(array[] real y, array[] int non_zero_index, array[] int zero_index, int M, vector mu, real phi, real theta) {
    check_tweedie(mu, phi, theta);
    int N = num_elements(y);
    int A = num_elements(non_zero_index);
    int NmA = N - A;
    vector[N] lambda = 1 / phi * mu ^ (2 - theta) / (2 - theta);
    real alpha = (2 - theta) / (theta - 1);
    vector[N] beta = 1 / phi * mu ^ (1 - theta) / (theta - 1);
    real lp = -sum(lambda[zero_index]);
    
    for (n in 1 : A) {
      vector[M] ps;
      for (m in 1 : M) {
        ps[m] = poisson_lpmf(m | lambda[n]) + gamma_lpdf(y[non_zero_index[n]] | m * alpha, beta[n]);
      }
      lp += log_sum_exp(ps);
    }
    return lp;
  }
  
  real tweedie_rng(real mu, real phi, real theta) {
    check_tweedie(mu, phi, theta);
    
    real lambda = 1 / phi * mu ^ (2 - theta) / (2 - theta);
    real alpha = (2 - theta) / (theta - 1);
    real beta = 1 / phi * mu ^ (1 - theta) / (theta - 1);
    
    int N = poisson_rng(lambda);
    real tweedie_val;
    
    if (theta == 1) {
      return phi * poisson_rng(mu / phi);
    }
    if (theta == 2) {
      return gamma_rng(1 / phi, beta);
    }
    if (N * alpha == 0) {
      return 0.;
    }
    
    return gamma_rng(N * alpha, beta);
  }
}
data {
  int N;
  int M;
  array[N] real<lower=0> Y;
}
transformed data {
  int R = 20;
  int N_non_zero = num_non_zero_fun(Y);
  int N_zero = N - N_non_zero;
  array[N_zero] int zero_index = zero_index_fun(Y, N_zero);
  array[N_non_zero] int non_zero_index = non_zero_index_fun(Y, N_non_zero);
}
parameters {
  real<lower=0> mu;
  real<lower=0> phi;
  real<lower=1, upper=2> theta;
}
model {
  mu ~ cauchy(0, 5);
  phi ~ cauchy(0, 5);
  
  Y ~ tweedie(non_zero_index, M, mu, phi, theta);
  // mu is a vector then
  // Y ~ tweedie(non_zero_index, zero_index, M, mu, phi, theta);

}
generated quantities {
  vector[R] r_tweedie;
  // below has mu as a real
  // if mu is a vector you need to
  // generate values for each mu
  for (r in 1 : R) {
    r_tweedie[r] = tweedie_rng(mu, phi, theta);
  }
}

4 Likes

Hi, spinkey’s suggestion below should work OK for theta ∈ [1,2], just generate the poisson and gamma separately. It should be efficient. For powers greater than 2, look at the rtweedie function in the R tweedie package. They use quantile function which may be the best and only way.

@spinkney , how do you pick M for the _lpdf ? Is the answer in any of the previous posts? Don’t hesitate to point that out if it is.

1 Like

It’s in the comments of the gist and not obvious. There’s a paper linked in their mentions after 30 the contribution to logsumexp is so small as to be insignificant.

2 Likes

Hi @spinkney ,

Thanks very much for sharing this code, and apologies if I’m late in reigniting this thread. I am not well versed in Stan code, but I took a stab at converting it brms-compatible Stan code via brms::custom_family. It seems to work, but I would really appreciate anyone’s feedback on how to improve this. Full R code at the end of this post.

A few comments:

  • I had to use the vectorised version of tweedie_lpdf which seems to work fine. That was because brms declares Y as a real in the data block, and not an array. That also made me modify some of the functions declarations in the original Stan code.
  • I originally tried to make M a free parameter, but I think Stan does not allow integer parameters. I tried creating an m real parameter, and then creating a transformed integer M in the transformed parameters block, but the Stan round function only returns real values, and not integers. So, in the end I decided the only way I could include M was via a transformed data parameter (see the argument var in brms::custom_family) which has to be informed by the user via stanvars in the brm call, i.e., it needs to be fixed a priori like in your original code above.
  • I renamed theta to mtheta because the former is a special type of parameter in brms.
  • The post processing functions in R (tweedie_lpdf, log_lik_tweedie, tweedie_rng, posterior_predict_tweedie, posterior_epred_tweedie) mostly rely on the R package tweedie which covers different cases of \theta outside the [1,2] range, and they are not vectorised. So, the computations are slow, particularly for posterior_predict and for adding loo to the brmsfit object which depends on the custom tweedie_lpdf(which in turn depends on tweedie::dtweedie). Any ideas on how to speed up computation time there would be great, which leads me to my next point.
  • For this particular subset where \theta \in [1,2], perhaps it would be nice to have all of the above functions written in Stan which could then be exposed in R. When I tried to do this following brms vignette about setting up custom families, I unfortunately got the following error:
> brms::expose_functions(fit, vectorize = TRUE)
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
 error in 'modelf1a433ebdecb_User_defined_functions' at line 14, column 2
  -------------------------------------------------
    12:   }
    13:   
    14:   array[] int non_zero_index_fun(vector y, int A) {
         ^
    15:     int N = num_elements(y);
  -------------------------------------------------

PARSER EXPECTED: "}"
Error in stanc(model_code = mc, model_name = "User-defined functions",  : 
  failed to parse Stan model 'User-defined functions' due to the above error.

and that is why I decided that for now it was better to just rely on existing R code from the tweedie package despite the computation limitations raised above.

Full code in R

library(brms)
library(tidyverse)
library(tweedie)
library(patchwork)

tweedie <- brms::custom_family(
  "tweedie", dpars = c("mu", "phi", "mtheta"),
  links = rep("identity", 3), lb = c(0, 0, 1),
  ub = c(NA, NA, 2), type = "real", loop = FALSE, var = "M"
)

stan_funs <- "
  int num_non_zero_fun(vector y) {
    int A = 0;
    int N = num_elements(y);
    for (n in 1 : N) {
      if (y[n] != 0) {
        A += 1;
      }
    }
    return A;
  }
  
  array[] int non_zero_index_fun(vector y, int A) {
    int N = num_elements(y);
    array[A] int non_zero_index;
    int counter = 0;
    for (n in 1 : N) {
      if (y[n] != 0) {
        counter += 1;
        non_zero_index[counter] = n;
      }
    }
    return non_zero_index;
  }
  
  array[] int zero_index_fun(vector y, int Z) {
    int N = num_elements(y);
    array[Z] int zero_index;
    int counter = 0;
    for (n in 1 : N) {
      if (y[n] == 0) {
        counter += 1;
        zero_index[counter] = n;
      }
    }
    return zero_index;
  }
  
  void check_tweedie(real mu, real phi, real mtheta) {
    if (mu < 0) {
      reject(\"mu must be >= 0; found mu =\", mu);
    }
    if (phi < 0) {
      reject(\"phi must be >= 0; found phi =\", phi);
    }
    if (mtheta < 1 || mtheta > 2) {
      reject(\"mtheta must be in [1, 2]; found mtheta =\", mtheta);
    }
  }
  
  void check_tweedie(vector mu, real phi, real mtheta) {
    int N = num_elements(mu);
    if (phi < 0) {
      reject(\"phi must be >= 0; found phi =\", phi);
    }
    if (mtheta < 1 || mtheta > 2) {
      reject(\"mtheta must be in [1, 2]; found mtheta =\", mtheta);
    }
    for (n in 1 : N) {
      if (mu[n] < 0) {
        reject(\"mu must be >= 0; found mu =\", mu[n], \"on element\", n);
      }
    }
  }
  
  real tweedie_lpdf(vector y, vector mu, real phi, real mtheta, int M) {
    check_tweedie(mu, phi, mtheta);
    int N = num_elements(y);
    int N_non_zero = num_non_zero_fun(y);
    int N_zero = N - N_non_zero;
    array[N_zero] int zero_index = zero_index_fun(y, N_zero);
    array[N_non_zero] int non_zero_index = non_zero_index_fun(y, N_non_zero);
    int A = num_elements(non_zero_index);
    int NmA = N - A;
    vector[N] lambda = 1 / phi * mu ^ (2 - mtheta) / (2 - mtheta);
    real alpha = (2 - mtheta) / (mtheta - 1);
    vector[N] beta = 1 / phi * mu ^ (1 - mtheta) / (mtheta - 1);
    real lp = -sum(lambda[zero_index]);
    for (n in 1 : A) {
      vector[M] ps;
      for (m in 1 : M) {
        ps[m] = poisson_lpmf(m | lambda[n]) + gamma_lpdf(y[non_zero_index[n]] | m * alpha, beta[n]);
      }
      lp += log_sum_exp(ps);
    }
    return lp;
  }
  
  real tweedie_rng(real mu, real phi, real mtheta) {
    check_tweedie(mu, phi, mtheta);
    real lambda = 1 / phi * mu ^ (2 - mtheta) / (2 - mtheta);
    real alpha = (2 - mtheta) / (mtheta - 1);
    real beta = 1 / phi * mu ^ (1 - mtheta) / (mtheta - 1);
    int N = poisson_rng(lambda);
    real tweedie_val;
    if (mtheta == 1) {
      return phi * poisson_rng(mu / phi);
    }
    if (mtheta == 2) {
      return gamma_rng(1 / phi, beta);
    }
    if (N * alpha == 0) {
      return 0.;
    }
    return gamma_rng(N * alpha, beta);
  }
"

gen_qt <- "
  vector[N] mu = rep_vector(0.0, N);
  vector[N] r_tweedie = rep_vector(0.0, N);
  // generate values for each mu
  mu += Intercept + Xc * b;
  for (n in 1 : N) {
    r_tweedie[n] = tweedie_rng(mu[n], phi, mtheta);
  }
"

tweedie_lpdf <- function(y, mu, phi, mtheta) {
  out <- numeric(length = length(mu))
  for (i in seq_along(out)) {
    out[i] <- tweedie::dtweedie(
      y = y, mu = mu[i], phi = phi[i], power = mtheta[i]
    )
  }
  out
}

log_lik_tweedie <- function(i, prep) {
  mu <- prep$dpars$mu[, i]
  phi <- prep$dpars$phi
  mtheta <- prep$dpars$mtheta
  y <- prep$data$Y[i]
  tweedie_lpdf(y, mu, phi, mtheta)
}

tweedie_rng <- function(mu, phi, mtheta) {
  out <- numeric(length = length(mu))
  for (i in seq_along(out)) {
    out[i] <- tweedie::rtweedie(
      n = 1, mu = mu[i], power = mtheta[i], phi = phi[i]
    )
  }
  out
}

posterior_predict_tweedie <- function(i, prep, ...) {
  mu <- prep$dpars$mu[, i]
  phi <- prep$dpars$phi
  mtheta <- prep$dpars$mtheta
  tweedie_rng(mu, phi, mtheta)
}

posterior_epred_tweedie <- function(prep) {
  prep$dpars$mu
}

set.seed(10)
testdf <- data.frame(x = rnorm(100)) |>
  dplyr::mutate(
    mu = 15 + x * 5,
    y = tweedie::rtweedie(n(), mu = mu, power = 1.5, phi = 0.1)
  )
plot(y ~ x, testdf)

options(mc.cores = parallel::detectCores())
stanvars <- brms::stanvar(scode = stan_funs, block = "functions") +
  brms::stanvar(30L, "M", scode = "int<lower=1> M;", block = "data") +
  brms::stanvar(scode = gen_qt, block = "genquant")
fit <- brms::brm(
  y ~ x, data = testdf, family = tweedie, stanvars = stanvars,
  backend = "cmdstanr"
)

r2fit <- brms::bayes_R2(fit)
brms::conditional_effects(fit) |>
  plot(points = TRUE)

brms::conditional_effects(fit, method = "posterior_predict") |>
  plot(points = TRUE)

fit <- add_criterion(fit, "loo")
ppa <- brms::pp_check(fit, type = "dens_overlay", ndraws = 1e3)
ppb <- brms::pp_check(fit, type = "scatter_avg", ndraws = 1e3)
ppa + ppb

# similar plot to above, but using posterior_predict equivalent
#  generated from within the stan code.
brms::as_draws_df(fit) |>
  dplyr::select(starts_with("r_tweedie")) |>
  colMeans() |>
  (\(x, y)data.frame(pred = x, obs = y))(y = testdf$y) |>
  (\(x)ggplot(data = x))() +
    geom_point(mapping = aes(x = pred, y = obs)) +
    geom_abline(linetype = 2) +
    labs(x = "Mean prediction", y = "Observed")

# check range of distributional parameters
range_phi <- brms::as_draws_df(fit) |>
  dplyr::pull(phi) |>
  range()
range_mtheta <- brms::as_draws_df(fit) |>
  dplyr::pull(mtheta) |>
  range()
5 Likes