# Compound Poisson–gamma distribution

Hi,

I have zero-inflated continuous response data that I’d like to model using the compound Poisson–gamma distribution, which is a special case of the Tweedie distribution (https://en.wikipedia.org/wiki/Tweedie_distribution).

There’s an old forum post (https://groups.google.com/forum/#!topic/stan-users/XyFv7ERq0oA) on this topic, as well as a closed GitHub issue (https://github.com/stan-dev/math/issues/417) and linked Gist (https://gist.github.com/MatsuuraKentaro/952b3301686c10adcb13).

I am wondering if someone could clarify if and how one can fit such models with Stan.

Thanks!
Dan.

I don’t know much about the Tweedie distribution and I haven’t read the links you posted, but I recently did a model for zero inflated continuous data from a chemical gas sensor with a detection limit. I’ll post some of that here.

I used the inverse logit function to map partial pressure being measured to the probability of the sensor not detecting any amount of the chemical. The function looked like this:

prob_{non-detect} = logit^{-1}\left(log(2.0/1.0e6) * {\hat P \over P_{det.lim.}} + log(1.0e6)\right)

Where \hat P is the partial pressure being measured and P_{det.lim.} is the partial pressure at which the sensor is twice as likely to measure zero than it is to make a non-zero measurement. The log(1.0e6) is the log-odds of making a zero measurement when \hat P = 0; I set the odds to 1 million to make the logistic curve steep.

Here are some plots for \hat P = 25, P_{det.lim.} = 25, and \sigma = 10, where \sigma is the standard deviation of the measurement error.

Here’s how I implemented the data likelihood in Stan. Notice how the zeroes in the data increment the log-posterior with the log probability of a non-detect, while the non-zeroes in the data increment the log-posterior with the log probability of a detection using log1m_inv_logit(...), as well as the log density of the noisy observation. It’s important to use the log probability in both places.

for (n in 1:N) for(m in 1:3) { // Trial, chemical species

if (P_obs[n, m] == 0) {
// Zero measurement process
target +=
// Log-probability of zero measurement
log_inv_logit(
log(2.0/1.0e6) * P_hat[n,m]/det_lim +
log(1.0e6) // Log-odds of a zero measurement at P_obs == 0
);

} else {
// Non-zero measurement process
target +=
// Log-probability of non-zero measurement
log1m_inv_logit(
log(2.0/1.0e6) * P_hat[n,m]/det_lim +
log(1.0e6) // Log-odds of a zero measurement at P_obs == 0
);

P_obs[n, m] ~
// Noisy observation of a positive quantity
gamma(
square(P_hat[n, m] / sigma), // Shape
P_hat[n, m] / square(sigma)  // Inverse scale
);

}

}


And here’s how I implemented the Generated Quantities block:

real P_rep[N, 3];
int <lower=0, upper=1> non_detect;

for (n in 1:N) for (m in 1:3) { // Trial, chemical species
// Probabilistically decide which measurement process to sample
non_detect =
bernoulli_logit_rng(
log(2.0/1.0e6) * P_hat[n,m]/det_lim +
log(1.0e6) // Log-odds of a zero measurement at P_obs == 0
);

if (non_detect == 1) {
// Zero measurement process
P_rep[n, m] = 0.0;

} else {
// Non-zero measurement process
P_rep[n, m] =
gamma_rng(
square(P_hat[n, m] / sigma), // Shape
P_hat[n, m] / square(sigma)  // Inverse scale
);

}

}

2 Likes

Hi! I was trying to model a Tweedie distribution in Stan and came across your post (and the links you posted). Was this issue solved? There seems to be something in the stan users mailing list but I don’t have access to it. Thank you!

It’s for sure possible.

The gist you’ve posted properly implements the likelihood of the response variable with no additional covariates. What remains to be specified in code is the link function, or relationship between predictors and the likelihood. I took a look at the SAS documentation but it was vague (mentioned the likelihood but no mention of how covariates are related to the likelihood’s parameters).

One could specify a linear link to model the relationship between tweeties distribution’s parameters of interest, for example the mean, dispersion or power parameters.

I think designing the model depends more on what you would like to know. If there’s a more concrete question we have, I could probably code something up.