Compound Poisson–gamma distribution


#1

Hi,

I have zero-inflated continuous response data that I’d like to model using the compound Poisson–gamma distribution, which is a special case of the Tweedie distribution (https://en.wikipedia.org/wiki/Tweedie_distribution).
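For reference, by compound Poisson–gamma I mean the usual construction

Y = \sum_{i=1}^{N} X_i, \qquad N \sim \mathrm{Poisson}(\lambda), \qquad X_i \overset{\mathrm{iid}}{\sim} \mathrm{Gamma}(\alpha, \beta),

so P(Y = 0) = e^{-\lambda} and Y is continuous on (0, \infty) otherwise, which matches the zero-inflated shape of my data.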

There’s an old forum post (https://groups.google.com/forum/#!topic/stan-users/XyFv7ERq0oA) on this topic, as well as a closed GitHub issue (https://github.com/stan-dev/math/issues/417) and linked Gist (https://gist.github.com/MatsuuraKentaro/952b3301686c10adcb13).

I am wondering if someone could clarify whether and how one can fit such models in Stan.
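For concreteness, here’s a rough sketch of the kind of thing I had in mind, marginalizing the latent Poisson count up to some truncation M. This is just my own illustration: the function name, the truncation point, and the parameterization are placeholders, and I don’t know whether it is numerically sensible.

functions {
  // Log density of a compound Poisson-gamma observation y, summing out the
  // latent Poisson count n from 1 to a truncation point M (illustrative only;
  // M would have to be chosen large enough for the given lambda).
  real cp_gamma_lpdf(real y, real lambda, real alpha, real beta, int M) {
    vector[M] lp;
    if (y == 0)
      return -lambda; // P(Y = 0) = P(N = 0) = exp(-lambda)
    for (n in 1:M)
      lp[n] = poisson_lpmf(n | lambda) + gamma_lpdf(y | n * alpha, beta);
    return log_sum_exp(lp);
  }
}

with the model block then containing something like y[i] ~ cp_gamma(lambda, alpha, beta, M);. Is this roughly the right approach, or is there a better way to work with the Tweedie parameterization directly?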

Thanks!
Dan.


#2

I don’t know much about the Tweedie distribution and I haven’t read the links you posted, but I recently built a model for zero-inflated continuous data from a chemical gas sensor with a detection limit. I’ll post some of that here.

I used the inverse logit function to map the partial pressure being measured to the probability that the sensor does not detect any of the chemical. The function looked like this:

\mathrm{prob}_{\text{non-detect}} = \mathrm{logit}^{-1}\!\left( \log\!\left(\tfrac{2}{10^6}\right) \cdot \frac{\hat P}{P_{det.lim.}} + \log\!\left(10^6\right) \right)

Here \hat P is the partial pressure being measured and P_{det.lim.} is the partial pressure at which the sensor is twice as likely to measure zero as it is to make a non-zero measurement. The \log(10^6) term is the log-odds of making a zero measurement when \hat P = 0; I set those odds to one million to make the logistic curve steep.
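To see how the two constants work, plug in the endpoints:

\hat P = 0:\quad \log(10^6) \;\Rightarrow\; \text{odds of a zero measurement} = 10^6

\hat P = P_{det.lim.}:\quad \log(2/10^6) + \log(10^6) = \log 2 \;\Rightarrow\; \text{odds of a zero measurement} = 2

so the curve is essentially pinned at “non-detect” when nothing is present and drops to 2:1 odds at the detection limit.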

Here are some plots for \hat P = 25, P_{det.lim.} = 25, and \sigma = 10, where \sigma is the standard deviation of the measurement error.

Here’s how I implemented the data likelihood in Stan. Zeroes in the data increment the log posterior by the log probability of a non-detect, while non-zeroes increment it by the log probability of a detection, via log1m_inv_logit(...), plus the log density of the noisy observation. It’s important to include the log probability of the measurement process in both branches.

for (n in 1:N) for (m in 1:3) { // Trial, chemical species
  
  if (P_obs[n, m] == 0) { 
    // Zero measurement process
    target += 
      // Log-probability of zero measurement
      log_inv_logit( 
        log(2.0/1.0e6) * P_hat[n,m]/det_lim + 
        log(1.0e6) // Log-odds of a zero measurement at P_obs == 0
      );
    
  } else {                
    // Non-zero measurement process
    target += 
      // Log-probability of non-zero measurement
      log1m_inv_logit(
        log(2.0/1.0e6) * P_hat[n,m]/det_lim + 
        log(1.0e6) // Log-odds of a zero measurement at P_obs == 0
      );
      
    P_obs[n, m] ~ 
      // Noisy observation of a positive quantity 
      gamma(
        square(P_hat[n, m] / sigma), // Shape
        P_hat[n, m] / square(sigma)  // Inverse scale
      );
       
  }
  
}
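As an aside, the shared log-odds expression could be computed once per observation; here is a sketch using the same variable names, with the helper declared at the top of the loop body:

real logit_zero = log(2.0/1.0e6) * P_hat[n, m] / det_lim + log(1.0e6);

if (P_obs[n, m] == 0) {
  // Zero measurement process
  target += log_inv_logit(logit_zero);
} else {
  // Non-zero measurement process
  target += log1m_inv_logit(logit_zero);
  target += gamma_lpdf(P_obs[n, m] | square(P_hat[n, m] / sigma),  // Shape
                                     P_hat[n, m] / square(sigma)); // Inverse scale
}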

And here’s how I implemented the generated quantities block:

real P_rep[N, 3];
int<lower=0, upper=1> non_detect;

for (n in 1:N) for (m in 1:3) { // Trial, chemical species
  // Probabilistically decide which measurement process to sample
  non_detect = 
    bernoulli_logit_rng(
      log(2.0/1.0e6) * P_hat[n,m]/det_lim + 
      log(1.0e6) // Log-odds of a zero measurement at P_obs == 0
     );
  
  if (non_detect == 1) {
    // Zero measurement process
    P_rep[n, m] = 0.0;
    
  } else {
    // Non-zero measurement process
    P_rep[n, m] = 
      gamma_rng(
          square(P_hat[n, m] / sigma), // Shape
          P_hat[n, m] / square(sigma)  // Inverse scale
      );
    
  }
  
}