This is not specifically a Stan issue; it's more of a “What do smart Bayesians think about it?” type of question.

My research field (international trade/investment) is dominated by frequentists, and the workhorse approach is the Poisson quasi-ML estimator. Here is a website describing it.

More generally, I think the tenor is to fit a Poisson and then use robust standard errors. See this blog post for example (also check the comment by Gary King at the bottom).
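For anyone who wants to see what “Poisson plus robust standard errors” actually computes, here is a rough sketch of the sandwich (robust) covariance for a Poisson quasi-ML fit. Everything below is illustrative: the data are made up, and the intercept-only design is chosen only because it has a closed form to check against.

```python
import numpy as np
from scipy.optimize import minimize

# Toy overdispersed non-negative outcomes (invented for illustration).
y = np.array([0.0, 0.0, 1.0, 3.0, 7.0, 12.0])
X = np.ones((len(y), 1))  # intercept-only design, so the answer has a closed form

def negll(beta):
    # Negative Poisson log-likelihood kernel with log link (constant log(y!) dropped)
    mu = np.exp(X @ beta)
    return -np.sum(y * np.log(mu) - mu)

beta_hat = minimize(negll, np.zeros(X.shape[1])).x
mu = np.exp(X @ beta_hat)

# Sandwich (robust) covariance: H^{-1} M H^{-1}
H = X.T @ (mu[:, None] * X)               # "bread": expected Hessian at the QML estimate
M = X.T @ (((y - mu) ** 2)[:, None] * X)  # "meat": outer product of the scores
V_robust = np.linalg.inv(H) @ M @ np.linalg.inv(H)
robust_se = np.sqrt(np.diag(V_robust))
```

In the intercept-only case the robust standard error reduces to sqrt(sum((y - ybar)^2)) / (n * ybar), which is what the sandwich above reproduces regardless of whether the Poisson variance assumption holds.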

What would be a clever Bayesian alternative to this approach? I started fiddling with the Gamma distribution and later found that Martyn Plummer (the JAGS creator if I’m not mistaken) also suggested this on the JAGS forum.

This one is more for my intuition:

Since Stan only needs the log density up to an additive constant, the Poisson likelihood can be fitted to non-integer data with a user-defined function. I tried it, and with rstan::optimizing I got the same results as R's glm implementation for Poisson, which also allows non-integer data. Since this is not a proper likelihood for non-integer data, how do I interpret the results using NUTS? Are they just nonsense, because the posterior is not “proper”?
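A quick numerical illustration of why this works for point estimates: dropping the log(y!) term leaves the Poisson kernel y*log(mu) - mu, which is perfectly well defined for non-integer y, and for an intercept-only model its maximizer is still the sample mean, exactly as with integer counts. (The data below are invented.)

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical non-integer "counts" (e.g. trade volumes).
y = np.array([0.0, 1.3, 2.7, 0.4, 5.1])

def neg_poisson_kernel(log_mu):
    """Negative Poisson log-likelihood up to the constant log(y!),
    which is all glm or Stan's optimizer needs for point estimation."""
    mu = np.exp(log_mu)
    return -np.sum(y * np.log(mu) - mu)

res = minimize_scalar(neg_poisson_kernel)
mu_hat = np.exp(res.x)  # maximizer is the sample mean, even for non-integer y
```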

I think this is terrifying. I remember someone who works at Stata did a FAQ post on this idea a few years back, but I can’t find it now. The gist of it is that a Poisson model with a log link does better on these gravity models than a Gaussian model with a log link because zero is in the sample space in the former case but not the latter. And then you can `, robust` the thing to overcome the fact that the original standard errors, and everything that depends on them, make absolutely no sense.

There is no Bayesian analogue to the applied frequentist idea of “let’s estimate a model that is wrong but is a consistent estimator and then use those estimates to obtain residuals in order to consistently estimate the variance-covariance matrix of the estimates”. The closest thing is the Bayesian bootstrap of a linear model, which amounts to repeatedly doing weighted least squares with weights that are random draws from an exponential distribution. But even if you do that, a Bayesian would not have a (correct) way to draw from the predictive distribution, and without that, there is not much that you can (correctly) say.
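A minimal sketch of that Bayesian bootstrap recipe, assuming simulated data and a plain linear model (the data, sample size, and coefficient values below are all illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: any design matrix X and outcome y would do.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def bayesian_bootstrap_wls(X, y, n_draws=1000):
    """One 'posterior draw' per weighted-least-squares fit, with iid
    Exponential(1) observation weights (Dirichlet up to normalization)."""
    betas = np.empty((n_draws, X.shape[1]))
    for s in range(n_draws):
        w = rng.exponential(size=len(y))
        sw = np.sqrt(w)                       # WLS = OLS on sqrt-weighted data
        betas[s], *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
    return betas

draws = bayesian_bootstrap_wls(X, y)
```

The spread of `draws` mimics a posterior for the coefficients, but as noted above, this still gives you no coherent predictive distribution for new observations.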

The (not all that) clever Bayesian alternative to this is a zero-inflated mixture model, where the probability distribution of the data factors into a probability of the outcome being non-zero and a conditional distribution given that it is non-zero. This is pretty much the same as zero-inflated Poisson or zero-inflated negative binomial but without the restriction that the data be integers.
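As a sketch, the log-likelihood of such a hurdle model factors exactly as described: a Bernoulli term for zero versus non-zero, times a continuous density on the positive part. Here it is with a lognormal positive part (the function name and parameter values are my own, for illustration):

```python
import numpy as np
from scipy import stats

def hurdle_lognormal_loglik(y, pi, mu, sigma):
    """Log-likelihood of a lognormal hurdle model:
    P(y = 0) = pi; for y > 0, the density is (1 - pi) * LogNormal(mu, sigma).
    Works for any non-negative continuous data -- no integer restriction."""
    y = np.asarray(y, dtype=float)
    zero = y == 0
    ll = np.sum(zero) * np.log(pi)
    ll += np.sum(np.log1p(-pi)
                 + stats.lognorm.logpdf(y[~zero], s=sigma, scale=np.exp(mu)))
    return ll
```

A Gamma positive part works the same way; only the density on the non-zero observations changes.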

Some of my colleagues actually mock me with “just use robust standard errors” because they know how much this annoys me. I think a lot of methodological papers (not all of course) in my field could be shortened substantially by putting “because STATA” somewhere.

So my best idea was also to use a Bernoulli part for the zeros and a Gamma or lognormal for the non-zero values. I guess this is sort of what you described more generally, right? I just wanted to make sure that I didn’t miss something. I have some more questions, but I will post them later in different threads.

I’m curious whether you were able to get the “Bernoulli part for the zeros and a Gamma or Lognormal for the non-zero values” model working? I’ve been working on the same issue, but am now nervous to continue after reading this excerpt from the Stan User’s Guide:

“Zero inflation and hurdle models can be formulated for discrete distributions other than the Poisson. Zero inflation does not work for continuous distributions in Stan because of issues with derivatives; in particular, there is no way to add a point mass to a continuous distribution, such as zero-inflating a normal as a regression coefficient prior.”

It would give me motivation to continue if I knew you didn’t run into the derivative issue described.

You should be fine. The manual refers to the fact that you cannot “inflate” a single point of a continuous distribution, because every individual point has probability zero. So the manual implicitly argues against naïve regularization implementations (regression coefficients with a point mass at zero). In a Gamma hurdle or lognormal hurdle model this is not a problem, because zero is not in the sample space of the continuous part. I think brms has implementations for those kinds of hurdle models now, so you might want to check those out.

You can PM me if you have more questions, so we can let this thread rest in peace (although Ben’s answer is pure gold, haha).