While using a custom probability density function, I’m seeing different behaviour from Stan (cmdstanr interface on an M1 Mac) depending on whether I specify an integer term explicitly as a local variable declaration or simply use the value implicitly without storing it as a local variable. Specifically, since I always need the function in vectorised form, common normalisation terms (e.g., the scale parameter) need to be multiplied by the size of the input vector.
I’m starting here because I’m not sure whether this is a bug I should report on Github or whether this is actually desired behaviour. In general, of course, Stan sampling doesn’t work well with integers.
For quick background, I’m implementing a gamma-exponential distribution (a.k.a. a generalised Gumbel distribution) with a fixed mean of zero (see Gavin Crooks’s amazing Field Guide to Continuous Probability Distributions, Chapter 8):
where \psi represents the digamma function.
The implementation in Stan that does work is the following (taking advantage of the relationship to the gamma distribution to get – I hope! – smarter autodiff):
real gamma_exp_lpdf(vector x, real lambda, real alpha) {
int K = size(x);
return
-K * log(lambda)
- sum(x) / lambda
+ gamma_lpdf(exp(-x / lambda) | alpha, exp(digamma(alpha)));
}
There is nothing wrong with that, but because I only need K
once, I originally had the function written like this:
real gamma_exp_lpdf(vector x, real lambda, real alpha) {
return
-size(x) * log(lambda)
- sum(x) / lambda
+ gamma_lpdf(exp(-x / lambda) | alpha, exp(digamma(alpha)));
}
In principle, these functions should be identical, but in practice, the sampler goes bananas when using the second form, throwing all sorts of divergences and landing nowhere near the correct distribution even when there is nothing else in the model.
Should I have expected this? If so, is there a bit of the documentation that should have made that clear? If not, what would be the most helpful tag or keyword for submitting a bug?