Skellam distribution: overflow in modified_bessel_first_kind


I read Ben Goodrich’s post from last summer on stan-dev about reimplementing the modified bessel function of the first kind (even more specific: its logarithm).
Ben was curious in what statistical contexts this Bessel function comes up.

Well …
Last week I started fitting Stan models with the Skellam distribution, and this needs …. The modified Bessel of the first kind!

The Skellam distribution is the distribution for the difference of two independent Poisson distributed variables.

See e.g. here

It appears well suited for modelling football score differences:

During fitting I ran into the overflow in boost::math::cyl_bessel_i problem.
This was reported on the Stan users list before, e.g. in august 2016 by Mike Lawrence for the von mises dist.

It appears only at the start of a new chain (but not always), the chains that get past this point behave as expected.

For now, I hard coded a ceiling at x = 700:

if (x < 700)
log_prob = total + log(modified_bessel_first_kind(k, x));
log_prob = total + log(modified_bessel_first_kind(k, 700));

This doesn’t feel right, but it seems to work.
I’m a bit unsure how to proceed, and would very much appreciate some advice.


I would try setting init_r to be some value that is less than its default of 2.0.

But, yes we need to merge log_modified_bessel_first_kind. I know the derivatives but was a bit unsure how best to deal with them.

Do scores themselves look like they’re Poisson? I’d think they’d be overdispersed to be modeled well as Poisson.

Rather than switching the program density the way you did, you could also constrain the value of x. Or try initializing less broadly as @bgoodri suggests.

@bgoodri : Setting init_r to 1 appears to solve my problem. I was not aware of this option, thanks.

@Bob_Carpenter : I don’t know about the scores themselves. However, Karlis and Ntzoufras have a paper where they show that the distribution of the predicted goal differences looks very similar to the actual observed goal difference distribution. (They also have a zero inflated model that can capture excess draws, btw). The paper is called “Bayesian modelling of football outcomes: using the Skellam’s distribution for the goal difference”.

Both: thanks for your help!

The paper’s paywalled. I found a handout:

and it looks like they predict too many ties with the model. This looks like their original paper:

Karlis D. and Ntzoufras, I. (2006). Bayesian analysis of the differences of count data. Statistics in Medicine, 25, 1885-1905.

1 Like

What exactly is our policy about having functions in Stan Math that are only defined for double and / or int? AFAIK, the only place that the logarithm of the Bessel I function is used in statistics is in the Von Mises - Fisher distribution. In Stan, that is implemented in a numerically antistable fashion but with operands_and_partials

Thus, to fix the only case we know of where this is a problem, we just need a numerically stable log_modified_bessel_first_kind function (which I have already implemented) whose two arguments are doubles and we only ever need the derivative with respect to kappa_vec[i], which is computed as another Bessel function.

However, if we are going to say that a function in Stan Math must be implemented when its arguments can be any two types, then we are looking at a ton more work. The derivative with respect to the order involves generalized hypergeometric functions that we would have to implement. Thusfar, we have worked around this in Stan Math by saying that the order can only be an integer, which is fine for the von_mises_lpdf but we need half-integer orders to implement the Fisher generalization.

Now that we have a data (only) argument qualifier for functions, we can start marking arguments as data-only. So there’s a way to do this going forward both notationally for the manual in a consistent way and in the parser.

Up until now, all functions have taken all possible arguments. The apparent exceptions for some of the higher-order operations, like integate_ode, aren’t ordinary functions—they’re specialized expressions.