Skellam distribution: overflow in modified_bessel_first_kind

Gertjan · October 2, 2017, 7:48pm

Hello,

I read Ben Goodrich’s post from last summer on stan-dev about reimplementing the modified Bessel function of the first kind (more specifically, its logarithm).
Ben was curious about which statistical contexts this Bessel function comes up in.

Well …
Last week I started fitting Stan models with the Skellam distribution, and this needs … the modified Bessel function of the first kind!

The Skellam distribution is the distribution of the difference of two independent Poisson-distributed variables.

See e.g. here
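
For reference, the Skellam probability mass function shows where the Bessel function enters. The symbols \mu_1 and \mu_2 for the two Poisson rates are notation assumed here, not taken from the post:

\[
P(K = k) = e^{-(\mu_1 + \mu_2)} \left(\frac{\mu_1}{\mu_2}\right)^{k/2} I_{|k|}\!\left(2\sqrt{\mu_1 \mu_2}\right), \qquad k \in \mathbb{Z},
\]

where I_{|k|} is the modified Bessel function of the first kind. Since I_\nu(x) grows like e^x / \sqrt{2\pi x} for large x, evaluating it directly overflows double precision once x is a little above 700, even when the log probability itself is moderate.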

It appears well suited for modelling football score differences:

http://www2.stat-athens.aueb.gr/~jbn/papers/files/20_Karlis_Ntzoufras_2009_IMA_presentation_handouts_v01.pdf

During fitting I ran into the overflow problem in boost::math::cyl_bessel_i.
This was reported on the Stan users list before, e.g. in August 2016 by Mike Lawrence for the von Mises distribution.

It appears only at the start of a new chain (but not always); the chains that get past this point behave as expected.

For now, I hard coded a ceiling at x = 700:

if (x < 700)
  log_prob = total + log(modified_bessel_first_kind(k, x));
else
  log_prob = total + log(modified_bessel_first_kind(k, 700));

This doesn’t feel right, but it seems to work.
I’m a bit unsure how to proceed, and would very much appreciate some advice.

Gertjan

bgoodri · October 2, 2017, 7:51pm

I would try setting init_r to be some value that is less than its default of 2.0.

But yes, we need to merge log_modified_bessel_first_kind. I know the derivatives but was a bit unsure how best to deal with them.
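
To illustrate why a log-scale function avoids the overflow, here is a minimal C++ sketch. It is not the Stan Math implementation: the names log_bessel_i and skellam_lpmf are hypothetical, and the method (summing the power series with a running log-sum-exp) is just one simple option, restricted to integer order k >= 0 and x >= 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <limits>

// Hypothetical helper, not part of Stan Math.
// Returns log(I_k(x)) for integer order k >= 0 and x >= 0 using the series
//   I_k(x) = sum_{m >= 0} (x/2)^(2m + k) / (m! * (m + k)!)
// with every term kept on the log scale, so nothing overflows.
double log_bessel_i(int k, double x) {
  assert(k >= 0 && x >= 0.0);
  if (x == 0.0) {
    // I_0(0) = 1 and I_k(0) = 0 for k > 0.
    return k == 0 ? 0.0 : -std::numeric_limits<double>::infinity();
  }
  const double log_half_x = std::log(0.5 * x);
  // Log of the m-th series term.
  auto log_term = [&](int m) {
    return (2.0 * m + k) * log_half_x - std::lgamma(m + 1.0)
           - std::lgamma(m + k + 1.0);
  };
  double log_sum = log_term(0);
  for (int m = 1;; ++m) {
    const double lt = log_term(m);
    // Running log-sum-exp: log(exp(log_sum) + exp(lt)).
    const double hi = std::max(log_sum, lt);
    const double lo = std::min(log_sum, lt);
    log_sum = hi + std::log1p(std::exp(lo - hi));
    // Terms shrink once m > x / 2; stop when the current one is negligible.
    if (m > 0.5 * x && lt - log_sum < -40.0) {
      break;
    }
  }
  return log_sum;
}

// Hypothetical Skellam log-pmf for the difference k of two independent
// Poisson variables with rates mu1 and mu2 (both assumed positive).
double skellam_lpmf(int k, double mu1, double mu2) {
  assert(mu1 > 0.0 && mu2 > 0.0);
  return -(mu1 + mu2) + 0.5 * k * (std::log(mu1) - std::log(mu2))
         + log_bessel_i(std::abs(k), 2.0 * std::sqrt(mu1 * mu2));
}
```

With this sketch, skellam_lpmf(0, 500.0, 500.0) stays finite even though the Bessel argument is 1000, where a direct evaluation of I_0 overflows. The loop needs on the order of x/2 iterations, so a production version would likely switch to an asymptotic expansion for large x.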

Bob_Carpenter · October 3, 2017, 12:17am

Do scores themselves look like they’re Poisson? I’d think they’d be too overdispersed to be modeled well as Poisson.

Rather than switching the program density the way you did, you could also constrain the value of x. Or try initializing less broadly as @bgoodri suggests.

Gertjan · October 3, 2017, 9:37am

@bgoodri : Setting init_r to 1 appears to solve my problem. I was not aware of this option, thanks.

@Bob_Carpenter : I don’t know about the scores themselves. However, Karlis and Ntzoufras have a paper where they show that the distribution of the predicted goal differences looks very similar to the actual observed goal difference distribution. (They also have a zero-inflated model that can capture excess draws, btw.) The paper is called “Bayesian modelling of football outcomes: using the Skellam’s distribution for the goal difference”.

Both: thanks for your help!

Bob_Carpenter · October 3, 2017, 10:18pm

The paper’s paywalled. I found a handout:

http://www2.stat-athens.aueb.gr/~jbn/papers/files/20_Karlis_Ntzoufras_2009_IMA_presentation_handouts_v01.pdf

and it looks like they predict too many ties with the model. This looks like their original paper:

Karlis D. and Ntzoufras, I. (2006). Bayesian analysis of the differences of count data. Statistics in Medicine, 25, 1885-1905.

bgoodri · October 5, 2017, 12:29am

What exactly is our policy about having functions in Stan Math that are only defined for double and/or int arguments? AFAIK, the only place that the logarithm of the Bessel I function is used in statistics is in the von Mises-Fisher distribution. In Stan, that is implemented in a numerically unstable fashion but with operands_and_partials:

https://github.com/stan-dev/math/blob/develop/stan/math/prim/scal/prob/von_mises_lpdf.hpp#L70

scalar_seq_view<T_scale> kappa_vec(kappa);

VectorBuilder<true, T_partials_return, T_scale> kappa_dbl(length(kappa));
VectorBuilder<include_summand<propto, T_scale>::value, T_partials_return,
              T_scale>
    log_bessel0(length(kappa));
for (size_t i = 0; i < length(kappa); i++) {
  kappa_dbl[i] = value_of(kappa_vec[i]);
  if (include_summand<propto, T_scale>::value)
    log_bessel0[i]
        = log_modified_bessel_first_kind(0, value_of(kappa_vec[i]));
}

operands_and_partials<T_y, T_loc, T_scale> ops_partials(y, mu, kappa);

size_t N = max_size(y, mu, kappa);

for (size_t n = 0; n < N; n++) {
  const T_partials_return y_ = value_of(y_vec[n]);
  const T_partials_return y_dbl = y_ - floor(y_ / TWO_PI) * TWO_PI;
  const T_partials_return mu_dbl = value_of(mu_vec[n]);

Thus, to fix the only case we know of where this is a problem, we just need a numerically stable log_modified_bessel_first_kind function (which I have already implemented) whose two arguments are doubles and we only ever need the derivative with respect to kappa_vec[i], which is computed as another Bessel function.
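
For concreteness, the derivative mentioned here follows from the standard identity I_\nu'(x) = I_{\nu+1}(x) + (\nu / x) I_\nu(x). Writing it out (the symbols \nu and x are notation assumed here; \kappa is the von Mises scale from the code above):

\[
\frac{\partial}{\partial x} \log I_\nu(x) = \frac{I_{\nu+1}(x)}{I_\nu(x)} + \frac{\nu}{x},
\qquad\text{so}\qquad
\frac{d}{d\kappa} \log I_0(\kappa) = \frac{I_1(\kappa)}{I_0(\kappa)}.
\]

The ratio I_1(\kappa) / I_0(\kappa) lies between 0 and 1 for \kappa > 0, so it is best computed as the exponential of a difference of two log Bessel values rather than as a quotient of two quantities that can each overflow.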

However, if we are going to say that a function in Stan Math must be implemented when its arguments can be any two types, then we are looking at a ton more work. The derivative with respect to the order involves generalized hypergeometric functions that we would have to implement. Thus far, we have worked around this in Stan Math by saying that the order can only be an integer, which is fine for the von_mises_lpdf, but we need half-integer orders to implement the Fisher generalization.

Bob_Carpenter · October 6, 2017, 2:13pm

Now that we have a data (only) argument qualifier for functions, we can start marking arguments as data-only. So there’s a way to do this going forward both notationally for the manual in a consistent way and in the parser.

Up until now, all functions have taken all possible arguments. The apparent exceptions for some of the higher-order operations, like integrate_ode, aren’t ordinary functions; they’re specialized expressions.
