Divergent transitions with bivariate negative binomial regression

games_data.csv (3.8 KB)
negbin.R (576 Bytes)
bnb1.stan (1.8 KB)
I’m trying to fit a bivariate negative binomial regression (“BNB-1”, described here: https://aip.scitation.org/doi/pdf/10.1063/1.4903663) to a dataset of soccer results. I keep getting warnings about divergent transitions, and the estimation results also seem to be wrong: according to them, all teams are equal in skill.

Previously, when I have encountered divergent transitions, it has been in the context of hierarchical regressions, and following various tutorials it was fairly straightforward to apply a non-centered reparametrization. Now, however, I’m at a loss, because the divergent transitions are not related to any hierarchical structure, and the distribution I’m working with is not covered by any tutorial I know of.

I attached a runnable example (the code in the attachment runs for only a few iterations; I actually used 20k), and if someone has an idea of how to fix it, that would be great. (I’m aware that vectorizing the operations would be best practice, but I’ll worry about efficiency once I’m able to estimate the parameters at all.) However, the model being somewhat complicated, I understand that debugging it is a lot to ask, so more general pointers would also be much appreciated:

  1. Are there any suggested readings on how to gain intuition about what it is in a model that makes the posterior geometry problematic for sampling? Or, for that matter, introductory readings on thinking about posterior geometry at all?

  2. Once I have some idea about what’s wrong, how do I go about reparametrizing a model? Is there any more systematic method than staring at the log-likelihood function and thinking carefully?

Also, now that I’m posting, I have a question that is not about the divergences but about the bivariate negative binomial itself:

According to the paper, the expected values of the marginal distributions are simply mu[1] and mu[2]. When I fit the model without any regression parameters, i.e. just setting a prior on mu directly, I avoid all warnings about divergences and things seem to work out fine. However, the posterior mean estimates I get for mu[1] and mu[2] are 1.4 and 1.6, whereas the corresponding sample means are about 1.2 and 1.4. The posterior variability is small enough that the actual sample means fall well outside any reasonable probability interval. Now, to my understanding, the posteriors should be centered around the sample means? Does this mean BNB is fundamentally unsuited for describing these data? Or that my implementation is incorrect? The latter would seem much more likely, were it not for the fact that I separately implemented another variant of the bivariate negative binomial from a different paper and got essentially the same results.

EDIT: having thought about it more, could it be that the sample mean of the first component is not an unbiased estimate of the expected value of the corresponding marginal under the BNB?
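For reference, the comparison I’m making looks roughly like this (a sketch only: the goal column names are placeholders for whatever is actually in games_data.csv, and `fit` is assumed to be the intercept-only rstan fit produced by negbin.R):

```r
# Compare raw sample means of the two count columns to the posterior for mu.
library(rstan)

games <- read.csv("games_data.csv")
colMeans(games[, c("home_goals", "away_goals")])   # raw sample means (column names are guesses)

# `fit` is assumed to already exist from running negbin.R with the intercept-only model
mu_draws <- rstan::extract(fit, pars = "mu")$mu     # draws x 2 matrix of posterior mu
apply(mu_draws, 2, quantile, probs = c(0.025, 0.5, 0.975))
```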

Thanks!

Numerical precision issues in evaluating lpdfs/lpmfs are another thing that can cause divergences. Since that’s what you’re doing here, that’s probably where I’d start. Pows, exps, logs, and lgammas can all get pretty fragile.
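A couple of throwaway lines in R show what I mean (the same double-precision arithmetic is happening inside the Stan model):

```r
# Naive forms overflow or underflow doubles; log-scale versions survive.
gamma(200)             # Inf -- overflows (with a warning); taking log() afterwards is too late
lgamma(200)            # ~857.93 -- finite, because it stays on the log scale throughout

exp(-800) / exp(-805)  # NaN -- both factors underflow to 0, so the ratio is 0/0
exp(-800 - (-805))     # ~148.4 -- same quantity, computed on the log scale first
```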

Does this mean BNB is fundamentally unsuited for describing these data?

I don’t know. But developing a new pmf at the same time as a new model isn’t gonna be easy :D.

If it’s the bivariate negative binomial regression you care about, probably the best way to figure out what’s happening is to put numbers into the pmf and see what happens to the internal values. It might be easier to do this in R than in Stan.
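Something along these lines is what I have in mind, sketched generically since I don’t have the paper’s pmf in front of me: the terms inside the function are placeholders, and you’d substitute the actual additive pieces of the BNB-1 log-pmf from bnb1.stan.

```r
# Split the log-pmf into its additive pieces and return them all,
# so you can see which piece goes Inf/NaN first.
bnb_lpmf_terms <- function(y1, y2, mu1, mu2, a_inv) {
  c(term_lgamma = lgamma(y1 + y2 + a_inv),                   # placeholder pieces: replace with
    term_scale  = -(y1 + y2 + a_inv) * log1p(mu1 + mu2),     # the real terms from bnb1.stan
    term_y      = y1 * log(mu1) + y2 * log(mu2))
}

# Scan extreme-but-plausible inputs and flag rows with non-finite intermediate values
grid <- expand.grid(y1 = c(0, 1, 10), y2 = c(0, 1, 10),
                    mu1 = c(1e-6, 1.5, 50), mu2 = c(1e-6, 1.5, 50),
                    a_inv = c(1e-8, 1, 1e8))
bad <- apply(grid, 1, function(g) any(!is.finite(do.call(bnb_lpmf_terms, as.list(g)))))
grid[bad, ]
```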

And presumably you’ll go through and use as many numerical tricks as you can in the lpmf evaluations, e.g. replacing log(1 / (a_inv[t] + 1)) with -log1p(a_inv[t]).
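That particular pair is easy to compare in R: the two expressions are mathematically identical but not numerically identical once a_inv gets small.

```r
# The 1 + a_inv sum rounds before the log ever sees it.
a_inv <- 1e-13
log(1 / (a_inv + 1))   # already off in the fourth significant digit
-log1p(a_inv)          # -1e-13, accurate
```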

If it’s the soccer you care about, it might be worth working with a different/simpler model until you understand the dataset better (maybe you don’t need the bivariate negative binomial to do the modeling you want).
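For instance, two independent negative binomial GLMs would at least tell you whether team skill shows up in these data at all. A rough sketch, with column names guessed (adjust them to whatever is actually in games_data.csv):

```r
# Throwaway baseline: separate NB regressions for home and away goals.
library(MASS)

games <- read.csv("games_data.csv")
home_fit <- glm.nb(home_goals ~ home_team + away_team, data = games)
away_fit <- glm.nb(away_goals ~ home_team + away_team, data = games)
summary(home_fit)
# If even these say all teams look the same, the issue is the data/predictors,
# not the bivariate likelihood.
```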