I’m modeling handball match outcomes using an approach commonly applied in football analysis (based on this paper: https://www.tandfonline.com/doi/full/10.1080/02664760802684177). However, I’m having difficulty modeling the number of goals scored by the away team. The distribution of away‑team goals looks like this:
The Poisson model fails to account for the spike at 28 goals.
I tried to include several covariates, but they did not improve the fit. I also tried a negative binomial model, which likewise failed to capture the spike at 28 goals.
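Putting the observed frequencies next to the expected counts makes it easier to see where each model misses. Below is a minimal sketch in plain Python. It assumes the away-team goals are a flat list of integers. It ignores covariates and sets the Poisson rate to the sample mean, which is the maximum-likelihood estimate for an intercept-only Poisson. It fits the negative binomial by the method of moments rather than maximum likelihood, which is a simplification. The function names are hypothetical.

```python
import math
from collections import Counter

def poisson_pmf(k, lam):
    # Computed on the log scale to avoid overflow in lam**k and k!
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def nb_pmf(k, r, p):
    # Negative binomial with size r and success probability p:
    # P(K = k) = Gamma(k + r) / (Gamma(r) * k!) * p**r * (1 - p)**k
    return math.exp(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1 - p)
    )

def compare_fits(goals):
    """Return rows of (k, observed, expected Poisson, expected NB or None)."""
    n = len(goals)
    mean = sum(goals) / n
    var = sum((g - mean) ** 2 for g in goals) / (n - 1)
    observed = Counter(goals)
    lam = mean  # Poisson MLE (intercept-only model)
    nb = None
    # Method-of-moments NB exists only when the data are overdispersed.
    if var > mean:
        p = mean / var
        r = mean * p / (1 - p)  # equivalently mean**2 / (var - mean)
        nb = (r, p)
    rows = []
    for k in range(min(goals), max(goals) + 1):
        exp_pois = n * poisson_pmf(k, lam)
        exp_nb = n * nb_pmf(k, *nb) if nb else None
        rows.append((k, observed.get(k, 0), exp_pois, exp_nb))
    return rows
```

For example, `for k, obs, e_pois, e_nb in compare_fits(away_goals): print(k, obs, round(e_pois, 1), e_nb)` prints one row per goal count. Here `away_goals` is a hypothetical name for your data. If `e_nb` is `None`, the sample variance did not exceed the mean. In that case a negative binomial has no valid parameters for these data, which alone would explain why it did no better than the Poisson.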
Should I consider a different distribution? If so, which one?
Can you think of any explanatory reason why some numbers (like 28) would be much more likely than either of their neighbors? It’s one thing for the model to fail to account for the spike at 28. But I also notice that if we just moved a bit of the mass at 28 over to 29, everything would look completely fine. This suggests to me that either there is something special about 28 and 29 in particular, or this is just a fluke and the Poisson is fine. When the Poisson is a bad fit, it’s usually because it doesn’t fit the tails well, not because of extra-Poisson variability in the frequencies of adjacent counts.
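The comment suggests the spike may be a fluke. One way to check is a parametric bootstrap:

1. Simulate many Poisson data sets of the same size.
2. Refit the mean on each one.
3. Count how often the largest single-value excess is at least as big as the one in the real data.

Below is a minimal sketch. It assumes an intercept-only Poisson with no covariates. It only scores counts with at least 5 expected matches, so the statistic reflects the body of the distribution rather than the sparse tails. The function names are hypothetical.

```python
import math
import random
from collections import Counter

def sample_poisson(lam, rng):
    # Knuth's multiplication method; adequate for moderate rates such as ~30 goals.
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def max_pearson_residual(sample, lam, min_expected=5.0):
    # Largest (observed - expected) / sqrt(expected) among well-populated counts.
    n = len(sample)
    counts = Counter(sample)
    best = 0.0
    for k in range(min(sample), max(sample) + 1):
        expected = n * math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        if expected >= min_expected:
            best = max(best, (counts.get(k, 0) - expected) / math.sqrt(expected))
    return best

def spike_p_value(goals, n_sims=1000, seed=0):
    """Return (observed statistic, share of Poisson samples at least as extreme)."""
    rng = random.Random(seed)
    lam = sum(goals) / len(goals)
    observed = max_pearson_residual(goals, lam)
    hits = 0
    for _ in range(n_sims):
        sim = [sample_poisson(lam, rng) for _ in goals]
        lam_sim = max(sum(sim) / len(sim), 1e-9)  # refit the rate on each simulated set
        if max_pearson_residual(sim, lam_sim) >= observed:
            hits += 1
    return observed, hits / n_sims
```

A small share (say below 0.05) would suggest the spike at 28 is hard to get from a Poisson by chance alone. A moderate share would support the “just a fluke” reading. Since this ignores covariates, it tests the marginal distribution only.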
No, it’s more likely just a natural consequence of the tempo of play and the tactics used in high-level handball games.
The model’s fit for the home team’s goals looks much better overall, but it still appears to underestimate the frequencies of the two most common values (30 and 31).
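Here is one possible reading of the underestimated peak. If goal counts are underdispersed (variance smaller than the mean), a Poisson with the right mean spreads its mass too widely. It therefore puts too little probability on the most common values. A negative binomial cannot fix this, because its variance is always at least its mean.

Below is a minimal sketch of the classic dispersion check, with hypothetical function names. It assumes `home_goals` (also hypothetical) is a flat list of integers. Pooling matches with different expected scores inflates the marginal variance. So a marginal index below 1 likely understates how underdispersed the per-match counts are.

```python
import math

def dispersion_index(goals):
    """Sample variance divided by sample mean.

    About 1 is consistent with a Poisson; well below 1 indicates underdispersion,
    which neither a Poisson (variance = mean) nor a negative binomial
    (variance >= mean) can reproduce.
    """
    n = len(goals)
    mean = sum(goals) / n
    var = sum((g - mean) ** 2 for g in goals) / (n - 1)
    return var / mean

def dispersion_z(goals):
    # Under a Poisson, (n - 1) * D is approximately chi-square with n - 1 df;
    # a normal approximation turns it into a z-score (negative = underdispersed).
    n = len(goals)
    stat = (n - 1) * dispersion_index(goals)
    return (stat - (n - 1)) / math.sqrt(2 * (n - 1))
```

For example, `dispersion_z(home_goals)` well below about -2 would point toward underdispersion.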