One can also use the Skellam distribution to model the (discrete) goal difference. It’s a bit slower than other approaches, but it worked kind of nicely whn I tried it. I think not much is gained in terms of predictive power though. Here’s a pretty straightforward Stan function for the lpmf:
functions{
real skellam_lpmf(int k, real mu1, real mu2){
int abs_k = abs(k);
real lp = -mu1 - mu2 + 0.5*k*(log(mu1) - log(mu2)) + log(modified_bessel_first_kind(abs_k, 2*sqrt(mu1*mu2)));
return lp;
}
}