Given a standard model like y_i \sim \mathrm{Poisson}\left(\lambda\right), the calculation of the log-likelihood for use with the loo package is straightforward:
vector[N] log_lik;
for (n in 1:N) {
  log_lik[n] = poisson_lpmf(y[n] | lambda);
}
with x_i and y_i jointly distributed.
In this case, how can I write the log-likelihood calculation in the generated quantities of my model?
Thank you in advance!
Thanks @avehtari for the quick answer. I would like to compare the two models proposed by Jake Thompson for modeling soccer scores.
Here are the two models:
As I expected, the Poisson terms are independent conditional on the parameters; thus the likelihood is the product of those terms, and the log-likelihood is the sum of them.
Given vector/array arguments, _lpmf and _lpdf functions sum the log probabilities / densities together, so to get the individual terms we need a for loop in the generated quantities block.
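For a model of home and away scores, a sketch of that loop might look like this (assuming match-level rates lambda1 and lambda2 and observed scores h_goals and a_goals; these names are illustrative):

```stan
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    // per-match joint log-likelihood: the home and away scores are
    // conditionally independent given the rates, so their log-pmfs add
    log_lik[n] = poisson_lpmf(h_goals[n] | lambda1[n])
                 + poisson_lpmf(a_goals[n] | lambda2[n]);
  }
}
```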
This will compute the joint log-likelihood of h_goals[n] and a_goals[n]. PSIS-LOO computes what would happen if the nth log_lik term were removed from the target, which in this case corresponds to leaving out both observations h_goals[n] and a_goals[n].
Here’s my take on the bivariate Poisson (with a common rate lambda_common), which is a bit more tedious but IMO more precise. It does not treat the goals themselves as independent.
// Log-pmf of the bivariate Poisson: the shared component z is
// summed out by enumerating z = 0..min(goal1, goal2).
real bivariate_match_lpdf_f(int goal1, int goal2, real lambda1,
                            real lambda2, real lambda_common) {
  int m = min(goal1, goal2);
  vector[m + 1] log_terms;
  for (z in 0 : m) {
    int r1 = goal1 - z;  // home goals not from the common component
    int r2 = goal2 - z;  // away goals not from the common component
    log_terms[z + 1] = r1 * log(lambda1) - lgamma(r1 + 1)
                       + r2 * log(lambda2) - lgamma(r2 + 1)
                       + z * log(lambda_common) - lgamma(z + 1);
  }
  return -(lambda1 + lambda2 + lambda_common) + log_sum_exp(log_terms);
}
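A sketch of how this function could be wired into a model, assuming data h_goals/a_goals with match-level rates lambda1/lambda2 as elsewhere in the thread (all names are illustrative):

```stan
model {
  for (n in 1:N) {
    // the user-defined function returns the joint log-pmf of both scores
    target += bivariate_match_lpdf_f(h_goals[n], a_goals[n],
                                     lambda1[n], lambda2[n], lambda_common);
  }
}
generated quantities {
  vector[N] log_lik;  // one term per match, for use with the loo package
  for (n in 1:N) {
    log_lik[n] = bivariate_match_lpdf_f(h_goals[n], a_goals[n],
                                        lambda1[n], lambda2[n], lambda_common);
  }
}
```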
My version did not assume independence either, but it’s likely that numerical accuracy can be improved by rearranging the computation. Thanks for sharing the code!
I might be deeply wrong about this and misunderstand something about the topic, but here are the joint distributions I get from using both approaches (with a large lambda_common = 4, and lambda1, lambda2 = 1.2, 0.9). The left-side approach considers the common goals to be the same for any given match (which is how I understand the correlation between the home/away goals). My understanding is that the right-side approach (the original approach from this topic) ignores the correlation. I have found the correlation essential for football modeling; I don’t think the common lambda is even identifiable without it (well, maybe as an intercept). I may have misunderstood the code or the approach of this topic, but those are two very different distributions.
The terms in the first model are independent conditional on the parameters, so when you fix the parameters you should see an independent bivariate distribution. The two scores still have dependency due to the common lambda3 parameter (and the prior structure affects the strength of that dependency).
Looking at the bivariate Poisson in Wikipedia, we see where the difference comes from. To make the difference clearer, the original model in the question could in theory be rewritten as
We really can’t write the model like this in Stan, as Y1, Y2, and Y3 are unknown discrete variables and Stan does not support sampling of discrete variables. However, as Y1, Y2, and Y3 are bounded above by h_goals and a_goals, they have compact support, and we can integrate them out by enumerating all possible combinations with the equation on Wikipedia and with your code. This model has stronger dependency, including dependency conditional on fixed parameters.
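Writing this out: with X_1 = Y_1 + Y_3, X_2 = Y_2 + Y_3, and independent Y_j \sim \mathrm{Poisson}(\lambda_j), summing over the possible values z of the shared component Y_3 gives the marginal pmf (the same sum the function earlier in the thread enumerates on the log scale):

$$
P(X_1 = k_1,\, X_2 = k_2) = e^{-(\lambda_1+\lambda_2+\lambda_3)} \sum_{z=0}^{\min(k_1,k_2)} \frac{\lambda_1^{\,k_1-z}}{(k_1-z)!}\,\frac{\lambda_2^{\,k_2-z}}{(k_2-z)!}\,\frac{\lambda_3^{\,z}}{z!}
$$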
I don’t know which one of these models is more realistic for soccer scores.
I think my approach is a proper marginalized version of the bivariate Poisson, and it has been working well for me on massive models (80k matches, hierarchical bivariate Poisson regression; here I’m just trying to say that they converge very well, etc.). To be honest, I don’t really understand the point of the original model in this topic, because two independently sampled sums of Poissons is the same thing as just two independent Poissons with the summed rates.
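For the record, the additivity fact behind that last sentence: when the components are drawn independently for each margin (rather than sharing one draw of the common component across both scores), the sums collapse to plain Poissons,

$$
Y_1 \sim \mathrm{Poisson}(\lambda_1),\quad Y_3 \sim \mathrm{Poisson}(\lambda_3),\quad Y_1 \perp Y_3 \;\Longrightarrow\; Y_1 + Y_3 \sim \mathrm{Poisson}(\lambda_1 + \lambda_3),
$$

so the two margins end up as independent Poissons with rates \lambda_1 + \lambda_3 and \lambda_2 + \lambda_3.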
I commented on this topic to make sure that whoever wants to implement Bivariate Poisson in Stan takes this into consideration if they stumble upon this page.
I don’t think anyone is disagreeing with that, and that is what I also said in my previous message.
Cool
I’m not an expert in match modeling, so I’m not commenting on the point of that model, but I agree that it would be better not to call it bivariate Poisson, since that term is commonly used for the other model.
Thanks again for doing that! Could you also share your favorite paper / blog post on the bivariate Poisson? I did link to Wikipedia, but Wikipedia has a rather strange choice of reference for the bivariate Poisson.
@valyagolev I might have mentioned this before, but I was trying to reproduce some soccer scores models as a way to train myself and deepen my understanding of both statistics and Bayesian modeling. I’m genuinely glad if your approach outperforms others. Thanks a lot for sharing your ideas and code with the community!
Please be aware that what is called bivariate Poisson at that link is not an actual implementation of the bivariate Poisson; it’s an implementation of independent Poissons for the home and away goals.
There are some papers by Karlis and Ntzoufras that define and use it; I will do a bit more research and edit Wikipedia to help others avoid this unfortunate confusion. (Wikipedia’s formulation is correct, but I will add a couple of plots to make clearer the difference between the bivariate case and simply having two independent Poissons as above.)