Hi all,
This is the first hierarchical model that I’ve attempted, so please pardon my inexperience. I am attempting to model sports outcomes to find the latent strength of the participating teams. To do so, I am specifying each game in terms of the home team, away team, and a binary indicator that equals 1 when the home team wins and 0 when the away team wins. I assume the data are Bernoulli(win_prob) conditioned on win_prob, which I choose to be Beta(strength[i], strength[j]) where i is the home team and j is the away team. (As a quick-and-dirty solution, I picked a Gamma prior centered at gamma_mean with a controllable spread as to make it nearly symmetric.)
I would like the win_prob matrix to be symmetric, with one side being determined by the other since, for example, if team 1 as 70% of beating team 2, then team 2 should have 1 - .7 = 30% of beating team 1. However, I have been unable to specify a model such that one portion of the matrix is Beta(-,-) and the other is 1 - (value sampled from that Beta(-,-)). After many failed attempts to do this, and many Google searches, I decided on the temporary solution to define the reverse probabilities simply as Beta random variables with the strength parameters swapped, which is causing the model to have low n_eff, Rhat > 1, and other diagnostics fail the eye check. Here is the .stan:
data {
int<lower=0> teams;
int<lower=0> N;
int<lower=0,upper=1> home_win[N];
int<lower=1,upper=teams> home_team[N];
int<lower=1,upper=teams> away_team[N];
real<lower=0> gamma_mean;
real<lower=0> spread;
}
parameters {
vector<lower=0.1>[teams] strength;
matrix<lower=0,upper=1>[teams, teams] win_prob;
}
model {
for (team in 1:teams) {
strength[team] ~ gamma(spread, spread/gamma_mean);
}
for (i in 1:teams-1) {
for (j in i+1:teams) {
win_prob[i, j] ~ beta(strength[i], strength[j]);
win_prob[j, i] ~ beta(strength[j], strength[i]);
}
}
for (n in 1:N) {
home_win[n] ~ bernoulli(win_prob[home_team[n], away_team[n]]);
}
}
Data would crudely look like:
Here team 1 beats 7 at home, but loses to 5 and 26 at home.
One of my attempts was to define a transformed parameter, det_win_prob (det for deterministic), in the following way. However it caused the stan() call to run forever without output:
transformed parameters {
matrix<lower=0,upper=1>[teams, teams] det_win_prob;
for (i in 1:teams) {
for (j in 1:teams) {
if (i < j) {
det_win_prob[i, j] = win_prob[i, j];
} else if (i > j) {
det_win_prob[j, i] = 1 - win_prob[i, j];
}
}
}
}
Thank you for taking the time to read this, and I apologize if it is trivial.