Hi everyone. First of all I want to say how much I love brms. I use it for so much now: including non-linear hierarchical models for my PhD, Bayesian meta-analysis, and other Biostatistics stuff. I couldn’t possibly imagine how I could I have conducted my research without it - so a big thanks to Paul for making my life so much easier.
Was wondering if someone could help me. I am trying to implement a hierarchical model specifically in brms to predict football results over a Premier League season. The idea is taken from the Stan’s video on Hierarchical Modelling in Stan: Predicting the Premier League.
This model itself is loosely based on the paper describing a Bayesian hierarchical model for the prediction of football results by Baio and Blangiardo in 2010. This is inspired by a project by Theo Rashid; you can see his work here and his website is here.
I am trying doing this to try and work backwards to understand how brms uses its multivariate syntax and how it works when specifying more complex multivariate models (i.e. those models that cannot just use mvbind). I have read the excellent vignette here, but I am still a bit confused, particular when it comes to setting priors for the model.
The model
The vector of observed responses predicts the final match scores, and is modelled as a bivariate vector \pmb{y} = (y_{home}, y_{away}) is modelled as independent Poisson:
y_{gj} | θ_{gj} ∼ Poisson(θ_{gj} )
where the parameters θ = (θ_{g1}, θ_{g2}) represent the scoring intensity in the g-th game for the team playing at home (j = 1) and away (j = 2), respectively.
These parameters are modelled by
log θ_{g1} = home + att_{h(g)} + def_{a(g)}
log θ_{g2} = att_{a(g)} + def_{h(g)}
att_{h(g)} is a parameter that represents the home (denoted by h) teams ability score goals, whereas def_{a(g)} is the “defensive” ability of the away team
home represents a home advantage intercept.
For each team in a football league (t = 1, . . . , T), the team-specific effects are modelled as exchangeable from a common distribution:
att_t ∼ Normal(µ_{att}, \tau_{att})
def_t ∼ Normal(µ_{def} , \tau_{def} )
witht the following priors/hyperpriors:
\mu_{att} \sim normal(0,0.1);
\tau_{att} \sim normal(0,1);
\mu_{def} \sim normal(0,0.1);
\tau_{def} \sim normal(0,1);
home \sim normal(0,0.0001);
The DAG from the paper looks like this:
Maggie Lieu’s Rstan code reads as follows:
//
// This Stan program defines a simple model, with a
// vector of values 'y' modeled as normally distributed
// with mean 'mu' and standard deviation 'sigma'.
//
// Learn more about model development with Stan at:
//
// http://mc-stan.org/users/interfaces/rstan.html
// https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started
//
// The input data is a vector 'y' of length 'N'.
data {
int<lower=0> nt; //number of teams
int<lower=0> ng; //number of games
int<lower=0> ht[ng]; //home team index
int<lower=0> at[ng]; //away team index
int<lower=0> s1[ng]; //score home team
int<lower=0> s2[ng]; //score away team
int<lower=0> np; //number of predicted games
int<lower=0> htnew[np]; //home team index for prediction
int<lower=0> atnew[np]; //away team index for prediction
}
parameters {
real home; //home advantage
vector[nt] att; //attack ability of each team
vector[nt] def; //defence ability of each team
//hyper parameters
real mu_att;
real<lower=0> tau_att;
real mu_def;
real<lower=0> tau_def;
}
transformed parameters {
vector[ng] theta1; //score probability of home team
vector[ng] theta2; //score probability of away team
theta1 = exp(home + att[ht] - def[at]);
theta2 = exp(att[at] - def[ht]);
}
model {
//hyper priors
mu_att ~ normal(0,0.1);
tau_att ~ normal(0,1);
mu_def ~ normal(0,0.1);
tau_def ~ normal(0,1);
//priors
att ~ normal(mu_att, tau_att);
def ~ normal(mu_def, tau_def);
home ~ normal(0,0.0001);
//likelihood
s1 ~ poisson(theta1);
s2 ~ poisson(theta2);
}
generated quantities {
//generate predictions
vector[np] theta1new; //score probability of home team
vector[np] theta2new; //score probability of away team
real s1new[np]; //predicted score
real s2new[np]; //predicted score
theta1new = exp(home + att[htnew] - def[atnew]);
theta2new = exp(att[atnew] - def[htnew]);
s1new = poisson_rng(theta1new);
s2new = poisson_rng(theta2new);
}
Ok so now for brms! Ideally I’d like to specify the same model in brms (although I understand that brms would implemenent a centred parameterization)
Assuming my input data looks like this:
I thought the way to specify this is brms would be:
bf_score_home <- bf(scorehome ~ 1 + (1 | HomeTeam) - (1 | AwayTeam))
bf_score_away <- bf(scoreaway ~ 0 + (1 | AwayTeam) - (1 | HomeTeam))
fit <- brm(
bf_score_home + bf_score_away + set_rescor(FALSE),
family = poisson,
data=epl_20_21
)
However when I tried to edit the priors with:
get_prior(
bf_score_home + bf_score_away + set_rescor(FALSE),
family = poisson,
data=epl_20_21
)
Then I get this:
The priors don’t match what I had expected. Can anyone help me figure out how I would specify a model like this with brms so that the model would be the same (although reparameterised) as Maggie Lieu’s stan code?
Thanks so much for your help!
Alex