Hello everybody, I am working on my first Stan project and I have couple of questions. It’s going to be a long post, I don’t want to waste anybody’s time because, for sure, there will be silly and time waste questions for Stan masters. I am looking for enthusiasts’ helps for this rookie lad. :)
First of all, I haven’t run the model yet. My questions maybe be odd and silly or I may have done rookie mistakes in this post, in advance my apologies. :) I am not a statistician and it’s my first experience with a statistics based problem/project.
So, I am basically working on a paper about football predictions. The main model is as following:
y_{m1} | \theta_{m1},\lambda_{m1} \sim poisson(p_{m1}*\theta_{m1} + (1-p_{m1})*\lambda_{m1})) \\y_{m2} | \theta_{m2},\lambda_{m2} \sim poisson(p_{m2}*\theta_{m2} + (1-p_{m2})*\lambda_{m2}))
m’s are the matches and y_{m1} and\ y_{m2} are home team’s goals and away team’s goals respectively.
Parameter p_{m1},p_{m2} are non-informative priors and defined as p_{m} \sim beta(a,b).
The point I’m having an issue in modeling is writing the log-linear model for total of T teams. The model is as following:
log(\theta_{m1}) = \mu + att_{t[m]1} + def_{t[m]2} \\ log(\theta_{m2}) = att_{t[m]2} + def_{t[m]1}\ for\ t=1,..,T
where t[m]1 means home team’s attack power at game m and t[m]2 for away’s team.
Attack and defensive powers of teams have following distributions for T teams and Tau seasons:
att_{t,\tau} \sim N(\mu_{att} + att_{t,\tau-1},\sigma_{att}^2)\ for\ t=1,..T\ , \tau=2,..,Tau\\
def_{t,\tau} \sim N(\mu_{def} + def_{t,\tau-1},\sigma_{def}^2)\ for\ t=1,..T\ , \tau=2,..,Tau \\
att_{t,1} \sim N(\mu_{att} ,\sigma_{att}^2)\ for\ \tau=1\ ,t=1,..,T \\
def_{t,1} \sim N(\mu_{def} ,\sigma_{def}^2)\ for\ \tau=1\ ,t=1,..,T
with a zero-sum constraint as:
\sum_{t=1}^{T} att_{t,\tau}=0\ ,
\sum_{t=1}^{T} def_{t,\tau}=0\ for\ t=1,..T\ , \tau=1,..,Tau
Priors are defined also in the following, where cauchy^+ refers half-cauchy.
\mu,\ \mu_{def},\ \mu_{att}\ \sim N(0,10) \\
\sigma_{att},\ \sigma_{def}\ \sim Cauchy^+(0,2.5)
And the remaining distributions are defined as following but I haven’t come to that part yet. Just giving here to cover every variables in the model.
\hat{\theta}_{m1}^{1},...,\hat{\theta}_{m1}^{S}\ \sim truncN(\lambda_{m1},\tau_{1}^2,0,\infty)\\
\hat{\theta}_{m2}^{1},...,\hat{\theta}_{m2}^{S}\ \sim truncN(\lambda_{m2},\tau_{2}^2,0,\infty)\ for\ s=1,..,S
\lambda_{m1}\ \sim truncN(\alpha_{1}, 10, 0, \infty)\\ \lambda_{m2}\ \sim truncN(\alpha_{2}, 10, 0, \infty)
Then my data vector for every match becomes as in the following, where s refers bookmakers
\mathbf{(y,\hat{\theta^s})}=(y_{m1},y_{m2},\hat{\theta}_{m1}^{s},\hat{\theta}_{m2}^{s}, s=1,..,S)
Finally, the code I have been writing:
data {
int <lower=1> n_games; //380 matches every season
int <lower=1> n_teams; //20 teams for premier league
int <lower=1> n_seasons; //9 seasons training set, 1 test set total 10
int <lower=1, upper=n_teams> home_team[n_games,n_seasons];
int <lower=1, upper=n_teams> away_team[n_games,n_seasons];
int <lower=0> home_goals[n_games];
int <lower=0> away_goals[n_games];
int <lower=0> n_bookmakers; // # of bookmakers
real<lower=0> theta_hat1[n_games,n_bookmakers];
real<lower=0> theta_hat2[n_games,n_bookmakers];
}
parameters{
real alpha1;
real alpha2;
real<lower=0> mu;
real<lower=0> mu_att;
real<lower=0> sigma_def;
real<lower=0> sigma_att;
real<lower=0> mu_def;
real lambda1[n_games];
real lambda2[n_games];
vector[20] att_season1;
vector[20] def_season1;
real<lower=0> p1[n_games];
real<lower=0> p2[n_games];
real att_raw[n_teams,n_seasons];
real def_raw[n_teams,n_seasons];
real<lower=0> tau_1;
real<lower=0> tau_2;
vector[n_games] theta1;
real theta2[n_games];
}
transformed parameters {
// sum-to-zero constraints
matrix[n_teams,n_seasons] att;
matrix[n_teams,n_seasons] def;
for (z in 1:n_seasons){
for (tt in 1:(n_teams-1)) {
att[tt,z] = att_raw[tt,z] - mean(att_raw[,z]);
def[tt,z] = def_raw[tt,z] - mean(def_raw[,z]);
}
}
}
model{
//priors
target += normal_lpdf(mu |0, 10);
target += normal_lpdf(mu_att | 0,10);
target += normal_lpdf(mu_def | 0,10);
target += cauchy_lpdf(sigma_att | 0,2.5);
target += cauchy_lpdf(sigma_def | 0,2.5);
target += uniform_lpdf(tau_1 | 0,10);
target += uniform_lpdf(tau_2 | 0,10);
for (s in 1:n_seasons){
for(t in 1:n_teams){
if (s==1){
target += normal_lpdf(att_raw[t,s] | mu_att, pow(sigma_att,2));
target += normal_lpdf(def_raw[t,s] | mu_def, pow(sigma_def,2));
}
else{
target += normal_lpdf(att_raw[t,s] | att_raw[t,s-1],pow(sigma_att,2));
target += normal_lpdf(def_raw[t,s] | def_raw[t,s-1],pow(sigma_def,2));
}
}
for (m in 1:n_games){
target += cauchy_lpdf(p1[m]|2, 2);
target += cauchy_lpdf(p2[m]|2, 2);
theta1[m] = exp( mu + att[home_team[m],s] + def[away_team[m],s]);
theta2[m] = exp(att[away_team[m],s]] + def[home_team[m],s]]);
lambda1[m] ~ normal(alpha1, 10) T[0,positive_infinity()];//choose alphas
lambda2[m] ~ normal(alpha2, 10) T[0,positive_infinity()];
for (b in 1:n_bookmakers){
theta_hat1[m,b] ~ normal(lambda1[m],tau_1) T[0,positive_infinity()];
theta_hat2[m,b] ~ normal(lambda2[m],tau_2) T[0,positive_infinity()];
}
target += poisson_lpmf(home_goals[m] | p1[m] * theta1[m] + (1-p1[m])* lambda1[m]);
target += poisson_lpmf(away_goals[m] | p2[m] * theta2[m] + (1-p2[m])* lambda2[m]);
}
}
}
generated quantities{
}
-
I couldn’t solve the issue about definition of log-linear model
theta1[m] = exp( mu + att[home_team[m],s] + def[away_team[m],s]);
. Any ideas how to define it in this context? I also think that I don’t define \theta_{m} properly in parameter part properly. -
When defining truncated normal distribution, is there any difference between
a ~normal(b,c) T[0,positive_infinity()];
, wherereal a;
anda~normal(b,c)
, wherereal<lower=0,upper=positive_infinity()> a
? -
I also look forward to any comments/ideas/feedbacks".
I would like to thank everybody, who follows this newbie post until this point!