# Coding related questions from a rookie's first project

Hello everybody, I am working on my first Stan project and I have couple of questions. It’s going to be a long post, I don’t want to waste anybody’s time because, for sure, there will be silly and time waste questions for Stan masters. I am looking for enthusiasts’ helps for this rookie lad. :)

First of all, I haven’t run the model yet. My questions maybe be odd and silly or I may have done rookie mistakes in this post, in advance my apologies. :) I am not a statistician and it’s my first experience with a statistics based problem/project.

So, I am basically working on a paper about football predictions. The main model is as following:
y_{m1} | \theta_{m1},\lambda_{m1} \sim poisson(p_{m1}*\theta_{m1} + (1-p_{m1})*\lambda_{m1})) \\y_{m2} | \theta_{m2},\lambda_{m2} \sim poisson(p_{m2}*\theta_{m2} + (1-p_{m2})*\lambda_{m2}))
m’s are the matches and y_{m1} and\ y_{m2} are home team’s goals and away team’s goals respectively.

Parameter p_{m1},p_{m2} are non-informative priors and defined as p_{m} \sim beta(a,b).

The point I’m having an issue in modeling is writing the log-linear model for total of T teams. The model is as following:
log(\theta_{m1}) = \mu + att_{t[m]1} + def_{t[m]2} \\ log(\theta_{m2}) = att_{t[m]2} + def_{t[m]1}\ for\ t=1,..,T
where t[m]1 means home team’s attack power at game m and t[m]2 for away’s team.

Attack and defensive powers of teams have following distributions for T teams and Tau seasons:
att_{t,\tau} \sim N(\mu_{att} + att_{t,\tau-1},\sigma_{att}^2)\ for\ t=1,..T\ , \tau=2,..,Tau\\ def_{t,\tau} \sim N(\mu_{def} + def_{t,\tau-1},\sigma_{def}^2)\ for\ t=1,..T\ , \tau=2,..,Tau \\ att_{t,1} \sim N(\mu_{att} ,\sigma_{att}^2)\ for\ \tau=1\ ,t=1,..,T \\ def_{t,1} \sim N(\mu_{def} ,\sigma_{def}^2)\ for\ \tau=1\ ,t=1,..,T
with a zero-sum constraint as:
\sum_{t=1}^{T} att_{t,\tau}=0\ , \sum_{t=1}^{T} def_{t,\tau}=0\ for\ t=1,..T\ , \tau=1,..,Tau

Priors are defined also in the following, where cauchy^+ refers half-cauchy.
\mu,\ \mu_{def},\ \mu_{att}\ \sim N(0,10) \\ \sigma_{att},\ \sigma_{def}\ \sim Cauchy^+(0,2.5)

And the remaining distributions are defined as following but I haven’t come to that part yet. Just giving here to cover every variables in the model.
\hat{\theta}_{m1}^{1},...,\hat{\theta}_{m1}^{S}\ \sim truncN(\lambda_{m1},\tau_{1}^2,0,\infty)\\ \hat{\theta}_{m2}^{1},...,\hat{\theta}_{m2}^{S}\ \sim truncN(\lambda_{m2},\tau_{2}^2,0,\infty)\ for\ s=1,..,S

\lambda_{m1}\ \sim truncN(\alpha_{1}, 10, 0, \infty)\\ \lambda_{m2}\ \sim truncN(\alpha_{2}, 10, 0, \infty)

Then my data vector for every match becomes as in the following, where s refers bookmakers
\mathbf{(y,\hat{\theta^s})}=(y_{m1},y_{m2},\hat{\theta}_{m1}^{s},\hat{\theta}_{m2}^{s}, s=1,..,S)

Finally, the code I have been writing:

data {
int <lower=1> n_games; //380 matches every season
int <lower=1> n_teams; //20 teams for premier league
int <lower=1> n_seasons; //9 seasons training set, 1 test set total 10
int <lower=1, upper=n_teams> home_team[n_games,n_seasons];
int <lower=1, upper=n_teams> away_team[n_games,n_seasons];
int <lower=0> home_goals[n_games];
int <lower=0> away_goals[n_games];
int <lower=0> n_bookmakers; // # of bookmakers
real<lower=0> theta_hat1[n_games,n_bookmakers];
real<lower=0> theta_hat2[n_games,n_bookmakers];
}

parameters{
real alpha1;
real alpha2;
real<lower=0> mu;
real<lower=0> mu_att;
real<lower=0> sigma_def;
real<lower=0> sigma_att;
real<lower=0> mu_def;
real lambda1[n_games];
real lambda2[n_games];
vector[20] att_season1;
vector[20] def_season1;
real<lower=0> p1[n_games];
real<lower=0> p2[n_games];
real att_raw[n_teams,n_seasons];
real def_raw[n_teams,n_seasons];
real<lower=0> tau_1;
real<lower=0> tau_2;
vector[n_games] theta1;
real theta2[n_games];
}

transformed parameters {
// sum-to-zero constraints
matrix[n_teams,n_seasons] att;
matrix[n_teams,n_seasons] def;

for (z in 1:n_seasons){
for (tt in 1:(n_teams-1)) {
att[tt,z] = att_raw[tt,z] - mean(att_raw[,z]);
def[tt,z] = def_raw[tt,z] - mean(def_raw[,z]);
}
}
}

model{
//priors
target += normal_lpdf(mu |0, 10);
target += normal_lpdf(mu_att | 0,10);
target += normal_lpdf(mu_def | 0,10);
target += cauchy_lpdf(sigma_att | 0,2.5);
target += cauchy_lpdf(sigma_def | 0,2.5);
target += uniform_lpdf(tau_1 | 0,10);
target += uniform_lpdf(tau_2 | 0,10);

for (s in 1:n_seasons){
for(t in 1:n_teams){
if (s==1){
target += normal_lpdf(att_raw[t,s] | mu_att, pow(sigma_att,2));
target += normal_lpdf(def_raw[t,s] | mu_def, pow(sigma_def,2));
}
else{
target += normal_lpdf(att_raw[t,s] | att_raw[t,s-1],pow(sigma_att,2));
target += normal_lpdf(def_raw[t,s] | def_raw[t,s-1],pow(sigma_def,2));
}
}
for (m in 1:n_games){
target += cauchy_lpdf(p1[m]|2, 2);
target += cauchy_lpdf(p2[m]|2, 2);
theta1[m] = exp( mu + att[home_team[m],s] + def[away_team[m],s]);
theta2[m] = exp(att[away_team[m],s]] + def[home_team[m],s]]);

lambda1[m] ~ normal(alpha1, 10) T[0,positive_infinity()];//choose alphas
lambda2[m] ~ normal(alpha2, 10) T[0,positive_infinity()];

for (b in 1:n_bookmakers){
theta_hat1[m,b]  ~ normal(lambda1[m],tau_1) T[0,positive_infinity()];
theta_hat2[m,b]  ~ normal(lambda2[m],tau_2) T[0,positive_infinity()];
}

target += poisson_lpmf(home_goals[m] | p1[m] * theta1[m] + (1-p1[m])* lambda1[m]);
target += poisson_lpmf(away_goals[m] | p2[m] * theta2[m] + (1-p2[m])* lambda2[m]);
}
}
}

generated quantities{

}

• I couldn’t solve the issue about definition of log-linear model theta1[m] = exp( mu + att[home_team[m],s] + def[away_team[m],s]);. Any ideas how to define it in this context? I also think that I don’t define \theta_{m} properly in parameter part properly.

• When defining truncated normal distribution, is there any difference between a ~normal(b,c) T[0,positive_infinity()];, where real a; and a~normal(b,c), where real<lower=0,upper=positive_infinity()> a ?

• I also look forward to any comments/ideas/feedbacks".

I would like to thank everybody, who follows this newbie post until this point!