Coding related questions from a rookie's first project

CanYolal · June 5, 2019, 9:17pm

Hello everybody, I am working on my first Stan project and I have couple of questions. It’s going to be a long post, I don’t want to waste anybody’s time because, for sure, there will be silly and time waste questions for Stan masters. I am looking for enthusiasts’ helps for this rookie lad. :)

First of all, I haven’t run the model yet. My questions maybe be odd and silly or I may have done rookie mistakes in this post, in advance my apologies. :) I am not a statistician and it’s my first experience with a statistics based problem/project.

So, I am basically working on a paper about football predictions. The main model is as following:
y_{m1} | \theta_{m1},\lambda_{m1} \sim poisson(p_{m1}*\theta_{m1} + (1-p_{m1})*\lambda_{m1})) \\y_{m2} | \theta_{m2},\lambda_{m2} \sim poisson(p_{m2}*\theta_{m2} + (1-p_{m2})*\lambda_{m2}))
m’s are the matches and y_{m1} and\ y_{m2} are home team’s goals and away team’s goals respectively.

Parameter p_{m1},p_{m2} are non-informative priors and defined as p_{m} \sim beta(a,b).

The point I’m having an issue in modeling is writing the log-linear model for total of T teams. The model is as following:
log(\theta_{m1}) = \mu + att_{t[m]1} + def_{t[m]2} \\ log(\theta_{m2}) = att_{t[m]2} + def_{t[m]1}\ for\ t=1,..,T
where t[m]1 means home team’s attack power at game m and t[m]2 for away’s team.

Attack and defensive powers of teams have following distributions for T teams and Tau seasons:
att_{t,\tau} \sim N(\mu_{att} + att_{t,\tau-1},\sigma_{att}^2)\ for\ t=1,..T\ , \tau=2,..,Tau\\ def_{t,\tau} \sim N(\mu_{def} + def_{t,\tau-1},\sigma_{def}^2)\ for\ t=1,..T\ , \tau=2,..,Tau \\ att_{t,1} \sim N(\mu_{att} ,\sigma_{att}^2)\ for\ \tau=1\ ,t=1,..,T \\ def_{t,1} \sim N(\mu_{def} ,\sigma_{def}^2)\ for\ \tau=1\ ,t=1,..,T
with a zero-sum constraint as:
\sum_{t=1}^{T} att_{t,\tau}=0\ , \sum_{t=1}^{T} def_{t,\tau}=0\ for\ t=1,..T\ , \tau=1,..,Tau

Priors are defined also in the following, where cauchy^+ refers half-cauchy.
\mu,\ \mu_{def},\ \mu_{att}\ \sim N(0,10) \\ \sigma_{att},\ \sigma_{def}\ \sim Cauchy^+(0,2.5)

And the remaining distributions are defined as following but I haven’t come to that part yet. Just giving here to cover every variables in the model.
\hat{\theta}_{m1}^{1},...,\hat{\theta}_{m1}^{S}\ \sim truncN(\lambda_{m1},\tau_{1}^2,0,\infty)\\ \hat{\theta}_{m2}^{1},...,\hat{\theta}_{m2}^{S}\ \sim truncN(\lambda_{m2},\tau_{2}^2,0,\infty)\ for\ s=1,..,S

\lambda_{m1}\ \sim truncN(\alpha_{1}, 10, 0, \infty)\\ \lambda_{m2}\ \sim truncN(\alpha_{2}, 10, 0, \infty)

Then my data vector for every match becomes as in the following, where s refers bookmakers
\mathbf{(y,\hat{\theta^s})}=(y_{m1},y_{m2},\hat{\theta}_{m1}^{s},\hat{\theta}_{m2}^{s}, s=1,..,S)

Finally, the code I have been writing:

data {
	int <lower=1> n_games; //380 matches every season
	int <lower=1> n_teams; //20 teams for premier league
	int <lower=1> n_seasons; //9 seasons training set, 1 test set total 10
	int <lower=1, upper=n_teams> home_team[n_games,n_seasons];
	int <lower=1, upper=n_teams> away_team[n_games,n_seasons];
	int <lower=0> home_goals[n_games];
	int <lower=0> away_goals[n_games];
	int <lower=0> n_bookmakers; // # of bookmakers
	real<lower=0> theta_hat1[n_games,n_bookmakers];
	real<lower=0> theta_hat2[n_games,n_bookmakers];
}

parameters{
	real alpha1;
	real alpha2;
	real<lower=0> mu;
	real<lower=0> mu_att;
	real<lower=0> sigma_def;
	real<lower=0> sigma_att;
	real<lower=0> mu_def;
	real lambda1[n_games];
	real lambda2[n_games];
  vector[20] att_season1;
  vector[20] def_season1;
  real<lower=0> p1[n_games];
  real<lower=0> p2[n_games];
  real att_raw[n_teams,n_seasons];
  real def_raw[n_teams,n_seasons];
  real<lower=0> tau_1; 
  real<lower=0> tau_2;
  vector[n_games] theta1;
  real theta2[n_games];
}

transformed parameters {
  // sum-to-zero constraints
  matrix[n_teams,n_seasons] att;
  matrix[n_teams,n_seasons] def;
  
  for (z in 1:n_seasons){
    for (tt in 1:(n_teams-1)) {
      att[tt,z] = att_raw[tt,z] - mean(att_raw[,z]);
      def[tt,z] = def_raw[tt,z] - mean(def_raw[,z]);
    }
  }
}

model{
  //priors
	target += normal_lpdf(mu |0, 10);
	target += normal_lpdf(mu_att | 0,10);
	target += normal_lpdf(mu_def | 0,10);
	target += cauchy_lpdf(sigma_att | 0,2.5);
	target += cauchy_lpdf(sigma_def | 0,2.5);
	target += uniform_lpdf(tau_1 | 0,10);
	target += uniform_lpdf(tau_2 | 0,10);
	
	for (s in 1:n_seasons){
	  for(t in 1:n_teams){
	    if (s==1){
	      target += normal_lpdf(att_raw[t,s] | mu_att, pow(sigma_att,2));
	      target += normal_lpdf(def_raw[t,s] | mu_def, pow(sigma_def,2));
	    }
	    else{
	      target += normal_lpdf(att_raw[t,s] | att_raw[t,s-1],pow(sigma_att,2));
	      target += normal_lpdf(def_raw[t,s] | def_raw[t,s-1],pow(sigma_def,2));
	    }
	  }
	  for (m in 1:n_games){
	    target += cauchy_lpdf(p1[m]|2, 2);
	    target += cauchy_lpdf(p2[m]|2, 2);
	    theta1[m] = exp( mu + att[home_team[m],s] + def[away_team[m],s]);
	    theta2[m] = exp(att[away_team[m],s]] + def[home_team[m],s]]);
	    
	    lambda1[m] ~ normal(alpha1, 10) T[0,positive_infinity()];//choose alphas
	    lambda2[m] ~ normal(alpha2, 10) T[0,positive_infinity()];
	    
      for (b in 1:n_bookmakers){
        theta_hat1[m,b]  ~ normal(lambda1[m],tau_1) T[0,positive_infinity()];
        theta_hat2[m,b]  ~ normal(lambda2[m],tau_2) T[0,positive_infinity()];
      }
	    
	    target += poisson_lpmf(home_goals[m] | p1[m] * theta1[m] + (1-p1[m])* lambda1[m]);
	    target += poisson_lpmf(away_goals[m] | p2[m] * theta2[m] + (1-p2[m])* lambda2[m]);
	  }
	}
}

generated quantities{
  
}

I couldn’t solve the issue about definition of log-linear model theta1[m] = exp( mu + att[home_team[m],s] + def[away_team[m],s]);. Any ideas how to define it in this context? I also think that I don’t define \theta_{m} properly in parameter part properly.
When defining truncated normal distribution, is there any difference between a ~normal(b,c) T[0,positive_infinity()];, where real a; and a~normal(b,c), where real<lower=0,upper=positive_infinity()> a ?
I also look forward to any comments/ideas/feedbacks".

I would like to thank everybody, who follows this newbie post until this point!

Topic		Replies	Views
One-off STAN tutoring for a small research project Jobs	9	670	April 15, 2020
Help with creating desired vector Modeling specification	3	411	January 8, 2021
Doubts when modelling the probability of a team beating another team Modeling specification	3	583	November 23, 2020
Log probability evaluates to log(0), i.e. negative infinity Modeling	3	1649	December 20, 2019
Use log form in the parameters block and unlog in the transformed parameters block Modeling techniques	4	596	December 17, 2020

Coding related questions from a rookie's first project

Related topics