Hierarchical mixture models in Stan

Hi everybody!
I was trying to replicate the hierarchical mixture model proposed in “Bayesian hierarchical model for the prediction of football results” by G.Baio and M.Blangiardo (2010), but I have some trouble in writing the model in Stan (this is the first time that I use Stan and I am not familiar with c++ language).

Breafly, there are 3 different generating mechanism, one for the top teams, one for the mid-table teams and one for the bottom-table teams. For each team (t) 2 latent variables
grp^{att}(t) and grp^{def}(t) , which take on the value 1,2,3 identifying the bottom-, mid- or top-table performances in terms of attack and defence. These are given suitable categorical distributions, each depending on a vector of prior probabilities
\pi^{att} = (\pi^{att}_{1t}, \pi^{att}_{2t}, \pi^{att}_{3t}) and
\pi^{def} = (\pi^{def}_{1t}, \pi^{def}_{2t}, \pi^{def}_{3t}) which follow a Dirichlet distribution with parameters (1,1,1). They represent the prior change that each team is in one of the 3 categories.

Let y1 be the number of goals scored by the home teams, the likelihood is
y1 | \theta1 \sim Poisson(\theta1) with
log(\theta1) = home + att_{hometeam} + def_{visitorteam}

Let y2 be the number of goals scored by the away teams, the likelihood is
y2 | \theta2 \sim Poisson(\theta2) with
log(\theta2) = att_{visitorteam} + def_{hometeam}


  • home \sim Normal(0, 0.0001) is the advantage for teams playing at home
  • att_t \sim nct(\mu^{att}_{grp(t)}, \tau^{att}_{grp(t)}, \nu) (nct = non centered student t). In a mixture model, the parameter att is computed as follows:
    att_t = \sum_{k=1}^3 \pi^{att}_{kt} * nct(\mu^{att}_k, \tau^{att}_k, \nu=4)
  • def_t \sim nct(\mu^{def}_{grp(t)}, \tau^{def}_{grp(t)}, \nu) (nct = non centered student t). In a mixture model, the parameter def is computed as follows:
    def_t = \sum_{k=1}^3 \pi^{def}_{kt} * nct(\mu^{def}_k, \tau^{def}_k, \nu=4)

If a team has poor performance, then it is likely to show low propensity to score and an high propensity to concede goals, and this is represented using the following truncated normal distributions:

  • poor teams
    \mu^{att}_1 \sim truncNormal(0, 0.001, -3, 0)
    \mu^{def}_1 \sim truncNormal(0, 0.001, 0, 3)
  • top teams
    \mu^{att}_3 \sim truncNormal(0, 0.001, 0, 3)
    \mu^{def}_3 \sim truncNormal(0, 0.001, -3, 0)
    *mid teams
    \mu^{att}_2 \sim Normal(0, \tau^{att}_2)
    \mu^{def}_2 \sim Normal(0, \tau^{def}_2)
    *for all the 3 groups, the precisions are modelled using independent minimally informative Gamma distributions
    \tau^{att}_k \sim Gamma(0.01 , 0.01)
    \tau^{def}_k \sim Gamma(0.01 , 0.01)

I’ve done my best writing the below stan model, but it does not work, obviously.
I have questions to ask and explanations to give regarding the model I wrote.

  1. Since Stan does not allow the use of discrete parameters, I found an altenrative way on internet: based on alpha. Are there any other ways?
  2. mu_att and mu_def vary over groups. I tried to capture this behavour using a for loop. Fitting the model on R, it gives me a warning saying that sampling was not possible because I do not include the case k=0, but k=1,2,3! How can I write this part in a suitable way?
  3. in the last for loop, I wanted to express these:
    att_t = \sum_{k=1}^3 \pi^{att}_{kt} * nct(\mu^{att}_k, \tau^{att}_k, \nu=4)
    def_t = \sum_{k=1}^3 \pi^{def}_{kt} * nct(\mu^{def}_k, \tau^{def}_k, \nu=4)
    but I don’t really think that this is the right way of doing so.

I’ve tried a lot of codes but I no longer know what to try.

data {
  // Dimensions
  int<lower=1> N; // number of matches
  int<lower=1, upper=20> J; // number of teams
  int<lower=1, upper=3> K; // number of groups
  // Variables
  int<lower=0> y1[N]; // number of goals of home: outcomes
  int<lower=0> y2[N]; // number of goals of visitor: outcomes
  int<lower=1, upper=J> hteams_idx[N]; //identifier for each teams in home
  int<lower=1, upper=J> vteams_idx[N]; //identifier for each teams in away
  real<lower=0.5> alphaNu; //degrees of freedom for alpha0N = J*2
  vector<lower=0.5>[K] alpha0mu; // prior for alpha0 = c(0.5, 0.5, 0.5)
parameters {
  real<lower=0, upper=1> home;
  vector[J-1] att_start;
  vector[J-1] def_start;
  simplex[K] pi_att;
  simplex[K] pi_def; 
  real<lower=0, upper=N> alphaN;
  simplex[K] alpha0;
  ordered[K] mu_att;
  ordered[K] mu_def;
  real<lower=0> tau_att;
  real<lower=0> tau_def;
transformed parameters{
    vector[J] att = append_row(att_start, -sum(att_start));
    vector[J] def = append_row(def_start, -sum(def_start));
    vector<lower=1>[K] alpha = alpha0*alphaN;
  home ~ normal(0, 0.1); //flat prior, fixed effect
  alphaN ~ chi_square(alphaNu);
  alpha0 ~ dirichlet(alpha0mu);
  pi_att ~ dirichlet(alpha+0.01);
  pi_def ~ dirichlet(alpha+0.01);
  tau_att ~ gamma(0.01, 0.01);
  tau_def ~ gamma(0.01, 0.01);

 for(k in 1:K){
  mu_att ~ normal(0, 0.001); 
  if (mu_att[k==1] < -3 || mu_att[k==1] > 0)
  target += negative_infinity();
  target += -log_diff_exp(normal_lcdf(0 | 0, 0.001),
                          normal_lcdf(-3 | 0, 0.001));
  if (mu_att[k==2])
  target += normal_lpdf(mu_att[k==2] | 0, tau_att);
  if (mu_att[k==3] < 0 || mu_att[k==3] > 3)
  target += negative_infinity();
  target += -log_diff_exp(normal_lcdf(3 | 0, 0.001),
                          normal_lcdf(0 | 0, 0.001));
   for(k in 1:K){
  mu_def ~ normal(0, 0.001);
  if (mu_def[k==1] < 0 || mu_def[k] > 3)
  target += negative_infinity();
  target += -log_diff_exp(normal_lcdf(3 | 0, 0.001),
                          normal_lcdf(0 | 0, 0.001));
  if (mu_def[k==2])
  target += normal_lpdf(mu_def[k] | 0, tau_def);
  if (mu_def[k==3] < -3 || mu_def[k] > 0)
  target += negative_infinity();
  target += -log_diff_exp(normal_lcdf(0 | 0, 0.001),
                          normal_lcdf(-3 | 0, 0.001));

for(j in 1:J){
  vector[J] att_mixt;
  vector[J] def_mixt;
  for(k in 1:K){
    att_mixt[j] += log(pi_att[k]) + 
    student_t_lpdf(att[j]|4, mu_att[k], tau_att);
    def_mixt += log(pi_def[k]) + 
    student_t_lpdf(def[j]| 4, mu_def[k], tau_def);
  target += poisson_log_lpmf(y1| home + att_mixt[hteams_idx] 
  - def_mixt[vteams_idx]);
  target += poisson_log_lpmf(y2| att_mixt[vteams_idx] - def_mixt[hteams_idx]);

I thank in advance anyone who takes the time to help me understanding where I am going wrong and how could I improve.