Problem with sampling in a variation of the Skellam distribution (RStan)


I want to fit a variation of the Skellam distribution which is deflated at 0 and bounded to {-3,-2,-1,1,2,3}. For this reason, in the second function, skellam_without_lpmf, I assign a log probability of -700 to the disallowed values so that their probability is essentially 0. The probability mass function is converted to $\log\big(f(x) \,/\, (f(-3)+f(-2)+f(-1)+f(1)+f(2)+f(3))\big)$. The problem is that when sampling finishes I get a warning about divergent transitions: of 3000 post-warmup iterations, 2987 are divergent. Increasing adapt_delta to 0.99 brings no improvement. What could be the problem with this model? I should mention that a corresponding model that is only "deflated at 0" has no convergence problems, and its code is similar to this one. My Stan code is presented below:

functions {
  
  real skellam_lpmf(int k, real mu1, real mu2) {
    real total;
    real log_prob;

    total = (- mu1 - mu2) + (log(mu1) - log(mu2)) * k / 2;
    log_prob = total + log(modified_bessel_first_kind(k, 2 * sqrt(mu1*mu2)));

    return log_prob;
  }
  real skellam_without_lpmf(int k, real mu1, real mu2) {
    real log_prob_new;
    real f_one;
    real f_two;
    real f_three;
    real f_neg_one;
    real f_neg_two;
    real f_neg_three;

    // pmf values on the allowed support {-3,-2,-1,1,2,3}, used for the normalizing constant
    f_one = exp(skellam_lpmf(1 | mu1, mu2));
    f_two = exp(skellam_lpmf(2 | mu1, mu2));
    f_three = exp(skellam_lpmf(3 | mu1, mu2));
    f_neg_one = exp(skellam_lpmf(-1 | mu1, mu2));
    f_neg_two = exp(skellam_lpmf(-2 | mu1, mu2));
    f_neg_three = exp(skellam_lpmf(-3 | mu1, mu2));

    if (k > 3) {
      // disallowed values get log probability -700, i.e. probability essentially 0
      log_prob_new = -700;
      return log_prob_new;
    } else if (k == 0) {
      log_prob_new = -700;
      return log_prob_new;
    } else if (k < -3) {
      log_prob_new = -700;
      return log_prob_new;
    } else {
      // log(f(k)) - log(f(-3)+f(-2)+f(-1)+f(1)+f(2)+f(3))
      log_prob_new = skellam_lpmf(k | mu1, mu2) - log(f_one + f_two + f_three + f_neg_one + f_neg_two + f_neg_three);
      return log_prob_new;
    }
  }
  
}

data {
  int <lower=1> n_games; //number of games 132
  int <lower=1> n_teams; //number of teams 12
  int <lower=1,upper=n_teams> home_team[n_games];
  int <lower=1,upper=n_teams> away_team[n_games];
  int <lower=0,upper=3> home_sets[n_games];//0-3 sets can have each team
  int <lower=0,upper=3> away_sets[n_games];//0-3 sets can have each team
}

parameters {
  real mu;
  real home;
  real attack_raw[n_teams - 1];
  real defense_raw[n_teams - 1];
}

transformed parameters {
  // Enforce sum-to-zero constraints
  vector[n_teams]    attack;
  vector[n_teams]   defense;
  vector[n_games]   lambda1;
  vector[n_games]   lambda2;  
  

  for (t in 1:(n_teams-1)) {
    attack[t] = attack_raw[t];
    defense[t] = defense_raw[t];
  }
  
  attack[n_teams] = -sum(attack_raw);
  defense[n_teams] = -sum(defense_raw);
  
  //likelihood-systematic component
  for (g in 1:n_games) {
    lambda1[g]=exp(mu+home+attack[home_team[g]]+defense[away_team[g]]);
    lambda2[g]=exp(mu+attack[away_team[g]]+defense[home_team[g]]);
  }
}


model {
  
  
  int    sets_diff[n_games];
  
  //Priors
  target+=normal_lpdf(mu|0,100);
  target+=normal_lpdf(home|0,100);
  target+=normal_lpdf(attack|0,100);
  target+=normal_lpdf(defense|0,100);
  
  //likelihood-systematic component
  for (g in 1:n_games) {
    
    sets_diff[g]=home_sets[g]-away_sets[g];
    target+=skellam_without_lpmf(sets_diff[g]|lambda1[g],lambda2[g]);
    
  }
}

If these values have probability zero but they're in your data, isn't that a problem? They aren't actually probability zero if you measured them.

If you wanted to ignore values of the data that weren’t -3, -2, -1, 0, 1, 2, 3, then you’d just remove them from your dataset, but that isn’t a great idea. There’s probably something else going on here.

These values (0, values below -3, and values above 3) do not exist in our data. That is why I fitted such a model: I want to give these values zero probability.

Oh okay so if it’s just some sort of truncated distribution thing, then maybe:

vector[6] lpmfs;
for(i in 1:3) {
  lpmfs[i] = skellam_lpmf(i - 4 | mu1, mu2);
  lpmfs[i + 3] = skellam_lpmf(i | mu1, mu2);
}

target += skellam_lpmf(y | mu1, mu2);
target += - N * log_sum_exp(lpmfs); // Assuming y is a vector and there are N of them

Does this look right to you? log_sum_exp does the normalization a little cleaner.

I find this suggestion useful, although I will have to "vectorize" some parts of my data in order to run it, because my model is written with loops over observations. My other concern is that with this suggestion I don't see where the other values get zero probability.

If there are values other than -3, -2, -1, 1, 2, 3, then I don’t think this is a valid modeling assumption, and you’d wanna do something else. Maybe a mixture would be the simplest thing?

The values contained in our data belong to the subset {-3,-2,-1,1,2,3}, and for this reason I was thinking of giving zero probability to the other values. That is why, on this subset, I have defined the pmf as $\log f(x) - \log\big(f(-3)+f(-2)+f(-1)+f(1)+f(2)+f(3)\big)$. My main problem is the vectorization, and I cannot understand why "vectorizing" it your way might solve the sampling problem.

The vectorization itself shouldn't affect the sampling in any meaningful way.

Using log_sum_exp for the normalization might help avoid underflows.
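Roughly, log_sum_exp(v) computes log(sum(exp(v))) after factoring out the largest element, so the largest term becomes exp(0) = 1 and the sum never underflows to zero. A minimal sketch of that trick, just for intuition (use the built-in log_sum_exp in practice; my_log_sum_exp is only illustrative):

functions {
  // minimal sketch of the log-sum-exp trick: pull out max(x) so that exp()
  // is only evaluated on non-positive values and the sum cannot underflow to 0
  real my_log_sum_exp(vector x) {
    real m = max(x);
    return m + log(sum(exp(x - m)));
  }
}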

It might not affect it either – this was just me guessing at what the problem might be :D. Did you get a chance to try it?

Not yet, because I think your version will require "vectorizing" many parts of my code to be syntactically correct and compatible with your approach. Am I right? Or do I only need to change the model block?

If you want a non-vectorized version of the code I wrote above you can do:

vector[6] lpmfs;
for(i in 1:3) {
  lpmfs[i] = skellam_lpmf(i - 4 | mu1, mu2);
  lpmfs[i + 3] = skellam_lpmf(i | mu1, mu2);
}

{
  real normalization = log_sum_exp(lpmfs);
  
  for(n in 1:N) {
    target += skellam_lpmf(y[n] | mu1, mu2) - normalization;
  }
}

edit: added a bit of code


Thank you, I will try it as soon as possible and will send you my results or any problems right away.


vector[6] lpmfs;
for(i in 1:3) {
  lpmfs[i] = skellam_lpmf(i - 4 | mu1, mu2);
  lpmfs[i + 3] = skellam_lpmf(i | mu1, mu2);
}

{
  real normalization = log_sum_exp(lpmfs);

  for(n in 1:N) {
    target += skellam_lpmf(y[n] | mu1, mu2) - normalization;
  }
}
It gives me this kind of error:

Unknown variable: vector
variable “vector” does not exist.
error in ‘model11b47f25f5_trial’ at line 104, column 9

102: }
103:
104: vector[6] lpmfs;
^
105:

Error in stanc(filename, allow_undefined = TRUE) :
failed to parse Stan model ‘trial’ due to the above error.

I think that error is due to the definition of skellam_lpmf, which takes real mu1, real mu2 rather than vectors. If that is the problem, I do not know how to define a vector of integers k. I thought of defining int k[132] (132 is my number of observations), but it does not accept that.

Variable definitions need to be at the top of the model block they are in. Give that a try!
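In your case that just means moving the declarations up, something like this (reusing the names from your model; everything after the declarations stays as it was):

model {
  // all local declarations must come first, at the very top of the block
  vector[6] lpmfs;
  real normalization;
  int sets_diff[n_games];

  // executable statements (priors, likelihood loop) only after the declarations
  target += normal_lpdf(mu | 0, 100);
  // ... rest of the model block unchanged
}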

I have tried the two versions below according to your instructions. Both models were syntactically correct, but I still have the same problem: when sampling finishes, I get the divergent-transitions warning as before. Here are the two Stan codes I ran. Do you see any problem?

code1

functions {

  real skellam_lpmf(int k, real mu1, real mu2) {
    real total;
    real log_prob;

    total = (- mu1 - mu2) + (log(mu1) - log(mu2)) * k / 2;
    log_prob = total + log(modified_bessel_first_kind(k, 2 * sqrt(mu1 * mu2)));

    return log_prob;
  }

  real skellam_without_lpmf(int k, real mu1, real mu2) {
    real log_prob_new;
    vector[6] lpmfs;
    real normalization;

    for (i in 1:3) {
      lpmfs[i] = skellam_lpmf(i - 4 | mu1, mu2);
      lpmfs[i + 3] = skellam_lpmf(i | mu1, mu2);
    }
    normalization = log_sum_exp(lpmfs);

    log_prob_new = skellam_lpmf(k | mu1, mu2) - normalization; //log(f(x))-log(f(-3)+f(-2)+f(-1)+f(1)+f(2)+f(3))
    return log_prob_new;
  }

}

data {
  int <lower=1> n_games; //number of games 132
  int <lower=1> n_teams; //number of teams 12
  int <lower=1,upper=n_teams> home_team[n_games];
  int <lower=1,upper=n_teams> away_team[n_games];
  int <lower=0,upper=3> home_sets[n_games]; //0-3 sets can have each team
  int <lower=0,upper=3> away_sets[n_games]; //0-3 sets can have each team
}

parameters {
  real mu;
  real home;
  real attack_raw[n_teams - 1];
  real defense_raw[n_teams - 1];
}

transformed parameters {
  // Enforce sum-to-zero constraints
  vector[n_teams] attack;
  vector[n_teams] defense;
  vector[n_games] lambda1;
  vector[n_games] lambda2;

  for (t in 1:(n_teams - 1)) {
    attack[t] = attack_raw[t];
    defense[t] = defense_raw[t];
  }

  attack[n_teams] = -sum(attack_raw);
  defense[n_teams] = -sum(defense_raw);

  //likelihood-systematic component
  for (g in 1:n_games) {
    lambda1[g] = exp(mu + home + attack[home_team[g]] + defense[away_team[g]]);
    lambda2[g] = exp(mu + attack[away_team[g]] + defense[home_team[g]]);
  }
}

model {
  int sets_diff[n_games];

  //Priors
  target += normal_lpdf(mu | 0, 100);
  target += normal_lpdf(home | 0, 100);
  target += normal_lpdf(attack | 0, 100);
  target += normal_lpdf(defense | 0, 100);

  //likelihood-systematic component
  for (g in 1:n_games) {
    sets_diff[g] = home_sets[g] - away_sets[g];
    target += skellam_without_lpmf(sets_diff[g] | lambda1[g], lambda2[g]);
  }
}

and code 2

functions {

  real skellam_lpmf(int k, real mu1, real mu2) {
    real total;
    real log_prob;

    total = (- mu1 - mu2) + (log(mu1) - log(mu2)) * k / 2;
    log_prob = total + log(modified_bessel_first_kind(k, 2 * sqrt(mu1 * mu2)));

    return log_prob;
  }

}

data {
  int <lower=1> n_games; //number of games 132
  int <lower=1> n_teams; //number of teams 12
  int <lower=1,upper=n_teams> home_team[n_games];
  int <lower=1,upper=n_teams> away_team[n_games];
  int <lower=0,upper=3> home_sets[n_games]; //0-3 sets can have each team
  int <lower=0,upper=3> away_sets[n_games]; //0-3 sets can have each team
}

parameters {
  real mu;
  real home;
  real attack_raw[n_teams - 1];
  real defense_raw[n_teams - 1];
}

transformed parameters {
  // Enforce sum-to-zero constraints
  vector[n_teams] attack;
  vector[n_teams] defense;
  vector[n_games] lambda1;
  vector[n_games] lambda2;

  for (t in 1:(n_teams - 1)) {
    attack[t] = attack_raw[t];
    defense[t] = defense_raw[t];
  }

  attack[n_teams] = -sum(attack_raw);
  defense[n_teams] = -sum(defense_raw);

  //likelihood-systematic component
  for (g in 1:n_games) {
    lambda1[g] = exp(mu + home + attack[home_team[g]] + defense[away_team[g]]);
    lambda2[g] = exp(mu + attack[away_team[g]] + defense[home_team[g]]);
  }
}

model {
  vector[6] lpmfs;
  real normalization;
  int sets_diff[n_games];

  //Priors
  target += normal_lpdf(mu | 0, 100);
  target += normal_lpdf(home | 0, 100);
  target += normal_lpdf(attack | 0, 100);
  target += normal_lpdf(defense | 0, 100);

  //likelihood-systematic component
  for (g in 1:n_games) {
    sets_diff[g] = home_sets[g] - away_sets[g];

    for (i in 1:3) {
      lpmfs[i] = skellam_lpmf(i - 4 | lambda1[g], lambda2[g]);
      lpmfs[i + 3] = skellam_lpmf(i | lambda1[g], lambda2[g]);
    }
    normalization = log_sum_exp(lpmfs);

    target += skellam_lpmf(sets_diff[g] | lambda1[g], lambda2[g]) - normalization;
  }
}

The priors should be on attack_raw and defense_raw - otherwise the induced prior structure is very weird.
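For example, keeping your prior scales, the prior block would become:

  //Priors on the unconstrained ("raw") parameters
  target += normal_lpdf(mu | 0, 100);
  target += normal_lpdf(home | 0, 100);
  target += normal_lpdf(attack_raw | 0, 100);
  target += normal_lpdf(defense_raw | 0, 100);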

Also I don’t think you need the normalising constant here, so you can try without the normalization computation.

Lastly, the $\text{attack}_{n_{\text{teams}}} = - \sum_{i = 1}^{n_{\text{teams}} - 1}\text{attack}_{i}$ construct can be a little numerically finicky. You can use the ideas of West & Harrison and Knorr-Held for better ways to implement this constraint.
There are some examples of sports models that use this idea out there.
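This is not the construct from those references, but one commonly used alternative is a "soft" sum-to-zero constraint: declare the full-length vectors as parameters and penalize their sums. A minimal sketch (the 0.001 scale is a hypothetical choice you would tune):

data {
  int<lower=1> n_teams;
}
parameters {
  vector[n_teams] attack;   // full length, no element set to minus the sum of the others
  vector[n_teams] defense;
}
model {
  // soft sum-to-zero: keeps sum(attack) and sum(defense) very close to 0
  target += normal_lpdf(sum(attack) | 0, 0.001);
  target += normal_lpdf(sum(defense) | 0, 0.001);
  // priors and likelihood for attack and defense go here as usual
}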


I tried to set priors on attack_raw and defense_raw, and I still get divergent transitions.

This normalizing constant is necessary to fit the variation of the Skellam model that I want.

The sum-to-zero constraint I have implemented is probably not the problem, because when I fit the simple Skellam model with this constraint, sampling goes well without divergent transitions.

Hmm, maybe open this model in Shinystan and see if any of the parameters are going crazy.

Like, is something getting really large? Or maybe there are weird-looking correlations in the posterior samples for mu/home/attack/defense?

Maybe look at plots of where the divergences are happening (like here: http://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html)?

I checked my parameters in ShinyStan and the diagnostics for them were not good. A possible solution, apart from increasing adapt_delta (which I have tried without solving the problem), is to reparameterize the model from a centered to a non-centered parametrization. I saw some examples, but I could not see how this can be done in my model.
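For reference, the generic pattern from those examples looks roughly like the sketch below. It is hypothetical for my model, since I currently use fixed prior scales of 100 rather than an estimated hierarchical scale (sigma_att here is an assumed extra parameter):

// centered version: team effects drawn directly with scale sigma_att
parameters {
  real<lower=0> sigma_att;
  vector[n_teams - 1] attack_raw;
}
model {
  attack_raw ~ normal(0, sigma_att);
}

// non-centered version: sample standardized effects, then scale them
parameters {
  real<lower=0> sigma_att;
  vector[n_teams - 1] attack_std;
}
transformed parameters {
  vector[n_teams - 1] attack_raw;
  attack_raw = sigma_att * attack_std;
}
model {
  attack_std ~ normal(0, 1);
}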

P.S. However, I have to say that fitting a Skellam model without the value 0 in its support (with normalizing constant $\log(1 - f(0))$) had no problems with sampling. Only this truncation produces the divergent-transitions warning.

So getting rid of the normalization makes this model work? If that’s the case then the issue is probably some sort of numerical loss of precision thing happening in the normalization.

Yes. If I define log_prob_new as skellam_lpmf(k | mu1, mu2) - log(1 - exp(skellam_lpmf(0 | mu1, mu2))) (with log(1 - exp(skellam_lpmf(0 | mu1, mu2))) being the normalization), the sampling has no problem with divergent transitions.
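For reference, Stan also has a built-in log1m_exp(x) that computes log(1 - exp(x)) in a more numerically stable way, so this zero-truncated version could presumably also be written as below (a sketch reusing my skellam_lpmf from above; the name skellam_no_zero_lpmf is just illustrative):

  real skellam_no_zero_lpmf(int k, real mu1, real mu2) {
    // zero-truncated Skellam: log f(k) - log(1 - f(0)),
    // with log1m_exp for the second term (its argument is a log probability, hence negative)
    return skellam_lpmf(k | mu1, mu2) - log1m_exp(skellam_lpmf(0 | mu1, mu2));
  }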