Simple intercept only model with poor chain mixing

Hi everyone!

I have been working with @lizzieinvancouver to understand why simple intercept-only models with two grouping factors would result in poorly mixing chains and low neffective values.

This issue came to our attention when working on a more complex model, but in running a similar model on a much smaller dataset we found the same thing.

The data we used for this example was of scores from the 1932 Olympic Games in Lake Placid. OlympicGames_1932.csv (1.8 KB)

The stan code for our intercept-only model is the following:

data {
  int<lower=0> N;//number of observation
  int<lower=0> n_pair;//number of pairs
  int<lower=0> n_judge;//number of judges
  int<lower=1,upper = n_pair> pair[N];//vector of pairs
  int<lower=1,upper = n_judge> judge[N];//vector of judges
  vector[N] y;

parameters {
  vector[n_pair] gamma;
  vector[n_judge] delta;
  real<lower=0> mu;
  real<lower=0> sigma_gamma;
  real<lower=0> sigma_delta;
  real<lower=0> sigma_y;

transformed parameters {
  vector[N] y_hat;

  for (i in 1:N)
    y_hat[i] = mu + gamma[pair[i]] + delta[judge[i]];

model {
  mu ~ normal(1,10);
  sigma_gamma ~ normal(1,10);
  sigma_delta ~ normal(1,10);
  sigma_y ~ normal(1,1);

  gamma ~ normal(0, sigma_gamma);
  delta ~ normal(0, sigma_delta);
  y ~ normal(y_hat, sigma_y);
generated quantities {
  real y_rep[N];
  for(n in 1:N)
    y_rep[n] = normal_rng(y_hat[n],sigma_y);

The R code to run the model is:

prog <- subset(dat, criterion == "Program")

prog_data <- list(y = prog$score, 
                   N = dim(prog)[1], 
                   pair = prog$pair, 
                   judge = prog$judge, 
                   n_pair = length(unique(prog$pair)),
                   n_judge = length(unique(prog$judge))) 

mdl<- stan('stan/OlympicGames_1932mdl.stan',
                      data = prog_data)

Even with this simple example, we observe poorly miking chains and neffective values as low as 465:

Any thoughts on what processes are underlying this result and how they might be addressed would be much appreciated.



You don’t seem to have a prior on mu.

And on a code/compute efficiency note, you can do:

vector[N] y_hat = mu + gamma[pair] + delta[judge];

Also, take a look at the bayesplot package for a variety of diagnostic plots you might find useful. I’d be keen to see a pairs plot for example.

Thanks @mike-lawrence for taking a look at the model.

I have edited the above code, adding a prior for mu as you suggested. The chains are still poorly mixed and some n_eff are still low. The pairs plot looks like this:

I also ran the model using rstanarm and had similar results.

mdl.arm <- stan_lmer(score ~ 1 + (1 | judge) + (1 |pair), data = prog)

Thanks for your help!

1 Like

i’m away from my computer, so I can’t check the example data you linked; how many judges and pairs are there? If few, I think a fully-non-centered parameterization of the judge and pair parameters might be best, which is happily easily achieved these days through use of the multiplier declaration:

parameters {
  real<lower=0> sigma_gamma;
  real<lower=0> sigma_delta;
  vector<multiplier=sigma_gamma>[n_pair] gamma ;
  vector<multiplier=sigma_delta>[n_judge] delta ;

Note that your first model would be considered partially non-centered because it is non-centered with regards to the intercept mu but not the variance parameters. Which actually highlights that the term non-centered is not a great one 😛

Also, I find your priors a little odd; specifically you seem to be centring them on a mean of one (that’s what normal(1,…) does). Is this what you intend?

1 Like

Thanks @mike-lawrence for your response.

In this example the number of judges and pairs is pretty small. The non-centered parameterization did improve the low neffective in this dataset. I was not familiar with the multiplier declaration, thanks for brining it to my attention.