Help with multi-level models and Rhat / correlated posterior distriubtions


first of all, apologies if this has all been discussed elsewhere. If somebody could point me to a nice (accessible?) tutorial offering workable advice on how to deal with these issues, that would be much appreciated…

The issue:

I have a fairly complex (non-linear) multi-level model and I have a few Rhats > 1.01. Overall, the fit is pretty good, but I am assuming it would be nice to tighten these up a little before publishing.

From looking at the pairs plot + some intuition, the issue is that my fixed effects are generally negatively correlated with my random effects. For example, in the plot below, you can see some of the parameters in a pairs plot. We can see that the z parameter is negatively correlated with the corresponding fixed effect param.

This makes sense to me. In the model’s parameter blocks, I have:

parameters {
  // These are all the parameters we want to fit to the data

  // fixed effects

  /* in order to allow for correlations between the
  variables, these are all stored in a list
  these include bA, bS (stick weight), and the two spatial 
  sigmas, along with the floor (chance of selectin an 
  item at random)
  array[K] real bA; // weights for class A compared to B  
  array[K] real b_stick; // stick-switch rates 
  array[K] real<lower = 0> rho_delta; // distance tuning
  array[K] real rho_psi; // direction tuning

  // random effects

  array[K] vector[L] zA; // weights for class A compared to B  
  array[K] vector[L] z_stick; // stick-switch rates 
  array[K] vector[L] z_delta; // distance tuning
  array[K] vector[L] z_psi; // direction tuning

  real<lower = 0> sig_a;
  real<lower = 0> sig_stick;
  real<lower = 0> sig_delta;
  real<lower = 0> sig_psi;

transformed parameters {

  // combine fixed and random effects
  array[K] vector[L] uA; 
  array[K] vector[L] u_stick; 
  array[K] vector[L] u_delta; 
  array[K] vector[L] u_psi; 

  for (kk in 1:K) {
    uA[kk] = bA[kk] + sig_a*zA[kk];
    u_stick[kk] = b_stick[kk] + sig_stick*z_stick[kk];
    u_delta[kk] = rho_delta[kk] + sig_delta*z_delta[kk];
    u_psi[kk]   = rho_psi[kk] + sig_psi*z_psi[kk];

All of the `u_’ parameters have Rhat < 1.01. So great, the uncertainty in the b and u parameters cancels out to some extent?

I’m sure this must happen quite often with multi-level models.


Are there any tricks to improve this?

At one point, I was using an LJK-prior and modeling the full covariance matrix. I removed this in the hope that a simplified model structure would decrease the time taken to fit the model!

Should I add it back in?


This is also a modeling question. Sometimes the additional structure can help speed things up, but usually it will just make it slower. It will usually lead to a better answer if there is correlation and enough data to estimate it.

Sometimes what you’ll see is higher R-hat values if you have lots of random effects. I’ve always assumed this was because there was less data to inform each of the values. You can just keep running longer.

One way to get around this problem is to center or soft-center the random effects. Generally, models with multiple random effects are not identified unless one of the values for each effect is pinned to 0, or if the overall vector is constrained to sum to 0 (either way, one fewer degrees of freedom).

That can happen, so that when you run longer and get R-hat < 1.01, nothing changes in your estimates. I don’t know how to check that without just running it, though.

With the new(ish) affine transform, you can code up non-centered parameterizations without the transformed parameter block. But your varying scales for each row of u_stick etc. make this tricky to do elegantly. It’s also not the cause of any slowdown.

1 Like