data {
int<lower=1> N; // Number of subjects
int<lower=1> ms; // Number of measurements
int subject[ms]; // subject ID
real Y[ms];
real X[ms];
}
parameters {
real alpha;
real<lower=0> sigma_e;
real mu;
real<lower=0> sigma;
vector u[N];
}
transformed parameters {
vector[ms] theta;
// construct the theta
for (i in 1:ms) {
theta[i] <- alpha*X[i]+u[subject[i]];
}
}
model {
// construct random effect
u ~ normal(mu, sigma);
for (i in 1:ms) {
Y[i] ~ normal(theta[i],sigma_e);
}
// construct the priors
alpha ~ normal(0,10);
sigma_e ~ inv_gamma(0.01,0.01);
mu ~ normal(0,10);
sigma ~ inv_gamma(0.01,0.01);
}

Something like a linear mixed-effects model with measurement error.
I ran 4 chains for 2,000 iterations. The Rhat values for alpha and u are not good, sometimes >2 (I tried increasing the number of iterations, but it didn't help). I also checked all the estimated theta values: their Rhat ranges from 1.00 to 1.04, which I assume is acceptable.
So my question is: if I only need theta for the subsequent analysis, is it safe to use theta even though some other parameters' Rhat values are not good?

It’s not advisable to ignore those high Rhats, but I think we can probably help get those Rhat values down. Are you also getting warnings about divergences when fitting this?

A few tips about your Stan code (the first few aren’t related to Rhat but the ones towards the end might be):

This is fine but I would change it to

vector[ms] Y;
vector[ms] X;

so that you can do vector multiplication later in the code (see below).

The syntax for vectors (as opposed to arrays) is vector[N] u;. What you have should result in a parser error, and Stan shouldn't even let you run this model. Are you sure this code runs?

If you change Y and X to vectors as I recommended above then you can vectorize this and replace the loop with this:

vector[ms] theta = alpha * X + u[subject];
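To see why the vectorized line is equivalent to the loop, here is a small numpy sketch (toy data, hypothetical values; note numpy indexing is 0-based while Stan's is 1-based). The `u[subject]` gather picks out each observation's subject-level effect, just like `u[subject[i]]` inside the loop:

```python
import numpy as np

# Toy data mimicking the model's inputs (values are made up for illustration)
rng = np.random.default_rng(0)
N, ms = 5, 20
subject = rng.integers(0, N, size=ms)  # 0-based here; Stan uses 1-based indexing
X = rng.normal(size=ms)
u = rng.normal(size=N)
alpha = 0.7

# Loop version (mirrors the original transformed parameters block)
theta_loop = np.empty(ms)
for i in range(ms):
    theta_loop[i] = alpha * X[i] + u[subject[i]]

# Vectorized version (mirrors `alpha * X + u[subject]` in Stan)
theta_vec = alpha * X + u[subject]

assert np.allclose(theta_loop, theta_vec)
```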

In the model block:

can be vectorized and will be faster as

Y ~ normal(theta, sigma_e);

Also, although these inv_gamma(0.01,0.01) priors used to be popular (and maybe still are), we recommend against them (see e.g. section 4.3 of http://www.stat.columbia.edu/~gelman/research/published/taumain.pdf). You could try an exponential distribution or even a normal distribution (Stan will automatically make that a half-normal since you have <lower=0>).
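For intuition on that last point: putting a normal(0, s) prior on a parameter declared with <lower=0> amounts to a half-normal, which you can sketch in numpy as the absolute value of a normal draw (toy sigma value; this is just an illustration, not Stan's internal mechanics):

```python
import numpy as np

# A half-normal(0, s) can be simulated as |Normal(0, s)|; this is what a
# normal prior on a <lower=0> parameter amounts to.
rng = np.random.default_rng(1)
s = 2.0
draws = np.abs(rng.normal(0.0, s, size=200_000))

# Known properties of the half-normal: support [0, inf), mean = s * sqrt(2/pi)
assert draws.min() >= 0.0
assert abs(draws.mean() - s * np.sqrt(2 / np.pi)) < 0.02
```

Unlike inv_gamma(0.01,0.01), this prior places substantial mass near zero, which matters when the group-level scale is small.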

This is known as the centered parameterization of a hierarchical model. It can be fine if your data are highly informative, but that is often not the case, and it will help to use the non-centered parameterization (see the Reparameterization section of the Stan User's Guide). In your case the non-centered parameterization (which is statistically equivalent but often easier to fit) would look like this (also incorporating my recommendations from above):

data {
int<lower=1> N;
int<lower=1> ms;
int subject[ms];
vector[ms] Y;
vector[ms] X;
}
parameters {
real alpha;
real<lower=0> sigma_e;
real mu;
real<lower=0> sigma;
vector[N] u_raw; // this will be N(0,1) and scaled and shifted to N(mu, sigma)
}
transformed parameters {
vector[N] u = mu + sigma * u_raw; // construct u by scaling and shifting
vector[ms] theta = alpha * X + u[subject];
}
model {
Y ~ normal(theta, sigma_e);
u_raw ~ normal(0, 1); // implies u ~ normal(mu, sigma)
alpha ~ normal(0,10);
mu ~ normal(0,10);
// these aren't wrong but I recommend changing them (see comment above)
sigma_e ~ inv_gamma(0.01,0.01);
sigma ~ inv_gamma(0.01,0.01);
}
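The key identity behind the non-centered parameterization can be checked numerically: if z ~ Normal(0, 1), then mu + sigma * z has exactly a Normal(mu, sigma) distribution. A quick numpy sketch (hypothetical mu and sigma values):

```python
import numpy as np

# Non-centered trick: if z ~ Normal(0, 1), then mu + sigma * z ~ Normal(mu, sigma).
rng = np.random.default_rng(2)
mu, sigma = 1.5, 0.5
z = rng.standard_normal(100_000)
u = mu + sigma * z

# Sample moments should match the target distribution
assert abs(u.mean() - mu) < 0.01
assert abs(u.std() - sigma) < 0.01
```

The two parameterizations define the same posterior; they differ only in the geometry the sampler has to explore, which is why the non-centered version often mixes better when the data are weakly informative.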

Hi, Jonah. Thanks so much for your detailed response.
Your revised code looks much cleaner.

Yes, I also got warnings that the BFMI was low and that the ESS was too low.

The model could run, but with a lot of warnings and a long computation time.

Your recommended references on inverse gamma and non-centered parameterization are quite helpful.

Actually, my model is a little more complicated than this. If my random-effects term is u = (u0, u1, u2), should I define u as matrix[N,3] u_raw, or as vector[N] u_raw[3]? In that case, how should I define theta?
The formula is like
theta[i] = alpha * X[i] + u0[subject[i]] + u1[subject[i]] * time[i] + u2[subject[i]] * time[i]^2

Assuming each of your u vectors has its own mu and sigma (is that true?), how about this?

data {
int<lower=1> N; // Number of subjects
int<lower=1> ms; // Number of measurements
int subject[ms]; // subject ID
vector[ms] Y;
vector[ms] X;
vector<lower=0>[ms] time; // declare as a vector so you can do multiplication in transformed params
}
parameters {
real alpha;
real<lower=0> sigma_e;
vector[3] mu;
vector<lower=0>[3] sigma;
vector[N] u_raw[3];
}
transformed parameters {
vector[N] u[3];
vector[ms] theta;
// each u[j] is a vector, each mu[j] and sigma[j] is a scalar
for (j in 1:3) {
u[j] = mu[j] + sigma[j] * u_raw[j];
}
// using .* for element-wise multiplication
theta = alpha * X + u[1][subject] + u[2][subject] .* time + u[3][subject] .* square(time);
}
model {
Y ~ normal(theta, sigma_e); // data model
for (j in 1:3) u_raw[j] ~ normal(0, 1);
alpha ~ normal(0,10);
mu ~ normal(0,10);
// these aren't wrong but I recommend changing them (see comment above)
sigma_e ~ inv_gamma(0.01,0.01);
sigma ~ inv_gamma(0.01,0.01);
}
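The quadratic predictor line can again be checked against an element-by-element computation in numpy (toy data, hypothetical values; numpy is 0-based where Stan is 1-based). The `u[j][subject]` gathers and the `.*`-style element-wise products map directly onto numpy indexing and `*`:

```python
import numpy as np

# Toy check of the transformed-parameters line: gather per-subject effects
# with u[j][subject] and combine them element-wise with time and time^2.
rng = np.random.default_rng(3)
N, ms = 4, 12
subject = rng.integers(0, N, size=ms)  # 0-based; Stan is 1-based
X = rng.normal(size=ms)
time = rng.uniform(0, 5, size=ms)
u = rng.normal(size=(3, N))            # u[0], u[1], u[2] each of length N
alpha = 0.3

# Vectorized version (mirrors the Stan line with .* element-wise products)
theta = alpha * X + u[0][subject] + u[1][subject] * time + u[2][subject] * time**2

# Same quantity computed one observation at a time
theta_loop = np.array([
    alpha * X[i] + u[0][subject[i]] + u[1][subject[i]] * time[i]
    + u[2][subject[i]] * time[i] ** 2
    for i in range(ms)
])
assert np.allclose(theta, theta_loop)
```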

Yes, this is exactly what I wanted to ask. Thanks again.
BTW, I modified the earlier simple model as you suggested (vectorization, non-centered parameterization, and a half-normal prior on the scale parameters), and now the Rhat for all the parameters is below 1.01, which looks great. The computation time was also reduced. Your post helped a lot.