data {
int<lower=1> N; // Number of subjects
int<lower=1> ms; // Number of measurements
int subject[ms]; // subject ID
real Y[ms];
real X[ms];
}
parameters {
real alpha;
real<lower=0> sigma_e;
real mu;
real<lower=0> sigma;
vector u[N];
}
transformed parameters {
vector[ms] theta;
// construct the theta
for (i in 1:ms) {
theta[i] <- alpha*X[i]+u[subject[i]];
}
}
model {
// construct random effect
u ~ normal(mu, sigma);
for (i in 1:ms) {
Y[i] ~ normal(theta[i],sigma_e);
}
// construct the priors
alpha ~ normal(0,10);
sigma_e ~ inv_gamma(0.01,0.01);
mu ~ normal(0,10);
sigma ~ inv_gamma(0.01,0.01);
}

Something like a linear mixed-effects model with measurement error.
I ran 4 chains for 2,000 iterations. The Rhat values for alpha and u are not good, sometimes >2 (I tried increasing the number of iterations, but it didn't help). I also checked all the estimated theta values: their Rhat ranges from 1.00 to 1.04, which I assume is acceptable.
So my question is: if I only need theta for the subsequent analysis, is it safe to use theta even though some other parameters' Rhat values are not good?

It’s not advisable to ignore those high Rhats, but I think we can probably help get those Rhat values down. Are you also getting warnings about divergences when fitting this?

A few tips about your Stan code (the first few aren’t related to Rhat but the ones towards the end might be):

This is fine but I would change it to

vector[ms] Y;
vector[ms] X;

so that you can do vector multiplication later in the code (see below).

The syntax for vectors (as opposed to arrays) is vector[N] u;. What you have should result in a parser error, and Stan shouldn't even let you run this model. Are you sure this code runs?

If you change Y and X to vectors as I recommended above then you can vectorize this and replace the loop with this:

vector[ms] theta = alpha * X + u[subject];
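To see why the vectorized line is equivalent to the loop, here is a small numpy sketch (toy data, hypothetical values; note numpy indexing is 0-based while Stan's is 1-based). The `u[subject]` gather picks out each observation's subject-level effect, just like `u[subject[i]]` inside the loop:

```python
import numpy as np

# Toy data mimicking the model's inputs (values are made up for illustration)
rng = np.random.default_rng(0)
N, ms = 5, 20
subject = rng.integers(0, N, size=ms)  # 0-based here; Stan uses 1-based indexing
X = rng.normal(size=ms)
u = rng.normal(size=N)
alpha = 0.7

# Loop version (mirrors the original transformed parameters block)
theta_loop = np.empty(ms)
for i in range(ms):
    theta_loop[i] = alpha * X[i] + u[subject[i]]

# Vectorized version (mirrors `alpha * X + u[subject]` in Stan)
theta_vec = alpha * X + u[subject]

assert np.allclose(theta_loop, theta_vec)
```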

In the model block:

can be vectorized and will be faster as

Y ~ normal(theta, sigma_e);

Also, although these inv_gamma(0.01,0.01) priors used to be popular (and maybe still are), we recommend against them (see e.g. section 4.3 of http://www.stat.columbia.edu/~gelman/research/published/taumain.pdf). You could try an exponential distribution or even a normal distribution (Stan will automatically make that a half-normal since you have <lower=0>).
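For intuition on that last point: putting a normal(0, s) prior on a parameter declared with <lower=0> amounts to a half-normal, which you can sketch in numpy as the absolute value of a normal draw (toy sigma value; this is just an illustration, not Stan's internal mechanics):

```python
import numpy as np

# A half-normal(0, s) can be simulated as |Normal(0, s)|; this is what a
# normal prior on a <lower=0> parameter amounts to.
rng = np.random.default_rng(1)
s = 2.0
draws = np.abs(rng.normal(0.0, s, size=200_000))

# Known properties of the half-normal: support [0, inf), mean = s * sqrt(2/pi)
assert draws.min() >= 0.0
assert abs(draws.mean() - s * np.sqrt(2 / np.pi)) < 0.02
```

Unlike inv_gamma(0.01,0.01), this prior places substantial mass near zero, which matters when the group-level scale is small.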

This is known as the centered parameterization of a hierarchical model. It can be fine if your data are highly informative, but that is often not the case, and it will help to use the non-centered parameterization (see the Reparameterization section of the Stan User's Guide). In your case the non-centered parameterization (which is statistically equivalent but often easier to fit) would look like this (also incorporating my recommendations from above):

data {
int<lower=1> N;
int<lower=1> ms;
int subject[ms];
vector[ms] Y;
vector[ms] X;
}
parameters {
real alpha;
real<lower=0> sigma_e;
real mu;
real<lower=0> sigma;
vector[N] u_raw; // this will be N(0,1) and scaled and shifted to N(mu, sigma)
}
transformed parameters {
vector[N] u = mu + sigma * u_raw; // construct u by scaling and shifting
vector[ms] theta = alpha * X + u[subject];
}
model {
Y ~ normal(theta, sigma_e);
u_raw ~ normal(0, 1); // implies u ~ normal(mu, sigma)
alpha ~ normal(0,10);
mu ~ normal(0,10);
// these aren't wrong but I recommend changing them (see comment above)
sigma_e ~ inv_gamma(0.01,0.01);
sigma ~ inv_gamma(0.01,0.01);
}
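The key identity behind the non-centered parameterization can be checked numerically: if z ~ Normal(0, 1), then mu + sigma * z has exactly a Normal(mu, sigma) distribution. A quick numpy sketch (hypothetical mu and sigma values):

```python
import numpy as np

# Non-centered trick: if z ~ Normal(0, 1), then mu + sigma * z ~ Normal(mu, sigma).
rng = np.random.default_rng(2)
mu, sigma = 1.5, 0.5
z = rng.standard_normal(100_000)
u = mu + sigma * z

# Sample moments should match the target distribution
assert abs(u.mean() - mu) < 0.01
assert abs(u.std() - sigma) < 0.01
```

The two parameterizations define the same posterior; they differ only in the geometry the sampler has to explore, which is why the non-centered version often mixes better when the data are weakly informative.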

Hi, Jonah. Thanks so much for your detailed response.
Your revised code looks much cleaner.

Yes, I also got warnings that the BFMI was low and that the ESS was too low.

The model could run, but with a lot of warnings and a long computation time.

Your recommended references on inverse gamma and non-centered parameterization are quite helpful.

Actually, my model is a little more complicated than this. If my random-effects term is u = (u0, u1, u2), should I define u as matrix[N,3] u_raw, or as vector[N] u_raw[3]? In that case, how should I define theta?
The formula is like
theta[i] = alpha * X[i] + u0[subject[i]] + u1[subject[i]] * time[i] + u2[subject[i]] * time[i]^2

Assuming each of your u vectors has its own mu and sigma (is that true?), how about this?

data {
int<lower=1> N; // Number of subjects
int<lower=1> ms; // Number of measurements
int subject[ms]; // subject ID
vector[ms] Y;
vector[ms] X;
vector<lower=0>[ms] time; // declare as a vector so you can do multiplication in transformed params
}
parameters {
real alpha;
real<lower=0> sigma_e;
vector[3] mu;
vector<lower=0>[3] sigma;
vector[N] u_raw[3];
}
transformed parameters {
vector[N] u[3];
vector[ms] theta;
// each u[j] is a vector, each mu[j] and sigma[j] is a scalar
for (j in 1:3) {
u[j] = mu[j] + sigma[j] * u_raw[j];
}
// using .* for element-wise multiplication
theta = alpha * X + u[1][subject] + u[2][subject] .* time + u[3][subject] .* square(time);
}
model {
Y ~ normal(theta, sigma_e); // data model
for (j in 1:3) u_raw[j] ~ normal(0, 1);
alpha ~ normal(0,10);
mu ~ normal(0,10);
// these aren't wrong but I recommend changing them (see comment above)
sigma_e ~ inv_gamma(0.01,0.01);
sigma ~ inv_gamma(0.01,0.01);
}
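The quadratic predictor line can again be checked against an element-by-element computation in numpy (toy data, hypothetical values; numpy is 0-based where Stan is 1-based). The `u[j][subject]` gathers and the `.*`-style element-wise products map directly onto numpy indexing and `*`:

```python
import numpy as np

# Toy check of the transformed-parameters line: gather per-subject effects
# with u[j][subject] and combine them element-wise with time and time^2.
rng = np.random.default_rng(3)
N, ms = 4, 12
subject = rng.integers(0, N, size=ms)  # 0-based; Stan is 1-based
X = rng.normal(size=ms)
time = rng.uniform(0, 5, size=ms)
u = rng.normal(size=(3, N))            # u[0], u[1], u[2] each of length N
alpha = 0.3

# Vectorized version (mirrors the Stan line with .* element-wise products)
theta = alpha * X + u[0][subject] + u[1][subject] * time + u[2][subject] * time**2

# Same quantity computed one observation at a time
theta_loop = np.array([
    alpha * X[i] + u[0][subject[i]] + u[1][subject[i]] * time[i]
    + u[2][subject[i]] * time[i] ** 2
    for i in range(ms)
])
assert np.allclose(theta, theta_loop)
```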

Yes, this is exactly what I wanted to ask. Thanks again.
BTW, I modified the earlier simple model as you suggested (vectorization, non-centered parameterization, and a half-normal prior on the scale parameters), and now the Rhat for all the parameters is below 1.01, which looks great. The computation time was also reduced. Your post helped a lot.