Sensitivity of Stan to hierarchical (hyper)priors

I am trying to fit a simple hierarchical linear regression example (originally implemented in BUGS) in RStan, but I get a few divergent transitions and the results differ somewhat from the BUGS results. I think the problem concerns the prior specification, especially for the scale parameters (\sigma_{a}, \sigma_{y}). Apart from the half-normal distribution (used here) for the scale parameters, I also tried the half-Cauchy distribution, but that did not solve the problem.

So, the following questions arise:

  1. Why is the prior specification so complicated for hierarchical models in RStan? In BUGS, a very weakly informative prior on the precision parameter (precision = inverse of variance) gives credible results, whereas in RStan the prior specification for these parameters seems more complicated and sensitive.

  2. Is this issue due to the different algorithm used in BUGS (Gibbs sampler) compared with Stan (Hamiltonian Monte Carlo)?

  3. If I want to specify very weakly informative priors in cases like this, what are appropriate prior distributions for the scale parameters? For this I checked https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations and tried the proposed prior choices, but the divergent transitions and the discrepant results remained.
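(For reference, in Stan a half-normal prior on a scale parameter falls out automatically from combining a `<lower=0>` declaration with a normal density; Stan samples on the unconstrained scale and accounts for the truncation itself. A minimal sketch:

```stan
parameters {
  real<lower=0> sigma;   // lower bound makes the normal prior half-normal
}
model {
  sigma ~ normal(0, 1);  // half-normal(0, 1) given the constraint
}
```

No explicit `T[0,]` truncation is needed as it would be in BUGS.)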

The model I fit in Stan is:

data {
  int<lower=0> K;
  int<lower=0> nsim;
  matrix[nsim, K] y;
}

parameters {
  real m;
  real a[nsim];
  real<lower=0> sigma_alpha;
  real<lower=0> sigma_y;
}

transformed parameters {
  matrix[nsim, K] mu;
  for (i in 1:nsim) {
    for (j in 1:K) {
      mu[i, j] = m + a[i];
    }
  }
}

model {
  target += normal_lpdf(m | 0, 10);
  target += normal_lpdf(sigma_y | 0, 3);
  target += normal_lpdf(sigma_alpha | 0, 1);

  target += normal_lpdf(a | 0, sigma_alpha);

  for (i in 1:nsim) {
    for (j in 1:K) {
      target += normal_lpdf(y[i, j] | mu[i, j], sigma_y);
    }
  }
}

while the corresponding BUGS code is here: bugscode

The data are given by the following R code:

####---------Data Specification--------####
y<-matrix(c(108, 98, 91, 94, 93, 96, 104, 99, 99, 97, 95, 98, 93, 97, 99, 96, 90, 100, 92, 95, 101, 89, 97, 97, 97,
100, 96, 95, 106, 100, 100, 98, 90, 99, 88, 98, 92, 92, 100, 101),nrow=20,ncol=2,byrow=T)

model_data <- list(y)

model_data$K <- 2
model_data$nsim <- 20
names(model_data)[1] <- "y"
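(Before reparameterizing, it can be worth checking whether a stricter sampler setting alone removes the few divergences; `adapt_delta` and `max_treedepth` are real `control` options of `rstan::stan`. A sketch, where the model file name is an assumption:

```r
library(rstan)

fit <- stan(
  file = "hier_model.stan",  # hypothetical file name for the model above
  data = model_data,
  control = list(adapt_delta = 0.99, max_treedepth = 12)
)
print(fit, pars = c("m", "sigma_alpha", "sigma_y"))
```

Raising `adapt_delta` forces smaller leapfrog steps; if divergences persist even at 0.99, that usually points to a funnel-shaped posterior and the non-centered parameterization below.)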

Try rewriting this with a non-centered parameterization: https://mc-stan.org/users/documentation/case-studies/divergences_and_bias.html

That’ll mean adding a new parameter:

real az[nsim];

with prior:

az ~ normal(0, 1);

And moving a to the transformed parameters block:

for(i in 1:nsim) {
  a[i] = az[i] * sigma_alpha;
}

(Edit: a_z -> az)
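Putting those pieces together, the full non-centered model might look like the following sketch. One assumption here: `az` and `a` are declared as `vector`s rather than arrays so the likelihood can be vectorized directly.

```stan
data {
  int<lower=0> K;
  int<lower=0> nsim;
  matrix[nsim, K] y;
}
parameters {
  real m;
  vector[nsim] az;             // standardized group effects
  real<lower=0> sigma_alpha;
  real<lower=0> sigma_y;
}
transformed parameters {
  // non-centered parameterization: a = az * sigma_alpha
  vector[nsim] a = az * sigma_alpha;
}
model {
  m ~ normal(0, 10);
  sigma_y ~ normal(0, 3);      // half-normal via the <lower=0> constraint
  sigma_alpha ~ normal(0, 1);  // half-normal via the <lower=0> constraint
  az ~ normal(0, 1);           // implies a ~ normal(0, sigma_alpha)
  for (j in 1:K)
    y[, j] ~ normal(m + a, sigma_y);
}
```

The point of the reparameterization is that the sampler explores `az` and `sigma_alpha`, which are a priori independent, instead of the funnel-shaped joint geometry of `a` and `sigma_alpha`.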


Also I don’t think you need mu.

I think you can just do:

for (j in 1:K) {
  // to_vector() is needed because a is declared as an array, not a vector
  target += normal_lpdf(y[, j] | to_vector(a) + m, sigma_y);
}

(Edit: I screwed this up the first time and included an unnecessary loop, fixed now)
