The example you link won’t work as written. Stan programs require support everywhere parameters satisfy their declared constraints, but sigma is declared without lower=0, and the other parameters are declared without constraints and given fixed uniform priors. I’d also recommend at least weakly informative priors; it looks like this example is following the old BUGS pattern of very wide priors because they didn’t really want to be Bayesian.
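For instance, the declarations and priors would look something like this (a sketch with assumed names, not the code from the linked example):

```stan
parameters {
  real a;                  // a location can stay unconstrained
  real<lower=0> sigma;     // a scale must be declared with lower=0
}
model {
  a ~ normal(0, 5);        // weakly informative, rather than a wide uniform
  sigma ~ normal(0, 1);    // with lower=0 this acts as a half-normal prior
}
```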

What you sent is not well-formed Stan code: there’s a close brace after the sampling statement with no matching open brace. You also want to set this up to vectorize, so the placement of that brace matters. What you want is this:

```stan
for (j in 1:n_county) {
  YY[j] = [a_county[j], b_floor[j]]';
}
YY ~ multi_normal(MU, quad_form_diag(Rho, sigma_county));
```

I’m only being this pedantic because the following is a very common bug, and your stray brace and indentation suggest it:

```stan
for (j in 1:n_county) {
  YY[j] = [a_county[j], b_floor[j]]';
  YY ~ multi_normal(MU, quad_form_diag(Rho, sigma_county));  // bug: sampling inside the loop
}
```

I’m not sure what that means. You’re using the index variable to pull out the appropriate intercept here; the index itself is just an integer, so you presumably don’t want to include it in the location. Can you write down in math what it is that you want? The Stan code you give does roughly what the math says for the location parameter of the normal.

P.S. I see you’re using \sigma^2 in your notation. Stan parameterizes the normal by its scale (standard deviation), not its variance, and Stan’s multi-normal takes a covariance matrix.
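For example, if your math is written in terms of a variance sigma^2, you’d pass the square root (a sketch; `sigma_sq` is an assumed name):

```stan
y ~ normal(mu, sigma);           // sigma is a standard deviation (scale)
y ~ normal(mu, sqrt(sigma_sq));  // if sigma_sq is a variance, take the square root
```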

P.P.S. It’s more efficient to parameterize with Cholesky factors. Then the solve is quadratic rather than cubic.
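Using the names from your model, the Cholesky version would look roughly like this (a sketch, assuming the same YY and MU and a 2-dimensional random effect):

```stan
parameters {
  cholesky_factor_corr[2] L_Rho;     // Cholesky factor of the correlation matrix
  vector<lower=0>[2] sigma_county;   // scales
}
model {
  L_Rho ~ lkj_corr_cholesky(2);
  // diag_pre_multiply(sigma, L) is the Cholesky factor of the covariance
  YY ~ multi_normal_cholesky(MU, diag_pre_multiply(sigma_county, L_Rho));
}
```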

You’re right, I had to make a few changes to run it.

I wanted to model an equivalent of the lmer code below, where there is no intercept and each floor level has its own estimate (like using an index variable, I guess?):

```r
lmer(log_radon ~ -1 + floor + (1 + floor | county),
     data = radon %>% mutate(floor = as.factor(floor)))
```

I’m not sure how to do that. Is this related to the centered/non-centered parameterisation? Does the following example apply the parameterisation with Cholesky factors?

```stan
transformed parameters {
  // this transforms the random effects so that they have the
  // correlation matrix specified above
  matrix[2, J] u;
  u = diag_pre_multiply(sigma_u, L_u) * z_u;
}
```

With

```stan
// likelihood
for (i in 1:N) {
  mu = beta[1] + u[1, Subject[i]] + (beta[2] + u[2, Subject[i]]) * Days[i];
  RT[i] ~ normal(mu, sigma_e);
}
```