Simple question on hierarchical non-centered parameterization

Hello everybody,
This is my first Stan question—I’m still learning, so please excuse me if the question is basic: this is about non-centered parameterization in a hierarchical setting. I want to know if my modelling was done correctly.

I was wondering about an extension of the 8-school example. Suppose we have three schools instead with classes nested in schools, and we want to estimate treatment effects per class given this knowledge. So for example, let the following be our data:

schools_df <- data.frame(
  school = c("A", "A", "A", "B", "B", "C", "C", "C"),
  classes = c(1, 2, 3, 1, 2, 1, 2, 3), 
  y = c(28,  8, -3,  7, -1,  1, 18, 12),
  sigma = c(15, 10, 16, 11,  9, 11, 10, 18))

I set up the following Stan program and would like to know if I’ve approached it correctly:

data {
  int           N;        // Number of observations
  int<lower=0>  J;        // Number of schools (now 3)
  int<lower=0>  g[N];     // Group assignment
  real          y[N];     // Estimated treatment effects
  real<lower=0> sigma[N]; // s.e. of effect estimates 
}
parameters {
  vector[J] mu;          // By-school mean
  real<lower=0> tau[J];  // By-school s.d.
  real eta[N];           // Individual class errors
}
transformed parameters {
  real theta[N];     // School effects
  for (n in 1:N)
    theta[g[n]] = mu[g[n]] + tau[g[n]] * eta[n];
}
model {
  target += normal_lpdf(eta | 0, 1);
  target += normal_lpdf(y | theta, sigma);
}

Also, in the case that we would want to estimate school effects pooled within school across classes, the following should be implemented—right?.

data {
  int           N;        // Number of observations
  int<lower=0>  J;        // Number of schools (now 3)
  int<lower=0>  g[N];     // Group assignment
  real          y[N];     // Estimated treatment effects
  real<lower=0> sigma[N]; // s.e. of effect estimates 
}
parameters {
  real mu;           // Population mean
  real<lower=0> tau; // Population s.d.
  real eta[J];       // school-level errors
}
transformed parameters {
  real theta[J];     // School effects
  for (j in 1:J)
    theta[j] = mu + tau * eta[j];
}
model {
  target += normal_lpdf(eta | 0, 1);
  for(n in 1:N){
    y[n] ~ normal(theta[g[n]], sigma[n]);
  }
}

(I’m not too sure how to do the target+= expression for the y here.)

The original Stan model that I used as a baseline, the non-centered parameterization when we had 8 schools (so one per observation) was the following:

data {
  int<lower=0> J;         // Number of schools 
  real y[J];              // Estimated treatment effects
  real<lower=0> sigma[J]; // s.e. of effect estimates 
}
parameters {
  real mu;           // Population mean
  real<lower=0> tau; // Population s.d.
  real eta[J];       // school-level errors
}
transformed parameters {
  real theta[J];     // School effects
  for (j in 1:J)
    theta[j] = mu + tau * eta[j];
}
model {
  target += normal_lpdf(eta | 0, 1);
  target += normal_lpdf(y | theta, sigma);
}

I think I’m doing it right, but I’m still confused by multiple things, so I’d like to clear up my confusions as I go.
Thank you so much.

The original Stan model that I used as a baseline

It’s missing priors on mu and tau, which for a data set this small I think you’d definitely want. (there’s this: Prior Choice Recommendations · stan-dev/stan Wiki · GitHub)

I’m not too sure how to do the target+= expression for the y here.

You don’t have to use the target += to get centered parameterizations. That’s more for getting Jacobian adjustments right or mixture models or other weirdnesses.

Also, in the case that we would want to estimate school effects pooled within school across classes

Looks fine to me (still would want priors on everything)

I set up the following Stan program and would like to know if I’ve approached it correctly

Now there is a different hierarchy for each different school, so across schools things aren’t being grouped together. Part of the original motivation for the hierarchical modeling was bringing these estimates together. It feels like there’d then be another hierarchy on top of this that groups all the schools, but I dunno if that’s a good idea on this dataset. There’s just not much data.

Have you seen the election hierarchical model in BDA3? That one is a little more complex.

Also have you seen the Betancourt divergent transitions case study Diagnosing Biased Inference with Divergences? It’s not really talking about the modeling interpretation so much as why the centered parameterization matters. It’s worth looking at if you haven’t read it already.

Hi Ben,

Thank you so much for your reply—indeed, I did add priors on mu and tau for the script that I was actually running. And you are right, originally in a larger data there would probably be a hierarchy on top of schools such as district.

Did you mean election88.stan?

It’s Chapter 15, page 383 in BDA3. I haven’t actually looked at the actual Stan model before. Just read about it.

But it is about election data collected from 1948 to 1988, so that matches. I just remember most of the variables were about spatial things. So like Southern states were pooled together, etc. This looks like a more complicated model, but along the same lines.