Example hierarchical models with 3+ levels of hierarchy?

mike-lawrence · October 1, 2020, 11:05am

Are there any example hierarchical models anywhere with 3 or more levels of hierarchy? For example, one score from each of multiple students, each in precisely one of multiple schools, each in precisely one of multiple districts?

Embarrassing as it is, I’m having trouble wrapping my head around implementing something like that and I’d like to see how some canonical examples approach it to help get past what I’m sure is going to be a head-smacking block.

cgoold · October 1, 2020, 11:34am

I just coded up a quick example using brms. Does this help you?


# number of students/scores
N_g <- 50
# number of schools
N_j <- 20
# number of districts
N_k <- 10
# total cases
N <- N_g * N_j * N_k

# intercept (e.g. IQ scores)
alpha <- 100
student_vec <- 1:N
school_vec <- rep(1:N_j, each=N_g, times = N_k)
district_vec <- rep(1:N_k, each=N_j*N_g)
# random intercept for schools
r_j <- rnorm(N_j, 0, 10)
# random intercepts for districts
r_k <- rnorm(N_k, 0, 5)
# linear predictor
mu <- alpha + r_j[school_vec] + r_k[district_vec]
# noise
sigma <- 5

# simulate data
y <- rnorm(N, mu, sigma)

# create data frame
d <- data.frame(
  y=y, 
  school = school_vec, 
  district = district_vec
)

# fit the model (can take a few minutes depending on N)
fit <- brms::brm(
  y ~ 1 + (1 | school) + (1 | district), 
  data=d, 
  cores=4
)

summary(fit)

And the output from a representative run:

Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ 1 + (1 | school) + (1 | district) 
   Data: d (Number of observations: 10000) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Group-Level Effects: 
~district (Number of levels: 10) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     3.72      1.06     2.29     6.17        622 1.01

~school (Number of levels: 20) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)    11.44      1.91     8.25    15.86        512 1.00

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept   100.73      2.81    95.08   106.36        369 1.00

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     4.99      0.03     4.93     5.06       1389 1.00

mike-lawrence · October 1, 2020, 12:00pm

Thanks!!

Where the SD among students within school i is \sigma_i, I’m hoping to have partially-pool the values as log(\sigma_i) \sim N(\mu_{schools\sigma},\sigma_{schools\sigma})

Then, where the SD among schools within district j is \sigma_j, I’m hoping to have partially-pool the values as log(\sigma_j) \sim N(\mu_{districts\sigma},\sigma_{districts\sigma})

Does brms permit that kind of complexity? (I’m still not really familiar with it; more a raw-Stan person)

cgoold · October 1, 2020, 1:30pm

So you want heteroskedastic variances, varying by school and district?

brms can handle a variety of distribution-specific parameter formulas. For a model with a Gaussian likelihood, you could do something like:

fit <- brms::brm(
  brms::bf(
    y ~ 1 + (1 | school) + (1 | district), 
    sigma ~ 1 + (1 | school) + (1 | district),
  ),
  data=d
)

although this is a much more difficult beast to fit, in my experience. I have a two-level hierarchical ordinal probit model with heteroskedastic variances here, if your interested: https://github.com/ConorGoold/GooldNewberry_modelling_shelter_dog_behaviour/blob/master/Stan_full_model.stan

mike-lawrence · October 1, 2020, 1:36pm

Wow, that’s remarkably simple! (Specification-wise; I take your point that the geometry may reflect a difficult space to sample)

cgoold · October 1, 2020, 1:40pm

Yes, brms is excellent! Although I’m not actually a frequent user, tending towards raw Stan too.

I would probably code up those types of models in raw Stan anyway, because they will likely need careful tuning when it comes to fitting on real data sets with other complexities.

mike-lawrence · October 7, 2020, 2:22pm

Hm, on actually inspecting the generated Stan code, I actually don’t think this gets at the kind of nested hierarchy with heterogeneity I was seeking to implement. The specification:

brms::bf(
          y ~ 1 + (1 | school) + (1 | district)
   ,  sigma ~ 1 + (1 | school) + (1 | district)
)

Allows for school and district random effects on the magnitude of variability among students, but still expresses that there is a single SD that conveys how schools vary from one another in their mean in each district. The idea I was trying to implement would have a separate SD per district to capture variability across districts in how schools’ means vary. Ditto separate SD per district to capture variability across districts in how schools’ SDs vary.

To be concrete, here’s some pseudocode of the generative process I’m thinking of (easiest to parse if you start at the innermost for loop):

for(this_district in districts){

	this_district_mean_school_mean = rnorm(1,mean_district_mean_school_mean,sd_district_mean_school_mean)
	this_district_sd_school_mean = exp(rnorm(1,mean_district_logsd_school_mean,sd_district_logsd_school_mean))

	this_district_mean_school_logsd = rnorm(1,mean_district_mean_school_logsd,sd_district_mean_school_logsd)
	this_district_sd_school_logsd = rnorm(1,mean_district_sd_school_logsd,sd_district_sd_school_logsd)

	for(this_school in schools_in_this_district){

		this_school_mean = rnorm(1,this_district_mean_school_mean,this_district_sd_school_mean)
		this_school_sd = exp(rnorm(1,this_district_mean_school_logsd,this_district_sd_school_logsd))

		for(this_student in students_in_this_school){
			score = rnorm(1,this_school_mean,this_school_sd)
		}
	}
}

Guido_Biele · October 7, 2020, 5:51pm

Do you mean something like this:


data {
   int N_districts;
   int N_schools;
   int N_students;
   int school_in_district_idx[N_schools];
   int student_in_school_idx[N_students];
   vector[N_students] score;
}

parameters {
  real top_mu;
  real<lower=0> top_sigma;

  vector[N_districts] district_mu;
  vector<lower=0>[N_districts] district_sigma;
 
  vector[N_schools] school_mu;
  vector<lower=0>[N_schools] school_sigma;

  vector[N_students] student_mu;
  real<lower=0> sigma;
}

model {
  top_mu ~ std_normal()
  top_sigma ~ std_normal()
  district_mu ~ normal(top_mu, top_sigma);

  for (s in 1:N_schools) {
     int idx = school_in_district_idx[s];
     school_mu[s] ~ normal(district_mu[idx],district_sigma[idx]);
  }

  for (k in 1:N_students) {
    int idx = student_in_school_idx[k];
    student_mu[k] ~ normal(school_mu[idx],school_sigma[idx]);
  }
score ~ normal(student_mu, sigma);
}

mike-lawrence · October 7, 2020, 6:01pm

Yes, though I think you’ve left off the specification of partial pooling for district_sigma.

I’m coding this in raw Stan myself right now (with some more complexity in design plus my reduced-computation trick), so I’ll post back when I’m done. I am nonetheless still curious if brms can do it out-of-the-box.

mike-lawrence · October 13, 2020, 12:42pm

FYI, I’ve been playing with this a bunch and have discovered that it really matters whether you employ the centered or non-centered parameterization, and independently so for each level of the hierarchy. For example, I’m finding that with the simulation space I’m exploring right now, the schools have to be non-centered and the students have to be centered.

I’ll be posting a mini case-study on this shortly.

Guido_Biele · October 13, 2020, 12:58pm

@betanalpha made a nice figure giving some guidance when centered/non-centered parameterization work well.

mike-lawrence · October 13, 2020, 1:01pm

Yeah, it’s on my to-do to sit down and put some serious thought into how geometry of the 3-level case I’m exploring relates to that. I think the idea will be that you can have one layer of the hierarchy for which the data strongly inform and one layer where it doesn’t?

Guido_Biele · October 13, 2020, 2:01pm

I have not thought deeply about this, my only intuitions are that the more fewer groups one describes with a higher level prior, the less informed the estimation of that prior is.

On the other hand, if this higher level prior has itself an (even …) higher prior, is estimation could also be informed by other groups on the same level.

I don’t have a strong intuition and would probably resort to simulations to figure this out and sharpen/clarify my intuition…

Stephen_Martin · November 5, 2021, 11:05pm

You can do this with MELSMs, of course; unfortunately, brms does not (currently) support adding scale models to RE-SD parameters, and though my package (LMMELSM) does support adding scale models to RE-SD parameters, it does not (currently) support more than one random factor (e.g., no crossed or nested effects).

But you may want to look at the MELSM literature; I know I’ve seen 3-levels MELSMs. In the end, it’s really not conceptually harder than a 2-level MELSM, and nesting is not really different than just including a A:B interaction of grouping terms. You basically implement it all the same way.

Edit: Also for centering/non-centering, you can write an _lp function that, for each factor, transforms each level based on whether it should be a ncp or cp transform, and increment the target accordingly. The function just needs raw RE parameters, and a specification that indicates whether each level of each factor should be treated as a ncp or cp RE.

Topic		Replies	Views
Help in designing the best model for my data Modeling specification , hierarchical-model	2	385	September 15, 2023
Multilevel model with responses on different levels? brms specification , multivariate-normal , brms	1	682	November 4, 2021
Does this hierarchical brms formula correctly link covariates across the multiple nested levels? brms hierarchical-model , brms	4	841	May 30, 2025
Fit Bayesian hierarchical model with three levels using brms Modeling brms	3	874	March 24, 2021
Multiple correlation structures in hierarchical models using brms brms	2	1865	March 5, 2020

Example hierarchical models with 3+ levels of hierarchy?

Related topics