Arbitrary Number of Levels in Stan?

I’m working with hierarchical models, but I’m having trouble using a different number of levels. I can’t use a tool like rstanarm, otherwise that would make things a lot easier (FWIW I’m using cmdstanpy)

Suppose I’m fitting the following:

outcomes ~ (1 | group1) + (1 | group2)
link = binomial

From what I understand this is roughly written in stan as:

// model_level_2.stan

data {
   // ---- dimensions of things
   int<lower=0> N; 
   int<lower=1> L_group1;
   int<lower=1> L_group2;

   // ---- data
   int<lower=0> trials[N];        // initial trials 
   int<lower=0> outcomes[N];        // initial successes 
   int<lower=1, upper=L_group1> group1[N];
   int<lower=1, upper=L_group2> group2[N];
}
parameters {
  vector[L_group1] beta_group1;
  vector[L_group2] beta_group2;  
}
model {
  // set your priors on beta_ ...

  for (n in 1:N)
    outcomes[i] ~ binomial_logit( trials[n],  beta_group1[ group1[n] ] + beta_group2[ group2[n] ]) 
} 

My question is: What if I wanted to run this on an arbitrary number of groups? For example, if I wanted to fit the following:

outcomes ~ (1 | group1) + (1 | group2) + (1 | group3)
link = binomial

Would I have to write another stan file called model_level_3.stan where I’ve added terms for group3 and so on? That seems like a bad way to do it. Does stan support something like this, or do people have an approach for it?

Apologies if my models are wrong / bad. I’m open to feedback on that, but I feel like my question still stands 😇

Here’s the same model but with any number of groups. The only tricky part is that if you can’t have an array of beta_group vectors (because the Stan arrays must have uniform size). So beta_group is concatenation of all the groups and begin is the index where that group starts.

data {
   // ---- dimensions of things
   int<lower=0> N; 
   int<lower=1> G; // number of groups
   int<lower=1> L_group[G];

   // ---- data
   int<lower=0> trials[N];        // initial trials 
   int<lower=0> outcomes[N];        // initial successes 
   int<lower=1,upper=rep_array(L_group,N)> group[N,G];
}
transformed data {
  int begin[G];
  int B = 0;
  for (g in 1:G) {
    begin[g] = B;
    B = B + L_group[g];
  }
}
parameters {
  vector[B] beta_group;
}
model {
  // set your priors on beta_ ...

  for (n in 1:N) {
    vector[G] betas;
    for (g in 1:G)
      betas[g] = beta_group[begin[g] + group[n,g]];
    outcomes[n] ~ binomial_logit( trials[n],  sum(betas) ) ;
  }
}
6 Likes

Thanks! This did the trick, and I figured out how to set priors. This seems similar to handling ragged data.

This is great and I think this should be in the manual as it’s a topic that comes up often enough.

3 Likes

I give that a 👍. This solution works great and makes sense, but it’s not at all obvious. I was surprised nothing was in the manual. I’m also happy to help/learn more!

I also thought “maybe it is in the manual, but I just don’t know the word for it.” I had a similar experience with ragged arrays. Something something discoverability 🤔 … thankfully the forums are great!

1 Like