Cross-classified multiple membership models with brms

Hi all,

I’d like to model a cross-classified multiple membership model with brms. I have data from students at the beginning and the end of the academic year and want to assess the influence of teaching practices on these variables (either controlling for baseline variables at T1 or creating difference scores T3 - T1).

My data structure is the following: I have students that are nested in classrooms and teachers. Some pupils change classrooms over the course of the year (hence, they are multiple members of classrooms and teachers) and some classrooms change teachers over the course of the year or are taught by multiple teachers simultaneously across the year.

I’m at the very beginning of figuring out how such models are specified (starting with the data set up) and I think I now more or less understand what I would have to do if I would have only the classroom or teacher level. That is, assigning IDs for the first and second classroom as well as weights for these classrooms. But I am unsure what to do because I have two grouping factors. Do I also have to create teacher IDs 1 to k and their weights 1 to k? So basically having column headings like this: Tid1st Tid2nd Tid3rd Tid4th WT1st WT2nd WT3rd WT4th Cid1st Cid2nd Cid3rd Cid4th WC1st WC2nd WC3rd WC4th?

Many thanks,

Hi Pia,

welcome to Stan discourse! Please use the “Interfaces - brms” tag for brms questions otherwise they will likely be overlooked (I have changed it manually this time).

I would suggest you go for a model with the following structure

y ~ ... + (... | mm(<classes>, weights = ...) + (... | mm(<teachers>, weights = ...)

to account for both class and teacher effects via multimemembership terms.

1 Like

Hi Paul,

Thanks so much for letting me know about this. Since I’m new here, I still have to learn how things work (and I will probably have to ask quite a lot of questions since I am also new to brms and bayesian analysis more generally).

Thanks for your answer. So would it look like (… | mm (Tid1, Tid2, Tid3, Tid4, weights = WTid1, WTid2, WTid3, WTid4) and (… | mm (Cid1, Cid2, Cid3, Cid4, weights = WCid1, WCid2, WCid3, WCid4) or do I have to use weights = cbind(WCid1, WCid2, WCid3, WCid4) instead? And the weights are just columns in my data set? Do I have to specify somehow that class and teacher are cross-classified or is this recognised automatically?

I have a few (probably rather basic) questions if that’s OK.

  1. I have individual level variables measured at two time points and want to understand whether class-and teacher-level variables influence these individual variables. That is, I want to predict Time2 variables controlling for Time1 variables. Do I have to treat time as a level as well? Do I then need a long data format?
  2. And another question for the data set up. When I have teaching practices on the classroom level (because different classes can rate practices of the same teacher differently), do I already do the data manipulation myself so that every student has only a single score (so I already do all the weighting taking different teachers into account before loading the data) or does brms do that for me? Because I struggle to see how I have to set up the data file in a way that it is clear what practice score belongs to what teacher and some teachers can have multiple scores depending on the class that rated them.

Many thanks,


This is not proper R syntax. You need to use cbind.

No need to specifiy that your modle is cross-classified. That is implied by the data structure already.

You can include other variables, such as time in multimembership models as you usually do for “standard” multilevel models. Whether the data set needs to be “long” with respect to time depends on the particular model you want to fit that depends on time.

For each row, you need to have one single response. If one row in your data set is a single student, you need to weight the scores somehow before including them into the model in terms of a single value.

If you haven’t done so, reading the multimembership example in one of my brms paper (Advanced Bayesian Multile... The R Journal) may also help.

1 Like

Thank you for the clarification regarding the syntax!

I did read that paper before, which was very helpful but my (general) issue is that most papers include information on the model specification but not the data set-up. Maybe it’s too basic but as a relative newbie, it’s something I struggle with.

In terms of the time variable, where can I find information regarding what models need what set-up? Or is this a matter of trial and error? I have only two timepoints and want to use Time 1 to control for initial values of the dependent variables, so that would be a lagged-dependent model.

I’m not sure if I understand what you mean by having only one response in each row because each row indeed represents one student, so a row would contain many responses (one for each variable). So if I do the weighting before, all the multiple-membership model would do is to make sure that non-independence is being taken into account?


If you want to control your variable at time1 (let’s call it y1) in the response at time 2 (y2), then you would do that via something like

brm(y2 ~ y1 + ....)

I’m not sure if I understand what you mean by having only one response in each row because each row indeed represents one student, so a row would contain many responses (one for each variable). So if I do the weighting before, all the multiple-membership model would do is to make sure that non-independence is being taken into account?

Yes, at least this is my understanding of your data.

Thank you!


A very late follow up question based on a recent post here. LucC wrote

So does this mean that when I want to control for the initial score at T1 when predicting the same variable at T2, I should include a random intercept for the individuals in addition to the class and teacher groupings because the estimated standard errors would be wrong? And this is only for a random intercept, not random slopes?


Controlling for outcome T1 when predicting outcome at T2 by means of including T1 as predictor in the model is flawed, as it assumes that T1 is observed without measurement error, which (depending on the level of measurement error) will lead to regression towards the mean.

The most correct way - that I know of - is to model both T1 and T2 as the response variable, with a predictor T1 vs. T2, grouping variable(s), and interaction between T1/T2 and the grouping variable if you want to quantify potentially different responses over time between groups. Plus a random intercept for individuals. An alternative way would be to use T1 as a predictor, but explicitly modelling the measurement error/variation in T1 (not sure this is possible in brms, but for sure possible with custom Stan code). However, you are then decoupling the estimation of measurement error in T1 from the estimation of measurement error in T2, which is a waste of statistical power. So, the first approach described in this paragraph is parsimonious, in my experience.

See also my follow-up post to the one you are linking to: Different Intercept Terms in Frequentist and Bayesian Regression - #6 by LucC

Not exactly. Model T1 as an outcome, jsut as T2, and include a random intercept. For it to work, you need to organise the data in long format:

ID class teacher time response
1 A Johnson T1 4
1 A Johnson T3 5
2 A Johnson T1 2
2 A Johnson T3 4

response ~ time + (1 | ID) + (1 | class) + (1 | teacher)

But that will only tell you the average change in response over time (with correct standard error for the average change). If you want to know e.g. whether the changes are dependent on the teacher, you would model:

response ~ time * teacher + (1 | ID) + (1 | class)

Note that all of these models use the teacher and class information very simplistically, e.g. as “observed” at the moment of the response. But if a person changed class or teacher one day before a response, the model will not distinguish that change from one that happened earlier during the year. For that to work, if possible, you could quantify the amount of e.g. cumulative teacher-time experienced by an individual up to the point of the response.

You can add random slopes on top of all if this, if you like. It might get quite messy though if you are also modelling interactions. I would then probably recode the interactions into explicit dummy variables, and build custom code to add a random slope effect to all of those interactions at ones (i.e. as an uncentered, scaled random effect).

Thanks for your very detailed reply. It shows me that I still have a lot of reading to do!