Cross-classified multiple membership models with brms

#1

Hi all,

I’d like to model a cross-classified multiple membership model with brms. I have data from students at the beginning and the end of the academic year and want to assess the influence of teaching practices on these variables (either controlling for baseline variables at T1 or creating difference scores T3 - T1).

My data structure is the following: I have students that are nested in classrooms and teachers. Some pupils change classrooms over the course of the year (hence, they are multiple members of classrooms and teachers) and some classrooms change teachers over the course of the year or are taught by multiple teachers simultaneously across the year.

I’m at the very beginning of figuring out how such models are specified (starting with the data set up) and I think I now more or less understand what I would have to do if I would have only the classroom or teacher level. That is, assigning IDs for the first and second classroom as well as weights for these classrooms. But I am unsure what to do because I have two grouping factors. Do I also have to create teacher IDs 1 to k and their weights 1 to k? So basically having column headings like this: Tid1st Tid2nd Tid3rd Tid4th WT1st WT2nd WT3rd WT4th Cid1st Cid2nd Cid3rd Cid4th WC1st WC2nd WC3rd WC4th?

Many thanks,
Pia

#2

Hi Pia,

welcome to Stan discourse! Please use the “Interfaces - brms” tag for brms questions otherwise they will likely be overlooked (I have changed it manually this time).

I would suggest you go for a model with the following structure

y ~ ... + (... | mm(<classes>, weights = ...) + (... | mm(<teachers>, weights = ...)

to account for both class and teacher effects via multimemembership terms.

1 Like
#3

Hi Paul,

Thanks so much for letting me know about this. Since I’m new here, I still have to learn how things work (and I will probably have to ask quite a lot of questions since I am also new to brms and bayesian analysis more generally).

Thanks for your answer. So would it look like (… | mm (Tid1, Tid2, Tid3, Tid4, weights = WTid1, WTid2, WTid3, WTid4) and (… | mm (Cid1, Cid2, Cid3, Cid4, weights = WCid1, WCid2, WCid3, WCid4) or do I have to use weights = cbind(WCid1, WCid2, WCid3, WCid4) instead? And the weights are just columns in my data set? Do I have to specify somehow that class and teacher are cross-classified or is this recognised automatically?

I have a few (probably rather basic) questions if that’s OK.

  1. I have individual level variables measured at two time points and want to understand whether class-and teacher-level variables influence these individual variables. That is, I want to predict Time2 variables controlling for Time1 variables. Do I have to treat time as a level as well? Do I then need a long data format?
  2. And another question for the data set up. When I have teaching practices on the classroom level (because different classes can rate practices of the same teacher differently), do I already do the data manipulation myself so that every student has only a single score (so I already do all the weighting taking different teachers into account before loading the data) or does brms do that for me? Because I struggle to see how I have to set up the data file in a way that it is clear what practice score belongs to what teacher and some teachers can have multiple scores depending on the class that rated them.

Many thanks,

Pia

#4

This is not proper R syntax. You need to use cbind.

No need to specifiy that your modle is cross-classified. That is implied by the data structure already.

You can include other variables, such as time in multimembership models as you usually do for “standard” multilevel models. Whether the data set needs to be “long” with respect to time depends on the particular model you want to fit that depends on time.

For each row, you need to have one single response. If one row in your data set is a single student, you need to weight the scores somehow before including them into the model in terms of a single value.

If you haven’t done so, reading the multimembership example in one of my brms paper (https://journal.r-project.org/archive/2018/RJ-2018-017/index.html) may also help.

1 Like
#5

Thank you for the clarification regarding the syntax!

I did read that paper before, which was very helpful but my (general) issue is that most papers include information on the model specification but not the data set-up. Maybe it’s too basic but as a relative newbie, it’s something I struggle with.

In terms of the time variable, where can I find information regarding what models need what set-up? Or is this a matter of trial and error? I have only two timepoints and want to use Time 1 to control for initial values of the dependent variables, so that would be a lagged-dependent model.

I’m not sure if I understand what you mean by having only one response in each row because each row indeed represents one student, so a row would contain many responses (one for each variable). So if I do the weighting before, all the multiple-membership model would do is to make sure that non-independence is being taken into account?

Thanks

#6

If you want to control your variable at time1 (let’s call it y1) in the response at time 2 (y2), then you would do that via something like

brm(y2 ~ y1 + ....)

I’m not sure if I understand what you mean by having only one response in each row because each row indeed represents one student, so a row would contain many responses (one for each variable). So if I do the weighting before, all the multiple-membership model would do is to make sure that non-independence is being taken into account?

Yes, at least this is my understanding of your data.

#7

Thank you!