Cross-classified logit model with multi-level variable model specification with brms

I’m pretty new to brms. I’m attempting to write a model with crossed random effects, where one of those random effects is also nested within another hierarchical category. I think I’m writing the basic underlying cross-classified model correctly, but I’m not entirely sure how to add the multi-level component to the brms model. So I’d greatly appreciate any suggestions for specifying the more complex model.

Some background about the data structure:
The dataset consists of download actions (did or did not download) for a large number of subjects and preprints. Preprints are categorized by whether they have a COI or not (non-manipulated fixed factor, coi_cond). Subjects are randomly assigned to 1 of 2 visibility conditions, which alters whether they see a preprint’s COI information (vis_cond variable). Subjects can have download action data for >= 1 preprint. Across the preprints they have download data for, subjects stay in the same vis_cond, but may naturally encounter preprints in different coi_cond categories. Preprints stay in the same coi_cond regardless of which subjects have download data for them, but by chance may end up having data in both vis_conds because of the subjects who view them. I’m interested in estimating the interaction between the coi_cond and vis_cond variables and their simple effects.

I think I have the code for this ‘basic’ model with:

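single_level_model <- brm(
  # coi_cond * vis_cond expands to both main effects plus the interaction;
  # (x | g) already includes a random intercept, so the separate (1 | g) terms are dropped
  download ~ coi_cond * vis_cond + (coi_cond | subj_id) + (vis_cond | pp_id),
  data = dat,  # placeholder name; the original call gave no data argument
  family = bernoulli(),
  seed = 5)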

The part that I’m having trouble with is how to add a level of nesting when preprints can belong to only 1 service but subjects can potentially have multiple membership in services. Each preprint belongs to exactly one of ~20 different preprint services, but subjects may have download data associated with multiple different services. There is reason to believe that the effects of coi_cond and vis_cond may not be the same in each service, so I’d like to add this service level, get an estimate of the degree to which the vis_cond*coi_cond interaction and its simple effects vary across services, as well as estimates of these effects within each service. Does anyone have suggestions on how to incorporate the service information into the model?

Sorry, I don’t have time to respond now, but maybe @Guido_Biele is not busy and can help?

This sounds like a complex design/problem.
I understand that the predictors of interest are coi_cond and vis_cond, and that their effects might vary between preprint services.

However, the model formula does not seem to contain a term for preprint service (unless pp_id is the preprint service, but I assume it is the preprint).

Given that I am not sure about what pp_id is, it is hard to say anything specific.

My general approach would be to first model the contrast of interest and some additional basic structure, e.g.

download ~ coi_cond:vis_cond + 
           (coi_cond:vis_cond | pp_server) + 
           (1 | subj_id) + (1 | pp_id)

and then see how I can model additional effects.
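
A side note on the fixed-effects part of that formula: if coi_cond and vis_cond are both stored as two-level factors (an assumption; the thread does not say how they are coded), R’s formula rules expand coi_cond:vis_cond without main effects into one indicator column per cell, on top of the intercept. The base-R sketch below, using a made-up design grid with hypothetical level labels, compares this with coi_cond * vis_cond.

```r
# Toy design grid; the factor level labels are hypothetical
d <- expand.grid(
  coi_cond = factor(c("no_coi", "coi")),
  vis_cond = factor(c("hidden", "visible"))
)

X_int  <- model.matrix(~ coi_cond:vis_cond, data = d)    # interaction term only
X_full <- model.matrix(~ coi_cond * vis_cond, data = d)  # main effects + interaction

# Number of columns versus number of linearly independent columns
dim_int  <- c(ncol = ncol(X_int),  rank = qr(X_int)$rank)
dim_full <- c(ncol = ncol(X_full), rank = qr(X_full)$rank)
print(dim_int)   # 5 columns, rank 4
print(dim_full)  # 4 columns, rank 4
```

In the interaction-only version the intercept is redundant with the four cell indicators, so in a Bayesian fit those coefficients would be separated only by their priors. Writing coi_cond * vis_cond, or 0 + coi_cond:vis_cond for a cell-means coding, should avoid that redundancy.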

One design feature you seem to be concerned with is that the same subject can be in different visibility conditions for the same paper if the person was on different servers. Here I wonder if the solution lies not so much in the model formula as in how one structures the data: my hunch is that it is sufficient to have the same paper in multiple rows of your (long-form) data.
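
To make the long-format idea concrete, here is a small base-R sketch with invented values (the column names follow the thread; the rows are hypothetical). Each row is one subject-by-preprint download record, so a preprint or subject simply appears in as many rows as it has records, and the design constraints can be checked directly.

```r
# Hypothetical long-format data: one row per subject x preprint record
dat <- data.frame(
  subj_id   = c("s1", "s1", "s2", "s2", "s3"),
  pp_id     = c("p1", "p2", "p1", "p3", "p3"),
  pp_server = c("A",  "A",  "A",  "B",  "B"),
  coi_cond  = c("coi", "no_coi", "coi", "no_coi", "no_coi"),
  vis_cond  = c("visible", "visible", "hidden", "hidden", "visible"),
  download  = c(1, 0, 0, 1, 1)
)

# Count distinct values of x within each level of g
n_distinct_by <- function(x, g) tapply(x, g, function(v) length(unique(v)))

# Each subject should sit in exactly one vis_cond
subj_ok <- all(n_distinct_by(dat$vis_cond, dat$subj_id) == 1)
# Each preprint should have one coi_cond and belong to one server
pp_ok <- all(n_distinct_by(dat$coi_cond, dat$pp_id) == 1) &&
  all(n_distinct_by(dat$pp_server, dat$pp_id) == 1)

print(c(subj_ok = subj_ok, pp_ok = pp_ok))
```

If either check fails on the real data, that would point to an ID or coding problem to sort out before fitting.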

@Guido_Biele Thanks so much for the reply! To clarify the design a bit, a subject is always in the same vis_cond [within and across services], and a given preprint is always in the same coi_cond. But a given preprint may be viewed by subjects in both vis_conds, and a subject may encounter preprints that are in different coi_conds. A preprint is only in one service, but subjects can look at preprints in any service (though many people may only show up in 1 service).

The model I wrote in the initial post was the model I would write if there wasn’t a pp_server level (so treating subjects [subj_id] and preprints [pp_id] as random, allowing random intercepts and slopes [where the condition is crossed with the random factor]). The part I was having trouble with was how to add pp_server to the model, because preprint is nested within service, but subj_id is potentially crossed with service. It didn’t seem like adding (coi_cond:vis_cond | pp_server/pp_id) on top of the other random terms would work (see code below), because the interaction slope can only be random at the pp_server level.

download ~ coi_cond:vis_cond + 
                   (coi_cond | subj_id) + (vis_cond | pp_id) +
                   (coi_cond:vis_cond | pp_server/pp_id)
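
For comparison, one possible way to write the structure described above is sketched here. It is a sketch under assumptions, not a tested model: it assumes pp_id values are unique across services (so the nesting is carried by the IDs themselves), places each random slope where that condition varies within the grouping factor, and lets the full coi_cond * vis_cond set vary only by pp_server. The data frame name dat in the commented call is hypothetical.

```r
# Formula only; building it needs no packages
f_service <- download ~ coi_cond * vis_cond +
  (1 + coi_cond | subj_id) +             # coi_cond varies within subject
  (1 + vis_cond | pp_id) +               # vis_cond varies within preprint
  (1 + coi_cond * vis_cond | pp_server)  # both conditions vary within service

print(all.vars(f_service))

# Fitting would then look like this (requires brms and a data frame `dat`):
# fit <- brms::brm(f_service, data = dat, family = brms::bernoulli(), seed = 5)
```

After fitting, the group-level standard deviations for pp_server in the model summary would describe how much the effects vary across services, and coef(fit)$pp_server would give the service-specific estimates.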

Does brms automatically detect nested/crossed data if you set up the IDs correctly in the dataset, like lme4 does? In lme4, (1 | subj_id) + (1 | pp_id) + (1 | pp_server) and (1 | subj_id) + (1 | pp_server/pp_id) are equivalent if you set up the input data file correctly, but those generated different Stan code for me in brms. There’s also going to be a lot of ‘missing’ data (e.g. some services may not have any preprints in a given coi_cond, or some preprints may not end up being seen by subjects in both vis_conds), so I’m going to need to take advantage of partial pooling to help the model converge overall.
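
On the nesting question, the brms documentation describes (1 | g1/g2) as shorthand that is expanded to (1 | g1) + (1 | g1:g2), which would explain the different Stan code: the grouping factor becomes pp_server:pp_id rather than pp_id. Whether the two define the same groups depends on how the IDs are coded, which can be checked in base R. The toy IDs below are hypothetical.

```r
# Preprint IDs unique across servers (hypothetical toy data)
unique_ids <- data.frame(
  pp_server = c("A", "A", "B", "B"),
  pp_id     = c("p1", "p2", "p3", "p4")
)
# Preprint IDs restarted within each server
reused_ids <- data.frame(
  pp_server = c("A", "A", "B", "B"),
  pp_id     = c("p1", "p2", "p1", "p2")
)

# Number of groups implied by (1 | pp_id) versus (1 | pp_server:pp_id)
count_groups <- function(d) {
  c(pp_id = length(unique(d$pp_id)),
    server_by_pp = nrow(unique(d[c("pp_server", "pp_id")])))
}

groups_unique <- count_groups(unique_ids)  # same count either way
groups_reused <- count_groups(reused_ids)  # pp_id alone merges distinct preprints
print(groups_unique)
print(groups_reused)
```

With unique IDs both specifications define the same preprint groups, so they should describe the same model even though the generated code labels the grouping factor differently.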