Hierarchically structured data with sublevels - brms

I have some hierarchically structured data where I’d like to estimate the effect of a categorical variable, “Drug_Class” , where some (but not all) levels of “Drug_Class” have multiple labelled sub-levels.

In the sometimes-frowned-upon nomenclature of fixed/random effects, I’d like to treat “Drug_class” as a fixed effect, and the labelled members of those classes that have them as random effects. To make things concrete, “Drug_Class” has levels for “alcohol” , “tobacco” , “cannabis” , “stimulants” (mix of crack cocaine, powder cocaine, Rx stimulants, other stimulants), “opioids” (heroin and Rx opioids), and “tranquilisers” (Rx tranq and seroquel). So the latter three each have sub-members with labels and I want to acommodate any potential systematic differences between those labelled sub-levels. This is indexed in my data under an 11-level categorical variable named “Index_Motives”.

I should also note that I have uneven outcome data across these levels, as outcome data (a 100mm VAS scale of motives for using particular substances) was only collected when participants reported using each of those drugs in the past 30 days. Some drug categories have low endorsement, but belong to a similar theoretical class of drugs, which is why I want to combine across some specific drug classes.

Any ideas?

My current model specification is:

Outcome ~ 1 + Drug_Class + ( 1 | Subject_ID)

I am new to brms, but greatly appreciate the opportunities it can offer. Many thanks for any insights into this question.

1 Like

@paul.buerkner Mind taking a look at this? This is a scenario where I’d know how to code things by hand in Stan, but don’t know the proper brms syntax.

You could model the specific drug sub-class another random effect, that is add (1 I Index_Motives). The problem is that those sub-classes are unlikey to be a-prioro exchangable, so a random effect might not be a justied assumption.

Perhaps replacing Drug_Class by Index_Motives would be more reasonable. If you want to obtain estimates for the general drug classes, you could still compute the mean (per draw and then summarized) of the respective sub-classes after model fitting.

Hm, if you’re saying he could do:

Outcome ~ 1 + Drug_Class + ( 1 | Index_Motives) + ( 1 | Subject_ID)

doesn’t that cause an identifiability issue for Drug_class levels with only one member? For example, the model would simultaneously be attempting to estimate both a fixed effect of alcohol (or, the alcohol condition’s deviation from the other drug classes) as well as a random effect of alcohol (or, the alcohol condition’s deviation from the mean of all the drugs).

Ha, @R-broadcast came up with this himself while consulting me about this before posting and I waved him off the idea thinking it’d be better to have things in the model formally, but I stand corrected!

Due to the hierarchical shrinkage, Drug_Class + ( 1 | Index_Motives) should be soft-identified (by the prior), but as I sad above, I am not sure if this is a good choice because the exchangability assumption between index-motives is likely violated so that shrinkage over all of these terms may be problematic.

Yes, I agree. I do think there might be decent support for exchangeability within the drug-class subgroups (though an assumption to keep an eye on for sure). How would one express a model in brms’ syntax to capture the existence of such subgroups/nesting?

The nesting is automatically modeled by having both Drug_class and Index_Motives in the model.

Thank you for the suggestions @paul.buerkner and @mike-lawrence. If I am understanding this correctly, having both Drug_Class and Index_Motives in the same model would not be a good idea. Instead, I should summarize across the relevant parameters from the posterior.

I typically use the conditional_effects sytanx to get marginal plots of Drug_Class. Assuming I use Index_Motives as my random factor, can I set up conditional_effects so that it gives me the relevant summaries that I would typically get from Drug_Class? Could I use “make_conditions” for this, or do I need to wrangle and summarize across the relevant parameters outside of the conditional_effects helper?

Thanks for any advice. Brms is a great package, and I want to make sure I get these analyses right.