I have a dataset in which the unit of analysis is a mutation. The mutations belong to 5 different classes, and I have collected 9 features about each mutation. In other words, I have 12 columns:
- Column 1: mutation ID
- Column 2: mutation class
- Columns 3 to 11: the 9 features of each mutation (individual level)
- Column 12: drug resistant/susceptible (binary outcome)
In addition, I surveyed experts, asking them for the probability of resistance given each mutation class.
Now I want to model drug resistance as a function of these 9 features using a Bayesian hierarchical model. I want to take the mutation classes into account and use the experts' probabilities as prior information. I don't know whether a Bayesian hierarchical model is the right approach. If it is, how should I parameterize the model? I want to describe it in my methods section.
Hello, here are my thoughts, but I could be wrong. If there is a clear hierarchical structure in your dataset, for instance the individual-level records are nested (grouped, clustered) within the 5 mutation classes, then you may want to consider a 2-level hierarchical logistic regression model. Since you don't have group-level covariates (i.e., variables measured on the mutation classes), the real question is which structure to choose: a random-intercept-only model (the baseline log-odds of resistance varies across mutation classes) or a random-slope model (both the intercepts and the coefficients of some or all of the features vary across mutation classes). This decision is really up to you.
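For a methods section, the two options above could be written roughly as follows. This is a sketch in my own notation: j[i] is the mutation class of mutation i, x_{ik} is feature k of mutation i, the second argument of a univariate N is a standard deviation, and Σ is a 10×10 covariance matrix. Priors on μ_α, σ_α, the μ_β's and Σ still need to be chosen.

```latex
\begin{aligned}
y_i &\sim \operatorname{Bernoulli}(p_i), \qquad i = 1, \dots, n \\[4pt]
&\text{Random intercept:} \\
\operatorname{logit}(p_i) &= \alpha_{j[i]} + \sum_{k=1}^{9} \beta_k x_{ik} \\
\alpha_j &\sim \mathcal{N}(\mu_\alpha, \sigma_\alpha), \qquad j = 1, \dots, 5 \\[4pt]
&\text{Random slopes:} \\
\operatorname{logit}(p_i) &= \alpha_{j[i]} + \sum_{k=1}^{9} \beta_{k j[i]} x_{ik} \\
(\alpha_j, \beta_{1j}, \dots, \beta_{9j})^\top &\sim \mathcal{N}_{10}\big((\mu_\alpha, \mu_{\beta_1}, \dots, \mu_{\beta_9})^\top, \Sigma\big)
\end{aligned}
```

In brms syntax these correspond roughly to `resistant ~ x1 + x2 + ... + x9 + (1 | class)` and `resistant ~ x1 + x2 + ... + x9 + (1 + x1 + x2 + ... + x9 | class)` with `family = bernoulli()`; the column names are hypothetical. With only 5 classes a 10×10 covariance matrix is hard to estimate, so random slopes on just a few features of interest are likely more realistic.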
If you are going to fit this with the brms package, you might find this 1-hour video enlightening: Hierarchical Models with brms (GR5065 2019-04-11), on YouTube.
Also, I learned a lot from these tutorial papers on how to code basic hierarchical models in Stan and brms. See Sorensen et al. (2016) [LINK] for writing hierarchical regression models in Stan. You can also see Nalborczyk et al. (2019) [LINK] for coding these models in brms.
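To make explicit what the random-intercept model encodes, here is a minimal numpy sketch of its unnormalized log posterior. It is an illustration under assumptions, not what brms generates: the non-centered parameterization, the prior choices and all names (`mu_alpha`, `sigma_alpha`, `z`, `beta`, `cls`) are my own, and in practice you would let brms/Stan sample this density rather than code it by hand.

```python
import numpy as np


def log1pexp(t):
    """Numerically stable log(1 + exp(t))."""
    return np.logaddexp(0.0, t)


def log_posterior(mu_alpha, sigma_alpha, z, beta, X, y, cls):
    """Unnormalized log posterior of a random-intercept logistic regression.

    Model (priors are illustrative assumptions, not recommendations):
      alpha[j] = mu_alpha + sigma_alpha * z[j]      # non-centered class intercepts
      y[i] ~ Bernoulli(inv_logit(alpha[cls[i]] + X[i] @ beta))
      mu_alpha ~ Normal(0, 1.5), sigma_alpha ~ Half-Normal(0, 1),
      z[j] ~ Normal(0, 1), beta[k] ~ Normal(0, 1)
    Constant terms of the densities are dropped.
    """
    if sigma_alpha <= 0:
        return -np.inf  # a standard deviation must be positive
    alpha = mu_alpha + sigma_alpha * z  # one intercept per mutation class
    eta = alpha[cls] + X @ beta  # linear predictor on the log-odds scale
    # Bernoulli log-likelihood written in terms of eta: y*eta - log(1 + exp(eta))
    loglik = np.sum(y * eta - log1pexp(eta))
    logprior = (
        -0.5 * (mu_alpha / 1.5) ** 2
        - 0.5 * sigma_alpha**2
        - 0.5 * np.sum(z**2)
        - 0.5 * np.sum(beta**2)
    )
    return loglik + logprior


# Usage with simulated data shaped like the question: 5 classes, 9 features.
rng = np.random.default_rng(1)
n, p, J = 200, 9, 5
X = rng.normal(size=(n, p))
cls = rng.integers(0, J, size=n)  # mutation class index 0..4 for each row
y = rng.integers(0, 2, size=n)  # 1 = resistant, 0 = susceptible
lp = log_posterior(0.0, 1.0, np.zeros(J), np.zeros(p), X, y, cls)
print(lp)
```

The non-centered form (sampling z and scaling it by sigma_alpha) is a common choice when there are few groups, as here, because it tends to sample more easily than drawing the alpha's directly.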
Thank you so much. There is one more issue you may be able to help with: how can I incorporate the expert survey information? I asked the experts what they think the probability of resistance is for each mutation class. In other words, they gave probabilities at the group level.
Hello, no worries. If the expert survey responses are measured at the group level (i.e., per mutation class), then you can add that information as a continuous group-level covariate. I assume, though, that these responses vary across the groups; a covariate that is the same for every class carries no information. Either way, the individual-level data form level 1 and the expert survey information forms level 2 of the model.
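One way to prepare the expert answers is to put them on the log-odds scale, the same scale as the class intercepts. Below is a minimal sketch, assuming the answers sit in an array with one row per expert and one column per mutation class; the function names, the clipping constant `eps` and the numbers are all hypothetical.

```python
import numpy as np


def logit(p):
    """Log-odds of a probability."""
    return np.log(p) - np.log1p(-p)


def expert_summary(probs, eps=1e-3):
    """Summarize expert probabilities per mutation class on the log-odds scale.

    probs: array of shape (n_experts, n_classes), each expert's probability of
           resistance for each class (hypothetical layout).
    Returns (center, spread): per-class mean and standard deviation of the
    experts' log-odds. Probabilities are clipped to [eps, 1 - eps] so that
    answers of exactly 0 or 1 do not give infinite log-odds.
    """
    p = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    lo = logit(p)  # experts' answers on the log-odds scale
    center = lo.mean(axis=0)  # one value per mutation class
    if lo.shape[0] > 1:
        spread = lo.std(axis=0, ddof=1)  # disagreement between experts
    else:
        spread = np.full(lo.shape[1], np.nan)  # undefined with a single expert
    return center, spread


# Made-up answers from 3 experts for 5 mutation classes (rows = experts).
probs = np.array([
    [0.50, 0.20, 0.90, 0.50, 0.70],
    [0.50, 0.30, 0.80, 0.60, 0.70],
    [0.50, 0.25, 0.85, 0.40, 0.70],
])
center, spread = expert_summary(probs)
print(center)
print(spread)
```

`center` could serve as the class-level covariate w_j suggested above (in brms, a column that is constant within each class, added like any other predictor alongside `(1 | class)`). Closer to the original question, `center` and `spread` could instead define an informative prior on each class intercept, for example α_j ~ N(center_j, spread_j), provided the features are centered so that α_j is the log-odds of resistance for an average mutation in class j. Treating expert disagreement as prior uncertainty is only one rough choice, and when experts agree exactly (spread 0, as for the first class here) a minimum prior SD is needed.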
A small warning, though: what happens when you include group-level variables depends on the scenario. 1) Random intercept only: the fixed part of the model contains a coefficient for each individual-level and group-level variable, plus a single random effect and the error term [simple]. 2) Random slopes on one or more individual-level variables: the fixed part still contains the direct coefficients for each individual- and group-level variable, but it also contains cross-level interactions between the individual- and group-level variables; another part contains the interactions between the random effects and the individual-level covariates; and finally there is the random intercept and the error term [complex]. (In a logistic model the level-1 "error" is the Bernoulli variation itself rather than a separately estimated term.)
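To see where the cross-level interaction comes from, here is the substitution for a single individual-level feature x_{ij} and a single group-level covariate w_j (for example, the experts' log-odds for class j). The notation is my own: i indexes mutations, j mutation classes, and u_{0j}, u_{1j} are the class-level random effects.

```latex
\begin{aligned}
\text{Level 1:}\quad \operatorname{logit}(p_{ij}) &= \beta_{0j} + \beta_{1j} x_{ij} \\
\text{Level 2:}\quad \beta_{0j} &= \gamma_{00} + \gamma_{01} w_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} w_j + u_{1j} \\
\text{Combined:}\quad \operatorname{logit}(p_{ij}) &=
  \underbrace{\gamma_{00} + \gamma_{01} w_j + \gamma_{10} x_{ij} + \gamma_{11} w_j x_{ij}}_{\text{fixed part}}
  + \underbrace{u_{0j} + u_{1j} x_{ij}}_{\text{random part}}
\end{aligned}
```

Setting β_{1j} = γ_{10} (no random slope and no cross-level term) recovers the simple random-intercept case, logit(p_{ij}) = γ_{00} + γ_{01} w_j + γ_{10} x_{ij} + u_{0j}.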
I'm sharing a lecture slide that I use to show how messy things get once group-level variables and random slopes are included (in a more general form).
Don't worry about the maths, though: specify the individual- and group-level parts in brms and R does the rest!
Thank you so much, this is helpful. You came at the right time.