Hi all, I’m an applied researcher working on assessing the effect of a drug on the expression level of approximately 90 genes. The study used quantitative PCR to examine the number of copies of each gene transcript. What we’re interested in is whether the drug group has higher/lower expression levels of any of these specified genes at follow-up, relative to those who received placebo.

The typical way to perform a study like this (about 90% of what occurs in the literature), is to run independent univariable regression models for each gene - with the group variable included as a binary predictor. Then, to account for multiple tests, a p value correction is used to control the false discovery rate.

This strikes me as potentially inefficient. The expression level of many of these genes will be correlated, so the outcomes are not independent. Furthermore, the drug is not ‘clean’ pharmacologically - it has many targets. I would be surprised if it had a large effect on just one gene and no effect on others. For these reasons, I have been considering a Bayesian hierarchical model which allows the assessment of the effect of the drug on each gene to ‘borrow strength’ from the effect of the drug on other genes (i.e. ‘partial pooling’ of drug effect estimates). However, I’ve never modeled multiple correlated outcomes before using Stan or BRMS.

Is my intuition correct that modeling the hierarchical structure of this data might be more efficient? And is such a model structure easy enough to implement in Stan or BRMS? If anyone has any links or texts to recommend which investigate similar problems, I would greatly appreciate it. It always feels risky going against the grain in the literature, so I need to be able to back these methods with a robust understanding of their relative merits. Any help would be much appreciated. Thanks!