Hey
I will first briefly outline the type of data I am dealing with:
I have 6 (biological) samples.
3 samples got treatment A and the other 3 treatment B.
In each sample, we can measure the expression level of many proteins (1000+).
We assume that many proteins will not be influenced by the treatments and will therefore have similar expression levels in both treatments.
Other proteins will be influenced and can have drastically different expression levels between the two treatments.
We assume that the expression levels are log-normally distributed.
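To make the setup above concrete, here is a minimal Python/NumPy simulation of data with this structure. The share of affected proteins, the effect sizes, the baseline levels and the residual sd are all hypothetical values chosen for illustration, not properties of the real data.

```python
import numpy as np

# Hypothetical simulation settings (assumptions, not taken from the real data):
N_PROTEINS = 1000         # "1000+" proteins in the real data
N_PER_TREATMENT = 3       # 3 samples per treatment, 6 samples in total
FRACTION_AFFECTED = 0.05  # assumed share of proteins influenced by treatment
SIGMA = 0.3               # assumed residual sd on the log scale


def simulate(seed: int = 1):
    """Return log-expression matrices for treatments A and B plus the true effects.

    Each row is a protein, each column a biological sample.
    """
    rng = np.random.default_rng(seed)
    # Protein-specific baseline log-expression
    baseline = rng.normal(loc=5.0, scale=2.0, size=N_PROTEINS)
    # Most proteins have no treatment effect; a few have a (possibly large) one
    affected = rng.random(N_PROTEINS) < FRACTION_AFFECTED
    effect = np.where(affected, rng.normal(0.0, 2.0, size=N_PROTEINS), 0.0)
    # Log-normal expression = normal noise on the log scale
    noise_a = rng.normal(0.0, SIGMA, (N_PROTEINS, N_PER_TREATMENT))
    noise_b = rng.normal(0.0, SIGMA, (N_PROTEINS, N_PER_TREATMENT))
    log_a = baseline[:, None] + noise_a
    log_b = (baseline + effect)[:, None] + noise_b
    return log_a, log_b, effect


log_a, log_b, true_effect = simulate()
expression_a = np.exp(log_a)  # raw, log-normally distributed expression values
```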
(The whole setup is actually quite a bit more complex, but this minimal example demonstrates the problem I am struggling with.)
The scientific question we try to answer with our model is which proteins have different expression levels between the two treatments.
This is also referred to as differential expression analysis.
I use brms to model this data, and in my mind the simplest model is expressed by the following formula:
log(expression) ~ protein_id + treatment:protein_id
This allows me to compare the posteriors of treatment A vs. B for each protein separately.
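As a rough, non-Bayesian point of comparison, the per-protein A vs. B contrast can be sketched as an unpooled difference of mean log-expression for each protein. This is only an analogue of what the formula estimates: the brms model shares one residual sd across all proteins, while this sketch (with the hypothetical helper `per_protein_difference`) uses each protein's own sample variances.

```python
import numpy as np


def per_protein_difference(log_a: np.ndarray, log_b: np.ndarray):
    """Unpooled estimate of the B - A log-fold change for every protein.

    log_a, log_b: arrays of shape (n_proteins, n_samples_per_treatment)
    holding log-expression values. Each protein is analysed on its own,
    so no information is shared between proteins.
    """
    diff = log_b.mean(axis=1) - log_a.mean(axis=1)
    # Standard error of a difference of two independent means
    se = np.sqrt(log_a.var(axis=1, ddof=1) / log_a.shape[1]
                 + log_b.var(axis=1, ddof=1) / log_b.shape[1])
    return diff, se


# Two hypothetical proteins, three samples per treatment
log_a = np.array([[5.0, 5.1, 4.9], [2.0, 2.2, 1.8]])
log_b = np.array([[5.0, 5.2, 4.8], [4.0, 4.1, 3.9]])
diff, se = per_protein_difference(log_a, log_b)
```

With only three samples per treatment, these per-protein standard errors are very noisy, which is part of why sharing information across proteins is attractive.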
However, this model takes a long time to sample and has a lot of sampling issues (low ESS).
Therefore I tried the following hierarchical model instead:
log(expression) ~ (1 | protein_id) + (1 | treatment:protein_id)
After setting very narrow priors on the sd parameters, I get a model with almost no convergence issues, high ESS and Rhat values around 1 (although it still takes a couple of hours to sample).
However, I was wondering if I could have done this better.
One issue I can see is that most proteins do not show a difference in expression levels, so the shrinkage on the proteins that do differ might be too strong.
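The shrinkage concern can be illustrated with the textbook normal-normal model, a simplification of the hierarchical model above in which the group-level sd (tau) is treated as known. The function name and the numbers are hypothetical. Because the shrinkage factor tau^2 / (tau^2 + se^2) is the same for every protein with the same standard error, a small tau (which is what one would expect when most proteins are unaffected, or when the sd prior is very narrow) pulls a large true effect towards zero by the same proportion as a null one.

```python
def normal_shrinkage(estimate: float, se: float, tau: float) -> float:
    """Posterior mean of an effect under a normal-normal model.

    Assumes estimate ~ Normal(theta, se^2) and theta ~ Normal(0, tau^2),
    with tau treated as known. The estimate is multiplied by the factor
    tau^2 / (tau^2 + se^2), regardless of how large the effect is.
    """
    weight = tau**2 / (tau**2 + se**2)
    return weight * estimate


# Hypothetical numbers: a log-fold change of 2.0 measured with se = 0.2
small_tau = normal_shrinkage(estimate=2.0, se=0.2, tau=0.2)  # 1.0, halved
large_tau = normal_shrinkage(estimate=2.0, se=0.2, tau=1.0)  # about 1.92
```

In this simplified picture, a prior that lets most effects sit at zero while leaving a few large effects almost untouched would need heavier tails than a single normal distribution provides.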
Also, the treatment A and treatment B parameters within each protein are obviously correlated:
if there is no difference in expression between A and B, both parameters will be close to zero (and vice versa).
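The dependence between the two parameters can be seen with plain sample means. In a balanced design (three samples per treatment), the deviations of the A and B means from the protein mean are exact mirror images, each equal to half the A-B difference with opposite sign. The helper `treatment_deviations` and the simulated data are hypothetical and only mimic the role of the `treatment:protein_id` terms; the posterior in the actual model is not constrained this exactly.

```python
import numpy as np


def treatment_deviations(log_a: np.ndarray, log_b: np.ndarray):
    """Per-protein deviations of each treatment mean from the protein mean.

    Mimics, with plain sample means, the role of the treatment:protein_id
    terms in a model that also has a protein_id intercept. Assumes a
    balanced design (same number of samples per treatment).
    """
    mean_a = log_a.mean(axis=1)
    mean_b = log_b.mean(axis=1)
    protein_mean = (mean_a + mean_b) / 2
    return mean_a - protein_mean, mean_b - protein_mean


# Hypothetical data: 100 proteins, 3 samples per treatment
rng = np.random.default_rng(0)
log_a = rng.normal(5.0, 1.0, size=(100, 3))
log_b = rng.normal(5.0, 1.0, size=(100, 3))
dev_a, dev_b = treatment_deviations(log_a, log_b)
# dev_a equals -dev_b (up to rounding): both are near zero exactly when
# the A-B difference is near zero, and large in magnitude otherwise.
```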
If anybody has tips and/or pointers to material that deals with these types of analyses in a Bayesian setting (maybe for related data such as gene expression), I would be very happy. :)
Best regards and thanks in advance!