Dear stan community,
I’ve recently started out using stan and brms for modelling. The existing and established workflows for differential expression on RNA-seq apparently call for the use negative binomial distributions since such data is based on counts and cannot be continuous. However, I am working with proteomic data obtained with a mass spectrometer. Hence, the relative intensities of each protein are continuous. Since an identified protein always has a positive intensity y, the data can be log-transformed, such that log(y) ~ N (µ,σ2). If we consider a simple experiment, a treatment is imposed on one group of cells but not the other, such that we have a categorical variable drug that either is 0 or 1. We then extract the proteins from each group, run them on the mass spectrometer and obtain intensity values for all detected proteins.
In my naive approach, to assess the proteins that have been differentially expressed, I would then for each protein compare intensity values obtained for each protein as follows with brms:
fit_0 = brm_multiple( bf(y ~ 1), ...)
fit_1 = brm_multiple( bf(y ~ 0 + drug), ...)
loo_compare( loo (fit_0), loo(fit_1) )
However, this resembles to me as something that appears almost as a frequentist approach as I am doing this model evaluation for each individual protein. I am also not sure if this somehow makes me run into a multiple-testing conundrum (which I’ve understood Bayesian statistics not really causes).
Should I rather do this evaluation as a mixed-effects model with another variable p (protein ID)?
Such as
fit_0 = brm( bf(y ~ (1|protein)), ...)
fit_1 = brm( bf(y ~ 0 + drug+(1|protein)), ...)
With this later approach it is not as clear to me how I would observe the proteins that really are differentially expressed. Any comments and feedback are welcome and appreciated!