Hi,
I am still relatively new to Bayesian inference and am using it for the first time in a real project. I have some surprising findings, but I suspect I may have made a mistake somewhere in the process. I do not get supervision for this, and given my lack of experience I would highly appreciate it if you could take a look at what I did. Even if you do not work with brms, it would help me if you could check whether my conceptual understanding is (in)correct. If it helps, I can extract the Stan code from the model.
I use the following model (brms):
# specify formulas
f1 <- as.formula("abundance | mi() ~ treatment * time + age_s + mi(covariate) + (1 + time + age_s + mi(covariate) | subject_id)")
f2 <- as.formula("covariate | mi() ~ treatment * time + abundance + age_s + (1 + time + age_s + abundance | subject_id)")
formula <- bf(f1) + bf(f2) + set_rescor(FALSE)

# fit model
brm(
  formula = formula, data = data, family = skew_normal(),
  chains = 4, warmup = 1000,
  control = list(adapt_delta = 0.9999, max_treedepth = 15),
  prior = c(
    set_prior("normal(0, 2)", class = "b", resp = "abundance"),
    set_prior("exponential(25)", class = "sd", resp = "abundance"),
    set_prior("normal(0, 10)", class = "Intercept", resp = "abundance"),
    set_prior("lkj(2)", class = "cor"),
    set_prior("normal(0, 2)", class = "b", resp = "covariate"),
    set_prior("exponential(25)", class = "sd", resp = "covariate"),
    set_prior("normal(0, 10)", class = "Intercept", resp = "covariate")
  )
)
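For reference, before fitting I can also list the parameter classes the model defines and their default priors. A minimal sketch, assuming brms is loaded and `data` is the data frame used above (`get_prior()` is the standard brms helper for this):

# show which parameter classes exist for this multivariate formula,
# so I can check which ones my set_prior() calls actually cover
library(brms)
get_prior(formula, data = data, family = skew_normal())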
Outcome (abundance) = continuous measure of bacterial abundance. For simplicity, just assume it is any continuous outcome.
Treatment = "yes" or "no"
Time = "pre" or "post"
Age (age_s) = continuous, standardized
Covariate = continuous
Note that I used the exponential(25) prior because some models had divergent transitions or did not converge with wider priors. I use mi() because the covariate has 10 missing values and the model would otherwise drop those cases, which would mean less power for my comparisons of interest.
I fit this type of model for 130 different bacterial species. Then, for each of the 130 models, I hypothesize that the location parameter \mu of the skew-normal model differs in the post-treatment "yes" group compared to the other three subgroups. To check this, I did the following:
Given the way I set the contrasts, the posterior distribution of the intercept reflects the parameter \mu for the pre-no group. The intercept plus the treatment posterior is then the pre-yes group, the intercept plus the time posterior is the post-no group, and finally the intercept plus treatment plus time plus the treatment:time interaction posterior is the post-yes group.
So I added them accordingly and ended up with 4 posterior distributions that reflect the parameter \mu for each of my subgroups.
Then to answer my hypothesis, in R I calculated e.g.
mean((yespost - yespre) < 0)
as the posterior probability that the parameter \mu is smaller in the yes-post group than in the yes-pre group. Using this approach I also compared the group "yespost" to all other groups and thus end up with 130 probabilities for each of the three comparisons.
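Concretely, the computation looks like this. A sketch, assuming the fitted model is stored in `fit`; the draw column names below are what brms would produce for a multivariate model with my factor coding, but they are an assumption here and should be checked against `colnames(as_draws_df(fit))`:

# rebuild the subgroup \mu posteriors from the fixed-effect draws
# (sketch -- column names depend on the response name and factor coding)
library(brms)
draws <- as_draws_df(fit)

mu_pre_no   <- draws$b_abundance_Intercept
mu_pre_yes  <- mu_pre_no + draws$b_abundance_treatmentyes
mu_post_no  <- mu_pre_no + draws$b_abundance_timepost
mu_post_yes <- mu_pre_no + draws$b_abundance_treatmentyes +
               draws$b_abundance_timepost +
               draws$`b_abundance_treatmentyes:timepost`

diff <- mu_post_yes - mu_pre_yes
mean(diff < 0)                   # posterior probability of a negative difference
quantile(diff, c(0.025, 0.975))  # 95% credible interval for the difference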
Of course, before I went on, I checked the diagnostics. I used shinystan for visualization and checked the NUTS diagnostic parameters. Of the 130 models I could include 120; for the remaining ones, increasing the rate of the exponential prior to e.g. 30 might do the trick.
By subtracting the posterior distributions that reflect \mu for my subgroups, I can quantify the mean difference with a credible interval and also make a statement about the probability of the difference being greater than 0.
Next I tried to correct for multiple testing. This is where I struggle the most; if someone has a good resource where this is explained, that would be great. My idea was to sort the probabilities for each comparison and then keep the maximum number of outcomes such that the mean probability of those kept stays below e.g. 0.05. The latter would mean that, in expectation, 5% of the comparisons I keep are false discoveries. Is this a correct way to account for multiple comparisons?
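To make the rule I have in mind concrete, here is a sketch on made-up numbers: sort the per-comparison probabilities, then keep the largest set whose running mean stays below the threshold. The vector `p_null` and the helper name are purely illustrative, not part of my pipeline:

# cumulative-mean rule on illustrative numbers; 'p_null' stands in for my
# vector of posterior probabilities that the effect goes the "wrong" way,
# e.g. one value of mean((yespost - yespre) < 0) per species
bayes_fdr_keep <- function(p_null, alpha = 0.05) {
  ord <- order(p_null)                              # most convincing first
  run_mean <- cumsum(p_null[ord]) / seq_along(ord)  # running mean of kept probabilities
  k <- max(c(0L, which(run_mean <= alpha)))         # largest set still under the threshold
  sort(ord[seq_len(k)])                             # indices declared "discoveries"
}

set.seed(1)
p_null <- runif(130)              # made-up probabilities, for illustration only
keep <- bayes_fdr_keep(p_null)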
What is surprising to me is that I find many very small effects. Even though they might not be very meaningful, I would expect that if there is no effect at all, zero would be included in the credible interval most of the time. As a comparison, I looked at the same question using lme4, but a direct comparison is difficult because (1) I cannot use shrinkage for the age and covariate parameters, and (2) as far as I know, I would only do post-hoc testing if the interaction is significant, which is not nearly as often the case as the "significant" differences I found using the model above.
Any thoughts or feedback is welcome!