Very large credible intervals for odds ratio for logistic regression in brms

Hi,
I am new to the bayesian framework and currently trying to fit a logistic regression in the brms package. The data consists of a binary response variable bin (whether a plant produced any pods 0 or 1). The population level effect Pol (2 level factor: exposed to bumblebees (Po+) (n=41) or control (Po-)(n=41)). Finally, group level effects are Colony1 (56 levels: either assigned to a colony, or an observation random effect for the control as they were not pollinated by any bee colonies) nested in Block (3 blocks representing 3 time periods the experiment was carried out). I want to find out whether pollination increased the chance of a plant producing pods. Yd2.csv (1.2 KB)
The model is specified as:

bin_prop1 <- brm (bin ~1+Pol+ (1|Block/Colony1), 
                 prior = c(set_prior ("normal(0,100)",class="b"),
                 set_prior ("normal(0,100)",class="Intercept")),
                 family = "bernoulli",chains = 4,iter = 4000,control = list(adapt_delta =0.99 
                 ,max_treedepth=15),cores=3,
                 warmup = 1000,data = Yd2)
> summary(bin_prop1)
 Family: bernoulli 
  Links: mu = logit 
Formula: bin ~ 1 + Pol + (1 | Block/Colony1) 
   Data: Yd2 (Number of observations: 82) 
Samples: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
         total post-warmup samples = 12000

Group-Level Effects: 
~Block (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     3.66      4.00     0.38    13.45       1044 1.00

~Block:Colony1 (Number of levels: 56) 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     2.23      1.80     0.12     6.88       1322 1.00

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept    -2.30      3.18    -9.40     3.31       1686 1.00
PolPoP        4.31      2.41     1.72    10.70       1443 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).

When i use the emmeans package to look at contrasts between each level of Pol i get incredibly large odds ratio and credible intervals on the response scale:

U_bin  <- emmeans (bin_prop1,pairwise~Pol,type="response")
U_bin$contrasts
$contrasts
 contrast  odds.ratio lower.HPD upper.HPD
 Po- / Po+      0.025  3.19e-12     0.144

For comparison i carried out the same analysis using glmer:

bin_glm <- lme4::glmer (bin ~1+Pol+ (1|Block/Colony1), data = Yd2,
                        family = "binomial") 

and when i carry out the same contrast the confidence intervals are narrower:

contrast  odds.ratio     SE  df asymp.LCL asymp.UCL
 Po- / Po+     0.0908 0.0658 Inf     0.022     0.375

Confidence level used: 0.95 
Intervals are back-transformed from the log odds ratio scale 

I understand that credible intervals and confidence intervals are different but i am unsure why the credible intervals are so large for the odds ratio. I partially suspect it is to do with the fact that 70% of the Po- response consists of 0 whilst only 20% for the Po+ which could lead to issues with sparse data. I have read shrinkage priors can be a way to deal with this but i am new to Bayesian framework so i am unsure whether this is appropriate. Data is attached.

Advice on what could be driving this effect would be welcome? Thank you.

2 Likes

You need tighter priors. Either use the default by brms, or tighten them up more.

I don’t think I’m allowed to post the image, but I would recommend picking up statistical rethinking by @richard_mcelreath. Page 330 of the book has a nice explanation as to what happens when we put really diffuse priors in a logistic regression. They tend to put almost all of the probability on the extreme ends (0 or 1). I think that’s what’s happening here.

Thanks, i’ve had a look at the book and as you have suggested i’ve tried narrower priors. For example, when i run:

bin_prop1 <- brm(bin ~1+Pol+ (1|Block/Colony1), data = Yd2,
                 prior = c(set_prior ("normal(0,1)",class="b"),
                           set_prior ("normal(0,10)",class="Intercept")),
                 family = "bernoulli",chains = 4,iter = 4000,control = list(adapt_delta =0.99 
                 ,max_treedepth=15),cores=3,
                 warmup = 1000) 

emmeans (bin_prop1,pairwise~Pol,type="response")
$contrasts
 contrast  odds.ratio lower.HPD upper.HPD
 Po- / Po+      0.139    0.0285     0.348

Results are back-transformed from the log odds ratio scale 
HPD interval probability: 0.95 

I get far smaller credible intervals. I’m thinking the most robust way to present this will be to demonstrate how sensitive the estimates are to varying priors. Because evening using a normal (0,2.5) on b leads to reasonably large credible intervals.

$contrasts
 contrast  odds.ratio lower.HPD upper.HPD
 Po- / Po+      0.056  0.000141     0.191

Results are back-transformed from the log odds ratio scale 
HPD interval probability: 0.95 ```

normal(0, 2.5) is still a very wide prior in a logistic regression for most applications. An effect of 2.5 means that the log odds of pods =1 increase with 2.5. Or the odds increase with a factor e^{2.5} \approx 12. This is roughly equivalent to observing pods in 7.5% of plants in the control group and observing pods in 50% of the plants in the treatment.

1 Like