Credible intervals changing depending on the method used to find them

Hello. I am trying to find out whether there is an interaction between time (called delay in the model) and the number of repetitions with which participants were exposed to the material they had to memorise. Three different methods are giving me different results, and I want to know whether I did something wrong or which one I should trust.

This is my model:

library(brms)

modexp3 <- brm(
  stringent ~ 1 + delay + repetitions
  + delay:repetitions
  + ( 1 + delay | id )                    # by-participant intercepts and delay slopes
  + ( 1 + delay + repetitions | item ),   # by-item intercepts, delay and repetitions slopes
  data = df,
  family = bernoulli(),
  iter = 4000,
  file = "modexp3"                        # cache the fitted model to disk
)

The output of summary(modexp3) says that there is evidence for one of the interactions and no evidence for the other (see the last two lines):

Population-Level Effects: 
                                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                              1.23      0.38     0.50     1.98 1.00     1543     3153
delay1day                             -1.78      0.29    -2.37    -1.23 1.00     4059     4832
delay3days                            -2.96      0.41    -3.83    -2.19 1.00     2895     4049
repetitions2repetitions               -1.68      0.49    -2.67    -0.76 1.00     1496     3159
delay1day:repetitions2repetitions     -1.05      0.48    -2.07    -0.16 1.00     3821     4512
delay3days:repetitions2repetitions    -0.99      0.63    -2.34     0.15 1.00     3000     2829

Then I manually computed the mean, SD and CIs like this:

my_samples <- posterior_samples(modexp3)

# Linear predictors (log-odds) for each cell of the design:
a <- my_samples$b_Intercept                                         # immediate, 4 repetitions (reference)
b <- my_samples$b_Intercept + my_samples$b_delay1day                # 1 day, 4 repetitions
c <- my_samples$b_Intercept + my_samples$b_delay3days               # 3 days, 4 repetitions
d <- my_samples$b_Intercept + my_samples$b_repetitions2repetitions  # immediate, 2 repetitions
e <- my_samples$b_Intercept + my_samples$b_delay1day + my_samples$b_repetitions2repetitions +
  my_samples$`b_delay1day:repetitions2repetitions`                  # 1 day, 2 repetitions
f <- my_samples$b_Intercept + my_samples$b_delay3days + my_samples$b_repetitions2repetitions +
  my_samples$`b_delay3days:repetitions2repetitions`                 # 3 days, 2 repetitions

# immediate vs one day, 4 vs 2 repetitions

fourrepimm.fourrepday <- a - b
tworepimm.tworepday <- d - e
first_slopes <- fourrepimm.fourrepday - tworepimm.tworepday
meanfirst <- round(mean(first_slopes),2)
sdfirst <- round(sd(first_slopes),2)
cifirst <- round(quantile(first_slopes, probs = c(0.025, 0.975)),2) 

# immediate vs 3 days, 4 vs 2 repetitions

fourrepimm.fourrep3days <- a - c
tworepimm.tworep3days <- d - f
second_slopes <- fourrepimm.fourrep3days - tworepimm.tworep3days
meansecond <- round(mean(second_slopes),2)
sdsecond <- round(sd(second_slopes),2)
cisecond <- round(quantile(second_slopes, probs = c(0.025, 0.975)),2) 

# one day vs 3 days, 4 vs 2 repetitions

fourrepday.fourrep3days <- b - c
tworepday.tworep3days <- e - f
third_slopes <- fourrepday.fourrep3days - tworepday.tworep3days
meanthird <- round(mean(third_slopes),2)
sdthird <- round(sd(third_slopes),2)
cithird <- round(quantile(third_slopes, probs = c(0.025, 0.975)),2) 

And the results were:

meanfirst
[1] -1.05
sdfirst
[1] 0.48
cifirst
2.5% 97.5%
-2.07 -0.16

meansecond
[1] -0.99
sdsecond
[1] 0.63
cisecond
2.5% 97.5%
-2.34 0.15

meanthird
[1] 0.06
sdthird
[1] 0.68
cithird
2.5% 97.5%
-1.30 1.37

Finally, I learned how to do the same thing using hypothesis(), thanks to this answer here in the forum:

h <- c("delay1day < 0",	# evidence	delay1day	is	less	than	immediate	for	the	4 repetitions			
       "delay3days < 0",#	evidence	delay3days	is	less	than	immediate	for	the	4 repetitions			
       "delay3days - delay1day < 0",#	evidence	delay3days	is	less	than	delay1day	for	the	4 repetitions			
       "delay1day + delay1day:repetitions2repetitions < 0",#	evidence	delay1day	is	less	than	immediate	for	the	2 repetitions			
       "delay3days + delay3days:repetitions2repetitions < 0",#	evidence	delay3days	is	less	than	immediate	for	the	2 repetitions			
       "(delay3days + delay3days:repetitions2repetitions) - (delay1day + delay1day:repetitions2repetitions) < 0",#	evidence	delay3days	is	less	than	delay1day	for	the	2 repetitions			
       "repetitions2repetitions < 0",#	evidence	repetitions2repetitions	is	less	than	4 repetitions	at	immediate				
       "repetitions2repetitions + delay1day:repetitions2repetitions < 0",#	evidence	repetitions2repetitions	is	less	than	4 repetitions	at	delay1day				
       "repetitions2repetitions + delay3days:repetitions2repetitions < 0",#	evidence	repetitions2repetitions	is	less	than	4 repetitions	at	delay3days				
       "delay1day:repetitions2repetitions < 0",#	evidence	slope	between	delay1day	and	immediate	is	steeper	for	repetitions2repetitions	than	repetitions4repetitions
       "delay3days:repetitions2repetitions < 0",#	evidence	slope	between	delay3days	and	immediate	is	steeper	for	repetitions2repetitions	than	repetitions4repetitions
       "delay3days:repetitions2repetitions - delay1day:repetitions2repetitions < 0"#	evidence	slope	between	delay3days	and	delay1day	is	steeper	for	repetitions2repetitions	than	repetitions4repetitions
)

hypothesis(modexp3, h)

And these are the results for the last three, which are the interactions:

10 (delay1day:repeti... < 0    -1.05      0.48    -1.87    -0.30      95.39      0.99    *
11 (delay3days:repet... < 0    -0.99      0.63    -2.07    -0.02      20.45      0.95    *
12 (delay3days:repet... < 0     0.06      0.68    -1.04     1.16       0.84      0.46     

As you can see, one of the three interactions appears and disappears depending on the method I use :(

  • Operating System: Windows 10
  • brms Version: 2.12.0

Sorry, your question fell through the cracks a bit. Thanks for being investigative about this; it is indeed a bit weird, and I can’t rule out that it is a symptom of a bug.

First, a bit of conceptual advice:

Generally we discourage people from boolean “effect exists” vs. “effect doesn’t exist” judgements - the very fact that you put the interaction in your model means that, mathematically, the effect can never be estimated as exactly zero. Depending on the posterior CI, what you can say is one of:

  • “we cannot learn much about the effect from the data” (when the posterior is very wide and includes both positive and negative values)
  • “there is some evidence for a positive/negative effect” (when the posterior CI excludes zero) - still, if the posterior includes values close to zero, you can’t rule out that the effect is actually negligible in practice
  • “there is evidence against a substantial effect” (when the posterior is tightly concentrated around zero and thus excludes meaningful effect sizes on both sides); what counts as a “meaningful” or “substantial” effect depends on the domain.
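
If it helps, you can quantify this directly from the draws instead of eyeballing interval endpoints. A minimal sketch, reusing the my_samples object from your post (the -0.1 threshold is made up purely for illustration):

# Posterior probability that the second interaction is negative at all
mean(my_samples$`b_delay3days:repetitions2repetitions` < 0)

# Posterior probability that it is more negative than some smallest effect of
# interest on the log-odds scale (the -0.1 here is an arbitrary example value)
mean(my_samples$`b_delay3days:repetitions2repetitions` < -0.1)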

Also remember that 95% intervals can be somewhat sensitive to the stochasticity of MCMC (unless you run a lot of iterations) and to minor changes in the input data. For practical purposes, the difference between a CI of, say, (-5.1, 0.01) and (-5.04, -0.03) would usually be completely irrelevant: in both cases there is substantial posterior probability that the effect is negative, the data are consistent with a strongly negative effect, but a negligible effect cannot be ruled out.

All that assumes your model is a good model for the data, which you should check (e.g. using the pp_check functionality).
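
For a Bernoulli outcome the default density overlay is not very informative, so something like the following might be more useful (a sketch; nsamples = 100 is an arbitrary choice):

# Posterior predictive check: compare observed vs. predicted counts of 0s and 1s
pp_check(modexp3, type = "bars", nsamples = 100)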

Getting to your actual question:

I would expect hypothesis(param < 0) to return the same CIs as summary for that parameter, so I am not sure what is going on here. I asked a separate question about this at Why does hypothesis return different CIs than summary?

You’ll also notice that for your first and second contrasts you get exactly the same CIs as in the summary, because all the terms except for the interaction cancel out.
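
You can verify the cancellation directly from your draws - a quick sketch, reusing your my_samples and *_slopes objects from above, where each line should return TRUE:

# After cancellation, the first two contrasts are exactly the interaction
# draws, and the third is the difference of the two interactions
all.equal(first_slopes, my_samples$`b_delay1day:repetitions2repetitions`)
all.equal(second_slopes, my_samples$`b_delay3days:repetitions2repetitions`)
all.equal(third_slopes,
          my_samples$`b_delay3days:repetitions2repetitions` -
            my_samples$`b_delay1day:repetitions2repetitions`)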

I would expect hypothesis to do exactly the same thing as you did with the samples, so I think the reason for the discrepancy is the same.

So while I share your concern about the numerically different results, I don’t think the differences in the CIs would warrant notably different conclusions.


Hello! Thank you for your answer. This is a relief, as I was worried that I was doing something wrong. I will check the separate question you asked on the subject.

Put simply, when you test a directional hypothesis in brms, the summary shows 90% CIs, not 95% CIs. If you redo your own calculations with probs = c(0.05, 0.95), they should match the brms hypothesis output.
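
For example, with the objects from the original post (a quick sketch; if I remember correctly, hypothesis() also accepts an alpha argument, so halving it should conversely reproduce the 95% bounds for a directional hypothesis):

# 90% interval from the raw draws - this should match the hypothesis() output
round(quantile(first_slopes, probs = c(0.05, 0.95)), 2)

# Conversely, alpha = 0.025 should reproduce the 2.5% and 97.5% quantiles
# from the manual calculation (assuming alpha works this way for one-sided
# hypotheses in your brms version)
hypothesis(modexp3, "delay1day:repetitions2repetitions < 0", alpha = 0.025)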
