Credible intervals changing depending on the method used to find them

Hello. I am trying to find out whether there is an interaction between time (called delay in the model) and the number of repetitions with which participants were exposed to the material they had to memorise. Three different methods are giving me different results, and I want to know whether I did something wrong or which one I should trust.

This is my model:

library(brms)

modexp3 <- brm(
  stringent ~ 1 + delay + repetitions
  + delay:repetitions
  + ( 1 + delay | id )                    # by-participant intercepts and delay slopes
  + ( 1 + delay + repetitions | item ),   # by-item intercepts, delay and repetitions slopes
  data = df,
  family = bernoulli(),
  iter = 4000,
  file = "modexp3"                        # cache the fitted model to disk
)

The output of summary(modexp3) says that there is evidence for one of the interactions and no evidence for the other (see the last two lines):

Population-Level Effects: 
                                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                              1.23      0.38     0.50     1.98 1.00     1543     3153
delay1day                             -1.78      0.29    -2.37    -1.23 1.00     4059     4832
delay3days                            -2.96      0.41    -3.83    -2.19 1.00     2895     4049
repetitions2repetitions               -1.68      0.49    -2.67    -0.76 1.00     1496     3159
delay1day:repetitions2repetitions     -1.05      0.48    -2.07    -0.16 1.00     3821     4512
delay3days:repetitions2repetitions    -0.99      0.63    -2.34     0.15 1.00     3000     2829

Then I manually computed the mean, SD and CIs like this:

my_samples <- posterior_samples(modexp3)

# Linear predictors (log-odds) for each cell of the design:
a <- my_samples$b_Intercept                                         # immediate, 4 repetitions (reference)
b <- my_samples$b_Intercept + my_samples$b_delay1day                # 1 day, 4 repetitions
c <- my_samples$b_Intercept + my_samples$b_delay3days               # 3 days, 4 repetitions
d <- my_samples$b_Intercept + my_samples$b_repetitions2repetitions  # immediate, 2 repetitions
e <- my_samples$b_Intercept + my_samples$b_delay1day + my_samples$b_repetitions2repetitions +
  my_samples$`b_delay1day:repetitions2repetitions`                  # 1 day, 2 repetitions
f <- my_samples$b_Intercept + my_samples$b_delay3days + my_samples$b_repetitions2repetitions +
  my_samples$`b_delay3days:repetitions2repetitions`                 # 3 days, 2 repetitions

# immediate vs one day, 4 vs 2 repetitions

fourrepimm.fourrepday <- a - b
tworepimm.tworepday <- d - e
first_slopes <- fourrepimm.fourrepday - tworepimm.tworepday
meanfirst <- round(mean(first_slopes),2)
sdfirst <- round(sd(first_slopes),2)
cifirst <- round(quantile(first_slopes, probs = c(0.025, 0.975)),2) 

# immediate vs 3 days, 4 vs 2 repetitions

fourrepimm.fourrep3days <- a - c
tworepimm.tworep3days <- d - f
second_slopes <- fourrepimm.fourrep3days - tworepimm.tworep3days
meansecond <- round(mean(second_slopes),2)
sdsecond <- round(sd(second_slopes),2)
cisecond <- round(quantile(second_slopes, probs = c(0.025, 0.975)),2) 

# one day vs 3 days, 4 vs 2 repetitions

fourrepday.fourrep3days <- b - c
tworepday.tworep3days <- e - f
third_slopes <- fourrepday.fourrep3days - tworepday.tworep3days
meanthird <- round(mean(third_slopes),2)
sdthird <- round(sd(third_slopes),2)
cithird <- round(quantile(third_slopes, probs = c(0.025, 0.975)),2) 

And the results were:

meanfirst
[1] -1.05
sdfirst
[1] 0.48
cifirst
2.5% 97.5%
-2.07 -0.16

meansecond
[1] -0.99
sdsecond
[1] 0.63
cisecond
2.5% 97.5%
-2.34 0.15

meanthird
[1] 0.06
sdthird
[1] 0.68
cithird
2.5% 97.5%
-1.30 1.37

Finally, I learned how to do the same thing using hypothesis(), thanks to this answer here in the forum:

h <- c("delay1day < 0",	# evidence	delay1day	is	less	than	immediate	for	the	4 repetitions			
       "delay3days < 0",#	evidence	delay3days	is	less	than	immediate	for	the	4 repetitions			
       "delay3days - delay1day < 0",#	evidence	delay3days	is	less	than	delay1day	for	the	4 repetitions			
       "delay1day + delay1day:repetitions2repetitions < 0",#	evidence	delay1day	is	less	than	immediate	for	the	2 repetitions			
       "delay3days + delay3days:repetitions2repetitions < 0",#	evidence	delay3days	is	less	than	immediate	for	the	2 repetitions			
       "(delay3days + delay3days:repetitions2repetitions) - (delay1day + delay1day:repetitions2repetitions) < 0",#	evidence	delay3days	is	less	than	delay1day	for	the	2 repetitions			
       "repetitions2repetitions < 0",#	evidence	repetitions2repetitions	is	less	than	4 repetitions	at	immediate				
       "repetitions2repetitions + delay1day:repetitions2repetitions < 0",#	evidence	repetitions2repetitions	is	less	than	4 repetitions	at	delay1day				
       "repetitions2repetitions + delay3days:repetitions2repetitions < 0",#	evidence	repetitions2repetitions	is	less	than	4 repetitions	at	delay3days				
       "delay1day:repetitions2repetitions < 0",#	evidence	slope	between	delay1day	and	immediate	is	steeper	for	repetitions2repetitions	than	repetitions4repetitions
       "delay3days:repetitions2repetitions < 0",#	evidence	slope	between	delay3days	and	immediate	is	steeper	for	repetitions2repetitions	than	repetitions4repetitions
       "delay3days:repetitions2repetitions - delay1day:repetitions2repetitions < 0"#	evidence	slope	between	delay3days	and	delay1day	is	steeper	for	repetitions2repetitions	than	repetitions4repetitions
)

hypothesis(modexp3, h)

And these are the results for the last three, which are the interactions:

10 (delay1day:repeti... < 0    -1.05      0.48    -1.87    -0.30      95.39      0.99    *
11 (delay3days:repet... < 0    -0.99      0.63    -2.07    -0.02      20.45      0.95    *
12 (delay3days:repet... < 0     0.06      0.68    -1.04     1.16       0.84      0.46     

As you can see, one of the three interactions appears and disappears depending on the method I use :(

  • Operating System: Windows 10
  • brms Version: 2.12.0

Sorry, your question fell through the cracks a bit. Thanks for being investigative about this; it is indeed a bit weird, and I can’t rule out that it is a symptom of a bug.

First, a bit of conceptual advice:

Generally we discourage people from boolean “effect exists” vs. “effect doesn’t exist” judgements - the very fact that you put the interaction in your model means that, mathematically, the effect can never be estimated as exactly zero. Depending on the posterior CI, what you can say is one of:

  • “we cannot learn much about the effect from the data” (when the posterior is very wide and includes both positive and negative values)
  • “there is some evidence for a positive/negative effect” (when the posterior CI excludes zero) - still, if the posterior includes values close to zero, you can’t rule out that the effect is actually negligible in practice
  • “there is evidence against a substantial effect” (when the posterior is tightly concentrated around zero and thus excludes meaningful effect sizes on both sides); what counts as a “meaningful” or “substantial” effect depends on the domain.
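
If it helps, you can quantify this directly from the draws instead of eyeballing interval endpoints. A minimal sketch, reusing the my_samples object from your post (the -0.1 threshold is made up purely for illustration):

# Posterior probability that the second interaction is negative at all
mean(my_samples$`b_delay3days:repetitions2repetitions` < 0)

# Posterior probability that it is more negative than some smallest effect of
# interest on the log-odds scale (the -0.1 here is an arbitrary example value)
mean(my_samples$`b_delay3days:repetitions2repetitions` < -0.1)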

Also remember that 95% intervals can be somewhat sensitive to the stochasticity of MCMC (unless you run a lot of iterations) and to minor changes in the input data. For practical purposes, the difference between a CI of, say, (-5.1, 0.01) and (-5.04, -0.03) would usually be completely irrelevant: in both cases there is substantial posterior probability that the effect is negative, the data are consistent with a strongly negative effect, but a negligible effect cannot be ruled out.

All that assumes your model is a good model for the data, which you should check (e.g. using the pp_check functionality).
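
For a Bernoulli outcome the default density overlay is not very informative, so something like the following might be more useful (a sketch; nsamples = 100 is an arbitrary choice):

# Posterior predictive check: compare observed vs. predicted counts of 0s and 1s
pp_check(modexp3, type = "bars", nsamples = 100)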

Getting to your actual question:

I would expect hypothesis(param < 0) to return the same CIs as summary for that parameter, so I am not sure what is going on here. I asked a separate question about this at Why does hypothesis return different CIs than summary?

You’ll also notice that for your first and second contrasts you get exactly the same CIs as in the summary, because all the terms except for the interaction cancel out.
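
You can verify the cancellation directly from your draws - a quick sketch, reusing your my_samples and *_slopes objects from above, where each line should return TRUE:

# After cancellation, the first two contrasts are exactly the interaction
# draws, and the third is the difference of the two interactions
all.equal(first_slopes, my_samples$`b_delay1day:repetitions2repetitions`)
all.equal(second_slopes, my_samples$`b_delay3days:repetitions2repetitions`)
all.equal(third_slopes,
          my_samples$`b_delay3days:repetitions2repetitions` -
            my_samples$`b_delay1day:repetitions2repetitions`)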

I would expect hypothesis to do exactly the same thing as you did with the samples, so I think the reason for the discrepancy is the same.

So while I share your concern about the numerically different results, I don’t think the differences in the CIs would warrant notably different conclusions.


Hello! Thank you for your answer. This is a relief, as I was worried that I was doing something wrong. I will check the separate question you asked on the subject.

Put simply, when you test a directional hypothesis in brms, the summary shows 90% CIs, not 95% CIs. If you redo your own calculations with probs = c(0.05, 0.95), they should match the brms hypothesis output.
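
For example, with the objects from the original post (a quick sketch; if I remember correctly, hypothesis() also accepts an alpha argument, so halving it should conversely reproduce the 95% bounds for a directional hypothesis):

# 90% interval from the raw draws - this should match the hypothesis() output
round(quantile(first_slopes, probs = c(0.05, 0.95)), 2)

# Conversely, alpha = 0.025 should reproduce the 2.5% and 97.5% quantiles
# from the manual calculation (assuming alpha works this way for one-sided
# hypotheses in your brms version)
hypothesis(modexp3, "delay1day:repetitions2repetitions < 0", alpha = 0.025)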
