Bayes factors for a moderating effect

Hi,

I am calculating Bayes factors for a moderation effect of ‘past’ on the relationship between ‘pol’ and ‘endorse’.

Please see Case 1 and Case 2 below (I used the ‘update()’ function as you did).

Case 1

full_brms_eds2 <- brms::brm(endorse ~ pol * past, data = df2_pw, prior = priors_full_eds1, save_all_pars = TRUE)

null_brms_eds2 <- brms::brm(endorse ~ pol + past, data = df2_pw, prior = priors_full_eds1, save_all_pars = TRUE)

BF_brms_eds2 <- bayes_factor(full_brms_eds2, null_brms_eds2)

BF_brms_eds2

Case 2

full_brms_eds2 <- brms::brm(endorse ~ pol * past, data = df2_pw, prior = priors_full_eds1, save_all_pars = TRUE)

null_brms_eds2 <- update(full_brms_eds2, formula. = ~ . - pol:past)

BF_brms_eds2 <- bayes_factor(full_brms_eds2, null_brms_eds2)

BF_brms_eds2

When these priors were set, the Bayes factors from Case 1 and Case 2 were very different.

It seems that the update() function reused the previous priors except for the one on the interaction effect.

When I calculated the Bayes factor using the priors in priors_full_eds1 minus the interaction prior, the result was similar to the one obtained with update().

priors_full_eds1 <- c(

  set_prior('normal(1.4236, .0713)', class = 'sigma'),

  set_prior('normal(6.1075, .2745)', class = 'Intercept'),

  set_prior('normal(-.2373, .0795)', class = 'b', coef = 'pol'),

  set_prior('normal(-2.0210, .4054)', class = 'b', coef = 'past'),

  set_prior('normal(.4224, .1149)', class = 'b', coef = 'pol:past')  # <- the interaction effect

)

priors_null_eds1 <- c(

  set_prior('normal(1.4688, .0761)', class = 'sigma'),

  set_prior('normal(5.5057, .2301)', class = 'Intercept'),

  set_prior('normal(-.0340, .0596)', class = 'b', coef = 'pol'),

  set_prior('normal(-.7256, .2080)', class = 'b', coef = 'past')

)

Priors calculated without the interaction effect (priors_null_eds1) would be different from priors calculated with the interaction effect.

This difference in priors produced the different results.
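For reference, this is roughly what the comparison with the reduced prior set looks like (a sketch of the call; the priors are simply priors_full_eds1 without the interaction prior):

priors_reduced_eds1 <- c(
  set_prior('normal(1.4236, .0713)', class = 'sigma'),
  set_prior('normal(6.1075, .2745)', class = 'Intercept'),
  set_prior('normal(-.2373, .0795)', class = 'b', coef = 'pol'),
  set_prior('normal(-2.0210, .4054)', class = 'b', coef = 'past')
)

null_brms_eds2b <- brms::brm(endorse ~ pol + past, data = df2_pw,
                             prior = priors_reduced_eds1, save_all_pars = TRUE)

bayes_factor(full_brms_eds2, null_brms_eds2b)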

Now, I am not sure which way is more appropriate to test the moderating effect.

May I ask your opinion?

Thanks so much in advance.

Kind regards,
IK Kim

  • Operating System: Windows; R
  • brms Version: 2.12

Unfortunately, Bayes factors are very sensitive to prior choice. I personally prefer to use posterior predictive checks to determine if a model fails at capturing some aspect of the data (and if a more complex model is therefore warranted). I understand following all the conflicting statistical advice out there can be difficult, so obviously do what you believe is best for your problem.

I don’t think I can provide a definitive answer: this really depends on the full scientific context of what you are trying to do. The priors you give are however suspiciously precise; where do they come from? Also, if you want to test whether there is an interaction of interest, why would you use a prior that strongly assumes there is a positive interaction? (normal(.4224, .1149) implies that the probability that the interaction coefficient is 0 or less is 0.0001183483, and that there is only a 2.5% chance that the interaction coefficient is < 0.2.)


Thanks martinmodrak for your reply.

We are directly replicating a study, so the priors come from a Bayesian analysis of the original data. We obtained the original data from the authors and analysed that data set using diffuse priors.

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: liking ~ pol * past 
   Data: df1 (Number of observations: 203) 
Samples: 4 chains, each with iter = 1e+05; warmup = 50000; thin = 1;
         total post-warmup samples = 2e+05

Population-Level Effects: 
            Estimate Est.Error   l-95% CI   u-95% CI      Rhat Bulk_ESS Tail_ESS
Intercept  6.1064984 0.2737157  5.5667923  6.6435565 1.0000241   118917   139578
pol       -0.2368219 0.0792713 -0.3921040 -0.0809990 1.0000369   116282   131512
past      -2.0189179 0.4051452 -2.8134176 -1.2239972 1.0000437   105572   118970
pol:past   0.4216265 0.1149064  0.1961594  0.6468109 1.0000657    99993   114999

Family Specific Parameters: 
       Estimate Est.Error  l-95% CI  u-95% CI      Rhat Bulk_ESS Tail_ESS
sigma 1.4234734 0.0717751 1.2908369 1.5719748 1.0000342   151492   132235

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
priors_full_eds1 <- c(

  set_prior('normal(1.4234734,0.0717751)', class = 'sigma'),
  
  set_prior('normal(6.1064984,0.2737157)', class = 'Intercept'),
  
  set_prior('normal(-0.2368219,0.0792713)', class = 'b', coef = 'pol'),
  
  set_prior('normal(-2.0189179,0.4051452)', class = 'b', coef = 'past'),
  
  set_prior('normal(0.4216265,0.1149064)', class = 'b', coef = 'pol:past')
)

I set the priors from the output above; is this okay?

Could you explain more about it? The coefficient does not seem small to me.
How did you calculate the percentage?

Kind regards,
IK Kim

> pnorm(0.2, m = .4224, sd = .1149)
[1] 0.02645859

So, technically, a 2.6% probability.


Thanks maxbiostat for your explanation.

I see! If I understand it correctly, you are trying to do some sort of “Bayesian updating”, right? My experience from some attempts of my own, and from reading these forums, is that while this approach is appealing in theory, it is quite hard to do well in practice. Notably, in finite-data circumstances, posterior distributions are often quite non-normal. There are also often correlations in the posterior, so you may lose information when you approximate the posterior with independent normals the way you did. For those reasons, analysing the two datasets together is usually preferable. You may try for yourself how much the results change if you use the priors you gave with only the new data, versus the priors from the original analysis with all the data together. (I would expect the results not to differ much, provided the original data constrained all the parameters well.)
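A rough sketch of that comparison (df_pooled is a hypothetical data frame stacking the original and the replication data with matching column names):

# (a) informative priors from your analysis of the original data, new data only
fit_new_only <- brms::brm(endorse ~ pol * past, data = df2_pw,
                          prior = priors_full_eds1)

# (b) the diffuse priors used for the original analysis, both datasets together
fit_pooled <- brms::brm(endorse ~ pol * past, data = df_pooled)

# if the original data constrained all parameters well, the two posteriors
# should be broadly similar
summary(fit_new_only)
summary(fit_pooled)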

It should also be noted that, depending on your field, “direct replication” doesn’t necessarily mean that it is safe to just put all the data together - there might be additional sources of variation that are hard to control between replications. A hierarchical model, i.e. something like liking ~ pol * past + (pol * past || source), where source is a factor distinguishing the original study from yours, might be sensible.
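As a minimal sketch (again assuming a pooled data frame, now with a source factor marking which study each row comes from):

fit_hier <- brms::brm(
  liking ~ pol * past + (pol * past || source),
  data = df_pooled
)
summary(fit_hier)  # the group-level sd() terms show the between-study variability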

However, I honestly have no idea how best to use Bayes factors in any of these scenarios… I would guess that computing a BF with a prior from a previous analysis gives you a BF that is hard to interpret, as the fit with the interaction also informs your priors on the non-interaction coefficients.

If I get it correctly, Max explained the rest, right?

Does that make sense?

Thanks martinmodrak for your kind reply.

As you suggested, I fitted the hierarchical model (please see below).

I got many warning messages, and I could not get the population-level interaction effect in the second output.

Could you let me know if this is right?
I am confused by the results.

Also, I wonder if it is okay to add the source as a covariate to control for the effect between sources,
i.e. liking ~ pol * past + source

Thanks.

Kind regards,
IK Kim

Warning messages:
1: Rows containing NAs were excluded from the model. 
2: There were 2928 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
3: There were 6 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded 
4: Examine the pairs() plot to diagnose sampling problems
 
5: The largest R-hat is 1.15, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
6: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
7: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess 
> print(full_brms_eds1, digit = 7)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: endorse ~ pol * past + (pol * past || study) 
   Data: df.total (Number of observations: 1281) 
Samples: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
         total post-warmup samples = 20000

Group-Level Effects: 
~study (Number of levels: 3) 
               Estimate Est.Error  l-95% CI  u-95% CI      Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.4171979 0.7361437 0.0077433 3.0558407 1.0338326       91       23
sd(pol)       0.1602248 0.2751186 0.0027786 1.2854136 1.0676664       38       22
sd(past)      0.5923906 0.7727919 0.0113456 2.8034628 1.0769272       35      232
sd(pol:past)  0.1937668 0.2596760 0.0033129 0.8717832 1.1128618       25      120

Population-Level Effects: 
            Estimate Est.Error   l-95% CI   u-95% CI      Rhat Bulk_ESS Tail_ESS
Intercept  6.0893839 0.2730104  5.5680990  6.6241005 1.0234012      145       39
pol       -0.2770785 0.1314440 -0.5764787 -0.0760962 1.1501916       17       26
past      -1.5598754 0.3858469 -2.3368788 -0.7267997 1.0287063      114     2154
pol:past   0.3462971 0.1477290  0.0909803  0.6585179 1.1210709       21       44

Family Specific Parameters: 
       Estimate Est.Error  l-95% CI  u-95% CI      Rhat Bulk_ESS Tail_ESS
sigma 1.4193370 0.0281266 1.3700194 1.4761503 1.0539357       47     4945

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning messages:
1: Parts of the model have not converged (some Rhats are > 1.05). Be careful when analysing the results! We recommend running more iterations and/or setting stronger priors. 
2: There were 2928 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
	Warning messages:
1: There were 292 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: Examine the pairs() plot to diagnose sampling problems
 
> print(full_brms_att1, digit = 7)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: envatt ~ pol * past + (pol * past) || study 
   Data: df.total (Number of observations: 1282) 
Samples: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
         total post-warmup samples = 20000

Group-Level Effects: 
~study (Number of levels: 3) 
               Estimate Est.Error  l-95% CI  u-95% CI      Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.2389374 0.3546239 0.0048782 1.2277446 1.0030825     3124     3180
sd(pol)       0.4786991 0.3189615 0.1651148 1.3698546 1.0029665     3803     4372
sd(past)      0.1743050 0.2607102 0.0043862 0.7918339 1.0003711     6513     8568
sd(pol:past)  0.1253931 0.1790461 0.0079431 0.5321988 1.0006649     2825     2000

Population-Level Effects: 
           Estimate Est.Error  l-95% CI  u-95% CI      Rhat Bulk_ESS Tail_ESS
Intercept 6.8648033 0.2038524 6.4141671 7.2665854 1.0020653     2665     1154

Family Specific Parameters: 
       Estimate Est.Error  l-95% CI  u-95% CI      Rhat Bulk_ESS Tail_ESS
sigma 1.0090904 0.0200546 0.9706724 1.0488752 1.0000885    16750    11701

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning message:
There were 292 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 

Oh, that looks like a sampling problem; the results are no good. My best guess is that the problem is that there are only a few levels of source, which doesn’t really let us estimate the between-study variability. You might need to set tighter priors on the sd parameter class, e.g. normal(0, 1) (check out set_prior).
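For example, something along these lines (a sketch; the normal(0, 1) scale is just a guess and should be judged against the scale of your outcome, and raising adapt_delta also targets the divergence warnings):

fit_hier <- brm(
  endorse ~ pol * past + (pol * past || study),
  data = df.total,
  prior = set_prior("normal(0, 1)", class = "sd"),
  control = list(adapt_delta = 0.99)
)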

That would partially work - but it assumes that the difference between the studies is only in the overall mean, and not in the effect of pol or past (or the interaction). You need to decide whether that’s sensible in your case. What I aimed for with the formula I proposed was to allow everything to vary by source, but also to express that the studies are not independent (that’s why everything is a varying effect)…

Explicitly modelling correlation with liking ~ pol * past + (pol * past | source) (note the single |) could also potentially help…

Hope that makes sense.

Thanks again, Martin, for your kind help.

I tried both ways you suggested (setting priors and using a single |).
Unfortunately, the outputs still do not look good.

Because the three studies we are handling are identical in terms of sampling, survey material, and analysis, we are thinking of using a fixed-effects model.

What do you think about investigating the three-way interaction to deal with the effects between the sources, i.e. liking ~ pol * past * source (study)?

Thanks.

Kind regards,
IK Kim

That would be very similar to analysing the studies completely separately, so I don’t think that’s what you want. On reflection, it might also be possible to just do liking ~ 1 + (pol*past | study).

Sorry that this is confusing - there are just quite a lot of options and what is best depends on the problem and the amount of data you have.

To see whether your model is problematic, you should do posterior predictive checks - for example, it might be the case that ignoring the source altogether (liking ~ pol * past) is OK. You could check this by looking at posterior predictive checks grouped by source; if they show no problems, then the between-study variability might be low enough to be ignored (see pp_check).

Hi Martin,
the model liking ~ 1 + (pol*past | study) did not produce a good output either.

As you suggested, I have conducted posterior predictive checks (please see below).

full_brms_eds1 <- brm(endorse ~ 1 + (pol*past_cat | study_cat), data = df.total, seed = 2020, save_all_pars = TRUE, iter = 1000)

pp_check(full_brms_eds1, type = "violin_grouped", group = "study_cat")

[pp_check violin plot of endorse, grouped by study_cat]

full_brms_att1 <- brm(envatt ~ 1 + (pol*past_cat | study_cat), data = df.total, seed = 2020, iter = 1000)
pp_check(full_brms_att1, type = "violin_grouped", group = "study_cat")

[pp_check violin plot of envatt, grouped by study_cat]

First, I wonder whether this is the right way of doing it.

The figures across the studies look similar for the first dependent variable.
But the figures across the studies look slightly different for the second dependent variable.

Could I ignore the source, i.e. liking ~ pol * past?

Kind regards,
IK Kim

Yes, this is basically what I had in mind :-) I usually also find it useful to do stat_grouped PP checks with both mean and sd (which are usually unproblematic, but it pays to check) and then with some extremal statistics (max and min, or some low and high quantiles).
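For example (a sketch; the stat names are passed on to bayesplot's ppc_stat_grouped):

pp_check(full_brms_eds1, type = "stat_grouped", stat = "mean", group = "study_cat")
pp_check(full_brms_eds1, type = "stat_grouped", stat = "sd",   group = "study_cat")
pp_check(full_brms_eds1, type = "stat_grouped", stat = "max",  group = "study_cat")
pp_check(full_brms_eds1, type = "stat_grouped", stat = "min",  group = "study_cat")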

Nevertheless, if you got divergences, the PP checks should not be taken too seriously.

What I think the second plot shows is that for envatt the model is actually quite a bad fit - the shapes of y_rep and y are quite different - it looks like the actual response is capped at 6, so maybe the gaussian response (family) is not appropriate? (If this is some kind of Likert item, then family = cumulative("logit") might be a better choice.)
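A minimal sketch of what that could look like (assuming envatt is an ordered Likert-type item; the formula is just one possibility, mirroring your earlier fits):

# the cumulative family wants an ordered response
df.total$envatt_ord <- ordered(df.total$envatt)

fit_ord <- brm(
  envatt_ord ~ pol * past_cat + (pol * past_cat || study_cat),
  data = df.total,
  family = cumulative("logit")
)
pp_check(fit_ord, type = "bars_grouped", group = "study_cat")  # discrete-friendly check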

The thing to look at is not primarily whether the distributions differ between sources, but whether there is a good match between y and y_rep - and especially whether any mismatch is systematically associated with study_cat.

I wouldn’t make strong conclusions about the differences between the studies from a model that takes them into account. What you can do is run the same checks for liking ~ pol * past - if those show good fit, and if the good fit is maintained for other groupings (e.g. you could use smaller groups defined by all past_cat and study_cat combinations), then it is probably safe to treat the between-study differences as insubstantial.
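A sketch of those checks (variable names as used earlier in the thread; dropping incomplete rows first is an assumption, so that y and y_rep line up):

library(bayesplot)

# keep only complete cases on the variables involved
df.cc <- na.omit(df.total[, c("endorse", "pol", "past_cat", "study_cat")])

# the model that ignores the source
fit_simple <- brm(endorse ~ pol * past_cat, data = df.cc)
yrep <- posterior_predict(fit_simple, nsamples = 100)

# fit by study, and by the past_cat x study_cat combinations
ppc_violin_grouped(df.cc$endorse, yrep, group = df.cc$study_cat)
ppc_violin_grouped(df.cc$endorse, yrep,
                   group = interaction(df.cc$past_cat, df.cc$study_cat))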

Does that make sense?