Help with a reply to a reviewer regarding CIs

Hi Everyone,

A question on how to handle a reviewer's comment about reporting CIs. We work in RL, and when reporting results in our papers I always try to include a figure with the whole posterior. We also report in text the median, the 89% CI, and the probability of direction. We use 89% mainly as a way to avoid 95%, which tends to encourage "black and white" thinking (89% is as arbitrary as any other value - but as McElreath said - it's a prime number, so it's easier to remember). A reviewer asked us to change to 95% (see below). We worked really (really) hard on this piece and I don't want to see it rejected. On the other hand - I won't be able to look at myself in the mirror if I just change to 95%. Obviously no conclusion is going to change - it's more a matter of principle.
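
For concreteness, here is roughly what we compute and report, as a minimal sketch (assuming the posterior is available as a 1-D NumPy array of draws; the simulated draws below are just a stand-in):

```python
# Minimal sketch: median, 89% HDI, and probability of direction from
# posterior draws (the normal draws below stand in for a real posterior).
import numpy as np

def hdi(draws, prob=0.89):
    """Narrowest interval containing `prob` of the posterior draws."""
    x = np.sort(draws)
    k = int(np.ceil(prob * len(x)))            # number of draws inside the interval
    widths = x[k - 1:] - x[:len(x) - k + 1]    # width of every candidate window
    i = int(np.argmin(widths))                 # start of the narrowest window
    return x[i], x[i + k - 1]

draws = np.random.default_rng(0).normal(0.3, 0.1, 10_000)
print("median:", np.median(draws))
print("89% HDI:", hdi(draws))
print("probability of direction:", max((draws > 0).mean(), (draws < 0).mean()))
```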

Can you help me put forward good arguments?

Best,
Nitzan

Reviewer:
There is only one thing that I would suggest the authors change, which is the use of 89% highest posterior density intervals. The authors cite McElreath’s “Statistical Rethinking”, which essentially states that 95% is arbitrary so we should be free to report any interval we see fit, and 89% has various properties that make it a satisfactory value. However, I would argue that the very fact that 95% is used near-universally (albeit originally quite arbitrarily) makes it the best value to use, as 1) it allows for easy comparison across studies, and 2) it is what people are used to and so facilitates straightforward and intuitive judgements about effects. It is also quite easy to miss that the CIs are 89% CIs and assume that they would be 95% CIs, and hence believe that the effects are stronger than they are.

  • I would argue there is no comparability to begin with. Reporting 95% instead of 89% creates only a false sense of familiarity. For the sake of comparability, what most people are used to is not "95% CI" in the abstract but "95% confidence interval" (presumably from a Wald estimator?). What you report in your paper, in contrast, is a highest density interval (HDI) from a posterior. These are completely different things that are not meant to be compared directly. Their values may be numerically close, yet the interpretation and the philosophical reasoning behind them run on different tracks.
  • In practice, if publishing in this journal is of paramount importance to you and this reviewer's objection is the only thing between you and publication, then here is what I suggest: please the reviewer and retain your integrity. Change your primary metric of reporting to the 95% CI. Then clarify that this CI is a credible/probability/highest density interval, absolutely not a confidence interval. Point out the ambiguity and arbitrariness behind the choice of 95%, and use that to bring up 89%. Do a little sensitivity analysis on how reporting the 89% instead of the 95% HDI affects your conclusions (see the sketch below). Mention only the conclusion of this sensitivity analysis in the main text, but keep the full process and results in the supplementary material.
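
A minimal sketch of that sensitivity check (illustrative names; assumes the posterior draws for the effect are in a 1-D NumPy array):

```python
# Report both interval sizes and check whether the qualitative conclusion
# (here: does the interval exclude zero?) changes between 89% and 95%.
import numpy as np

def hdi(draws, prob):
    x = np.sort(draws)
    k = int(np.ceil(prob * len(x)))
    i = int(np.argmin(x[k - 1:] - x[:len(x) - k + 1]))
    return x[i], x[i + k - 1]

draws = np.random.default_rng(1).normal(0.3, 0.1, 10_000)  # stand-in posterior
for prob in (0.89, 0.95):
    lo, hi = hdi(draws, prob)
    excludes_zero = lo > 0 or hi < 0
    print(f"{prob:.0%} HDI: [{lo:.3f}, {hi:.3f}]  excludes zero: {excludes_zero}")
```
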
5 Likes

Very well said. The only thing I would suggest doing differently is not to draw conclusions from the HDI but to base conclusions on posterior probabilities. And on that note, it's sometimes better to provide not just the probability of the effect being in the right direction but the probability of the effect being $> \epsilon$ for all $\epsilon$, as a graph (see the sketch below).
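
A minimal sketch of that curve, computed directly from posterior draws (the stand-in posterior and names are illustrative):

```python
# P(effect > eps) over a grid of eps, i.e. one minus the posterior CDF,
# computed directly from draws (normal draws stand in for a real posterior).
import numpy as np
import matplotlib.pyplot as plt

draws = np.random.default_rng(2).normal(0.3, 0.1, 10_000)
eps = np.linspace(0.0, draws.max(), 200)
p_exceed = (draws[None, :] > eps[:, None]).mean(axis=1)  # P(effect > eps)

plt.plot(eps, p_exceed)
plt.xlabel(r"$\epsilon$")
plt.ylabel(r"$P(\mathrm{effect} > \epsilon)$")
plt.show()
```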

2 Likes

+1 to reviewer…

1 Like

The Bayesian Analysis Reporting Guidelines (BARG) are intended to be helpful in cases like this. They’re published open-access here: https://www.nature.com/articles/s41562-021-01177-7 (Disclosure: I wrote the article.)

The BARG provide guidance for reporting decisions (among all the other components of an analysis). First, do you really need to make a decision, or is the decision a ritual? If you do need to state decisions, then the BARG provide advice for basing them either on a credible interval (CI) or on a posterior model probability.

  • If using a CI, the BARG are agnostic about what size of interval (or type of interval, HDI vs ETI) to use. The BARG emphasize that CI limits computed from MCMC are very wobbly because the limits (typically) fall in low-density regions of the posterior distribution; therefore a high effective sample size (ESS) is required (the first sketch after this list illustrates the point). I think that to more deeply justify the size of a decision interval you would have to go full-blown decision theory, with specified costs and benefits; for initial thoughts about doing that, see the Supplement (to a different open-access article) available here: https://osf.io/fchdr. Meanwhile, in practice, I'd say go with whatever size of interval you think is most useful for your research and your audience; after all, if they're not convinced by a weaker decision criterion (i.e., only 89% instead of 95%), then your research will have less impact. I agree with zhez67373 regarding one possible option to satisfy both: put the audience-preferred criterion in the article and put your preferred criterion in the supplemental material.

  • If using a Bayesian hypothesis test via model comparison, the BARG recommend reporting posterior model probabilities (not only Bayes factors, BFs) and basing the decision on the posterior model probability exceeding a criterion. Notice that you need to specify a decision criterion for the posterior model probability; should it be 95%? 89%? Something else? A key issue here is that the posterior model probabilities depend on the assumed prior model probabilities. Therefore, the BARG recommend reporting the posterior model probabilities for a range of prior model probabilities and, in particular, reporting the minimum prior probability the model could have and still meet the decision threshold (the second sketch after this list shows the computation). This concept is illustrated by Figure 2 in the BARG. (All of this assumes you have used appropriate prior distributions on the parameters within the models; see the separate section of the BARG on prior sensitivity analysis.)
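
On the wobbliness point in the first bullet, a small illustration (with independent draws standing in for MCMC output): across repeated simulations, the 95% HDI endpoints vary considerably more than the median, and both stabilize as the number of draws grows.

```python
# Monte Carlo wobble of interval limits vs. a central summary: the HDI
# endpoints sit in low-density tails, so their simulation-to-simulation
# standard deviation is much larger than the median's.
import numpy as np

def hdi(draws, prob=0.95):
    x = np.sort(draws)
    k = int(np.ceil(prob * len(x)))
    i = int(np.argmin(x[k - 1:] - x[:len(x) - k + 1]))
    return x[i], x[i + k - 1]

rng = np.random.default_rng(3)
for n in (1_000, 10_000):
    sims = [rng.normal(0, 1, n) for _ in range(200)]
    med_sd = np.std([np.median(s) for s in sims])
    lim_sd = np.std([hdi(s)[1] for s in sims])   # upper 95% HDI limit
    print(f"n={n}: SD(median)={med_sd:.4f}, SD(upper HDI limit)={lim_sd:.4f}")
```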
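
And for the second bullet, a sketch of the prior-sensitivity computation behind Figure 2 of the BARG: posterior model probability as a function of the prior model probability for a given Bayes factor (the BF and threshold values here are illustrative).

```python
# Posterior odds = BF * prior odds, so
# P(M1|D) = BF * prior / (BF * prior + (1 - prior)).
# Solving P(M1|D) >= threshold for the prior gives the minimum prior
# probability of M1 that still meets the decision criterion.
import numpy as np

bf = 19.0            # Bayes factor for M1 over M0 (illustrative)
threshold = 0.95     # decision criterion on the posterior model probability

prior = np.linspace(0.01, 0.99, 99)
posterior = bf * prior / (bf * prior + (1 - prior))

prior_min = threshold / (bf * (1 - threshold) + threshold)
print(f"minimum prior P(M1) to reach {threshold:.0%}: {prior_min:.3f}")
```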

There is lots more discussion in the BARG.

P.S. I gave a talk about this at StanCon 2023.

12 Likes