I have never really gotten on the Bayesian p-value train because of their limitations, but they recently came up in the back of my mind while I was working on an analysis. One of the inferences I would like to make is that, assuming a meta-analysis model I’ve constructed is valid, it estimates only a 20% chance that the difference between two treatments is at least a clinically significant value. The idea is that I would like to raise the question of whether future research comparing these treatments is worthwhile, compared to directing those efforts toward identifying entirely novel treatments or focusing on implementation.

This all seemed perfectly reasonable to me, since it is the same logic used to construct the 95% credible intervals around the treatment estimates themselves, but it got me thinking about whether someone might come along and levy the valid criticism that this is more appropriately handled through model comparison. Otherwise, why wouldn’t we just use our models to say that the probability that a difference between treatments is greater than zero is xyz, instead of building null models?

If I understand correctly, you’ve run some data through a model, generating a posterior distribution for the difference between treatments, and found that 80% of the samples fall within an interval that you defined ahead of time as reflecting clinically meaningless differences. If that’s the case, then yes, you can go ahead and reframe it as a 20% chance that there’s a clinically meaningful difference.
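In case it helps, that calculation is just the fraction of posterior draws falling outside the pre-specified interval. Here’s a minimal sketch with made-up numbers (the draws and the 0.4 threshold are both hypothetical; in practice the draws would come from your fitted meta-analysis model and the threshold from your clinical definition):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws for the treatment difference
# (stand-in for draws extracted from a fitted Stan model).
diff_draws = rng.normal(loc=0.1, scale=0.25, size=4000)

# Hypothetical clinical-significance threshold: differences with
# |diff| >= 0.4 count as clinically meaningful (an assumption here).
threshold = 0.4

# Probability of a clinically meaningful difference = share of draws
# outside the "meaningless" interval (-threshold, threshold).
p_meaningful = np.mean(np.abs(diff_draws) >= threshold)
print(f"P(clinically meaningful difference) = {p_meaningful:.2f}")
```

The same one-liner gives the complementary statement (probability of practical equivalence) by flipping the inequality.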

A model comparison would give you a metric of the relative weight of evidence between the two models being compared, which would speak to the degree to which the amount of data you’ve collected can be trusted to have overcome any misinformedness of your priors in each model. But since you are presumably using weakly informative priors and a relatively well-established interval for clinically meaningless differences, the fact that the majority of your posterior samples fall into this range speaks to the same thing (I’d argue). Model comparison enthusiasts are welcome to correct me here!

A searchable/citeable term for the interval you constructed for clinically insignificant differences is “Region of Practical Equivalence” or ROPE, which I think was introduced as a term by Kruschke.


Thanks Mike! Are you still at Dal by the way? Would be great if we could get a little local group of people interested in Stan/Bayesian methods together. It can be lonely sometimes when running into issues/thinking of new ways to solve problems.

Yup, still in Halifax. @Bob_Carpenter and @mitzimorris hosted a Stan workshop over in CS a few weeks ago, so there might even be a decent number of people using it now.

Interpreting the marginal interval of a parameter is easy if that parameter is not correlated with any other parameter. If that’s the case, then you can do as you thought.

The treatment effect can also correlate with other covariate effects. Here’s a simple example (not treatment effects, but something similar could happen) where the marginals are wide and overlap 0, but the joint highest posterior density region is far from zero.
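A toy illustration of this effect, with entirely made-up numbers (not from any real model): two strongly negatively correlated effects whose sum is tightly constrained. Each marginal interval comfortably overlaps zero, yet the joint posterior never comes near the origin:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fake "posterior draws" for two effects a and b whose sum is
# pinned near 2 while each one individually is poorly identified.
n = 4000
a = rng.uniform(-3.0, 5.0, size=n)           # wide marginal, overlaps 0
b = 2.0 - a + rng.normal(0, 0.05, size=n)    # b ~ 2 - a, also overlaps 0

# Marginal 95% intervals: both span zero.
a_lo, a_hi = np.percentile(a, [2.5, 97.5])
b_lo, b_hi = np.percentile(b, [2.5, 97.5])
print("marginal a:", (round(a_lo, 2), round(a_hi, 2)))
print("marginal b:", (round(b_lo, 2), round(b_hi, 2)))

# But jointly, (a, b) never gets close to (0, 0): every draw sits
# near the line a + b = 2, whose nearest point to the origin is
# at distance sqrt(2).
min_dist = np.min(np.sqrt(a**2 + b**2))
print("closest draw to (0, 0):", round(min_dist, 2))
```

Looking only at the marginals here would suggest both effects are "consistent with zero," while the joint posterior rules out the point where both are zero.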

In this latter case, model comparison approaches can be useful.

I’ll talk more about this at StanCon in January…


Sorry to have missed the presentation! News never made it across the street to the IWK I guess.