Presenting Bayesian results to non-Bayesians


I’m curious if anyone has any guidelines or references on presenting inferences from Bayesian models to non-Bayesians or non-statisticians?

For example, I’m trying to present summary statistics, such as mean and standard deviation, for a parameter reflecting pathology, for clinicians who need to make a decision about a surgery. It seems they have a hard time taking the standard deviation into account, and it seems the interpretation of a cumulative survival like statistic (e.g. % above threshold, based on samples from inference) would be easier.

I couldn’t help but think this has been talked about before, but I couldn’t find much on Google about it.


Why not stick to probabilities? Then you can just use quantiles of samples and say there’s a X% chance of the parameter being in a range. Plenty of clinical tests come with a reference range (for better or worse) so you could go that route too. You probably have a good idea (or parameters) describing what the reference range would be.
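A minimal sketch of this suggestion, assuming posterior draws of the parameter are available as a one-dimensional array. The draws below are simulated stand-ins, and the threshold of 2.0 and the helper names `prob_in_range` and `central_interval` are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for posterior draws of the pathology parameter (simulated, not real data).
samples = rng.normal(loc=3.0, scale=1.0, size=4000)


def prob_in_range(draws, low=-np.inf, high=np.inf):
    """Fraction of posterior draws falling in [low, high]."""
    draws = np.asarray(draws)
    return float(np.mean((draws >= low) & (draws <= high)))


def central_interval(draws, mass=0.9):
    """Central interval containing roughly `mass` of the draws."""
    tail = (1.0 - mass) / 2.0
    lo, hi = np.quantile(draws, [tail, 1.0 - tail])
    return float(lo), float(hi)


threshold = 2.0  # hypothetical clinical threshold
p_above = prob_in_range(samples, low=threshold)
lo, hi = central_interval(samples, mass=0.9)

print(f"{100 * p_above:.0f}% chance the parameter exceeds {threshold}")
print(f"90% chance the parameter lies between {lo:.1f} and {hi:.1f}")
```

The two printed sentences are the kind of statement that can be handed to a clinician directly, without asking them to interpret a standard deviation.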


I think my colleagues are a little stuck on the mean+/-std view of things, and they’re worried about probabilities not communicating certainty or lack thereof. As an analogy, perceptual studies show that pie charts are terrible, as is the jet colormap, and those studies often suggest alternatives for presenting the same quantities in a better way. I guess my question is less about the presentation being quantitatively faithful, and more about successfully communicating a notion of uncertainty to non-statisticians.


The book [Statistical Rethinking] would help produce such a presentation.


Thanks for the recommendation, the book looks like an interesting read.


What do you do with that view other than mentally construct a (arbitrarily centered) credible interval?

References to perceptual studies would sure be useful.


I agree with you on both counts. In particular, I’m curious what happens when a statistically naive subject has to choose between alternatives where a mean and standard deviation are provided for some notion of pathology (e.g. remove tissue A w/ pathology 3 +/- 1 vs. tissue B w/ 2 +/- 2), compared with a presentation in probabilities (e.g. remove tissue A w/ pathology 80% vs. tissue B 75%). Which presentation results in choices with better accuracy and/or precision with respect to the data distribution?
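To make the two presentations concrete, here is a rough sketch that converts a mean +/- SD summary into a probability of exceeding a threshold. It assumes the parameter is approximately normal, which the thread does not state, and the thresholds 2.5 and 5.0 are hypothetical. It is not meant to reproduce the 80% and 75% figures in the example above:

```python
from math import erf, sqrt


def prob_above(mean, sd, threshold):
    """P(X > threshold) for X ~ Normal(mean, sd)."""
    z = (threshold - mean) / (sd * sqrt(2.0))
    return 0.5 * (1.0 - erf(z))


tissues = {"A": (3.0, 1.0), "B": (2.0, 2.0)}  # mean, SD from the example
for threshold in (2.5, 5.0):  # hypothetical thresholds
    for name, (mean, sd) in tissues.items():
        p = prob_above(mean, sd, threshold)
        print(f"threshold {threshold}: tissue {name} exceeds it with probability {p:.2f}")
```

Under these assumptions tissue A is more likely than B to exceed the lower threshold, but B is more likely to exceed the higher one because of its larger spread. That reversal is hard to read off from the mean +/- SD pairs alone, which is one argument for presenting the probabilities directly.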


for clinicians who need to make a decision about a surgery

Why not use a decision analytic framework, particularly if making the decision is the goal?

Here is a simple setup (not sure if it is relevant to your case, but should illustrate what I mean)

Say you have a two-dimensional (simplifying) surface which represents an area of the brain where the signal will likely propagate. Let’s say a surgeon has to decide where in that area she should make an incision. Some parameter vector “colors” that region with the likely propagation pattern, say like a weather map. Over this region, we create a grid of possible incisions. Now we need to choose the specific one. Can we not define a loss function such that the benefit of making an incision is offset by the risk/cost? If so, we could compute the expected value in some utility/loss units for each incision line and pick the one with the max/min expected value?

The benefit here is that we move from the probability scale to risk/benefit scale, which is more natural for practitioners to think about. The challenge, of course, is to specify an understandable loss function, which I can see could be hard in this case. (In finance, it is sometimes easier, particularly when the utility is linear in payoffs, which is often the case, in which case we can just express everything in expected $$ lost/gained.)
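The setup above could be sketched roughly as follows. Everything here is a hypothetical illustration: the posterior draws are simulated, there are only five candidate incisions, the cost vector is invented, and the utility is taken to be linear (pathology removed minus cost), as in the finance case mentioned above:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_candidates = 2000, 5

# Hypothetical posterior draws of the pathology each candidate incision would remove.
# Rows are posterior draws, columns are candidate incisions.
assumed_means = np.array([1.0, 2.0, 3.0, 2.5, 0.5])
pathology = rng.normal(assumed_means, 1.0, size=(n_draws, n_candidates))

# Hypothetical risk/cost of each candidate, in the same units as the benefit.
cost = np.array([0.5, 1.0, 2.5, 1.0, 0.2])


def utility(pathology_removed, cost):
    """Assumed linear utility: benefit of removed pathology minus cost."""
    return pathology_removed - cost


# Average the utility over posterior draws, separately for each candidate.
expected_utility = utility(pathology, cost).mean(axis=0)
best = int(np.argmax(expected_utility))

for k, eu in enumerate(expected_utility):
    print(f"candidate {k}: expected utility {eu:.2f}")
print(f"candidate with the highest expected utility: {best}")
```

The point of the sketch is only the mechanics: evaluate the utility on every posterior draw, average, and compare candidates on the utility scale rather than on the probability scale. Specifying a credible cost vector is the hard part and is not addressed here.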


Hi Eric, thanks for chiming in!

You’re absolutely right: the practitioners we work with are trying to minimize the risk/benefit ratio. The challenge is in the loss function, as perhaps in contrast to other fields, the model makes predictions only about pathology (benefit via surgery), but cannot predict loss of function (risk). A secondary reason we don’t want to make direct recommendations for surgery is that it would likely change the class of the clinical trial, with a heavier regulatory burden ensuing.

That said, this would be a great suggestion for the presurgical implantation of electrodes, where the loss function is more obvious. We don’t have enough data to pursue that in depth for the moment, though.


Once you leave p-values and point estimates, you are on territory which is not too familiar for many - at least this is my experience with this audience. Expressing things in a way which connects to your audience is critical. So quoting probabilities about crossing thresholds relevant for safety or efficacy is useful. This again introduces black-and-white threshold thinking, but if used carefully this can be useful.

It is really hard - and I am often confused by the fact that the outputs of my models are reduced to their point estimate while I always plot the uncertainty along with it. If anything, the 95% interval is given some credit. I don’t know a good solution yet. I thought about not plotting the point estimate anymore and rather showing 95% and 50% intervals - but this is too drastic. I am curious if @andrewgelman or others have good suggestions. The least you can do is always communicate the sample size going into whatever you show.

… another attempt is to escape all this using plots, but those usually also show the median estimate or whatnot (and people ask for tables). However, thinking about the topic is crucial. The approach by @ericnovik to connect our models with real-world consequences is a good one, but we should certainly take out all technicalities. Bayesian models are often complex for many reasons, but we need to adhere to the KIS (keep it simple) concept by all means.

Not sure if this helps…


Yep this is a bummer. As you suggested, translating uncertainty to more familiar quantities like confidence intervals or p-values seems like it’s the best that can be done.


What Eric said. If your collaborators are interested in using this inference for decision making, then it makes sense to take this step. Point estimates, standard errors, and uncertainty intervals don’t do that, so at best they are intermediate steps in this process.

Yes, to make decision recommendations you need to put numerical values on the potential outcomes of the study–but such a step is helpful in thinking through your problem in any case. What’s important is that you make such assumptions clear.

It can be a good idea to try different values for these assumed decision-relevant parameters, then you can show your colleagues the dependence between these inputs and the recommended decisions.
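One way to show this dependence is a small sweep over the assumed value. In the sketch below the posterior draws are simulated, and `cost_of_surgery` and `benefit_per_unit` are hypothetical decision-relevant inputs, not quantities from the thread:

```python
import numpy as np

rng = np.random.default_rng(2)
pathology = rng.normal(3.0, 1.0, size=4000)  # simulated posterior draws


def recommend(draws, cost_of_surgery, benefit_per_unit=1.0):
    """True if the expected net benefit of operating is positive."""
    expected_net = benefit_per_unit * np.mean(draws) - cost_of_surgery
    return bool(expected_net > 0)


# Sweep the assumed cost and record the recommendation for each value.
costs = [1.0, 2.0, 3.5, 5.0]
decisions = {c: recommend(pathology, c) for c in costs}

for c, operate in decisions.items():
    print(f"assumed cost {c}: {'operate' if operate else 'do not operate'}")
```

A table like this makes the assumption explicit: colleagues can see at which assumed cost the recommendation flips and judge whether that value is plausible.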

Another potentially useful step is to make the decision recommendation based on your point estimate (for example, posterior mean or median) of the parameters, and compare this to your analysis using the full posterior distribution. In many cases these will be similar.
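A rough sketch of that comparison, under an assumed threshold-type utility (a gain if pathology exceeds a threshold, a larger harm otherwise). The utility, its numbers, and both simulated posteriors are hypothetical:

```python
import numpy as np


def utility_operate(theta, threshold=2.0, gain=1.0, harm=3.0):
    """Hypothetical utility of operating: gain if theta > threshold, else -harm."""
    return np.where(np.asarray(theta) > threshold, gain, -harm)


def decide_plugin(draws):
    """Decide using only the posterior mean as a point estimate."""
    return bool(utility_operate(np.mean(draws)) > 0)


def decide_full(draws):
    """Decide using the utility averaged over the full posterior."""
    return bool(np.mean(utility_operate(draws)) > 0)


rng = np.random.default_rng(3)
narrow = rng.normal(3.0, 0.3, size=4000)  # simulated, well-identified posterior
wide = rng.normal(3.0, 2.0, size=4000)    # simulated, noisy posterior

for name, draws in [("narrow", narrow), ("wide", wide)]:
    print(f"{name}: plug-in says operate={decide_plugin(draws)}, "
          f"full posterior says operate={decide_full(draws)}")
```

With the narrow posterior the two approaches agree, which matches the remark that they are often similar. With the wide posterior and an asymmetric utility they can disagree, and that disagreement is itself a useful thing to show collaborators.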

Finally, I recommend that you be clear on how your prior distribution impacts your posterior inference and decision recommendations. If your data comes from a noisy study, the prior can make a big difference in giving more realistic forecasts.


If we cannot do a loss function, then I think what Krzysztof is suggesting is better than communicating standard deviations. I really don’t like SDs, since it is not easy even for professionals to figure out what they mean unless you are dealing with (approximately) normal distributions.

In our projects, I usually force event probabilities on the consumers of our analysis (they don’t always take the medicine, but I feel like it is on us to show that the method is useful). The way we do it is by showing an interactive dashboard where they manipulate the values of some theta, and we dynamically compute the probability of a parameter being greater than or equal to some value, or in some range of values. We have done this in PKPD, where, say, we manipulate some rate parameter (from PK) and compute the probability of some biomarker (PD) being greater or less than a certain value.

As a side note, and from my experience in finance, where some of the models in use are pretty complex, users understood them by using the models repeatedly until they got an intuitive feel for a) the sensitivities of the model to changes in inputs; and b) how model predictions compare to observed data, recognizing where the model is weak and where it is strong. The famous Black-Scholes model for option pricing has been internalized this way by traders who do not understand (or care about) the differential equations that give rise to the model.