I’m completely new to the Bayesian approach and I have a question that might be quite stupid (or even wrong). How do you report posterior probability distributions, i.e., put them into words? I used brms for parameter estimation (I suppose), and I got posterior distributions of the mean difference between conditions. I searched around, and it seems I can report either median + CI or mode + HDI to describe the probability mass, but I’m just not sure how to write it in the text.
Can I describe it as “Given the data, there is 80% (or 95%) certainty that there is a difference between the A and B conditions (80%(or 95%)HDI […, …], Posterior Mode = )”?
Or “We are 80% certain that there is a difference between the conditions (80%HDI […, …], Posterior Mode = )”?
What is the most comprehensive way to write it?
Please help me out (or correct me) and thank you in advance!!
It’s hard to give a one-size-fits-all answer here, but here’s some relevant literature (somewhat psychology-centric) to get you started:
If you have some sample data and model, and a short example paragraph of your results, feel free to post here and I’ll give feedback when I’ve a moment.
I wouldn’t get hung up on the HDI, etc. These kinds of things try to discretize the posterior in ways that don’t always make a lot of difference: the coverage of a 95% central interval and a 95% HDI is the same.
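To make the comparison concrete, here’s a small sketch (Python with made-up draws standing in for brms output — the numbers are purely illustrative) that computes both a 95% central interval and a 95% HDI from the same posterior sample:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up posterior draws for a mean difference (stand-in for brms output);
# deliberately skewed so the two intervals differ visibly.
draws = rng.gamma(shape=2.0, scale=1.0, size=4000)

# 95% central (equal-tailed) interval: cut 2.5% off each tail.
central = np.quantile(draws, [0.025, 0.975])

# 95% HDI: the narrowest interval that contains 95% of the draws.
sorted_draws = np.sort(draws)
n = len(sorted_draws)
k = int(np.floor(0.95 * n))
widths = sorted_draws[k:] - sorted_draws[:n - k]
i = int(np.argmin(widths))
hdi = (sorted_draws[i], sorted_draws[i + k])

print("95% central interval:", central)
print("95% HDI:             ", hdi)
```

Both contain 95% of the posterior mass; for a skewed posterior the HDI just sits a little lower and narrower than the central interval.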
This is problematic in that it’s based on the assumption that you have modeled the true data-generating process. I’d just talk about estimates made with this model, and then I’d try things like cross-validation or held-out evaluations to validate the model externally.
Save that for an appendix. Usually you want to pick out some summary statistics that make a point. A typical thing to do is report central (or HDI) intervals, means, and medians. Just be careful: there’s a lot of uncertainty around the endpoints of a 95% interval when estimating with MCMC.
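On that last point, a toy illustration (Python, simulated draws rather than real MCMC output): a 97.5% tail quantile is estimated from far fewer draws than the median, so the interval endpoints wobble much more across batches of draws:

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated "posterior draws" split into 40 batches of 1000, to mimic
# rerunning the same sampler many times.
draws = rng.normal(0.0, 1.0, size=(40, 1000))

medians = np.median(draws, axis=1)
endpoints = np.quantile(draws, 0.975, axis=1)

# The 97.5% endpoint sits in the tail, where draws are sparse, so it
# varies far more from batch to batch than the median does.
print("sd of median estimate:        ", medians.std())
print("sd of 97.5% endpoint estimate:", endpoints.std())
```

The endpoint’s batch-to-batch standard deviation comes out roughly twice the median’s, which is why the interval limits deserve extra caution.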
Much appreciated!! Just two follow-up questions:
1. If my posterior predictive checks look good, does that mean the model is alright? Or do I only know that after cross-validating my model (I suppose I can use the loo package)…
2. Can I make some kind of decisions/inferences based on where my posterior distributions are located? I know that posterior intervals and point estimates are only for descriptive purposes and that I can’t make inferences based on them alone. The probability mass is what matters, but I still need to articulate it…
Some background info: my model is fairly simple, just dv ~ Condition, and I used the skew-normal family.
The study is a between-subjects design with four conditions (one of them a control group), and I wanted to see whether there is a difference between the experimental groups and the control.
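For a design like this, one way to put the results into words is to work directly with posterior draws of each condition mean and summarize the differences from control. A hedged sketch with made-up draws (group names and all numbers are hypothetical; with a real brms fit you’d extract the actual posterior draws instead of simulating them):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical posterior draws of each condition's mean.
control = rng.normal(5.0, 0.2, 4000)
groups = {"A": rng.normal(5.5, 0.2, 4000),
          "B": rng.normal(5.1, 0.2, 4000),
          "C": rng.normal(4.9, 0.2, 4000)}

for name, g in groups.items():
    diff = g - control  # posterior of (group mean - control mean)
    lo, hi = np.quantile(diff, [0.025, 0.975])
    print(f"{name} - control: median = {np.median(diff):.2f}, "
          f"95% interval = [{lo:.2f}, {hi:.2f}], "
          f"P(diff > 0) = {np.mean(diff > 0):.2f}")
```

Each line then maps directly onto a sentence like “the posterior median difference between A and control was 0.50, 95% interval [lo, hi]”, which reports the size of the difference rather than just whether one exists.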
You need a statistic of the data to estimate with a posterior predictive check. They’re designed to test the statistic, not whether the whole inference is correct.
PPCs check within data—that is, they fit the data, then check how well the data fit the model.
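A minimal sketch of that idea, using a hypothetical skewed dataset and a deliberately misspecified normal model (plug-in mean/sd standing in for real posterior draws): simulate replicated datasets, compute a test statistic on each, and compare to the observed value:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.gamma(2.0, 1.0, 200)  # hypothetical skewed "observed" data

def skewness(x):
    x = np.asarray(x)
    return float(np.mean((x - x.mean()) ** 3) / x.std() ** 3)

# Misspecified normal model with plug-in parameters: simulate replicated
# datasets and evaluate the same statistic on each.
t_obs = skewness(y)
t_rep = np.array([skewness(rng.normal(y.mean(), y.std(), len(y)))
                  for _ in range(1000)])

# PPC p-value for the skewness statistic: values near 0 or 1 mean the
# model cannot reproduce this feature of the data.
p = float(np.mean(t_rep >= t_obs))
print(f"observed skewness = {t_obs:.2f}, PPC p-value = {p:.3f}")
```

Here the check flags the skewness the normal model misses, but a statistic the model does capture (say, the mean) would look fine — which is the sense in which a PPC tests the statistic, not the whole inference.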
Cross-validation is an out-of-sample test, or at least an out-of-training-data test. LOO is really only going to help you compare two models: it doesn’t give an interpretable measure of a single model unless you know what the expected log predictive density should or could be.
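To illustrate why only the *difference* is interpretable, here’s a crude held-out comparison in Python (method-of-moments plug-in fits standing in for real posteriors, and a simple train/test split standing in for PSIS-LOO from the loo package):

```python
import math
import numpy as np

rng = np.random.default_rng(4)
y = rng.gamma(2.0, 1.0, 2000)  # hypothetical skewed outcome
train, test = y[:1000], y[1000:]

def normal_lpd(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def gamma_lpd(x, shape, scale):
    return ((shape - 1) * np.log(x) - x / scale
            - shape * np.log(scale) - math.lgamma(shape))

# Crude plug-in "fits" (method of moments) as stand-ins for posteriors.
mu, sigma = train.mean(), train.std()
scale_hat = train.var() / train.mean()
shape_hat = train.mean() / scale_hat

lpd_normal = normal_lpd(test, mu, sigma).sum()
lpd_gamma = gamma_lpd(test, shape_hat, scale_hat).sum()
print("held-out lpd, normal model:", lpd_normal)
print("held-out lpd, gamma model: ", lpd_gamma)
print("difference (gamma - normal):", lpd_gamma - lpd_normal)
```

Neither absolute log predictive density tells you whether its model is “good”, but the positive gap says the gamma model predicts held-out data better than the normal one — the same way you’d read an elpd difference from loo.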
Following Andrew Gelman, I think it helps to shift your thinking from “is there a difference” to “estimate the size of the difference”. Significance is basically a function of the magnitude of the effect and the amount of data you have.