Suppose I gather some data from two different groups. My boss tells me I need to do some NHST, so I use Stan to fit a normal distribution to each group in isolation, draw from each of the two posterior predictive distributions, and pass those draws to `ks.test()` to get a p-value. The p-value is not significant, so I conclude ‘these posterior predictive draws are consistent with having been generated by the same posterior’. My boss is happy.

I’m wondering whether this interpretation of a non-significant p-value seems correct in this context, and more generally, whether this strikes people as a defensible use of the Kolmogorov-Smirnov test.

I haven’t been able to find much written on this topic, but I see one comment on this forum alluding to the use of the KS test in posterior predictive checking.

This seems like a bit of boss pleasing that doesn’t have a basis in sound statistical practice. Assuming that the data are different for the two groups, the posteriors are different, and drawing enough posterior predictive samples would be sufficient to observe that. In other words, the p-value here mostly reflects how many posterior predictive draws you chose to take, not how similar the two groups are. There are better measures than a KS test statistic (or corresponding p-value) to characterize how similar or different two distributions are based on samples from those distributions, but using those methods might run afoul of your boss since they would not be NHSTs.
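To make the sample-size point concrete, here is a minimal sketch in plain Python (an illustration, not anyone's actual analysis). It assumes two hypothetical posterior predictive distributions, Normal(0, 1) and Normal(0.1, 1), draws from them directly as a stand-in for real Stan output, and compares the two-sample KS statistic to the usual large-sample 5% critical value 1.358 * sqrt((n + m) / (n * m)). The function names and the size of the shift are made up for the example.

```python
import bisect
import math
import random


def ks_statistic(xs, ys):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in xs + ys:
        # Empirical CDFs evaluated at v (fraction of draws <= v).
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))


def ks_rejects(n_draws, shift=0.1, seed=1):
    """Draw n_draws from each hypothetical predictive distribution and test."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]      # assumed group A
    b = [rng.gauss(shift, 1.0) for _ in range(n_draws)]    # assumed group B
    d = ks_statistic(a, b)
    crit = ks_critical_value(n_draws, n_draws)
    return d, crit, d > crit


for n in (100, 1_000, 100_000):
    d, crit, reject = ks_rejects(n)
    print(f"n={n:>6}  D={d:.4f}  critical={crit:.4f}  reject={reject}")
```

The critical value shrinks like 1/sqrt(n), while D settles near the true gap between the two CDFs (about 0.04 for this shift), so with enough draws the test rejects no matter how small the real difference is.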

The standard way to apply NHST to your problem would likely be to test against the null hypothesis that the group means are identical via some parametric model.

Thanks Jacob! This is a helpful sanity check.

> There are better measures than a KS test statistic (or corresponding p-value) to characterize how similar or different two distributions are based on samples from those distributions

Are you thinking of things like degree of overlap and probability of superiority? Curious whether any others would be top-of-mind for you.

My first thought might be the KL divergence. You could try to compute the KL divergence between the two posterior predictive distributions themselves (though I don’t know much about techniques for estimating KL divergence from samples), or you could compute the posterior distribution (over iterations) of the KL divergence between the per-iteration predictive distributions, which for a normal model are known analytically given each iteration's parameter draws.
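As a rough sketch of the second option, assuming a normal model for each group: for each iteration, plug that iteration's (mu, sigma) draws for the two groups into the closed-form KL divergence between two normals, which gives one KL value per iteration and hence a posterior distribution for the divergence. The draws below are simulated placeholders, not output from a real fit, and the function names are made up for illustration.

```python
import math
import random


def kl_normal(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P || Q) for P = Normal(mu_p, sigma_p), Q = Normal(mu_q, sigma_q)."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p**2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q**2)
        - 0.5
    )


def kl_posterior(draws_a, draws_b):
    """One KL value per posterior iteration; each draw is a (mu, sigma) tuple."""
    return [kl_normal(ma, sa, mb, sb) for (ma, sa), (mb, sb) in zip(draws_a, draws_b)]


# Hypothetical stand-ins for (mu, sigma) draws extracted from two separate Stan fits.
rng = random.Random(42)
draws_a = [(rng.gauss(0.0, 0.05), abs(rng.gauss(1.0, 0.03))) for _ in range(4000)]
draws_b = [(rng.gauss(0.3, 0.05), abs(rng.gauss(1.2, 0.03))) for _ in range(4000)]

# Summarize the posterior of the divergence with a median and a 90% interval.
kl = sorted(kl_posterior(draws_a, draws_b))
median = kl[len(kl) // 2]
lower, upper = kl[int(0.05 * len(kl))], kl[int(0.95 * len(kl))]
print(f"KL(A || B): median={median:.3f}, 90% interval=({lower:.3f}, {upper:.3f})")
```

Keep in mind that KL divergence is asymmetric, so KL(A || B) and KL(B || A) will generally differ; which direction (or a symmetrized version) makes sense depends on the question being asked.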
