Hi everyone, I’m new to Gaussian Process Regression and Bayesian methods. I’m curious about how to perform hypothesis testing using predictions from a Gaussian Process. For example, suppose the null hypothesis is that \hat{f}(x^{*})=5. I understand that I can simply check whether 5 is within the credible interval. However, I think this approach is more aligned with frequentist testing than with Bayesian methods.
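To make my question concrete, here is a toy sketch of the interval check I mean, using exact GP regression formulas on made-up data (RBF kernel, known noise level, and the test point x* = 2.5 are all just illustrative assumptions):

```python
import numpy as np

# Toy data: noisy observations of sin(x) (made up for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=20)
y = np.sin(X) + rng.normal(0, 0.1, size=20)

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

sigma = 0.1  # observation noise sd, assumed known here
K = rbf(X, X) + sigma**2 * np.eye(len(X))

x_star = np.array([2.5])
k_star = rbf(X, x_star)      # covariances between data and x*
k_ss = rbf(x_star, x_star)   # prior variance at x*

# Standard GP posterior mean and variance at x*
mean = k_star.T @ np.linalg.solve(K, y)
var = k_ss - k_star.T @ np.linalg.solve(K, k_star)
std = np.sqrt(var.ravel())

# The check in question: is 5 inside the central 95% credible interval?
lo, hi = mean - 1.96 * std, mean + 1.96 * std
print(lo[0] <= 5.0 <= hi[0])
```

This is the procedure I would naively use, and my question is whether treating "5 outside the interval" as a rejection is really Bayesian or just frequentist testing in disguise.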
I’ve read about the ROPE (Region of Practical Equivalence), and I’m wondering if ROPE is the appropriate approach in this case. Additionally, I know that Bayes factors can be used for model comparison, but can Bayes factors be applied to hypothesis testing as well?
Hi, @Joyyyyyy: Hypothesis testing isn’t so popular among Bayesian statisticians. What are you hoping to accomplish with a hypothesis test?
You can find some situations in which Bayesian posterior intervals line up with frequentist confidence intervals, but they’re fundamentally different constructions and Bayesian posterior intervals aren’t quite right for creating calibrated hypothesis tests.
If you’re careful, you can use Bayesian methods to create estimates which can be hypothesis tested using frequentist methods, but you have to characterize the variance of the implied frequentist estimator over different data sets to formulate the test.
Bayes factors are for comparing models under the prior predictive distribution, whereas cross-validation is for comparing models under the posterior predictive distribution. They don’t directly correspond to hypothesis tests.
It looks like ROPE is something from Kruschke that depends on highest density intervals. It looks to be as reductive as a standard hypothesis test plus a little fudge factor.
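For what it's worth, the posterior-mass version of the ROPE idea is just a draw-counting exercise. This is a hedged sketch with simulated stand-in draws and a made-up ROPE half-width of 0.1 (Kruschke's actual decision rule compares the HDI to the ROPE, which is a slightly different computation):

```python
import numpy as np

# Simulated stand-ins for MCMC draws of an effect (not real model output)
rng = np.random.default_rng(1)
draws = rng.normal(0.3, 0.2, size=4000)

# Region of practical equivalence around zero; the half-width is arbitrary
rope = (-0.1, 0.1)

# Fraction of posterior draws falling inside the ROPE
in_rope = np.mean((draws > rope[0]) & (draws < rope[1]))
print(f"Pr(effect in ROPE) ≈ {in_rope:.3f}")
```

The "fudge factor" is exactly that half-width: the conclusion can flip depending on how wide you decide "practically equivalent to zero" is.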
@Bob_Carpenter Thank you for your comment. I use a Bayesian method to estimate a treatment effect, and I would like to test whether this treatment effect is zero or not. In this case, do you have any recommendations for hypothesis testing? Alternatively, what is the most popular way to determine whether this treatment effect is zero?
As Bob alludes to above, Bayesian hypothesis testing, in the sense of trying to replicate existing frequentist tests and their long-run error rates, is a complicated and somewhat controversial area. See, e.g., the arXiv preprint 2206.06659, “50 shades of Bayesian testing of hypotheses,” for an overview.
However, if you are using the term “test” somewhat more loosely and simply want to summarize the evidence (or lack thereof) for a given effect size, the simplest approach is to compute the posterior probability that the effect size is greater than some relevant threshold (zero is a common choice, but sometimes it may be more reasonable to pick a clinically relevant value). If you have access to the MCMC draws, you can estimate this probability as the number of draws that exceed the threshold divided by the total number of draws.
This is not a “hypothesis test” in the formal sense of the word but a perfectly fine way of summarizing the estimated effect size.
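Concretely, the draw-counting estimate above is a one-liner. Here is a minimal sketch, where the draws are simulated stand-ins for posterior draws of your treatment effect (in practice you would extract them from your fitted model):

```python
import numpy as np

# Simulated stand-ins for MCMC draws of the treatment effect
rng = np.random.default_rng(2)
draws = rng.normal(0.25, 0.1, size=4000)

# Posterior probability that the effect exceeds the threshold,
# estimated as the fraction of draws above it
threshold = 0.0
p_greater = np.mean(draws > threshold)
print(f"Pr(effect > {threshold}) ≈ {p_greater:.3f}")
```

Swapping in a clinically relevant threshold instead of 0.0 is the same computation, which is part of why this summary is so convenient.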
Sorry to be pedantic here, but this is not quite what hypothesis testing does. When you test against a null hypothesis of an effect being zero and fail to get a significant result (e.g., p ≥ 0.05 or whatever your threshold is), you’re not concluding that the effect is zero. What you’re doing is saying that the data at hand are not sufficient to reject the possibility that it might be zero.
From a probabilistic point of view, if you have a continuous density, any single point has probability zero, even where the density is positive. So you can’t do the Bayesian test \Pr[\alpha \neq 0 \mid y], because it’s always 1. But the interval (0, \infty) is measurable, so you can test \Pr[\alpha > 0]. However, if \alpha is modeled as continuous with support only on non-negative values (e.g., like you might have for a value that’s a concentration), this will again be identically 1.
The bottom line here is that Bayesian stats isn’t really set up for frequentist hypothesis tests. If you want to compare, the more direct thing to do is use posterior intervals in place of confidence intervals (not for hypothesis tests for the reasons mentioned above, but for summarizing inference).