Bayesian Hypothesis testing

Hello. I’m new to the “Bayesian” world and wanted to ask how hypothesis testing is done properly since p-values are frowned upon amongst Bayesians. I have heard about the Bayes Factor but didn’t see a practical example yet.

Let’s say, I have the model:

y = alpha + beta * x

And I want to know if predictor x has a significant outcome on y. How do I go on from here?
Also should I do this with an R function or directly code it into my Stan file?

Here’s a minimum viable code:

R part:

## Load Data

data("radon", package = "rstanarm")
data <- radon

# Seperate variables
logradon <- data$log_radon
floorind <- data$floor
N <- length(logradon)

# Lists for STAN
cplist <- list(Y = logradon,
               X = floorind,
               N = N)

Stan part

data {
	int<lower=1> N;
	vector[N] X;
	vector[N] Y;

parameters {
  real alpha;
	real beta;
	real<lower=0> sigma_y;

model {
  vector[N] mu_y;

	for (i in 1:N)
	mu_y[i] = alpha + beta * X[i];
	Y ~ normal(mu_y, sigma_y);
	sigma_y ~ cauchy(0, 2.5);
	alpha ~ normal(0, 5);
	beta ~ normal(0, 5);

  //vector[N] log_lik;
  //for (i in 1:N) log_lik[i] = normal_lpdf(Y[i]| alpha + beta * X[i], sigma_y);

generated quantities {
  vector[N] y_pred;
  vector[N] indiv_squared_errors;
  real <lower = 0> sum_of_squares;
  real <lower = 0> root_mean_squared_error;

  for (i in 1:N) {
    y_pred[i] = alpha + beta * X[i];
  indiv_squared_errors[i] = (Y[i] - y_pred[i])^2;
  sum_of_squares = sum(indiv_squared_errors);
  root_mean_squared_error = sqrt(sum_of_squares/N);

You can check if the credible interval of the posterior distribution of your parameters do not include 0.


It seems like the values are far away from 0 but is that “sophisticated” enough to consider them as significant variables?

Yes, as you are working directly with probabilities that’s a pretty direct statement.


Btw in classic inference, when you do an hypothesis test to check if the parameter \beta =0, is equivalent to check if 0 is in the confidence interval of \beta.

So as @stemangiola recommends, checking credible intervals is a similar procedure.

Plus, with credible intervals (CI) you actually see the probability that the real \beta (not the estimated one) is in the CI


Can you clarify this point?

If you have a sample X= X1, X2, X3,…,Xn and a unknown parameter beta then the posterior is

P(beta/ X) the probability that a beta occurs now that you have incorporated the sample info.

A credible interval CI by definition is

P(beta belongs to CI / X)

So a credible interval at 95% can be tell as:
The probability the beta belongs to CI is 0.95

But in classic inference a confidence interval IC is

P( G(X) belongs to IC / beta)

That is the probability that a sample statistic(G(X) ) belongs to the IC, conditioned that there is a fixed beta is 0.95

So in classic a IC is the probability that a sample gives you a desired value. And in bayesian a CI is the probability that beta actually belongs to CI.?


Thanks for the good explanation!

Though, I still have a question:

I’ve learned that for frequentists they regard beta as a fixed parameter and for bayesians they regard beta as a random variable. So i can say with a 95% probability the random variable beta is between a given range (credible interval)? (is that a correct statement?)


That’s my understanding. So when I report out a β (depth to groundwater) it often looks like this:
50% uncertainty interval
l-50% UI u-50% UI
-0.01 0.17


Well yes,
CI95% = [1,2]

Is the probability that beta is between 1 and 2 given the used data is 0.95

1 Like

You might be interested in bayestestR - has many functions for Bayesian inference, and lots of documentation.

Suppose that a model has several variables,

y=\beta_0 + \beta_1x_1+...+\beta_n x_n.

How to test the null hypothesis that \beta_1=..=\beta_n=0?

Is there any alternative method for the above null hypothesis?

Comparison of the Bayes factor of two models y=\beta_0 + 0x_1+...+0x_n.
and y=\beta_0 + \beta_1x_1+...+\beta_n x_n. is a method, is there any other primitive method?

For accepting/rejecting whether a variable has an influence on another using the posterior density interval, you might also be interested to see the approach taken by Kruschke, here:

Kruschke, J. K. (2018). Rejecting or accepting parameter values in Bayesian estimation. Advances in Methods and Practices in Psychological Science , 1 (2), 270-280.

Slightly beyond just saying the credible values exclude zero, he suggests identifying a ‘range of practical equivalence’ where there is essentially no real difference - e.g., a very small value away from 0 might be considered as not convincingly rejecting no effect.


Hi, just a few more thoughts on this, as I’ve just written an answer on a similar topic.

The elephant in the room: All the approaches are conditional on the model(s) you use being 100% correct (including priors etc.). Since (except some parts of physics) your model is basically never even roughly correct, model comparison/selection/hypothesis testing - whatever you call it - can be very brittle. Various approaches fail in different contexts, but there is no general way to do this “best”. You need to be roughly correct about the “important” things, but what is “important” is not fixed. A huge question is what your actual final goal is.

Some of the options you have:

Determine range of practical equivalence (Expanding on what @JimBob wrote). Strictly speaking P(\beta_1 = 0) = 0 for all continuous priors on \beta_1 (and we didn’t even start with combining multiple parameters). And that makes sense - nature doesn’t like zeroes, most things have small and/or highly variable effects, but believing the mean effect is exactly zero makes IMHO little sense. But you can use domain expertise to say that e.g. a difference of 0.5 is practically irelevant. P(|\beta_1| < 0.5) and by extension P(|\beta_1| < 0.5 \cap |\beta_2| < 0.5 \cap ... \cap | \beta_n | < 0.5) can be computed directly from posterior samples (but the probability will shrink hugely as you add more variables). If you don’t want to put a strict threshold (which you IMHO shouldn’t), you can compute the probability for a range of thresholds. Or you can compute the posterior distribution of \max_i{|\beta_i|} or of \sum_i \beta_i^2 and just make decisions based on this.

Compare a simpler model to the full model: Separately fit a model with fewer parameters, omitting some of the effects. You can use the loo package to approximate comparison of predictive performance via leave-one-out crossvalidation. Alternatively you could use Bayes factors to do that, but those can be problematic, as they are very sensitive to the priors you use in your model. Some more interesting criticism by Danielle Navarro or Data Colada. (Disclaimer: I’ve never used bayes factors myself). You get relative expected predictive performance (LOO) or improvement in relative KL-divergence to the true process (BF) of each models. Do you care about those?

Think qualitatively Danielle Navarro has a great essay about model selection and how purely mathematical approaches can fail us: Between the devil and the deep blue sea. Checking whether the models satisfy some qualitative properties can also be of interest.

Hope that makes sense and is at least a bit helpful.


Good summary by @martinmodrak. I just recorded (about hour ago) a video discussing these topics
You can skip to the first slide with title Hypothesis testing.