Computing Bayes Factor in hurdle model

Hi all,

I have a question about computing the Bayes factor for a predictor of interest in a hurdle Poisson model. With a linear model, I would fit the model with and without the predictor and pass both to bayes_factor(). A hurdle model, however, effectively fits two models simultaneously, which makes it hard for me to wrap my head around. Is it possible to simply fit the models specified below and calculate the Bayes factor? Would that then be the Bayes factor for adding the predictor to both the zero process and the count process? Or should I instead remove the predictor from only one of the two processes in fit2, giving the Bayes factor for the predictor on that process alone? Or does this approach not work at all in a hurdle model? I am confused at this point.

library(brms)

# save_pars(all = TRUE) is needed so bayes_factor() can run bridge sampling
fit1 <- brm(bf(outcome ~ 1 + predictor + (1 | pp),
               hu ~ 1 + predictor + (1 | pp)),
            data = data, family = hurdle_poisson(),
            save_pars = save_pars(all = TRUE))
fit2 <- brm(bf(outcome ~ 1 + (1 | pp),
               hu ~ 1 + (1 | pp)),
            data = data, family = hurdle_poisson(),
            save_pars = save_pars(all = TRUE))
bayes_factor(fit1, fit2)

Thanks in advance for any help!


Sorry for not getting to you earlier.

Unfortunately, Bayes factors are quite tricky to get right and easy to mess up, and I admit to not understanding them well myself. So there is definitely some risk in using them for models more complex than linear regressions, although in theory they should work even in the hurdle case. Which models to actually compare should be derived primarily from the theory or research question you are investigating; there are no general rules here.

Tagging @Henrik_Singmann for potential additional insights (hopefully he has time).

Here are my current thoughts on some other options beyond Bayes factors that might be easier to reason about: Hypothesis testing, model selection, model comparison - some thoughts

Best of luck with your model!

There is nothing too concrete I can add here. From a purely technical perspective, the code should run (with enough posterior samples), but whether it would lead to meaningful results is an entirely different question.

The first issue is the question of the priors, as our documentation states (under Warning):

Note that the results depend strongly on the parameter priors. Therefore, it is strongly advised to think carefully about the priors before calculating marginal likelihoods. For example, the prior choices implemented in rstanarm or brms might not be optimal from a testing point of view. We recommend to use priors that have been chosen from a testing and not a purely estimation perspective.

So without careful consideration of the appropriate priors, your approach is not recommended.
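To make this concrete, here is a sketch of what explicitly setting priors on the regression coefficients could look like in brms. The specific distributions below are placeholders, not recommendations; what counts as a sensible "testing" prior depends on your scale of measurement and research question.

```r
library(brms)

# Hypothetical priors for illustration only -- choose your own deliberately.
# dpar = "hu" targets the coefficients of the hurdle (zero) part.
priors <- c(
  set_prior("normal(0, 1)", class = "b"),
  set_prior("normal(0, 1)", class = "b", dpar = "hu")
)

fit1 <- brm(
  bf(outcome ~ 1 + predictor + (1 | pp),
     hu ~ 1 + predictor + (1 | pp)),
  data = data, family = hurdle_poisson(),
  prior = priors,
  save_pars = save_pars(all = TRUE)  # required for bridge sampling
)
```

Whatever priors you settle on, use the same ones for the shared parameters of every model you compare, otherwise the Bayes factors will partly reflect the prior differences rather than the predictor.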

In case you have identified appropriate priors, I would probably consider running four different models:

  1. full model (with predictor in both formulas)
  2. predictor in mean formula only
  3. predictor in hurdle formula only
  4. no predictor

Then you can calculate posterior model probabilities across the four models to see which role of the predictor, if any, the data support.
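The four-model comparison above can be sketched as follows. This assumes `data`, `outcome`, `predictor`, and `pp` from the original post, and that priors have already been chosen with testing in mind; `post_prob()` in brms computes posterior model probabilities via bridge sampling (equal prior model probabilities by default).

```r
library(brms)

# save_pars(all = TRUE) is required for bridge sampling; use plenty of
# posterior samples (large iter) so the marginal likelihood estimates stabilize.
m_full <- brm(bf(outcome ~ 1 + predictor + (1 | pp),
                 hu ~ 1 + predictor + (1 | pp)),
              data = data, family = hurdle_poisson(),
              save_pars = save_pars(all = TRUE))

m_mean <- brm(bf(outcome ~ 1 + predictor + (1 | pp),
                 hu ~ 1 + (1 | pp)),
              data = data, family = hurdle_poisson(),
              save_pars = save_pars(all = TRUE))

m_hu   <- brm(bf(outcome ~ 1 + (1 | pp),
                 hu ~ 1 + predictor + (1 | pp)),
              data = data, family = hurdle_poisson(),
              save_pars = save_pars(all = TRUE))

m_null <- brm(bf(outcome ~ 1 + (1 | pp),
                 hu ~ 1 + (1 | pp)),
              data = data, family = hurdle_poisson(),
              save_pars = save_pars(all = TRUE))

# Posterior probability of each model given the data
post_prob(m_full, m_mean, m_hu, m_null)
```

Pairwise Bayes factors between any two of these models (e.g. bayes_factor(m_mean, m_null)) then answer the narrower question of the predictor's contribution to one process at a time.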


Thanks a lot for your responses, Martin and Henrik!
I think I now understand better why Bayes factors may not be the right tool for the question I am trying to answer.
I will read more about model comparison and try to get a clearer sense of how to approach the question at hand.

Again, thank you very much for the help!

