Priors for Bernoulli GLMM


I am looking for a good resource or tutorial on building informative priors. I have data consisting of presence/absence outcomes for a certain bacterium. I have two sample types and am interested in evaluating whether this bacterium is more likely to appear in one than the other. I have access to data that could help me build an informative prior for the intercept, but (1) I’m not sure of the exact steps for doing this and (2) I would like to be able to point to a (preferably well-regarded) academic work as the basis for my methodology. I apologize if this is a simple question or if this has been answered elsewhere. I’ve been looking but can’t seem to find a paper or tutorial that goes in depth on using other data to build informative priors. I’m pretty new to the Bayesian world, so please also point out if there’s something majorly wrong in my understanding of the above. Any help, guidance, or advice would be greatly appreciated!


Hi. Do you have some example or simulated data? And do you already have a model in mind? Are you using R or Python? There are a few R packages, like rstanarm, brms, and rstan, that can run models like this.

Hello, thank you for taking the time to help me. I’m using the brms package in R.

The data consist of binary outcomes (present/absent) from cultured environmental samples. At each sampling location, multiple environmental samples were taken, and the question of interest is whether certain environmental substrates yield higher rates of this organism than others. The model would therefore be a GLMM, with presence/absence of the organism as the response variable, sample type as the explanatory variable, and sampling location as the grouping factor (random effect). I have access to a broad survey across a larger sampling region, which would be helpful in determining an informative prior on the intercept (i.e., the baseline chance of getting a positive result from a generic sample), but I’m not sure how to translate that into a prior. I was wondering whether any resources or tutorials exist for crafting informative priors from prior analyses (most resources I’ve found just mention that it’s possible rather than detailing the process).

I can generate data if that would be helpful in visualizing the issue though - just let me know if that’s needed for this type of question

I don’t think there are particular tutorials on choosing priors unless someone else has done a similar project to yours. The priors usually come from domain knowledge. That said, the default priors in brms are a good starting point, and you could get the model up and running just to see what it looks like.
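
For what it’s worth, here is one rough way survey counts could be turned into a prior on the intercept. This is only a sketch under stated assumptions, not an established recipe: the counts (40 positives out of 200) are made up, the function names are hypothetical, and the `widen` factor is an arbitrary choice to reflect that the survey covers a different region than your samples. It puts the survey prevalence on the log-odds scale (the scale of a Bernoulli GLMM intercept with a logit link), uses the usual large-sample standard error of a log-odds, and then checks what prevalence range the resulting normal prior implies.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Probability corresponding to a log-odds value."""
    return 1 / (1 + math.exp(-x))


def intercept_prior_from_survey(positives, total, widen=2.0):
    """Return (mean, sd) of a normal prior on the logit-scale intercept.

    Adds 0.5 to each cell so the log-odds stays finite when a count is 0.
    `widen` (an assumption, not a rule) inflates the sd because the survey
    is only indirectly relevant to the new study.
    """
    pos = positives + 0.5
    neg = (total - positives) + 0.5
    mean = math.log(pos / neg)
    se = math.sqrt(1 / pos + 1 / neg)  # large-sample SE of a log-odds
    return mean, widen * se


def implied_prevalence_interval(mean, sd, z=1.645):
    """Central ~90% interval of the prior, mapped to the probability scale."""
    return inv_logit(mean - z * sd), inv_logit(mean + z * sd)


# Hypothetical survey: 40 positives in 200 samples.
mean, sd = intercept_prior_from_survey(40, 200)
low, high = implied_prevalence_interval(mean, sd)
print(f"prior: normal({mean:.2f}, {sd:.2f}) on the logit scale")
print(f"implied baseline prevalence, ~90% interval: {low:.2f} to {high:.2f}")
```

The mean and sd could then be passed to brms with something like `set_prior("normal(<mean>, <sd>)", class = "Intercept")`. Keep in mind that brms applies the `Intercept` prior to the intercept with predictors centered, so it is worth checking that this matches what you mean by a "generic sample".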

My workflow for priors is:
1. Plot the data.

2. Run a model with weakly informative priors (but know what those are).

3. Use some combination of looking at the data, previous research, expert opinion, field notes, etc. to set priors (and note in the R script what each prior means and its source).

4. Run a new model with the more informed priors.

5. Go back and simulate data, run the better model, and check to make sure I can recover my parameters. Some folks do this first; I get impatient and want to look at my data. It’s better to do this first. Really, way better to do this first.

6. If the model with the simulated data fails, check the assumptions we made about the model.
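
The "simulate data" step could look roughly like the sketch below. It is written in plain Python just to show the data-generating process; all parameter values (intercept, sample-type effect, location sd, sample sizes) are invented for illustration, and the function and column names are hypothetical. Each location gets its own random offset on the logit scale, and each sample is a Bernoulli draw.

```python
import math
import random


def inv_logit(x):
    """Probability corresponding to a log-odds value."""
    return 1 / (1 + math.exp(-x))


def simulate_presence_absence(intercept=-1.4, type_effect=0.8,
                              location_sd=0.5, n_locations=20,
                              n_per_type=5, seed=1):
    """Simulate presence/absence data from a Bernoulli GLMM.

    logit(p) = intercept + type_effect * sample_type + location offset,
    with location offsets drawn from Normal(0, location_sd).
    All default values are made up for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for location in range(n_locations):
        offset = rng.gauss(0.0, location_sd)  # random intercept for this location
        for sample_type in (0, 1):
            p = inv_logit(intercept + type_effect * sample_type + offset)
            for _ in range(n_per_type):
                rows.append({
                    "location": location,
                    "sample_type": sample_type,
                    "present": int(rng.random() < p),
                })
    return rows


def prevalence_by_type(rows):
    """Raw proportion of positive samples for each sample type."""
    out = {}
    for t in (0, 1):
        values = [r["present"] for r in rows if r["sample_type"] == t]
        out[t] = sum(values) / len(values)
    return out


rows = simulate_presence_absence()
print(len(rows), "simulated samples")
print("raw prevalence by sample type:", prevalence_by_type(rows))
```

Written out to a file, data like this could be fit in brms with a formula along the lines of `present ~ sample_type + (1 | location)` and `family = bernoulli()`, and the posterior compared against the values used in the simulation.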

I’ll write up a brief example from our riparian habitat model.

Thank you for your detailed answer! When you say recover your parameters - do you mean the parameters you used to simulate the data? I’m at the simulation stage but I think I’ve gotten a bit lost in what to do next in the process. I’d definitely be interested in the example you mention if you wouldn’t mind taking up more of your time to write it up!

Yes, the parameters used to simulate the data. So if in my simulated data I have something like depth to groundwater predicts tree productivity, and I have the depth-to-groundwater effect set at 0.5 per unit of productivity, my uncertainty intervals should include 0.5 for that predictor most of the time (a 50% interval should cover it in roughly half of repeated simulations, a 95% interval in nearly all of them). If not, it’s likely my model is messed up.
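
To make the idea of "recovering" a parameter concrete, here is a toy coverage check. To keep it self-contained it swaps the full GLMM for a much simpler stand-in: a single prevalence with a flat Beta(1, 1) prior, where the posterior is Beta(1 + positives, 1 + negatives) and can be sampled directly. The true prevalence, sample size, and number of repetitions are arbitrary assumptions. The same loop (simulate, fit, check whether the interval contains the true value) applies to a brms model, just with a slower fitting step.

```python
import random


def central_interval(sorted_draws, level):
    """Equal-tailed interval from already-sorted posterior draws."""
    last = len(sorted_draws) - 1
    lower = sorted_draws[int((1 - level) / 2 * last)]
    upper = sorted_draws[int((1 + level) / 2 * last)]
    return lower, upper


def coverage(true_p=0.3, n=100, n_sims=400, n_draws=500, seed=1):
    """Fraction of simulated datasets whose interval contains true_p."""
    rng = random.Random(seed)
    hits = {0.5: 0, 0.95: 0}
    for _ in range(n_sims):
        # Simulate one dataset with a known prevalence.
        k = sum(rng.random() < true_p for _ in range(n))
        # Conjugate posterior under a Beta(1, 1) prior.
        draws = sorted(rng.betavariate(1 + k, 1 + n - k)
                       for _ in range(n_draws))
        for level in hits:
            lower, upper = central_interval(draws, level)
            hits[level] += int(lower <= true_p <= upper)
    return {level: count / n_sims for level, count in hits.items()}


result = coverage()
print("share of simulations where the interval contains the true value:")
for level, share in result.items():
    print(f"  {int(level * 100)}% interval: {share:.2f}")
```

If the model matches the simulation, the 50% interval should contain the true value in about half of the runs and the 95% interval in nearly all of them. Coverage far below that is a sign that something in the model or the priors is off.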

Over here, in the thread "Stan simple model for pharmacokinetics", @martinmodrak talks about another way to set priors if you are missing domain knowledge.