Help understanding and setting informative priors in brms

Note: I’ve marked @martinmodrak’s first post as the solution, but everything he wrote is very informative and helpful!

  • Operating System: Windows 10
  • brms Version: 2.8.0

Hey all,

I’d like some basic guidance on informative priors for a brms model of Weibull-distributed data. I’ve had trouble finding a worked example that a Bayesian neophyte like myself can understand.

I’m modeling ‘percentage of suitable habitat lost’ as a dependent variable for some arboreal lemurs. These percentages are Weibull-distributed. I’d like to know whether traits like body mass, dietary habits (i.e. frugivore, omnivore, folivore), and IUCN Red List threat level (e.g. Least Concern, Endangered) are associated with habitat loss.

Onto the prior: Madagascar has lost 44% of its forest cover in the last 60 years (Vieilledent et al. 2018). Therefore, I expect arboreal species to have lost at least some habitat, maybe 44% on average. However, there are other factors that can influence habitat loss, like human activity and climate change. I want to set pretty weak priors, since my predictors definitely do not capture all the variation.

When I learn about my prior options with get_prior, I get these defaults:

prior                      class
student_t(3, 0, 10)        Intercept
student_t(3, 0, 10)        sd
gamma(0.01, 0.01)          shape

Where I get stumped is expressing my prior knowledge in this format, given the Weibull distribution. Why don’t I set a prior for the population-level effects? Are priors usually the same across intercepts and sd’s, or is that a coincidence? What does that shape prior mean? And finally, what would weakly informative priors for my model, given the known forest loss of 44%, look like?

I really like how many assumptions are made explicit in Bayesian work, so if I’ve made some erroneous ones, please let me know! I’d love to actually understand what’s going on with brms’s defaults and how to make them work for my data.

Below please find some example code to get the default priors and the model I ultimately want to run. Thank you in advance for your time!

library(brms)

# make some dummy data: proportions of habitat lost in (0, 1)
set.seed(1)
# shape and scale chosen so the mean is roughly 0.4
habitat_lost <- rweibull(n = 100, shape = 2, scale = 0.45)
# redraw until all values are valid proportions
while (any(habitat_lost <= 0 | habitat_lost >= 1)) {
  habitat_lost <- rweibull(n = 100, shape = 2, scale = 0.45)
}
mass <- rnorm(n = 100, mean = 2.8, sd = 0.6)
IUCN <- as.ordered(sample(1:5, 100, replace = TRUE,
                          prob = c(0.1, 0.1, 0.2, 0.2, 0.3)))
diet <- sample(c("frugivore", "omnivore", "folivore"), 100,
               replace = TRUE, prob = c(0.35, 0.35, 0.3))
data <- data.frame(habitat_lost, mass, IUCN, diet)


# examine the prior options and the brms defaults
priors <- get_prior(habitat_lost ~ mass + diet + (1 | IUCN), data = data,
                    family = weibull())
priors

# the model I ultimately want to run, with mildly informative priors
full_mod <- brm(habitat_lost ~ mass + diet + (1 | IUCN), data = data,
                family = weibull(), prior = priors, chains = 4,
                iter = 110000, warmup = 10000, thin = 200)

I think the best general approach is “prior predictive checks” (see [1709.01449] Visualization in Bayesian workflow). Some rough guidelines are also in Prior Choice Recommendations · stan-dev/stan Wiki · GitHub, though they are unlikely to apply directly. In the end, you just need to think hard about your data and - in the words of Dan Simpson - “draw pictures and have fear” :-)

If in doubt, run the analysis with multiple prior settings (i.e. “multiverse analysis”) and inquire further if any of your inferences change based on prior choice.
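For instance, a crude version of that sensitivity check might loop over a few prior widths and compare the fitted coefficients. This is only a sketch under my own assumptions (the normal(0, w) form and the widths are placeholders), reusing the formula and data from the question:

for (w in c(0.5, 1, 2)) {
  fit_w <- brm(habitat_lost ~ mass + diet + (1 | IUCN), data = data,
               family = weibull(),
               prior = prior_string(paste0("normal(0, ", w, ")"), class = "b"))
  print(fixef(fit_w))  # do the population-level estimates move with the prior?
}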

That’s just a default, probably chosen to better match what some frequentist packages do. You should set priors yourself.

If you have information that indicates differences, use it. Frequently, when you only have weak arguments for prior choice (e.g. that the coefficients don’t make the estimate blow up), the same arguments apply to all predictors, and hence the same priors do too.

Hard to say without simulation. Note that Andrew is basically against diffuse gamma priors (although in a different setting): Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper).

Hope that helps and best of luck!


Hi Martin,

Thank you so much for your response! The papers you linked are very interesting, and I wish I had more Bayesian background so I could fully enjoy them. If you’ve got the time to help me understand better: these prior predictive checks seem like a great strategy if you don’t already know what your prior should be and you’re worried your prior might fit your observed data poorly. Is that correct? Is it still a good strategy if I do have some (incomplete) idea of what my prior should be (the 44% forest lost)?

In addition to wishing my stats courses hadn’t been so frequentist-focused, I’m also still struggling with how to translate these ideas into R code. If I wanted to run prior predictive checks, what would the code for that look like? Once I feel good about my prior, what should it look like given the Weibull distribution of my dependent variable? Can I still use student_t priors if my data aren’t Gaussian?

Again, thank you so much for your time! I am really excited to actually understand this stuff.

The main idea is that a well-formed Bayesian model is usually generative, i.e. you can simulate data straight from the model. For most models, having priors for all parameters is sufficient to be generative. So say my model is:

Y_i \sim Weibull(u_i, v)
\log(u_i) = \alpha + \beta X_i
\alpha \sim Student_3(0, 10)
\beta \sim Normal(0, 2)
v \sim Gamma(0.01, 0.01)

Note: I’ve never used the Weibull distribution, so this might be a particularly bad way to use it.

Anyway, I can now work my way from priors to Y_i, first using rt, rnorm and rgamma to draw \alpha, \beta and v respectively. Then I need to choose X_i - as X_i are the predictors, they are not constrained by the model, but you can probably find a way to span the range of X_i you have in your data (or just use your actual data). With that I can compute \log(u_i) and then draw Y_i using rweibull. This gives me one simulated dataset, which I can plot in some way to see whether it is sensible - say, showing percent of habitat lost per species. I repeat this to get a “flipbook” of such plots. If too many of the plots in the flipbook are implausible (say they only predict 0% or 100% habitat lost), the priors are wrong. You still need to think about why they are wrong and what a sensible change would be to make them better behaved.
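To make that concrete, here is a minimal sketch of such a manual simulation in R, assuming the toy model above. The grid of X_i values, the number of draws, and the truncation for plotting are my own illustrative choices:

set.seed(42)
n_draws <- 20                      # number of "flipbook" pages
X <- seq(-2, 2, length.out = 100)  # hypothetical standardized predictor values

par(mfrow = c(4, 5), mar = c(2, 2, 2, 1))
for (i in seq_len(n_draws)) {
  alpha <- 10 * rt(1, df = 3)      # Student_3(0, 10) prior on the intercept
  beta  <- rnorm(1, 0, 2)          # Normal(0, 2) prior on the slope
  v     <- rgamma(1, 0.01, 0.01)   # Gamma(0.01, 0.01) prior on the shape
  u     <- exp(alpha + beta * X)   # invert the log link
  y     <- rweibull(length(X), shape = v, scale = u)
  y     <- pmin(y, 2)              # truncate absurd draws so hist() cannot fail
  hist(y, main = paste("draw", i), xlab = "", ylab = "")
}

With these diffuse priors, many pages of the flipbook pile up at the truncation point, which is exactly the kind of implausibility the check is meant to expose.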

If I understand you correctly, this is exactly the kind of knowledge you bring to prior predictive checks. 44% forest lost is IMHO not your prior on model parameters; it is your current belief about plausible model outcomes. Prior predictive checks are a way to make sure your priors for model parameters are consistent with your beliefs about plausible outcomes. Does that make sense?

This is in principle perfectly OK; priors have no simple relationship with the “shape” of the data. The hard part is to make sure your inferences are relevant for the real world (which is hard regardless of the distribution families you use for your prior or likelihood or anything else).

Hmm, I think I might be rather confused now. It seems like the fact that I expect to see non-zero habitat loss is not something I should incorporate into my model of % habitat loss as a prior. That makes sense for model terms to me, but I still don’t understand why this wouldn’t be relevant to the intercept. Or maybe because I’m using some categorical variables like diet I don’t need to worry about the intercept so much?

If the forest loss statistic is my only prior knowledge of what I’d expect to see with this model (other than all my responses are percentages), it seems like I should use an uninformative prior. If that’s the case, is it okay to use the defaults in brms? I know @paul.buerkner encourages people to think carefully about their priors and to not be so afraid of informative ones, but perhaps that’s not appropriate for my case?

I might have overstretched my point a bit. What you can (and IMHO should) incorporate in your prior (and hence check for in prior predictive checks) is some rough domain knowledge like “% habitat lost is < 100 for at least some species” or “species tend to differ substantially in their % habitat lost” (or whatever would be considered uncontroversial common knowledge in your domain). The trick is to not completely forbid configurations inconsistent with this expert knowledge, just to make sure they are rare, i.e. to put small prior weight on them. It is IMHO good practice to find good priors via prior predictive checks and then make them a bit wider than that. This way, if the data suggest an a priori unexpected configuration, it can still be discovered.

In many response families (I am not sure what Weibull does), the intercept and the model coefficients tend to interact in not-completely-intuitive ways, so considerations for the “effect” are often intertwined with considerations for the intercept.

Hard to say - do you have tons of data? Are you sure your model is well identified? If yes, uninformative priors should be OK. If not, you can run into trouble. But hey - just try and see what happens :-) As long as you check your fit and don’t ignore warnings, you should be on the safe side.

Also, now I am confused - if your response is percentages, why would you use Weibull, which has support over all positive reals? Maybe a Beta response is more appropriate?
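If beta did turn out to be the better choice, the change in brms would be small. A sketch, reusing the dummy data from the question (note that a Beta response requires 0 < habitat_lost < 1):

# same formula, Beta response with the default logit link on the mean
fit_beta <- brm(habitat_lost ~ mass + diet + (1 | IUCN),
                data = data, family = Beta())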

I definitely want to do this! What would the code for a prior predictive check look like, or is there a vignette you can direct me to?

I have ~100 species (each species is a sample), which is a ton for a primatologist, but probably not in many other fields.

As to the distribution, I thought beta at first as well, but then I used the excellent descdist function in the fitdistrplus package in R. Based on the Cullen and Frey graph it produced, my data could plausibly have come from a couple of different families, so I used fitdist to compare AIC values for Weibull, gamma, beta, and some other common fits. Weibull came out with the best AIC score, so that’s what I’ve proceeded with.

Also, thank you for your continued help, Martin! You are a gem.

I am not aware of a vignette, but it is conceptually quite easy: first you simulate data - either you write your own code in R or you use sample_prior = "only" with brms. The latter is less work for you, but the simulation then takes more time and sometimes does not work (i.e. the model does not converge). Also, when using brms for the simulations, you usually need to write some code to convert the results from the brms format into something you can plot.

Then you plot the simulated data in the same or a similar way to how you would plot your actual data for exploratory analysis (this is very use-case specific); the only difference is that you are creating a plot for each sample from the prior predictive distribution, but that is really just a for loop - see the sketch below.
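A minimal sketch of the brms route, assuming the dummy data from earlier in the thread (the specific priors here are placeholders to make the call runnable, not recommendations):

prior_fit <- brm(habitat_lost ~ mass + diet + (1 | IUCN),
                 data = data, family = weibull(),
                 prior = c(prior(normal(0, 1), class = "b"),
                           prior(student_t(3, 0, 2.5), class = "Intercept"),
                           prior(student_t(3, 0, 2.5), class = "sd"),
                           prior(gamma(2, 0.5), class = "shape")),
                 sample_prior = "only", chains = 2)

# each row of yrep is one draw from the prior predictive distribution
yrep <- posterior_predict(prior_fit)

# the "flipbook": one plot per prior predictive draw
par(mfrow = c(3, 4))
for (i in 1:12) hist(yrep[i, ], main = paste("draw", i), xlab = "")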

Overall, it tends to be a non-negligible amount of work, but it is IMHO worth it.

(blushes)


Unfortunately, this brings me back to my initial question, because I still don’t know how to turn ideas like

“% habitat lost is < 100 for at least some species” or “species tend to differ substantially in their % habitat lost”

into prior notation in brms. What would a prior that conveyed that information to a brmsfit look like?

I don’t know :-) The point is that you set up a prior (say, following the rough guidance for non/weakly informative priors in Prior Choice Recommendations · stan-dev/stan Wiki · GitHub), simulate data, and see if the simulated data usually have the properties you want. If they don’t, you change the prior and test again.
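To make the “see if the simulated data have the properties you want” step concrete, a hypothetical check on the yrep matrix from the prior-only sketch above might look like this (the 0.05 threshold for “substantially” is an arbitrary assumption):

# share of prior draws in which at least some species lose < 100% habitat
mean(apply(yrep, 1, function(y) any(y < 1)))

# share of prior draws in which species differ substantially in habitat lost
mean(apply(yrep, 1, function(y) sd(y) > 0.05))

If these shares are low, the prior contradicts the domain knowledge and should be revised.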

Any ideas, @paul.buerkner, on how to translate an idea like “percent habitat lost is always between 0 and 1, but more likely to be above 0.44” or “species tend to differ substantially in their % habitat lost” into a prior for brms?

The only prior I know how to set up for Weibull-distributed (or indeed any non-normal) data is the default prior provided by brms. Thus even if I knew how to code a prior predictive check as @martinmodrak thoughtfully suggested, I would still only have one prior to work with, and nothing to compare it to.

Setting reasonable informative priors is hard and an active area of research. I have no other suggestions beyond what Martin said, which is to use prior predictive checks. You can do this via the argument sample_prior = "only" and then create prior predictive plots via pp_check.
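Concretely, reusing the prior-only fit sketched earlier in the thread:

pp_check(prior_fit)                 # density overlay of prior draws vs. the data
pp_check(prior_fit, type = "hist")  # histograms of individual prior draws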
