Hierarchical Model Design: Predictive Regression

rstan

#1

Hello,

I’m trying to build a hierarchical Bayesian model to analyze some data. The data are a list of predators, observations of their prey, and the groups to which the predators below. They’re formatted roughly like this:

PredatorGroup - PredatorSubgroup - PredatorName - PredatorMass - PreyMass
A - A_1 - Dog - 5 - 1
A - A_1 - Dog - 5 - 2
B - B_1 - Hawk - 2 - 1.25
A - A_2 - Bear - 10 - 5
… - … - … - … - …

What I would ultimate like is to predict the mean & variance of the prey mass distribution, given a particular predator & it’s mass.

Currently, this is the model I’m attempting to use:

bform <- bf(
ppreymass ~ 1 + predmass + (1 + predmass || group/subgroup/predname),
sigma ~ 1 + predmass + (1 + predmass || group/subgroup/predname)
)

Priors <- get_prior(bform, data = Data)
Prior <- c(
prior(student_t(3, -1, 5), class=“Intercept”, coef=""),
prior(normal(0.5,0.2), class=b, coef=“predmass”, dpar=“sigma”),
prior(normal(0.5,0.2), class=b, coef="", dpar=“sigma”),
prior(student_t(3, 0, 1), class=sd, coef="", group=“group:subgroup:predname”),
prior(student_t(3, 0, 1), class=sd, coef="", group=“group:subgroup”),
prior(student_t(3, 0, 1), class=sd, coef="", group=“group”), dpar=“sigma”),
prior(normal(0.5,0.2), class=sd, coef=“predmass”, group = “group:subgroup:predname”, dpar=“sigma”),
prior(normal(0.5,0.2), class=sd, coef=“predmass”, group = “group:subgroup”, dpar=“sigma”),
prior(normal(0.5,0.2), class=sd, coef=“predmass”, group = “group”, dpar=“sigma”),
prior(student_t(3, 0, 1), class=sd, coef=“Intercept”, group=“group:subgroup:predname”),
prior(student_t(3, 0, 1), class=sd, coef=“Intercept”, group=“group:subgroup”),
prior(student_t(3, 0, 1), class=sd, coef=“Intercept”, group=“group”)
)

Fit <- brm(bform, data = Data, family = gaussian(), chains = 4, warmup = 5000, iter = 10000, control=list(adapt_delta = 0.999, stepsize=0.1, max_treedepth = 15), prior = Prior, cores=4, save_model = “diet_Model”, save_dso = TRUE)

This dataset is large (>200,000 rows, 4 groups, 16 subgroups), and so both takes a long time to run, and is giving me convergence problems. That said, I think I can handle those issues in the long run, but if my approach/model/priors are fundamentally flawed, I’d like to know that before I invest too much time going down a flawed path.

Any advice would be greatly appreciated, thank you!

Please also provide the following information in addition to your question:

  • Operating System: Windows 10
  • brms Version: 2.7.0

#2

Some thoughts:

  • I find it non-intuitive that weight of prey would be a linear function of weight of predator - do you have some theory to back this up?
  • Since weight is a positive quantity the normal model might be problematic as it would predict negative weights. Predicting log(ppreymass) or using exponential/gamma response would probably be better - once again you should have some theory to back the choice of the response. Intuitively, I would expect the distribution of prey weights to be skewed, e.g. predators would not shy from ocasionaly hunting prey that is way smaller than them, but rarely go above a certain mass threshold
  • You can test that you’ve encoded your theory/expert knowledge correctly by using prior predictive checks (see the Visualisation paper), brms lets you use sample_prior='only' for this use case.
  • It is advisable to start with a simple model and only add additional terms if posterior predictive checks show some problems with the fit. For example, letting the sigma to vary before understanding why a model with fixed sigma is wrong may lead you to problems.
  • It is also advisable to develop and test the model with a subset of the full dataset to let you try models more quickly

And yes, following this process to develop a model is quite time-consuming - definitely more of a hassle than just writing down a formula that seems OK. But you are much more likely to be correct, so it depends if you are willing to pay the price :-)