I am fitting a gam using brms. Looking at the relationship between day of year (doy) and bat activity (n). The model fits fine, Rhat values and ESS values are good. However, the pp_check is shown below. How concerned should I at the lack of fit? Is there a distributional family that I can try, that would fit better? Thanks for your help.
m1 <- brm(bf(
n ~ s(doy) + (1 | elevation)),
data = alldata, family = gaussian(), cores = 4, seed = 17,
iter = 2000, warmup = 1000, thin = 10, chains = 4, backend = "cmdstanr",
control = list(adapt_delta = 0.99))
Personally I wouldn’t be too happy with this posterior check. Some thoughts:
Is n an integer of sightings or something similar? It appears that you don’t have any observations below zero, but the model expects some. Moving to a Poisson model or a negative binomial could align better with the data-generating process if this is count data.
Your data is bimodal but the model cannot account for it properly. One potential reason is that you have omitted an important variable that is responsible for this large shift in n. Can domain knowledge help here?
It’s also possible that you could account for this as a mixture of two distributions, if that makes theoretical sense for the phenomenon you’re studying.
You are fitting a random intercept for elevation when there are only three levels. This is a very small number of levels for a random intercept, so it may be better to fit that as a fixed effect. Similarly, site could be fit as a random effect to take advantage of partial pooling.
I definitely recommend setting non-flat priors. You want to make sure that you’re current data doesn’t influence the priors that you set as much as possible, so I’d advise looking at some tutorials on Bayesian poisson models to see what weakly-informative priors can look like. Then I’d recommend doing prior predictive checks to see if it matches with previous research into your topic reasonably well.
Here are a few such resources:
Winter, B., & Bürkner, P. C. (2021). Poisson regression for linguists: A tutorial introduction to modelling count data with brms. Language and Linguistics Compass , 15 (11), e12439.
Hey @bhopalstiffs , I agree with @sjp that it’s hard to fit a model with random intercepts when you have only 3 upper-level groups. However, I sometimes run into this issue when I fit models to behavior-analytic data. In those cases, I can still often fit a multilevel model if I (a) use tighter non-default priors on the upper-level \sigma parameters, and (b) adjust control settings, such as increasing adapt_delta to some value greater than the default 0.8, like 0.9 or higher. Though yes, switching to a fixed-effects model is also a viable approach.
Thanks @sjp and @Solomon for the replies. Your suggestions made a world of difference in my model fit. I used a poisson distribution (since this was counts), I had already increased adapt_delta (that didn’t help), I increased the number of iterations (that helped a lot). I will work on using tighter non-default priors. Would having only one replicate in one of the upper-level groups also add to my model fit issues? I have one Site in one group, two Sites in another group, and six Sites in the last group. The travails of ecological data.