Uncertain of model fit based on pp_check (brms, generalized additive model)

bhopalstiffs · May 1, 2024, 6:59pm

Hello,

I am fitting a gam using brms. Looking at the relationship between day of year (doy) and bat activity (n). The model fits fine, Rhat values and ESS values are good. However, the pp_check is shown below. How concerned should I at the lack of fit? Is there a distributional family that I can try, that would fit better? Thanks for your help.

m1 <- brm(bf(
  n ~ s(doy) + (1 | elevation)),
  data = alldata, family = gaussian(), cores = 4, seed = 17,
  iter = 2000, warmup = 1000, thin = 10, chains = 4, backend = "cmdstanr",
  control = list(adapt_delta = 0.99))

pp_check

sjp · May 1, 2024, 7:22pm

Personally I wouldn’t be too happy with this posterior check. Some thoughts:

Is n an integer of sightings or something similar? It appears that you don’t have any observations below zero, but the model expects some. Moving to a Poisson model or a negative binomial could align better with the data-generating process if this is count data.
Your data is bimodal but the model cannot account for it properly. One potential reason is that you have omitted an important variable that is responsible for this large shift in n. Can domain knowledge help here?
It’s also possible that you could account for this as a mixture of two distributions, if that makes theoretical sense for the phenomenon you’re studying.

bhopalstiffs · May 1, 2024, 9:01pm

Hey sjp,

Thanks for the response.

Yeah, using the Poisson made a huge difference. n are counts so no expectation of negative values.
There are 2 sites that have a lot higher activity than the other sites, so adding Site into the model produced this pp_check. A lot better!

poisson w Site

However, now I have large Rhat values and really low ESS.

Family: poisson 
  Links: mu = log 
Formula: n ~ s(doy) + Site + (1 | elevation) 
   Data: alldata (Number of observations: 1016) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 10;
         total post-warmup draws = 400

Smooth Terms: 
            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sds(sdoy_1)     2.27      0.52     1.30     3.33 1.22       14       20

Group-Level Effects: 
~elevation (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     3.86      2.75     0.31    10.19 1.72        7       24

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     5.76      5.63    -1.71    19.67 2.69        5       15
SiteBUL11    -0.82      7.40   -15.45     9.10 3.27        5       13
SiteBUL18    -5.17      6.18   -20.40     3.60 1.67        7       16
SiteBUL4     -3.14      7.40   -17.76     6.78 3.29        5       12
SiteGRP1     -2.15      7.40   -16.79     7.78 3.28        5       13
SiteGRP5     -3.04      7.40   -17.66     6.84 3.27        5       13
SiteLBL5     -1.50      7.40   -16.12     8.44 3.27        5       13
SiteTHL1     -4.21      6.18   -19.40     4.56 1.67        7       16
sdoy_1       -1.92      0.55    -3.02    -1.00 1.14       21       75

I am using the default priors from brms.

prior     class      coef     group resp dpar nlpar lb ub       source
                 (flat)         b                                                default
                 (flat)         b    sdoy_1                                 (vectorized)
                 (flat)         b SiteBUL11                                 (vectorized)
                 (flat)         b SiteBUL18                                 (vectorized)
                 (flat)         b  SiteBUL4                                 (vectorized)
                 (flat)         b  SiteGRP1                                 (vectorized)
                 (flat)         b  SiteGRP5                                 (vectorized)
                 (flat)         b  SiteLBL5                                 (vectorized)
                 (flat)         b  SiteTHL1                                 (vectorized)
 student_t(3, 3.8, 2.5) Intercept                                                default
   student_t(3, 0, 2.5)        sd                                      0         default
   student_t(3, 0, 2.5)        sd           elevation                  0    (vectorized)
   student_t(3, 0, 2.5)        sd Intercept elevation                  0    (vectorized)
   student_t(3, 0, 2.5)       sds                                      0         default
   student_t(3, 0, 2.5)       sds    s(doy)                            0    (vectorized)

Do you have any suggestions about stronger priors that might help?

Thanks for your helpful response!

sjp · May 1, 2024, 9:27pm

I’m glad that you’re on the right track!

So I think a couple of things can help here.

You are fitting a random intercept for elevation when there are only three levels. This is a very small number of levels for a random intercept, so it may be better to fit that as a fixed effect. Similarly, site could be fit as a random effect to take advantage of partial pooling.
I definitely recommend setting non-flat priors. You want to make sure that you’re current data doesn’t influence the priors that you set as much as possible, so I’d advise looking at some tutorials on Bayesian poisson models to see what weakly-informative priors can look like. Then I’d recommend doing prior predictive checks to see if it matches with previous research into your topic reasonably well.

Here are a few such resources:

Winter, B., & Bürkner, P. C. (2021). Poisson regression for linguists: A tutorial introduction to modelling count data with brms. Language and Linguistics Compass , 15 (11), e12439.

Solomon · May 8, 2024, 1:53pm

Hey @bhopalstiffs , I agree with @sjp that it’s hard to fit a model with random intercepts when you have only 3 upper-level groups. However, I sometimes run into this issue when I fit models to behavior-analytic data. In those cases, I can still often fit a multilevel model if I (a) use tighter non-default priors on the upper-level \sigma parameters, and (b) adjust control settings, such as increasing adapt_delta to some value greater than the default 0.8, like 0.9 or higher. Though yes, switching to a fixed-effects model is also a viable approach.

bhopalstiffs · May 11, 2024, 7:29pm

Thanks @sjp and @Solomon for the replies. Your suggestions made a world of difference in my model fit. I used a poisson distribution (since this was counts), I had already increased adapt_delta (that didn’t help), I increased the number of iterations (that helped a lot). I will work on using tighter non-default priors. Would having only one replicate in one of the upper-level groups also add to my model fit issues? I have one Site in one group, two Sites in another group, and six Sites in the last group. The travails of ecological data.

Topic		Replies	Views
How to fix odd pp_check results? Modeling brms	2	89	August 1, 2024
Plot doesn't look good from pp_check() in brms Modeling brms	3	1083	January 27, 2023
How bad is this pp_check? Should I alter the distribution? Modeling fitting-issues , specification , brms	28	277	March 24, 2025
BRMS pp_check with logistic regression brms	1	1111	June 15, 2019
Improving pp_check results Poisson model Modeling fitting-issues , specification , poisson , brms , count-data	5	101	September 2, 2024

Uncertain of model fit based on pp_check (brms, generalized additive model)

Related topics