Pp_check looking wild...how bad are they?

T_Han · March 29, 2024, 12:49pm

Hi all,
I ran some models with brms and my pp_checks look wild... I just wonder how bad they are.

I think it is because the distribution is bimodal and I ran the model with the default gaussian family. However, I’m facing a tricky situation (probably because I am really new to brms and don’t know what I am doing):
I have a 2 (B: present vs. absent) × 2 (C: high vs. low) between-subjects design, and I care most about the group comparisons for B.
The code below is the model that produced the first pp_check:
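```r
library(brms)
library(emmeans)
library(tidybayes)

# Gaussian model with a B x C interaction and a normal(0, 6) prior on the slopes
F1.q23 <- brm(Q23 ~ B * C, data = F1, iter = 6000, sample_prior = "yes",
              prior = c(prior(normal(0, 6), class = "b")))
f1.q23 <- emmeans(F1.q23, ~ B * C)                  # posterior cell means
cont <- contrast(f1.q23, "tukey", reverse = TRUE)   # all pairwise differences
cont_p_f1 <- gather_emmeans_draws(cont)             # contrast draws in long format
mode_hdi(cont_p_f1)    # posterior mode + highest-density interval
median_qi(cont_p_f1)   # posterior median + quantile interval
```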
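For readers unfamiliar with these summaries, here is a minimal base-R sketch of what the contrast step computes from posterior draws: every pairwise difference among the four B × C cell means, taken draw by draw, then summarized by a median and a 95% equal-tailed interval (what `median_qi()` reports). The draws below are simulated stand-ins with made-up means, not output from the model above, and the "later level minus earlier level" direction is my reading of what `reverse = TRUE` does.

```r
set.seed(123)
n_draws <- 4000

# Hypothetical stand-in for posterior draws of the four cell means
# (B varies fastest, as in an emmeans grid for ~ B * C)
cells <- c("B1 C1", "B2 C1", "B1 C2", "B2 C2")
true_means <- c(65, 67, 85, 86)   # made-up values for illustration only
draws <- sapply(true_means, function(m) rnorm(n_draws, m, 1.5))
colnames(draws) <- cells

# All 6 pairwise differences, later cell minus earlier cell, computed per draw
pairs <- combn(ncol(draws), 2)
contrast_draws <- apply(pairs, 2, function(p) draws[, p[2]] - draws[, p[1]])
colnames(contrast_draws) <- apply(pairs, 2, function(p)
  paste(cells[p[2]], "-", cells[p[1]]))

# Posterior median and 95% equal-tailed quantile interval for each contrast
summary_qi <- t(apply(contrast_draws, 2, quantile, probs = c(0.5, 0.025, 0.975)))
print(round(summary_qi, 2))
```

`mode_hdi()` would instead report the posterior mode and the highest-density interval of the same contrast draws.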

I tried setting the family to a mixture, but then emmeans won’t run...
So will the wonky pp_check of the model Q23 ~ B*C be a serious problem if I’m mainly interested in the contrasts?

danielparthier · March 30, 2024, 1:52pm

Hey,
there are a few things that could explain posterior predictive plots like the ones you see here. One is that the data could be a mixture of different distributions (e.g., normals).
How that mixture comes about is the actual question, and you hinted at one possible explanation: subject variance. It is a bit difficult to tell without seeing the input data and knowing exactly what the plots show, but it looks to me as if something is clearly not captured by the model, which leads to clusters of values (within a group?) with some error around them. The error term you estimate, however, does not capture this grouping and tries to account for all of the variance globally. So random effects should probably be included in your model.
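A minimal base-R sketch of this point, using made-up data: when the outcome comes from two groups that the model does not know about, a single Gaussian error term has to absorb the gap between the groups, so the estimated residual SD is inflated and the pooled outcome is bimodal. The grouping variable `group` and all numbers are hypothetical.

```r
set.seed(42)
n_per_group <- 200

# Hypothetical latent grouping that shifts the outcome by 20 points
group <- factor(rep(c("g1", "g2"), each = n_per_group))
y <- rnorm(2 * n_per_group, mean = ifelse(group == "g1", 65, 85), sd = 5)

fit_pooled <- lm(y ~ 1)        # ignores the grouping
fit_grouped <- lm(y ~ group)   # models the grouping

sigma(fit_pooled)    # residual SD inflated by the between-group gap (about 11)
sigma(fit_grouped)   # close to the true within-group SD of 5
```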

Another explanation could be that your data are not truly continuous but are artificially chopped into steps.
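A small base-R sketch of what "chopped into steps" can look like, assuming (hypothetically) a 0-100 response recorded in steps of 5: the continuous values are almost all distinct, while the recorded values pile up on a handful of points, which can make a density overlay look lumpy.

```r
set.seed(7)
y_continuous <- rnorm(500, mean = 70, sd = 12)

# Hypothetical response scale: clamp to 0-100, then record in steps of 5
y_steps <- 5 * round(pmin(pmax(y_continuous, 0), 100) / 5)

length(unique(y_continuous))   # every simulated value is distinct
length(unique(y_steps))        # at most 21 distinct values (0, 5, ..., 100)
table(y_steps)                 # repeated values that a smooth density has to fit
```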

In summary:

- Not “great”, but you can identify the issues step by step :)
- You are on the right track with pp_check. You should probably also run a prior predictive check (the model’s predictions without using the data).
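In brms, a prior predictive check can be run by refitting the model with `sample_prior = "only"` and calling `pp_check()` on that fit. The idea can also be sketched by hand in base R: draw parameters from the priors, simulate outcomes, and check whether they are plausible on the response scale. Only the normal(0, 6) slope prior comes from the thread; the intercept and sigma priors, the dummy coding, and the 0-100 scale are hypothetical placeholders (they are not brms's defaults).

```r
set.seed(2024)
n_draws <- 2000

# Slope prior matches prior(normal(0, 6), class = "b") from the thread;
# intercept and sigma priors are hypothetical placeholders.
intercept <- rnorm(n_draws, mean = 70, sd = 20)
b_B <- rnorm(n_draws, 0, 6)
b_C <- rnorm(n_draws, 0, 6)
b_BC <- rnorm(n_draws, 0, 6)
sigma_draws <- abs(rnorm(n_draws, 0, 20))   # half-normal

# One simulated outcome per draw for the cell B = 1, C = 1 (dummy coding assumed)
mu <- intercept + b_B + b_C + b_BC
y_prior <- rnorm(n_draws, mean = mu, sd = sigma_draws)

# Do the prior-implied outcomes stay on a plausible scale (assumed 0-100)?
quantile(y_prior, c(0.05, 0.5, 0.95))
mean(y_prior < 0 | y_prior > 100)   # share of draws outside the assumed scale
```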

danielparthier · March 31, 2024, 10:01am

I saw that there is no subject id in the data set. Do you have this information somewhere? It might be crucial, because baseline responses in evaluations can be subject-dependent. If you have it, you can use it as a random effect for at least the intercept. A mixture of Gaussians, as currently modeled, probably does not reflect how the data were generated, right?
Also, the response variable is probably rounded, which explains some of the shape. One can account for that in Stan, but not directly in brms (if I’m not mistaken).
But again, a subject id might make your life a lot easier.

T_Han · March 31, 2024, 10:36am

Thanks again! The sample was not nested or anything, so the id is just 1, 2, 3, … I tried adding a random effect, but it gave me these warnings:

```r
FB1$id <- 1:nrow(FB1)
FB1.q71_1 <- brm(Q71_1 ~ B*C + (1|id), data = FB1, iter = 6000)
```

```
Warning messages:
1: Rows containing NAs were excluded from the model.
2: There were 648 divergent transitions after warmup. See
Runtime warnings and convergence problems
to find out why this is a problem and how to eliminate them.
3: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
Runtime warnings and convergence problems
4: Examine the pairs() plot to diagnose sampling problems
5: The largest R-hat is 1.08, indicating chains have not mixed.
Running the chains for more iterations may help. See
Runtime warnings and convergence problems
6: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
Runtime warnings and convergence problems
7: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
Runtime warnings and convergence problems
```
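A likely contributor to these warnings, assuming each id appears exactly once: a varying intercept per observation and the residual sigma both describe observation-level noise, so the data only inform their combined variance sd_id^2 + sigma^2, and the posterior can only separate the two through the priors. The base-R sketch below (made-up data, with a stand-in data frame in place of FB1) shows three different (sd_id, sigma) pairs with the same total variance giving exactly the same marginal log-likelihood. A posterior stretched along such a ridge is a common source of divergent transitions.

```r
# Stand-in for FB1$id <- 1:nrow(FB1): check whether any id repeats
dat <- data.frame(id = 1:300)
anyDuplicated(dat$id) == 0   # TRUE: one row per id, so (1 | id) has nothing to pool

set.seed(11)
mu <- 70
y <- rnorm(nrow(dat), mean = mu, sd = 10)

# With one observation per id, integrating out the id intercept gives
# y ~ Normal(mu, sqrt(sd_id^2 + sigma^2)); only the sum of variances matters.
marginal_loglik <- function(sd_id, sigma) {
  sum(dnorm(y, mean = mu, sd = sqrt(sd_id^2 + sigma^2), log = TRUE))
}

ll_a <- marginal_loglik(sd_id = 8, sigma = 6)    # 64 + 36 = 100
ll_b <- marginal_loglik(sd_id = 6, sigma = 8)    # 36 + 64 = 100
ll_c <- marginal_loglik(sd_id = 0, sigma = 10)   # 0 + 100 = 100
c(ll_a, ll_b, ll_c)   # identical: the data cannot tell these apart
```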

desislava · March 31, 2024, 11:12am

I think the posterior predictive check may be misleading here, and it would be better to use one of the “grouped” variants: “dens_overlay_grouped” instead of “dens_overlay” (and to keep a non-mixture distribution for the response).
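In brms this is requested with something like `pp_check(fit, type = "dens_overlay_grouped", group = "C")`, where `group` names a variable in the data. The idea can be sketched by hand in base R: compare observed and replicated outcomes separately within each level of C instead of pooled. For simplicity, the replications below come from a plug-in `lm()` fit (fitted cell means plus Normal noise with the estimated sigma), not from a full posterior, and the data are made up, with C = 0 and C = 1 peaking near the values mentioned in the next paragraph.

```r
set.seed(99)
# Made-up 2 x 2 data: C shifts the outcome a lot, B only a little
dat <- expand.grid(B = c("0", "1"), C = c("0", "1"), rep = 1:100)
dat$y <- rnorm(nrow(dat),
               mean = ifelse(dat$C == "0", 65, 85) + ifelse(dat$B == "1", 2, 0),
               sd = 5)

fit <- lm(y ~ B * C, data = dat)

# 50 replicated data sets from the plug-in fit (rows = replications)
n_rep <- 50
yrep <- t(replicate(n_rep, rnorm(nrow(dat), fitted(fit), sigma(fit))))

# Grouped overlay: one panel per level of C, observed density in black,
# replicated densities in grey
op <- par(mfrow = c(1, 2))
for (lev in levels(dat$C)) {
  idx <- dat$C == lev
  d_obs <- density(dat$y[idx])
  plot(d_obs, main = paste("C =", lev), lwd = 2, ylim = c(0, 1.5 * max(d_obs$y)))
  for (r in seq_len(n_rep)) lines(density(yrep[r, idx]), col = "grey70")
  lines(d_obs, lwd = 2)
}
par(op)
```

Within each panel the outcome is unimodal; the two peaks in the pooled plot come from mixing the C = 0 and C = 1 groups.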

Both predictors B and C are binary. Here are histograms of the outcome Q71_1 for each of the four combinations of B and C. There is a clear difference between C = 0 and C = 1 (B doesn’t make a strikingly big difference), and the peaks at about 65 when C = 0 and at about 85 when C = 1 clearly correspond to the two peaks in the PPC plot.
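A base-R sketch of per-cell histograms like the ones described, using made-up data whose peaks are placed near 65 (C = 0) and 85 (C = 1); the column name Q71_1 is borrowed only for readability, and the simulated values are not the thread's data.

```r
set.seed(5)
dat <- expand.grid(B = c("0", "1"), C = c("0", "1"), rep = 1:100)
dat$Q71_1 <- rnorm(nrow(dat), mean = ifelse(dat$C == "0", 65, 85), sd = 5)

# One histogram per B x C cell, with shared breaks so the panels are comparable
op <- par(mfrow = c(2, 2))
for (b in levels(dat$B)) for (cc in levels(dat$C)) {
  hist(dat$Q71_1[dat$B == b & dat$C == cc], breaks = seq(30, 120, by = 5),
       main = paste("B =", b, "/ C =", cc), xlab = "Q71_1")
}
par(op)

# Cell medians: the C = 0 and C = 1 cells sit near the two peaks
aggregate(Q71_1 ~ B + C, data = dat, FUN = median)
```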

T_Han · March 31, 2024, 11:38am

Thank you so much!! I didn’t know pp_check had this variant, and I believe my problem has been solved. Thanks everyone, hooray!
