I'm running into a strange situation with a zero-one-inflated beta model in brms. The model runs fine; the only thing I noticed is that the random intercepts for the participants are very large compared with the overall effects. But I know that there is a lot of between-participant variance in my data (see my data my_example_data.csv (113.0 KB)), so I assume that a lot of variance in the random intercept for participants is a good thing. I analyzed the data with the code in this file: Real_Data.R (1.4 KB).
When I extract the fitted values with built-in functions such as marginal_effects() from brms or add_fitted_draws() from tidybayes, both give me values from the posterior that clearly represent the original data (real mean of reference category = 0.462, mean of posterior of Intercept = 0.451).
However, if I sample directly from the posterior with posterior_samples() and then manually convert the values to the original scale with the inverse link function (plogis() in this case, since the link function is logit), I get very different values (mean of posterior of Intercept = 0.761). Inspecting the model's estimates also suggests these very different values.
I have done the following:
- Using tidybayes::add_fitted_draws(scale = "linear") returns (to my understanding) samples from the posterior on the link scale, so it should do the same thing I assume posterior_samples() is doing. If I run those values through plogis(), I get the correct predicted values on the original scale. However, I don’t really understand everything that is going on under the hood of these built-in functions (marginal_effects() and add_fitted_draws()).
- Changing the categorical predictor to a linear one leads to the same problem.
- I tried to find out whether brms or Stan changes the link function that is actually used under the hood, but didn’t find anything.
- Reducing the model (no or fewer random effects, fewer other predictors) does not improve the situation. Neither does running a beta regression instead of a zero-one-inflated beta regression.
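To make the comparison concrete, here is a minimal sketch of the two routes described above. It is not the code from Real_Data.R: `fit` and `newdata` are hypothetical stand-ins for the fitted brmsfit object and a one-row data frame describing the reference category, and the function names are made up for illustration. It assumes brms is installed and that the population-level intercept is named `b_Intercept`.

```r
# Sketch only; `fit` and `newdata` are hypothetical placeholders.

# Back-transform two sets of logit-scale draws and compare their means.
compare_routes <- function(intercept_draws, eta_draws) {
  c(by_hand    = mean(plogis(intercept_draws)),
    via_fitted = mean(plogis(eta_draws)))
}

extract_and_compare <- function(fit, newdata) {
  # Route 1: draws of the population-level intercept only
  post <- brms::posterior_samples(fit, pars = "^b_Intercept$")

  # Route 2: linear predictor for `newdata`, without group-level effects
  # (fitted() dispatches to the brms method once the brms namespace is loaded)
  eta <- fitted(fit, newdata = newdata, re_formula = NA,
                scale = "linear", summary = FALSE)

  # `eta` is a draws-by-rows matrix; use the first (only) row of `newdata`
  compare_routes(post$b_Intercept, eta[, 1])
}
```

Calling `extract_and_compare(fit, newdata)` returns both means side by side. Route 1 uses the intercept alone, whereas route 2 uses the full linear predictor evaluated at whatever values `newdata` contains.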
I tried to reproduce the problem by creating synthetic data (using this code: Synthetic_Data.R (2.1 KB)), but here the different methods of getting the posterior predicted values all agree on roughly the same values. I suspected the large between-participant variance and tried to build that into my synthetic data, but I still was not able to reproduce the error.
So now I don’t know what else to do. I would prefer not to go on and use the values of marginal_effects() or add_fitted_draws() and ignore that I didn’t get the right values “by hand”. I suspect that something in my data is causing the problem but I don’t know what that could be.
Any help is appreciated!
Thank you very much!
- Operating System: Mac OS X 10.14
- R version 3.5.2 (2018-12-20)
- RStudio: 1.1.463
- brms Version: 2.9.0