(Using brms v2.3.0)
If I fit a model using
fit <- brm(formula = tp ~ 1 + technique + category + subject,
           data = d,
           prior = c(set_prior("normal(0,1)", class = "Intercept")),
           family = poisson(link = log),
           sample_prior = "only") # ignore the likelihood and check if the model is sane
I would expect that I later could do:
ppc_dens_overlay(y = d$tp, yrep = posterior_predict(fit, draws=25))
but unfortunately, I get:
Error in validate_yrep(yrep, y) : NAs not allowed in 'yrep'.
Fake data to reproduce this can be found here:
I have a nagging feeling I’ve completely missed something here…
I do think that approach should work in general. Is posterior_predict() returning NAs? If the error message is right, then there are NA predictions when there shouldn't be. If there are NAs, are there just a few, or are all of the predictions NA? If not, then it could be a bayesplot issue (an incorrect error message).
Quite a lot of NAs…
yrep_fit_posterior <- posterior_predict(fit_prio_only)
There were 47 warnings (use warnings() to see them)
But none when I disregard
I just tested it and think I found the reason: the priors are so wide that they imply very large values on the log scale, so exp(<large number>) overflows, causing NAs to be predicted.
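The overflow mechanism can be illustrated in base R (a hypothetical sketch, not the actual brms internals): once the implied Poisson rate exceeds what rpois() can handle, the sampler returns NA with a warning.

```r
# Hypothetical illustration of the overflow: draws from a very wide
# prior on the log scale become huge after exp(), and rpois() then
# produces NAs (with a warning) for those rates.
set.seed(1)
log_lambda <- rnorm(5, mean = 0, sd = 20)  # wide "prior" draws on the log scale
lambda <- exp(log_lambda)                  # astronomically large for big draws
y <- suppressWarnings(rpois(5, lambda))    # NA wherever lambda is too large
y
```

The same thing happens inside posterior_predict() when the prior-only draws of the linear predictor are large.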
As a side issue, the argument should be nsamples rather than draws, but that is not the problem here.
I only learned about nsamples a few minutes ago from reading the docs ;)
You mean that my N(0,1) and cauchy(0,1) are too wide? And I thought they were fairly tight…
I’m also confused that you said normal(0,1) is wide
Sorry, I should have put that into more context. The scales of the predictors are so large that the chosen priors lead to a very wide prior on the intercept, which is parameterised in a special way; see ?set_prior for details. I see the following output:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept -0.13 18.23 -36.15 35.96 4000 1.00
technique -0.02 1.02 -1.99 2.01 4000 1.00
category -0.01 1.00 -1.96 1.95 4000 1.00
subject 0.01 1.00 -2.01 1.98 4000 1.00
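For scale, a quick check of what the upper end of that intercept interval implies (using the u-95% value 35.96 from the summary above):

```r
# exp() of the intercept's upper 95% bound from the summary above.
# The implied Poisson rate is astronomically large -- far beyond what
# rpois() can sample from -- which is why posterior_predict() returns NAs.
exp(35.96)
```

This is on the order of 10^15, so NA predictions are expected with this prior.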
Thus, the intercept is the problem. You may put a prior on the intercept directly via
fit2 <- brm(formula = tp ~ 0 + intercept + technique + category + subject,
            data = dat,
            prior = set_prior("normal(0,1)", class = "b"),
            family = poisson(link = log),
            sample_prior = "only") # ignore the likelihood and check if the model is sane
In this case, the
normal(0, 1) prior really affects the intercept instead of the transformed intercept.
This is helpful. Thank you
@paul.buerkner already solved the problem with the intercept, but before that I was thinking about this
For poisson(link=log) they can be too wide, and compared to your data they do seem too wide. If I read your model right, the intended model has four additive terms for the log intensity, each with a N(0,1) prior, and thus the prior for the log intensity would be N(0,4^2). That corresponds to assuming a 5% probability that the intensity is larger than 720. This may or may not be against your prior information, but at least it is about 36 times larger than the largest count in your data. See also @anon75146577's blog post
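The quoted tail probability is easy to verify directly; assuming, as above, a N(0, 4^2) prior on the log intensity:

```r
# Under a N(0, 4^2) prior on the log intensity, the 95% upper quantile
# of the intensity (i.e., the value exceeded with 5% prior probability)
# is exp(qnorm(0.95, 0, 4)).
exp(qnorm(0.95, mean = 0, sd = 4))
#> about 720
```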
Thanks @paul.buerkner and @avehtari, much clearer now and much appreciated. And thanks to @jonah et al. for Bayesplot!
It might be worth adding that the model @paul.buerkner provides in his answer still generates NAs when running
posterior_predict(). However, as indicated by Paul and @avehtari, setting the prior to, e.g.,
set_prior("normal(0,0.1)", class="b"), removes the NAs.
Also related to the prior predictive checking section of the visualization paper (Dan Simpson is everywhere)