fit <- brm(formula = tp ~ 1 + technique + category + subject,
           data = d,
           prior = c(set_prior("normal(0,1)", class = "Intercept"),
                     set_prior("normal(0,1)", class = "b")),
           family = poisson(link = log),
           sample_prior = "only" # To ignore the likelihood and check if the model is sane
)
I would expect that I could later do: ppc_dens_overlay(y = d$tp, yrep = posterior_predict(fit, draws=25))
but unfortunately, I get: Error in validate_yrep(yrep, y) : NAs not allowed in 'yrep'.
Fake data to reproduce this can be found here:
I have a nagging feeling I’ve completely missed something here…
I do think that approach should work in general. Is posterior_predict returning NAs? If the error message is right, then there are NA predictions when there shouldn't be. If there are NAs, are there just a few, or are all the predictions NA? If not, it could be a bayesplot issue (an incorrect error message).
yrep_fit_posterior <- posterior_predict(fit_prio_only)
There were 47 warnings (use warnings() to see them)
sum(is.na(yrep_fit_posterior))
[1] 7536
length(yrep_fit_posterior)
[1] 280000
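To see whether those NAs are concentrated in a few draws or spread across all of them, one can count them per row of the yrep matrix (draws x observations). A minimal sketch, with a simulated matrix standing in for the posterior_predict output so it runs without a fitted model:

```r
# Simulate a yrep matrix (1000 draws x 280 observations) and inject some NAs,
# standing in for posterior_predict(fit); with a real fit, use
# yrep <- posterior_predict(fit) instead.
set.seed(1)
yrep <- matrix(rpois(1000 * 280, lambda = 5), nrow = 1000)
yrep[sample(length(yrep), 50)] <- NA  # injected NAs for illustration

na_per_draw <- rowSums(is.na(yrep))   # NA count for each posterior draw
bad_draws <- which(na_per_draw > 0)   # indices of affected draws
length(bad_draws)                     # how many draws contain at least one NA
sum(na_per_draw)                      # total NA predictions
```

If only a handful of rows show up in bad_draws, the NAs come from a few extreme draws rather than a systematic failure.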
I just tested it and think I found the reason: the priors are so wide that they imply very large values on the log scale, so exp(<large number>) overflows, which causes NAs to be predicted.
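A quick standalone illustration of that mechanism, nothing brms-specific, just R's double precision: once the log-scale linear predictor exceeds roughly 709, exp() overflows to Inf, and rpois() with a non-finite rate returns NA (with a warning), which is exactly what bayesplot rejects.

```r
# A draw far out in a wide prior can push the log-scale linear predictor
# past ~709, where exp() overflows to Inf for double precision.
eta <- 710                       # hypothetical log-scale linear predictor
lambda <- exp(eta)               # overflows to Inf
y <- suppressWarnings(rpois(1, lambda))  # non-finite rate -> NA
is.infinite(lambda)              # TRUE
is.na(y)                         # TRUE
```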
As a side issue, the argument should be nsamples, not draws (i.e., posterior_predict(fit, nsamples = 25)), but that is not the problem here.
Sorry, I should have put that into more context. The scales of the predictors are so large that the chosen priors lead to a very large prior on the intercept, which is parameterised in a special way. See ?set_prior for details. I see the following output:
Thus, the intercept is the problem. You may put a prior on the intercept directly via
fit2 <- brm(formula = tp ~ 0 + intercept + technique + category + subject,
            data = dat,
            prior = set_prior("normal(0,1)", class = "b"),
            family = poisson(link = log),
            sample_prior = "only" # To ignore the likelihood and check if the model is sane
)
In this case, the normal(0, 1) prior really affects the intercept instead of the transformed intercept.
@paul.buerkner already solved the problem with the intercept, but before that I was thinking about this:
For poisson(link=log) they can be too wide, and compared to your data they seem to be too wide. If I read your model right, the intended model has four additive terms for the log intensity, each with a N(0,1) prior, and thus the prior for the log intensity would be N(0,4^2), which corresponds to assuming a 5% probability that the intensity is larger than 720. This may or may not be against your prior information, but at least it seems to be 36 times larger than the largest count in your data. See also @anon75146577's blog post
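That 720 figure can be verified directly from the stated N(0, 4^2) prior on the log intensity: take its 95th percentile and exponentiate.

```r
# 95th percentile of N(0, sd = 4) on the log scale, mapped back through exp()
q95 <- exp(qnorm(0.95, mean = 0, sd = 4))
q95  # approximately 720
```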
It might be worth adding that the model @paul.buerkner provides in his answer still generates NAs when running posterior_predict(). However, as indicated by Paul and @avehtari, setting the prior to, e.g., set_prior("normal(0,0.1)", class="b"), removes the NAs.