Unexpected prior predictive behavior in brms; do warmup iterations matter?

Hi all,

I’m observing some unusual results from prior predictive checks and am curious whether this is expected (or, more to the point, desirable) behavior. My understanding of prior predictive checks is that we simulate values from the specified prior distributions to get a sense of how the model behaves, and then check this against our domain knowledge. This procedure does not seem like it should require warmup iterations, yet they seem to matter, at least in brms. Why? As a point of comparison, warmup iterations do not seem to matter in rstanarm.

Here’s a small reprex (tied to the dataset that brought this issue to my attention, but I suspect this may be more global behavior). The example shows that without warmup the prior predictive distribution has quite a few aberrant iterations (which are centered around 0 generally speaking and fall well outside the expected range of 9-16). Adding warmup iterations solves the issue as far as I can tell.

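library(brms)   # brm(), prior(), pp_check()
library(dplyr)  # tibble(), mutate(), between(), %>%

# Simulated data: 50 FID levels, 4 litters per FID, 4 rows per litter
ex_data = tibble(FID = gl(50,16), litter = rep(rep(1:4, each=4), 50)) %>% 
  mutate(fAge = rnorm(nrow(.), mean = 0.5*litter, sd=0.1), 
         fAge_c = fAge-mean(fAge), 
         y = rnorm(nrow(.), mean = 12.3, sd = 0.9))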

prior_in = prior(normal(13,1), class=Intercept) + 
prior(normal(0, 0.4), class=b) + 
prior(gamma(1,3), class=sd) + 
prior(gamma(1,2), class = sigma)

ex_model_nowu = brm(y ~ 1 + fAge_c + (1+fAge_c|FID) + (1|FID:litter), data = ex_data, chains = 4, iter=1000, warmup=0, sample_prior = 'only', prior = prior_in)
pp_check(ex_model_nowu, type = 'dens_overlay', nsamples=NULL)
pp_check(ex_model_nowu, type = 'stat', stat = function(x) { sum(!between(x,9,16))/length(x)})

ex_model_wu = update(ex_model_nowu, warmup = 500)
pp_check(ex_model_wu, type = 'dens_overlay', nsamples=NULL)
pp_check(ex_model_wu, type = 'stat', stat = function(x) { sum(!between(x,9,16))/length(x)})
  • Operating System: macOS 10.14.6
  • brms Version: 2.13.0

A prior distribution can be sampled with MCMC just like any other distribution. It’s quite convenient to do the sampling this way because all you need to do is leave out the likelihood terms. This may seem a bit kludgy, since we usually know the prior distributions in closed form, but if you’re going to do MCMC on the full model, presumably you can also do MCMC on the priors and that’ll work.

If you’re doing MCMC, you’ll still need warmup and all that. I suspect this is what brms is doing.
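
To illustrate why warmup matters even for a prior, here is a toy sketch in base R. It is not what Stan does internally (Stan uses NUTS; this is a plain random-walk Metropolis sampler), and the function name run_chain, the starting value of 0, and the step size are assumptions made up for illustration. The start at 0 roughly mimics Stan's default random initial values, which are drawn near 0 on the unconstrained scale, and the target is the normal(13, 1) intercept prior from the question.

```r
# Toy random-walk Metropolis sampler targeting a normal(13, 1) prior.
# Illustration only: Stan uses NUTS, not this algorithm.
set.seed(1)

log_prior <- function(theta) dnorm(theta, mean = 13, sd = 1, log = TRUE)

run_chain <- function(n_iter, init = 0, step = 0.5) {
  draws <- numeric(n_iter)
  current <- init
  for (i in seq_len(n_iter)) {
    proposal <- current + rnorm(1, sd = step)
    # Accept with probability min(1, prior(proposal) / prior(current))
    if (log(runif(1)) < log_prior(proposal) - log_prior(current)) {
      current <- proposal
    }
    draws[i] <- current
  }
  draws
}

chain <- run_chain(1000)
early <- chain[1:20]      # draws before the chain has found the prior's bulk
late  <- chain[501:1000]  # draws after a 500-iteration "warmup"

print(mean(early))
print(mean(late))
```

The early draws are still travelling from the starting point toward 13, so keeping them (as with warmup = 0) would contaminate the prior predictive distribution with values far from the prior's bulk. That would be consistent with aberrant iterations sitting near 0.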

It looks like the brms generated code (you can look at this with the make_stancode function) includes an if statement for including the likelihood or not, so I think it’s doing MCMC for its priors.
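
A minimal sketch of how to check this yourself, assuming brms is installed. The toy data frame and its column names are made up for illustration; make_stancode only generates the Stan code and does not compile or sample anything.

```r
library(brms)

# Hypothetical toy data, just enough for brms to build the model code
toy_data <- data.frame(
  y = c(12.1, 12.8, 11.9, 13.0, 12.4, 12.6),
  x = c(-0.5, 0.2, 0.1, -0.3, 0.4, 0.1),
  g = factor(c("a", "a", "b", "b", "c", "c"))
)

# Generate (but do not compile) the Stan program brms would use
stan_code <- make_stancode(y ~ 1 + x + (1 | g), data = toy_data,
                           sample_prior = "only")

# Show only the lines that mention the prior-only switch
code_lines <- strsplit(as.character(stan_code), "\n", fixed = TRUE)[[1]]
print(grep("prior_only", code_lines, value = TRUE))
```

In the versions I have looked at, the matching lines include a prior_only data flag and a guard of the form if (!prior_only) around the likelihood. Everything else in the program, priors included, is still sampled by MCMC as usual.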

Ah, that all makes sense, even if it’s not how I imagined brms setting up the prior sampling. I had glanced at the generated code, but hadn’t fully grasped that the priors were being sampled with MCMC. Problem, insofar as there was a problem, solved.

Thanks for pointing me towards the answer!
