I am performing Bayesian posterior predictive checks, and I find that the posterior predictive p-value of a more complex model (a general model) is somewhat worse (further from 0.5) than that of a simpler model (a nested model); specifically, the more complex model gets a lower posterior predictive p-value than the nested model. Is there necessarily something wrong? Everything else seems to be ok.
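To fix ideas, here is a minimal sketch of the quantity in question: the tail probability of a test statistic under the posterior predictive distribution, estimated from posterior draws. The data and "posterior draws" below are simulated stand-ins, not anything from the questioner's models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (illustrative only): y is the observed data, and
# mu_draws / sigma_draws stand in for posterior draws from a simple normal model.
y = rng.normal(1.0, 2.0, size=50)
mu_draws = rng.normal(y.mean(), 0.3, size=4000)
sigma_draws = np.abs(rng.normal(y.std(), 0.2, size=4000))

def test_stat(x):
    # Any discrepancy measure works; here, the sample maximum.
    return np.max(x)

# For each posterior draw, simulate a replicated dataset of the same size
# and compare its test statistic to the observed one.
t_obs = test_stat(y)
t_rep = np.array([
    test_stat(rng.normal(mu, sigma, size=y.size))
    for mu, sigma in zip(mu_draws, sigma_draws)
])

# Posterior predictive p-value: Pr(T(y_rep) >= T(y) | y), estimated over draws.
# Values near 0.5 mean the model reproduces the chosen statistic well;
# values near 0 or 1 indicate misfit in that direction.
print("posterior predictive p-value:", np.mean(t_rep >= t_obs))
```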
Yes: the priors. In principle, one should always be able to get a more general model to fit a simpler model if the prior is right. Alas, nobody knows how to design such priors other than in some special cases of penalized complexity priors (popular among the spatio-temporal modeling folks, e.g., the Besag-York-Mollié 2 model [BYM2]). I ran into the same thing in the latest paper I wrote on crowdsourcing, where we compare about 15 different models on a couple of data sets.
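To give a flavor of the penalized-complexity idea: the PC prior for the scale of a Gaussian random effect works out to an exponential, whose mode sits at the base model (scale zero), so the general model stays close to the nested one unless the data pull it away. Here's a toy prior-predictive sketch, using a generic varying-intercept model rather than BYM2 itself; the group structure and numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

n_groups, n_sims = 8, 10_000

def prior_predictive_spread(tau_draws):
    """Spread of group intercepts implied by prior draws of the scale tau."""
    alphas = rng.normal(0.0, tau_draws[:, None], size=(tau_draws.size, n_groups))
    return alphas.std(axis=1)

# PC-style prior on the group-level scale: exponential, mode at tau = 0
# (the base, complete-pooling model), with rate set so P(tau > 1) = 0.01.
tau_pc = rng.exponential(scale=1.0 / np.log(100.0), size=n_sims)

# A deliberately vague alternative that does not concentrate at the base model.
tau_vague = 5.0 * np.abs(rng.standard_cauchy(size=n_sims))

print("median implied group spread, PC-style prior:",
      round(float(np.median(prior_predictive_spread(tau_pc))), 3))
print("median implied group spread, vague prior   :",
      round(float(np.median(prior_predictive_spread(tau_vague))), 3))
```

The vague prior puts most of its mass far from the base model, which is one way a more flexible model can end up fitting worse than the model it nests.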
This was the problem that Andrew Gelman hired me and Matt Hoffman to solve, but we built Stan instead, because we couldn’t figure out how to let more complicated models gradually relax to simpler ones. It’s a great research topic.
When Andrew and I were applying for grants, we liked to include a plot where the horizontal axis was the amount of data and the vertical axis was the maximum model complexity you can fit. The curve goes up as you get more data, because you need more data to fit a complex model (the problem you’re running into), but then it goes back down with massive data, because you can no longer afford the computation a complex model requires.