# How to adjust model specification based on posterior predictive check result

Before providing the details of the actual model, I want to first collect some general ideas about how to adjust a model specification when a posterior predictive check suggests the model is misspecified. One thing I know: if I am modeling count data with a Poisson distribution and the observed data show more zeros than the predicted data, it is usually better to use a negative binomial distribution instead. But beyond that, I am not sure what specific guidance a posterior predictive check can give.
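To make the Poisson example above concrete, here is a minimal sketch (with made-up data, since none is shown in this thread) of how an excess of observed zeros relative to a fitted Poisson can be detected:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical count data: zero-inflated relative to a Poisson with the same mean.
y_obs = np.concatenate([np.zeros(40, dtype=int),
                        rng.poisson(3.0, size=60)])

lam_hat = y_obs.mean()            # Poisson MLE of the rate
p0_obs = np.mean(y_obs == 0)      # observed fraction of zeros
p0_pois = np.exp(-lam_hat)        # zero probability implied by Poisson(lam_hat)

print(f"observed zeros: {p0_obs:.2f}, Poisson-implied zeros: {p0_pois:.2f}")
# Substantially more observed zeros than the Poisson fit implies suggests
# zero inflation or overdispersion; a negative binomial or zero-inflated
# model may then fit better.
```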

In my case, I am modeling count data with a zero-inflated negative binomial distribution and random effects. The data drawn from the posterior predictive distribution are much larger than the observed data. Below is the QQ plot, with the x-axis representing the observed data and the y-axis the predicted data; the line is the x = y line.
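The kind of QQ comparison described above can be sketched numerically. This uses hypothetical stand-in arrays (the real observed data and posterior predictive draws are not shown in the thread), with the predicted counts deliberately too large:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: observed counts and one posterior predictive draw
# whose mean is much too large (the mismatch described above).
y_obs = rng.poisson(2.0, size=200)
y_rep = rng.poisson(8.0, size=200)

# QQ comparison: matched quantiles of observed vs. predicted counts.
probs = np.linspace(0.05, 0.95, 19)
q_obs = np.quantile(y_obs, probs)
q_rep = np.quantile(y_rep, probs)

# Points far above the x = y line mean the model predicts larger counts
# than were observed, at essentially every quantile.
for p, qo, qr in zip(probs, q_obs, q_rep):
    print(f"p={p:.2f}  observed={qo:.1f}  predicted={qr:.1f}")
```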

In addition, by comparing the proportion of zeros in the observed data vs. the predicted data, I found that the proportion of zeros in the predicted data is actually higher than in the observed data. So I switched to a plain negative binomial distribution, but it doesn't help at all. I was wondering if there is any general advice for this type of situation? I can provide more details of the model if someone is interested in taking a closer look.
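The zero-proportion comparison described above can be formalized as a posterior predictive check with the proportion of zeros as the test statistic. A minimal sketch, again with hypothetical stand-in data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-ins: observed counts and a matrix of posterior
# predictive draws (rows = draws, columns = observations).
y_obs = rng.poisson(2.0, size=150)
y_rep = rng.poisson(2.6, size=(500, 150))

# Test statistic: proportion of zeros, computed per replicated dataset.
t_obs = np.mean(y_obs == 0)
t_rep = (y_rep == 0).mean(axis=1)

# Posterior predictive p-value: fraction of replicated datasets with at
# least as many zeros as observed. Values near 0 or 1 flag a mismatch
# between the model and the data on this statistic.
ppp = np.mean(t_rep >= t_obs)
print(f"observed zero proportion: {t_obs:.3f}")
print(f"posterior predictive p-value: {ppp:.3f}")
```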

1. Will model reparameterization affect the posterior predictive check? Put differently: can reparameterization turn a misspecified model into a more correctly specified one?
2. I have also seen high autocorrelation during model fitting. Is this related to the model specification, or is it an independent issue related to other things like the sampler and the number of iterations?
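For the autocorrelation question, a quick way to quantify how bad it is for a single parameter's chain is to compute the lag-1 autocorrelation and a rough effective sample size. A sketch using a simulated AR(1) series as a stand-in for a highly autocorrelated MCMC chain:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in for one MCMC chain: an AR(1) series with strong
# autocorrelation (rho close to 1), mimicking a poorly mixing sampler.
rho, n = 0.95, 5000
eps = rng.normal(size=n)
chain = np.empty(n)
chain[0] = eps[0]
for t in range(1, n):
    chain[t] = rho * chain[t - 1] + eps[t]

def lag_autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

r1 = lag_autocorr(chain, 1)

# Crude effective sample size for an AR(1)-like chain:
# n_eff ~ n * (1 - rho) / (1 + rho); high autocorrelation shrinks it a lot.
n_eff = n * (1 - r1) / (1 + r1)
print(f"lag-1 autocorrelation: {r1:.2f}, rough n_eff: {n_eff:.0f}")
```

In practice high autocorrelation is often a parameterization issue (e.g. centered vs. non-centered random effects) rather than purely a sampler-settings issue, so the two questions above are not fully independent.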

Thanks!


Hi,
sorry for not getting to you earlier; this is a relevant and well-written question!

Unfortunately I am not aware of any such general ideas. In my experience, the problems and solutions differ greatly from case to case, and there are few shared principles beyond "understand your model and data" and "think hard" :-/

So I donâ€™t think I can provide much better advice without seeing the model and the data.

That is not unexpected: it is quite possible that the zero-inflation component in the inflated model was well informed by the data and fitted as very low (the data don't seem to support much zero inflation), so both models could produce almost identical predictions.

I would try to understand why the model predicts such large values. One possible cause is that your prediction code and model do not match, or there is some other bug. It is quite rare for a linear model not to get at least the overall mean of the data right, but it seems your model has problems even there, hence I would suspect a bug.
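The "does the model at least recover the overall mean" check suggested above can be made explicit as a posterior predictive check on the mean. A sketch with hypothetical stand-in data where the predictions are far too large, as in the situation described:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stand-ins: observed counts and posterior predictive draws
# from a prediction path whose scale is far too large.
y_obs = rng.poisson(2.0, size=100)
y_rep = rng.poisson(20.0, size=(400, 100))

mean_obs = y_obs.mean()
mean_rep = y_rep.mean(axis=1)

# If the observed mean lies far outside the distribution of replicated
# means, suspect a bug (mismatched link function, wrong offset, wrong
# data passed to the prediction code) before blaming the likelihood.
lo, hi = np.quantile(mean_rep, [0.025, 0.975])
print(f"observed mean: {mean_obs:.2f}")
print(f"95% interval of replicated means: [{lo:.2f}, {hi:.2f}]")
```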