Using Pearson residuals to validate models

I have been using brms to fit various GLMs. Coming from Frequentist statistics, I always validate the models through a residual analysis using either Pearson or deviance residuals, which should be normally distributed (for reasonably large sample sizes) when the model fits the data well.
(I hasten to point out that neither Pearson nor deviance residuals work well for binary or count GLMs, and randomized quantile residuals obtained by simulation should be used instead; I can provide references if needed)
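To make the distinction concrete, here is a small sketch (in Python with simulated Poisson data, not brms code) of the two residual types just mentioned: plain Pearson residuals, which stay visibly discrete for small counts, versus randomized quantile (Dunn-Smyth) residuals, which are standard normal when the model is correct.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated Poisson data and the "true" means, standing in for a real GLM fit
mu = rng.uniform(0.5, 3.0, size=200)   # fitted means
y = rng.poisson(mu)                    # observed counts

# Pearson residuals: (y - mu) / sqrt(V(mu)); for Poisson, V(mu) = mu.
# For counts this small they are discrete and far from normal.
pearson = (y - mu) / np.sqrt(mu)

# Randomized quantile (Dunn-Smyth) residuals: draw u uniformly between
# the CDF just below y and the CDF at y, then map through the standard
# normal quantile function. Under a well-specified model these are
# exactly standard normal, even for counts.
u = rng.uniform(stats.poisson.cdf(y - 1, mu), stats.poisson.cdf(y, mu))
rq = stats.norm.ppf(u)
```

A QQ plot of `rq` against standard normal quantiles should be close to the identity line here, while a QQ plot of `pearson` will show the discreteness.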
In Bayesian stats we no longer have a single residual per data point but a posterior distribution of residuals per data point. It is easy enough to obtain the mean residual for each data point, plot these means against the mean fitted values, and make a QQ plot of them to assess their normality.
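If I remember the brms interface correctly, `residuals(fit, summary = FALSE)` and `fitted(fit, summary = FALSE)` return draws-by-observations matrices; the reduction described above is then just a column-wise mean. A language-neutral sketch (numpy, with random matrices standing in for the posterior draws):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Stand-ins for posterior draws: S draws x N observations,
# the layout returned by residuals()/fitted() with summary = FALSE in brms.
S, N = 4000, 100
resid_draws = rng.normal(0.0, 1.0, size=(S, N))   # posterior draws of residuals
fitted_draws = rng.normal(5.0, 1.0, size=(S, N))  # posterior draws of fitted values

# Collapse the posterior: one mean residual and one mean fitted value per point
resid_mean = resid_draws.mean(axis=0)
fitted_mean = fitted_draws.mean(axis=0)

# Ingredients for the two diagnostic plots: residuals-vs-fitted uses
# (fitted_mean, resid_mean); the QQ plot compares the sorted mean
# residuals against standard normal quantiles.
probs = (np.arange(1, N + 1) - 0.5) / N
theoretical = stats.norm.ppf(probs)
sample = np.sort(resid_mean)
```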
After this long preamble (sorry), here are my two questions:

  1. the current version of brms (2.13.5) says that Pearson residuals are soon going to be deprecated. Can anybody explain why this would be so and what residuals would take their place?
  2. I wonder if (and somewhat fear that) the central limit theorem would unavoidably make the average Pearson residuals normally distributed even for badly fitting models.
    I know that I am somewhat transposing the procedure of Frequentist residual analysis to the Bayesian realm. Is this wrong or plain silly? For reasons unknown to me, Bayesians do not seem to use residual analysis for model validation, and they appear to prefer other means of validation based on posterior predictive distributions.

I would appreciate any comments and suggestions on these queries.



There are some posts about best practices for model checking in a Bayesian workflow. Here is one, but there are others:

So our workflow in brms is something like this:

  - Explicit data workflow (why these choices and not others)
  - Plots of the raw data
  - Explicitly set all priors with set_prior, including documenting the choice of each prior
  - pairs plots to look for weird correlations
  - Plot the variance in the τ parameters
  - summary(k_fit_brms, prob = 0.5) for 50% uncertainty intervals
  - Check all marginal effects with conditional_effects

There is also shinystan which will dump out many model diagnostics.


Dear Ara,
Thank you for your response. I was aware of some discussions and papers on the Bayesian workflow, as well as on the prior and posterior predictive distributions, and the LOO. Of course these are a great way forward.
But my queries aimed at a more general point: the suitability of using Pearson (or other) residuals to assess the quality of model fit, in analogy with what is done in Frequentist statistics.
Andrew Gelman has frequently criticized Bayesian modelers for not paying enough attention to model validation. Before writing here, I did look in several Bayesian books and papers, and the few that aim to assess model goodness of fit seem to prefer calculating statistics from the posterior predictive distributions, and using the LOO criterion. These are all fine and I can do (most of) them. But the point of my message is that I want to understand any theoretical reasons why the use of (Pearson or other) residuals would seem to be discouraged, or at least infrequent. Any pointers towards books or papers addressing these issues would be greatly appreciated.


Yeah, Bayesian model assessment is still an open issue, mostly because it is computationally expensive.

Well, in a Bayesian approach you can check the predictive errors instead of the residuals, i.e. the difference between the predictive distribution and the data.
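In brms this is what `predictive_error(fit)` returns, if I recall correctly: `y - y_rep`, one row per posterior draw. A minimal numpy sketch of the same computation (random arrays standing in for real data and posterior predictive draws):

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-ins: observed data y and posterior predictive draws y_rep (S x N)
N, S = 50, 2000
y = rng.normal(0.0, 1.0, size=N)
y_rep = rng.normal(0.0, 1.0, size=(S, N))

# Predictive errors: a full distribution of errors per observation,
# not a single residual. Broadcasting gives an (S, N) matrix.
pred_err = y - y_rep

# Summaries one might plot: per-observation mean error and a 90% interval
err_mean = pred_err.mean(axis=0)
err_lo, err_hi = np.percentile(pred_err, [5, 95], axis=0)
```

Plotting `err_mean` with the `(err_lo, err_hi)` bands against the observation index (or against fitted values) gives a Bayesian analogue of the residual plot, with the posterior uncertainty kept visible.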

For a Bayesian Pearson test, see this:

Might help you.
