Linear model assumptions check


In the frequentist framework, when someone runs a linear model they have to check the model assumptions, and this can be done easily in R using the plot() function:

plot(lm(y ~ x))

From those plots you can get a sense of, for example, heteroscedasticity. As simple as that.

Is there a simple way to perform a similar check using rstan, equivalent to plot()?


I don’t think there’s anything automatic in rstan. There probably is in rstanarm and brms, though.

I think you’d compute the residuals in a generated quantities block. Assuming that quantity is called residual, you could then pull it out in R with:
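As a rough sketch, such a generated quantities block might look like the following (this assumes your model declares data y, x, and N and parameters alpha and beta, with linear predictor alpha + beta * x — adapt the names to your actual model):

```stan
generated quantities {
  vector[N] residual;
  for (n in 1:N)
    residual[n] = y[n] - (alpha + beta * x[n]);
}
```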

residual <- as.matrix(fit_simple, pars = "residual")

That would give you a matrix of residual draws, which you could then plot however you like.
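For instance, a minimal plotting sketch (assuming fit_simple is your stanfit, and that you also saved the fitted values in generated quantities under the hypothetical name mu):

```r
residual <- as.matrix(fit_simple, pars = "residual")
mu       <- as.matrix(fit_simple, pars = "mu")  # hypothetical fitted values

# Posterior-mean fitted values vs posterior-mean residuals
plot(colMeans(mu), colMeans(residual),
     xlab = "Fitted (posterior mean)",
     ylab = "Residual (posterior mean)")
abline(h = 0, lty = 2)
```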

bayesplot does some of this. Check the bottom example here:

Gelman had a post on the assumptions of linear regression that you might find interesting: .


With brms, you can easily get the fitted/predicted values with fitted() and the residuals with residuals(). After that, it’s easy to make the fitted-vs-residuals plot using either the base R plot() function that you mentioned, or the qplot() function from ggplot2.

The only thing to be wary of is that the objects returned by fitted() and residuals() on a brmsfit object have multiple columns: the posterior mean (point estimate), the estimation error, and the 2.5% and 97.5% percentiles. For the fitted-vs-residuals plot, you only need the point estimates, i.e. the first column.

Therefore, the full code for the fitted-vs-residuals plot would look something like this:

library(ggplot2)  # for qplot()

point_preds <- fitted(your_model)[, 1]
point_errs  <- residuals(your_model)[, 1]

qplot(point_preds, point_errs)

In addition, as far as I understand, posterior predictive checks (PPCs) in the Bayesian framework fill a role similar to assumption checking in the frequentist framework (or perhaps it’s better to say that PPCs augment assumption checking). For example, visualizing whether the distributions of data simulated from the posterior predictive distribution match the distribution of your observed data is a rough check that your model isn’t horribly misspecified (e.g. by predicting a normally distributed response when it was really a heavily skewed negative binomial). So in a way, PPCs are like assumption checks, although imperfect PPCs don’t always mean your model breaks down; more often they suggest ways of improving or expanding your model (which assumption checking can as well).
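A sketch of such a density-overlay check with brms (again assuming your_model is a fitted brmsfit):

```r
library(brms)

# Overlay densities of replicated datasets on the observed outcome;
# a large mismatch suggests the likelihood is misspecified.
# (Older brms versions use nsamples instead of ndraws.)
pp_check(your_model, type = "dens_overlay", ndraws = 50)
```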

There’s a lot of options for different kinds of posterior predictive checks with the pp_check() function (see