What does rstanarm's predict do, exactly?

As a student studying “Regression and Other Stories”, the R script below confuses me:

library(rstanarm)
library(tidyverse)

set.seed(1)
n = 100
x = runif(n, 1, 10)
a = 100
b = 0.5
y = a + b*x + rnorm(n,0,2)
fit = stan_glm(y ~ x, data = tibble(y,x), refresh=0)
# First way.
p0 = predict(fit) |> as.numeric()
# Second way
sims = as.matrix(fit)
p1 = colMeans( sims[,1] + sims[,2] %*% t(x))  # posterior mean of a + b*x

# p0 and p1, though very close, are not the same

The (apparent) equivalence of p0 and p1 is from https://github.com/avehtari/ROS-Examples/tree/master/Introclass in the plot_residuals.R script.

So how does the predict() call that produces p0 differ from the computation of p1?

Hi @peter2108 and welcome to the Stan forums.

I think this one is for @andrewgelman or @avehtari.

If I had to guess, it would be that with a regression

y = \alpha + \beta \cdot x + \epsilon with \epsilon \sim \textrm{normal}(0, \sigma),

there is the linear predictor \alpha + \beta \cdot x but also the normal error \epsilon. But because the error has a symmetric distribution, if you simulate a bunch of \epsilon^{(m)} \sim \textrm{normal}(0, \sigma), then

\lim_{M \rightarrow \infty} \frac{1}{M} \sum_{m=1}^M \left( \alpha + \beta \cdot x + \epsilon^{(m)} \right) = \alpha + \beta \cdot x.
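As a quick numerical check of that convergence, here is a small R sketch; the values of alpha, beta, sigma, and x0 below are made up for illustration:

```r
set.seed(123)
alpha <- 100; beta <- 0.5; sigma <- 2  # made-up parameter values
x0 <- 5                                # a single predictor value
M <- 1e6                               # number of simulated errors

# Average of the noisy predictions over many draws of epsilon
eps <- rnorm(M, 0, sigma)
mc_mean <- mean(alpha + beta * x0 + eps)

# The noise-free linear predictor
linpred <- alpha + beta * x0

mc_mean  # close to 102.5
linpred  # exactly 102.5
```

The symmetric zero-mean errors average out, so the Monte Carlo mean of the noisy predictions converges to the linear predictor itself.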

In section 9.2 of Regression and Other Stories, we discuss the functions predict, posterior_linpred, and posterior_predict. The “predict” function uses the point prediction, so I don’t think Bob’s conjecture about the error term is correct. I would’ve guessed that, in your example, p0 and p1 would have been the same, so there must be something I’m missing here.
Jonah Gabry would know the answer because I think he’s the one who programmed this in rstanarm.

I was worried that I had missed something obvious, so I am reassured by @andrewgelman's answer. Thanks to both for replying.

@peter2108 I think you’re right to be confused!

Andrew is right that predict uses a point prediction, but it’s strange because it takes posterior medians of the coefficients and then constructs the linear predictor using those point estimates. We don’t really recommend using it after MCMC (although it will come out to be quite close to correct in many cases, like you saw here), hence this disclaimer on the doc page:

This method is primarily intended to be used only for models fit using optimization. For models fit using MCMC or one of the variational approximations, see posterior_predict.

The reason for this strangeness is that rstanarm’s predict() method (for historical reasons) is implemented to match up with the existing predict() methods associated with lm() and glm() in R, which just grab the point estimate of the coefficients and then create the linear predictor. The point estimate that’s used throughout rstanarm for MCMC models is the median (along with the MAD_SD instead of mean and SD). So to get the equivalent of your p0 = predict(fit) you would actually need to do:

median(sims[,1]) + median(sims[,2]) * x

But that’s not really recommended because for MCMC fits we have posterior_predict() and posterior_linpred(), which do the right thing from a Bayesian perspective (which is why the doc has that disclaimer about using predict()). Your computation of p1 = colMeans( sims[,1] + sims[,2] %*% t(x)) makes more sense from a Bayesian perspective and is the equivalent of colMeans(posterior_linpred(fit)). colMeans(posterior_predict(fit)) should be the same in expectation but there’s added noise due to the inclusion of the residual standard deviation sigma.
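To see the gap between the two computations without fitting a model, here is a sketch using a synthetic draws matrix standing in for sims = as.matrix(fit); the skew in the intercept draws is exaggerated on purpose so the difference is visible:

```r
set.seed(42)
M <- 4000; n <- 100
# Synthetic "posterior draws" (assumption: not from an actual rstanarm fit);
# the intercept draws are deliberately right-skewed
a_draws <- 100 + rexp(M, rate = 1) - 1  # skewed intercept draws
b_draws <- rnorm(M, 0.5, 0.05)          # roughly symmetric slope draws
x <- runif(n, 1, 10)

# predict()-style: take medians of the coefficients, then form the linpred
p0 <- median(a_draws) + median(b_draws) * x

# colMeans(posterior_linpred(fit))-style: linpred per draw, then average
p1 <- colMeans(a_draws + b_draws %*% t(x))

max(abs(p0 - p1))  # small but clearly nonzero
```

With symmetric posteriors the two nearly coincide (mean and median agree), which is why p0 and p1 were "very close" in the original example; skew in any coefficient's posterior pulls them apart.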

Sorry for the confusion. Hopefully that helps clear it up and didn’t make things even more confusing!


It is clear now. Chapter 9 (pp. 118–119) presented three functions:

We can compute the point prediction:
y_point_pred_2 <- predict(fit_2, newdata=new)
and simulations of the linear predictor:
y_linpred_2 <- posterior_linpred(fit_2, newdata=new)
And we can compute posterior predictive simulations for a single new person of height 70 inches:
y_postpred_2 <- posterior_predict(fit_2, newdata=new)

My takeaway is that in rstanarm the first of these is not needed. Given y = a + b*x + error, use posterior_linpred for simulations of the linear predictor a + b*x_i (no error term) and posterior_predict for simulations of y_i that include the error term.
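That distinction can be illustrated without an actual fit. The sketch below hand-simulates posterior draws of a, b, and sigma (all values are made up) and compares linear-predictor-style draws with posterior-predictive-style draws at a single x:

```r
set.seed(7)
M <- 20000
# Made-up "posterior draws" of the parameters (not from a real fit)
a     <- rnorm(M, 100, 0.4)
b     <- rnorm(M, 0.5, 0.07)
sigma <- rnorm(M, 2, 0.1)
x0 <- 5

# posterior_linpred-style draws: linear predictor only, no error term
linpred_draws <- a + b * x0

# posterior_predict-style draws: linear predictor plus normal error
postpred_draws <- linpred_draws + rnorm(M, 0, sigma)

mean(linpred_draws); mean(postpred_draws)  # agree up to Monte Carlo error
sd(linpred_draws); sd(postpred_draws)      # predictive draws are wider
```

The means match in expectation, but the predictive draws carry the extra sigma noise, which is exactly why colMeans(posterior_predict(fit)) is noisier than colMeans(posterior_linpred(fit)).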

These two versions of p1 with A = sims[,1] + sims[,2] %*% t(x) are exactly the same:

p1=posterior_linpred(fit) |> apply(2, mean) 
p1=A |> apply(2, mean)                      

These two versions of p0, as Jonah told me, are exactly the same:

p0 = predict(fit) |> as.numeric()           
p0 = median(sims[,1]) + median(sims[,2])*x      

Though the matrix version q0 = A |> apply(2, median) is slightly different, I think I can treat q0 as essentially what predict does.

Thank you very much for your time!

Yes, exactly.

I think this is just because doing this computation using all posterior draws and then taking the median isn’t guaranteed to be identical to taking the medians of the draws of a and b first and using them to compute the prediction.
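A tiny hand-made example makes this concrete: the median is not a linear operation, so the median of a + b*x over the draws need not equal median(a) + median(b)*x. The three "draws" below are invented to make the gap obvious:

```r
# Three artificial "posterior draws" of (a, b), evaluated at x = 2
a <- c(0, 1, 10)
b <- c(3, 1, 0)
x <- 2

median(a + b * x)          # median of c(6, 3, 10), i.e. 6
median(a) + median(b) * x  # 1 + 1*2 = 3
```

With realistic, nearly symmetric posteriors the two quantities are close, which is why q0 and p0 differed only slightly in the example above.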