Distribution of predictions using posterior_predict() vs "by hand"

I’m quite new to Stan, and I’m trying to outline Regression and Other Stories in R. I’m working on a simple model of the hibbs dataset (vote ~ growth). In Chapter 9 (p. 116), code is provided to obtain predictions “by hand”, rather than by using posterior_predict(). The code is:

m1 <- stan_glm(vote ~ growth, data = hibbs)

new <- data.frame(growth = 2.0)
y_pred <- posterior_predict(m1, newdata = new)

sims <- as.matrix(m1)
a <- sims[,1]
b <- sims[,2]
sigma <- sims[,3]
n_sims <- nrow(sims)

y_pred_byhand <- as.numeric(a + b * new) + rnorm(n_sims, 0, sigma)

Is this really how posterior_predict() constructs its distribution of predictions?

There is an unfortunate error in the book. It’s listed in the errata, and the code has also been fixed at Regression and Other Stories: Elections Economy.

This

y_pred_byhand <- as.numeric(a + b * new) + rnorm(n_sims, 0, sigma)

should be

y_pred_byhand <- a + b * as.numeric(new) + rnorm(n_sims, 0, sigma)

With the corrected code, you will get draws from the same posterior predictive distribution as posterior_predict() (the individual draws differ because a different random seed is used).
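To see why the original line misbehaves, here is a minimal sketch that uses made-up draws in place of the real posterior. The values of a, b, and sigma below are synthetic assumptions, not output from m1. As far as I understand R's data frame arithmetic, multiplying the vector b by the one-cell data frame new keeps only as many elements as the data frame has cells, so as.numeric(a + b * new) reduces to a single number built from the first draw.

```r
set.seed(123)
n_sims <- 4000

# Synthetic stand-ins for posterior draws (assumed values, not from a fitted model)
a     <- rnorm(n_sims, mean = 46, sd = 1.5)
b     <- rnorm(n_sims, mean = 3, sd = 0.7)
sigma <- abs(rnorm(n_sims, mean = 3.9, sd = 0.7))

new <- data.frame(growth = 2.0)

# Original book line: arithmetic on the 1x1 data frame truncates a and b
lin_wrong <- as.numeric(a + b * new)

# Corrected line: convert new to a plain number first, so every draw is used
lin_right <- a + b * as.numeric(new)

length(lin_wrong)  # length of the linear predictor under the original code
length(lin_right)  # length of the linear predictor under the corrected code

# Add the observation noise in both cases and compare the spread
y_wrong <- lin_wrong + rnorm(n_sims, 0, sigma)
y_right <- lin_right + rnorm(n_sims, 0, sigma)
c(sd_wrong = sd(y_wrong), sd_right = sd(y_right))
```

If that reading is right, the original by-hand predictions all share one value of a + b * 2 and vary only through the rnorm() noise, so their spread understates the posterior predictive uncertainty.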

Aki, thanks so much! I really appreciate you taking the time to help. The book and your code are fantastic!
