These questions came out of conversations I had today with @mariel and @ignacio.

In the stan_lm vignette, @jonah and @bgoodri suggest investigating the average treatment effect using `posterior_predict`

.

```
clouds_cf <- clouds
clouds_cf$seeding[] <- "yes"
y1_rep <- posterior_predict(post, newdata = clouds_cf)
clouds_cf$seeding[] <- "no"
y0_rep <- posterior_predict(post, newdata = clouds_cf)
qplot(x = c(y1_rep - y0_rep), geom = "histogram", xlab = "Estimated ATE")
```

From what I understand (please correct me if this is wrong), `posterior_predict`

is not just calculating \hat y_i = X_i\hat\beta (the expected value of y); it is drawing a new \tilde y_i from the posterior for each MCMC iteration. This incorporates the residual variability into the prediction.

First off, is there a problem with the above calculation, in that it assumes the draws `y1_rep`

and `y0_rep`

are independent? I would think you want to make this calculation in a way that takes into account the fact that these are the same people (really the same observation) in each case. Another way to look at it, is that I would expect y_1 - y_0 to be smaller if you are comparing those observations on the same person, as compared to two different people that happen to have the same covariates. I donâ€™t think your suggested procedure accounts for this.

Second, I just want to point out that this is not the â€śstandardâ€ť way of calculating an ATE, which is done by most software packages that Iâ€™ve come across. Most of the time, \hat y_i = X_i\hat\beta is used as the prediction, and the residual variability is ignored. When this is done, the issue I raise above is moot. Iâ€™m not saying that your approach is wrong, just pointing out that it is not the standard.

Iâ€™m curious about your thoughts to these issues. Thanks!