I am building some AR(1) process models, and I want to generate posterior predictive distributions. In my reading of the documentation and examples online, I see everyone use _rng functions. Why is that necessary? What is the difference between

generated quantities {
  matrix[T, NC] Y_pred;  // predictions
  for (c in 1:NC) {
    Y_pred[1, c] = Y[1, c];
    for (t in 2:T) {
      Y_pred[t, c] = normal_rng(alpha[c] + beta[c] * Y[t - 1, c], sigma[c]);
    }
  }
}

and

generated quantities {
  matrix[T, NC] Y_pred;  // predictions
  for (c in 1:NC) {
    Y_pred[1, c] = Y[1, c];
    for (t in 2:T) {
      Y_pred[t, c] = alpha[c] + beta[c] * Y[t - 1, c];
    }
  }
}

The former reflects the fact that even if two observations have the same value of alpha[c] + beta[c] * Y[t - 1, c], they will have different values of Y[t, c] due to noise / measurement error / etc. It also reflects your uncertainty in sigma, which is the standard deviation of that noise. As such, the posterior predictive distribution should closely resemble the marginal distribution of the data. The latter will not, even when the model fits well, because it is not a posterior predictive distribution at all but rather the posterior distribution of the conditional mean of the outcome. So the first way is right and the second way is wrong for most purposes.
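A quick way to see the difference in spread between the two versions, as a sketch in Python with made-up posterior draws (the numbers are illustrative, not output from the Stan model above):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior draws for one series c (illustrative values).
n_draws = 20000
alpha = rng.normal(0.5, 0.05, n_draws)          # posterior draws of alpha[c]
beta = rng.normal(0.8, 0.02, n_draws)           # posterior draws of beta[c]
sigma = np.abs(rng.normal(1.0, 0.05, n_draws))  # posterior draws of sigma[c]

y_prev = 2.0  # an observed Y[t - 1, c]

mu = alpha + beta * y_prev      # second version: conditional mean only
y_pred = rng.normal(mu, sigma)  # first version: normal_rng draw

# The _rng version is much wider: it adds the observation noise sigma
# on top of the parameter uncertainty already present in mu.
print(np.std(mu), np.std(y_pred))
```

The first number reflects only parameter uncertainty; the second also includes the sigma-sized noise, which is why only the first version can resemble the marginal distribution of the data.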

I am curious about the following, and would appreciate your straightening out any misconceptions.

If, in the end, one would only use the mean of the posterior predictive distribution to compare against point forecasts from other non-Bayesian methods, does that make the _rng step redundant? The generated random noise is independent of the estimated parameters, so taking expectations successively (first over the random number, then over the parameters) should end up with the same mean (right?).
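The successive-expectations argument can be checked numerically. A small sketch with made-up posterior draws (illustrative values, not from any fitted model), holding sigma fixed for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)

n_draws = 200000
mu = rng.normal(1.5, 0.3, n_draws)  # posterior draws of the conditional mean
sigma = 0.8                         # noise scale, fixed for simplicity
y_pred = rng.normal(mu, sigma)      # one predictive draw per posterior draw

# Law of total expectation: E[y_pred] = E[E[y_pred | mu]] = E[mu],
# so the two means agree up to Monte Carlo error.
print(np.mean(mu), np.mean(y_pred))
```

The means match (up to simulation noise) even though the spreads differ, which is exactly the point of the question: for the mean alone, the _rng step changes nothing.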

(I know focusing only on point forecasts is odd to Bayesians. But I am coming from the non-Bayesian world and am using Stan for its convenience in modeling multiple aspects of a problem in an integrated way. I believe my audience would care more about whether the added complexity takes them somewhere in terms of assessment criteria they are familiar with than about density-forecast-based approaches.)

Even in the non-Bayesian world, you can compute a predictive distribution!

The point/distribution difference is not about prediction but about parameters. You can get point estimates by maximizing the likelihood for the mean and dispersion, or you can get a posterior distribution of the mean and dispersion. In both cases, you can use the mean(s) and dispersion(s) to generate a predictive distribution to compare to your data.
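For instance, a plug-in predictive distribution from maximum-likelihood estimates, sketched in Python (simulated data; for a normal model the MLEs are just the sample mean and standard deviation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data from a normal model with mean 3 and sd 2.
y = rng.normal(3.0, 2.0, 500)

# MLEs for a normal likelihood: sample mean and (uncorrected) sample sd.
mu_hat = y.mean()
sigma_hat = y.std()

# Plug-in ("non-Bayesian") predictive distribution: plug the point
# estimates into the sampling model and draw new observations.
y_new = rng.normal(mu_hat, sigma_hat, 10000)
print(y_new.mean(), y_new.std())
```

Unlike the posterior predictive, this plug-in version ignores parameter uncertainty: it conditions on the MLEs as if they were the true values, so it tends to be slightly too narrow.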

Now, you could be interested only in the value of mu and the uncertainty around it. For that, you could use a point estimate with a not-really-interpretable confidence interval, or use the readily interpretable posterior distribution.

The major difference, I think, is that we tend to give more importance to modelling the noise in the Bayesian world. The noise around the mean is part of the data-generating process, and failing to model it is simply failing to model the process we are interested in. For example, I recently stopped being interested in R^2, because I don't want my data points to sit closely around my means; I want instead to be sure I understand how the dispersion arose. A high R^2 would have been a less interesting result!