Prediction in generated quantities vs. model parameters

Hi All,
More of a general question about predicting from new observations in Stan. I see from the User’s Guide (via the link below) that, in addition to using generated quantities to make predictions, it is also possible to do this in the model block, where the predictions become parameters. I was wondering what the pluses and minuses of each approach are. Is it just that using the generated quantities block means there are n fewer parameters for Stan to estimate (n being the number of predictions)?
Thanks,
MG

The basic difference between the model and transformed parameters blocks is that the former does not store the quantities calculated from the model parameters, while the latter returns them alongside the draws of the original parameters. Otherwise, both can be used for the calculations the HMC sampler needs at every step.
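
To make the storage difference concrete, here is a plain-Python sketch (not Stan code) of one evaluation of a hypothetical linear-regression target with flat priors, so only the likelihood term appears. The linear predictor `mu` is computed identically in both variants; the only difference is whether it is kept with the draw, as a transformed parameter would be, or discarded like a model-block local variable. The names `log_density` and `keep_mu` and the toy data are made up for illustration.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_density(alpha, beta, sigma, x, y, keep_mu):
    """One evaluation of the target for y ~ normal(alpha + beta * x, sigma).

    keep_mu=True mimics declaring mu in transformed parameters (saved with
    the draw); keep_mu=False mimics a local variable in the model block
    (used for the target, then thrown away).
    """
    mu = [alpha + beta * xi for xi in x]  # linear predictor
    lp = sum(normal_lpdf(yi, mi, sigma) for yi, mi in zip(y, mu))
    saved = {"alpha": alpha, "beta": beta, "sigma": sigma}
    if keep_mu:
        saved["mu"] = mu  # stored alongside the parameters
    return lp, saved


x = [0.0, 1.0, 2.0]
y = [0.1, 1.9, 4.2]
lp_tp, saved_tp = log_density(0.0, 2.0, 0.5, x, y, keep_mu=True)
lp_local, saved_local = log_density(0.0, 2.0, 0.5, x, y, keep_mu=False)
print(lp_tp == lp_local)    # True: the sampler sees the same target
print(sorted(saved_tp))     # ['alpha', 'beta', 'mu', 'sigma']
print(sorted(saved_local))  # ['alpha', 'beta', 'sigma']
```

The sampler cannot tell the two variants apart; only the saved output differs.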

Generated quantities is a bit different: it cannot (I believe) be used to compute the above, because it only produces output from each draw of the chain and never feeds back into the log density. This has a few advantages. First, you don’t have to do any intermediate calculations there: the block runs once per draw, with the actual sample from the chain, rather than at every leapfrog step, and without automatic differentiation. Second, because of that you can compute a forecast or any model output whose size doesn’t match the data (if you do that in the other blocks, you have to keep track of which slice of the prediction actually matches the data). Finally, you can use random number generators in this block, which is not allowed in the model or transformed parameters blocks, so you can, for instance, generate a stochastic prediction or an ensemble of stochastic trajectories.
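
A rough plain-Python picture of what a generated quantities block does for a linear model like the one in this thread: for every posterior draw, compute the linear predictor at the new x values and add noise with a random number generator (in Stan this would be `normal_rng`). The function name, the toy draws, and `x_new` are hypothetical; a real fit would supply thousands of draws. Note that `x_new` has four points, independent of how many observations were used for fitting.

```python
import random


def predict_per_draw(draws, x_new, seed=0):
    """Mimic a generated quantities block computing
    y_new ~ normal(alpha + beta * x_new, sigma) once per posterior draw.

    `draws` is a list of (alpha, beta, sigma) tuples, one per saved
    iteration. Returns one row per draw and one column per new x value.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    y_new = []
    for alpha, beta, sigma in draws:
        mu_new = [alpha + beta * xi for xi in x_new]           # deterministic part
        y_new.append([rng.gauss(m, sigma) for m in mu_new])    # stochastic part
    return y_new


# Hypothetical posterior draws of (alpha, beta, sigma), for illustration only.
draws = [(0.1, 1.9, 0.4), (0.0, 2.1, 0.5), (-0.1, 2.0, 0.3)]
x_new = [5.0, 6.0, 7.5, 10.0]  # need not match the size of the training data

y_new = predict_per_draw(draws, x_new, seed=1)
print(len(y_new), len(y_new[0]))  # 3 4
```

Each row is one draw from the posterior predictive distribution at `x_new`; summarizing the columns gives predictive intervals.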

Whichever of these blocks you choose for derived calculations, none of this affects the number of parameters Stan has to estimate (only the number and kind of intermediate calculations). Also, I’d be careful with what is meant by prediction, because it is sometimes used as a synonym for forecasting (or backcasting) and sometimes only for the output the model gives for a particular set of parameters.
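
The User’s Guide approach the question asks about is different: there the predictions are declared in the parameters block and a statement such as `y_new ~ normal(alpha + beta * x_new, sigma)` is added to the model block. The plain-Python sketch below (hypothetical function names and toy data, flat priors assumed) writes out both targets. The extra term is just the normal log density of `y_new`, but `y_new` is now part of the vector HMC explores, so the sampler works in 3 + N_new dimensions instead of 3.

```python
import math


def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def target_gq(theta, x, y):
    """Target when predictions live in generated quantities.
    theta = (alpha, beta, sigma): three sampled dimensions."""
    alpha, beta, sigma = theta
    return sum(normal_lpdf(yi, alpha + beta * xi, sigma) for xi, yi in zip(x, y))


def target_pred_params(theta, x, y, x_new):
    """Target when predictions are parameters.
    theta = (alpha, beta, sigma, y_new_1, ..., y_new_N): 3 + N dimensions."""
    alpha, beta, sigma = theta[:3]
    y_new = theta[3:]
    if len(y_new) != len(x_new):
        raise ValueError("need one y_new parameter per new x value")
    lp = target_gq((alpha, beta, sigma), x, y)
    # The extra model-block statement: y_new ~ normal(alpha + beta * x_new, sigma)
    lp += sum(normal_lpdf(yn, alpha + beta * xn, sigma) for xn, yn in zip(x_new, y_new))
    return lp


x, y, x_new = [0.0, 1.0, 2.0], [0.1, 1.9, 4.2], [3.0, 4.0]
theta_small = (0.0, 2.0, 0.5)
theta_big = theta_small + (6.0, 8.0)  # y_new placed exactly at the line
print(len(theta_small), len(theta_big))  # 3 5
extra = target_pred_params(theta_big, x, y, x_new) - target_gq(theta_small, x, y)
```

Here `extra` equals two normal log densities evaluated at their mean. The result for (alpha, beta, sigma) is the same in distribution either way, but the sampler has to do more work.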

Hi thanks for the help.

I’m not really forecasting or backcasting; I am predicting a new value with a linear model that was fitted in Stan to a separate set of data.

My question really is (and if you answered it, excuse me, I must have missed it): looking at the Stan page on Prediction (see link), what are the benefits of making predictions in the model block (predictions modeled as parameters) rather than in the generated quantities block (predictions declared as generated quantities)?

Thanks,
MG

Except for the random numbers that can be generated in the generated quantities block, it’s mostly a matter of convenience: if you have no use for the transformed parameters after inference, the calculation can go in the model block; if you need to store it, it can go in the transformed parameters block.

If the new data are separate but at the same x values as the data used for inference, you can reuse the same linear model computed in the transformed parameters, e.g. y = ax + b with x = {0, 1, 2, 3, 4}. If x differs between the data sets, it may be best to use generated quantities, but again, it’s mostly a matter of organization and convenience.
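
Following the y = ax + b example, here is a plain-Python sketch of the deterministic version (no random number generator): evaluate the fitted line once per posterior draw at whatever x values you need, then average across draws. This is the kind of output a transformed parameter would store when the x values coincide with the training data, or a generated quantity without any `_rng` call would produce for other x values. The draws are hypothetical numbers for illustration, not output from a real fit.

```python
from statistics import mean


def line_per_draw(draws, x_values):
    """Evaluate y = a * x + b at each x for every posterior draw (a, b)."""
    return [[a * x + b for x in x_values] for a, b in draws]


def posterior_mean_line(draws, x_values):
    """Average the per-draw line values across draws, one mean per x."""
    per_draw = line_per_draw(draws, x_values)
    return [mean(column) for column in zip(*per_draw)]


# Hypothetical posterior draws of (a, b); a real fit would supply thousands.
draws = [(1.0, 0.0), (2.0, 1.0), (3.0, -1.0)]
x_train = [0, 1, 2, 3, 4]  # same x values as the inference data
x_other = [10, 20]         # a different set of x values

print(posterior_mean_line(draws, x_train))  # [0.0, 2.0, 4.0, 6.0, 8.0]
print(posterior_mean_line(draws, x_other))  # [20.0, 40.0]
```

The same per-draw computation serves both sets of x values; the choice of block only decides where it lives and whether it is stored.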

Thank you very much, I think that clears up my questions.
Best,
MG
