Sorry, your question fell through the cracks a bit.
In my opinion, generating predictions in the host language is a good approach (it is, for example, what the brms
package does); the main advantage is that it gives you the most flexibility. In practice you often can't reuse much code when generating predictions in Stan, although I admit there is some advantage to using the same language for the model and the predictions (e.g., you don't have to worry about whether Python parametrizes the distributions the same way your model does).
If you want to use Stan for predictions, you might want to supply the prediction data already when fitting the model: you would have something like N_pred, group_idx_pred, observation_data_pred, and group_data_pred
in the data block, and in generated quantities draw one predictive sample per posterior draw
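for all the observations you want predictions for, as in the sketch below.

To make that concrete, here is a minimal sketch of the idea, assuming a simple hierarchical normal model with group-level means. Everything except the *_pred names from above is a placeholder for your actual model, and I leave out the covariates that observation_data_pred and group_data_pred would carry:

```stan
data {
  int<lower=1> N;                            // observations used for fitting
  int<lower=1> J;                            // number of groups
  array[N] int<lower=1, upper=J> group_idx;  // group membership of each observation
  vector[N] y;

  int<lower=0> N_pred;                       // observations to predict
  array[N_pred] int<lower=1, upper=J> group_idx_pred;
}
parameters {
  vector[J] mu;                              // group means
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 5);
  sigma ~ normal(0, 2);
  y ~ normal(mu[group_idx], sigma);
}
generated quantities {
  // one predictive draw per posterior iteration for each prediction point
  vector[N_pred] y_pred;
  for (n in 1:N_pred) {
    y_pred[n] = normal_rng(mu[group_idx_pred[n]], sigma);
  }
}
```

Each posterior draw then carries an N_pred-long y_pred, so the full set of draws gives you the posterior predictive distribution at every prediction point.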
I am not sure whether PyStan already supports it, but there has been some work on letting you rerun just the generated quantities
part of a model, so you should (possibly in a future version) be able to fit the model once and then predict multiple times while still using a single Stan model.
Personally, I think having a separate Stan model just for predictions gets you the worst of both worlds (the only exception being that you can reuse the same prediction code across multiple host languages).
Does that make sense?