Efficient Coding of Large-Dimensional Generated Quantities (Export only means?)

After fitting my model on n ≈ 3000 rows, I compute predicted values for about 250 different permutations of covariates (counterfactual scenarios). With 4 chains of 3000 iterations each, this inflates the generated quantities matrix of fitted values to 3000 * 250 * 12,000 = 9 billion entries. I have observed that exporting such high-dimensional generated quantities significantly slows down the time between the chains finishing and the rstan sampling() call returning.

I only really need the posterior average residual for each permutation, so a 250-element vector (for each permutation, the mean over the 3000 rows, averaged over the 12,000 draws) would be sufficient to export, rather than the 9-billion-entry matrix.

What would be a recommended workflow if one only needs point estimates of generated quantities rather than the full posterior of each draw? Should I exclude the full matrix from export by declaring it inside a local "{…}" block and compute only the averages within generated quantities (sketched below)? Or should I "manually" calculate predictions from the posterior-mean parameters outside of Stan? And is the generated quantities block evaluated only after all iterations of a chain are done, or once per iteration?
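To illustrate the first option, I mean roughly something like this (just a sketch; yhat and scenario_mean are placeholder names and the actual computation is omitted):

generated quantities {
  vector[250] scenario_mean;
  { // local block: yhat is recomputed each draw but never written to the output
    matrix[3000, 250] yhat;
    // ... fill yhat from the parameters and the scenario covariates ...
    for (j in 1:250)
      scenario_mean[j] = mean(col(yhat, j));
  }
}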
One key thing I do not yet understand, which would inform my decision, is whether the long additional run time with large-dimensional generated-quantities exports comes from their calculation in the generated quantities block or from the export of the stanfit object back to R.

I would be very grateful for any advice and am happy to supply a reproducible example, although I believe the question may be general enough not to require one.
Thank you!

Specify pars = "big", include = FALSE when you call stan() or sampling(), where "big" is the name of the object in generated quantities that is too big to store. The posterior means of everything that is retained are always available afterward via get_posterior_mean().
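A minimal sketch of that call, assuming the compiled model is sm, the oversized generated quantity is yhat_big, and the 250-element summary kept alongside it is scenario_mean (all placeholder names):

library(rstan)
fit <- sampling(sm, data = stan_data, chains = 4, iter = 3000,
                pars = "yhat_big", include = FALSE)  # yhat_big is never stored
get_posterior_mean(fit, pars = "scenario_mean")      # per-chain and overall means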

Thank you! And if I did not want to calculate fitted values for each of the 12,000 draws but only once for the mean parameters, could that be achieved within generated quantities? Is there a way to access or calculate the posterior mean of the parameters directly within generated quantities?

I’d still love to do the fitting with the posterior-mean parameters in Stan (e.g., to re-use matrix products from transformed data), but it’s computationally too intensive to do it 12,000 times for the whole posterior sample.

Do you happen to know if there is an equivalent of “pars” and “include” for CmdStan?

There is not. Also, you cannot calculate means (of functions) over all the posterior draws within the generated quantities block because the generated quantities block is executed one posterior draw at a time.
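If you only need the fit at the posterior means, you can do that step in R after sampling. A sketch for a linear predictor, where "beta" and X_scenarios are placeholder names; note that plugging posterior-mean parameters into the predictor reproduces the posterior mean of the predictions exactly only when the predictor is linear in the parameters:

beta_hat <- colMeans(as.matrix(fit, pars = "beta"))  # posterior-mean coefficients
yhat_scenarios <- X_scenarios %*% beta_hat           # one fitted value per scenario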

No, but given that data streams out, it’s much more scalable. Of course, that still needs to get read in if you want to analyze the output.

Usually when the output is too much, it is because users are keeping too many posterior draws. We usually recommend a total ESS of around 200 for practical inference using 4 chains (the 50-per-chain figure is more about the reliability of our ESS estimates than about needing a total ESS of 200). So if you have a lot of autocorrelation, you can thin.
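For example, with the thin argument in rstan (a sketch, with an explicit warmup so the post-warmup count is clear):

fit <- sampling(sm, data = stan_data, chains = 4,
                warmup = 1000, iter = 4000, thin = 10)
# 4 chains x 3000 post-warmup iterations, keeping every 10th draw
# = 1,200 stored draws instead of 12,000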

There will only be about 16K leapfrog steps per iteration with a tree depth of 14 (2^14 = 16,384).

But yes, generated quantities is only executed once per iteration, no matter how many leapfrog steps are required.

In addition to being more scalable because the output streams out, as @Bob_Carpenter says, if you’re running CmdStan via CmdStanR then when we read the output back into R we only read in what is requested. So, for example,

fit$draws("theta")

only reads theta into memory instead of reading in all parameters and then returning only theta. Also, if you subsequently run fit$draws(c("theta", "alpha")), it will only read in alpha the second time (since theta was already read in) but will still return both theta and alpha.
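A minimal end-to-end sketch of that workflow (the file name, data, and variable names are placeholders):

library(cmdstanr)
mod <- cmdstan_model("model.stan")             # compile
fit <- mod$sample(data = stan_data, chains = 4)
theta <- fit$draws("theta")                    # reads only theta from the CSV files
both <- fit$draws(c("theta", "alpha"))         # reads alpha now; theta comes from cache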

I think this is true for CmdStanPy too but I’m not 100% sure.