Should $generate_quantities() accept CmdStanVB objects?

In cmdstanr, the $generate_quantities() method of CmdStanModel can only accept CmdStanMCMC objects for fitted_params. Would it ever make sense to accept CmdStanVB objects as well? This seems possible in the implementation because CmdStanVB still has a concept of draws, but I am genuinely not sure whether drawing generated quantities is coherent or advisable in variational Bayes.

Something we have already considered a few times is allowing generate_quantities to use a draws_array directly.

Since we already support that under the hood (for the case where you have a CmdStanMCMC object but the CSV files are gone), it would be easy to add.

As for how much sense GQ makes for VB, I will leave that to the experts.

Yeah I think that makes sense. @wlandau I have to step away from my computer for a few hours, so can you open an issue for this if you have a chance?

Yeah I think we should do that too.

I think the idea of generated quantities makes just as much sense for VB as for MCMC, but it will just be generating those quantities based on (typically) worse parameter estimates.

The point of VB is that it’s fast: you shouldn’t ever need to run generated quantities separately. Just run VB and get what you want.

That said, sure, why not?

Just posted: "Allow $generate_quantities() to directly use draws_array objects" (Issue #387, stan-dev/cmdstanr on GitHub)

I think eventually it makes sense to have standalone generated quantities just work with whatever algorithm was used to fit the model (even optimization). But @mitzimorris is definitely right that VB (and the same applies to optimization) is usually fast enough to rerun as needed without a separate generate quantities method. That said, if the dataset is big enough and/or there are tons of parameters, VB can still be slow; it’s just a lot faster than MCMC.

Thanks!

The generated_quantities service doesn’t care where the sample is from; the problem arises because the Stan CSV output files differ just enough between the various estimation algorithm outputs to make processing them a complete PITA. https://mc-stan.org/docs/2_25/cmdstan-guide/stan-csv.html

Inasmuch as a VB sample is a sample, the generated_quantities method should accept any sample.

Indeed! To make it work with generate_quantities, Rok added fake sampler diagnostic columns to the CSV files from VB so that they match the files from sampling. That seems to work fine, although it’s not pretty.
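
For anyone curious what that padding amounts to, here is a rough, language-neutral sketch in Python. It is not the actual cmdstanr code: the function name `pad_vb_csv`, the choice to fill the diagnostics with zeros, and the choice to drop `log_p__` and `log_g__` are all assumptions made for illustration. The only facts relied on are that sampling CSVs carry `lp__` followed by the NUTS diagnostic columns, while VB CSVs carry `lp__`, `log_p__`, `log_g__` ahead of the parameters.

```python
import csv
import io

# Diagnostic columns that follow lp__ in a CSV produced by the NUTS sampler.
SAMPLER_DIAGNOSTICS = [
    "accept_stat__", "stepsize__", "treedepth__",
    "n_leapfrog__", "divergent__", "energy__",
]
# Columns that VB writes ahead of the parameters.
VB_ONLY = ["lp__", "log_p__", "log_g__"]


def pad_vb_csv(vb_text: str) -> str:
    """Hypothetical helper: re-lay-out VB draws with placeholder diagnostics."""
    # Stan CSV files contain '#' comment lines; skip them before parsing.
    data_lines = [
        ln for ln in vb_text.splitlines() if ln and not ln.startswith("#")
    ]
    rows = list(csv.reader(data_lines))
    header, draws = rows[0], rows[1:]

    lp_idx = header.index("lp__")
    # Indices of the parameter columns (everything that is not VB bookkeeping).
    keep = [i for i, name in enumerate(header) if name not in VB_ONLY]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["lp__"] + SAMPLER_DIAGNOSTICS + [header[i] for i in keep])
    for row in draws:
        # Assumption: zeros are acceptable placeholders for the diagnostics.
        fake = ["0"] * len(SAMPLER_DIAGNOSTICS)
        writer.writerow([row[lp_idx]] + fake + [row[i] for i in keep])
    return out.getvalue()


# Tiny made-up VB output with a single parameter, theta.
vb_example = (
    "# method = variational\n"
    "lp__,log_p__,log_g__,theta\n"
    "0,-6.2,-0.5,0.25\n"
    "0,-7.1,-1.3,0.31\n"
)
padded = pad_vb_csv(vb_example)
print(padded)
```

The placeholder values carry no meaning, which is why the approach is workable but not pretty.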