Standalone generated quantities - comments welcome


#1

Stan’s standalone generated quantities service, stan::services::standalone_generate, now takes as input the fitted parameter values on the constrained scale, i.e., as output by the sampler. This means it should be relatively simple to plumb calls to this service through the interfaces.

I’ve updated the CmdStan issue here: https://github.com/stan-dev/cmdstan/issues/594
Here’s what it says:

The standalone_generate function requires as inputs:

  • the model
  • the data used to fit it
  • the set of draws from the posterior.

The command-line arguments required for this feature are:

  • method = generate (proposed new option)
  • data = the same dataset used to fit the model
  • fitted_param_values = a new sub-option specifying a single input file that contains the fitted parameter values
  • output = name of the output file
  • seed = random seed (optional)

The input file of fitted parameter values can be in one of the following formats:

  • Rdump
  • JSON
  • CSV (needs investigation)

Thoughts on the workflow for RStan and other interfaces are welcome.

cheers,
Mitzi


#2

Just to clarify: this list of inputs suggests that information created or derived in the transformed data block would not be available to these standalone GQs. Is that correct?


#3

Incorrect.

Information created or derived in the transformed data block is available in the GQ block.

Statements in the transformed data block are executed in the model’s constructor, i.e., on model instantiation. The standalone GQs service takes an instantiated model, so any variable declared at the top level of the transformed data block is available in the generated quantities block.

cheers,
Mitzi


#4

After getting a preliminary version working and discussing it with @seantalts, I’ve revised this issue with respect to the allowed input formats for the sample from the fitted model. To make this feature easy to use, the sample should be in its default format; for CmdStan, that is the Stan CSV file format.

The question is - should CmdStan try to also recognize csv data from RStan or PyStan?

Discussion from the updated issue is as follows:

It is easy to assemble the matrix of fitted parameter values, in the correct order, from CmdStan’s output.csv file (or concatenated output.csv files).
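A minimal R sketch of that assembly step, assuming each file follows the Stan CSV layout (comment lines starting with `#`, one header row, then one row per draw) and that every column whose name ends in `__` is either `lp__` or a sampler diagnostic. The helper name `read_cmdstan_draws` is hypothetical. It reads each file separately and stacks the rows, because a literally concatenated file would repeat the header row. It also keeps every non-`__` column, so transformed parameters and generated quantities stay in the matrix if the model declares any.

```r
# Hypothetical helper (not part of CmdStan or RStan): assemble a draws
# matrix from one or more CmdStan output CSV files, e.g. one per chain.
read_cmdstan_draws <- function(files) {
  per_file <- lapply(files, function(f) {
    # Lines starting with '#' hold config, adaptation info and timing.
    # check.names = FALSE keeps names such as "theta[1]" unchanged.
    d <- read.csv(f, comment.char = "#", check.names = FALSE)
    # Drop lp__ and the sampler diagnostics (all names end in "__").
    d[, !grepl("__$", names(d)), drop = FALSE]
  })
  # Stack the draws from all files, keeping the header's column order.
  as.matrix(do.call(rbind, per_file))
}
```

For example, `read_cmdstan_draws(c("output_1.csv", "output_2.csv"))` would return one row per draw across both chains (the file names here are illustrative).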

It is also possible to parse a CSV file created from an RStan stan_fit object that has been saved as CSV (without row names), because the first N columns correspond to the parameters on the constrained scale, in the correct order.
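For the RStan case, a minimal sketch, assuming the file was written with `write.csv(..., row.names = FALSE)` and that the caller already knows N, the number of parameter columns (for example, from the model's parameter declarations). The helper `read_rstan_draws` and its `n_params` argument are hypothetical.

```r
# Hypothetical helper: keep the first n_params columns of a CSV written
# from an RStan fit, where parameters come first in declaration order.
read_rstan_draws <- function(file, n_params) {
  d <- read.csv(file, check.names = FALSE)
  as.matrix(d[, seq_len(n_params), drop = FALSE])
}
```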

Not sure whether or not something similar is possible in PyStan.

@jonah, @bgoodri - is it worth having this feature in CmdStan for RStan users?
@ariddell - is there a way to dump the sample in csv format such that the parameters are in declaration order?


#5

I guess. If the comments in the csv file are not relevant, then it is just a question of writing the columns in the correct order.


#6

Right.
It looks like RStan does the right thing by default:
given a stan_fit object foo_fit, the command
write.csv(as.matrix(foo_fit), file="foo.csv", row.names=FALSE)
produces a CSV file with the parameters in declaration order.

The column order in a file generated this way differs from the current CmdStan “output.csv” format: the CmdStan file has “lp__” as its first column, followed by the sampler state diagnostic variables (columns 2 through 7 for the default NUTS sampler), whereas in the RStan stan_fit object written as CSV the sampler diagnostics are absent and “lp__” is the last column. Otherwise, the columns are in the same order. A trivial heuristic tells the two kinds of CSV file apart: does the header start with “lp__” or not?
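That heuristic could look like the following sketch. The function name `csv_layout` and its return labels are hypothetical; it assumes CmdStan comment lines start with `#` and that `write.csv` quotes the header names, which is why quotes are stripped before comparing.

```r
# Hypothetical helper: guess whether a draws CSV is CmdStan output
# (header starts with lp__) or an RStan fit written with write.csv.
csv_layout <- function(file) {
  lines <- readLines(file)
  # Skip Stan CSV comment lines; the first remaining line is the header.
  header <- lines[!startsWith(lines, "#")][1]
  first_col <- gsub('"', "", strsplit(header, ",", fixed = TRUE)[[1]][1])
  if (identical(first_col, "lp__")) "cmdstan" else "rstan"
}
```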

If the standalone generated quantities feature is available directly in the RStan interface, should we invest extra work to make CmdStan handle CSV files generated this way?


#7

I don’t know that it is a huge priority but I’m sure someone will come up with a use case for it.