Standalone generated quantities - comments welcome

OK. The underlying problem is ultimately that RStan is assuming there will always be parameters to store (and lp__) in

In this case, state has 50 values but 53 or 54 were expected. Do I need a different writer for standalone gqs?

not sure - I don’t really understand what’s going on in your code - what is expecting 53 or 54 values?

is this a problem with the gq_writer? to me it seems like the problem is the set of draws passed into the call to method standalone_generate

the cmdstan implementation reads the input from the fitted_params file into a stan-csv object, and the draws are stored as an Eigen::MatrixXd in field fitted_params.sample. the first 7 columns of output are lp__ and sampler diagnostics - this is coded up as:

the draws are read into an Eigen::MatrixXd object which is passed into the call to the services method:

can you do something similar in RStan?
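For readers following along: the step described above (skip the 7 leading columns, keep the parameter draws) can be sketched with plain standard-library containers. This is only an illustration under that assumption, not the CmdStan source; `drop_leading_columns` is a made-up name, and the real code works on an Eigen::MatrixXd rather than nested vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Stan or CmdStan): for each draw (row),
// keep only the columns after the first n_skip entries.
std::vector<std::vector<double>> drop_leading_columns(
    const std::vector<std::vector<double>>& draws, std::size_t n_skip) {
  std::vector<std::vector<double>> params;
  params.reserve(draws.size());
  for (const auto& row : draws) {
    // Every row must at least contain the columns being skipped.
    assert(row.size() >= n_skip);
    params.emplace_back(row.begin() + static_cast<std::ptrdiff_t>(n_skip),
                        row.end());
  }
  return params;
}
```

With `n_skip = 7`, each row of the result starts at the first model parameter.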

With RStan, we have to allocate memory for the new draws calling

The first three arguments are fine I think.

The fourth is N_sample_names, which ordinarily is a call to the size of the thing allocated by stan::mcmc::sample::get_sample_param_names, which is lp__ and accept_stat__. Is it correct that those two are excluded from the standalone generated quantities output?

The fifth is N_sampler_names (with an r), which is ordinarily five since it is the size of

      sampler_names[0] = "stepsize__";
      sampler_names[1] = "treedepth__";
      sampler_names[2] = "n_leapfrog__";
      sampler_names[3] = "divergent__";
      sampler_names[4] = "energy__";

Is it correct that all of those are excluded in the output of the standalone generated quantities?

The sixth is N_constrained_param_names which is the size of the thing allocated by model.constrained_param_names() and would ordinarily include everything in the parameters, transformed parameters, and generated quantities blocks. Are the things in parameters and transformed parameters re-written in the output of standalone generated quantities?

The seventh is N_iter_save, which I assume should be the same as the number of rows in the matrix of draws from the original model. The eighth is warmup, which I assume is zero. And the ninth is this thing that indicates which quantities are “of interest”, since RStan allows the user to exclude some things declared in parameters, transformed parameters, and generated quantities from the output.

So it comes to expect 54 from lp__, accept_stat__, mu, sigma, y_rep[1], y_rep[2], ..., y_rep[50]. But what should it be?

the only thing in the output from gq_writer is the variables declared in the generated quantities block.
so if your generated quantities block declares a single variable vector[50] y_rep then the output will consist of 50 columns.

this minimizes I/O, FWIW.
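To make the 54 versus 50 mismatch concrete, here is a small sketch that builds both column layouts for the example in this thread. The helper names (`indexed_names`, `full_layout`, `gq_only_layout`) are hypothetical, and the variable names mu, sigma and y_rep are taken from the posts above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: names of an indexed variable, base[1] .. base[n].
std::vector<std::string> indexed_names(const std::string& base, std::size_t n) {
  std::vector<std::string> names;
  for (std::size_t i = 1; i <= n; ++i)
    names.push_back(base + "[" + std::to_string(i) + "]");
  return names;
}

// Layout that was being sized for in this thread: lp__, accept_stat__,
// the parameters, then the generated quantities.
std::vector<std::string> full_layout(std::size_t n_rep) {
  std::vector<std::string> names = {"lp__", "accept_stat__", "mu", "sigma"};
  const auto gq = indexed_names("y_rep", n_rep);
  names.insert(names.end(), gq.begin(), gq.end());
  return names;
}

// Layout of the standalone generated quantities output as described above:
// only the variables from the generated quantities block.
std::vector<std::string> gq_only_layout(std::size_t n_rep) {
  return indexed_names("y_rep", n_rep);
}
```

For `n_rep = 50`, the first layout has 54 names and the second has 50, which matches the numbers reported earlier.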

That makes sense. But do we have a way to count the number of things that are just in the generated quantities block? If there are no transformed parameters, then the size of constrained_param_names minus the number of columns in draws is the number of generated quantities. But if there are things in transformed parameters, then that won’t work.

yes - call method constrained_param_names twice -

model.constrained_param_names(names_vec_a, true, false);
model.constrained_param_names(names_vec_b, true, true);
size_t gq_vars = names_vec_b.size() - names_vec_a.size();
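A self-contained sketch of the same counting trick, runnable without a compiled Stan model. `mock_model` is a hypothetical stand-in whose `constrained_param_names` mimics the (names, include_tparams, include_gqs) call used above; its variables, including the transformed parameter `sigma_sq`, are invented for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a compiled Stan model: two parameters, one
// transformed parameter, and a 50-element generated quantity.
struct mock_model {
  void constrained_param_names(std::vector<std::string>& names,
                               bool include_tparams = true,
                               bool include_gqs = true) const {
    names.push_back("mu");
    names.push_back("sigma");
    if (include_tparams) names.push_back("sigma_sq");
    if (include_gqs)
      for (int i = 1; i <= 50; ++i)
        names.push_back("y_rep." + std::to_string(i));
  }
};

// Count generated quantities as the difference between the name list with
// and without the generated quantities block. Transformed parameters are
// included in both calls, so they cancel out.
template <typename Model>
std::size_t count_gq_vars(const Model& model) {
  std::vector<std::string> without_gqs;
  std::vector<std::string> with_gqs;
  model.constrained_param_names(without_gqs, true, false);
  model.constrained_param_names(with_gqs, true, true);
  return with_gqs.size() - without_gqs.size();
}
```

Because both calls include transformed parameters, `count_gq_vars(mock_model{})` gives 50 even though the mock has a transformed parameter.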

OK, I think I have this working now, but how do we tell the compiler to make everything double rather than var when compiling a Stan program to be called by standalone_gqs?


Everything needed by the actual executable needs to be compiled and linked. If this isn’t happening through headers, then you need to create an explicit instantiation, the way we do for the grammars (see the stan repo, path src/stan/lang/grammars and look at the instantiation pattern).

OK, we’ll have to talk about this Thursday.

A better example might be what we do for functions with expose_stan_functions and the code generator. That should be directly reusable here.

Bit late to the party but thanks for tackling this @mitzimorris, this feature is very useful for me - I’ll often build e.g. a large timeseries model and need to provide daily updates.

It’s working well for simple models but doesn’t handle vectorised parameters - it’s doing the param compare() on e.g. b.1 against b[1]

make lm2_fit
make lm2_pred
./lm2_fit sample data file=lm2_data.R output file=lm2_fit.csv
./lm2_pred generate_quantities fitted_params=lm2_fit.csv data file=lm2_data.R output file=lm2_pred.csv

lm2_example.R (168 Bytes)
lm2_fit.stan (152 Bytes)
lm2_pred.stan (207 Bytes)
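One way to picture the mismatch reported above (b.1 compared against b[1]) is to normalise both names to a single form before comparing. The sketch below is only an illustration of that idea, not the fix that went into CmdStan; `dotted_to_bracketed` and `same_param` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts "b.1" to "b[1]" and "m.2.3" to "m[2,3]".
// Names without a dot (scalars, or names already bracketed) are returned
// unchanged.
std::string dotted_to_bracketed(const std::string& name) {
  const std::size_t first_dot = name.find('.');
  if (first_dot == std::string::npos) return name;
  std::string out = name.substr(0, first_dot) + "[";
  for (std::size_t i = first_dot + 1; i < name.size(); ++i)
    out += (name[i] == '.') ? ',' : name[i];
  out += "]";
  return out;
}

// Compare two parameter names after normalising both to bracketed form.
bool same_param(const std::string& a, const std::string& b) {
  return dotted_to_bracketed(a) == dotted_to_bracketed(b);
}
```

With this, `same_param("b.1", "b[1]")` is true while `same_param("b.1", "b[2]")` is false.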

I’ll look into this. are you running everything through RStan?

All through cmdstan - I don’t think this functionality is in rstan yet