Standalone generated quantities - comments welcome

bgoodri · March 10, 2019, 8:11pm

OK. The underlying problem is ultimately that RStan is assuming there will always be parameters to store (and lp__) in

stan-dev/rstan/blob/develop/rstan/rstan/inst/include/rstan/filtered_values.hpp#L52


  for (size_t n = 0; n < N_filter_; n++)
    if (filter.at(n) >= N_)
      throw std::out_of_range("filter is looking for "
                              "elements out of range");
}


// To deal with C++ name hiding
using stan::callbacks::writer::operator();


void operator()(const std::vector<double>& state) {
  if (state.size() != N_)
    throw std::length_error("vector provided does not "
                            "match the parameter length");
  for (size_t n = 0; n < N_filter_; n++)
    tmp[n] = state[filter_[n]];
  values_(tmp);
}


const std::vector<InternalVector>& x() {
  return values_.x();
}

In this case, state has 50 values but 53 or 54 were expected. Do I need a different writer for standalone gqs?

mitzimorris · March 10, 2019, 8:31pm

not sure - I don’t really understand what’s going on in your code - what is expecting 53 or 54 values?

is this a problem with the gq_writer? to me it seems like the problem is the set of draws passed into the call to method standalone_generate

the cmdstan implementation reads the input from the fitted_params file into a stan-csv object, and the draws are stored as an Eigen::MatrixXd in field fitted_params.samples. the first 7 columns of output are lp__ and sampler diagnostics - this is coded up as:

stan-dev/cmdstan/blob/d309ec8ebefbf6945db26577a6d2e1029247f1cf/src/cmdstan/command.hpp#L79


    stream.close();
    std::shared_ptr<stan::io::var_context> result = std::make_shared<stan::json::json_data>(var_context);
    return result;
  }
  stan::io::dump var_context(stream);
  stream.close();
  std::shared_ptr<stan::io::var_context> result = std::make_shared<stan::io::dump>(var_context);
  return result;
}


static int hmc_fixed_cols = 7; // hmc sampler outputs columns __lp + 6 




template <class Model>
int command(int argc, const char* argv[]) {
  stan::callbacks::stream_writer info(std::cout);
  stan::callbacks::stream_writer err(std::cout);
  stan::callbacks::stream_logger logger(std::cout, std::cout, std::cout,
                                        std::cerr, std::cerr);


#ifdef STAN_MPI

the draws are read into an Eigen::MatrixXd object, and the block of columns after the fixed ones is passed into the call to the services method:

stan-dev/cmdstan/blob/d309ec8ebefbf6945db26577a6d2e1029247f1cf/src/cmdstan/command.hpp#L210-L215


return_code = stan::services::standalone_generate(model,
                                    fitted_params.samples.block(0, hmc_fixed_cols, num_rows, num_cols),
                                    random_seed,
                                    interrupt,
                                    logger,
                                    sample_writer);

can you do something similar in RStan?
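
A minimal sketch of the column selection described above, using plain std::vector rows as a stand-in for the Eigen block() call so that it compiles without Eigen. The names Row and drop_fixed_columns are hypothetical and are not part of CmdStan or RStan; the value 7 is the hmc_fixed_cols constant quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<double>;  // one draw: fixed columns, then parameters

// Hypothetical helper: keep every row but drop the first n_fixed columns
// (lp__ and the sampler diagnostics), leaving only the parameter values.
std::vector<Row> drop_fixed_columns(const std::vector<Row>& draws,
                                    std::size_t n_fixed) {
  std::vector<Row> params;
  params.reserve(draws.size());
  for (const Row& row : draws) {
    assert(row.size() >= n_fixed);  // every row must contain the fixed columns
    params.emplace_back(row.begin() + static_cast<std::ptrdiff_t>(n_fixed),
                        row.end());
  }
  return params;
}
```

With n_fixed = 7, a row holding lp__, six diagnostics, and two parameters is reduced to the two parameter values, which is the shape the services method is handed in the CmdStan call.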

bgoodri · March 10, 2019, 11:12pm

With RStan, we have to allocate memory for the new draws by calling

stan-dev/rstan/blob/develop/rstan/rstan/inst/include/rstan/rstan_writer.hpp#L96


rstan_sample_writer*
sample_writer_factory(std::ostream *csv_fstream,
                      std::ostream& comment_stream,
                      const std::string& prefix,
                      size_t N_sample_names, size_t N_sampler_names,
                      size_t N_constrained_param_names,
                      size_t N_iter_save, size_t warmup,
                      const std::vector<size_t>& qoi_idx) {
  size_t N = N_sample_names + N_sampler_names + N_constrained_param_names;
  size_t offset = N_sample_names + N_sampler_names;


  std::vector<size_t> filter(qoi_idx);
  std::vector<size_t> lp;
  for (size_t n = 0; n < filter.size(); n++)
    if (filter[n] >= N)
      lp.push_back(n);
  for (size_t n = 0; n < filter.size(); n++)
    filter[n] += offset;
  for (size_t n = 0; n < lp.size(); n++)
    filter[lp[n]] = 0;

The first three arguments are fine I think.

The fourth is N_sample_names, which ordinarily is a call to the size of the thing allocated by stan::mcmc::sample::get_sample_param_names, which is lp__ and accept_stat__. Is it correct that those two are excluded from the standalone generated quantities output?

The fifth is N_sampler_names (with an r), which is ordinarily five since it is the size of

      sampler_names[0] = "stepsize__";
      sampler_names[1] = "treedepth__";
      sampler_names[2] = "n_leapfrog__";
      sampler_names[3] = "divergent__";
      sampler_names[4] = "energy__";

Is it correct that all of those are excluded in the output of the standalone generated quantities?

The sixth is N_constrained_param_names which is the size of the thing allocated by model.constrained_param_names() and would ordinarily include everything in the parameters, transformed parameters, and generated quantities blocks. Are the things in parameters and transformed parameters re-written in the output of standalone generated quantities?

The seventh is N_iter_save, which I assume should be the same as the number of rows in the matrix of draws from the original model. The eighth is warmup, which I assume is zero. And the ninth is this thing that indicates which quantities are “of interest”, since RStan allows the user to exclude some things declared in parameters, transformed parameters, and generated quantities from the output.

So it ends up expecting 54 values: lp__, accept_stat__, mu, sigma, and y_rep[1], y_rep[2] … y_rep[50]. But what should it be?
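
To make the arithmetic concrete, here is a rough sketch of how the expected length is assembled and compared with what arrives. The function names are hypothetical; the sum mirrors the line `size_t N = N_sample_names + N_sampler_names + N_constrained_param_names;` in the factory quoted above, and the counts used in the note below (2, 0, 52) are the ones implied by the list of 54 names, not values read from RStan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same sum as in sample_writer_factory: the length the writer will insist on.
std::size_t expected_state_size(std::size_t n_sample_names,
                                std::size_t n_sampler_names,
                                std::size_t n_constrained_param_names) {
  return n_sample_names + n_sampler_names + n_constrained_param_names;
}

// Hypothetical check in the spirit of the length test in filtered_values.
bool state_matches(const std::vector<double>& state, std::size_t expected) {
  return state.size() == expected;
}

// How many columns the writer expects beyond what was actually provided.
std::size_t columns_short_by(std::size_t expected, std::size_t actual) {
  assert(actual <= expected);
  return expected - actual;
}
```

With 2 sample names, 0 sampler names, and 52 constrained names the expected size is 54, so a 50-element state is short by 4 (lp__, accept_stat__, mu, sigma). Passing zeros for the first two counts and 50 for the third would make the sizes agree.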

mitzimorris · March 11, 2019, 12:11am

the only thing in the output from gq_writer is the variables declared in the generated quantities block.
so if your generated quantities block declares a single variable vector[50] y_rep then the output will consist of 50 columns.

this minimizes I/O, FWIW.

bgoodri · March 11, 2019, 12:13am

That makes sense. But do we have a way to count the number of things that are just in the generated quantities block? If there are no transformed parameters, then the size of constrained_param_names minus the number of columns in draws is the number of generated quantities. But if there are things in transformed parameters, then that won’t work.

mitzimorris · March 11, 2019, 12:33am

yes - call method constrained_param_names twice -

model.constrained_param_names(names_vec_a, true, false);
model.constrained_param_names(names_vec_b, true, true);
size_t gq_vars = names_vec_b.size() - names_vec_a.size();
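
A self-contained sketch of the same two-call count, run against a hypothetical MockModel rather than a real compiled Stan model. The mock assumes mu and sigma in parameters, one transformed parameter sigma_sq, and y_rep of size 50 in generated quantities; only the (names, include_tparams, include_gqs) argument order is taken from the calls above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a generated Stan model class.
struct MockModel {
  void constrained_param_names(std::vector<std::string>& names,
                               bool include_tparams, bool include_gqs) const {
    names.push_back("mu");
    names.push_back("sigma");
    if (include_tparams)
      names.push_back("sigma_sq");
    if (include_gqs)
      for (int i = 1; i <= 50; ++i)
        names.push_back("y_rep." + std::to_string(i));
  }
};

// Count generated quantities as the difference between the two name lists.
template <class Model>
std::size_t count_gq_vars(const Model& model) {
  std::vector<std::string> without_gqs, with_gqs;
  model.constrained_param_names(without_gqs, true, false);
  model.constrained_param_names(with_gqs, true, true);
  assert(with_gqs.size() >= without_gqs.size());
  return with_gqs.size() - without_gqs.size();
}
```

Because transformed parameters are included in both calls, they cancel in the subtraction, so the count stays at 50 for this mock even though sigma_sq is present.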

bgoodri · March 11, 2019, 8:36pm

OK, I think I have this working now, but how do we tell the compiler to make everything double rather than var when compiling a Stan program to be called by standalone_gqs?

Bob_Carpenter · March 12, 2019, 5:53pm

Everything needed by the actual executable needs to be compiled and linked. If this isn’t happening through headers, then you need to create an explicit instantiation the way we do for the grammars (see the stan repo, path src/stan/lang/grammars and look at the instantiation pattern).

bgoodri · March 12, 2019, 6:36pm

OK, we’ll have to talk about this Thursday.

sakrejda · March 13, 2019, 12:46am

A better example might be what we do for functions with expose_stan_functions and the code generator. That should be directly reusable here.

somefoo · May 1, 2019, 1:07pm

Bit late to the party but thanks for tackling this @mitzimorris, this feature is very useful for me - I’ll often build e.g. a large timeseries model and need to provide daily updates.

It’s working well for simple models but doesn’t handle vectorised parameters - it’s doing the param compare() on e.g. b.1 against b[1]

make lm2_fit
make lm2_pred
./lm2_fit sample data file=lm2_data.R output file=lm2_fit.csv
./lm2_pred generate_quantities fitted_params=lm2_fit.csv data file=lm2_data.R output file=lm2_pred.csv

lm2_example.R (168 Bytes)
lm2_fit.stan (152 Bytes)
lm2_pred.stan (207 Bytes)
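
For illustration only, the b.1 versus b[1] mismatch described above can be avoided by bringing both names to one form before comparing them. The helpers below are hypothetical and are not the comparison CmdStan performs; they assume CSV-style names separate indices with dots and model-style names use brackets and commas.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: "b.1" -> "b[1]", "theta.2.3" -> "theta[2,3]".
// Names without a dot (scalars, or names already bracketed) pass through.
std::string dotted_to_bracketed(const std::string& name) {
  assert(!name.empty());  // column names are expected to be non-empty
  const std::size_t first_dot = name.find('.');
  if (first_dot == std::string::npos)
    return name;
  std::string out = name.substr(0, first_dot) + "[";
  for (std::size_t i = first_dot + 1; i < name.size(); ++i)
    out += (name[i] == '.') ? ',' : name[i];
  out += "]";
  return out;
}

// Compare two parameter names regardless of which index style each uses.
bool same_parameter(const std::string& a, const std::string& b) {
  return dotted_to_bracketed(a) == dotted_to_bracketed(b);
}
```

Under these assumptions same_parameter("b.1", "b[1]") holds, while a plain string comparison of the two would not.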

mitzimorris · May 1, 2019, 1:22pm

I’ll look into this. are you running everything through RStan?
cheers,
Mitzi

somefoo · May 1, 2019, 2:14pm

All through cmdstan - I don’t think this functionality is in rstan yet
