Generate Quantities - Mismatch between model and fitted_parameters

Dear Community,

I have to split up my code so that I can run the sample/optimize step and the generate quantities step separately.

To do this, I am using cmdstan and cmdstanpy (the standalone generated quantities method).

Furthermore, I will write many different models that share a lot of structure, so I make extensive use of the #include statement.

Description of the problem

If I put all program blocks, including the generated quantities block, into a single Stan file and run it with cmdstan, the output is fine and is obtained in a single run. However, I need to split it up.

Specifically, this works:

/*
 * file: fit_alltogether.stan
 */

#include models/larson_miller_lognormal.stan

#include data_block.stan

#include generic/qr_decomposition.stan

#include parameter_block.stan

#include generic/qr_transformed_parameter.stan

#include generic/model_block.stan

// this should be moved into a separate file!
#include generic/generated_quantities_plugin_prediction.stan

I run this via:

fit_alltogether method=optimize algorithm=lbfgs tol_grad=1e-8 tol_param=1E-8 tol_rel_grad=1e-4 tol_obj=1E-12 tol_rel_obj=1E-8 history_size=6 iter=100000 data file=mytempfilename.json  output file=output.csv

However, if I split this file into two files:

  • fit_alone.stan
  • gq.stan

it does not work anymore.

/*
 * file: fit_alone.stan
 */

#include models/larson_miller_lognormal.stan

#include data_block.stan

#include generic/qr_decomposition.stan

#include parameter_block.stan

#include generic/qr_transformed_parameter.stan

#include generic/model_block.stan

// no generated quantities here

and the gq file:

/*
 *  file: gq.stan
 */
#include models/larson_miller_lognormal.stan

#include data_block.stan

#include generic/qr_decomposition.stan

#include parameter_block.stan

#include generic/qr_transformed_parameter.stan

#include generic/model_block.stan

#include generic/generated_quantities_plugin_prediction.stan

I run the fit program first and then run gq as a standalone generated quantities step. Nothing is changed within the included program blocks.

fit_alone method=optimize algorithm=lbfgs tol_grad=1e-8 tol_param=1E-8 tol_rel_grad=1e-4 tol_obj=1E-12 tol_rel_obj=1E-8 history_size=6 iter=100000 data file=mytempfilename.json  output file=output.csv
gq  generate_quantities fitted_params=output.csv data file=mytempfilename.json output file=gq_out.csv

The invocation of gq leads to the error:

...
Mismatch between model and fitted_parameters csv file "output.csv"
...

Expected result:
I would expect no issues at all, since the same blocks work when combined in a single file (see fit_alltogether), regardless of what is implemented in the included files.

If you need more details, please let me know and I will provide these as soon as possible.

Thanks in advance.

Hi, @haukehaien. This shouldn’t be happening if you really are running the same blocks in both programs. I would clean all the binaries and .hpp files out and verify that’s really the case.

If you get rid of all the includes and just paste everything together, is there still a problem running standalone gqs? If so, @mitzimorris wrote that, and can probably help debug.

Hi Bob_Carpenter,

Thanks for your reply. I tested the procedure with the bundled bernoulli example, and it still does not work, even without any #include statements. The plain code follows.

I used the following guide:
standalone-generate-quantities

Just to be sure:

$ cat bernoulli.stan
data {
  int<lower=0> N;
  array[N] int<lower=0,upper=1> y; // or int<lower=0,upper=1> y[N];
}


transformed data{
  array[N] int<lower=0,upper=1> y_tilde; // or int<lower=0,upper=1> y_tilde[N];
  y_tilde = y;
}

parameters {
  real<lower=0,upper=1> theta;
}

transformed parameters{

  real<lower=0,upper=1> theta_tilde;

  theta_tilde = theta;

}

model {
  theta ~ beta(1,1);  // uniform prior on interval 0,1
  y ~ bernoulli(theta);
}

and

$ cat bernoulli_ppc.stan
data {
  int<lower=0> N;
  array[N] int<lower=0,upper=1> y; // or int<lower=0,upper=1> y[N];
}

transformed data{
  array[N] int<lower=0,upper=1> y_tilde; // or int<lower=0,upper=1> y_tilde[N];
  y_tilde = y;
}

parameters {
  real<lower=0,upper=1> theta;
}

transformed parameters{

  real<lower=0,upper=1> theta_tilde;

  theta_tilde = theta;

}

model {
  theta ~ beta(1,1);  // uniform prior on interval 0,1
  y ~ bernoulli(theta);
}


generated quantities {
  real<lower=0, upper=1> theta_rep;
  array[N] int y_sim;
  // use current estimate of theta to generate new sample
  for (n in 1:N) {
    y_sim[n] = bernoulli_rng(theta);
  }
  // estimate theta_rep from new sample
  theta_rep = sum(y_sim) * 1.0 / N;
}

So far, so good.

The following procedure works:

./bernoulli sample data file=bernoulli.data.json output file=bernoulli_fit.csv
./bernoulli_ppc generate_quantities fitted_params=bernoulli_fit.csv data file=bernoulli.data.json output file=bernoulli_ppc.csv

The following, however, does not work:

./bernoulli optimize data file=bernoulli.data.json output file=bernoulli_fit.csv
./bernoulli_ppc generate_quantities fitted_params=bernoulli_fit.csv data file=bernoulli.data.json

Error message:

Mismatch between model and fitted_parameters csv file "bernoulli_fit.csv"

However, the following runs without any issues and gives the desired outcome:

./bernoulli_ppc optimize data file=bernoulli.data.json output file=bernoulli_fit.csv

I may be wrong, but in general, shouldn't this work?
We use cmdstan as the basis and, depending on the later use case, also run cmdstanR or cmdstanpy.

As a quick and simple procedure, we are going to adopt the maximum likelihood estimate (via optimize) and would like to separate the estimation step from the generated quantities step.

If more information is needed, don't hesitate to ask! Maybe @mitzimorris has an idea how to proceed?
Thanks in advance.

Hi @haukehaien -

Currently, standalone generated quantities only works if the original output was generated by sampling, not by optimization or variational inference (VI). This has recently been changed and will work as you expect in the next version of Stan. See Allow standalone generate_quantities using non-HMC fit by WardBrian · Pull Request #1106 · stan-dev/cmdstan · GitHub
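To see the difference between the two fitted_params files, it can help to inspect their headers. In the usual CmdStan CSV layout, a file written by `sample` has the sampler diagnostic columns (accept_stat__, stepsize__, treedepth__, n_leapfrog__, divergent__, energy__) after lp__ and before the parameters, while a file written by `optimize` has only lp__ ahead of the parameters; the comment header also records the method, in a line such as `# method = sample (Default)`. The sketch below is a hypothetical helper (`inspect_cmdstan_csv` is not part of CmdStan or cmdstanpy), uses only the Python standard library, and assumes that layout. It only reports what is in the file; it does not reproduce CmdStan's own validation logic.

```python
"""Hypothetical helper: report the method and column layout of a CmdStan CSV.

Sketch only. Assumes the usual CmdStan layout: '#'-prefixed comment lines
(including one like '# method = optimize') followed by one comma-separated
header row, then the draws.
"""
import sys
from typing import List, Optional, Tuple


def inspect_cmdstan_csv(lines: List[str]) -> Tuple[Optional[str], List[str], List[str]]:
    """Return (method, diagnostic_columns, parameter_columns).

    Diagnostic columns are taken to be those whose names end in '__'
    (lp__, accept_stat__, ...); everything else is treated as a model quantity.
    """
    method: Optional[str] = None
    header: Optional[List[str]] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Drop the '#' markers and indentation of CmdStan config lines.
            body = line.lstrip("#").strip()
            # e.g. "method = sample (Default)" -> "sample"
            if method is None and body.startswith("method ="):
                value = body.split("=", 1)[1].split()
                method = value[0] if value else None
            continue
        # The first non-comment line is the column header.
        header = line.split(",")
        break
    if header is None:
        raise ValueError("no header row found in CSV")
    diagnostics = [c for c in header if c.endswith("__")]
    params = [c for c in header if not c.endswith("__")]
    return method, diagnostics, params


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        m, diag, params = inspect_cmdstan_csv(f.readlines())
    print("method:", m)
    print("diagnostic columns:", diag)
    print("parameter columns:", params)
```

For example, running it as `python inspect_csv.py bernoulli_fit.csv` (the script name is hypothetical) on the file written by `./bernoulli optimize` should report `optimize` and list only lp__ among the diagnostic columns, while the file written by `./bernoulli sample` should also list the sampler columns.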
