Multiple generated quantities blocks

Hi there!

… multiple generated quantities blocks is a feature I have been waiting on for a long time. I just figured out that there is a very simple way to get this to work in rstan with the #include functionality. The problem to solve is that variables defined in the transformed data block are not available in the body of ordinary Stan functions, so even exposing Stan functions to R leaves us stuck. However, if we write the transformed data code in a separate file and then include that file both in the transformed data block and at the start of a function definition, we have everything needed for multiple generated quantities blocks: we simply define functions that also perform the manipulations done in transformed data. So something like this:

functions {
  vector simulate_model(... args as the data block ..., posterior parameters) {
    #include "transform_data.stan"
    // do stuff with the data, transformed data, and the posterior
    return results;
  }
}
data {
 // .... data input to model ...
}
transformed data {
  #include "transform_data.stan"
}
// ... rest of the model ...
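For illustration, transform_data.stan might contain declare-and-define statements like the following (the variable names here are hypothetical, not from the original post). Because each declaration comes with a definition, the same lines are valid both inside the transformed data block and at the top of a function body, provided y is in scope in both places (as data in one, as a function argument in the other):

// transform_data.stan (hypothetical contents)
int N_obs = num_elements(y);    // y comes from the data block or a function argument
vector[N_obs] log_y = log(y);   // reformatted data used by the model
real y_mean = mean(y);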

Maybe this helps others - to me this is super-convenient to have in R. It’s a “creative” way of using the include facility, but it works.

Best,
Sebastian

Is the goal here to be able to do something like a new generated quantities block after the model’s been compiled only in R?

I think you’ll also need to have the transformed parameters, in general.

I’m also not sure how this relates to standalone generated quantities, which @mitzimorris is leading.

This sounds more like closures. Is that right? Adding those to the language is on the roadmap, though I think I’d be inclined to add higher-order functions first.

@Bob_Carpenter: If you also need the transformed parameters for the simulation, then you just make them arguments to the function, yes.

The thing is that in R it is easy to apply a function over the posterior draws.

The best thing about my hack is that it works now with what we have available (though only in R).

@Matthijs: Yes, closures would also get me there. Defining functions which can access data+transformed data would also solve it.

In the posterior draws, the parameter values are the constrained parameter values.
In the generated quantities block, the computation is over the unconstrained parameter values.
Are you transforming the parameters before passing them into the function?
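To make the distinction concrete, here is a minimal sketch (the parameter name is hypothetical):

parameters {
  real<lower=0> sigma;  // the posterior draws report sigma on this constrained scale
}
// Internally, the sampler works on the unconstrained value log(sigma),
// so a service that expects unconstrained inputs needs log(sigma),
// not the sigma stored in the posterior draws.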

The initial implementation of standalone generated quantities had this problem. I’m working on a fix via the branch https://github.com/stan-dev/stan/tree/feature/2705-standalone-gqs-constrained-params-inputs
See issues stan-dev/stan#2705 (add logic to service `standalone_gqs` for constrained parameters as input) and stan-dev/stan#2723 (add a constructor to class `stan/io/array_var_context` which takes as input an Eigen::Matrix vector of reals).

cheers,
Mitzi

I am not sure why the computation is on the unconstrained parameter space. As a user, I care about whatever my parameters are, and most of the time these are the constrained parameters. From a user’s perspective, only the model matters, and the model is usually stated in terms of constrained parameters. The way I set things up, I get out of Stan what I want, in the form I expect, and exactly that is what I feed into my functions for the simulation. Stan’s internal handling of constraints is not relevant to me as a user.

It matters because unless you pass unconstrained parameters (and transformed parameters) into your function, the function results and the outputs from the model’s generated quantities block may be different. So this isn’t the same thing as a generated quantities block.

I completely understand your desire for this feature - we should have had it long ago.

Well… I see your point. However, to me as a user the reference draws are the constrained ones, since that is what my model is about. I see your point that for Stan internals the reference is what has been sampled, which is the unconstrained space, but that is irrelevant to me as a user.

Anyway, there is a small problem with my approach, which is rooted in a rather conservative choice of the parser: it rejects the following Stan code:

functions {
  vector bar(vector theta, vector eta, real[] x_r, int[] x_i) {
    return [ 0.0 ]';
  }
  vector foo(real a, int b) {
    real x_r[1,1] = { { a } };
    int x_i[1,1] = { { b } };

    return map_rect(bar, [0.0]', { []' }, x_r, x_i);
  }
}

This triggers the parser to reject the code, because x_r gets promoted by default to a var, but only if any of the inputs is a var. However, the code is totally fine (even in a Stan program) if I call the function foo with double-only arguments. I see the reason to reject this, since in any sensible Stan program we will call the function with vars, but for the purpose of simulating things I only instantiate things using doubles.

So do you think, @mitzimorris, that it would make sense to have a permissive mode of the parser which would allow such code? It would be great to have a mode where these things are not considered fatal errors but warnings, so that the parser still outputs the C++ code.

In any case… this hacked generated quantities approach is a major feature from my perspective. Having a clean solution would still be very valuable indeed, but maybe the approach shown here suggests alternative, closure-based routes to the problem?

Presuming that you meant to write

functions {
  vector bar(vector theta, vector eta, real[] x_r, int[] x_i) {
    return [ 0.0 ]';
  }
  vector foo(real a, int b) {
    real x_r[1,1] = { { a } };
    int x_i[1,1] = { { b } };

    return map_rect(bar, [0.0]', { [0.0]' }, x_r, x_i);
  }
}

then this model can now be made to compile in stanc3 by adding the following data annotations on arguments:

functions {
  vector bar(vector theta, vector eta, real[] x_r, int[] x_i) {
    return [ 0.0 ]';
  }
  vector foo(data real a, data int b) {
    real x_r[1,1] = { { a } };
    int x_i[1,1] = { { b } };

    return map_rect(bar, [0.0]', { [0.0]' }, x_r, x_i);
  }
}

Does that suffice for your purposes?

Yes… this would do the trick.

Do you think one can already play with the new compiler or is it still too hot?

Not quite. @seantalts is still working on the code generation. We’ll keep you updated on the progress.

I just thought I’d answer this as it pertains to parsing and semantic checking and that part of stanc3 is relatively stable and well-tested. I just checked that it accepts the model I quoted above.

Sure. BTW, is the data qualifier also allowed when declaring local variables? In fact, I would expect that the code you wrote still does not work, and that we would instead need this:

functions {
  vector bar(vector theta, vector eta, real[] x_r, int[] x_i) {
    return [ 0.0 ]';
  }
  vector foo(data real a, data int b) {
    data real x_r[1,1] = { { a } };
    int x_i[1,1] = { { b } };

    return map_rect(bar, [0.0]', { [0.0]' }, x_r, x_i);
  }
}

?

That’s not currently part of the language. The compiler currently uses static analysis to determine which variables should be treated as doubles and which ones as vars, based on what you specify for the arguments to a function.

And what happens if I mess that up: first assign some data to x_r and then a var? Will it come out being a var?

Then, the compiler will determine that x_r is a var and therefore the map_rect function call should be rejected during type checking.
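As a sketch of how that inference plays out (a hypothetical function, only to illustrate the promotion):

functions {
  real foo(data real a, real b) {
    real x[1] = { a };  // so far x depends only on data, so it could stay a double
    x[1] = b;           // b may be a var, so the analysis promotes x to var
    // x can now no longer be passed where map_rect expects a data-only argument
    return x[1];
  }
}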

+1

Cool.

This approach means that you’re doing the work of the transformed data block on each call to the function, whereas the generated model class executes the transformed data block only once, at model instantiation, i.e., in the model’s constructor.

Yes… I know. However, it really doesn’t matter in practice for my applications; I don’t notice it in my simulations. Transformed data is about sorting the data and reformatting it for use in the model. This happens at C++ speed, which is almost zero cost compared to working in R.