Is it possible to make Generated Quantities runtime-optional?

Simple linear regression with predictions looks like this:

data {
    int<lower=1> D;

    int<lower=1> N;
    matrix[N, D] X;
    vector[N] Y;

    int<lower=1> N_pred;
    matrix[N_pred, D] X_pred;
}
parameters {
    real a;
    vector[D] b;
    real<lower=0> s;
}
model {
    vector[N] mu = (X * b) + a;
    Y ~ normal(mu, s);
}
generated quantities {
    vector[N_pred] Y_pred;
    vector[N_pred] mu_pred = (X_pred * b) + a;
    Y_pred = to_vector(normal_rng(mu_pred, s));
}
Sometimes, though, after compiling the model specified by this code, I just want to fit the parameters and don’t want to generate any posterior predictive distributions. But because the model is specified as above, if I don’t pass the N_pred and X_pred variables to Stan at runtime, it yells at me (totally fairly!). I usually get around this by passing in data with N_pred set equal to N and X_pred equal to X, but that seems really dumb and a waste of compute cycles when I don’t care about those quantities at all.

Is there any way to code this model so that, at sampling time, long after compilation is finished, I have the option either to pass in N_pred and X_pred and have the generated quantities code run as written, or to pass in neither of those pieces of data and have the sampling process skip the generated quantities section entirely? I’d be totally fine if achieving that required a variable in the data block called should_i_generate_quantities, set to 1 when I want to supply both extra bits of data and run the last block, and set to 0 when I don’t want to bother with anything related to it.


You pretty much answered your own question: enclose the generated quantities in an if statement, and include a control variable in the data block. But don’t expect much of a speedup. The generated quantities block is computationally cheap because it is only executed once per iteration, unlike all the work going on in the model block. Still, it will reduce the size of the fitted model object, and may save some time by reducing the amount of diagnostics computed.
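A sketch of that approach (the flag name should_i_generate_quantities comes from the question; the exact data-block layout here is an assumption, and a, b, and s are the parameters from the original model):

```stan
data {
  // ... D, N, X, Y as in the original model ...
  int<lower=0, upper=1> should_i_generate_quantities;
  int<lower=0> N_pred;        // lower=0 so it can be "empty"
  matrix[N_pred, D] X_pred;   // a 0-row matrix when N_pred = 0
}
generated quantities {
  // size the output to 0 when the flag is off
  vector[should_i_generate_quantities ? N_pred : 0] Y_pred;
  if (should_i_generate_quantities) {
    vector[N_pred] mu_pred = (X_pred * b) + a;
    Y_pred = to_vector(normal_rng(mu_pred, s));
  }
}
```

With the flag set to 0 you still have to supply N_pred and X_pred, but they can be degenerate (N_pred = 0 and a 0-row matrix), so no prediction work is done.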

Enclosing the code inside generated quantities in an if statement avoids the computation, but you still need to pass in variables for the prediction (even if they are dummies or repeat the data used for fitting).

Another approach, if using RStan, is to put the prediction code into a Stan function, expose the function in R, and call it, passing in new data and fitted parameters. If not using RStan, you can put the prediction code in a separate file whose generated quantities block does the work (keeping a parameters block that matches the fitted model; the model block can be omitted) and run it as standalone generated quantities.

Thanks for that reply! I’m on PyStan — does the same option exist there?

Either way, a follow-up question: how does Stan respond, in general, to degenerate data structures? I.e., in the above setup, could N_pred be 0, and if so, what data structure would you need to feed in to X_pred to keep the program happy?

You can pass in N_pred as 0 and X_pred as a [0,] matrix, and it’s easy to test; e.g.:

data {
  int<lower=0> N_pred;
  matrix[N_pred, 1] X_pred;
}
generated quantities {
}
In R (I don’t have Python set up for Stan at the moment):

f <- rstan::stan(
  file = "test.stan",
  data = list(N_pred = 0, X_pred = matrix(1, nrow = 0, ncol = 1)),
  algorithm = "Fixed_param", iter = 1, chains = 1
)

It’s also cheap because it doesn’t have to compute any gradients, unlike work done in the transformed parameters and model blocks.

This is a good point.

You could try running CmdStanPy, which is going to be more memory-efficient. But it sounds like you want to have a pair of models, where the second model is run using the generate_quantities method.
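A sketch of what that second model might look like — a separate file declaring only the data it needs, plus a parameters block that must match the fitted model so the saved draws can be plugged in (the file name and exact layout here are illustrative, not from the thread):

```stan
// predict.stan: standalone generated quantities model,
// run against the draws from fitting the original model.
data {
  int<lower=1> D;
  int<lower=0> N_pred;
  matrix[N_pred, D] X_pred;
}
parameters {
  // must match the parameters block of the fitted model
  real a;
  vector[D] b;
  real<lower=0> s;
}
generated quantities {
  vector[N_pred] Y_pred;
  {
    vector[N_pred] mu_pred = (X_pred * b) + a;
    Y_pred = to_vector(normal_rng(mu_pred, s));
  }
}
```

The first model then needs no prediction inputs at all, and you only run this one (with N_pred and X_pred) when you actually want posterior predictions.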


Ah, yes. That’s actually what I meant by my much less precise “stuff going on” in the model block 😊. Or what I was thinking about, anyway. Thanks for making it clearer!