Is it possible to make Generated Quantities runtime-optional?

8one6 · June 22, 2020, 8:40pm

Simple linear regression with predictions looks like this:

data {
    int<lower=1> D;

    int<lower=1> N;
    matrix[N, D] X;
    vector[N] Y;

    int<lower=1> N_pred;
    matrix[N_pred, D] X_pred;
}

parameters {
    real a;
    vector[D] b;
    real<lower=0> s;
}

model {
    vector[N] mu = (X * b) + a;
    Y ~ normal(mu, s);
}

generated quantities {
    vector[N_pred] Y_pred;
    {
        vector[N_pred] mu_pred = (X_pred * b) + a;
        // vectorized normal_rng returns an array of reals, so convert it to a vector
        Y_pred = to_vector(normal_rng(mu_pred, s));
    }
}

Sometimes, though, after compiling this model, I just want to fit the parameters and don’t want to generate any posterior predictive draws. But because of how it’s specified above, if I don’t pass N_pred and X_pred to Stan at runtime, it yells at me (totally fairly!). I usually get around this by passing in data with N_pred set equal to N and X_pred equal to X, but that seems really dumb and a waste of compute cycles when I don’t care about that output at all.

Is there any way to code this model so that at sampling time, long after compilation has finished, I can either pass in N_pred and X_pred and have the generated quantities code run as written, or pass in neither and have sampling skip the generated quantities block entirely? I’d be totally fine with having to add a variable to the data block, say should_i_generate_quantities, that I set to 1 when I want to supply both extra pieces of data and run the last block, and to 0 when I don’t want to bother with anything related to it.

Thanks!

erognli · June 23, 2020, 5:52am

You pretty much answer your own question: enclose the generated quantities in an if statement and include a control variable in the data block. But don’t expect much of a speedup. The generated quantities block is computationally cheap because it is executed only once per iteration, unlike all the stuff going on in the model block. Still, it will reduce the size of the fitted model object, and may save some time by reducing the amount of diagnostics.
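A minimal sketch of that suggestion, using the flag name proposed in the question (should_i_generate_quantities). It assumes the bound on N_pred is relaxed to lower=0 so that an empty prediction set is legal data, declares s with lower=0 (the usual bound for a scale), and wraps normal_rng in to_vector because the vectorized RNG returns an array of reals. N_pred and X_pred still have to be supplied as data, but they can be 0 and a zero-row matrix when the flag is off.

```stan
data {
    int<lower=1> D;

    int<lower=1> N;
    matrix[N, D] X;
    vector[N] Y;

    // Control flag: 1 = generate predictions, 0 = skip them.
    int<lower=0, upper=1> should_i_generate_quantities;

    // lower=0 (an assumption of this sketch) allows an empty prediction set.
    int<lower=0> N_pred;
    matrix[N_pred, D] X_pred;
}

parameters {
    real a;
    vector[D] b;
    real<lower=0> s;
}

model {
    Y ~ normal(X * b + a, s);
}

generated quantities {
    // Zero-length when the flag is off, so nothing is stored per draw.
    vector[should_i_generate_quantities ? N_pred : 0] Y_pred;
    if (should_i_generate_quantities) {
        Y_pred = to_vector(normal_rng(X_pred * b + a, s));
    }
}
```

Sizing Y_pred from the flag, rather than only guarding the assignment, is what keeps the output small: with the flag off the variable has no elements to write.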

ssp3nc3r · June 23, 2020, 1:38pm

Enclosing the code inside generated quantities in an if statement avoids the computation, but you still need to pass in the variables for the prediction (even if they are dummy values or a repeat of the data used for fitting).

Another approach, if using RStan, is to put the prediction code into a Stan function, expose the function in R, and call it with the new data and the fitted parameters. If not using RStan, you can put the prediction code in the generated quantities block of a separate file (omit the parameters and model blocks) and do the same.
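A sketch of what the function-based option might look like for the regression in the original post. The function name linreg_pred_rng is made up for illustration; the _rng suffix is required for any user-defined function that calls a random number generator. With RStan, rstan::expose_stan_functions is the usual way to make such a function callable from R.

```stan
functions {
    // Hypothetical helper: one posterior predictive draw per row of X_pred,
    // given a single draw of the parameters (a, b, s).
    vector linreg_pred_rng(matrix X_pred, real a, vector b, real s) {
        // vectorized normal_rng returns an array of reals, hence to_vector
        return to_vector(normal_rng(X_pred * b + a, s));
    }
}
```

The fitted model then needs no generated quantities block at all; predictions are computed afterwards by calling the function once per posterior draw.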

8one6 · June 23, 2020, 1:59pm

Thanks for that reply! I’m on PyStan; does the same option exist there?

Either way, a follow-up question: how does Stan respond, in general, to degenerate data structures? I.e., in the above setup, could N_pred be 0, and if so, what data structure would you need to feed in as X_pred to keep the program happy?

ssp3nc3r · June 23, 2020, 5:28pm

You can pass in N_pred as 0 and X_pred as a matrix with zero rows, and it’s easy to test; e.g.:

data {
  int<lower=0> N_pred;
  matrix[N_pred, 1] X_pred;
}

generated quantities {
  print(X_pred);
}

In R (I don’t have Python set up for Stan at the moment):

f <- rstan::stan(file = "test.stan", data = list(N_pred = 0, X_pred = matrix(1, nrow = 0, ncol = 1)), algorithm = "Fixed_param", iter = 1, chains = 1)

mitzimorris · June 23, 2020, 7:27pm

it’s also cheap because it doesn’t have to compute any gradients, unlike work done in the transformed parameters and model blocks.

this is a good point.

you could try running CmdStanPy, which is going to be more memory efficient.
but it sounds like you want to have a pair of models where the 2nd model is run using the generate_quantities method: https://cmdstanpy.readthedocs.io/en/latest/generate_quantities.html
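As a rough sketch of what the second program in such a pair might look like for the regression in the original post: it has no model block, and its parameters block should be copied from the fitting program so that the draws from the earlier fit can be matched by name (s is shown here with lower=0; use whatever bound the fitting program declares). The first program would then be the original one with the prediction data and the generated quantities block removed.

```stan
data {
    int<lower=1> D;

    // lower=0 is an assumption of this sketch; it allows an empty prediction set.
    int<lower=0> N_pred;
    matrix[N_pred, D] X_pred;
}

parameters {
    // Must match the parameters block of the fitting program.
    real a;
    vector[D] b;
    real<lower=0> s;
}

generated quantities {
    // vectorized normal_rng returns an array of reals, hence to_vector
    vector[N_pred] Y_pred = to_vector(normal_rng(X_pred * b + a, s));
}
```

With this split, the fit never sees N_pred or X_pred, and predictions for any new X_pred can be generated later from the saved draws without re-running the sampler.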

erognli · June 24, 2020, 9:29am

Ah, yes. That’s actually what I meant by my very much less precise “stuff going on” in the model block 😊. Or what I was thinking about. Thanks for making it clearer, anyway!
