Disable Generated Quantities for Warmup?

Someone can feel free to correct me if I'm wrong, but as far as I can tell the Generated Quantities block runs regardless of whether Stan is in the Warmup or Sampling phase. I can imagine that someone might want to view the Warmup generated quantities, but I have also run into a case where running the Generated Quantities block during Warmup frequently causes errors*. I have since rewritten my code so that this doesn't happen, but I would think there should be a way to disable Generated Quantities during Warmup. Is that possible?

*Basically, what happens is that the Warmup parameters induce non-stationarity in the model I am fitting. This causes the mean to explode to positive or negative infinity, which in turn causes an error when simulating from the normal distribution. I adjusted the code so that it produces an NA when this occurs instead; now I get some warnings but no error messages, and the sampled data do not contain any NAs (which is why I think the Generated Quantities block is running during Warmup as well).
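
The workaround looks roughly like the sketch below. This is only an illustration: mu and sigma are hypothetical names for the exploding mean and the scale.

generated quantities {
  real y_sim;
  // guard against a non-finite mean before calling normal_rng(),
  // which would otherwise throw
  if (is_inf(mu) || is_nan(mu)) {
    y_sim = not_a_number();  // recorded as NaN (read in as NA) in the output
  } else {
    y_sim = normal_rng(mu, sigma);
  }
}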

This is a known issue: https://github.com/stan-dev/stan/issues/2459. You should be able to get around it with save_warmup = FALSE. This means you won't be saving the warmup parameters either, but the interface doesn't support producing different output for warmup and the main sampling phase.

Thanks for the advice.

Sorry to bring this old topic back. I have experienced a similar situation, but I am using vb instead of stan to fit the model. Based on my limited understanding, there is no sampling in variational inference, so there is no “warmup” either. How should I avoid extremely large values in the Generated Quantities block in this case? I know that a mis-specified model could also cause this issue, but when I fit the model without the Generated Quantities block and manually use the estimated parameter values to generate data myself, I don't see any extremely large values.

In ADVI there is no warmup, but there is an initial optimization phase, which could also produce some large values. However, I don't think you can switch the generated quantities off for this phase.

I see. In that case, I guess all I can do is come up with a better model that avoids generating extremely large values?

You can always introduce some hard boundaries on values, e.g. instead of:

my_value = poisson_log_rng(possibly_large_value);

you could write

// Some threshold that should never be reached for plausible parameters
if (possibly_large_value > 100) {
  my_value = -1;
} else {
  my_value = poisson_log_rng(possibly_large_value);
}

which should let you get rid of any actual errors.
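
In a full model, that guard might look something like the sketch below, applied over all observations. This is only an illustration: eta, N, and y_rep are hypothetical names for the log-scale linear predictor, the number of observations, and the replicated data.

generated quantities {
  array[N] int y_rep;
  for (n in 1:N) {
    // eta[n] is assumed to hold the log-rate for observation n
    if (eta[n] > 100) {
      y_rep[n] = -1;  // sentinel: rate too large to simulate safely
    } else {
      y_rep[n] = poisson_log_rng(eta[n]);
    }
  }
}

Downstream, you would then filter out the -1 sentinels before computing any posterior predictive summaries.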

Thanks! Just curious: how big of a problem is it to have extremely large values during data generation?

To me, the most obvious answer would be to run the model first without the generated quantities, and then run the standalone generated quantities facility of CmdStan in a second step, providing an input file containing only the sampled parameters. A bit tedious, but it gets the trick done.

The only big issue is that in some cases it may result in errors thrown in the generated quantities block (e.g. when sampling from a distribution with an infinite mean). This in turn prevents the sampling run from finishing. Large numbers that don't result in an error are usually not a problem.
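
As a concrete illustration (a hypothetical snippet, not from the discussion above): Stan's Poisson RNG rejects rates above roughly 2^30, so even a finite but large log-rate will throw.

generated quantities {
  // exp(21) is about 1.3e9, which exceeds the Poisson RNG's rate cap,
  // so this call throws a runtime error on every iteration
  int bad_draw = poisson_log_rng(21);
}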