How does the generated quantities block iterate over sample draws?

sweenejo · November 3, 2017, 12:18am

In a previous topic, @bbbales2 helped me to understand that the generated quantities block is run for each sample draw.

I interpreted this as essentially a for loop.

for (i in 1:"Number of sample draws") {
  generated quantities {
    some_post_pred = some_distribution_rng(parameters[i])
  }
}

I then went ahead and coded up a more complex generated quantity.

generated quantities{
  vector[N] y_new;
  int b_new;
  vector[N] x_new;

  b_new = 1000;
  while(b_new > 20) {
    for (i in 1:N) {
    x_new[i] = -1;  // Rejection sampling to truncate normal at 0.
    while(x_new[i] < 0) {
      x_new[i] = normal_rng(mean(x), sd(x));
      }
    }
  b_new = poisson_rng(sum(x_new) * lambda); // works replacing lambda with true 0.012.
  }
  for (i in 1:N) {
    y_new[i] = normal_rng(beta * x_new[i], sigma);
  }
}

What I want the generated quantities block to produce is two posterior predictions: b_new for some randomly generated x_new such that b_new is less than 20, and y_new for the same x_new that made b_new less than 20. How’s that for a tongue twister?

When I fit the model as written, it stalls (at least it takes > 24 hours). But when I replace lambda with it’s true value 0.012, it does what I want. Any idea why this would be the case?

It might help me, and others, if someone could explain how Stan generates the generated quantities. Should we think of it as iterating over sample draws? Or something else.

If this quantity can’t be calucated in Stan, my backup approach would be to iterate over each draw after extracting the model fit in R. Of course my preference would be to run everything in Stan as I can be sure the distribution parameterizations are identical.

I’ve attached a reproducible example below (.R and .stan). Thanks for all the help.

gen_quant_test.stan (697 Bytes)
gen_quant_test.R (1.1 KB)

mitzimorris · November 3, 2017, 5:53pm

sounds like an infinite loop - I recommend adding “print” statements within your loops to check this.

I’m not sure I understand what you’re asking here - wasn’t this already answered in the previous topic Specifying the number of samples for rng - see last comment?

how Stan generates the generated quantities

see the Stan langauge manual section “Program Blocks”

sweenejo · November 3, 2017, 11:41pm

Thanks for the response. The manual seems to suggest my “for loop” understanding is correct.

I added a print statement as you suggested so the new generated quantities block is

generated quantities{
  vector[N] y_new;
  int b_new;
  vector[N] x_new;

  b_new = 1000;
  while(b_new > 20) {
    for (i in 1:N) {
    x_new[i] = -1;
    while(x_new[i] < 0) {
      x_new[i] = normal_rng(mean(x), sd(x));
      }
    }
  b_new = poisson_rng(sum(x_new) * lambda); // works with 0.012
  print(lambda);
  }
  for (i in 1:N) {
    y_new[i] = normal_rng(beta * x_new[i], sigma);
  }
}

When I fit the model as written, nothing happens after it starts the iteration process. It’s like it fails to run even the first iteration. However, when I substitute lambda for the the true value 0.012 as below, it’s works! I get a bunch of printed 0.012s and the model finishes fitting.

  b_new = poisson_rng(sum(x_new) * 0.012); // works with 0.012
  print(0.012);
  }

My question is, why does it fail to iterate when I use the model parameter lambda, but work when I use the true value 0.012?

bbbales2 · November 4, 2017, 1:05am

This sorta code I’d do outside of Stan (so you can run it in a debugger).

Maybe as a bigger picture thing, there’s probably a way to write these rngs without the rejection sampling.

For instance, you can at least do Inverse transform sampling - Wikipedia for the truncated normals. Maybe there’s an even better way. It’d be worth Googling around something about sampling from a truncated normal.

For the Poissons too, you can just evaluate the PDF for all outputs < 20 or whatever, renormalize the distribution you get, and sample something from there.

My question is, why does it fail to iterate when I use the model parameter lambda

It’s probably something like if lambda is 100.0, then there’s a small chance that b_new < 20. If lambda being 100 ever shows up in the sampler, then maybe you’re in trouble. Stan’s samplers pretty aggressively explore parameter space, so I wouldn’t rule anything out.

randommm · November 4, 2017, 12:32pm

The generated quantities block is evaluated after each sample is obtained, not after all of them are obtained. Consider for instance, this example:

library(rstan)
scode <- '
parameters {
  real y; 
} 
model {
  y ~ normal(0, 1);
  print("model block");
}
generated quantities {
  print("generated quantities block");
}
'

fit1 <- stan(model_code = scode, iter = 10, chains=1)

Which will give you:

model block

model block

...

model block

Iteration: 1 / 10 [ 10%]  (Warmup)
model block

model block

generated quantities block

Iteration: 2 / 10 [ 20%]  (Warmup)
model block

...

model block

generated quantities block

Iteration: 3 / 10 [ 30%]  (Warmup)

model block

...

model block

generated quantities block

Iteration: 4 / 10 [ 30%]  (Warmup)

mitzimorris · November 4, 2017, 3:03pm

I’d be more generous with my use of print statements - I’d put one inside each of your while loops.
e.g.

print("outer loop, bnew: ", b_new);
....
print("inner loop, i: ",  i,  " x_new: ", x_new);

this general technique is called “printf debugging” (https://en.wikipedia.org/wiki/Debugging#Techniques)
I hate working in an IDE, so I use this pretty much everywhere.

sweenejo · November 5, 2017, 3:41am

Thanks all! This is great, I really learned a lot.

Bob_Carpenter · November 7, 2017, 7:29pm

Probably due to initialization sending the model block into an infinite loop. The prints are buffered until the end of the model block, so if things hang inside there, you won’t see them.

Rejection sampling can be very inefficient if the rejection rate is high.

One thing you can do here is define x_new as a parameter with lower bound of zero and then just use

x_new ~ normal(mean(x), sd(x));

though you’re better off coputing the mean and sd in transformed data and storing them. That’d probably be better in cases where the truncation doesn’t work. But it won’t work for a Poisson as we can’t take discrete draws other than in the generated quantities block.

Also, replying on a poisson draw to be less than 20 also seems dangerous. The reason it is probably working with 0.012 is that the poisson RNG has some chance of choosing a value below 20. I have no idea what sum(x_new) is here.

joni1104 · February 11, 2022, 8:10am

I am trying to understand the basics of stan and stumbled upon this reply. It’s been years this post was last active but hoping to get an answer anyway. Why do you have multiple “model block” statements printed if it runs after every sample drawn? For each iteration, shouldn’t it be just one “model block” followed by one “generated quantities block” and so on? Or is it because of samples getting rejected? Thank you.

Topic		Replies	Views
Specifying the number of samples for rng Modeling	6	2412	November 3, 2017
Predictions by using the generated quantities block in Stan General	7	6042	March 15, 2018
Running stan in R with only generated quantities RStan	7	2466	May 27, 2021
Generated quantities for prediction Modeling specification , posterior-predictive	9	542	June 5, 2024
Posterior Predictive Checks After Sampling Modeling	3	810	October 23, 2022

How does the generated quantities block iterate over sample draws?

Related topics