How does the generated quantities block iterate over sample draws?


#1

In a previous topic, @bbbales2 helped me to understand that the generated quantities block is run for each sample draw.

I interpreted this as essentially a for loop.

for (i in 1:"Number of sample draws") {
  generated quantities {
    some_post_pred = some_distribution_rng(parameters[i])
  }
} 

I then went ahead and coded up a more complex generated quantity.

generated quantities{
  vector[N] y_new;
  int b_new;
  vector[N] x_new;

  b_new = 1000;
  while(b_new > 20) {
    for (i in 1:N) {
    x_new[i] = -1;  // Rejection sampling to truncate normal at 0.
    while(x_new[i] < 0) {
      x_new[i] = normal_rng(mean(x), sd(x));
      }
    }
  b_new = poisson_rng(sum(x_new) * lambda); // works replacing lambda with true 0.012.
  }
  for (i in 1:N) {
    y_new[i] = normal_rng(beta * x_new[i], sigma);
  }
}

What I want the generated quantities block to produce is two posterior predictions: b_new for some randomly generated x_new such that b_new is less than 20, and y_new for the same x_new that made b_new less than 20. How’s that for a tongue twister?

When I fit the model as written, it stalls (at least it takes > 24 hours). But when I replace lambda with it’s true value 0.012, it does what I want. Any idea why this would be the case?

It might help me, and others, if someone could explain how Stan generates the generated quantities. Should we think of it as iterating over sample draws? Or something else.

If this quantity can’t be calucated in Stan, my backup approach would be to iterate over each draw after extracting the model fit in R. Of course my preference would be to run everything in Stan as I can be sure the distribution parameterizations are identical.

I’ve attached a reproducible example below (.R and .stan). Thanks for all the help.

gen_quant_test.stan (697 Bytes)
gen_quant_test.R (1.1 KB)


#2

sounds like an infinite loop - I recommend adding “print” statements within your loops to check this.

I’m not sure I understand what you’re asking here - wasn’t this already answered in the previous topic Specifying the number of samples for rng - see last comment?

how Stan generates the generated quantities

see the Stan langauge manual section “Program Blocks”


#3

Thanks for the response. The manual seems to suggest my “for loop” understanding is correct.

I added a print statement as you suggested so the new generated quantities block is

generated quantities{
  vector[N] y_new;
  int b_new;
  vector[N] x_new;

  b_new = 1000;
  while(b_new > 20) {
    for (i in 1:N) {
    x_new[i] = -1;
    while(x_new[i] < 0) {
      x_new[i] = normal_rng(mean(x), sd(x));
      }
    }
  b_new = poisson_rng(sum(x_new) * lambda); // works with 0.012
  print(lambda);
  }
  for (i in 1:N) {
    y_new[i] = normal_rng(beta * x_new[i], sigma);
  }
}

When I fit the model as written, nothing happens after it starts the iteration process. It’s like it fails to run even the first iteration. However, when I substitute lambda for the the true value 0.012 as below, it’s works! I get a bunch of printed 0.012s and the model finishes fitting.

  b_new = poisson_rng(sum(x_new) * 0.012); // works with 0.012
  print(0.012);
  }

My question is, why does it fail to iterate when I use the model parameter lambda, but work when I use the true value 0.012?


#4

This sorta code I’d do outside of Stan (so you can run it in a debugger).

Maybe as a bigger picture thing, there’s probably a way to write these rngs without the rejection sampling.

For instance, you can at least do https://en.wikipedia.org/wiki/Inverse_transform_sampling for the truncated normals. Maybe there’s an even better way. It’d be worth Googling around something about sampling from a truncated normal.

For the Poissons too, you can just evaluate the PDF for all outputs < 20 or whatever, renormalize the distribution you get, and sample something from there.

My question is, why does it fail to iterate when I use the model parameter lambda

It’s probably something like if lambda is 100.0, then there’s a small chance that b_new < 20. If lambda being 100 ever shows up in the sampler, then maybe you’re in trouble. Stan’s samplers pretty aggressively explore parameter space, so I wouldn’t rule anything out.


#5

The generated quantities block is evaluated after each sample is obtained, not after all of them are obtained. Consider for instance, this example:

library(rstan)
scode <- '
parameters {
  real y; 
} 
model {
  y ~ normal(0, 1);
  print("model block");
}
generated quantities {
  print("generated quantities block");
}
'

fit1 <- stan(model_code = scode, iter = 10, chains=1)

Which will give you:

model block

model block

...

model block

Iteration: 1 / 10 [ 10%]  (Warmup)
model block

model block

generated quantities block

Iteration: 2 / 10 [ 20%]  (Warmup)
model block

...

model block

generated quantities block

Iteration: 3 / 10 [ 30%]  (Warmup)

model block

...

model block

generated quantities block

Iteration: 4 / 10 [ 30%]  (Warmup)

#6

I’d be more generous with my use of print statements - I’d put one inside each of your while loops.
e.g.

print("outer loop, bnew: ", b_new);
....
print("inner loop, i: ",  i,  " x_new: ", x_new);

this general technique is called “printf debugging” (https://en.wikipedia.org/wiki/Debugging#Techniques)
I hate working in an IDE, so I use this pretty much everywhere.


#7

Thanks all! This is great, I really learned a lot.


#8

Probably due to initialization sending the model block into an infinite loop. The prints are buffered until the end of the model block, so if things hang inside there, you won’t see them.

Rejection sampling can be very inefficient if the rejection rate is high.

One thing you can do here is define x_new as a parameter with lower bound of zero and then just use

x_new ~ normal(mean(x), sd(x));

though you’re better off coputing the mean and sd in transformed data and storing them. That’d probably be better in cases where the truncation doesn’t work. But it won’t work for a Poisson as we can’t take discrete draws other than in the generated quantities block.

Also, replying on a poisson draw to be less than 20 also seems dangerous. The reason it is probably working with 0.012 is that the poisson RNG has some chance of choosing a value below 20. I have no idea what sum(x_new) is here.