In a previous topic, @bbbales2 helped me to understand that the generated quantities block is run for each sample draw.
I interpreted this as essentially a for loop.
for (i in 1:"Number of sample draws") {
generated quantities {
some_post_pred = some_distribution_rng(parameters[i])
}
}
I then went ahead and coded up a more complex generated quantity.
generated quantities{
vector[N] y_new;
int b_new;
vector[N] x_new;
b_new = 1000;
while(b_new > 20) {
for (i in 1:N) {
x_new[i] = -1; // Rejection sampling to truncate normal at 0.
while(x_new[i] < 0) {
x_new[i] = normal_rng(mean(x), sd(x));
}
}
b_new = poisson_rng(sum(x_new) * lambda); // works replacing lambda with true 0.012.
}
for (i in 1:N) {
y_new[i] = normal_rng(beta * x_new[i], sigma);
}
}
What I want the generated quantities block to produce is two posterior predictions: b_new
for some randomly generated x_new
such that b_new
is less than 20, and y_new
for the same x_new
that made b_new
less than 20. How’s that for a tongue twister?
When I fit the model as written, it stalls (at least it takes > 24 hours). But when I replace lambda with it’s true value 0.012, it does what I want. Any idea why this would be the case?
It might help me, and others, if someone could explain how Stan generates the generated quantities. Should we think of it as iterating over sample draws? Or something else.
If this quantity can’t be calucated in Stan, my backup approach would be to iterate over each draw after extracting the model fit in R. Of course my preference would be to run everything in Stan as I can be sure the distribution parameterizations are identical.
I’ve attached a reproducible example below (.R and .stan). Thanks for all the help.
gen_quant_test.stan (697 Bytes)
gen_quant_test.R (1.1 KB)
sounds like an infinite loop - I recommend adding “print” statements within your loops to check this.
I’m not sure I understand what you’re asking here - wasn’t this already answered in the previous topic Specifying the number of samples for rng - see last comment?
how Stan generates the generated quantities
see the Stan langauge manual section “Program Blocks”
Thanks for the response. The manual seems to suggest my “for loop” understanding is correct.
I added a print statement as you suggested so the new generated quantities block is
generated quantities{
vector[N] y_new;
int b_new;
vector[N] x_new;
b_new = 1000;
while(b_new > 20) {
for (i in 1:N) {
x_new[i] = -1;
while(x_new[i] < 0) {
x_new[i] = normal_rng(mean(x), sd(x));
}
}
b_new = poisson_rng(sum(x_new) * lambda); // works with 0.012
print(lambda);
}
for (i in 1:N) {
y_new[i] = normal_rng(beta * x_new[i], sigma);
}
}
When I fit the model as written, nothing happens after it starts the iteration process. It’s like it fails to run even the first iteration. However, when I substitute lambda
for the the true value 0.012
as below, it’s works! I get a bunch of printed 0.012
s and the model finishes fitting.
b_new = poisson_rng(sum(x_new) * 0.012); // works with 0.012
print(0.012);
}
My question is, why does it fail to iterate when I use the model parameter lambda
, but work when I use the true value 0.012
?
This sorta code I’d do outside of Stan (so you can run it in a debugger).
Maybe as a bigger picture thing, there’s probably a way to write these rngs without the rejection sampling.
For instance, you can at least do Inverse transform sampling - Wikipedia for the truncated normals. Maybe there’s an even better way. It’d be worth Googling around something about sampling from a truncated normal.
For the Poissons too, you can just evaluate the PDF for all outputs < 20 or whatever, renormalize the distribution you get, and sample something from there.
My question is, why does it fail to iterate when I use the model parameter lambda
It’s probably something like if lambda is 100.0, then there’s a small chance that b_new < 20. If lambda being 100 ever shows up in the sampler, then maybe you’re in trouble. Stan’s samplers pretty aggressively explore parameter space, so I wouldn’t rule anything out.
The generated quantities block is evaluated after each sample is obtained, not after all of them are obtained. Consider for instance, this example:
library(rstan)
scode <- '
parameters {
real y;
}
model {
y ~ normal(0, 1);
print("model block");
}
generated quantities {
print("generated quantities block");
}
'
fit1 <- stan(model_code = scode, iter = 10, chains=1)
Which will give you:
model block
model block
...
model block
Iteration: 1 / 10 [ 10%] (Warmup)
model block
model block
generated quantities block
Iteration: 2 / 10 [ 20%] (Warmup)
model block
...
model block
generated quantities block
Iteration: 3 / 10 [ 30%] (Warmup)
model block
...
model block
generated quantities block
Iteration: 4 / 10 [ 30%] (Warmup)
I’d be more generous with my use of print
statements - I’d put one inside each of your while loops.
e.g.
print("outer loop, bnew: ", b_new);
....
print("inner loop, i: ", i, " x_new: ", x_new);
this general technique is called “printf debugging” (https://en.wikipedia.org/wiki/Debugging#Techniques)
I hate working in an IDE, so I use this pretty much everywhere.
Thanks all! This is great, I really learned a lot.
Probably due to initialization sending the model block into an infinite loop. The prints are buffered until the end of the model block, so if things hang inside there, you won’t see them.
Rejection sampling can be very inefficient if the rejection rate is high.
One thing you can do here is define x_new
as a parameter with lower bound of zero and then just use
x_new ~ normal(mean(x), sd(x));
though you’re better off coputing the mean and sd in transformed data and storing them. That’d probably be better in cases where the truncation doesn’t work. But it won’t work for a Poisson as we can’t take discrete draws other than in the generated quantities block.
Also, replying on a poisson draw to be less than 20 also seems dangerous. The reason it is probably working with 0.012 is that the poisson RNG has some chance of choosing a value below 20. I have no idea what sum(x_new)
is here.
I am trying to understand the basics of stan and stumbled upon this reply. It’s been years this post was last active but hoping to get an answer anyway. Why do you have multiple “model block” statements printed if it runs after every sample drawn? For each iteration, shouldn’t it be just one “model block” followed by one “generated quantities block” and so on? Or is it because of samples getting rejected? Thank you.