Predictions by using the generated quantities block in Stan



I have a question regarding how the prediction samples are drawn if using the generated quantities block in stan. Take the following simple codes as an example.

int<lower=0> T; // number of samples
int<lower=0, upper=1> y[T]; // observed samples
real<lower = 0> shape1; // hyperparameter for beta prior
real<lower = 0> shape2; // hyperparameter for beta prior
int<lower=0> T_new; // number of predictions
real<lower=0, upper=1> p;
p ~ beta(shape1, shape2);
y ~ bernoulli(p);
generated quantities{
int<lower=0, upper=1> y_new[T_new];
for (t in 1:T_new){
y_new[t] = bernoulli_rng(p);

I use beta prior with Bernoulli / binomial likelihood, and the new predictions are based on posterior of p, which we know is also a beta distribution.
For example, we only want to predict next observation, that T_new = 1. If we set iter = 2000, chains = 4 in sampling, I notice that Stan will draw the 4000 samples of parameter p, and 4000 samples of y_new, that is, the sample number of parameters and predicted quantities would be the same. So I am very curious about how stan does this.
I assume if we do prediction out of stan, we can sample from the posterior parameter space, and for each parameter sample,we draw a lot of samples for y_new, say if we draw 1000 parameters from the posterior space, and draw 1000 samples for predicted y_new per parameter sample, we should finally have 1 million samples for y_new, which form the posterior distribution of y_new given y, p(y_new|y).
But with Stan, it seems only one sample is drawn for the prediction per sample of parameter. And this puzzles me.



You are right in your assumptions.