Simulating multiple datasets from posterior predictive distribution

I am aware that I can use the generated quantities block to simulate from the posterior predictive distribution. However, it seems that I can only generate one dataset each time. How would I go about generating multiple datasets (say 100)?

data {
  int<lower=0> N;
  vector[N] y;
  vector[N] x;
}
parameters {
  real beta0;
  real beta1;
  real<lower=0> sigma;
}
model {
  y ~ normal(beta0 + beta1*x, sigma);
  beta0 ~ normal(0,1);
  beta1 ~ normal(0,1);
  sigma ~ gamma(1,1);
}
generated quantities {
  vector[N] y_pred;
  for (n in 1:N) {
    y_pred[n] = normal_rng(beta0 + beta1*x[n], sigma);
  }
}

You can specify the number of datasets as M (declared in the data block and passed in alongside the rest of the data):

generated quantities {
  matrix[N, M] y_pred;
  for (n in 1:N) {
    for (m in 1:M) {
      y_pred[n, m] = normal_rng(beta0 + beta1*x[n], sigma);
    }
  }
}
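To see what that loop produces, here is a minimal sketch outside Stan (in plain NumPy, since the thread later mentions both R and Python). The values of beta0, beta1, and sigma are stand-ins for a single posterior draw, not output from the model above:

```python
# Sketch of the generated quantities loop: M simulated datasets of length N,
# drawn from normal(beta0 + beta1*x, sigma) for one posterior draw.
import numpy as np

rng = np.random.default_rng(1)
N, M = 50, 100
x = np.linspace(0, 1, N)
beta0, beta1, sigma = 0.5, 2.0, 1.0  # stand-in values for one posterior draw

# One normal draw per (observation, dataset) cell, like the nested Stan loop.
y_pred = rng.normal(loc=beta0 + beta1 * x[:, None], scale=sigma, size=(N, M))

print(y_pred.shape)  # (50, 100): N rows, one column per simulated dataset
```

In Stan itself, a fresh set of draws is produced for every posterior iteration, so each saved iteration carries its own M replicated datasets.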

Thanks for your reply. I can see the 100 datasets when I extract the fit. Do you know how I could plot density curves for all of the datasets on the same plot?

I think you would need to show the format of the data frame you have the data in to answer that question, as it might require a few steps and will depend on whether, e.g., you are using R or Python or something else.

In general in R you could put all the data into long form - one column with all the data points and another column indicating which sample those points come from. Then something like:

ggplot(dataframe) +
  geom_density(aes(x = estimate, color = sample))
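The same long-form reshape can be sketched in Python with pandas (a hypothetical stand-in array is used here in place of the y_pred draws you would extract from the fit):

```python
# Sketch: reshape an S x N array of posterior predictive draws into long form,
# one row per simulated point, tagged with the draw ("sample") it came from.
import numpy as np
import pandas as pd

S, N = 100, 50                              # 100 draws, 50 observations
y_pred = np.random.normal(size=(S, N))      # stand-in for extracted draws

long_df = (
    pd.DataFrame(y_pred)
    .reset_index()
    .rename(columns={"index": "sample"})
    .melt(id_vars="sample", var_name="obs", value_name="estimate")
)

print(long_df.shape)  # (5000, 3): S * N rows, columns sample/obs/estimate
```

From there, a density plot grouped by the sample column (e.g., seaborn's kdeplot with hue="sample") mirrors the ggplot call above.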

Alternatively, select a subsample, say 25 samples, and plot them in a grid:

ggplot(dataframe) +
  geom_density(aes(x = estimate)) +
  facet_wrap(~sample)