# Simulating multiple datasets from posterior predictive distribution

I am aware that I can use the generated quantities block to simulate from the posterior predictive distribution. However, it seems that I can only generate one dataset every time. How would I go about generating multiple datasets (say 100)?

``````
data {
  int<lower=0> N;
  vector[N] y;
  vector[N] x;
}
parameters {
  real beta0;
  real beta1;
  real<lower=0> sigma;
}
model {
  y ~ normal(beta0 + beta1 * x, sigma);
  beta0 ~ normal(0, 1);
  beta1 ~ normal(0, 1);
  sigma ~ gamma(1, 1);
}
generated quantities {
  vector[N] y_pred;
  for (n in 1:N) {
    y_pred[n] = normal_rng(beta0 + beta1 * x[n], sigma);
  }
}
``````

You can specify the number of datasets as `M` (declared in the `data` block, e.g. `int<lower=1> M;`) and add a second dimension to `y_pred`:

``````
generated quantities {
  matrix[N, M] y_pred;
  for (n in 1:N) {
    for (m in 1:M) {
      y_pred[n, m] = normal_rng(beta0 + beta1 * x[n], sigma);
    }
  }
}
``````

Thanks for your reply. I can see the 100 datasets when I extract the fit. Do you know how I could plot density curves for all of the datasets on the same plot?

I think you would need to show the format of the data frame holding your data to answer that, since it may take a few steps and will depend on whether you are using R, Python, or something else.

In general, in R you could put all the data into long form: one column with all the data points and another column indicating which dataset each point comes from. Then something like:

``````
ggplot(dataframe) +
  geom_density(aes(x = estimate, color = sample))
``````
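If you are working in Python instead, the same reshape-to-long step can be sketched with pandas. The data here are random stand-ins and the column names (`sample`, `estimate`) are just illustrative choices:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical wide format: one column per simulated dataset.
N, M = 50, 100
wide = pd.DataFrame(rng.normal(size=(N, M)),
                    columns=[f"sample_{m}" for m in range(M)])

# Long form: one column of values, one column saying which dataset each value came from.
long = wide.melt(var_name="sample", value_name="estimate")
print(long.shape)  # (5000, 2)
```

From there, something like seaborn's `kdeplot` with `x="estimate"` and `hue="sample"` would give the overlaid-density plot.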

Alternatively, select a subsample, say 25 datasets, and plot them in a grid:

``````
ggplot(dataframe) +
  geom_density(aes(x = estimate)) +
  facet_wrap(~ sample)
``````