Distribution of mean of posterior predictive distribution

I have a posterior predictive distribution that I generated by fitting a model in Stan. Correspondingly, I can calculate the mean of the posterior predictive distribution. However, I don’t have a good sense of how to obtain the distribution of the mean of the posterior predictive distribution. In particular, I would think that the distribution of the mean of the posterior predictive distribution should be wider than the posterior distribution of the mean.

Would it make sense to fit the posterior predictive distribution with the initial model I used to fit the original data? This should give me the distribution of the mean, but I would be using more data points than I originally had (about 10x more), and I’m worried that the prior would no longer make sense.

If I understand your question properly: in general, to get the posterior distribution of a quantity of interest (e.g. a mean), you generate your predictions for a certain number of draws, compute the quantity of interest for each draw, and then look at the between-draw distribution of that quantity.

So, for example, say you have fit a model and are making 100 point predictions from it, and you want the posterior distribution of the mean of those 100 predictions. If you run your model for 1000 iterations per chain with 4 chains, you will get 4000 draws for each point - that is, 4000 sets of 100 predictions. If you take the mean of each of these sets, you will get 4000 means, and you can then look at their distribution.
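In numpy-like terms, a minimal sketch (the draws array here is a random stand-in for what you would actually extract from the fitted model, so the numbers are purely illustrative):

import numpy as np

# Stand-in for real Stan output:
# 4000 draws (rows) x 100 predicted points (columns).
rng = np.random.default_rng(0)
draws = rng.normal(loc=0.0, scale=1.0, size=(4000, 100))

# One mean per draw: this is the posterior distribution of the mean.
means_per_draw = draws.mean(axis=1)  # shape (4000,)

# Summarize the between-draw distribution of the mean.
print(means_per_draw.mean(), means_per_draw.std(ddof=1))
print(np.percentile(means_per_draw, [2.5, 97.5]))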

I do understand there is some discussion over just the right terminology to apply here (draws vs. samples), but the underlying concept is the same - you generate many sets of predictions using the posterior distribution, compute your quantity of interest for each, and then look at the distribution of that quantity.

I’m not entirely sure what you mean here, so I hope that I’m not misinterpreting your question.

It might be easier to provide a simple Stan program to elucidate what I’m talking about. Below is a simple univariate normal Stan model with a fixed standard deviation and a generated quantities block to get the posterior predictive (I have not actually run it, but it conveys the general idea and hopefully has no bugs).

Let’s suppose that - consistent with what you are saying - I fit this model with 1000 iterations per chain and four chains. This would give me 4000 draws of mu, but also 4000 draws from the posterior predictive distribution.

So now one can calculate the mean of the posterior predictive distribution, or its standard deviation; typically, people would just use these simple summary statistics. But what if I am interested in the standard error of the mean of the posterior predictive distribution? I could certainly calculate it with a traditional frequentist formula, but as the model gets more complex I wonder: why not use a Bayesian fit (and Stan in particular)? And if using a Bayesian fit, should I just fit it with the same model I have below?

My concern was that in this case the prior for mu is unchanged (mu ~ normal(0, 1)), but now there are 4000 data points instead of the original N. In particular, the original model is fit to the original history of N data points, while the posterior predictive represents a sample of 4000 values of a single point in that history. So it’s a little different conceptually.

Alternatively, one could fit it without the prior or the generated quantities block, adding a parameter for the standard deviation. This would be more like an MLE fit of the posterior predictive distribution. Given that the number of predictions (4000) might be significantly larger than the original N, this might be reasonable too.

data {
    int<lower=0> N;
    vector[N] X;
}
parameters {
    real mu;
}
model {
    mu ~ normal(0, 1);
    X ~ normal(mu, 1);
}
generated quantities {
    real posterior_predictive;
    posterior_predictive = normal_rng(mu, 1);
}
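To make these summaries concrete, here is a rough numpy sketch of what I mean (I am simulating stand-in draws rather than extracting them from an actual fit, so the specific numbers are purely illustrative):

import numpy as np

# Stand-ins for the 4000 draws a fit of the model above would produce.
rng = np.random.default_rng(1)
mu_draws = rng.normal(0.0, 0.1, size=4000)  # hypothetical posterior draws of mu
pp_draws = rng.normal(mu_draws, 1.0)        # one predictive draw per mu draw

# Simple summary statistics of the posterior predictive distribution.
pp_mean = pp_draws.mean()
pp_sd = pp_draws.std(ddof=1)  # a bit above 1, since uncertainty in mu adds variance

# The frequentist-style standard error of the mean mentioned above,
# which treats the 4000 draws as an independent sample.
pp_sem = pp_sd / np.sqrt(pp_draws.size)
print(pp_mean, pp_sd, pp_sem)

Note that pp_mean and pp_sd are also exactly what an MLE fit of a normal distribution to the 4000 predictive draws would return.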

In my example above and in an earlier (deleted) reply I assumed that you wanted to compute the mean across many new data points (or the original data) and look at the distribution of that mean across many draws.

From what I understand here, though, you’re interested in the posterior distribution of a single prediction. In that case, I’m not sure what you mean by the following:

I assume that by asking about the standard error of the mean you are interested in quantifying uncertainty around that single prediction? In that case, the uncertainty is captured by the distribution of the posterior draws - in my understanding there is no “standard error of the mean” here, since you are not asking “how does the mean of each sample vary between samples” but “how does a single point prediction vary between samples”.

Thanks for the reply.

Yes, I am interested in a single new observation. So for instance, I might estimate the mean of a normally distributed variable and then calculate the posterior predictive for a new observation (as in a forecast from the fitted model).

What I’m interested in is quantifying the uncertainty in the mean of the posterior predictive distribution. I don’t think this really corresponds to how a single point prediction varies between samples; that would better correspond to the standard deviation of the posterior predictive distribution itself. (The posterior predictive distribution may have a standard deviation greater than one, even though each draw simulates from a N(mu, 1) distribution, because mu itself is uncertain: by the law of total variance, Var(y_new | data) = 1 + Var(mu | data).)

Here is another way to think about it: suppose I randomly sample from the data generating process M times, and in each iteration I fit the Stan model, get the posterior predictive distribution, and calculate its mean. If I then calculate the standard deviation of that collection of M means, it should correspond to a standard error of the mean of the posterior predictive distribution.
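As a rough numpy sketch of that procedure - since the model above is conjugate, I can stand in for each Stan fit with the closed-form posterior mu | X ~ normal(N * mean(X) / (N + 1), sqrt(1 / (N + 1))); the choices of M, N, and the true mean are arbitrary:

import numpy as np

rng = np.random.default_rng(2)
M, N = 2000, 25    # arbitrary: M simulated datasets, N points per dataset
mu_true = 0.5      # arbitrary true mean of the data generating process

pp_means = np.empty(M)
for m in range(M):
    X = rng.normal(mu_true, 1.0, size=N)
    # Conjugate update standing in for the Stan fit
    # (prior mu ~ normal(0, 1), likelihood X ~ normal(mu, 1)).
    post_mean = X.sum() / (N + 1)
    post_sd = np.sqrt(1.0 / (N + 1))
    # Mean of the posterior predictive distribution, estimated by
    # averaging predictive draws, as one would with Stan output.
    mu_draws = rng.normal(post_mean, post_sd, size=4000)
    pp_draws = rng.normal(mu_draws, 1.0)
    pp_means[m] = pp_draws.mean()

# Standard deviation of the collection of means: the "standard error of
# the mean of the posterior predictive distribution" described above.
print(pp_means.std(ddof=1))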

But in your case the posterior predictive only consists of draws for a single point, so I am not sure what you want to average over.

Yeah, I guess I think this is what you want, so long as you are interested in the uncertainty for a single future point. That said, I’m no time-series expert, and maybe somebody more authoritative with a different take will weigh in.

Interestingly enough, rstan reports the standard error of the mean of the posterior_predictive variable. The calculation is described in Section 16.4 of the reference manual: it is just the posterior standard deviation divided by the square root of the effective sample size. Of course, this is basically the same thing you would get from an MLE fit that assumed the draws are independent.
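In symbols, that is just se_mean = sd(draws) / sqrt(n_eff). A quick numpy illustration, treating the draws as independent so that n_eff is simply the number of draws (the same assumption the MLE comparison makes):

import numpy as np

rng = np.random.default_rng(3)
pp_draws = rng.normal(0.0, 1.0, size=4000)  # stand-in for the posterior_predictive draws
n_eff = pp_draws.size                       # assumes independent draws, for illustration
se_mean = pp_draws.std(ddof=1) / np.sqrt(n_eff)
print(se_mean)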

I was able to write some code that gets the distribution of the mean of the posterior predictive through sampling, but it is quite slow to run. Doing it this way confirmed that the standard deviation of the mean of the posterior predictive is larger than the mean of the standard deviation of the posterior distribution of mu, if that makes sense.

Unfortunately, I haven’t yet been able to figure out a simpler way to get similar numbers in Stan. I was able to get close, but the result was still a little lower than I would have liked.

I think this post is relevant to your question: Uncertainty around central tendency - Bayes SE vs. Bayesian CI. Specifically, the Monte Carlo error you are referencing in Section 16.4.3 of the manual does not describe uncertainty in the central tendency of the posterior, but rather the accuracy of the computation.

Thanks, that is definitely interesting and makes the concept of standard errors of the mean for parameters much clearer to me. Unfortunately, it isn’t the same concept I am interested in (which is the point about simulating the data multiple times, fitting the Stan model in each iteration, getting the mean of the posterior predictive distribution each time, and then taking the standard deviation of those means).