If the pp distribution is p(yrep | y) = \int p(yrep | theta) p(theta | y) dtheta,
and we know just some (S) samples from the posterior then the pp distribution should be

p(yrep | y) = mean( p(yrep | theta_s) ).

If, for instance, the likelihood is gaussian, then the pp distribution is an average of S gaussians of parameters theta_s. I would have made a random sample of, say, M numbers of every those gaussians to get a matrix of S rows and M columns, and then the pp distribution would be (with R):

colMeans(yrep) # A vector of M elements.

But in the many examples and documentation I read the sampling procedure is to sample M = 1 from the s-th likelihood and have the distribution of yrep as the vector of S samples.

What is wrong in my first assumptions for the distribution?

p(yrep | y) is conceptually the integral over theta of the function p(y_rep | theta) * p(theta | y). We can draw from the distribution whose PDF is p(yrep | y) by repeating the following two steps many times:

Draw theta_tilde from the distribution whose PDF is p(theta | y)

Draw y_rep from the distribution whose PDF is p(y_rep | theta = theta_tilde)

In practice, we already have a bunch of draws from the posterior distribution, so we take each of them to be theta_tilde in turn and draw from the distribution whose PDF is p(y_rep | theta_tilde). You can then evaluate any function of the draws of y_rep.

What you are proposing is to estimate the expectation of theta — which is defined as the integral over theta of the function theta * p(theta | y) — and calling that theta_bar. If you then draw from the distribution whose PDF is p(y_alt | theta = theta_bar), that is only the right thing to do if you were certain that the unknown theta really is theta_bar, which you can never be absolutely certain of. Otherwise, the resulting draws of y_rep do not approximate draws of y_rep; they will have too little variance and tail heaviness because you have discarded you uncertainty over what theta is.