I’d like to understand the theory underlying the prediction for a new observation in the “inhaler” ordinal regression model fitted using brms. The aim is to get the probability of each of the K = 4 response categories given the predictors, which are contained in a one-row data.frame `new_data`. For simplicity, consider the case of having no group-level effects.

I understand the behavior of `predict(inhaler_fit, newdata = new_data)` in the following way: for each of the S already performed posterior draws of \theta (say \theta_s), draw a new observation from p(y_{rep} | \theta_s), i.e. in my case using `ordered_logistic_rng`. Looking only at the resulting S draws of y_{rep}, this gives the posterior predictive distribution for a new observation on the ordinal scale. Because of the default `summary = TRUE` in `predict.brmsfit()`, the proportions (i.e. averages) for each of the K = 4 response categories are returned, so this gives the probabilities P(y_{rep} = i | y) for i \in \{1, 2, 3, 4\}, estimated by sampling from p(y_{rep} | \theta_s) (in addition to sampling from the posterior p(\theta | y), of course).

I also considered the following procedure: for each posterior draw \theta_s, calculate P(y_{rep} = i | \theta_s) for i \in \{1, 2, 3, 4\} and afterwards average P(y_{rep} = i | \theta_s) over all s \in \{1, \dots, S\}. From some checks, I found that this is what `fitted(inhaler_fit, newdata = new_data)` returns.

Shouldn’t this give P(y_{rep} = i | y) as well, since P(y_{rep} = i | y) = \int P(y_{rep} = i | \theta) p(\theta | y) d\theta = E_{\theta | y}(P(y_{rep} = i | \theta))? So shouldn’t the second way (based on `fitted.brmsfit()`) be equivalent to the first one (based on `predict.brmsfit()`), but more accurate, since it does not involve sampling from p(y_{rep} | \theta_s)? I guess I am misunderstanding something, as the documentation for `fitted.brmsfit()` says: “By definition, these predictions have smaller variance than the response predictions performed by the predict method.” But I don’t see where I got it wrong.
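To make the comparison concrete, here is a minimal numerical sketch of the two estimators I have in mind (in plain Python rather than R, and with made-up posterior draws of a linear predictor and cutpoints instead of a real brms fit):

```python
import numpy as np

rng = np.random.default_rng(0)
S = 20000  # number of posterior draws (illustrative)
K = 4

# Hypothetical posterior draws for a single new observation:
# linear predictor eta_s and ordered cutpoints c_s (not from a real fit).
eta = rng.normal(0.5, 0.3, size=S)
cuts = np.sort(rng.normal([-1.0, 0.0, 1.0], 0.1, size=(S, K - 1)), axis=1)

def category_probs(eta, cuts):
    """P(y = k | theta) under the ordered-logistic (cumulative logit) model."""
    # cumulative probability P(y <= k) = logistic(c_k - eta) for k = 1..K-1
    cum = 1.0 / (1.0 + np.exp(-(cuts - eta[:, None])))
    cum = np.hstack([np.zeros((S, 1)), cum, np.ones((S, 1))])
    return np.diff(cum, axis=1)  # shape (S, K)

probs = category_probs(eta, cuts)  # P(y_rep = k | theta_s), one row per draw

# fitted()-style: average the per-draw category probabilities over s
fitted_style = probs.mean(axis=0)

# predict()-style: draw y_rep from p(y | theta_s) per draw, then tabulate
u = rng.random(S)
y_rep = (u[:, None] > probs.cumsum(axis=1)).sum(axis=1)  # categories 0..K-1
predict_style = np.bincount(y_rep, minlength=K) / S

print(fitted_style)
print(predict_style)  # agrees with fitted_style up to Monte Carlo error
```

Both vectors estimate the same four probabilities; the predict()-style version just adds an extra layer of sampling noise.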

I found that my question is very similar to this one, but I still don’t understand why we can’t get the posterior predictive distribution by averaging the likelihood over the S posterior draws. Or is my assumption p(y_{rep} | y) = \int p(y_{rep} | \theta) p(\theta | y) d\theta = E_{\theta | y}(p(y_{rep} | \theta)) wrong?
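To illustrate what I take the documentation’s variance statement to mean, here is another plain-Python sketch, with a made-up Beta distribution standing in for the posterior draws of a single category probability (not a real fit): the sampled indicator 1{y_rep = i} and the per-draw probability P(y_rep = i | \theta_s) have the same mean, but the indicator draws vary much more across s.

```python
import numpy as np

rng = np.random.default_rng(1)
S = 100000

# Made-up posterior draws of one category probability P(y_rep = i | theta_s);
# a Beta distribution stands in for draws from a real posterior.
p_draws = rng.beta(8, 12, size=S)

# predict()-style draws of the indicator 1{y_rep = i}
y_draws = (rng.random(S) < p_draws).astype(float)

# Same expectation ...
print(p_draws.mean(), y_draws.mean())
# ... but by the law of total variance,
# Var(1{y_rep = i}) = E[p(1 - p)] + Var(p) >= Var(p),
# the indicator draws have the larger variance:
print(p_draws.var(), y_draws.var())
```

If this is the right reading, the “smaller variance” refers to the spread of the S draws that each method returns, not to the accuracy of the averaged point estimate.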