I’ve compared some models with loo_compare, but when I make predicted vs observed plots for the best model returned, the fit looks really poor compared to the second best model, even though the difference in elpd between them is 300 (s.e. = 90). Why might this be? Is elpd maximising the same thing that a predicted vs observed plot can show?
No. ELPD, as estimated by LOO, is the sum over observations of the log of the height of the posterior predictive density at each observed point, where the posterior predictive density is derived from a model with that point left out. It rewards the whole predictive distribution, not just its center. So there are two ways to obtain the situation you describe.
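For reference, here is the standard way of writing that quantity. The notation is my own choice rather than anything from your model: $y_i$ is the $i$-th observation, $y_{-i}$ is the data with that observation removed, and $\theta$ stands for the model parameters.

$$
\mathrm{elpd}_{\mathrm{loo}} = \sum_{i=1}^{n} \log p(y_i \mid y_{-i}),
\qquad
p(y_i \mid y_{-i}) = \int p(y_i \mid \theta)\, p(\theta \mid y_{-i})\, d\theta
$$

Each term depends on how much predictive density sits at $y_i$, so both the location and the spread of the predictive distribution matter.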
One is if your predicted versus observed plots are using point predictions, and these point predictions are a poor summary of the height of the posterior predictive density over the observations. For example, if the posterior predictive density function is approximately Gaussian, and the observation is 10, then a posterior predictive density of Normal(5, 5) is superior to a posterior predictive density of Normal(9, 0.1) (reading the second argument as the standard deviation, as in Stan), even though the latter would be summarized as a point prediction that is much closer to the observed value. The narrow density puts the observation ten standard deviations from its mean, while the wide one puts it only one standard deviation away.
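To make the Gaussian example concrete, here is a minimal sketch that evaluates both log densities at the observation. It assumes the second argument of Normal is a standard deviation; the names `normal_logpdf`, `candidates` and `results` are just illustrative and do not come from any package.

```python
import math


def normal_logpdf(y, mean, sd):
    """Log density of Normal(mean, sd) at y, where sd is the standard deviation."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


y_obs = 10.0

# Two hypothetical posterior predictive densities for the same observation,
# given as (mean, standard deviation).
candidates = {
    "wide": (5.0, 5.0),
    "narrow": (9.0, 0.1),
}

results = {}
for name, (mean, sd) in candidates.items():
    results[name] = {
        # Error of the point prediction (the predictive mean).
        "abs_error": abs(y_obs - mean),
        # Contribution of this observation to the log predictive density.
        "log_density": normal_logpdf(y_obs, mean, sd),
    }
    print(name, results[name])
```

The wide density comes out at roughly -3.0 and the narrow one at roughly -48.6, so the model whose point prediction is closer loses by about 45 log-density units on this single observation. A handful of points like this is enough to produce an elpd difference of the size you report.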
The other is if the posterior predictive densities from the fitted model are a poor approximation to the posterior predictive densities from the model where the point is left out. In this latter case, you will likely see some diagnostic warnings from the PSIS-LOO procedure, typically about high Pareto k values.
If you’re confident that neither of these things is happening, then some further details would be helpful.