I would also say that, based on loo, there is no solid evidence in this case that your theoretically preferred model would provide better predictions, but note also that there is no solid evidence that it would provide worse predictions than the alternative.
Overall, loo is not good at detecting very small differences between models (the same holds for WAIC, etc.). It is possible to detect small differences by adding more assumptions about the future data, but then you need to check those assumptions. For more, see e.g. http://dx.doi.org/10.1214/12-SS102 and http://link.springer.com/article/10.1007/s11222-016-9649-y
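To make "no solid evidence either way" concrete, here is a minimal sketch of how a difference in expected log predictive density (elpd) can be set against its standard error. It assumes you already have pointwise loo elpd values for both models on the same observations. The function name `elpd_difference` and the simulated numbers are hypothetical and are not taken from your models. The standard error uses the usual normal approximation, sqrt(n * var(pointwise differences)), which can itself be unreliable when the models are very similar.

```python
import numpy as np


def elpd_difference(pointwise_a, pointwise_b):
    """Return (elpd_diff, se_diff) for model A minus model B.

    Both inputs are pointwise elpd values (one per observation),
    computed on the same data in the same order.
    """
    a = np.asarray(pointwise_a, dtype=float)
    b = np.asarray(pointwise_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("need two 1-D arrays of equal length")
    if a.size < 2:
        raise ValueError("need at least two observations")

    diff = a - b                      # paired pointwise differences
    n = diff.size
    elpd_diff = diff.sum()            # total difference in elpd
    # Normal-approximation standard error of the summed difference.
    se_diff = np.sqrt(n * diff.var(ddof=1))
    return elpd_diff, se_diff


# Hypothetical pointwise values, simulated only for illustration:
# model B is on average very slightly worse than model A.
rng = np.random.default_rng(1)
n_obs = 200
pointwise_a = rng.normal(-1.0, 0.5, size=n_obs)
pointwise_b = pointwise_a + rng.normal(-0.005, 0.1, size=n_obs)

elpd_diff, se_diff = elpd_difference(pointwise_a, pointwise_b)
print(f"elpd_diff = {elpd_diff:.2f}, se_diff = {se_diff:.2f}")
```

If `elpd_diff` is small relative to `se_diff`, the comparison supports neither model clearly, which is the situation described above. This is a rough diagnostic rather than a formal test.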
Aki