GoF with stan_clogit

I have fitted a conditional logit model (with alternative-specific predictors) using stan_clogit. The fitting itself works fine, but I am stuck on goodness of fit and model comparison via loo. Basically, I have two models:

> m_add <- stan_clogit(y ~ x1 + x2 + (1|j), strata = i, data = data)
> m_int <- stan_clogit(y ~ x1 * x2 + (1|j), strata = i, data = data)

The results from stan_clogit and from the frequentist alternative “mlogit” are virtually identical, and the usual tests on the frequentist models show that m_int outperforms m_add by far, as expected. However, the checks with “loo” show the complete opposite. Moreover, for such simple and usually well-behaved models, the Pareto k values are very high and flagged as problematic. “waic” etc. do not appear to work properly either.

Does anyone know why this is the case? I believe that, with stan_clogit, it would be better to leave out entire strata (all observations sharing the same i) rather than single observations j nested in i. Right? However, that is not yet implemented. Does anyone know of a reliable alternative for assessing model quality and comparing models? Anything other than “loo”? Thanks a lot, Guido
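
P.S. To make concrete what I mean by leaving out entire strata, here is a minimal base-R sketch of the fold assignment only. The data frame `toy`, its dimensions, and the number of folds `K` are made up for illustration; they are not my real data, and this is not something rstanarm does for me.

```r
# Hypothetical toy data: 6 strata (i), each with 3 alternatives (j).
set.seed(1)
toy <- data.frame(i = rep(1:6, each = 3), j = rep(1:3, times = 6))

K <- 3  # number of folds; an arbitrary choice for this illustration

# Assign each stratum (not each row) to a fold, so that all
# alternatives of one choice situation are held out together.
strata_ids <- unique(toy$i)
fold_of_stratum <- sample(rep_len(seq_len(K), length(strata_ids)))

# Map the stratum-level fold assignment back onto the rows.
toy$fold <- fold_of_stratum[match(toy$i, strata_ids)]

# Sanity check: every stratum should sit in exactly one fold.
folds_per_stratum <- tapply(toy$fold, toy$i, function(f) length(unique(f)))
```

The idea would then be to refit each model K times on the rows with `fold != k` and evaluate the predictive density on the held-out strata. I have not verified whether kfold() in rstanarm accepts such a `folds` vector for stan_clogit fits, so that part is an assumption on my side.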