On this forum I have often read people who know a lot about modeling say that a large number of divergent transitions suggests the model may be mis-specified. I am wondering how solid that rule is as grounds for judging one model to be inferior to another.

I recently added a cubic polynomial interaction term to a model and ran four chains of 1000 iterations. The Rhats for all coefficients ranged from 1.46 to 1.60, and the bulk ESS values were 7-8. I also got a warning saying that 1000 transitions after warmup were divergent.

The previous model, containing linear and quadratic interaction terms, on the other hand performed marvelously: all Rhats were 1.00, all ESS values were over 2000, and the posterior predictive checks looked great.

Usually when comparing models I use k-fold or LOO cross-validation and compare via the `loo_compare()` function in brms. But with such poor diagnostics I can't really k-fold or loo the cubic model.
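For concreteness, this is roughly the comparison workflow I mean. It is only a sketch: `fit_quad` and `fit_cubic` are placeholder names for the two fitted brms models.

```r
library(brms)

# Compute approximate leave-one-out CV for each fitted model
# (fit_quad and fit_cubic are hypothetical fitted brmsfit objects)
loo_quad  <- loo(fit_quad)
loo_cubic <- loo(fit_cubic)

# First row of the output is the model with the best expected
# predictive performance; elpd_diff/se_diff quantify the gap
loo_compare(loo_quad, loo_cubic)
```

With the cubic model's diagnostics as they stand, though, the `loo()` estimates themselves would not be trustworthy, which is the crux of the question.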

Is it sufficient, legitimate, and most importantly *defensible* to say "the cubic model failed to converge and hence was judged to be inferior to the quadratic"?

Alternatively, should I try to bandage up the cubic model to a point where it can be legitimately compared to the quadratic using CV (e.g. by increasing `adapt_delta`)?
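In case it helps clarify what I mean by "bandaging up": something like the following, where `bf_cubic` and `d` are placeholders for my actual formula and data. `adapt_delta` and `max_treedepth` are the standard sampler control arguments brms passes through to Stan.

```r
# A minimal sketch, assuming the model is fit with brm();
# bf_cubic, d, and the iteration counts are illustrative only
fit_cubic <- brm(
  bf_cubic,
  data = d,
  chains = 4,
  iter = 2000,
  control = list(
    adapt_delta   = 0.99,  # smaller step sizes, fewer divergences
    max_treedepth = 12     # allow longer trajectories if needed
  )
)
```

My understanding is that raising `adapt_delta` can suppress divergences caused by difficult posterior geometry, but with Rhats around 1.5 I suspect the problem is deeper than step size.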