On this forum I have often seen experienced modelers say that a large number of divergent transitions means the model may be mis-specified. I am wondering how solid that rule is as a basis for judging one model to be inferior to another.
I recently added a cubic polynomial interaction term to a model and ran four chains of 1000 iterations. The Rhats for all coefficients ranged from 1.46 to 1.60, and the bulk ESS values were 7-8. I also got a message saying that 1000 transitions after warmup were divergent.
The previous model, which contained linear and quadratic interaction terms, on the other hand performed marvelously: all Rhats 1.00, all ESS over 2000, great posterior predictive checks, etc.
Usually when comparing models I use k-fold or LOO cross-validation and compare via the `loo_compare()` function in brms. But with such poor diagnostics I can't really k-fold or loo the cubic model.
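For reference, the usual comparison workflow would look something like the sketch below (the fit object names `fit_quad` and `fit_cubic` are hypothetical placeholders for the two fitted brms models):

```r
library(brms)

# Hypothetical fitted brmsfit objects: fit_quad (quadratic interaction)
# and fit_cubic (cubic interaction).
# add_criterion() computes and stores the LOO results on each fit object.
fit_quad  <- add_criterion(fit_quad,  "loo")
fit_cubic <- add_criterion(fit_cubic, "loo")

# loo_compare() ranks the models by expected log predictive density (elpd);
# the top row of the output is the preferred model.
loo_compare(fit_quad, fit_cubic, criterion = "loo")
```

With the diagnostics above, though, the LOO estimates for the cubic model would be meaningless, since the posterior draws they are computed from haven't converged.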
Is it sufficient, legitimate, and most importantly defensible to say "the cubic model failed to converge and hence was judged to be inferior to the quadratic"?
Alternatively should I try to bandage up the cubic to a point where it can be legitimately compared to the quadratic using CV (e.g. by increasing adapt_delta)?
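If I go the second route, I assume a minimal sketch would be something like the following (again, `fit_cubic` is a placeholder name for the problematic fit):

```r
library(brms)

# Refit the cubic model with a stricter sampler configuration.
# adapt_delta closer to 1 forces smaller leapfrog step sizes, which can
# remove divergences caused by difficult posterior geometry, at the cost
# of slower sampling. More iterations may also help the very low ESS.
fit_cubic2 <- update(
  fit_cubic,
  iter = 4000, warmup = 2000,
  control = list(adapt_delta = 0.99, max_treedepth = 12)
)
```

Though I suspect that with Rhats this high, tuning the sampler alone may not be enough and the model itself may need rethinking (e.g. the cubic term may be nearly unidentifiable).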