Dear Stan users,
I’m running some models with different predictors and I’d like to compare them.
The models are hurdle models (one component is bernoulli_logit and the other is beta), and they run very well with a non-centered parameterization: meaningful results, good diagnostics, no warnings. N is ~190 for the Bernoulli component and ~160 for the beta component.
The log_lik is computed following Aki’s suggestion in the thread “Log likelihood for hurdle model”.
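For concreteness, here is a minimal sketch of the pointwise log-likelihood I have in mind, written in plain Python rather than Stan. It is not the code from that thread. The function names, the choice that `eta` is the logit of P(y > 0), and the mean/precision (`mu`, `phi`) parameterization of the beta component are all assumptions made for illustration.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def beta_logpdf(y, a, b):
    # Log density of Beta(a, b) at y, for 0 < y < 1.
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(y) + (b - 1) * math.log1p(-y))

def hurdle_log_lik(y, eta, mu, phi):
    # Assumed coding: eta is the logit of P(y > 0);
    # mu and phi are the mean and precision of the beta component.
    if y == 0:
        return log_sigmoid(-eta)  # log P(y == 0)
    # log P(y > 0) plus the beta log density of the nonzero value
    return log_sigmoid(eta) + beta_logpdf(y, mu * phi, (1 - mu) * phi)
```

Every observation contributes the Bernoulli term, and only the nonzero observations add the beta term, which is why the two components have different N.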
The Pareto k diagnostic values that I obtained from the loo package are not very good: between 10% and 15% of the values are greater than 0.7, depending on the model. Following “Vehtari, Gelman, & Gabry, 2016, Practical Bayesian model…”, I decided to use k-fold cross-validation, which is more robust than WAIC and PSIS-LOO.
Here is my question: 10-fold CV turned out to be unreliable. I ran it twice and obtained different results (in one run a model was better than the other, and in the other run it was worse). Is there anything else I can do to get more reliable model comparisons? Would k > 10 help? Can I place more trust in comparisons that use a k closer to N?
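To make clear what I am comparing, here is a minimal Python sketch of the computation, under some assumptions. The helper names `make_folds` and `elpd_diff` are hypothetical, and `lpd_a` and `lpd_b` stand for the pointwise held-out log predictive densities of two models. Fixing the seed gives both models the same fold assignment, so the difference can be computed observation by observation. I am not sure this alone removes the instability I describe.

```python
import math
import random

def make_folds(n, k, seed):
    # Assign each of n observations to one of k folds, reproducibly.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [0] * n
    for pos, i in enumerate(idx):
        folds[i] = pos % k
    return folds

def elpd_diff(lpd_a, lpd_b):
    # Paired difference in summed held-out log predictive density,
    # with a standard error of sqrt(n) * sd(pointwise differences).
    n = len(lpd_a)
    d = [a - b for a, b in zip(lpd_a, lpd_b)]
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return sum(d), math.sqrt(n * var)

# Same seed, same folds: both models would be evaluated on identical splits.
folds = make_folds(190, 10, seed=1)
```

If the difference is small relative to its standard error, I suppose a sign flip between two runs with different random splits is not surprising.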
Thank you for any suggestion you may have.
Luca