Regarding your second point, how many posterior samples have you obtained for each model? To get stable Bayes factor estimates, you need many more posterior samples than you would typically need for parameter estimation. See for example this thread about calculating Bayes factors for brms models: increasing the number of samples from something like `iter = 2000, warmup = 1000, chains = 4` (4,000 post-warmup draws in total) to `iter = 10000, warmup = 1000, chains = 4` (36,000 post-warmup draws in total) apparently yielded more stable results.
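In case it helps, here is a minimal sketch of what that could look like in brms. The toy data, the formulas, and the `normal(0, 1)` prior are placeholders I made up for illustration, not taken from your models. The one part that matters is `save_pars = save_pars(all = TRUE)`, which bridge sampling needs, together with the larger `iter`.

```r
library(brms)

# Hypothetical toy data; replace with your own data frame.
set.seed(1)
dat <- data.frame(x = rnorm(100))
dat$y <- 0.3 * dat$x + rnorm(100)

# Null model. save_pars(all = TRUE) stores the draws that
# bridge sampling needs to estimate the marginal likelihood.
fit_null <- brm(
  y ~ 1, data = dat,
  iter = 10000, warmup = 1000, chains = 4,
  save_pars = save_pars(all = TRUE)
)

# Alternative model. The prior on the slope is an assumption for
# this sketch; Bayes factors need proper priors on the parameters
# that differ between the models.
fit_alt <- brm(
  y ~ x, data = dat,
  prior = prior(normal(0, 1), class = "b"),
  iter = 10000, warmup = 1000, chains = 4,
  save_pars = save_pars(all = TRUE)
)

# Bridge sampling is stochastic, so compute the Bayes factor twice
# and compare the two estimates as a rough stability check.
bf1 <- bayes_factor(fit_alt, fit_null)
bf2 <- bayes_factor(fit_alt, fit_null)
print(c(bf1$bf, bf2$bf))
```

If the two estimates differ noticeably, that is usually a sign that you need more posterior draws.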