After some further experimentation:
- The priors on the linear model need to be flatter than those on the quadratic model. Small changes to the quadratic DGP can lead to large changes in the best frequentist linear model fit, especially as the true DGP becomes “more quadratic.”
- Using two separate sigma terms (one per mixture component) pretty consistently produces worse sampling: lower ESS, worse R-hat, degenerate posteriors, etc. ¯\_(ツ)_/¯
- The sampler seed can make a huge difference in the quality of the samples.
- The sampling problems get worse, not better, as the true DGP becomes “more quadratic.” That seems to support my impression that identifiability isn’t the issue.
Since there were questions about why I’d use mixtures for this, let me reframe the issue. I’m interested in doing Bayesian model selection between these two regression models. In terms of a mixture model, the mixture parameter gives the probability distribution over the two regressions. Apparently I’m not the first person to think this way. (That earlier thread also reports sampling problems, and also didn’t seem to get any resolution.) This approach also seems to be in line with Andrew Gelman’s standard advice, namely, to fit one big multilevel model that contains all your various hypotheses or comparisons.
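For concreteness, here's a minimal sketch of the kind of model I have in mind (variable names and prior scales are illustrative, not my actual model). The mixture weight `theta` is the posterior probability of the quadratic model, and the discrete model indicator is marginalized out with `log_mix`, as Stan requires:

```stan
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real<lower=0, upper=1> theta;  // P(quadratic model)
  real a0;                       // linear model: intercept, slope
  real a1;
  real b0;                       // quadratic model: intercept, slope, curvature
  real b1;
  real b2;
  real<lower=0> sigma;           // shared scale (two sigmas sampled worse for me)
}
model {
  // flatter priors on the linear model, per the observation above
  a0 ~ normal(0, 10);
  a1 ~ normal(0, 10);
  b0 ~ normal(0, 1);
  b1 ~ normal(0, 1);
  b2 ~ normal(0, 1);
  sigma ~ normal(0, 1);
  theta ~ beta(1, 1);
  // marginalize the model indicator over the *whole* dataset:
  // each vectorized normal_lpdf returns the summed log likelihood,
  // so one indicator selects a regression, not a per-point label
  target += log_mix(theta,
                    normal_lpdf(y | b0 + b1 * x + b2 * square(x), sigma),
                    normal_lpdf(y | a0 + a1 * x, sigma));
}
```

Note the design choice: `log_mix` wraps the summed log likelihoods, so this is a dataset-level mixture (one latent model choice), not a per-observation mixture where each point could come from a different component.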
So is there some other multilevel approach to Bayesian model selection in Stan that I’ve just missed?