Some models I estimate are some problematic mixture models. Priors and constraints help identify these things, but sometimes a chain may get stuck in a region near a constraint boundary due to not “finding” the “better” posterior mode.
In principle, this shouldn’t be ignored, since multimodal posteriors pretty well state that there are indeed multiple solution sets.
However, more practically, on simulated data I see that chains with the greatest log-posterior recover the generative parameters very well.
In developing this new model, I’ve taken a pragmatic approach based on that observation to basically select the chain with the greatest lp__, which I’m sure is a statistical sin (even if it is rooted in performance tests).
I realized last night that if I didn’t want to ignore these other regions, I could include them, but weight them according to the lp__ in that chain (or per sample?).
I tested this, and it worked fairly well. E.g., I have two chains that “found” the solution set I had generated the data from, and their mean lp__ was about -640. The other two chains were stuck in a mode near a boundary meant to identify the model, and their lp__ were about -690. On a log scale, of course, that’s a fairly large difference. So I computed the mean for a certain parameter that was weighted by the post-prob at that sample.
samps <- as.data.frame(stanOut) # Extract samples sum(samps$parameter * exp(samps$lp__))/sum(exp(samps$lp__)) # Weighted mean
The computed means, of course, lined up fairly well with the generative parameters, since the lower-lp__ chains were much, much lower on the antilog scale.
From a performance-only point of view, this worked well.
But is this a catastrophically bad idea, and more sinful than selecting the highest lp__ chain when some multimodality exists?