Question about lp__ as weight?

Stephen_Martin · October 31, 2017, 12:12am

Some models I estimate are some problematic mixture models. Priors and constraints help identify these things, but sometimes a chain may get stuck in a region near a constraint boundary due to not “finding” the “better” posterior mode.

In principle, this shouldn’t be ignored, since multimodal posteriors pretty well state that there are indeed multiple solution sets.
However, more practically, on simulated data I see that chains with the greatest log-posterior recover the generative parameters very well.
In developing this new model, I’ve taken a pragmatic approach based on that observation to basically select the chain with the greatest lp__, which I’m sure is a statistical sin (even if it is rooted in performance tests).

I realized last night that if I didn’t want to ignore these other regions, I could include them, but weight them according to the lp__ in that chain (or per sample?).
I tested this, and it worked fairly well. E.g., I have two chains that “found” the solution set I had generated the data from, and their mean lp__ was about -640. The other two chains were stuck in a mode near a boundary meant to identify the model, and their lp__ were about -690. On a log scale, of course, that’s a fairly large difference. So I computed the mean for a certain parameter that was weighted by the post-prob at that sample.
Or

samps <- as.data.frame(stanOut) # Extract samples
sum(samps$parameter * exp(samps$lp__))/sum(exp(samps$lp__)) # Weighted mean

The computed means, of course, lined up fairly well with the generative parameters, since the lower-lp__ chains were much, much lower on the antilog scale.
From a performance-only point of view, this worked well.
But is this a catastrophically bad idea, and more sinful than selecting the highest lp__ chain when some multimodality exists?

avehtari · October 31, 2017, 9:57pm

You are right, it’s a wrong thing to do.

You should weight them according to the posterior mass in the mode, and not with mean lp, but the estimation of the mass ie the marginal posterior is difficult. A simple imporrtance sampling estimate would lead to the harmonic mean of the likelihood values 1/mean(1/exp(lp__))). This illustrates that it would be something else than mean lp__, Unfortunately this harmonic mean estimator is unstable for marginal likelihood. You could use bridge sampling http://www.sciencedirect.com/science/article/pii/S0022249617300640 or stacking [1704.02030] Using stacking to average Bayesian predictive distributions

It seems that although you wrote mean lp__ for the mode (region) here you are weighting each draw with the density, but this is wrong as the MCMC sample already has more draws where the density is high, and you are then in a way double weighting your expectation.

I mentioned bridge sampling http://www.sciencedirect.com/science/article/pii/S0022249617300640 and stacking [1704.02030] Using stacking to average Bayesian predictive distributions, but note that these will assume that you have found all relevant modes and the modes are not overlapping too much, which might a big problem with your mixture models.

Stephen_Martin · October 31, 2017, 10:58pm

I wasn’t sure whether the mean lp__ or per-sample lp__ would be more appropriate (the latter which I had concerns about that you mentioned - It’s doubly weighting). Turns out - Neither are :p. Thank you for your response.

I’ve played with the bridge sampling package, so I may try applying that to the problem at hand.

Practically speaking, based on my sims at least, the differences in the model posterior probs are so vastly different on the antilog scale, that I’m expecting the method to basically suggest what the wrong methods above already do, even if it does so in a more principled, correct manner. Thanks again for your response, i’ll consider my options.

Topic		Replies	Views
Chain covergence of complex nonlinear models highly dependent on initial values Modeling fitting-issues , specification , performance	13	615	January 29, 2021
Mixtures for model selection Modeling	6	1194	April 1, 2018
Two component mixture model, strange result from stan Modeling	2	507	May 7, 2019
Trying to understand lp__ as it is not producing the results that I expect when modelling count data using poisson distribution Modeling	12	1204	September 6, 2020
Model comparison - lp__ Modeling	14	4810	April 23, 2021

Question about lp__ as weight?

Related topics