I’m fitting a mixture model in Stan to strictly positive data. Each component distribution has a mean that can be expressed as some sum of a and b. (a and b are two “factor” parameters shared across groups.) The mixing proportions are denoted as weights, w.
I have included the density plots for a, b, and w. They seem oddly shaped to me, and I was wondering if I should be worried.
My concerns: The density plot for b exhibits several bumps. Intuitively, since b is a factor of a distribution’s mean, I would have expected it to follow a relatively bell-shaped curve, maybe with a right skew due to the restriction to positive values. The weights w[2] and w[3] have wide intervals. Is this an indicator that these parameters are not identifiable, or only weakly identifiable, from the data?
What have I done so far?
I enforced an ordering constraint on the component means to prevent label‐switching.
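For reference, the constraint looks roughly like this. This is a stripped-down, generic sketch with a normal likelihood and placeholder names, not my actual model (which builds the component means from a and b):

data {
  int<lower=1> N;
  int<lower=2> K;
  vector[N] y;
}
parameters {
  ordered[K] mu;        // increasing means rule out label permutations within a chain
  real<lower=0> sigma;
  simplex[K] w;         // mixing proportions
}
model {
  for (n in 1:N) {
    vector[K] lp;
    for (k in 1:K)
      lp[k] = log(w[k]) + normal_lpdf(y[n] | mu[k], sigma);
    target += log_sum_exp(lp);
  }
}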
I simulated data sets from the prior, and the parameters are recoverable by the model (the 95% HPD intervals and credible intervals both contain the simulated parameter values). This leads me to believe structural non-identifiability is not the problem.
Convergence metrics (all R̂ ≈ 1.00) and trace plots indicate good mixing of the chains. Autocorrelation plots don’t show any major concerns after thinning, and effective sample sizes seem sufficiently large.
Posterior predictive checks don’t show any cause for concern: the max, min, mean, median, mode, and sd of simulated datasets are reasonably close to those of the observed data. This leads me to believe that the model reasonably reflects the data-generating process.
Nevertheless, I’m uneasy: are these jagged and wide density plots a red flag or a common occurrence in mixture models? Do I have reason to be concerned or am I searching for problems where there aren’t any? Any recommendations for diagnostics or best practices would be appreciated. Thank you!
Are you using positively constrained component distributions that are reasonable for the data?
How many draws did you have, and what was the effective sample size? This just looks like what you’d get from not much data: you’ll see this kind of jaggedness whenever you estimate densities from small samples. Otherwise, bumps usually indicate a form of multimodality.
The posterior predictive check is really the gold standard for whether a model can fit the data. If those checks look good and you’re measuring the statistics you actually care about, then the model should be fine.
Thank you so much for the response & apologies for all the questions included in my follow-up.
I am using negative binomials for the component distributions (the data are discrete counts). I am using a truncated normal distribution as the prior for a + b; we have prior information from previous studies on a + b, but not on the individual values. I then used a beta distribution as the prior on the proportion of a + b accounted for by a.
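In rough outline, the prior structure looks like this. This is a simplified sketch, not my actual code: the hyperparameter values, the names total and prop, and the exact way the component means are built from a and b are all placeholders:

data {
  int<lower=1> N;
  int<lower=2> K;
  array[N] int<lower=0> y;
}
parameters {
  real<lower=0> total;           // a + b; the <lower=0> bound truncates the normal prior
  real<lower=0, upper=1> prop;   // share of a + b accounted for by a
  real<lower=0> phi;             // negative binomial dispersion
  simplex[K] w;
}
transformed parameters {
  real a = prop * total;
  real b = (1 - prop) * total;
  vector[K] mu;
  for (k in 1:K)
    mu[k] = a + (k - 1) * b;     // placeholder: some sum of a and b per component
}
model {
  total ~ normal(5, 2);          // placeholder hyperparameters
  prop ~ beta(2, 2);             // placeholder hyperparameters
  phi ~ exponential(1);          // placeholder
  for (n in 1:N) {
    vector[K] lp;
    for (k in 1:K)
      lp[k] = log(w[k]) + neg_binomial_2_lpmf(y[n] | mu[k], phi);
    target += log_sum_exp(lp);
  }
}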
I have 4 chains of 10000 iterations with thinning = 5. I used the default, so the first half of the samples are dedicated to warmup. This leaves me with 4000 posterior draws. Could that be too few draws? It seems like a reasonable number to me, but I am new to Stan so maybe I am overlooking something obvious concerning the number of draws.
The effective sample sizes are shown below:
w[1] 3880.632
w[2] 3758.774
w[3] 3693.810
w[4] 3904.761
w[5] 3850.515
w[6] 4173.410
w[7] 3705.880
w[8] 3956.897
w[9] 4016.341
w[10] 3834.332
w[11] 3975.094
b 3506.461
a 3950.396
To me, the effective sample sizes didn’t raise any red flags as they are all near the 4000 mark. Am I missing something here?
Yes, I am concerned about multimodality. It seems like multimodality would be a sign of label switching, but again, I do not think that is the problem here as I have enforced an order on the means of the component distributions. I did not think it was necessary to enforce order on both weights and means to prevent label switching. Structural non-identifiability or partial-identifiability due to the data set used would be the other culprits for multimodality, right?
I have included the pairs plot for a and b in case that is relevant.
Thanks for the reassurance regarding the posterior predictive check! All of my posterior predictive p-values sit in the 0.4–0.6 range, and histograms of predicted values display the same patterns as the observed data.
So a and b are continuous values? If so, then this all seems reasonable, assuming you want the prior to penalize large values with exponential tails and to be consistent with zero. A lognormal, by contrast, approaches zero density as the value approaches zero, whereas a half normal approaches a fixed positive value at zero. If the beta distribution’s parameters are themselves being estimated, a mean and total count parameterization usually works better and is easier to set priors on.
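For example, Stan has this parameterization built in as beta_proportion. A minimal sketch, where the hyperpriors are placeholders:

parameters {
  real<lower=0, upper=1> mu;    // mean of the proportion
  real<lower=0> kappa;          // concentration, i.e., the "total count"
  real<lower=0, upper=1> prop;  // the proportion being modeled
}
model {
  mu ~ beta(2, 2);              // placeholder hyperpriors
  kappa ~ exponential(0.1);     // placeholder
  prop ~ beta_proportion(mu, kappa);
  // equivalent to: prop ~ beta(mu * kappa, (1 - mu) * kappa);
}

The advantage is that mu and kappa are directly interpretable, so it’s much easier to put priors on them than on the raw shape parameters alpha and beta.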
Negative binomials can be hard to fit, but if that’s not a problem, they should be better than a Poisson.
Those ESS values look good; good enough that you can probably get away with fewer iterations and no thinning. But if you want to do kernel density plotting, you may need more iterations to get everything to look super smooth. I’m attaching a plot generated using default kernel density estimation in R for a standard normal:
> x <- rnorm(4000)   # same number of draws as your posterior sample
> plot(density(x))   # kernel density estimate with R's default bandwidth
You can see that its default bandwidth leaves the plot quite jagged. If you make the bandwidth smaller (e.g., density(x, bw = 0.1)), it gets even more jagged, and if you make it wider, it gets smoother.