Nonsensical Mixing Proportions

ShawnFoster · May 17, 2023, 8:41pm

Short summary of the problem
Hello all! I’m fairly new to Bayesian modelling and have been having some trouble with a Gaussian Mixture model. Specifically, I have two questions.

Are mixtures of Gaussians an appropriate way to model this data?
If they are, why am I getting nonsensical estimates for my mixing proportions?

I have data from two experimental conditions. I’m trying to model both using two-component mixtures of Gaussians.

When I run two separate models that simply estimate the underlying means and mixing proportions for each distribution, I get reasonable enough results. I.e.:



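library(brms)

# Assumption: `mix` is not defined in the post; a two-component Gaussian mixture is presumed.
mix <- mixture(gaussian, gaussian)

prior <- c(
  prior(normal(-1, 1), class = Intercept, dpar = mu1),
  prior(normal(1, 1), class = Intercept, dpar = mu2)
)

fit_mix_simple_1 <- brm(bf(F2.50.class ~ 1), data = simple_data, family = mix,
                        prior = prior, chains = 2)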

This tells me that for the data on the left, theta1 is roughly 0.25 (CI [0.12, 0.38]) and theta2 is roughly 0.75 (CI [0.62, 0.88]). This seems reasonable to me.
However, doing the same for the distribution on the right estimates the mixing proportions to be 0.55 (CI [0.01, 0.99]) and 0.45 (CI [0.01, 0.99]). Relating to my first question, does this estimate make the use of a mixture model problematic? Or does it accurately reflect that the underlying distributions may be mixed so thoroughly that a high degree of uncertainty is expected?

As for my second question, when I try to run a model that looks at both distributions and predicts mixing proportions based on experimental condition, I get answers that aren’t proportions. So running something like

fit_mix <- brm(bf(F2.50.class ~ 1, mu1 ~ condition, theta2 ~ total_cond), data = data, family = mix,
               prior = prior, chains = 2)

tells me that the estimated intercept for theta2 is 240.47 and the estimated coefficient for experimental condition is -455.54. The Rhat values for both estimates are very high, but this didn’t seem like the type of problem that running more iterations would solve.
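A possible reading of those numbers, sketched in base R: as far as I understand the brms documentation, once theta is given its own formula it is modelled on a log-odds (multi-logit) scale with the first component as the reference, so the coefficients are not proportions until they are back-transformed. The snippet below assumes that link and assumes the condition predictor is a 0/1 dummy; the variable names are made up for illustration.

```r
# Assumption: with theta1 as the reference component, the linear predictor
# for theta2 is the log-odds of belonging to component 2.
b_intercept <- 240.47   # theta2 intercept reported in the post
b_condition <- -455.54  # theta2 coefficient for condition reported in the post

# Assumption: the condition predictor is coded as a 0/1 dummy variable.
eta <- c(baseline = b_intercept, other = b_intercept + b_condition)

# The inverse logit maps log-odds back to a proportion in (0, 1).
theta2 <- plogis(eta)
theta1 <- 1 - theta2

print(rbind(theta1, theta2))
```

Under those assumptions the implied theta2 is essentially 1 in one condition and essentially 0 in the other. Log-odds that extreme, combined with very high Rhat values, look more like chains drifting off toward infinity than like a meaningful estimate.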

Thank you all!

Shawn

Operating System: Windows 10
brms Version: 2.13.3

spinkney · May 17, 2023, 8:56pm

For distribution 2, the CI is so wide because the mixture is not identified. The data could reasonably be described by a single Gaussian with a large enough variance. The wide CI indicates that the proportions could be nearly 0, nearly 1, or anything in between; that is what non-identified means.
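A rough base R illustration of that point, using simulated data rather than the data from this thread: when the two components coincide, the mixture likelihood does not depend on the mixing proportion at all, so any value of theta fits equally well. The helper `mix_loglik` and all parameter values are hypothetical.

```r
set.seed(1)
# Hypothetical data drawn from a single, fairly wide Gaussian.
y <- rnorm(200, mean = 0, sd = 1.5)

# Log-likelihood of a two-component Gaussian mixture for a given theta,
# where theta is the weight of component 1.
mix_loglik <- function(theta, y, mu1, mu2, sigma1, sigma2) {
  sum(log(theta * dnorm(y, mu1, sigma1) + (1 - theta) * dnorm(y, mu2, sigma2)))
}

thetas <- c(0.01, 0.25, 0.5, 0.75, 0.99)

# Identical components: the likelihood is flat in theta (not identified).
ll_same <- sapply(thetas, mix_loglik, y = y,
                  mu1 = 0, mu2 = 0, sigma1 = 1.5, sigma2 = 1.5)

# Well-separated components: the likelihood changes with theta.
ll_apart <- sapply(thetas, mix_loglik, y = y,
                   mu1 = -3, mu2 = 3, sigma1 = 1, sigma2 = 1)

print(data.frame(theta = thetas, ll_same = ll_same, ll_apart = ll_apart))
```

With a flat likelihood the posterior for theta is driven almost entirely by its prior, which would be consistent with an interval as wide as [0.01, 0.99].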

ShawnFoster · May 18, 2023, 8:17pm

Thank you for the advice!
