# Nonsensical Mixing Proportions

Hello all! I’m fairly new to Bayesian modelling and have been having some trouble with a Gaussian Mixture model. Specifically, I have two questions.

1. Are mixtures of Gaussians an appropriate way to model this data?
2. If they are, why am I getting nonsensical estimates for my mixing proportions?

I have data from two experimental conditions. I’m trying to model both using two-component mixtures of Gaussians.

When I run two separate models that simply estimate the underlying means and mixing proportions for each distribution, I get reasonable enough results. For example:

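``````
library(brms)

# `mix` is not defined in the post; it is presumably a two-component
# Gaussian mixture family, e.g. mixture(gaussian, gaussian).

# Priors centered at -1 and 1 for the two component means
prior <- c(
  prior(normal(-1, 1), class = Intercept, dpar = mu1),
  prior(normal(1, 1), class = Intercept, dpar = mu2)
)

# Intercept-only mixture: estimates the component means and mixing proportions
fit_mix_simple_1 <- brm(bf(F2.50.class ~ 1), data = simple_data, family = mix,
                        prior = prior, chains = 2)
``````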

This tells me that for the data on the left, theta1 is roughly `0.25 (CI [0.12, 0.38])` and theta2 roughly `0.75 (CI [0.62, 0.88])`. This seems reasonable to me.
However, doing the same for the distribution on the right estimates the mixing proportions to be `0.55 (CI [0.01, 0.99])` and `0.45 (CI [0.01, 0.99])`. Relating to my first question, does this estimate make the use of a mixture model problematic? Or does it accurately reflect that the underlying distributions may be mixed so thoroughly that a high degree of uncertainty is expected?

As for my second question, when I try to run a model that looks at both distributions and predicts mixing proportions based on experimental condition, I get answers that aren’t proportions. So running something like

``````
fit_mix <- brm(bf(F2.50.class ~ 1, mu1 ~ condition, theta2 ~ total_cond),
               data, family = mix,
               prior = prior, chains = 2)
``````

tells me that the estimated intercept for theta2 is `240.47` and the estimated coefficient for experimental condition is `-455.54`. The Rhats for both estimates are very high, but this didn’t seem like the type of problem that running for more iterations would solve.
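A note on the scale of these numbers: when a mixing proportion such as `theta2` gets its own formula in brms, its coefficients are reported on a log-odds scale relative to the first component, so they are not expected to lie between 0 and 1. Below is a minimal base-R sketch of the back-transformation. It assumes a two-component mixture and assumes the condition predictor is coded 0/1, which the post does not state. With Rhats this high, the resulting values should not be read as trustworthy estimates.

``````r
# Reported estimates for theta2 (log-odds scale, relative to component 1)
intercept <- 240.47
slope     <- -455.54

# Inverse-logit back to a proportion.
# Assumption: the condition predictor is coded 0 (reference) / 1 (other level).
theta2_ref   <- plogis(intercept)          # condition == 0
theta2_other <- plogis(intercept + slope)  # condition == 1

print(c(reference = theta2_ref, other = theta2_other))
``````

Log-odds this far from zero map to proportions that are numerically indistinguishable from 1 and 0, which usually suggests the sampler is drifting along a flat or degenerate region of the posterior, not that the proportions are well estimated.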

Thank you all!

Shawn

• Operating System: Windows 10
• brms Version: 2.13.3

On distribution 2, the CI is so wide because the mixing proportion is not identified. The data could reasonably be fit by just one Gaussian with a large enough variance. The wide CI indicates that the proportions could be nearly 0, nearly 1, or anything in between; that is what non-identified means.
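The point about identification can be illustrated with a small base-R simulation. This is only a sketch on made-up data, not the poster's data: the sample size, means, standard deviations, and the helper `mix_loglik` are all assumptions chosen for illustration. It compares how strongly the log-likelihood depends on the mixing proportion when the components are well separated versus when a single Gaussian would fit about as well.

``````r
set.seed(1)
n <- 200

# Log-likelihood of a two-component Gaussian mixture as a function of theta1,
# holding the component means and SDs fixed (hypothetical helper).
mix_loglik <- function(theta1, y, mu1, mu2, sd1 = 1, sd2 = 1) {
  sum(log(theta1 * dnorm(y, mu1, sd1) + (1 - theta1) * dnorm(y, mu2, sd2)))
}

theta_grid <- seq(0.05, 0.95, by = 0.05)

# Case A: well-separated components, true theta1 = 0.25
z     <- rbinom(n, 1, 0.25)
y_sep <- ifelse(z == 1, rnorm(n, -2, 1), rnorm(n, 2, 1))
ll_sep <- sapply(theta_grid, mix_loglik, y = y_sep, mu1 = -2, mu2 = 2)

# Case B: data from a single Gaussian, components that almost coincide
y_one  <- rnorm(n, 0, 1)
ll_one <- sapply(theta_grid, mix_loglik, y = y_one, mu1 = -0.05, mu2 = 0.05)

# How much the log-likelihood changes across the whole grid of theta1 values
spread_sep <- diff(range(ll_sep))
spread_one <- diff(range(ll_one))
print(c(separated = spread_sep, overlapping = spread_one))
``````

In the separated case the log-likelihood changes a lot across the grid, so the data pin down the mixing proportion. In the overlapping case it is nearly flat, so almost any proportion is about equally plausible and the posterior interval ends up spanning most of [0, 1].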
