Interpreting recovered mixing parameters for mixture of > 2 distributions

I was banging my head against this problem for a while, and finally came up with the solution. Since it frustrated me for a while, I thought I’d share my thought process and the outcome lest anyone else find themselves in the same situation.

I was trying to fit a model that makes use of brms’ mixture function, but having trouble interpreting the estimates I obtained for the resulting mixing parameters. A simple example:

simData <- data.frame(y = c(rnorm(2000), rnorm(1000, 6), rnorm(1000, 12)))
test <- mixture(gaussian, nmix=3, order="none")
testModel <- bf(y ~ 1,
               theta1 ~ 1,
               theta2 ~ 1,
               family = test)

testPrior <- c(set_prior("constant(0)", class="Intercept", dpar="mu1"),
               set_prior("constant(6)", class="Intercept", dpar="mu2"),
               set_prior("constant(12)", class="Intercept", dpar="mu3"))

testFit <- brm(testModel, data=simData, prior=testPrior)

This gave me the resulting output:

 Family: mixture(gaussian, gaussian, gaussian) 
  Links: mu1 = identity; sigma1 = identity; mu2 = identity; sigma2 = identity; mu3 = identity; sigma3 = identity; theta1 = identity; theta2 = identity; theta3 = identity 
Formula: y ~ 1 
         theta1 ~ 1
         theta2 ~ 1
   Data: simData (Number of observations: 4000) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mu1_Intercept        0.00      0.00     0.00     0.00   NA       NA       NA
mu2_Intercept        6.00      0.00     6.00     6.00   NA       NA       NA
mu3_Intercept       12.00      0.00    12.00    12.00   NA       NA       NA
theta1_Intercept     0.69      0.04     0.62     0.77 1.00     3418     3179
theta2_Intercept     0.00      0.04    -0.08     0.09 1.00     3851     3200

Family Specific Parameters: 
       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma1     0.99      0.02     0.96     1.03 1.00     4999     3244
sigma2     0.99      0.02     0.94     1.03 1.00     5144     3087
sigma3     0.98      0.02     0.94     1.03 1.00     3942     2585

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

From the thread here, I understood that estimating the mixing parameters involves a logit transformation, but the sum of the inverse transformation of the estimates of theta1_intercept and theta2_intercept was > 1, leaving me confused and unsure what I was missing/misunderstanding.

The Stan code had the following lines which I had an inkling may be relevant:

    for (n in 1:N) {
      // scale theta to become a probability vector
      log_sum_exp_theta = log(exp(theta1[n]) + exp(theta2[n]) + exp(theta3[n]));
      theta1[n] = theta1[n] - log_sum_exp_theta;
      theta2[n] = theta2[n] - log_sum_exp_theta;
      theta3[n] = theta3[n] - log_sum_exp_theta;
    }

Preceding code also suggested that theta3 is automatically set to 0, leaving me to assume there was some sort of normalisation occurring. Calculating log_sum_exp_theta from the estimates gave me

> log(exp(0.69)+exp(0)+exp(0))
[1] 1.384722

This in turn gave

> 0.69-1.384722
[1] -0.694722
> 0-1.384722
[1] -1.384722

as the theta1 and theta2 (and theta3, since that was set to 0 in the Stan code) values with log_sum_exp_theta subtracted. I then eventually figured out that these were log-transformed values, so I could use exp to get interpretable estimates:

> exp(-0.694722)
[1] 0.4992132
> exp(-1.384722)
[1] 0.2503934
> exp(-1.384722)
[1] 0.2503934

So, it seems obvious at the end (I mean, the estimates are log odds, right? Only an idiot wouldn’t understand that at first pass…), but it took me a while to get there. Hopefully this helps someone who is as lost as I was avoid the same confusion in future.