I was banging my head against this problem for a while, and finally came up with the solution. Since it frustrated me for a while, I thought I’d share my thought process and the outcome lest anyone else find themselves in the same situation.
I was trying to fit a model that makes use of brms’ mixture
function, but having trouble interpreting the estimates I obtained for the resulting mixing parameters. A simple example:
simData <- data.frame(y = c(rnorm(2000), rnorm(1000, 6), rnorm(1000, 12)))
test <- mixture(gaussian, nmix=3, order="none")
testModel <- bf(y ~ 1,
theta1 ~ 1,
theta2 ~ 1,
family = test)
testPrior <- c(set_prior("constant(0)", class="Intercept", dpar="mu1"),
set_prior("constant(6)", class="Intercept", dpar="mu2"),
set_prior("constant(12)", class="Intercept", dpar="mu3"))
testFit <- brm(testModel, data=simData, prior=testPrior)
This gave me the resulting output:
Family: mixture(gaussian, gaussian, gaussian)
Links: mu1 = identity; sigma1 = identity; mu2 = identity; sigma2 = identity; mu3 = identity; sigma3 = identity; theta1 = identity; theta2 = identity; theta3 = identity
Formula: y ~ 1
theta1 ~ 1
theta2 ~ 1
Data: simData (Number of observations: 4000)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mu1_Intercept 0.00 0.00 0.00 0.00 NA NA NA
mu2_Intercept 6.00 0.00 6.00 6.00 NA NA NA
mu3_Intercept 12.00 0.00 12.00 12.00 NA NA NA
theta1_Intercept 0.69 0.04 0.62 0.77 1.00 3418 3179
theta2_Intercept 0.00 0.04 -0.08 0.09 1.00 3851 3200
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma1 0.99 0.02 0.96 1.03 1.00 4999 3244
sigma2 0.99 0.02 0.94 1.03 1.00 5144 3087
sigma3 0.98 0.02 0.94 1.03 1.00 3942 2585
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
From the thread here, I understood that estimating the mixing parameters involves a logit transformation, but the sum of the inverse transformation of the estimates of theta1_intercept and theta2_intercept was > 1, leaving me confused and unsure what I was missing/misunderstanding.
The Stan code had the following lines which I had an inkling may be relevant:
for (n in 1:N) {
// scale theta to become a probability vector
log_sum_exp_theta = log(exp(theta1[n]) + exp(theta2[n]) + exp(theta3[n]));
theta1[n] = theta1[n] - log_sum_exp_theta;
theta2[n] = theta2[n] - log_sum_exp_theta;
theta3[n] = theta3[n] - log_sum_exp_theta;
}
Preceding code also suggested that theta3 is automatically set to 0, leaving me to assume there was some sort of normalisation occurring. Calculating log_sum_exp_theta
from the estimates gave me
> log(exp(0.69)+exp(0)+exp(0))
[1] 1.384722
This in turn gave
> 0.69-1.384722
[1] -0.694722
> 0-1.384722
[1] -1.384722
as the theta1
and theta2
(and theta3
, since that was set to 0 in the Stan code) values with log_sum_exp_theta
subtracted. I then eventually figured out that these were log-transformed values, so I could use exp
to get interpretable estimates:
> exp(-0.694722)
[1] 0.4992132
> exp(-1.384722)
[1] 0.2503934
> exp(-1.384722)
[1] 0.2503934
So, it seems obvious at the end (I mean, the estimates are log odds, right? Only an idiot wouldn’t understand that at first pass…), but it took me a while to get there. Hopefully this helps someone who is as lost as I was avoid the same confusion in future.