Leaving the answer here in case someone finds this while looking for the same thing. It appears that, when estimating mixing proportions in brms, the estimates are indeed on the logit scale, and priors should be specified accordingly.
I was still somewhat uncertain of this after looking at the Stan code produced by the make_stancode() function, so I ran some simple models on simulated data to convince myself. The code below demonstrates the point:
library(brms)
set.seed(717)
# Generate 100 observations each from two well-separated normal
# distributions; the true mixing proportion is 0.5.
N <- 100
x1 <- rnorm(N, -5, 1)
x2 <- rnorm(N, 5, 1)
x <- c(x1, x2)
d <- data.frame(x)
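Before fitting anything, the Stan code brms will generate can be inspected directly; this is a sketch using the same formula and family as the models below. As far as I can tell, the theta2 intercept shows up there as an unconstrained real, consistent with the logit scale.
# Print the Stan code brms generates for this model specification;
# look for how the theta2 intercept is declared (unconstrained, i.e. logit scale).
make_stancode(
  bf(x ~ 1, theta2 ~ 1),
  data = d,
  family = mixture(gaussian(), gaussian(), order = TRUE)
)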
I then fit a model with a normal(0, 1) prior on the mixing proportion, which is appropriate on the logit scale, and wide but non-exchangeable priors on the locations of the two components (assuming one is negative and one is positive). The model fits well, and the mixing proportion is estimated at zero, which is exactly where equal proportions should land on the log-odds scale.
b1 <- brm(
  bf(x ~ 1,
     theta2 ~ 1),
  data = d,
  family = mixture(gaussian(), gaussian(), order = TRUE),
  cores = 4,
  prior = c(
    prior(exponential(1), class = "sigma1"),
    prior(exponential(1), class = "sigma2"),
    prior(normal(-5, 2.5), class = "Intercept", dpar = "mu1"),
    prior(normal(5, 2.5), class = "Intercept", dpar = "mu2"),
    prior(normal(0, 1), class = "Intercept", dpar = "theta2")
  ),
  backend = "cmdstanr"
)
summary(b1)
 Family: mixture(gaussian, gaussian)
  Links: mu1 = identity; sigma1 = identity; mu2 = identity; sigma2 = identity;
         theta1 = identity; theta2 = identity
Formula: x ~ 1
         theta2 ~ 1
   Data: d (Number of observations: 200)
  Draws: 4 chains, each with iter = 1000; warmup = 0; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects:
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mu1_Intercept       -4.90      0.09    -5.08    -4.73 1.00     4126     2907
mu2_Intercept        5.01      0.11     4.79     5.22 1.00     5148     3758
theta2_Intercept    -0.00      0.14    -0.28     0.27 1.00     4333     2882

Family Specific Parameters:
       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma1     0.89      0.06     0.77     1.02 1.00     4694     3080
sigma2     1.09      0.08     0.96     1.26 1.00     4794     2633

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
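To read this on the probability scale, the logit-scale intercept can be back-transformed with plogis(); a quick sketch, assuming the usual brms draw name b_theta2_Intercept for this intercept:
# Inverse-logit the posterior draws of the theta2 intercept;
# plogis(0) = 0.5, i.e. equal mixing proportions.
draws <- as_draws_df(b1)
quantile(plogis(draws$b_theta2_Intercept), probs = c(0.025, 0.5, 0.975))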
I then fit the same model with a beta(4, 4) prior on the mixing proportion, which would be a reasonable weakly informative prior on the probability scale but is informative and strange on the logit scale, since it confines the intercept to (0, 1). The true mixing proportion is not recovered, and there are now a few divergent transitions.
b2 <- brm(
  bf(x ~ 1,
     theta2 ~ 1),
  data = d,
  family = mixture(gaussian(), gaussian(), order = TRUE),
  cores = 4,
  prior = c(
    prior(exponential(1), class = "sigma1"),
    prior(exponential(1), class = "sigma2"),
    prior(normal(-5, 2.5), class = "Intercept", dpar = "mu1"),
    prior(normal(5, 2.5), class = "Intercept", dpar = "mu2"),
    prior(beta(4, 4), class = "Intercept", dpar = "theta2")
  ),
  backend = "cmdstanr"
)
summary(b2)
 Family: mixture(gaussian, gaussian)
  Links: mu1 = identity; sigma1 = identity; mu2 = identity; sigma2 = identity;
         theta1 = identity; theta2 = identity
Formula: x ~ 1
         theta2 ~ 1
   Data: d (Number of observations: 200)
  Draws: 4 chains, each with iter = 1000; warmup = 0; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects:
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mu1_Intercept       -4.90      0.09    -5.08    -4.73 1.00     3992     3236
mu2_Intercept        5.01      0.11     4.79     5.23 1.00     5524     3304
theta2_Intercept     0.23      0.09     0.08     0.42 1.00     3499     2149

Family Specific Parameters:
       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma1     0.89      0.07     0.77     1.03 1.00     4296     2495
sigma2     1.09      0.08     0.96     1.26 1.00     4092     2686

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning message:
There were 14 divergent transitions after warmup. Increasing adapt_delta above may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
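The direction of the bias makes sense: a beta(4, 4) prior on the logit-scale intercept confines it to (0, 1), which corresponds to mixing proportions between 0.5 and roughly 0.73. A quick numerical check:
# Bounds of the proportion implied by restricting the logit to (0, 1)
plogis(c(0, 1))                       # 0.50 and ~0.73
# 95% prior mass of beta(4, 4), mapped through the inverse-logit
plogis(qbeta(c(0.025, 0.975), 4, 4))  # roughly 0.55 to 0.69
# The posterior mean above, back-transformed
plogis(0.23)                          # ~0.56 instead of the true 0.5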
I then repeated the exercise with a true mixing proportion of 0.25 for the second component and ran the same two models again. The weakly informative normal(0, 1) prior on the logit scale lets the model recover the true mixing proportion: the estimate of -1.08 on the logit scale is approximately 0.25 when back-transformed to a probability.
# Unbalanced data: the true mixing proportion of the second component is 0.25
set.seed(717)
x1 <- rnorm(150, -5, 1)
x2 <- rnorm(50, 5, 1)
x <- c(x1, x2)
d <- data.frame(x)
b3 <- brm(
  bf(x ~ 1,
     theta2 ~ 1),
  data = d,
  family = mixture(gaussian(), gaussian(), order = TRUE),
  cores = 4,
  prior = c(
    prior(exponential(1), class = "sigma1"),
    prior(exponential(1), class = "sigma2"),
    prior(normal(-5, 2.5), class = "Intercept", dpar = "mu1"),
    prior(normal(5, 2.5), class = "Intercept", dpar = "mu2"),
    prior(normal(0, 1), class = "Intercept", dpar = "theta2")
  ),
  backend = "cmdstanr"
)
summary(b3)
 Family: mixture(gaussian, gaussian)
  Links: mu1 = identity; sigma1 = identity; mu2 = identity; sigma2 = identity;
         theta1 = identity; theta2 = identity
Formula: x ~ 1
         theta2 ~ 1
   Data: d (Number of observations: 200)
  Draws: 4 chains, each with iter = 1000; warmup = 0; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects:
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mu1_Intercept       -5.00      0.08    -5.16    -4.84 1.00     4352     2901
mu2_Intercept        4.88      0.14     4.62     5.15 1.00     5091     3547
theta2_Intercept    -1.08      0.16    -1.40    -0.76 1.00     5389     2836

Family Specific Parameters:
       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma1     0.99      0.06     0.88     1.11 1.00     4558     2747
sigma2     0.96      0.10     0.79     1.18 1.00     4197     2180

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
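Back-transforming confirms the recovery:
plogis(-1.08)            # ~0.25: posterior mean on the probability scale
plogis(c(-1.40, -0.76))  # ~(0.20, 0.32): the 95% CI comfortably covers 0.25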
Here the beta(4, 4) prior is actively harmful: it confines the logit-scale intercept to (0, 1), i.e. a mixing proportion between 0.5 and about 0.73, nowhere near the true value of 0.25. The sampler cannot reconcile the prior with the data, and the model fails to converge.
b4 <- brm(
  bf(x ~ 1,
     theta2 ~ 1),
  data = d,
  family = mixture(gaussian(), gaussian(), order = TRUE),
  cores = 4,
  prior = c(
    prior(exponential(1), class = "sigma1"),
    prior(exponential(1), class = "sigma2"),
    prior(normal(-5, 2.5), class = "Intercept", dpar = "mu1"),
    prior(normal(5, 2.5), class = "Intercept", dpar = "mu2"),
    prior(beta(4, 4), class = "Intercept", dpar = "theta2")
  ),
  backend = "cmdstanr"
)
summary(b4)
 Family: mixture(gaussian, gaussian)
  Links: mu1 = identity; sigma1 = identity; mu2 = identity; sigma2 = identity;
         theta1 = identity; theta2 = identity
Formula: x ~ 1
         theta2 ~ 1
   Data: d (Number of observations: 200)
  Draws: 4 chains, each with iter = 1000; warmup = 0; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects:
                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mu1_Intercept       -5.05      0.16    -5.52    -4.84 1.06       74       NA
mu2_Intercept       -0.03      4.92    -5.08     5.13 1.73        6       NA
theta2_Intercept     0.30      0.25     0.02     0.75 1.73        6       NA

Family Specific Parameters:
       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma1     4.51      3.55     0.90     9.19 1.74        6       NA
sigma2     0.89      0.11     0.71     1.13 1.36        9       NA

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning messages:
1: Parts of the model have not converged (some Rhats are > 1.05). Be careful when analysing the results! We recommend running more iterations and/or setting stronger priors.
2: There were 61 divergent transitions after warmup. Increasing adapt_delta above may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
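So the practical takeaway: when modelling theta2 via a formula, hold prior beliefs on the probability scale but translate them to the logit scale with qlogis() before writing the prior. For example, to center the prior on a mixing proportion of 0.25 (a sketch, reusing the prior syntax from the models above):
# qlogis() is the logit transform: qlogis(0.25) is about -1.10
qlogis(0.25)
# A prior centered there, still reasonably wide on the logit scale
prior(normal(-1.1, 1), class = "Intercept", dpar = "theta2")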