I am building a double-Gaussian mixture model. I know the variable (a 2-level factor) that explains the existence of the two separate distributions. How can I specify this information in the model?
In the example below, as in my real data, the data y are split between two Gaussian distributions (83% and 17% respectively), and each distribution aligns with a factor level in var. The real (non-simulated) data follow this pattern, so the simulation isn’t perfect but is hopefully sufficient for this question.
My model also contains 7 predictor variables (3 in the example below). In my real model, as I add the final predictors (6 and 7), Bulk_ESS decreases substantially, Rhat values rise, and the theta proportions move closer to 0.5/0.5. I think this might be solved by adding the information about which factor level contributes to each distribution.
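For reference, a minimal sketch of how these diagnostics could be collected in one table as predictors are added. It assumes the posterior package is installed; `diag_table` is a hypothetical helper name, and `fit` stands for any fitted brms model (brms needs to be loaded so that a brmsfit can be converted to draws).

```r
library(posterior)

# Hypothetical helper: collect Rhat and ESS for every parameter in a fit.
# `fit` can be a brmsfit (with brms loaded) or any posterior draws object.
diag_table <- function(fit) {
  draws <- as_draws_df(fit)
  summarise_draws(draws, "rhat", "ess_bulk", "ess_tail")
}
```

Calling `diag_table(test)` on the example model further down should return one row per parameter, which makes it easier to see which parameters degrade when predictors 6 and 7 are added.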
Is there a way to tell the model that mu1 is strictly associated with factor level 1 of var and mu2 with factor level 2 of the same variable?
Apologies if this is in the documentation and I missed it! Also, apologies if I’m on the completely wrong track. Many thanks in advance.
# Simulated data
library(dplyr)    # rename(), %>%
library(ggplot2)  # plotting
var1 <- rep(1, 830)
y1 <- rnorm(830, 13, 22)
data1 <- data.frame(y1, var1)
data1 <- rename(data1, y = y1, var = var1)
var2 <- rep(2, 170)
y2 <- rnorm(170, 43, 22)
data2 <- data.frame(y2, var2)
data2 <- rename(data2, y = y2, var = var2)
sim_data <- rbind(data1, data2)
sim_data$var <- as.factor(sim_data$var)
sim_data$expl1 <- rnorm(1000, 30, 3)
sim_data$expl2 <- rnorm(1000, 118, 21)
sim_data$expl3 <- rnorm(1000, 65, 19)
# Histogram of y, one panel per factor level
sim_data %>%
  ggplot(aes(x = y, fill = var)) +
  geom_histogram(binwidth = 1) +
  facet_grid(var ~ .)
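As a quick sanity check on the simulation settings, here is a sketch that rebuilds the same two groups in base R and summarises them. The object names `check`, `props`, and `group_stats` and the seed are placeholders, not part of my real analysis.

```r
# Rebuild the two simulated groups in base R (seed chosen arbitrarily)
set.seed(123)
check <- data.frame(
  y   = c(rnorm(830, 13, 22), rnorm(170, 43, 22)),
  var = factor(rep(c(1, 2), times = c(830, 170)))
)

# Share of observations in each factor level
props <- prop.table(table(check$var))
print(props)

# Empirical mean and SD of y within each level
group_stats <- aggregate(y ~ var, data = check,
                         FUN = function(v) c(mean = mean(v), sd = sd(v)))
print(group_stats)
```

With these settings the two group means are 30 apart while the SD is 22, so the two histograms overlap heavily.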
# Example model
library(brms)
mix <- mixture(gaussian, gaussian)
formula <- bf(y ~ 1 + expl1 + expl2 + expl3)
test <- brm(formula = formula,
            data = sim_data,
            family = mix,
            chains = 1)
summary(test)
pp_check(test)
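Two possible directions, given here as untested sketches rather than confirmed answers. brms lets the mixing proportions of a mixture be predicted (`theta2 ~ ...`), so var could enter there. Alternatively, since component membership is observed, var could go straight into a single Gaussian model for both the mean and sigma. The names `f_theta`, `f_known`, `fit_theta`, and `fit_known` are placeholders, and whether either option fixes the convergence problem is an assumption.

```r
library(brms)

mix <- mixture(gaussian, gaussian)

# Option A (assumption): let var predict the mixing proportion of component 2.
# If var separates the components perfectly, the theta2 coefficient may
# drift to extreme values, so a regularising prior would likely be needed.
f_theta <- bf(y ~ 1 + expl1 + expl2 + expl3,
              theta2 ~ var)

# Option B (assumption): since membership is observed, skip the mixture and
# let both the mean and the residual SD differ by var in a single Gaussian.
f_known <- bf(y ~ var + expl1 + expl2 + expl3,
              sigma ~ var)

# Hypothetical wrappers, so nothing is fitted until they are called
fit_theta <- function(data) brm(f_theta, data = data, family = mix, chains = 1)
fit_known <- function(data) brm(f_known, data = data, family = gaussian(), chains = 1)
```

Calling `fit_theta(sim_data)` or `fit_known(sim_data)` would then fit the corresponding version on the simulated data.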
- Operating System: Mac OS 10.15.8
- brms Version: 2.17.0