I am building a double-Gaussian mixture model. I know the variable (a 2-level factor) that explains the existence of the two separate distributions. How can I specify this information in the model?
In the example below, as in my real data, the data `y` are split between two Gaussian distributions (83% and 17% respectively), and each distribution aligns with a factor level in `var`. My real (non-simulated) data follow this pattern, so the simulation isn't perfect but is hopefully sufficient for this question.
My model also contains 7 predictor variables (3 in the example below). In my real model, as I add the final predictors (6 and 7), Bulk_ESS decreases substantially, the Rhat values rise, and the theta proportions move away from the expected split, closer to 0.5/0.5. I think this might be solved by adding the information about which factor level contributes to each distribution.
Is there a way to tell the model that `mu1` is strictly associated with factor level 1 of `var`, and `mu2` with factor level 2 of the same variable?
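To make the question concrete, here is a minimal base R sketch of the simplest non-mixture analogue, where the factor enters as an ordinary predictor and shifts the mean directly. It is only a point of comparison, not a claim that this is the right model: the seed is an added assumption, the sample sizes and parameters mirror the simulation in this post, and `lm()` is a stand-in for the brms model.

```r
# Base R sketch: treat var as an ordinary predictor rather than as a
# latent mixture component. lm() is only a stand-in for the brms model.
set.seed(1)  # assumption: seed added so the sketch is reproducible

n1 <- 830  # observations at factor level 1
n2 <- 170  # observations at factor level 2
sim <- data.frame(
  y   = c(rnorm(n1, mean = 13, sd = 22), rnorm(n2, mean = 43, sd = 22)),
  var = factor(rep(c(1, 2), times = c(n1, n2)))
)

# The intercept estimates the level-1 mean; the var2 coefficient estimates
# the shift from level 1 to level 2 (30 in this simulation).
fit_lm <- lm(y ~ var, data = sim)
coef(fit_lm)

# Share of observations at each level (0.83 and 0.17 by construction).
prop.table(table(sim$var))
```

If the component membership is fully known from `var`, a model of this shape already uses that information; whether a mixture is still needed on top of it is part of what I am unsure about.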
Apologies if this is in the documentation and I missed it! Also, apologies if I’m on the completely wrong track. Many thanks in advance.
```r
library(brms)
library(ggplot2)

# Simulated data: 830 observations at level 1, 170 at level 2
data1 <- data.frame(y = rnorm(830, 13, 22), var = 1)
data2 <- data.frame(y = rnorm(170, 43, 22), var = 2)
sim_data <- rbind(data1, data2)
sim_data$var <- as.factor(sim_data$var)
sim_data$expl1 <- rnorm(1000, 30, 3)
sim_data$expl2 <- rnorm(1000, 118, 21)
sim_data$expl3 <- rnorm(1000, 65, 19)

# Histogram of y, one panel per level of var
ggplot(sim_data, aes(x = y, fill = var)) +
  geom_histogram(binwidth = 1) +
  facet_grid(var ~ .)

# Example model: two-component Gaussian mixture
mix <- mixture(gaussian, gaussian)
formula <- bf(y ~ 1 + expl1 + expl2 + expl3)
test <- brm(formula = formula, data = sim_data, family = mix, chains = 1)
summary(test)
pp_check(test)
```
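One specification that might express the link between `var` and the components is to let the mixing proportion depend on the factor. The sketch below assumes that brms accepts a formula for `theta2` inside `bf()` for mixture families; `formula_theta` and `fit_theta_by_var` are hypothetical names introduced here for illustration. It makes the probability of belonging to the second component a function of `var`, which is not necessarily the same as strictly tying `mu1` and `mu2` to the two levels.

```r
library(brms)

# Two-component Gaussian mixture, as in the example model.
mix <- mixture(gaussian, gaussian)

# Assumption: theta2 ~ var lets the mixing proportion of the second
# component depend on the factor level of var.
formula_theta <- bf(
  y ~ 1 + expl1 + expl2 + expl3,
  theta2 ~ var
)

# Hypothetical helper: fits the mixture with var predicting theta2.
# Defined as a function so that sampling only runs when it is called.
fit_theta_by_var <- function(data) {
  brm(formula = formula_theta, data = data, family = mix, chains = 1)
}
```

Calling `fit_theta_by_var(sim_data)` would fit the model on the simulated data. Because `var` separates the two groups perfectly in this simulation, the coefficient on `var` may be weakly identified unless it is given an informative prior.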
- Operating System: Mac OS 10.15.8
- brms Version: 2.17.0