Background
I am trying to model data from a psychological experiment in which participants use a slider on a computer screen to give a response. There are actually two very similar experiments that both have the same problem. The response variable y can hence vary between 0 and 1. Unfortunately, I think I made a mistake while designing the experiment: the starting position of the slider is always 0.5, and participants seem to always move the slider at least a bit, never leaving it at the middle even when they “should”. This can be seen in the raw distribution of y:
ggplot(Experiment1, aes(x = est_dist)) +
  geom_density() +
  geom_vline(xintercept = 0.36) +
  geom_vline(xintercept = 0.65) +
  theme_classic() +
  labs(x = "Raw distribution of y", y = "Density")
As we can see, the raw distribution of y has two modes, approximately at y = 0.36 and y = 0.65. My interpretation is that participants fairly randomly move the slider to the left or the right even when they want to give a centre response. My original aim with this analysis was simply to build a hierarchical regression model predicting y from the two predictor variables x_1 and x_2, because I would like to infer whether there is evidence that the predictors are different from zero. However, pp_check() of a model with only a single Gaussian response distribution still shows two modes, so I feel I have to do something different.
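For intuition, here is a minimal simulation of the kind of bimodal response pattern I am describing. The cluster locations and spreads are made up for illustration; they are not my real data:

```r
set.seed(1)
n <- 1000
# Each response randomly lands in a left or right cluster, mimicking
# participants nudging the slider off the 0.5 starting position
side  <- rbinom(n, 1, 0.5)
y_sim <- ifelse(side == 1,
                rnorm(n, mean = 0.65, sd = 0.08),
                rnorm(n, mean = 0.36, sd = 0.08))
y_sim <- pmin(pmax(y_sim, 0), 1)  # clamp to the slider range [0, 1]
```

A density plot of y_sim shows two modes much like the figure above.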
I played around with my data to see whether there is anything that would determine a left or right shift, but I could not find anything, so I turned to mixture models, which I understand might be useful in this context. Here is how I currently model the data:
First, I scaled y, x_1 and x_2, so don’t wonder if the numbers are different from the raw distribution. The model I fit, with its priors, is:
# Create Gaussian mixture model
mix <- mixture(gaussian, gaussian)
# Priors
prior <- c(prior(normal(-0.7, 1), class = "Intercept", dpar = mu1),
           prior(normal(0, 1), class = "b", dpar = mu1),
           prior(normal(0, 0.5), class = "b", coef = "x2", dpar = mu1),
           prior(normal(0.57, 1), class = "Intercept", dpar = mu2),
           prior(normal(0, 1), class = "b", dpar = mu2),
           prior(normal(0, 0.5), class = "b", coef = "x2", dpar = mu2))
# Fit model
m1 <- brm(y ~ x1 * x2 + (x1 * x2 | ppid),
          data = Experiment2,
          family = mix,
          prior = prior,
          iter = iter,
          warmup = warmup,
          chains = chains,
          cores = cores,
          init = 0,
          seed = 1412)
The summary is:
Family: mixture(gaussian, gaussian)
Links: mu1 = identity; sigma1 = identity; mu2 = identity; sigma2 = identity; theta1 = identity; theta2 = identity
Formula: y ~ x1 * x2 + (x1 * x2 | ppid)
Data: Experiment2 (Number of observations: 3240)
Draws: 10 chains, each with iter = 34000; warmup = 7000; thin = 1;
total post-warmup draws = 270000
Group-Level Effects:
~ppid (Number of levels: 27)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(mu1_Intercept) 0.33 0.05 0.24 0.45 1.00 79474 136602
sd(mu1_x1) 0.55 0.09 0.40 0.76 1.00 88560 144127
sd(mu1_x2) 0.24 0.08 0.11 0.41 1.00 59470 109863
sd(mu1_x1:x2) 0.31 0.15 0.04 0.61 1.00 43261 67606
sd(mu2_Intercept) 0.40 0.07 0.29 0.54 1.00 71623 125214
sd(mu2_x1) 0.50 0.09 0.35 0.70 1.00 91114 152636
sd(mu2_x2) 0.22 0.09 0.05 0.41 1.00 53965 62593
sd(mu2_x1:x2) 0.46 0.15 0.17 0.78 1.00 80310 96385
cor(mu1_Intercept,mu1_x1) 0.24 0.19 -0.16 0.59 1.00 85937 136518
cor(mu1_Intercept,mu1_x2) -0.14 0.25 -0.62 0.35 1.00 165462 178790
cor(mu1_x1,mu1_x2) -0.39 0.25 -0.83 0.12 1.00 116450 138334
cor(mu1_Intercept,mu1_x1:x2) -0.25 0.30 -0.80 0.36 1.00 168084 143507
cor(mu1_x1,mu1_x1:x2) -0.15 0.31 -0.73 0.46 1.00 183502 149842
cor(mu1_x2,mu1_x1:x2) 0.46 0.34 -0.42 0.90 1.00 74380 115312
cor(mu2_Intercept,mu2_x1) -0.28 0.19 -0.63 0.12 1.00 99610 151304
cor(mu2_Intercept,mu2_x2) -0.20 0.28 -0.69 0.41 1.00 171919 151727
cor(mu2_x1,mu2_x2) -0.29 0.29 -0.80 0.31 1.00 158194 154069
cor(mu2_Intercept,mu2_x1:x2) -0.01 0.27 -0.55 0.49 1.00 165735 175828
cor(mu2_x1,mu2_x1:x2) 0.58 0.21 0.09 0.90 1.00 168609 177125
cor(mu2_x2,mu2_x1:x2) -0.52 0.30 -0.93 0.18 1.00 91228 124046
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
mu1_Intercept -0.56 0.07 -0.70 -0.43 1.00 50255 93363
mu2_Intercept 0.77 0.08 0.60 0.93 1.00 48193 90484
mu1_x1 0.56 0.11 0.34 0.79 1.00 67273 110857
mu1_x2 -0.23 0.07 -0.37 -0.10 1.00 144838 158255
mu1_x1:x2 0.03 0.11 -0.18 0.25 1.00 169239 171867
mu2_x1 0.47 0.11 0.26 0.68 1.00 72559 120684
mu2_x2 -0.19 0.07 -0.33 -0.06 1.00 138909 158043
mu2_x1:x2 0.25 0.13 -0.00 0.51 1.00 122729 166378
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma1 0.48 0.01 0.45 0.50 1.00 173057 197944
sigma2 0.45 0.02 0.42 0.48 1.00 150047 181474
theta1 0.56 0.02 0.53 0.60 1.00 129195 173993
theta2 0.44 0.02 0.40 0.47 1.00 129195 173993
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
and the pp_check() of the model:
pp_check(m1, ndraws = 1000)
I would argue that this already looks much better than what I got using one Gaussian distribution instead of two, but it doesn’t seem to be perfect.
Possibly interesting side note for readers who try to fit similar models: the model only behaved well after I set more informative priors on the intercepts, used a very high number of samples (warmup = 7,000 and iter = 34,000 with 10 chains), and set init = 0, which seemed pretty crucial in order to get it to converge.
Questions
I’ve looked at some responses on here and also read the chapter on finite mixture models in Bayesian Data Analysis by Gelman et al., but I am still not exactly sure how to approach this issue. Mainly, I have these questions:
1. Does my approach make any sense?
My first question is whether what I attempt makes any sense.
2. Is it acceptable to eye-ball the priors for the intercept based on the raw data?
Generally, I think my field prefers specifying priors before seeing the data, but in this particular case I feel this was not possible. I therefore wonder if the approach of eye-balling possible locations of the intercepts based on the data is fine.
3. How do I do inference on the effect of my predictors?
Assuming that my current model or a similar version makes sense, I would like to know how I can infer whether or not there is an effect of my predictor variables. Since we have two Gaussian distributions, I have two slopes per predictor, etc.
Is it possible to average mu1_x1 and mu2_x1 together? Or do I have to strictly interpret them in terms of the two Gaussian distributions? My idea was to average them like this, using the $\theta$s:
# Extract the posterior draws from the model
m1_draws <- as_draws_df(m1)
# Rename the slope draws to match the x1 naming used in this post
m1_draws$b_mu1_x1 <- m1_draws$b_mu1_s_true_dist
m1_draws$b_mu2_x1 <- m1_draws$b_mu2_s_true_dist
# Calculate the average effect of x1, weighting by the mixing proportions
m1_draws$b_x1 <- m1_draws$theta1 * m1_draws$b_mu1_x1 +
  m1_draws$theta2 * m1_draws$b_mu2_x1
# Plot the posterior of the averaged effect
ggplot(m1_draws, aes(x = b_x1)) +
  geom_histogram(bins = 1000) +
  theme_classic()
Is that reasonable? Relatedly, what is the implied prior of this averaged effect of x_1, so I can compare it to its posterior?
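One way to get at that implied prior is to simulate it from the stated priors. A sketch, assuming brms’ default dirichlet(1, 1) prior on the mixing proportions (which for two components makes theta1 marginally uniform on [0, 1]):

```r
set.seed(1)
n <- 1e5
b1     <- rnorm(n, 0, 1)  # prior on the mu1 slope of x1
b2     <- rnorm(n, 0, 1)  # prior on the mu2 slope of x1
theta1 <- runif(n)        # Dirichlet(1, 1) => theta1 ~ Uniform(0, 1)
# Implied prior of the theta-weighted average slope
b_prior <- theta1 * b1 + (1 - theta1) * b2
quantile(b_prior, c(0.025, 0.5, 0.975))
```

This prior is centred at zero but tighter than the component priors (its standard deviation works out to sqrt(2/3) ≈ 0.82 rather than 1), since averaging two independent draws shrinks the spread.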
When we look at the relationship between the posterior samples of mu1_x1 and mu2_x1, we incidentally find that they are not correlated, which I guess is a good thing.
ggplot(m1_draws, aes(x = b_mu1_x1, y = b_mu2_x1)) +
  geom_point(alpha = 0.01) +
  geom_smooth() +
  theme_classic()
4. How are theta_1 and theta_2 interpreted?
When I want to describe the results of the model, would I then simply report the values of \theta_1 and \theta_2 along with the intercepts and the \sigma values to describe the two distributions? Can I interpret \theta as the proportion of each distribution in the data?
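My understanding of the “theta = proportion” reading can be checked by simulating a two-component mixture with a known mixing proportion and seeing that the proportion of draws from each component matches it. A self-contained sketch (using made-up values close to my estimates, not the real data):

```r
set.seed(1)
n <- 10000
true_theta1 <- 0.56                 # assumed mixing proportion for component 1
comp <- rbinom(n, 1, true_theta1)   # 1 = draw comes from component 1
y <- ifelse(comp == 1,
            rnorm(n, mean = -0.56, sd = 0.48),   # component 1
            rnorm(n, mean =  0.77, sd = 0.45))   # component 2
mean(comp)  # close to true_theta1: the expected share of component-1 draws
```

If I understand correctly, brms also provides pp_mixture() for fitted mixture models, which returns per-observation posterior probabilities of component membership and might complement reporting the \theta values.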