# Help with specifying a multilevel Gaussian-Lognormal mixture model in brms

Operating System: macOS v11.6
R version 4.1.2
brms Version: 2.17.0

Hi,

I have data from an experiment (with humans) in which participants type words in 2 conditions (coded below by the mode variable, contrast-coded as -0.5/+0.5). The data are movement times (MTs) in seconds, ranging from close to 0 s to a few seconds.

These movement times are roughly Lognormally distributed when the participant performs the task correctly. However, some participants did not perform the task well in one of the two conditions, and in that (wrongly performed) condition their movement times are closer to Normally distributed.

I would like to build a model to distinguish between these two “classes” of participants:

1. good participants: participants that perform the task well (i.e., Lognormally distributed MTs in both conditions)

2. bad participants: participants that do not perform the task well (i.e., Lognormally distributed MTs in one condition, and Normally distributed MTs in the other condition)

My approach has been to use a mixture of two distributions, a Gaussian and a Lognormal, defined as follows:

```r
library(brms)

mix <- mixture(gaussian, lognormal)
```
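For reference, the per-observation likelihood of a two-component mixture like this one can be sketched in base R. This assumes the brms convention that the first component's logit weight is fixed at 0, so the weight of the second (Lognormal) component is plogis(theta2); the helper dmix_gauss_lnorm and all parameter values are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: density of a two-component Gaussian/Lognormal mixture.
# Component 1 is Gaussian and component 2 is Lognormal, mirroring the order
# in mixture(gaussian, lognormal). theta2 is on the logit scale.
dmix_gauss_lnorm <- function(y, mu1, sigma1, mu2, sigma2, theta2) {
  p2 <- plogis(theta2)  # weight of the Lognormal component
  p1 <- 1 - p2          # weight of the Gaussian component
  # dlnorm() returns 0 for y <= 0, so non-positive times can only
  # come from the Gaussian component
  p1 * dnorm(y, mean = mu1, sd = sigma1) +
    p2 * dlnorm(y, meanlog = mu2, sdlog = sigma2)
}

# Illustrative values only (not estimates from the data)
y <- seq(0.01, 3, by = 0.01)
dens <- dmix_gauss_lnorm(y, mu1 = 0.8, sigma1 = 0.3,
                         mu2 = log(0.5), sigma2 = 0.4, theta2 = 1)
```

With theta2 = 0 both components get weight 0.5; larger theta2 shifts weight toward the Lognormal component.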

I then fit the following brms model:

```r
mixture_model <- brm(
  formula = bf(
    movement_time ~ 1 + mode + (1 + mode | participant),
    theta2 ~ 1 + mode + (1 + mode | participant)
  ),
  family = mix,
  inits = 0,
  chains = 4, cores = 4,
  warmup = 2000, iter = 5000,
  data = df2,
  sample_prior = TRUE,
  file = "./models/mixture_model.rds"
)
```


Here, the condition (mode) is allowed to affect both the movement time and the mixing proportion theta2. This seems to “work”: the average mixing proportion (i.e., plogis(theta2_Intercept)) differs considerably between “good” and “bad” participants. Still, I am not sure that this model does exactly what I want. More precisely, I would like to build a mixture model in which theta2 (i.e., the probability that a given movement time comes from the Lognormal rather than the Normal distribution) depends on both the participant and the condition (i.e., mode). Does the current formulation of the model achieve that aim?
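The formula theta2 ~ 1 + mode + (1 + mode | participant) gives each participant their own intercept and mode slope on the logit scale, so the Lognormal weight can differ by participant and by condition. A base-R sketch of that mapping, with made-up values: b0, b1 and the per-participant deviations u0, u1 are hypothetical, not posterior estimates.

```r
# Hypothetical population-level coefficients for theta2 (logit scale)
b0 <- 2  # theta2 intercept
b1 <- 0  # theta2 mode slope

# Hypothetical participant-level deviations (intercept u0, mode slope u1)
ranef_theta2 <- data.frame(
  participant = c("p1", "p2"),
  u0 = c(0.5, -1),
  u1 = c(0, -4)
)

mode_levels <- c(-0.5, 0.5)  # contrast coding used for mode

# Probability that an MT comes from the Lognormal component,
# for every participant x condition combination
grid <- expand.grid(participant = ranef_theta2$participant,
                    mode = mode_levels, stringsAsFactors = FALSE)
grid <- merge(grid, ranef_theta2, by = "participant")
grid$p_lognormal <- plogis((b0 + grid$u0) + (b1 + grid$u1) * grid$mode)
grid
```

Under these made-up values, p1 has the same high Lognormal weight in both conditions (plogis(2.5), about 0.92), whereas p2 has a high weight in one condition (plogis(3), about 0.95) and a low one in the other (plogis(-1), about 0.27), which is the “bad participant” pattern described above.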

Here is a rough and incomplete math notation (omitting the priors and the distributions of the varying effects), in case it helps.

$$
\begin{aligned}
z_{n} &\sim \mathrm{Bernoulli}\left(p_{\text{good},n}\right)\\
\operatorname{logit}\left(p_{\text{good},n}\right) &= \alpha_{\theta} + \alpha_{\theta,\text{participant}[n]} + \left(\beta_{\theta} + \beta_{\theta,\text{participant}[n]}\right) \times \text{mode}_{n}\\
\text{MT}_{n} &\sim \begin{cases}
\mathrm{Gaussian}\left(\mu_{1,n}, \sigma_{1}\right), & \text{if } z_{n} = 0\\
\mathrm{LogNormal}\left(\mu_{2,n}, \sigma_{2}\right), & \text{if } z_{n} = 1
\end{cases}\\
\mu_{1,n} &= \alpha_{1} + \alpha_{1,\text{participant}[n]} + \left(\beta_{1} + \beta_{1,\text{participant}[n]}\right) \times \text{mode}_{n}\\
\mu_{2,n} &= \alpha_{2} + \alpha_{2,\text{participant}[n]} + \left(\beta_{2} + \beta_{2,\text{participant}[n]}\right) \times \text{mode}_{n}
\end{aligned}
$$
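One way to check whether the brms formulation matches this notation is to simulate data from it and refit. A minimal base-R sketch, assuming a separate intercept and mode slope for each linear predictor and, for brevity, participant-level deviations on the mixing proportion only; all parameter values and the numbers of participants and trials are hypothetical.

```r
set.seed(1)
n_participants <- 20
n_trials <- 50  # per participant and condition (hypothetical)

# Hypothetical population-level values
a_theta <- 2;        b_theta <- -1                   # logit P(Lognormal)
a_ln <- log(0.5);    b_ln <- 0.1; sigma_ln <- 0.4    # Lognormal (log scale)
a_g <- 0.8;          b_g <- 0.1;  sigma_g <- 0.3     # Gaussian (seconds)

df_sim <- expand.grid(trial = seq_len(n_trials),
                      mode = c(-0.5, 0.5),
                      participant = seq_len(n_participants))

# Participant-level deviations on the mixing proportion (logit scale)
u0 <- rnorm(n_participants, mean = 0, sd = 1)
u1 <- rnorm(n_participants, mean = 0, sd = 2)
j <- df_sim$participant

p_lognormal <- plogis((a_theta + u0[j]) + (b_theta + u1[j]) * df_sim$mode)
z <- rbinom(nrow(df_sim), size = 1, prob = p_lognormal)  # 1 = Lognormal

mt_ln <- rlnorm(nrow(df_sim), meanlog = a_ln + b_ln * df_sim$mode, sdlog = sigma_ln)
mt_g <- rnorm(nrow(df_sim), mean = a_g + b_g * df_sim$mode, sd = sigma_g)
df_sim$movement_time <- ifelse(z == 1, mt_ln, mt_g)
df_sim$participant <- factor(df_sim$participant)
```

The simulated df_sim could then be passed to the same brm() call in place of df2, to see whether the participant-level theta2 effects (u0, u1) are recovered.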

Thank you for your help!