We would like to fit a model that combines a hurdle model with a finite mixture model. We want to model the probability of a zero, and then, for the non-zero values, assume that the observations come from two separate processes. Is it possible to combine a hurdle model with a finite mixture model in brms?
We are familiar with the mixture function, which lets you model two Gaussian processes and their mixing proportion. Can this be extended to a model that also includes the hurdle part? Naively, we would expect this to work by using mixture(hurdle_lognormal, lognormal) as the family.
We tried the following code, and got this error:
Error: The parameter ‘hu’ is not a valid distributional or non-linear parameter. Did you forget to set ‘nl = TRUE’?
The defined mixture doesn’t really make sense to me.
The lognormal distribution only has support for positive values. The hurdle-lognormal modification allows zeroes by specifying separate processes for the zero and non-zero components. I have a hard time conceiving of data that would require both the hurdle-lognormal and the plain lognormal in the same mixture.
Are you trying to run a mixture of two lognormals for the non-zero component? I’m not certain if you’ll be able to do that out of the box in brms, though it should be doable in Stan.
The data are reading times of words within a sentence. Sometimes readers skip words, resulting in reading time = 0. The conditional reading time (i.e. the reading time if the word is NOT skipped) seems to have two peaks, suggesting that two processes drive those observations (reflected as faster and slower reading times). And, as with all reaction time measures, the distribution is right-skewed (thus, lognormal).
So yes, we are trying to run a mixture of hurdle for the skipping (0 = skipped), and two lognormals for the non-zero component.
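To make the target model concrete, here is a minimal sketch of the log density being described: a point mass at zero for skipped words, plus a two-component lognormal mixture for the non-zero reading times. It is plain Python, not brms code; the parameter names (hu for the skipping probability, theta for the weight of the first lognormal component, mu and sigma on the log scale) are assumptions chosen for illustration.

```python
import math


def lognormal_logpdf(y, mu, sigma):
    """Log density of a lognormal with log-scale mean mu and sd sigma (y > 0)."""
    z = (math.log(y) - mu) / sigma
    return -math.log(y * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


def hurdle_mix_logpdf(y, hu, theta, mu1, sigma1, mu2, sigma2):
    """Log density of a hurdle model whose non-zero part is a mixture of
    two lognormals (illustrative parameterisation, not a brms family)."""
    if y == 0:
        # Skipped word: probability mass hu at exactly zero.
        return math.log(hu)
    # Weighted log densities of the two non-zero components.
    a = math.log(theta) + lognormal_logpdf(y, mu1, sigma1)
    b = math.log1p(-theta) + lognormal_logpdf(y, mu2, sigma2)
    # Numerically stable log-sum-exp over the two components.
    m = max(a, b)
    log_mix = m + math.log(math.exp(a - m) + math.exp(b - m))
    # Non-zero observations also carry the (1 - hu) "not skipped" factor.
    return math.log1p(-hu) + log_mix
```

Summing this function over all observations would give the log-likelihood of the combined skipping and reading-time model.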
Traditionally, the word skipping has been split into a separate variable of skipping probability, and the conditional reading time into another variable, and these are then modeled separately. Would be cool to include them in the same model.
Any ideas of how this could be done would be very welcome!
I agree that a mixture of two hurdle-lognormal models should work. Are you fitting any predictors to the zero component? If not, you can use set_prior() to constrain the second hurdle parameter to be equal to the first (using the constant() prior); this should make the results mathematically equivalent to what you want.
If you are fitting predictors to the zero component, you should still be able to use a constant constraint, but I don’t remember the details of how to do it; in any case, a bit of experimentation should reveal the right syntax.
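The equivalence claimed above can be checked directly. As a sketch, write $\theta$ for the mixing weight of the first component and $hu$ for the hurdle probability shared by both components (both symbols are notation assumed here, not taken from the thread):

```latex
\begin{aligned}
p(y) &= \theta \, \mathrm{HLN}(y \mid hu, \mu_1, \sigma_1)
      + (1 - \theta) \, \mathrm{HLN}(y \mid hu, \mu_2, \sigma_2) \\
     &= \begin{cases}
          \theta \, hu + (1 - \theta) \, hu = hu, & y = 0, \\[4pt]
          (1 - hu) \left[ \theta \, \mathrm{LN}(y \mid \mu_1, \sigma_1)
            + (1 - \theta) \, \mathrm{LN}(y \mid \mu_2, \sigma_2) \right], & y > 0.
        \end{cases}
\end{aligned}
```

So with a shared hurdle probability, the mixture of two hurdle-lognormals is a single hurdle followed by a two-component lognormal mixture. If the two hurdle probabilities differ, the probability of a zero becomes $\theta \, hu_1 + (1 - \theta) \, hu_2$, and the skipping probability is then tied to the mixing weight.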