@amynang and @jsocolar, thank you very much for your replies. I appreciate the time you took to help me with this topic. The example about transects and structural zeros is an interesting one that I will keep in mind for the future.
I will give an example from observational data. Think of n in the example as the number of hours an individual (imagine a capuchin) is observed, and Y as the amount of time the individual was seen doing extractive foraging of a particular fruit. With limited sampling, some individuals will never be seen doing this behavior, even though many more would have been documented performing it had we observed them long enough. In such a case, false zeros are more likely for individuals with less observation time. Individuals also vary in their likelihood of extractive foraging, so individuals that engage in it less often are also less likely to be documented performing the behavior over a given period. Let's assume, though, that a hurdle model is appropriate because some individuals simply never become extractive foragers, and different demographic variables drive the likelihood of becoming an extractive forager.
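One way to make the false-zero argument concrete, assuming purely for illustration that a forager $i$ performs extractive-foraging bouts as a Poisson process with rate $\lambda_i$ per observation hour, and writing $\pi_i$ for the probability that individual $i$ is a forager at all (both symbols are hypothetical and not part of the models below):

$$
\begin{aligned}
\Pr(Y_i > 0 \mid \text{forager}) &= 1 - e^{-\lambda_i n_i} \\
\Pr(Y_i = 0) &= (1 - \pi_i) + \pi_i \, e^{-\lambda_i n_i} \\
\operatorname{cloglog}\big(\Pr(Y_i > 0 \mid \text{forager})\big) &= \log \lambda_i + \log n_i
\end{aligned}
$$

The first term of $\Pr(Y_i = 0)$ is the structural zeros (never-foragers) and the second is the sampling zeros, which shrink as $n_i$ grows. Under this assumption, $\log n_i$ enters as an offset on the complementary log-log scale rather than the logit scale.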
As for the continuous part, I could divide Y by n, but then I would be modeling a rate while throwing away information about how certain that estimate is given the sampling effort (e.g., 1 second observed over 30 minutes versus 200 seconds observed over 20 hours). In this case, assume individuals also vary in the amount of time they dedicate to extractive foraging, conditional on doing any extractive foraging.
So, overall there is variation in the population in both the likelihood of an event (extractive foraging) and in the time spent performing the behavior (amount of time spent extracting, given that you are extractive foraging).
To understand variation in the likelihood of extractive foraging, if I were to run the model as a binomial model (where count is the number of events, out of n trials, in which Y > 0), I would set it up as:

```r
mod_bin <- brm(count | trials(n) ~ x1 + x2 + (1 | Subject), family = binomial(), data = df)
```

However, as far as I am aware, an equivalent syntax does not work in the hurdle portion of a hurdle model.
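For reference, the binomial model above corresponds to the following, where $s[i]$ is the subject of row $i$ and $\sigma_u$ is a hypothetical name for the random-intercept standard deviation:

$$
\begin{aligned}
\text{count}_i &\sim \operatorname{Binomial}(n_i, p_i) \\
\operatorname{logit}(p_i) &= \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_{s[i]} \\
u_s &\sim \operatorname{Normal}(0, \sigma_u)
\end{aligned}
$$

Here $n_i$ enters as the number of trials, so sampling effort is carried by the likelihood itself rather than by an offset.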
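If I were interested in variation in extractive foraging time (conditional on the event taking place), I could run a model like this:

```r
# Gamma() with a log link; only positive Y can be modeled with a gamma likelihood
mod_gamma <- brm(Y ~ x1 + x3 + offset(log(n)) + (1 | Subject), family = Gamma(link = "log"), data = subset(df, Y > 0))
```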
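Written out with a log link (which the `offset(log(n))` term presupposes), the offset makes the gamma mean proportional to observation effort, with $\alpha$ the gamma shape:

$$
\begin{aligned}
Y_i &\sim \operatorname{Gamma}(\mu_i, \alpha) \quad (\text{mean } \mu_i,\ \text{shape } \alpha) \\
\log \mu_i &= \beta_0 + \beta_1 x_{1i} + \beta_3 x_{3i} + u_{s[i]} + \log n_i \\
\mu_i / n_i &= \exp\big(\beta_0 + \beta_1 x_{1i} + \beta_3 x_{3i} + u_{s[i]}\big) \\
\operatorname{Var}(Y_i) &= \mu_i^2 / \alpha
\end{aligned}
$$

So the coefficients describe expected foraging time per observation hour. Note that with a constant shape $\alpha$, the coefficient of variation is $1/\sqrt{\alpha}$ for every individual, so the offset rescales the mean but does not by itself make long observation periods more informative than short ones.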
But if I use a hurdle_gamma model, how do I properly account for sampling effort in both the hurdle and the continuous portion (assuming it is reasonable to do so)? For the gamma portion, it seems offset(log(n)) would work, but it is not clear to me what to do with the hurdle portion of the model.
```r
library(brms)

gamma_hu_model <- brm(
  bf(
    # gamma part, Y > 0 only (how much time do you spend extractive foraging?)
    Y ~ x1 + x3 + offset(log(n)) + (1 | Subject),
    # hurdle part (Bernoulli, logit link by default): hu = Pr(Y = 0)
    # (are you ever seen extractive foraging?)
    hu ~ x1 + x2 + (1 | Subject)
  ),
  family = hurdle_gamma(link = "log"),
  data = df
)
```
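For reference, brms's `hurdle_gamma` family combines the two parts as follows, with $\mathrm{hu}_i$ the probability of a zero, $\eta^{\mathrm{hu}}_i$ and $\eta^{\mu}_i$ the linear predictors from the `hu` and `Y` formulas, and the offset currently appearing only in the gamma part:

$$
p(y_i) =
\begin{cases}
\mathrm{hu}_i, & y_i = 0 \\
(1 - \mathrm{hu}_i)\,\operatorname{Gamma}(y_i \mid \mu_i, \alpha), & y_i > 0
\end{cases}
\qquad
\operatorname{logit}(\mathrm{hu}_i) = \eta^{\mathrm{hu}}_i, \quad \log \mu_i = \eta^{\mu}_i + \log n_i
$$

Because $\mathrm{hu}_i$ is the probability of a zero, positive coefficients in the `hu` formula mean a higher chance of *not* being seen extractive foraging.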