As far as my limited understanding goes, it is appropriate to define priors for a model whose results I am going to publish (the proper Bayesian way). However, how do I encode my prior knowledge?
I have a hurdle model and would like weakly informative priors for all predictors at once, if possible. N = 5000 patients.
fit = brm(
  bf(
    received_treatment_hours ~ p1 + p2 + p3_fct + p4 + p5_fct + p6 + p7 + (1 | region),
    hu ~ p1 + p2 + p3_fct + p4 + p5_fct + p6 + p7 + (1 | region)
  ),
  data = df,
  family = hurdle_lognormal(),
  cores = 3,
  chains = 3,
  prior = prior
)
Histogram of the outcome variable: zero-inflated, with some extreme values.
My prior knowledge about received treatment hours, for the lognormal part of the model:
Differences of more than 30 hours in received treatment hours are unlikely between different predictor levels.
Differences of more than 30 hours in received treatment hours are unlikely between regions.
Thus, my prior for the lognormal part should be:
prior = c(prior(student_t(3, 0, 15), class = b), # allows extreme values and 2xSD = 2x15 = 30 hours
          prior(student_t(3, 0, 15), class = sd, group = region)) # prior for the hierarchical part of the model, allows extreme values and 2xSD = 2x15 = 30 hours
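One thing worth checking before settling on a scale of 15: with `hurdle_lognormal()` the non-zero part is modelled on the log scale, so a coefficient acts as a multiplier on the outcome rather than as a shift in hours. The sketch below is only a rough sanity check in base R; `baseline_hours` and the candidate coefficient values are hypothetical numbers chosen for illustration, not estimates from my data.

```r
# Hypothetical numbers, used only to illustrate the scale of the coefficients.
baseline_hours <- 10              # assumed typical non-zero outcome (hours)
b_values <- c(0.5, 1, 2, 15)      # candidate coefficient sizes on the log scale

# On the log scale, a coefficient b multiplies the median of the
# non-zero outcome by exp(b).
multiplier <- exp(b_values)
implied_hours <- baseline_hours * multiplier

scale_check <- data.frame(
  b = b_values,
  multiplier = multiplier,
  implied_hours = implied_hours
)
print(scale_check)

# Log-scale coefficient that corresponds to a 30-hour increase
# from the assumed baseline.
b_for_30h <- log((baseline_hours + 30) / baseline_hours)
print(b_for_30h)
```

Under these assumed numbers, a 30-hour difference corresponds to a log-scale coefficient well below 2, while a coefficient of 15 implies a multiplier in the millions. So "30 hours" probably has to be translated to the log scale before it is used as a prior scale.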
But how do I add priors for the hurdle part of the model?
I know that the proportion of zero values varies widely between the different levels of the predictors, from 5% up to 95%.
I know that the proportion of zero values varies widely between the regions, from 10% up to 80%.
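Since the `hu` part of `hurdle_lognormal()` uses a logit link, these proportions presumably need to be expressed as log-odds before they can inform a prior. A minimal base R sketch of that conversion, using only the percentages stated above (the variable names are my own):

```r
# Proportions of zeros stated above, converted to the logit (log-odds) scale.
p_predictors <- c(low = 0.05, high = 0.95)
p_regions <- c(low = 0.10, high = 0.80)

logit_predictors <- qlogis(p_predictors)
logit_regions <- qlogis(p_regions)

# Total spread on the logit scale between the lowest and highest proportion.
range_predictors <- unname(diff(logit_predictors))
range_regions <- unname(diff(logit_regions))

print(round(logit_predictors, 2))
print(round(logit_regions, 2))
print(round(c(predictors = range_predictors, regions = range_regions), 2))
```

The spread between 5% and 95% is about 5.9 on the logit scale, and between 10% and 80% about 3.6, so priors for the `hu` coefficients and the `hu` group-level standard deviation would have to allow differences of roughly that size. In brms, priors for this part are addressed with the `dpar = "hu"` argument of `prior()` or `set_prior()`.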
Finally, does my model have other parts that would need priors?
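One way to see which parts of the model can take priors is `get_prior()` from brms, which lists every parameter class for a given formula, data set and family. The sketch below runs it on a small simulated data frame (`toy`, with made-up values and made-up factor levels) that only mimics the structure of my variables, so the listing shows the parameter classes, not anything about the real data.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the real data; all values are invented.
toy <- data.frame(
  p1 = rnorm(n), p2 = rnorm(n),
  p3_fct = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  p4 = rnorm(n),
  p5_fct = factor(sample(c("x", "y"), n, replace = TRUE)),
  p6 = rnorm(n), p7 = rnorm(n),
  region = factor(sample(paste0("r", 1:8), n, replace = TRUE))
)
# About 40% zeros, lognormal otherwise.
toy$received_treatment_hours <- ifelse(runif(n) < 0.4, 0, rlnorm(n, 2, 1))

available <- NULL
if (requireNamespace("brms", quietly = TRUE)) {
  model_formula <- brms::bf(
    received_treatment_hours ~ p1 + p2 + p3_fct + p4 + p5_fct + p6 + p7 + (1 | region),
    hu ~ p1 + p2 + p3_fct + p4 + p5_fct + p6 + p7 + (1 | region)
  )
  # Lists every parameter that accepts a prior, with the brms defaults.
  available <- brms::get_prior(
    model_formula,
    data = toy,
    family = brms::hurdle_lognormal()
  )
  print(available)
}
```

Besides the `b` and `sd` classes, the listing should also contain the intercepts, the residual `sigma` of the lognormal part, and a separate set of rows with `dpar = hu` for the hurdle part.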