Help specifying the appropriate priors

Dear all,

I would highly appreciate your input on this. I’m running a model using the default priors. Here it is, just in case it matters.

library(brms)

m1 <- brm(
  percentvoiced ~ position*voicing*target_vowel+poa+
    (position*voicing*target_vowel+poa | subject),
  data = dat1,
  sample_prior = TRUE,
  family = gaussian(),
  cores = 8,
  control = list(adapt_delta = 0.999, max_treedepth = 15),
  seed = 1432)

When I run pp_check() on the model to assess the fit, I get the following plot, which shows that the fit is not good.

Could anybody advise me on which aspect of the model I should modify? I guess the priors; if so, how should I best modify them? Should I set priors for all the parameters or just the intercept?

Sorry if the question is vague and not informative but I am happy to narrow it down as per your question/request.

Thank you in advance!

Hi @Dallak
Just from looking at your data (and then glancing at the formula), could it be that your response variable is based on a 0 to 100 slider? That would explain the modes at 0, 50 (I guess) and 100. You could either model this with a zero-one-inflated beta after transforming the response to the [0,1] interval, or maybe with an ordinal model.

A starting point for the ordinal model could be this paper. I don’t have a good primer for the zero-one-inflated beta, but there are probably blog posts out there about it.
So instead of tweaking the priors, I would start with the more fundamental architecture of your model.


Thank you, @scholz for this input!

Yes, you are correct. The response variable is based on a 0 to 100 slider. I would love to go for the first suggestion. For that I need to use family = zero_one_inflated_beta() instead of family = gaussian(), right? Do I also need to rescale the response variable so that it runs from 0 to 1 instead of 0 to 100?

Thank you again for your input!

Yes, for a beta distribution (and the inflated variants) your outcome variable has to be in the unit interval.
You then have additional parameters that you could also predict from your data for the two processes that cause 0 and 1.
But I don’t have a lot of experience with this kind of data, so I’m not sure I can help you further with the analysis.
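
For anybody following along, here is a minimal sketch of what the rescaling and the family switch could look like in brms. It is an illustration under assumptions, not the poster's final model: dat_toy, prop_voiced and fit_zoib are hypothetical names, the random-effects structure is cut down to a random intercept to keep the sketch short, and the zoi and coi formulas are left intercept-only (they are the two extra parameters of the zero_one_inflated_beta() family and could take predictors as well).

```r
# Hypothetical stand-in for the poster's data; only the response column is shown.
dat_toy <- data.frame(percentvoiced = c(0, 12.5, 50, 87, 100))

# Rescale the 0-100 slider response to the unit interval [0, 1].
dat_toy$prop_voiced <- dat_toy$percentvoiced / 100

# Sketch of the model call, wrapped in a function so that nothing is
# compiled or sampled until it is called with the real data.
fit_zoib <- function(dat) {
  brms::brm(
    brms::bf(
      # Assumption: random intercept only, to keep the sketch small.
      prop_voiced ~ position * voicing * target_vowel + poa + (1 | subject),
      zoi ~ 1,  # probability that a response is exactly 0 or 1
      coi ~ 1   # given a 0-or-1 response, probability that it is 1
    ),
    data = dat,
    family = brms::zero_one_inflated_beta(),
    cores = 4,
    seed = 1432
  )
}
```

Calling fit_zoib(dat1) after adding the rescaled column to dat1 would fit the model; pp_check() on the result can then be compared with the plot from the Gaussian fit.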

Thank you @scholz , for this.
I will give it a go, and post the output here for anybody else who could provide further help.
Thank you again.

I’d also recommend starting with the simplest model you can, given the data and what you want to do. Like maybe just a few predictors? Then you can tighten up the priors on the simple model. Once that looks solid you can expand the model and priors.
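
A rough sketch of that workflow, assuming the zero-one-inflated beta family suggested earlier in the thread: start with two predictors and a random intercept, put explicit priors on them, and look at prior-only draws before fitting. The names simple_formula, prop_voiced and fit_simple are hypothetical, and the prior scales are placeholders chosen for illustration, not recommendations.

```r
# A deliberately small model: two main effects and a random intercept.
simple_formula <- prop_voiced ~ voicing + position + (1 | subject)

# Wrapped in a function so that nothing is compiled or sampled until called.
fit_simple <- function(dat, prior_only = FALSE) {
  # Rescale the 0-100 response to [0, 1], as the beta family requires.
  dat$prop_voiced <- dat$percentvoiced / 100

  # Placeholder priors; with the default logit link for the mean these
  # are on the logit scale.
  priors <- c(
    brms::set_prior("normal(0, 1)", class = "b"),
    brms::set_prior("normal(0, 1.5)", class = "Intercept")
  )

  brms::brm(
    simple_formula,
    data = dat,
    family = brms::zero_one_inflated_beta(),
    prior = priors,
    # "only" ignores the likelihood and samples from the priors alone,
    # which is useful for a prior predictive check with pp_check().
    sample_prior = if (prior_only) "only" else "no",
    seed = 1432
  )
}
```

Running fit_simple(dat1, prior_only = TRUE) and then pp_check() shows what the priors alone imply; once that looks plausible, fit_simple(dat1) fits the model, and further predictors and interactions can be added one step at a time.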


Thank you all for your suggestions!

I guess it works after using “family = zero_one_inflated_beta()” and rescaling the response variable following @scholz’s suggestion.