How to choose a prior distribution for a pseudo proportion (like 0 - 100% but sometimes goes under and over the limits)


I am enjoying using the brms package and trying to use it for my current dataset, which includes a unique OUTCOME variable. (sorry I initially wrote “predictor variable”, which was an error)

The outcome variable is calculated by subtracting the accuracy rate on the pretest (0 - 1) from the accuracy rate on the posttest (0 - 1). So, the value is usually between 0 and 1, like a beta-distributed variable, but it can fall outside the 0 - 1 range, such as -0.01 and 1.02 (although very rarely).

I tried a non-informative prior distribution; however, the model does not converge. So, I am trying to specify appropriate prior distributions. (I am also hoping that this improves the prediction intervals by bounding them within 0 - 1 to some extent.)

I would appreciate it if someone could suggest which distribution to use for such an outcome variable.

Please and thank you.

It sounds like the range of your outcome variable is (-1, 1). In that case, you could just add 1 and divide by 2 to get it into (0, 1) and then model it as beta-distributed. The prior distributions refer only to the effects (coefficients) of the predictors (not the outcome), so they do not need to be bounded to any specific range. In other words, conventional weakly informative priors like normal(0,1) would be suitable if your predictors are standardized or categorical.
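The rescaling described above, and the step back to the original scale, can be sketched as follows. This is only an illustration of the arithmetic, written in Python; the function names are made up here, and in practice the same two lines would be done in R before and after fitting the model.

```python
def to_unit_interval(diff):
    """Map a difference score in [-1, 1] onto [0, 1]: add 1, then divide by 2."""
    return (diff + 1.0) / 2.0


def from_unit_interval(scaled):
    """Undo the mapping: multiply by 2, then subtract 1."""
    return scaled * 2.0 - 1.0


# Hypothetical difference scores (post-test minus pre-test), for illustration only.
diffs = [-0.25, 0.0, 0.5]

scaled = [to_unit_interval(d) for d in diffs]        # values to model as beta-distributed
restored = [from_unit_interval(s) for s in scaled]   # back on the original scale
```

Note that a difference of exactly -1 or 1 maps to exactly 0 or 1, which the beta distribution does not allow, so such values would need separate handling.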

However, an alternative modelling option is to predict your post-test with your pre-test, along with any other predictors under consideration. In that case, you would not need to rescale the response variable because it is already in (0, 1) and so suitable for beta regression.

Thank you very much for your reply, Andy (If I may)!

Your alternative modeling option sounds like a great idea! I don’t know why I didn’t think about it!
I will definitely try that!

Your first suggestion sounds interesting!

Actually, the value sometimes goes beyond 1, so I could say its range is (-0.5, 1.5).
So, I may be able to add 0.5 and divide by 1.5? Am I understanding right?

Also, if I do such a conversion, I guess, I need to do 1.5 times the estimated outcome variable, then subtract 0.5. Am I understanding this right?

I think before making any more suggestions, it would be helpful to understand how your OUTCOME variable can be greater than 1 if it is post-test minus pre-test and both tests are in (0, 1).

This is mainly to build on what @andymilne has said here.

I think a simpler way would be to use the post-test as the outcome variable and the pre-test as a predictor variable. This explicitly allows you to model the relation between post-test and pre-test and it’s probably going to be easier to find the appropriate distribution for the accuracy rate. (An accuracy rate suggests that a binomial or Bernoulli distribution could work.)
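One way to write this model down, as a rough sketch: assume participant $i$ answered $n_i$ post-test items and got $y_i$ of them correct, with $\mathrm{pre}_i$ the pre-test accuracy rate. The item counts are an assumption here, since the thread only mentions accuracy rates, and the logit link is simply the usual default for a binomial model.

```latex
y_i \sim \mathrm{Binomial}(n_i,\, p_i), \qquad
\operatorname{logit}(p_i) = \beta_0 + \beta_1\, \mathrm{pre}_i
```

Under this formulation the priors are placed on $\beta_0$ and $\beta_1$, not on the outcome, so no rescaling or back-transformation of the response is needed.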

Thank you for the replies, Andy and stijn.

Before seeing this comment from you, I somehow thought that the OUTCOME variable could be beyond 1, which is impossible from the calculation. I apologize for the confusion and thank you for helping me realize this.

Thank you for your suggestion, stijn. Now I understand that using post-test scores as an outcome variable is more straightforward and can be easily applied as I do not need to convert back to the original value.

I will try the model with the post-test as the outcome variable. Thank you very much, you two!