Feedback requested on beta regression phi prior - includes visualisation

I am working with some different priors for the phi/precision parameter of a beta regression in brms and would love to hear your thoughts or advice on the possibilities. This prior is set on the identity scale, so it does not get exponentiated (which sometimes is the case for phi or standard deviation). I have found that the following prior works well and also fits with my intuitions about the sort of data I am modeling:

set_prior("student_t(2.5, 0.1, 10)", class = "phi")

I.e., nu is 2.5, which allows quite heavy tails, the location is 0.1, and the scale is 10, again allowing quite some spread for the data.

I am typically modeling psychologically-related responses (e.g., how confident are you that xyz will happen, from 0 to 100), and very high phi values are not all that common because people vary a lot in their responses. However, I of course want to include a wide range of possibilities for phi and not make the prior overly informative, nor overly vague.

This is how the prior looks as a density plot over possible phi values:
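For anyone wanting to reproduce a plot like this, here is a minimal base-R sketch. It assumes that phi is bounded below at zero, so the student_t prior is effectively truncated at 0; the helper name `dphi_prior` is made up for illustration and is not part of brms.

```r
# Density of a student_t(df, location, scale) prior truncated to phi > 0.
# dphi_prior is a hypothetical helper, not a brms function.
dphi_prior <- function(phi, df = 2.5, location = 0.1, scale = 10) {
  z <- (phi - location) / scale
  # Probability mass the untruncated t puts above zero
  mass_above_zero <- 1 - pt((0 - location) / scale, df)
  ifelse(phi > 0, dt(z, df) / scale / mass_above_zero, 0)
}

phi_grid <- seq(0.01, 200, length.out = 2000)
dens <- dphi_prior(phi_grid)

# Sanity check: the truncated density should integrate to about 1
total_mass <- integrate(dphi_prior, lower = 0, upper = Inf)$value

if (interactive()) {
  plot(phi_grid, dens, type = "l", xlab = "phi", ylab = "prior density")
}
```

The plot call is guarded with `interactive()` only so the snippet also runs cleanly in a script.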

And here you can see it ‘in action’, having sampled 100 draws from a prior predictive check, and then set the mean of the beta distribution at either .5 (red) or .67 (blue) so that you can see the spread of the data implied by the prior when the mean and precision are combined.
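A rough way to build this kind of picture without fitting anything is sketched below. It draws phi directly from the truncated student_t by inverse-CDF sampling instead of going through a brms prior-only fit, which is an assumption on my part about what is equivalent; `rphi_prior` and `beta_curve` are hypothetical helper names.

```r
set.seed(1)

# Draw from student_t(2.5, 0.1, 10) truncated to phi > 0 via the inverse CDF.
# rphi_prior is a hypothetical helper, not a brms function.
rphi_prior <- function(n, df = 2.5, location = 0.1, scale = 10) {
  p_zero <- pt((0 - location) / scale, df)  # mass below zero, to be skipped
  u <- runif(n, min = p_zero, max = 1)
  location + scale * qt(u, df)
}

phi_draws <- rphi_prior(100)

# Beta density for a given mean and precision:
# shape1 = mean * phi, shape2 = (1 - mean) * phi
beta_curve <- function(x, mean, phi) dbeta(x, mean * phi, (1 - mean) * phi)

x <- seq(0.005, 0.995, length.out = 199)
curves_50 <- sapply(phi_draws, function(phi) beta_curve(x, 0.50, phi))
curves_67 <- sapply(phi_draws, function(phi) beta_curve(x, 0.67, phi))

if (interactive()) {
  matplot(x, curves_50, type = "l", lty = 1, col = "red",
          xlab = "response", ylab = "density")
  matlines(x, curves_67, lty = 1, col = "blue")
}
```

Each column of `curves_50` and `curves_67` is one beta density implied by one prior draw of phi.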

This prior seems to have been working well, and can recover the parameters for simulated data with both low (e.g., less than 5) and really rather high phi values (100-200) that I wouldn’t actually expect to get in real data.

However, I would really like to know people’s thoughts about this prior - is there anything patently wrong with it? Do you have recommendations for an alternative? Would this prior strike you as strange in an actual paper and if so can you elaborate a bit on why or why not? Finally, would you consider this to be an ‘informative’, ‘weakly informative’, ‘terribly precise/vague’ etc prior?

Your feedback would be much appreciated!

Are you confident that a beta model is appropriate? You cannot include data at the scale extremes in the model if you use it. There have been a couple of discussions here on bounded psych responses, and I suggested the ZOIB as decent at some point (Sometimes I R: How to analyze visual analog (slider) scale data?) but no longer think it is a very good model for such data. I suspect this choice is more influential for your inference down the road than is the choice of prior.

What is the default brms prior on phi (if there is one)? How would you like to improve on it? Is your data sparse (want to make it more informative)? Do you have good info on potential locations?

Hi Matti, thanks and yes I’ve seen your post - it was very nice to read. I really quite like the beta distribution for quite a bit of the data I’ve been collecting. So long as the responses are typically not getting right at the extremes it seems to work quite well - in my experience the extremes occur more often with online questionnaires, where some people seem to prefer to just flick their mouse in one direction or another. What is the reason you don’t like the ZOIB so much any more?

When I ask brms to ‘get_prior’ for this sort of model, it says

gamma(0.01, 0.01) phi

However, I haven’t quite managed to work out which parameterisation of the gamma this is, so that I can explore and plot it. It seems to be rather vague, because if I do not set my own prior on phi and run a prior predictive check, it hits a large number of divergent transitions, whereas it runs very easily with the above prior of mine.
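On the parameterisation: Stan's gamma distribution takes a shape and a rate (inverse scale), so gamma(0.01, 0.01) should correspond to `dgamma(x, shape = 0.01, rate = 0.01)` in R. A minimal sketch for exploring it follows; the choice of quantiles is arbitrary.

```r
shape <- 0.01
rate  <- 0.01

prior_mean <- shape / rate        # 1
prior_sd   <- sqrt(shape) / rate  # 10

# Quantiles show where the mass actually sits
phi_quantiles <- qgamma(c(0.05, 0.50, 0.95), shape = shape, rate = rate)

# Share of prior mass on phi values below 1
mass_below_one <- pgamma(1, shape = shape, rate = rate)

print(phi_quantiles)
print(mass_below_one)
```

Although the mean is 1, the quantiles make clear that nearly all of the mass is piled up very close to zero, with a long thin tail.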

Running the prior predictive without setting a prior, and accepting thousands of divergent transitions which could of course affect whether the phi parameter reflects the prior properly, it looks like this:

So I am thinking it is sort of just assigning near-uniform low probability over a large range of values (or crazy high probability at close to 0)?

Sorry I don’t have much to say about the prior right now. I would ask what you are trying to achieve with the prior. That is, why do you want to change from the default? Do you just want to get rid of the divergences?

The beta model is not good imho because it rejects the possibility, however rare in your data, that responses are at the extremes. (I’ve worked quite a bit with confidence etc. and have found that responses are very often at the extremes, but that obviously depends quite a bit on the exact materials.) The ZOIB model is not great because it is practically cumbersome (two different effects, one for the continuous component, one for the binary; gazillion parameters) and is not well justified theoretically as a description of the data generating process (what’s going on when a participant makes a rating, i.e. do they make a decision between the continuous and binary response, is the beta density any good, etc.).

I’m not sure about the prior. You could draw samples of the prior and visualize to get a better idea.

One of the key things I’m trying to do is really just to understand what I am actually telling the model to do. So in this case, it is also a bit more of a learning exercise than modeling data right now. I don’t like that I don’t understand the default prior and always like to be able to set my own - especially if it means I can properly run a prior predictive check.

From looking at the sample from phi using the default prior, it seems that the vast majority of phi values it entertains are all really close to 0, for example:

[1] 1.799607e-147 3.283728e+01 2.630090e-59 3.877169e-100 4.389716e-01 1.956723e-72 1.133318e-04

So every so often there is a beta distribution that has a visible peak - in this case, 5 times out of 100, and all the rest are flat with extremes at 0 and 1 - so that would certainly not be my prior expectation for the data and if it is what the prior anticipates I would quite like to change that.
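The "visible peak" observation can be checked against the printed values. A beta density has an interior peak only when both shape parameters exceed 1, which for a mean of 0.5 means phi above 2. The sketch below applies that rule; `has_peak` is a hypothetical helper and the mean of 0.5 is just the value used in the plots above.

```r
# The seven phi values printed above
phi_sample <- c(1.799607e-147, 3.283728e+01, 2.630090e-59, 3.877169e-100,
                4.389716e-01, 1.956723e-72, 1.133318e-04)

# Interior peak requires shape1 = mean * phi > 1 and shape2 = (1 - mean) * phi > 1
has_peak <- function(phi, mean = 0.5) {
  mean * phi > 1 & (1 - mean) * phi > 1
}

peaked <- has_peak(phi_sample)
print(peaked)
```

Of these seven draws only the second (about 32.8) clears the threshold; the rest put the density at the two ends of the scale.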

I understand about the ZOIB not really being intuitive or necessarily matching the presumed underlying process. On the other hand, I expect there is probably an individual difference or state variable that might even predict the sort of person, mood, or question, that elicits large numbers of people going to one extreme or the other. Interesting to think about…

Is there another reason why you are modeling phi with identity link? That will make it difficult to include predictors on it. The gamma prior is chosen because of the identity link, otherwise it will get a t prior.

Sure - it’s possible to add a predictor on phi, and then brms usually models it on the log scale. In this case, I was predicting the mean from a group designation, so I could also predict phi from group designation and allow for unequal variances. If I do that and try to run it as a prior-only analysis, however, it doesn’t run because it requests a user-specified prior on the phi parameter. So that would leave me with the same question - what a good prior on phi might look like if I want to set one! Like I say, it sometimes seems difficult to know what the defaults are, which is why I would like to be setting my own.
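When phi is on the log scale, a prior on its intercept is a prior on log(phi), so one quick way to see what it implies is to push its quantiles through `exp()`. The normal(2, 1) below is a placeholder chosen purely for illustration, not a recommendation; in brms I believe such a prior would be attached with `class = "Intercept", dpar = "phi"`.

```r
# Hypothetical prior on the log(phi) intercept: normal(2, 1)
log_phi_mean <- 2
log_phi_sd   <- 1

probs <- c(0.025, 0.25, 0.50, 0.75, 0.975)

# Back-transform the quantiles from the log scale to the phi scale
implied_phi <- exp(qnorm(probs, mean = log_phi_mean, sd = log_phi_sd))
names(implied_phi) <- paste0(100 * probs, "%")

print(round(implied_phi, 1))
```

Because quantiles are preserved under monotone transformations, this gives the exact implied quantiles for phi without any sampling.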

You seem to have done most of the work on checking the prior (good job!), I don’t think there is much more to advise in general beyond prior predictive checks. I do think that @matti 's points about model plausibility are important to think about, but in the end, you know your data best and you are the ultimate judge.

I think brms’s philosophy for setting default priors is a mix of “very weakly informative” (i.e. close to uniform) and “what has been frequently used in the literature”. Especially for less frequently used models (which you seem to have due to the identity link) it is quite possible nobody really put any thought into the default prior - gamma(0.01,0.01) is just a frequently used prior for strictly positive parameters in general. Feel free to override it.

Thanks for the feedback Martin. Do you have any thoughts on how ‘informative’ this prior might be considered? I understand that this is not a well-defined term and is context specific, but I was thinking it might be considered ‘weakly informative’ (but not very weakly informative) in the sense that it is based more on a domain-general understanding of these types of responses and also allows a wide range of phi values. A fully ‘informative’ prior, by contrast, might be one that is actually drawn from previous research on the particular research question and that might be more restrictive. As a general impression, do you think that is about right, or is it something that is just too underspecified and relative to say?

I honestly don’t know for certain :-) I think you make a good case for this being a sensible weakly informative prior, but I don’t understand these models well enough to have a strong opinion, @matti is clearly more knowledgeable and I would refer to him. Only other suggestion I have is to try to do a full prior predictive check - draw from the full prior including the predictors and generate some actual answers to see how the prior for phi interacts with the rest of the priors.
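A full prior predictive check of this kind can be sketched in base R without fitting a model. The normal(0, 1) and normal(0, 0.5) priors on the logit-scale coefficients are placeholders I made up for illustration; only the student_t(2.5, 0.1, 10) prior on phi comes from this thread, and it is assumed to be truncated at zero.

```r
set.seed(2)
n_draws <- 1000

# Hypothetical priors on the logit scale for a two-group comparison
b_intercept <- rnorm(n_draws, 0, 1)    # logit mean of group a
b_group     <- rnorm(n_draws, 0, 0.5)  # difference for group b

# phi from student_t(2.5, 0.1, 10) truncated at zero (inverse-CDF draw)
p_zero <- pt((0 - 0.1) / 10, df = 2.5)
phi <- 0.1 + 10 * qt(runif(n_draws, p_zero, 1), df = 2.5)

# One simulated response per group per prior draw
mu_a <- plogis(b_intercept)
mu_b <- plogis(b_intercept + b_group)
y_a <- rbeta(n_draws, mu_a * phi, (1 - mu_a) * phi)
y_b <- rbeta(n_draws, mu_b * phi, (1 - mu_b) * phi)

# How often does the prior predict responses essentially at the extremes?
extreme_share <- mean(y_a < 0.01 | y_a > 0.99)

print(summary(y_a))
print(extreme_share)
```

Looking at `y_a` and `y_b` together, rather than at phi alone, shows how the precision prior interacts with the priors on the means.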

To get ahead with this it might be helpful to get a fully reproducible example (if [a subset of] data can be shared). Generally, with the kinds of variables mentioned, I’d find it very difficult to think of a non-item-specific prior for phi. And I suspect that priors on the coefficients on phi (not just the intercepts) would be important.

What I’m working on is actually a really quite basic tutorial to try and explain the implications of using different types of priors (how specific/informative they are), and precisely how to specify and visualise them. The data is simulated, and it’s meant for an audience that is not going to be into using lots of equations or maths to understand it; the aim is rather to give people an intuitive and visual sense of different priors and what they actually are. I’m using beta regression in part also because I think it is useful to help people get to grips with coming up with priors when they have to think in terms of transformed parameters (like the logit link for the mean of a beta distribution).
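As a small illustration of thinking through the logit link, the sketch below back-transforms normal priors of different widths from the logit scale to the 0-1 response scale. The three standard deviations are arbitrary values picked for contrast, not priors used anywhere in this thread.

```r
# Hypothetical normal(0, sd) priors on the logit-scale mean
prior_sds <- c(narrow = 0.5, moderate = 1.5, wide = 5)
probs <- c(0.025, 0.5, 0.975)

# Rows: prior sd; columns: implied quantiles on the 0-1 response scale
implied_mean <- t(sapply(prior_sds, function(s) plogis(qnorm(probs, 0, s))))
colnames(implied_mean) <- c("2.5%", "50%", "97.5%")

print(round(implied_mean, 3))
```

A prior that looks generously wide on the logit scale ends up pushing most of its mass towards 0 and 1 on the response scale, which is easy to miss without this kind of back-transformation.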

So the data is super simple, as is the model - a simple group comparison. The initial intention is simply to show how to put different kinds of priors on the expected group means, and then on a general phi parameter (and I would later show how e.g., one could allow for phi to also be predicted by group). The rprop function below comes, I think, from the extraDistr package (you use it in your ZOIB tutorial).

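library(tibble)      # tibble()
library(dplyr)       # bind_rows()
library(extraDistr)  # rprop(): beta draws parameterised by mean and precision

group1 <- tibble(group = "a", value = rprop(n = 50, size = 7, mean = .58))

group2 <- tibble(group = "b", value = rprop(n = 50, size = 7, mean = .44))

# Stack the two groups row-wise (a join is not needed here)
sample <- bind_rows(group1, group2)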
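To make the simulation settings concrete, here is a short sketch of what mean and size (phi) translate to in terms of the usual beta shape parameters and the standard deviation of the responses. It assumes, as I understand rprop's defaults, that shape1 = mean * size and shape2 = (1 - mean) * size; `beta_summary` is a hypothetical helper.

```r
# Mean/precision to shape parameters and implied standard deviation
beta_summary <- function(mean, phi) {
  c(shape1 = mean * phi,
    shape2 = (1 - mean) * phi,
    sd     = sqrt(mean * (1 - mean) / (1 + phi)))
}

group_a <- beta_summary(mean = 0.58, phi = 7)
group_b <- beta_summary(mean = 0.44, phi = 7)

print(rbind(a = group_a, b = group_b))
```

The standard deviation column gives a feel for how much spread a phi of 7 implies on the 0-1 response scale.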

The contextual idea is to understand the reported confidence of two groups, in terms of their estimate for how likely they would be to get into a program for which the objective chance is said to be around 50%, but one group is anticipated to be slightly less confident than they should be in an ‘objective’ sense, and the other group slightly more confident than they should be.