There seems to be a difference between people’s descriptions of ‘multinomial’ and ‘categorical’ multilevel models on internet forums, mc-stan posts, and Stack Exchange posts. I thought that a categorical variable is equivalent to a multinomial variable, meaning a variable with multiple, unordered categories. Surely, I’m missing a piece of the puzzle. Can someone please explain whether there is actually a difference between the two terms, in theory and in brms, and if so, what the difference is?
I have two more questions about the priors of my particular multilevel model. Before that, here is some background information on the type of data I have:
Predictor variables (2): both categorical, with 3 and 4 categories, N = 25
Control variable (1): categorical, 25 categories, N = 25
There are levels (a hierarchy) in my data: the outcome variable is at level 1 and the predictors and the control variable are at level 2.
Goal: Is there a correlation between the predictor variables and the outcome variable, taking the control variable into account?
For example, 168 students responded to a question that had 6 possible answers (outcome var.). These 168 students are unequally distributed across 25 schools (control var.). Each of these 25 schools has 2 characteristics (predictor var.). Do students’ answers correlate with the kind of school they attend, controlling for the school itself?
Model:
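```r
library(brms)

# Weakly regularizing prior on the population-level coefficients (class "b")
ex.prior <- c(prior_string("normal(0,1)", class = "b"))

fit <- brm(
  formula = GS ~ IS + SS + (1 | School),  # varying intercept per school
  family = categorical(link = "logit"),
  prior = ex.prior,
  data = pdata,
  cores = 3,
  control = list(adapt_delta = 0.9)
)
```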
When running get_prior() on my potential model, there are 52 priors that I have to set. I’ve read that it is more efficient to create priors as strings (i.e., with prior_string()), but I am unsure whether I should set priors on all 52 of the parameters or not. The Bayesian part of my brain shouts “yes!”, but in practice, the priors that I set often return errors. I would just like to use weak, regularizing priors, and I am a bit confused as to how to do this for the different kinds of parameters. Do I set priors on all of the classes, the coefs, the groups, or the dpars?
As mentioned above, I want to use weak, regularizing priors, as I’m still trying to understand priors for categorical variables. From what I’ve read on different forums and papers, it seems that “normal(0,1)” for class b and “cauchy(0,1)” for class sd are used quite often as weak, regularizing priors for categorical models. Does anyone have any recommendations for literature on priors for categorical variables?
For statisticians, the categorical distribution is a special case of the multinomial distribution with a single trial (one observation) and a simplex of category probabilities. Lots of non-statisticians use the phrase “categorical distribution” to mean a multinomial distribution.
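A small base-R sketch of this distinction: a single categorical observation has the same probability as a multinomial count vector with size = 1 (a one-hot vector). The probabilities in p are made up for illustration; dmultinom() is the base R multinomial density.

```r
# Hypothetical category probabilities for a 3-category outcome (they sum to 1)
p <- c(0.2, 0.5, 0.3)

# One categorical observation falling in category 2 has probability p[2]
y <- 2
p_categorical <- p[y]

# The same observation written as a multinomial draw with size = 1 (one-hot counts)
counts <- c(0, 1, 0)
p_multinomial <- dmultinom(counts, size = 1, prob = p)

all.equal(p_categorical, p_multinomial)

# With size > 1 the multinomial instead describes counts over several trials,
# e.g. 10 responses split 2 / 5 / 3 across the categories
dmultinom(c(2, 5, 3), size = 10, prob = p)
```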
You don’t have to set priors on everything shown by get_prior. In fact, you mostly don’t have to set any priors yourself in order to achieve convergence (not that I would necessarily recommend setting no priors yourself). You may also want to look at the documentation of set_prior.
normal(0, 1) or cauchy(0, 1) should be OK if your predictors are roughly of scale 1 and you are using a logit link. normal(0, 1) may be a bit too narrow if your data suggest more extreme patterns. There may be papers about that, but right now I can’t remember where I have seen them.
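To make “on a logit scale” concrete for the categorical family, here is a base-R sketch of the softmax (multinomial logit) transformation. It assumes, as in brms’s categorical family with its default settings, that one category is the reference with its linear predictor fixed at 0; the helper softmax() and all numbers are hypothetical, not brms functions or output.

```r
# Softmax: turns a vector of linear predictors into probabilities summing to 1
softmax <- function(eta) {
  exp_eta <- exp(eta - max(eta))  # subtract the max for numerical stability
  exp_eta / sum(exp_eta)
}

# Hypothetical 3-category outcome; category 1 is the reference (eta fixed at 0)
eta_baseline <- c(0, 0, 0)  # all categories equally likely
eta_shifted  <- c(0, 1, 0)  # a coefficient of +1 on category 2 (logit scale)

softmax(eta_baseline)  # 1/3 each
p <- softmax(eta_shifted)
p

# A coefficient of +1 multiplies the odds of category 2 vs. the reference by exp(1)
p[2] / p[1]  # equals exp(1), about 2.72
```

Read this way, a prior such as normal(0, 1) on a coefficient is a statement about log-odds ratios between outcome categories, not about the distribution of the predictor itself.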
If I understand your comment correctly, what I described above are categorical variables and should also be treated as categorical in brms (and not as multinomial, and therefore not with MCMCglmm). Is that correct?
I don’t know what you mean by “scale of 1”. Could you please clarify what you mean by:
if your predictors are roughly of scale 1
I’m trying to understand how brms treats categorical variables (outcomes and predictors) in the multilevel model to set my priors accordingly. Are categorical variables treated as binary in the model?
If my priors were normal(0,1) on a categorical variable, then would I be expecting a normally distributed variable with a mean of 0 and sd of 1? When would a broader prior like normal(0,10) be appropriate for a categorical variable?
I meant a standard deviation of about 1, or at least not orders of magnitude smaller (say, sd = 0.1).
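One way to check “roughly of scale 1” is to compare the standard deviations of the predictor columns. The sketch below uses made-up data (the names dummy and income are hypothetical): a 0/1 dummy column, which is what a dummy-coded categorical predictor turns into, never has an sd much above 0.5, whereas a continuous predictor in raw units can be orders of magnitude larger unless it is standardized, e.g. with scale().

```r
set.seed(1)

# Hypothetical predictors: a 0/1 dummy and a continuous variable in large units
dummy  <- rbinom(200, size = 1, prob = 0.4)
income <- rnorm(200, mean = 30000, sd = 8000)

sd(dummy)   # at most about 0.5 for any 0/1 variable
sd(income)  # in the thousands, so a normal(0, 1) prior would mean something very different

# Centre and divide by the sd to put the predictor on scale 1
income_z <- as.numeric(scale(income))
sd(income_z)
```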
Are categorical variables treated as binary in the model?
Not sure I understand the question.
There is no general rule when one prior is more appropriate than the other. It all depends on the goal of inference.
If you don’t have much experience with setting priors, I would probably use a slightly wider prior, say normal(0, 3) or normal(0, 5) on the logit scale, but that’s really just an ad-hoc recommendation I wouldn’t take too seriously.
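A rough prior predictive check can help when choosing between normal(0, 1), normal(0, 3), normal(0, 5) or wider priors on the logit scale: draw log-odds values from each prior, map them to probabilities with plogis() (the inverse logit), and see how much prior mass lands on near-certain probabilities. This looks at a single log-odds quantity in isolation, a simplification of a full categorical model, and the 0.01/0.99 cut-offs are an arbitrary choice for illustration.

```r
set.seed(123)
n_draws   <- 1e5
prior_sds <- c(1, 3, 5, 10)

# For each prior sd: draw log-odds, convert to probabilities, and record the
# share of prior draws implying near-certain outcomes
share_extreme <- sapply(prior_sds, function(s) {
  log_odds <- rnorm(n_draws, mean = 0, sd = s)
  p <- plogis(log_odds)         # inverse logit
  mean(p < 0.01 | p > 0.99)
})
names(share_extreme) <- paste0("normal(0, ", prior_sds, ")")
round(share_extreme, 3)
```

Wider priors put more and more mass on probabilities close to 0 or 1, which is one reason very wide priors such as normal(0, 10) are less “uninformative” on the logit scale than they look.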
The reason I asked about multinomial vs. categorical variables is that they are listed separately in Table 1 of the brms overview paper (“brms: An R Package for Bayesian Multilevel Models using Stan”). I was trying to understand the difference so that I could make an informed decision about the kind of model I want to run (MLM vs. MCMCglmm).
I’m sorry for not being very clear in my question. I’ll try to explain it a bit better here.
My confusion lies in how to set the priors for my categorical predictors. When the variable is numeric/continuous, I find it quite straightforward to set a prior. But what I don’t understand yet is how to set priors on a variable when the categories are A, B, C, D, E, F.
Are the categorical variables treated by brms as numeric, in that the category “A” becomes “1” (B=2, C=3, …)?
Is the prior on a categorical variable equivalent to the probability of receiving category X versus all other categories?
Are the priors set on all of the categories as a whole (A-F) or on each individual category level (e.g., category C of variable X)? For example, say we have the variable “school” with 4 different schools. Do I set a prior for each of the 4 schools? And if so, what does the prior actually do? For example, if I set a prior on school 1 that was normal(0,1), does this mean that the coefficient of school 1 has a normally distributed mean probability of 0 with an sd of 1?
Hope I’ve used the right lingo in the right places…
Categorical predictors in brms are treated in the same way as in other packages, since I use the same underlying functions. That is, we use dummy coding by default.
That depends on the coding. If you are not familiar with coding of categorical variables, I am sure there are some nice tutorials for this on the internet.
Again, what the parameters mean, and thus what we put a prior on, depends on the coding.
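To illustrate the default dummy (treatment) coding mentioned above, here is a base-R sketch with a hypothetical school factor with 4 levels. model.matrix() shows that the levels are not recoded as 1, 2, 3, 4; instead the first level becomes the reference and every other level gets its own 0/1 column.

```r
# Hypothetical predictor: 4 schools stored as a factor (levels sorted alphabetically)
school <- factor(c("S1", "S2", "S3", "S4", "S2", "S3"))

# Default treatment contrasts: S1 is the reference level
contrasts(school)

# Design matrix: an intercept plus one 0/1 column per non-reference level
X <- model.matrix(~ school)
colnames(X)  # "(Intercept)" "schoolS2" "schoolS3" "schoolS4"
X
```

Under this coding, a prior on class b applies to each of the three school coefficients, and each coefficient is the difference in log-odds (for a given non-reference outcome category, in a categorical model) between that school and the reference school. Other codings, for example contr.sum(), change what these coefficients mean and hence what the prior is about.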
I also have a question about the categorical model family. I guess it is more of a statistics, rather than a Bayesian or brms question. If I should start a new post, please let me know and I will do that.
Data: Participants made drawings of 6 different emotions. They were allowed to choose 1 out of 16 colours to draw an emotion (1 colour per emotion). Colours were chosen without replacement (so if they picked a colour for one emotion they could not use the same colour again for the next drawings).
Is the categorical model family appropriate for this type of outcome variable (which colour that they picked) or does the model assume that categories can be picked again?
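A sketch of the distinction behind this question, using hypothetical probabilities for 4 colours instead of 16: a standard categorical likelihood treats each response as an independent draw in which every category is still available, whereas choosing without replacement means the chosen colour drops out and the remaining probabilities are renormalized. The sequential, renormalized likelihood is a Plackett-Luce-style formulation written only for illustration; it is not a brms family, and lik_without_replacement() is a hypothetical helper.

```r
# Hypothetical colour probabilities (the real data have 16 colours)
p <- c(red = 0.4, blue = 0.3, green = 0.2, yellow = 0.1)

# One participant's choices for 3 emotions, in order
choices <- c("red", "blue", "green")

# Independent categorical draws: every colour stays available at every draw
lik_independent <- prod(p[choices])

# Draws without replacement: after each pick the chosen colour is removed
# and the remaining probabilities are renormalized
lik_without_replacement <- function(p, choices) {
  available <- p
  lik <- 1
  for (ch in choices) {
    lik <- lik * available[[ch]] / sum(available)
    available <- available[names(available) != ch]
  }
  lik
}

lik_independent
lik_without_replacement(p, choices)
```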