Setting informative priors for multinomial model and comparisons across levels from parameters

Hi everyone,

I’ve been working on a couple of multinomial models, working on eye fixations. For a brief background, I would like to check where the subjects directed their first fixation across a series of trials. There are 4 possible targets, so I would like to estimate the probability that the first fixation would be directed to each target. The subjects belong to one of 5 different groups (different n in each).

For this, my understanding is that I need a model with variable intercept as follows (subjects are clustered in groups from the “strategy” variable):

y | trials(total) ~ 1 + (1 | strategy/subject)

Now, I would like to estimate the parameters for all the outcomes, which I think is accomplished by setting refcat = NA in the brm function.

Some other posts on the topic mentioned that using that setting, I should provide informative priors in order for the model to run properly. My question here would be, which one should I use?

I decided to use a skew_normal distribution centered at -1.0987, which in log odds is the equivalent to 25%, assuming that there is no difference in the allocation of fixations across the 4 targets. Also, I set the sd as a half cauchy distribution. The code looks like this:

first$y <- cbind(first$`1`, first$`2`, first$`3`, first$`4`)  

    first1.1 <- brm(data    = first,
                    family  = multinomial(refcat = NA, 
                                          link   = logit),
                    formula = y | trials(total) ~ 1 + (1 | strategy/subject), 
                    prior   = c(prior(skew_normal(-1.0987, 2, 8), class = Intercept),
                                prior(skew_normal(-1.0987, 2, 8), class = Intercept, dpar = mu1),
                                prior(skew_normal(-1.0987, 2, 8), class = Intercept, dpar = mu2),
                                prior(skew_normal(-1.0987, 2, 8), class = Intercept, dpar = mu3),
                                prior(skew_normal(-1.0987, 2, 8), class = Intercept, dpar = mu4),
                                prior(cauchy(0, 2), class = sd, dpar = mu1),
                                prior(cauchy(0, 2), class = sd, dpar = mu2),
                                prior(cauchy(0, 2), class = sd, dpar = mu3),
                                prior(cauchy(0, 2), class = sd, dpar = mu4)
                     iter    = 2000, 
                     warmup  = 500,
                     chains  = 1, 
                     cores   = 1,  
                     control = list(adapt_delta = 0.9)

I still get a warning after running it:

Specifying global priors for regression coefficients in categorical models is deprecated and may not work as expected

So I am wondering which other priors should I specify.

Finally, for checking the draws from the posterior for differences, I would like to compare each of the four targets to each other, to see if one of them shows a particular preference from the different groups. I am a little bit lost about what method would be more appropriate.

Thanks for your help.

Copy of the data:
first.csv (3.0 KB)

1 Like

Unfortunately I don’t have time answer now, but maybe @MegBallard has time and can help?

1 Like

What is the output from this?

get_prior(data    = first,
          family  = multinomial(refcat = NA, link   = logit),
          formula = y | trials(total) ~ 1 + (1 | strategy/subject))
1 Like

Hi Solomon,

This is what I get:

                    prior     class      coef            group resp dpar nlpar bound
1                         Intercept                                                 
2  student_t(3, 18, 22.2) Intercept                                  mu1            
3    student_t(3, 0, 2.5)        sd                                  mu1            
4                                sd                   strategy       mu1            
5                                sd Intercept         strategy       mu1            
6                                sd           strategy:subject       mu1            
7                                sd Intercept strategy:subject       mu1            
8  student_t(3, 18, 22.2) Intercept                                  mu2            
9    student_t(3, 0, 2.5)        sd                                  mu2            
10                               sd                   strategy       mu2            
11                               sd Intercept         strategy       mu2            
12                               sd           strategy:subject       mu2            
13                               sd Intercept strategy:subject       mu2            
14 student_t(3, 18, 22.2) Intercept                                  mu3            
15   student_t(3, 0, 2.5)        sd                                  mu3            
16                               sd                   strategy       mu3            
17                               sd Intercept         strategy       mu3            
18                               sd           strategy:subject       mu3            
19                               sd Intercept strategy:subject       mu3            
20 student_t(3, 18, 22.2) Intercept                                  mu4            
21   student_t(3, 0, 2.5)        sd                                  mu4            
22                               sd                   strategy       mu4            
23                               sd Intercept         strategy       mu4            
24                               sd           strategy:subject       mu4            
25                               sd Intercept strategy:subject       mu4 

My understanding now, is that I should just specify rows 2, 5, 7, 8, 11, 13, 14, 17, 19, and 20. That way I stopped getting the message. Also, I changed to normal distributions instead of the skew normal I was using, since I think the way a multinomial model works is doing paired logistic regressions for the different options of y, right?

I still haven’t figured out how to answer the following questions:
Is there any meaningful difference between the posterior predictions for each mu parameter?
How to compare the parameter from let’s say mu1 vs mu2 and see if the difference is different from 0?

Thanks for your help!

I’m glad get_prior() helped. I found it really useful when learning about these kinds of models. As to “paired logistic regressions,” I’m not familiar enough with that term to comment.

You then ask:

  • Is there any meaningful difference between the posterior predictions for each mu parameter?

I suppose that really depends on what you mean by “meaningful difference.” Sounds like you’re basically talking about effect sizes, which is domain specific. Since you used the logit link, ask yourself whether the various mu parameters seemed impressively different on the log-odds scale. For me, this is why I actually prefer fitting models like this with the probit link, which basically puts parameters on a z-score scale. In my discipline, z-scores are easier to interpret than logits.

  • How to compare the parameter from let’s say mu1 vs mu2 and see if the difference is different from 0?

You can do this by retrieving the posterior samples with posterior_samples(), making new columns by subtracting one column from another, and summarizing those difference columns.

1 Like

Thanks for taking a look at this, Solomon. Yeah, that was my intuition but I wanted to double check in that. I compared the posterior samples for the parameters to check that. I also have seen in some resources that comparing a prior distribution with the posterior can give a BF for which of the two has more support.

The bottom line is that I want to test is if two parameters are equal or practically equivalent, since that is not something I could do on NHT approach, and that is an assumption that some of the decision strategies I am testing make. As you said, it’s more domain specific to talk about a meaningful difference, and I haven’t found something that can give me a criteria in the literature. That’s the reason I was wondering if there is a “standard” (I know canned methods are not ideal) way of checking this. If there is nothing you (or anyone who jumps in on the thread) can think of, I’ll keep digging. Thanks for your help!

As the docs say, using refcat can lead to all sorts of problems and is not very likely to be what you want - you can’t model all mu as absolute values, because your data only gives you information about the relative proportions of each outcome (that’s why one category is treated as “implied”).

But you want to compare, right?

I usually find it easier to use model predictions to determine differences. For example you can “ask the model” something like “If I repeated the experiment exactly, what would be the mean relative proportion in each category” - for this you would use posterior_epred. I wrote a bit on how this is done elsewhere (in a bit different context), feel free to ask for clarifications

Hope that clarifies more than it confuses :-)

1 Like

That makes a ton of sense! Thanks for linking those responses. Now, I am a bit confused on the different methods to extract information from the posterior (I’ve been looking to different packages so, sorry if I am referencing a mix-match of functions). Regardless of the package, I noticed that I can extract draws (I was using tidybayes::spread_draws) to get the estimates from each iteration, and I can look in there the parameters of interest for the group level random intercepts, which I then need to transform from log odds.

Also, there are functions that use the fitted draws (I found tidybayes::add_fitted_draws and brms::posterior_linpred) and I can give a new formula to get the main effect of the groups.

Finally, there are also functions that work with predicted draws (I checked tidybayes::add_predicted_draws, brms::posterior_predict, and I think your suggestion posterior_epred falls here too). I see the intervals I am getting to vary depending on the method used, and the functions themselves mention this, but I still can’t get around what they do differently and which one would be more appropriate in this case.

What is the difference between these? Sorry if this is a basic question, I’ve been learning how to do this by myself, and maybe I am missing some fundamentals.

The short answer is that:

  • the draws give you posteriors of the individual model parameters (in your case on log-odds scale)
  • posterior_linpred gives you the posterior of the whole linear predictor, i.e. the sum of all effects/terms (in your case on log-odds scale), without any observation noise.
  • posterior_epred gives you the posterior for the mean without observation noise (in most models, this means applying the link function to the result of posterior_linpred).
  • posterior_predict gives you the posterior of the observed values

As I said before, interpreting the individual parameters can get tricky in complex models. Which of the posterior_xxx functions you should choose if you want to use predictions in your inference depends on what you care about. For many “scientific” contexts you care about the difference in means so posterior_epred would be suitable. For many real-world applications, you would care about difference between actually observed values, so posterior_predict would be better. (I can also think of use cases for posterior_linpred but those are a bit contrived).

Does that make sense?

1 Like

Yes, that helps a lot. Thanks for your help Martin!