Prior and Model Selection for Multinomial Logistic Regression

Dear Stan Community,

For the analysis of a long-term study, I would like to fit a multinomial logistic regression to predict membership in different progression groups (e.g. previously healthy and now ill, or previously healthy and still healthy) from several predictors. So far, I have only had experience with frequentist models in this context and have used packages such as mlogit. Now I would really like to try my hand at Bayesian approaches, and I find the brms package very clear and descriptive.

However, my prior knowledge of model specification is very limited. Does anyone have tips or sources for the brms code for a multinomial logistic regression (is family = categorical sufficient?), and how do I choose sensible noninformative to weakly informative priors for the betas in this model?

I would be grateful for any answer or piece of advice!

Can you tell us more about your data, such as the number of observations, the number of target categories, the number of predictors, and any multilevel/hierarchical structure? Do you plan to include interaction or smooth terms? This information would help determine how carefully you need to think about the priors: if you have many observations and few categories and predictors, the data will dominate anyway.

A couple of examples of multinomial modeling with brms

Dear avehtari,
thank you very much for your response!

My data comprise approx. N = 1200 persons. The surveys have been conducted in 4 waves so far. There are also missing data.

The number of categories should be 4 to reflect the different trajectories:
Healthy - Healthy, Healthy - Sick, Sick - Healthy and Sick - Sick.
The reference category should probably be Healthy - Healthy.
As predictors, I would like to use several numerical variables (approx. 3) in the form of questionnaire sum scores as well as age (numerical, continuous) and gender (factor) as demographic variables.
I have not yet considered a hierarchical structure or nesting, as I am basically only interested in the comparison between the first and the last survey.

Thank you in advance for your advice!

Sounds like you have enough data and few enough predictors that you can start with weak priors, use prior-likelihood sensitivity analysis (e.g. with the priorsense package, which supports brms) to check whether the data are sufficiently informative, and then think harder about the priors if needed. You can post your model checking results here for further comments.
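
As a rough sketch of that workflow (not a definitive recipe), something along these lines could be a starting point. The data are simulated stand-ins, and all column names, the category labels HH/HS/SH/SS, and the normal(0, 1) prior scale are assumptions for illustration; the prior scale in particular presumes standardized numeric predictors. Running get_prior() with the same formula, data, and family lists the parameter names brms actually uses, which is worth checking before setting priors.

```r
library(brms)
library(priorsense)

# Simulated stand-in data; every column name and label here is hypothetical.
set.seed(1)
n <- 1200
d <- data.frame(
  score1 = rnorm(n),  # questionnaire sum scores, assumed standardized
  score2 = rnorm(n),
  score3 = rnorm(n),
  age    = rnorm(n),  # age, assumed standardized
  gender = factor(sample(c("f", "m"), n, replace = TRUE)),
  trajectory = factor(
    sample(c("HH", "HS", "SH", "SS"), n, replace = TRUE),
    levels = c("HH", "HS", "SH", "SS")
  )
)

# One weakly informative prior per non-reference category.
# brms names the linear predictors "mu" + category label.
priors <- c(
  set_prior("normal(0, 1)", class = "b", dpar = "muHS"),
  set_prior("normal(0, 1)", class = "b", dpar = "muSH"),
  set_prior("normal(0, 1)", class = "b", dpar = "muSS")
)

fit <- brm(
  trajectory ~ score1 + score2 + score3 + age + gender,
  data   = d,
  family = categorical(link = "logit", refcat = "HH"),
  prior  = priors,
  chains = 4, cores = 4, seed = 1
)

# Posterior predictive check of the category counts.
pp_check(fit, type = "bars")

# Power-scaling diagnostics for prior and likelihood sensitivity.
powerscale_sensitivity(fit)
```

If the sensitivity table flags prior sensitivity for some coefficients, that is the cue to think harder about those priors; otherwise the data are doing most of the work.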

Thank you very much, I will try that!