Interpretation of multinomial regression coefficients

I think you should in fact leave out one category and treat it as the reference, e.g. by fixing x_beta[i][1] = 0 for all i.

I recently discussed a similar issue in the thread "Two questions: ①Rejecting initial value but still sampling. ②regarding divergent transitions" (note that categorical_logit and multinomial_logit already include a softmax transformation). Feel free to ask for clarification if anything there is hard to follow.

What seems to happen is that Stan is able to sample the posterior despite it being ill-behaved, but because the model has too many degrees of freedom, you don’t learn much about the individual parameters, as witnessed by the very wide marginal posteriors. Softmax is unchanged when the same constant is added to every element, so the data cannot pin down the overall level of the coefficients. I would, however, expect that if you looked at samples of the transformed values from softmax(beta[1]'), the posterior would be reasonably narrow.
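To make the extra degree of freedom concrete, here is a minimal sketch in plain Python (not Stan). The softmax helper and the numbers in beta are made up purely for illustration and are not taken from your model. It shows that shifting every coefficient by the same constant leaves the category probabilities unchanged, and that subtracting the first element (which pins it to 0) picks out one representative of each such family.

```python
import math

def softmax(v):
    # Hypothetical helper for illustration; subtracting the max is only for numerical stability.
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up linear predictors for three categories (assumed values, not from the model).
beta = [0.3, -1.2, 2.0]

# Adding the same constant to every element: very different raw values.
shifted = [b + 5.0 for b in beta]

# Pinning the first category to zero by subtracting its value from all elements.
pinned = [b - beta[0] for b in beta]

p = softmax(beta)
p_shifted = softmax(shifted)
p_pinned = softmax(pinned)

# All three parameter vectors imply the same category probabilities.
same_after_shift = all(abs(a - b) < 1e-12 for a, b in zip(p, p_shifted))
same_after_pin = all(abs(a - b) < 1e-12 for a, b in zip(p, p_pinned))
print(same_after_shift, same_after_pin)
```

Since beta, shifted and pinned all give the same likelihood, only the prior distinguishes them, which would be consistent with wide marginals on the raw coefficients and a much narrower posterior on the softmax-transformed values.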

Best of luck with your model!