Possibility of Logit Parameterized Multinomial in Stan

I saw in the latest Stan release that there is now a logit parameterized glm categorical distribution (https://mc-stan.org/docs/2_23/functions-reference/categorical-logit-glm.html).

In the logit parameterized glm categorical distribution, you have M outcomes y\in\{1,\cdots,N\}^M, design matrix x\in\mathbb{R}^{M\times K}, intercepts \alpha\in\mathbb{R}^N, and regression coefficients \beta\in\mathbb{R}^{K\times N}. categorical_logit_glm calculates the log pmf corresponding to y_i\sim\textsf{Categorical}(softmax(\alpha + x_i \beta)).

Now suppose that some number m of the M data points share the same row of covariates in the design matrix (this can happen when all of the covariates are discrete). Based on https://github.com/stan-dev/math/blob/develop/stan/math/prim/prob/categorical_logit_glm_lpmf.hpp (my C++ isn’t great, so it’s hard to tell), I believe categorical_logit_glm will repeat the same computation for each of those m data points, even though it only needs to be done once; the main concern is doing a log_sum_exp m times when once would suffice. More generally, you could have multinomial rather than purely categorical data. Personally, I’m motivated by using Stan for log-linear modelling of multivariate categorical data.
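To spell out where the saving would come from, here is the sum of the m categorical log pmfs for data points that share one covariate row. The symbols \eta (the shared linear predictor \alpha + x_i \beta) and c_n (the number of the m points with outcome n) are notation introduced just for this sketch:

```latex
\sum_{j=1}^{m} \log \textsf{softmax}(\eta)_{y_j}
  = \sum_{j=1}^{m} \Bigl( \eta_{y_j} - \log \sum_{n=1}^{N} e^{\eta_n} \Bigr)
  = \sum_{n=1}^{N} c_n \eta_n \;-\; m \log \sum_{n=1}^{N} e^{\eta_n}
```

The log_sum_exp term depends only on the shared row, so in principle it could be evaluated once and multiplied by m rather than recomputed for each of the m points.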

This repetition of computations could be avoided by passing in the data as a matrix of counts y\in \mathbb{N}^{N \times M}, and having something like a multinomial_logit_glm that calculates the log pmf corresponding to each column y_i\in\mathbb{N}^N being multinomial, i.e. y_i\sim\textsf{Multinomial}(N_i, softmax(\alpha + x_i \beta)), where N_i=\sum_{n=1}^N y_{n,i} is the total count for covariate row x_i.
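As a rough illustration of the idea (plain Python, not Stan, and not how Stan implements anything), the sketch below compares a per-observation categorical log pmf with a count-based multinomial log pmf. The function names are hypothetical, the toy numbers are made up, and the counts are stored one list per unique covariate row for convenience. The two results should differ only by the log multinomial coefficients, which do not depend on \alpha or \beta.

```python
import math

def log_softmax(eta):
    """Numerically stable log-softmax of a list of linear predictors."""
    top = max(eta)
    lse = top + math.log(sum(math.exp(e - top) for e in eta))
    return [e - lse for e in eta]

def linear_predictor(alpha, x_row, beta):
    """eta_n = alpha_n + sum_k x_row[k] * beta[k][n]; beta is K rows by N columns."""
    return [a + sum(xk * bk[n] for xk, bk in zip(x_row, beta))
            for n, a in enumerate(alpha)]

def categorical_glm_lpmf_sketch(y, x, alpha, beta):
    """One log-softmax per observation; y holds 1-based category labels."""
    return sum(log_softmax(linear_predictor(alpha, xi, beta))[yi - 1]
               for yi, xi in zip(y, x))

def multinomial_glm_lpmf_sketch(counts, x, alpha, beta):
    """One log-softmax per count vector; includes the multinomial coefficient."""
    total = 0.0
    for c, xi in zip(counts, x):
        lp = log_softmax(linear_predictor(alpha, xi, beta))
        log_coef = math.lgamma(sum(c) + 1) - sum(math.lgamma(ck + 1) for ck in c)
        total += log_coef + sum(ck * l for ck, l in zip(c, lp))
    return total

# Toy data (assumed values): N = 3 categories, K = 2 covariates.
alpha = [0.1, -0.2, 0.0]
beta = [[0.5, -0.3, 0.0],
        [0.2, 0.4, 0.0]]

# Aggregated form: two unique covariate rows with their outcome counts.
x_unique = [[1.0, 0.0], [0.0, 1.0]]
counts = [[2, 1, 0], [0, 1, 3]]

# Expanded form: the same data, one row per observation.
x_long = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 4
y_long = [1, 1, 2, 2, 3, 3, 3]

cat = categorical_glm_lpmf_sketch(y_long, x_long, alpha, beta)
multi = multinomial_glm_lpmf_sketch(counts, x_unique, alpha, beta)

# The gap is the sum of log multinomial coefficients, here log(3) + log(4).
print(multi - cat, math.log(3) + math.log(4))
```

The aggregated version evaluates the log-softmax twice instead of seven times here; the gap grows with the number of observations per unique covariate row.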

Would it be possible for a logit parameterized glm multinomial distribution to be added to Stan?


There’s an open issue for (non-glm) multinomial_logit

It’s pretty old; I guess no one got around to implementing it.

As for glm, we don’t have binomial_logit_glm either.
