From what I understand, Stan’s softmax is just
softmax <- function(x) {
  exp(x) / sum(exp(x))
}
But Koster and McElreath, in
Koster, J., & McElreath, R. (2017). Multinomial analysis of behavior: statistical methods. Behavioral Ecology and Sociobiology, 71(9), 1-14.
use an alternative (one that shifts x so that its maximum becomes 0) in a custom link function:
softmax2 <- function(x) {
  x <- max(x) - x
  exp(-x) / sum(exp(-x))
}
despite drawing the samples with Stan’s
categorical_logit(x)
I understand why shifting the inputs might be favourable, since it avoids exponentiating very large values and overflowing, but the two functions give different counterfactuals when applied to the posterior samples. Is there something I am missing?
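For reference, here is a minimal sketch of how the two definitions above can be compared directly. The test vectors `x` and `big` are made up for illustration and are not taken from the paper or from my posterior. Algebraically, exp(-(max(x) - x)) is exp(x) divided by the constant exp(max(x)), which cancels in the ratio, so I would expect the two to agree unless exp(x) overflows.

```r
# Both definitions exactly as written above.
softmax <- function(x) {
  exp(x) / sum(exp(x))
}
softmax2 <- function(x) {
  x <- max(x) - x
  exp(-x) / sum(exp(-x))
}

# Hypothetical test vector, chosen only for illustration.
x <- c(-1.5, 0, 2, 3.7)

# The constant exp(max(x)) cancels in the ratio, so the two results
# should agree up to floating-point error.
print(isTRUE(all.equal(softmax(x), softmax2(x))))

# Hypothetical extreme inputs: exp(1000) is Inf in R, so the naive version
# breaks down, while the shifted version stays finite.
big <- c(1000, 999, 0)
print(softmax(big))
print(softmax2(big))
```

If the two agree on moderate inputs like this, any difference in the counterfactuals would presumably come from overflow or from how the function is applied to the posterior samples (for example, per row versus over a whole matrix), rather than from the formula itself.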