Softmax versus softmax2

From what I understand, Stan’s softmax is just

```r
softmax <- function(x) {
  exp(x) / sum(exp(x))
}
```

But Koster and McElreath in

Koster, J., & McElreath, R. (2017). Multinomial analysis of behavior: statistical methods. Behavioral Ecology and Sociobiology, 71(9), 1–14.

use an alternative (which shifts the maximum of x to 0) in a custom link function:

```r
softmax2 <- function(x) {
  x <- x - max(x)  # shift so the largest entry becomes 0
  exp(x) / sum(exp(x))
}
```

despite drawing samples with Stan’s

categorical_logit(x)

I understand why the normalization might be preferable for numerical reasons, since exp() of large values can overflow, but it seems the two functions would then produce different counterfactual predictions from the posterior samples. Is there something I am missing?

The Stan math library implements a normalised version of softmax that subtracts the max from the input array.

If we are talking about exact operations on real numbers, rather than limited-precision floating-point numbers, then softmax and softmax2 are equivalent functions.

E.g., let $y = \max_{i=1}^n x_i$. Then

$$
\frac{\exp(x_j)}{\sum_{i=1}^n \exp(x_i)}
= \frac{\exp(-y)}{\exp(-y)} \cdot \frac{\exp(x_j)}{\sum_{i=1}^n \exp(x_i)}
= \frac{\exp(-y)\exp(x_j)}{\sum_{i=1}^n \exp(-y)\exp(x_i)}
= \frac{\exp(x_j - y)}{\sum_{i=1}^n \exp(x_i - y)}.
$$
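In finite precision the two versions do differ in robustness, though: exponentiating large inputs directly overflows, while the max-shifted version stays well defined. A quick numerical sketch of both functions (in Python, since the point is language-agnostic; the two definitions mirror the R functions above):

```python
import math

def softmax(x):
    # direct implementation: exp() can overflow for large inputs
    e = [math.exp(v) for v in x]
    s = sum(e)
    return [v / s for v in e]

def softmax2(x):
    # shift so the maximum entry becomes 0 before exponentiating
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

x = [1.0, 2.0, 3.0]
print(softmax(x))   # same probabilities...
print(softmax2(x))  # ...up to floating-point rounding

# For large inputs the direct version overflows,
# while the shifted version still works:
try:
    softmax([1000.0, 1001.0])
except OverflowError:
    print("direct softmax overflowed")
print(softmax2([1000.0, 1001.0]))
```

For moderate inputs the two agree to machine precision, which is the practical counterpart of the algebraic identity above: counterfactual predictions computed from the same posterior samples should be numerically indistinguishable.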


Fantastic! Thank you, of course. Sorry for a stupid question. :)