Softmax versus softmax2

From what I understand, Stan’s softmax is just

```r
softmax <- function(x) {
  exp(x) / sum(exp(x))
}
```

But Koster and McElreath in

Koster, J., & McElreath, R. (2017). Multinomial analysis of behavior: statistical methods. Behavioral Ecology and Sociobiology, 71(9), 1–14.

use an alternative (which shifts the maximum of x to 0) in a custom link function:

```r
softmax2 <- function(x) {
  x <- x - max(x)  # shift so the largest entry becomes 0
  exp(x) / sum(exp(x))
}
```

despite drawing samples with Stan’s

categorical_logit(x)

I understand why the normalization might be preferable for numerical reasons, since exp() of large values can overflow, but it seems the two functions would then produce different counterfactual predictions from the posterior samples. Is there something I am missing?

The Stan math library implements a normalised version of softmax that subtracts the max from the input array.

If we are talking about exact operations on real numbers, rather than limited-precision floating-point numbers, then softmax and softmax2 are equivalent functions.

E.g., let $y = \max_{i=1}^n x_i$. Then

$$
\frac{\exp(x_j)}{\sum_{i=1}^n \exp(x_i)}
= \frac{\exp(-y)}{\exp(-y)} \cdot \frac{\exp(x_j)}{\sum_{i=1}^n \exp(x_i)}
= \frac{\exp(-y)\exp(x_j)}{\sum_{i=1}^n \exp(-y)\exp(x_i)}
= \frac{\exp(x_j - y)}{\sum_{i=1}^n \exp(x_i - y)}.
$$
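In finite precision the two versions do differ in robustness, though: exponentiating large inputs directly overflows, while the max-shifted version stays well defined. A quick numerical sketch of both functions (in Python, since the point is language-agnostic; the two definitions mirror the R functions above):

```python
import math

def softmax(x):
    # direct implementation: exp() can overflow for large inputs
    e = [math.exp(v) for v in x]
    s = sum(e)
    return [v / s for v in e]

def softmax2(x):
    # shift so the maximum entry becomes 0 before exponentiating
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

x = [1.0, 2.0, 3.0]
print(softmax(x))   # same probabilities...
print(softmax2(x))  # ...up to floating-point rounding

# For large inputs the direct version overflows,
# while the shifted version still works:
try:
    softmax([1000.0, 1001.0])
except OverflowError:
    print("direct softmax overflowed")
print(softmax2([1000.0, 1001.0]))
```

For moderate inputs the two agree to machine precision, which is the practical counterpart of the algebraic identity above: counterfactual predictions computed from the same posterior samples should be numerically indistinguishable.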


Fantastic! Thank you, of course. Sorry for a stupid question. :)