@betanalpha is it possible that the whole mistake comes from the fact that we are mapping a K-1 vector to a K vector? Although it is not the softmax that does that per se, I thought I'd just make sure.
I assume our case is number 2 (one-to-many, i.e. a K-1 vector to a K simplex).
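
To make concrete what I mean by mapping K-1 values to a K simplex, here is a rough Python sketch. It is only an illustration, not our actual model code: `to_simplex` and the choice of pinning the last component to 0 are my assumptions, and the numbers are made up. It also shows that the softmax alone leaves the length unchanged, and that shifting all K inputs by a constant gives the same simplex.

```python
import math

def softmax(x):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def to_simplex(free):
    # Hypothetical helper: append a fixed 0 as the reference component,
    # so K-1 free values become a K vector before the softmax.
    return softmax(list(free) + [0.0])

free = [0.5, -1.2, 2.0]        # K-1 = 3 unconstrained values (made up)
theta = to_simplex(free)       # K = 4 simplex

# Softmax on a full K vector is invariant to adding a constant,
# so the padding step, not the softmax, is what changes the dimension.
full = [0.5, -1.2, 2.0, 0.0]
shifted = [v + 3.0 for v in full]
same = all(abs(a - b) < 1e-12 for a, b in zip(softmax(full), softmax(shifted)))
```

Here `theta` has 4 positive entries that sum to 1, and `same` checks the shift invariance up to floating-point tolerance.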