Two questions: ①Rejecting initial value but still sampling. ②regarding divergent transitions

Oh, I see! The problem is this: softmax basically removes one degree of freedom, i.e., a multinomial with N classes is fully described by N-1 parameters (because the probabilities need to sum to 1). With N free inputs, adding the same constant to all of them gives exactly the same probabilities, so the inputs are not identified. This is actually what the simplex type does under the hood: simplex[3] is represented by two parameters on the unconstrained scale. So to make the model work with softmax, you can either fix one of the softmax inputs to 0 (this is the typical way to do multinomial regression) or you need to constrain the parameters to sum to zero (this is a bit more tricky, some discussion at Test: Soft vs Hard sum-to-zero constrain + choosing the right prior for soft constrain).
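
To make the lost degree of freedom concrete, here is a small sketch in plain Python (not Stan, and not taken from your model). The `softmax` helper and the example values are made up for illustration. It shows that shifting all inputs by a constant leaves the probabilities unchanged, and that both fixes (pinning one input to 0, or centering the inputs so they sum to zero) pick one representative from each family of equivalent inputs.

```python
import math


def softmax(x):
    """Numerically stable softmax over a list of floats."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def close(a, b, tol=1e-12):
    """Element-wise comparison with a small tolerance."""
    return all(abs(u - v) < tol for u, v in zip(a, b))


# Hypothetical unconstrained inputs for a 3-class problem.
raw = [0.3, -1.2, 2.0]

# Adding a constant to every input does not change the probabilities.
shifted = [v + 5.0 for v in raw]

# Option 1: pin the last input to 0 (reference category).
pinned = [v - raw[-1] for v in raw]

# Option 2: center the inputs so they sum to zero.
mean = sum(raw) / len(raw)
centered = [v - mean for v in raw]

p_raw = softmax(raw)
same_after_shift = close(p_raw, softmax(shifted))
same_after_pin = close(p_raw, softmax(pinned))
same_after_center = close(p_raw, softmax(centered))

print(same_after_shift, same_after_pin, same_after_center)
```

All four input vectors map to the same point on the simplex, which is why the sampler struggles when all N inputs are left free: the posterior is flat along that shift direction unless the prior pins it down.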

Does that make sense?