I was looking at using the simplex type in a model, rather than the non-identified softmax transform. I have some questions about the simplex transform and how it scales to large simplexes.
- The source code refers to a “centered stick breaking process.” Is this the same as the regular stick-breaking process referred to in the manual?
- Why is the transform done on the real scale rather than the log scale? It seems like the log scale might be more stable, especially if you are interested in `log(p)` in the end anyway. But maybe I’m overlooking something?
- Given the way the simplex is constructed, it seems like the ordering of the variables might be important for numerical stability/efficiency. Is it?
- It looks like a simplex of size `k` would have the derivative propagate through `k-1` nodes, but if it were constructed through recursive splitting, it would only have to propagate through `2*log(k)` nodes, which might be more numerically stable for large simplexes.
Are my intuitions correct here, or are these concerns wrong or unimportant?
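
To make the comparison concrete, here is a minimal Python sketch of the two constructions I have in mind, both written on the log scale. It assumes the stick-breaking transform as I understand it from the manual, with an offset so that an all-zero input maps to the uniform simplex; I have not confirmed that this matches the source code. `tree_log_simplex` is a hypothetical recursive-splitting variant of my own, and all function names are illustrative only.

```python
import math


def log_inv_logit(u):
    # log(1 / (1 + exp(-u))), branching on the sign of u to avoid overflow.
    if u >= 0:
        return -math.log1p(math.exp(-u))
    return u - math.log1p(math.exp(u))


def stick_breaking_log_simplex(y):
    """Map y in R^(K-1) to K log-probabilities via sequential stick breaking."""
    K = len(y) + 1
    log_p = []
    log_remaining = 0.0  # log of the stick length not yet allocated
    for k, y_k in enumerate(y):
        # Assumed offset: with y = 0 every component comes out as 1/K.
        u = y_k - math.log(K - k - 1)
        log_p.append(log_remaining + log_inv_logit(u))
        # log(1 - inv_logit(u)) == log_inv_logit(-u)
        log_remaining += log_inv_logit(-u)
    log_p.append(log_remaining)
    return log_p


def tree_log_simplex(y, log_mass=0.0):
    """Hypothetical variant: split the mass recursively in a balanced tree."""
    n = len(y) + 1  # number of leaves under this node
    if n == 1:
        return [log_mass]
    n_left = n // 2
    n_right = n - n_left
    # Offset chosen so that y = 0 again yields the uniform simplex.
    u = y[0] + math.log(n_left / n_right)
    left = tree_log_simplex(y[1:n_left], log_mass + log_inv_logit(u))
    right = tree_log_simplex(y[n_left:], log_mass + log_inv_logit(-u))
    return left + right
```

In the first function the last component depends on a running sum over all `k-1` break points, whereas in the second each component depends only on the splits along its root-to-leaf path, which is roughly `log2(k)` of them. Whether that difference matters in practice is exactly what I am asking about.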