Originally the transformations were largely motivated by the link functions typical in statistics – the log link function unconstrains positive variables, the logit link function unconstrains interval variables, etc. While an explicit argument has not been made for this choice within Stan, these link functions are typical in statistics for a variety of theoretical and practical reasons. In particular these transformations sit at the intersection of a variety of useful mathematical properties – convexity, relatively uniform curvature (which also means that the Jacobians are well behaved), preservation of algebraic structure – that often manifest in nice practical properties.
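For concreteness, here is a quick sketch of those two maps and their Jacobian factors, written for a lower bound at zero and a generic interval (a, b); the shifted-bound cases work the same way:

$$
\begin{aligned}
x \in (0, \infty): \quad & y = \log x,
  & x &= e^{y},
  & \left|\frac{\mathrm{d}x}{\mathrm{d}y}\right| &= e^{y}, \\
x \in (a, b): \quad & y = \mathrm{logit}\!\left(\frac{x - a}{b - a}\right),
  & x &= a + (b - a)\, \mathrm{logit}^{-1}(y),
  & \left|\frac{\mathrm{d}x}{\mathrm{d}y}\right| &= (b - a)\, \mathrm{logit}^{-1}(y) \left(1 - \mathrm{logit}^{-1}(y)\right).
\end{aligned}
$$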
Alternative parameterizations have occasionally been discussed, but none proved to be substantially better than the current implementations.
One general way of thinking about alternative transformations is that they all reduce to the current transformation composed with some smooth, one-to-one transformation of the unconstrained space into itself. More formally, if X is the initial, one-dimensional space and \phi : X \rightarrow \mathbb{R} is the current unconstraining transformation, then any* other smooth unconstraining transformation can be written as \psi = \gamma \circ \phi where \gamma : \mathbb{R} \rightarrow \mathbb{R} is itself a smooth bijection; the pushforward density sketched below makes the consequence explicit.
*Pretty sure this is true in one dimension. In higher dimensions there may be exceptions without enough additional constraints, for example with the existence of the exotic \mathbb{R}^{4}s.
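In particular the pushforward density under \psi is just the pushforward under \phi times one extra Jacobian factor from \gamma, so any alternative parameterization differs from the current one only through that extra factor (a sketch, using the notation above):

$$
\pi_{\psi}(y)
= \pi_{X}\!\bigl(\psi^{-1}(y)\bigr)\,
  \left|\frac{\mathrm{d}\psi^{-1}}{\mathrm{d}y}(y)\right|
= \pi_{X}\!\bigl(\phi^{-1}(\gamma^{-1}(y))\bigr)\,
  \left|\frac{\mathrm{d}\phi^{-1}}{\mathrm{d}z}\bigl(\gamma^{-1}(y)\bigr)\right|\,
  \left|\frac{\mathrm{d}\gamma^{-1}}{\mathrm{d}y}(y)\right|.
$$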
From this perspective the question of alternative parameterizations reduces to the increasingly common discussion of “what general diffeomorphism will give the ideal posterior density function for my given computational method?”. For a formal discussion of this problem for Hamiltonian Monte Carlo in particular see for example [1910.09407] Incomplete Reparameterizations and Equivalent Metrics. These kinds of questions have become fashionable in the machine learning literature lately with the rise of “generative modeling” (quotes to indicate the machine learning use of “generative” and not the probabilistic modeling use that’s more common in Stan discussions). That said, I strongly believe that automatically tuning bespoke reparameterizations for each Stan program is an intractable problem, which is one of the reasons why I’ve been trying to push back on the introduction of additional compositional features to the Stan compiler.
To clarify, the unit vector discussion has centered on two transformations which aren’t actually comparable: one is an approximate transformation and one is exact. The approximation introduces another layer of complexity to that particular discussion which can distract from the more relevant points here.
The typical use of Stan’s constrained types is when the available domain expertise is most interpretable on the constrained space. For example, a half-normal prior is much easier to specify with a positively-constrained variable than by trying to work out what the corresponding density is for an unconstrained variable. Note that all of the constrained types have at least one natural, complementary prior model – gamma and inverse gamma for positive variables, beta for interval variables, Dirichlet for simplex variables, LKJ for correlation matrices, and the like.
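A minimal sketch of what that looks like in practice (the variable name and the unit scale are just placeholders):

```stan
parameters {
  // Declaring the constraint lets Stan apply the log transform and its
  // Jacobian adjustment automatically under the hood.
  real<lower=0> sigma;
}
model {
  // Half-normal prior, stated directly on the interpretable, constrained scale.
  sigma ~ normal(0, 1);
}
```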
When the available domain expertise manifests better through some latent construction, then the most useful Stan program will follow that construction rather than rely on constrained types (although once the construction is well understood it can be abstracted into a prior model directly on the constrained space; see for example Ordinal Regression).
Sometimes these constructions are compatible with the existing constraining transformations, but often they’re not. For example, because the stick-breaking construction for a simplex treats each component asymmetrically it can be awkward for building exchangeable prior models. Not impossible, of course, just awkward. More often one needs a custom transformation that is better suited to the available domain expertise, which one can implement directly using the wonderful expressiveness of the Stan language; a minimal sketch of the general pattern follows.
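Here the built-in positive constraint is reimplemented by hand just to show the pattern; the structure (an explicit map in transformed parameters, a target += Jacobian correction, and a prior on the constrained scale) is what carries over to more bespoke maps, and the names are illustrative only:

```stan
parameters {
  real x_unc;  // unconstrained latent variable
}
transformed parameters {
  // Custom constraining map; here the same exp map used by <lower=0>,
  // but any smooth, one-to-one map could be substituted.
  real<lower=0> x = exp(x_unc);
}
model {
  // Jacobian adjustment for x = exp(x_unc): log |dx / dx_unc| = x_unc.
  target += x_unc;
  // Prior model specified on the constrained scale.
  x ~ normal(0, 1);
}
```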
We’ve long talked about exposing the transformation functions used for the constrained variable types in the Stan language. I do agree that this can be helpful in some cases and harmless in the worst case, and hence worth exposing. That said, I don’t think that adding transformations in the compilation/post-processing of a Stan program facilitates this kind of construction.