In the Stan documentation on GitHub, the half-normal(0,1) and half-t(4,0,1) distributions are recommended as default priors for scale parameters in hierarchical models. Is a scale of 1 in the half-normal and half-t broadly applicable to most problems, or should it be larger (e.g., 10)? For example, when I use one of these as the prior on the scale parameter in a hierarchical logistic regression, the posterior mean of that scale parameter tends to be much larger than 1 and often exceeds 10.
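To put numbers on how much prior mass each default places above values like 10, here is a small sketch using only the Python standard library. The function names are my own (not from Stan). The half-normal tail uses `math.erfc`, and the half-t(4,0,1) tail uses the closed-form CDF of a Student-t with 4 degrees of freedom, folded at zero.

```python
import math

def half_normal_tail(x, scale=1.0):
    """P(sigma > x) for sigma ~ half-normal(0, scale)."""
    # P(|Z| > x/scale) for a standard normal Z equals erfc(x / (scale * sqrt(2)))
    return math.erfc(x / (scale * math.sqrt(2.0)))

def student_t4_cdf(t):
    """CDF of a standard Student-t with 4 degrees of freedom (closed form)."""
    u = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / math.sqrt(u)) * (1.0 - t * t / (12.0 * u))

def half_t4_tail(x, scale=1.0):
    """P(sigma > x) for sigma ~ half-t(4, 0, scale)."""
    # Folding the symmetric t at zero doubles its upper-tail probability
    return 2.0 * (1.0 - student_t4_cdf(x / scale))

for x in (1.0, 3.0, 10.0):
    print(f"P(sigma > {x:>4}): half-normal(0,1) = {half_normal_tail(x):.3g}, "
          f"half-t(4,0,1) = {half_t4_tail(x):.3g}")
```

With these definitions, P(sigma > 10) is roughly 1.5e-23 under half-normal(0,1) and roughly 5.6e-4 under half-t(4,0,1), so the heavier half-t tail leaves far more prior room for the large values I am seeing in the posterior.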
Also, why is it desirable to use a prior whose mode is at zero, as with the half-normal and half-t? Would a distribution with a positive mode be better?