I am a little late to the party and @jsocolar has already said some things that I also had in mind, but since I promised to respond I will try to give my perspective anyway.
First of all, I am not sure I consider the posterior correlations between overall and varying coefficients a problem per se. They may be annoying to deal with in terms of somewhat lower sampling efficiency and perhaps some divergent transitions, but in my personal experience this can be worked around most of the time.
In reality, the different groups usually don’t know of each other (they are exchangeable), so there is, to my current understanding, nothing in nature that implies the groups interact in such a way that their effects sum (exactly) to zero. Accordingly, I think the “standard” multilevel model without this constraint is a much more sensible model of reality. The problem I see with the (hard) sum-to-zero constraint prior is that it will lead to bad uncertainty calibration, especially of the overall coefficient \alpha (too narrow a posterior), if the true data generating model is the one without the hard sum-to-zero constraint. We actually had a visiting researcher working on this kind of problem at Aalto, who apparently stopped working on it some time after he left. @avehtari do you remember? If I remember his preliminary results right, we would indeed not achieve proper uncertainty calibration if we used the (hard) sum-to-zero constraint model on data generated from the “standard” multilevel model. For a small number of groups, I think both models actually did not perform so well, but I cannot remember exactly right now.
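To make the calibration argument a bit more concrete, here is a minimal sketch in Python. It rests on strong simplifying assumptions that are mine and only for illustration (they are not the setup of the preliminary results mentioned above): a balanced design, known residual SD `sigma` and group-level SD `tau`, and a flat prior on \alpha. Under those assumptions the posterior of \alpha is normal and centered at the grand mean in both models. In the standard model its variance is `(tau^2 + sigma^2 / n) / J`, whereas with a hard sum-to-zero constraint the group effects cancel out of the grand mean and the variance shrinks to `sigma^2 / (n * J)`. The function name `coverage` and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def coverage(n_groups=8, n_per_group=10, sigma=1.0, tau=1.0,
             alpha=0.5, level=0.90, n_reps=2000, seed=1):
    """Estimate interval coverage for alpha under both models.

    Data are always simulated from the standard multilevel model
    (group effects drawn independently, no sum-to-zero constraint).
    Assumes known sigma and tau, a flat prior on alpha, balanced groups.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n_total = n_groups * n_per_group

    # Posterior SD of alpha in the standard model (group effects marginalized).
    sd_standard = ((tau**2 + sigma**2 / n_per_group) / n_groups) ** 0.5
    # Posterior SD of alpha when the group effects are forced to sum to zero.
    sd_sum_to_zero = sigma / n_total**0.5

    hits_standard = 0
    hits_sum_to_zero = 0
    for _ in range(n_reps):
        ys = []
        for _ in range(n_groups):
            u = rng.gauss(0.0, tau)  # unconstrained group effect
            ys.extend(alpha + u + rng.gauss(0.0, sigma)
                      for _ in range(n_per_group))
        ybar = fmean(ys)  # posterior mean of alpha in both models
        hits_standard += abs(ybar - alpha) <= z * sd_standard
        hits_sum_to_zero += abs(ybar - alpha) <= z * sd_sum_to_zero

    return hits_standard / n_reps, hits_sum_to_zero / n_reps


if __name__ == "__main__":
    cov_standard, cov_sum_to_zero = coverage()
    print(f"standard model coverage:    {cov_standard:.3f}")
    print(f"sum-to-zero model coverage: {cov_sum_to_zero:.3f}")
```

In this toy setting the standard model's intervals should cover the true \alpha at about the nominal rate, while the sum-to-zero intervals should cover it far less often, because they ignore the `tau^2 / J` part of the variance. With unknown variances and real MCMC fits the picture will be less clean, so this only illustrates the direction of the effect.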
If what I just said holds true, then it may come down to whether we prefer somewhat more efficient MCMC sampling or somewhat better uncertainty calibration. I would argue for the latter, at least if the efficiency issue can be worked around with some additional effort. In any case, I think this is a really interesting topic, and perhaps it makes sense to write a paper about it together (if you are interested) that complements the papers mentioned above in the thread. I did not read them in detail, but my feeling was that they may not have answered the question completely satisfactorily yet.