I was curious about how the posterior would differ between enforcing the sum-to-zero constraint and not.    So I’m going to do a little simulation, because it’s easier than doing real math.
Simple hierarchical model
Let’s consider a simple, canonical random-effects model for binary data organized into K groups.
\mu \sim \textrm{normal}(0, 1)
\\
\sigma \sim \textrm{lognormal}(0, 1)
\\
\alpha_k \sim \textrm{normal}(\mu, \sigma)
\\
y_{k, n} \sim \textrm{Bernoulli}(\textrm{logit}^{-1}(\alpha_k))
This is essentially the same model as I used in my case study,  Hierarchical Partial Pooling for Repeated Binary Trials.  If you’re not familiar with these models, you might also want to check out Betancourt and Girolami’s paper, Hamiltonian Monte Carlo for Hierarchical Models, which discusses what all the fuss is about.
An alternative formulation of the same model keeps the same priors on \mu and \sigma, but moves the global mean parameter \mu from acting as the prior location parameter to an additive term in the sampling distribution.
\alpha_k \sim \textrm{normal}(0, \sigma)
\\
y_{k, n} \sim \textrm{Bernoulli}(\textrm{logit}^{-1}(\mu + \alpha_k))
It’s an alternative formulation of the same model in the sense that the posteriors on \alpha, \mu, and \sigma will be identical.
Stan code
There are a lot of ways to code this model in Stan.  Here’s the traditional parameterization that codes the first model pretty much as written.
data {
  int<lower=0> K;
  int<lower=0> N;
  array[K, N] int<lower=0, upper=1> y;
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector<offset=mu, multiplier=sigma>[K] alpha;
}
model {
  alpha ~ normal(mu, sigma);
  mu ~ normal(0, 1);
  sigma ~ lognormal(0, 1);
  for (k in 1:K)
    y[k] ~ bernoulli_logit(alpha[k]);
}
The variable alpha is declared to have an offset of mu and multiplier of sigma, which provides a non-centered parameterization under the hood.
In contrast, here’s a slightly different, but closely related model that uses only K parameters instead of K + 1 by setting
\alpha_K = -\textrm{sum}(\alpha_{1:K-1})
and adjusting the prior scale accordingly; see the Stan User’s Guide, section 1.7 on parameterizing centered vectors for more details on the adjustment.
data {
  int<lower=0> K;
  int<lower=0> N;
  array[K, N] int<lower=0, upper=1> y;
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector[K - 1] alpha_pre;
}
transformed parameters {
  vector[K] alpha = mu + append_row(alpha_pre, -sum(alpha_pre));
}
model {
  alpha ~ normal(0, inv(sqrt(1 - inv(K))) * sigma);
  mu ~ normal(0, 1);
  sigma ~ lognormal(0, 1);
  for (k in 1:K)
    y[k] ~ bernoulli_logit(alpha[k]);
}
In both cases, I also snuck in a generated quantities block that reports the posterior for the sample mean and standard deviation of the coefficients \alpha,
generated quantities {
  real mean_alpha = mean(alpha);
  real<lower=0> sd_alpha = sd(alpha);
}
Simulated data and fit
Now I’m going to simulate data for K = 20 and N = 200 and fit using cmdstanr with the following R script.
library('cmdstanr')
ilogit <- function(u) 1 / (1 + exp(-u))
set.seed(1234)
K <- 20
mu <- 1.3
sigma <- 2.3
alpha <- rnorm(K, mu, sigma)
N <- 200
y <- matrix(NA, K, N)
for (k in 1:K)
    y[k, ] <- rbinom(N, 1, ilogit(alpha[k]))
data = list(K = K, N = N, y = y);
mod <- cmdstan_model('hier.stan')
fit <- mod$sample(data = data, chains=2, parallel_chains=4)
modc <- cmdstan_model('hier-ctr.stan')
fitc <- modc$sample(data = data, chains=2, parallel_chains=4)
And here are the results (for seed 1234).
   variable  mean median   sd  mad    q5   q95 rhat ess_bulk ess_tail
 mu          0.53   0.56 0.45 0.41 -0.27  1.24 1.02      130      146
 sigma       2.20   2.16 0.38 0.37  1.68  2.92 1.02      152      345
 alpha[1]   -1.61  -1.61 0.19 0.20 -1.92 -1.30 1.00     2786     1454
 alpha[11]   0.29   0.29 0.14 0.14  0.07  0.51 1.00     2684     1168
 alpha[20]   5.62   5.49 1.04 1.02  4.16  7.52 1.00      750     1321
 mean_alpha  0.69   0.69 0.08 0.08  0.58  0.83 1.00     1141     1266
 sd_alpha    2.17   2.15 0.16 0.15  1.93  2.46 1.00      990     1332
   variable  mean median   sd  mad    q5   q95 rhat ess_bulk ess_tail
 mu          0.69   0.68 0.08 0.08  0.56  0.82 1.01      575      859
 sigma       2.22   2.17 0.41 0.37  1.66  2.98 1.00     1585     1202
 alpha[1]   -1.62  -1.62 0.19 0.19 -1.95 -1.32 1.00     3043     1547
 alpha[11]   0.28   0.28 0.14 0.14  0.05  0.51 1.00     2892     1550
 alpha[20]   5.64   5.50 1.12 1.07  4.11  7.69 1.00      755     1064
 mean_alpha  0.69   0.68 0.08 0.08  0.56  0.82 1.01      575      859
 sd_alpha    2.17   2.15 0.17 0.16  1.93  2.48 1.00      818     1048
All of the parameters other than \mu have indistinguishable posteriors.  But \mu is both larger and has much smaller posterior 95% intervals in the second model.
Did we just get lucky?  Let’s try a different seed (123456).
> prt(fit)
   variable  mean median   sd  mad    q5   q95 rhat ess_bulk ess_tail
 mu          1.90   1.90 0.49 0.48  1.06  2.66 1.00      250      412
 sigma       2.31   2.25 0.46 0.39  1.69  3.14 1.01      303      594
 alpha[1]    3.90   3.87 0.49 0.47  3.16  4.74 1.00     2530     1208
 alpha[11]  -0.94  -0.94 0.16 0.15 -1.20 -0.68 1.00     2193     1619
 alpha[20]   2.60   2.59 0.29 0.30  2.16  3.08 1.00     2625     1474
 mean_alpha  2.38   2.37 0.13 0.12  2.18  2.60 1.00     1477     1356
 sd_alpha    2.24   2.21 0.20 0.17  1.97  2.62 1.00     1226     1259
> prt(fitc)
   variable  mean median   sd  mad    q5   q95 rhat ess_bulk ess_tail
 mu          2.38   2.37 0.13 0.13  2.19  2.62 1.01      377      584
 sigma       3.22   3.15 0.57 0.53  2.41  4.22 1.00     1047     1246
 alpha[1]    3.86   3.81 0.48 0.47  3.16  4.66 1.00     1495     1416
 alpha[11]  -0.95  -0.95 0.16 0.16 -1.22 -0.69 1.00     3066     1528
 alpha[20]   2.60   2.58 0.29 0.27  2.15  3.12 1.00     1971      818
 mean_alpha  2.38   2.37 0.13 0.13  2.19  2.62 1.01      377      584
 sd_alpha    2.28   2.24 0.23 0.20  2.00  2.72 1.01      348      419
So we can conclude that @betanalpha is right—changing the model to use a sum-to-zero constraint does change the posterior.  And arguably the non-identifiable model is the one we want, because as @betanalpha said, this is the one where we can naturally make posterior predictions for new groups.  Making predictions for new groups is important.  There might not be new states in a political model, but there are new testing sites for Covid modeling, and Andrew and I used posterior predictions for new groups in our Covid test sensitivity and specificity paper.
Help with intuitions?
I’d like to develop some intuition around why these results are so different.  Of course, they have completely different correlation structure, but then so does the traditional centered vs. non-centered parameterization (though in that case you can easily show the transformed variables have the same distributions).  What I’m really surprised about is that the posteriors for the coefficients \alpha_k don’t change even though the posteriors on \mu and \sigma are so different.
Given the posteriors for \alpha were indistinguishable between the two models, it’s not surprising that the sample mean and standard deviations were identical.  But I was surprised that \mu, \sigma didn’t closely track the sample mean and standard deviation of \alpha in either model.  With both seeds, the value of \mu matches the sample mean of \alpha_{1:K} in the second model.  But the standard deviations are way off in both cases.