I tried to fit your model, and I think you do not need the a parameter. Indeed, in this formulation, beta0 represents a deviation around the global intercept, an absence of group effect being represented by beta0 = 0. Thus, it is unnecessary to estimate the mean of group effect’s distribution (it would quite naturally be 0, some group being higher and others being lower), and this is why you observe problematic sampling behaviors.
Writing a model as follow allows to estimate how much the group effects vary compared to an absence of group effect. It has the wonderfull consequence of limiting overfitting, the hierarchical prior preventing group effect to go too far from zero. Doing otherwise would allow model to fit to strongly too your particular dataset, and would reduce its ability to predict new data from the same generating process.
Technically, you will see that the n_eff are much better!
m13_stan <- map2stan(
  alist(
    afd_prop ~ dnorm(mu, sigma),    
    mu <- alpha + beta0[state_id],
    beta0[state_id] ~ dnorm(0, sigma2),
    sigma ~ dnorm(0, 1),
    sigma2 ~ dnorm(0, 1),
    alpha ~ dnorm(0, 1)
  ),
  data = d
)
Because you are modelling proportions, it would be better to use a beta distribution as likelihood. However, you have to use the reparametrization described in classical papers on beta regressions (0<mu<1, 0<phi). I am not sure how to do this with map2stan, I tried something but it did not work well. It is necessary to put constraints on parameters and play with returned ones, but I do not work with map2stan!
EDIT : Reparametrization is not mandatory, but makes the interpretation of parameters far more easier!
m14_stan <- map2stan(
  alist(
    afd_prop ~ dbeta(mu*exp(phi), (1-mu)*exp(phi)),    
    logit(mu) <- alpha + beta0[state_id],
    beta0[state_id] ~ dnorm(0, sigma2),
    phi ~ dnorm(0, 1),
    sigma2 ~ dnorm(0, 1),
    alpha ~ dnorm(0, 1)
  ),
  data = d
)
Do not worry about negative WAIC, nothing constrains it to be positive. The importance is the difference between models!