Divergent transitions for hierarchical model for binomial proportion

Hello,
I am trying to fit what I think is a relatively straight-forward model that ends up with divergent transitions. I have proportions that I want to estimate for a number of centers along with the “mean” proportion (model shown below). The model should be similar to a meta-analysis for a binomial proportion. The chains end up with divergent transitions when I fit the model, and it looks like it is due to the transformed parameters of alpha0 and beta0. Does anyone have a suggestion for how to change the code to no longer have divergent transitions?
Thank you in advance for any input.

data {                                    // Data Block
  int<lower=2> S;                         // number of strata
  int<lower=0> nyes[S];                   // number of "yes" responses for each stratum
  int<lower=0> nno[S];                    // number of "yes" responses for each stratum
  
}

transformed data{
 int<lower=0> ntot[S];                    // Total count for each stratum

 for (si in 1:S) {
      ntot[si]=nyes[si]+nno[si];
 }
 
}


parameters {                              // Parameters Block (primary parameters to be estimated)
  vector<lower=0,upper=1>[S] pis;         // Probability of "yes" for each stratum
  real<lower=0,upper=1> pibar;            // Probability of "yes" across all strata
  real<lower=0,upper=pibar*(1-pibar)> pi_sigma2;        // Variance for prior for pi_s

}


transformed parameters { 
  real<lower=0> psmax;            // maximum for uniform distribution for pi_sigma2

  alpha0=pibar/pi_sigma2*((1-pibar)*pibar-pi_sigma2);   
  beta0=alpha0*(1-pibar)/pibar;
  psmax=pibar*(1-pibar);

}

model {                       // Model Block
  
  
    //priors
    for (si in 1:S) {
         pis[si] ~ beta(alpha0,beta0);   // prior for image means
	}
    pibar~beta(0.5,0.5);
	pi_sigma2~uniform(0,psmax);
   
  for (si in 1:S){
       nyes[si] ~ binomial(ntot[si],pis[si]);
 }
 
}

Can you explain your motivation for your priors? Usually hierarchical models of binary outcomes involve a latent Gaussian that is connected to the likelihood via a logit or probit link; presumably you know that’s insufficient as a model for your data generating process?