Bias when using partial pooling model to estimate standard deviation?

Suppose $\sigma > 0$ and let $N$ be a positive integer (the number of studies, say). Let $y_{ij} \sim \operatorname{normal}(\mu, \sigma)$ for $1 \leq i \leq N$ and $1 \leq j \leq n_i$, where $n_i$ is the number of measurements in study $i$.

We are given $S_i = \sqrt{\frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left(y_{ij} - Y_i\right)^2}$ with $Y_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}$ for all $i$.
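The distributional fact both models below rely on can be checked by simulation: $(n_i - 1)\,S_i^2/\sigma^2$ follows a $\chi^2_{n_i - 1}$ distribution. A minimal sketch in Python (the constants are arbitrary choices for the check, not values from my setup):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 50.0, 2.0, 30, 20_000

# reps independent "studies", each with n normal observations
y = rng.normal(mu, sigma, size=(reps, n))
S = y.std(axis=1, ddof=1)           # sample SDs, the S_i above
c = (n - 1) * (S / sigma) ** 2      # scaled statistic

# chi^2 with n-1 dof has mean n-1 = 29 and variance 2(n-1) = 58
print(c.mean(), c.var())
```

The empirical mean and variance of `c` land close to 29 and 58, matching the $\chi^2_{29}$ moments.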

Based on the observations $S_i$ I would now like to infer $\sigma$. I came up with the following hierarchical model:

data {
  int<lower=1> N;              // total number of studies
  vector<lower=1>[N] NSamples; // number of samples per study
  vector<lower=0>[N] Stds;     // sample standard deviations
  real<lower=0> scale_cauchy;  // scale used in the Cauchy hyperpriors
}
parameters {
  real mu_hyp;
  real<lower=0> sigma_hyp;
  vector[N] log_sigmas_raw;    // non-centered parameterization
}
transformed parameters {
  vector<lower=0>[N] sigmas = exp(mu_hyp + sigma_hyp * log_sigmas_raw);
}
model {
  vector[N] c_stat;
  mu_hyp ~ cauchy(0, scale_cauchy);
  sigma_hyp ~ cauchy(0, scale_cauchy) T[0,];
  log_sigmas_raw ~ normal(0, 1);
  c_stat = (NSamples - 1) .* square(Stds ./ sigmas);
  c_stat ~ chi_square(NSamples - 1);
}

I used the following R code to simulate data and check how well the Stan model recovers $\sigma$.

library(rstan)
library(purrr)

sm_pp <- stan_model("partial_pooling.stan")

run_sd_pp <- function(sm_pp, sigma, mu = 50, NStudies = 100, MaxSamplesPerStudy = 1000, p = 0.3, scale_cauchy = 1.) {
  SizePerStudy <- rbinom(NStudies, MaxSamplesPerStudy, p)
  samples_per_study <- map(1:NStudies, ~ rnorm(SizePerStudy[.], mu, sigma))
  sds <- map_dbl(samples_per_study, sd)
  means <- map_dbl(samples_per_study, mean)
  stan_data <- list(
    N = NStudies,
    NSamples = SizePerStudy,
    Stds = sds,
    scale_cauchy = scale_cauchy
  )
  fit_pp <- sampling(sm_pp, data = stan_data, iter = 4000, chains = 4, cores = 2, control = list(adapt_delta = .9))
  post_pp <- as.array(fit_pp)
  quantile(exp(as.vector(post_pp[, , "mu_hyp"])), probs = c(0.025, .5, .975))
}

sigma <- 2
rslts_pp <- map(1:100, ~ run_sd_pp(sm_pp, sigma = sigma))
diffs_pp <- map_dbl(1:100, ~ rslts_pp[[.]][["50%"]] - sigma)

What surprised me is that in roughly 20 percent of the cases the posterior median of exp(mu_hyp) is smaller than the true $\sigma$, while in roughly 80 percent of the cases it is larger.

This surprised me, since I was expecting the median estimate to have roughly equal probability of under- or overestimating the true $\sigma$. Does this indicate that my model is biased? Incidentally, I observe the same behavior with the following complete-pooling model (using the posterior median of sigma as a point estimate):

data {
  int<lower=1> N;              // total number of studies
  vector<lower=1>[N] NSamples; // number of samples per study
  vector<lower=0>[N] Stds;     // sample standard deviations
}
parameters {
  real<lower=0> sigma;
}
model {
  vector[N] c_stat;
  c_stat = (NSamples - 1) .* square(Stds / sigma);
  sigma ~ normal(100, 100);
  c_stat ~ chi_square(NSamples - 1);
}

Regarding calibration, I observe that the 95 percent posterior credible interval for exp(mu_hyp) (or sigma in the complete-pooling case) contains the true $\sigma$ in roughly 94 percent of the cases, which I presume is a good sign.
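For what it's worth, 94 hits out of 100 replications is well within what perfect calibration would produce: under a correctly calibrated 95 percent interval the hit count is Binomial(100, 0.95). A quick stdlib-Python check of how likely 94 or fewer hits would be:

```python
from math import comb

# P(X <= 94) for X ~ Binomial(100, 0.95)
p = sum(comb(100, k) * 0.95**k * 0.05**(100 - k) for k in range(95))
print(round(p, 3))   # ≈ 0.38
```

So the observed coverage is entirely consistent with a nominal 95 percent rate.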

To give some more background, what underlies both models is Cochran’s theorem (specifically, that $(n_i - 1) S_i^2 / \sigma^2 \sim \chi^2_{n_i - 1}$ for normal samples).