Our model has region effects whose variance is allowed to vary by c: \alpha_{i,c} \sim N(0, \sigma_{\text{region},c}) for regions i
We want to partially-pool information across c by modeling \sigma_{\text{region},c}.

I am not sure it is the absolute right approach, but I’ve had luck modeling by-group variances using an inverse gamma as a prior, and modeling the scale and rate hyperpriors with half-normals.

Just like how y[i] = mu + u[i] + e, you model:
log(sigma[i]) = lambda + u_lambda[i].

E.g., in brms, you can do y ~ 1 + (1|covar|c), sigma ~ 1 + (1|covar|c).
That puts a mixed model on both the location (mu) and scale (sigma). The scale model is log-linear, so it’s estimating: log(sigma_c) = lambda_0 + u_lambda[c], and both the random scale and location effects are permitted to covary.

Note: This also lets you predict variance for each observation, or each ‘c’, or whatever. So - not just hierarchical modelling, but linear modelling too.

Just wondering why you would prefer a gamma distribution over an exponential? Also, would love to see some code of just how to set up the transformed parameters block and sampling statements to understand fully how to set it up.

I won’t pretend to be an expert, but my reasoning for using the inverse gamma is that I often don’t expect there to be much probability density around zero for the variance parameters, so using an exponential doesn’t make sense to me. It’s also the case that in trying the exponential for reasonably complex models with lots of groups I’ve experienced sampling problems. With an inverse gamma, you can parameterize it such that the probability drops off immediately around zero, and also around some reasonable upper limit. But in my case I often have a pretty good sense of the reasonable expected scale of variation between groups, so feel pretty comfortable using a fairly strong prior.