I am fitting a model that is something like (simplified here for the example)

with a vague N(0,10) prior on \gamma and half-Cauchy priors on \delta and \sigma.

My problem is that I have about 10^5 groups (k = 1, \dots, 10^5), and the number of observations in each, N_k, ranges from one or two to a few hundred.

Stan is working OK (I was somewhat surprised that it handles this many groups, though I am using sufficient statistics by k). As expected, however, a non-centered parametrization mixes poorly (ESS 30 out of 1000) for the k's where I have a lot of observations, and conversely a centered parametrization mixes poorly for the groups with few observations (and also for the \mu_k).

I have been reading up on this, and found that there are so-called partially non-centered parametrizations (e.g. Papaspiliopoulos et al., 2003), but my understanding is that they involve tuning a parameter. I understand the idea conceptually, but I wonder how to do this in Stan, or whether there is an alternative that works better with Stan.
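
To make the question concrete, here is a rough sketch of what I understand partial non-centering to mean. It assumes a plain normal hierarchy as a stand-in for the model above (an assumption for illustration only, with the second argument of N read as a standard deviation), and it non-centers the location only. The weights w_k are the tuning parameters I am referring to. As far as I can tell, the last line gives the decorrelating weight only when \sigma and \delta are known and the prior on \gamma is flat, so in practice it would be at best a starting guess.

```latex
% Assumed stand-in model: y_{ik} ~ N(\mu_k, \sigma), \mu_k ~ N(\gamma, \delta)
\begin{align}
  \tilde{\mu}_k &= \mu_k - w_k \gamma, \qquad w_k \in [0, 1]
    && \text{($w_k = 0$: centered, $w_k = 1$: non-centered)} \\
  \tilde{\mu}_k \mid \gamma, \delta &\sim N\big((1 - w_k)\,\gamma,\ \delta\big) \\
  \bar{y}_k \mid \tilde{\mu}_k, \gamma, \sigma &\sim N\big(\tilde{\mu}_k + w_k \gamma,\ \sigma / \sqrt{N_k}\big) \\
  w_k &= \frac{\sigma^2 / N_k}{\delta^2 + \sigma^2 / N_k}
       = \frac{\sigma^2}{\sigma^2 + N_k \delta^2}
\end{align}
```

Under these assumptions w_k shrinks toward 0 (centered) as N_k grows and moves toward 1 (non-centered) for groups with few observations, which is consistent with the mixing behavior I describe above.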