You can’t tune w dynamically because there’s no easily differentiable criterion that you can use to inform updates, let alone one that can be computed cheaply online. Ultimately you have to run multiple chains for different values of w and choose the one that performs best, which is a pain.
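Because there is no gradient to follow, choosing w comes down to a brute-force search over a small grid. Here is a minimal Python sketch. It assumes a hypothetical, user-supplied `time_per_ess(w)` callable that fits the model at a fixed w (for example, by running several chains) and returns wall-clock time divided by effective sample size. Neither function comes from Stan or any library.

```python
from typing import Callable, Sequence


def select_w(candidate_ws: Sequence[float],
             time_per_ess: Callable[[float], float]) -> float:
    """Brute-force choice of the centering parameter w.

    `time_per_ess` is a hypothetical, user-supplied function: it should run
    the sampler at a fixed w and return seconds per effective sample.
    Each call is a full fit, so keep the grid small.
    """
    # No gradient with respect to w is available, so evaluate every
    # candidate and keep the one with the lowest cost.
    return min(candidate_ws, key=time_per_ess)


# Toy stand-in for the expensive measurement, for illustration only.
print(select_w([0.0, 0.25, 0.5, 0.75, 1.0], lambda w: (w - 0.75) ** 2 + 1.0))
# prints 0.75
```

Every grid point costs a full set of chains, which is exactly why this approach is a pain in practice.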
What ended up not performing well were the partial centerings themselves. I used the parameterization
```stan
data {
  int<lower=1> N;
  real<lower=0> sigma;
  vector[N] y;
}
transformed data {
  // w = 0 gives the fully non-centered parameterization,
  // w = 1 the fully centered one
  real w;
  w = 0;
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[N] theta_tilde;
}
transformed parameters {
  vector[N] theta;
  {
    real tau_tilde;
    tau_tilde = pow(tau, 1 - w);
    // theta = mu + tau^(1 - w) * (theta_tilde - w * mu),
    // which implies theta ~ normal(mu, tau) for any w
    theta = (1 - w * tau_tilde) * mu
            + tau_tilde * theta_tilde;
  }
}
model {
  mu ~ normal(0, 10);
  tau ~ cauchy(0, 10);
  theta_tilde ~ normal(w * mu, pow(tau, w));
  y ~ normal(theta, sigma);
}
```
and compared performance across the range from $\sigma \rightarrow 0$, which models informative data, to $\sigma \rightarrow \infty$, which models non-informative data. In the attached plot you can see that the fully centered and non-centered parameterizations dominate: save for a very, very narrow crossover region where partial centerings may be slightly better, either the fully centered or the fully non-centered parameterization performs best.
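Every value of w defines the same model. Write theta = mu + tau^(1 - w) * (theta_tilde - w * mu). Because theta_tilde - w * mu ~ normal(0, tau^w), the scale of theta - mu is tau^(1 - w) * tau^w = tau, so theta ~ normal(mu, tau) regardless of w. Only the geometry the sampler sees changes, not the posterior. The quick Monte Carlo check below uses Python with NumPy and mirrors the Stan program's transform. The values of mu and tau are arbitrary choices for illustration.

```python
import numpy as np


def partially_centered_theta(mu, tau, w, n, rng):
    """Draw theta through the partially centered parameterization.

    Mirrors the Stan program above:
      theta_tilde ~ normal(w * mu, tau^w)
      theta = (1 - w * tau^(1 - w)) * mu + tau^(1 - w) * theta_tilde
    """
    theta_tilde = rng.normal(w * mu, tau ** w, size=n)
    tau_tilde = tau ** (1 - w)
    return (1 - w * tau_tilde) * mu + tau_tilde * theta_tilde


rng = np.random.default_rng(1)
mu, tau = 2.0, 3.0
for w in (0.0, 0.5, 1.0):
    theta = partially_centered_theta(mu, tau, w, 200_000, rng)
    # The sample mean and sd should be close to mu = 2 and tau = 3 for every w.
    print(w, round(theta.mean(), 2), round(theta.std(), 2))
```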
time_per_ess.pdf (18.7 KB)
Perhaps more relevant to your application, the entire hierarchy need not be centered or non-centered all at once. You can center or non-center each individual group in the hierarchy, so you can sort your groups into “informative data” and “non-informative data”, implement the former with a centered parameterization and the latter with a non-centered parameterization, and see if that improves anything.
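The sketch below shows one way to organize this, in Python with NumPy and under stated assumptions. The split rule is a hypothetical heuristic, not a prescription: it compares each group's measurement scale (a per-group `sigma`, unlike the single scalar in the model above) with a rough guess of tau. Centered groups carry theta directly. Non-centered groups carry a standardized eta that is mapped through mu + tau * eta. Both are then scattered back into one theta vector, which is the same shape of computation a Stan program would do with two parameter vectors and multi-indexed assignment.

```python
import numpy as np


def split_groups(sigma, tau_guess):
    """Hypothetical heuristic: treat a group as informative (centered)
    when its measurement scale is smaller than a rough guess of tau."""
    sigma = np.asarray(sigma, dtype=float)
    informative = sigma < tau_guess
    return np.flatnonzero(informative), np.flatnonzero(~informative)


def assemble_theta(mu, tau, theta_cp, eta_ncp, cp_idx, ncp_idx):
    """Combine centered and non-centered groups into one theta vector.

    Centered groups are sampled directly, theta_cp ~ normal(mu, tau);
    non-centered groups are sampled as eta_ncp ~ normal(0, 1) and mapped
    through theta = mu + tau * eta.
    """
    theta = np.empty(len(cp_idx) + len(ncp_idx))
    theta[cp_idx] = theta_cp
    theta[ncp_idx] = mu + tau * np.asarray(eta_ncp, dtype=float)
    return theta


cp_idx, ncp_idx = split_groups([0.1, 5.0, 0.2, 10.0], tau_guess=1.0)
theta = assemble_theta(1.0, 2.0, [0.5, 1.5], [0.0, -1.0], cp_idx, ncp_idx)
# cp_idx is [0, 2], ncp_idx is [1, 3], theta is [0.5, 1.0, 1.5, -1.0]
```

Whether any particular threshold helps is an empirical question, so compare time per effective sample against the fully centered and fully non-centered fits.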