Partial non-centered parametrizations in Stan

You can’t tune w dynamically because there’s no easily differentiable criterion that could inform the updates, let alone one that can be computed cheaply online. Ultimately you have to run multiple chains for different values of w and choose the one that performs best, which is a pain.
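That brute-force search over w can be sketched as follows. This is a minimal sketch, not the procedure used for the results below: `run_chains` is a hypothetical callable you would supply (for example, one that fits the model with w passed in as data instead of fixed in `transformed data`) and that returns the wall-clock time and the effective sample size for the quantity you care about.

```python
def pick_best_w(ws, run_chains):
    """Return (w, seconds_per_ess) for the w with the lowest cost.

    run_chains is a hypothetical, user-supplied callable: run_chains(w)
    fits the model at that w and returns (wall_seconds, ess), with ess > 0.
    """
    best_w, best_cost = None, float("inf")
    for w in ws:
        seconds, ess = run_chains(w)
        cost = seconds / ess  # time per effective sample; lower is better
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w, best_cost
```

Every entry in `ws` costs a full set of chains, so in practice the grid stays coarse, e.g. `[0.0, 0.25, 0.5, 0.75, 1.0]`.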

What ended up not performing well were the partial centerings themselves. I used the following parameterization:

```stan
data {
  int<lower=1> N;
  real<lower=0> sigma;
  vector[N] y;
}

transformed data {
  // w = 0: fully non-centered; w = 1: fully centered.
  real w = 0;
}

parameters {
  real mu;
  real<lower=0> tau;
  vector[N] theta_tilde;
}

transformed parameters {
  vector[N] theta;
  {
    real tau_tilde = pow(tau, 1 - w);
    theta = (1 - w * tau_tilde) * mu + tau_tilde * theta_tilde;
  }
}

model {
  mu ~ normal(0, 10);
  tau ~ cauchy(0, 10);
  // Together with the map above, implies theta ~ normal(mu, tau) for any w.
  theta_tilde ~ normal(w * mu, pow(tau, w));

  y ~ normal(theta, sigma);
}
```
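A quick Monte Carlo check of that transform (an illustration added here, not part of the original analysis): for any w, drawing theta_tilde ~ normal(w * mu, tau^w) and applying the map in `transformed parameters` should give theta with mean mu and standard deviation tau, because the mean is (1 - w tau^(1-w)) mu + tau^(1-w) w mu = mu and the standard deviation is tau^(1-w) tau^w = tau. The function name and the values of mu, tau, and the draw count are arbitrary choices.

```python
import random
import statistics


def partially_centered_draws(mu, tau, w, n_draws, seed=1):
    """Simulate theta through the partially centered map in the Stan model.

    theta_tilde ~ normal(w * mu, tau^w)
    theta = (1 - w * tau^(1 - w)) * mu + tau^(1 - w) * theta_tilde
    """
    rng = random.Random(seed)
    tau_tilde = tau ** (1 - w)
    draws = []
    for _ in range(n_draws):
        theta_tilde = rng.gauss(w * mu, tau ** w)
        draws.append((1 - w * tau_tilde) * mu + tau_tilde * theta_tilde)
    return draws


if __name__ == "__main__":
    mu, tau = 2.0, 3.0
    for w in (0.0, 0.25, 0.5, 0.75, 1.0):
        draws = partially_centered_draws(mu, tau, w, n_draws=100_000)
        print(f"w={w:.2f}  mean={statistics.fmean(draws):.3f}  "
              f"sd={statistics.stdev(draws):.3f}")
```

The sample mean and standard deviation should sit near 2 and 3 for every w; only the geometry the sampler sees changes with w, not the implied distribution of theta.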

and compared performance across the range from $\sigma \to 0$, which models informative data, to $\sigma \to \infty$, which models non-informative data. In the attached plot you can see that the fully centered and fully non-centered parameterizations dominate: except for a very, very narrow crossover region where a partial centering may be slightly better, either the fully centered or the fully non-centered parameterization performs best.
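A standard normal-normal calculation (added here for context, not part of the original post) shows why the two limits favor different parameterizations. With $\mu$ and $\tau$ held fixed, each $\theta_n$ in the model above has conditional posterior

$$
\theta_n \mid \mu, \tau, y_n \sim \mathcal{N}\!\left( \frac{\tau^2 y_n + \sigma^2 \mu}{\tau^2 + \sigma^2},\; \frac{\sigma^2 \tau^2}{\tau^2 + \sigma^2} \right).
$$

As $\sigma \to 0$ the mean tends to $y_n$ and the variance to $0$, so $\theta_n$ is pinned by the data and only weakly coupled to $\tau$, which suits the centered form. As $\sigma \to \infty$ the conditional reduces to the prior $\mathcal{N}(\mu, \tau^2)$, so $\theta_n$ and $\tau$ form the funnel that the non-centered form removes.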

[Attached plot: time_per_ess.pdf]

Perhaps more relevant to your application, the entire hierarchy need not be centered or non-centered all at once. You can center or non-center each individual group in the hierarchy, so you can sort your groups into those with “informative data” and those with “non-informative data”, implement the former with a centered parameterization and the latter with a non-centered parameterization, and see if that improves anything.
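One way to do that sorting, sketched under assumptions that are not in the original post: treat each group's data as a single measurement with standard deviation sigma_n (for K_n observations with common noise sigma, roughly sigma / sqrt(K_n)), pick a rough guess for tau, and call a group informative when the likelihood supplies more than half of the conditional posterior precision, i.e. when tau^2 / (tau^2 + sigma_n^2) > 0.5. The helper name, the 0.5 threshold, and the fixed tau guess are all hypothetical choices. The returned 1-based index lists could then be passed to Stan as data to decide which elements of theta get the centered treatment and which get the non-centered one.

```python
def split_by_informativeness(sigmas, tau_guess, threshold=0.5):
    """Partition 1-based group indices into (centered, noncentered).

    A group counts as informative when the likelihood's share of the
    conditional posterior precision, tau^2 / (tau^2 + sigma_n^2),
    exceeds `threshold` at the guessed tau. Hypothetical heuristic.
    """
    centered, noncentered = [], []
    for n, sigma in enumerate(sigmas, start=1):
        weight = tau_guess ** 2 / (tau_guess ** 2 + sigma ** 2)
        if weight > threshold:
            centered.append(n)      # data dominate: center this group
        else:
            noncentered.append(n)   # prior dominates: non-center it
    return centered, noncentered
```

For example, with `sigmas = [0.1, 5.0, 0.5, 20.0]` and `tau_guess = 1.0`, groups 1 and 3 land in the centered list and groups 2 and 4 in the non-centered list.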
