Non-centered model not fitting

I’ve got a standard centered version of my relatively straightforward model working (unfortunately I can’t easily share the data).
However, it had 4.5% divergences with the default adapt_delta=0.8, so I tried to fit a non-centered version, since there are only 5 groups (J).

I started off by non-centering just the alpha group-level intercept, as in the model below.

However, I kept getting this warning:

Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 4 Exception: normal_lpdf: Location parameter is inf, but must be finite! (in '/tmp/Rtmp7SzXGt/model-7b4a715c15ec.stan', line 46, column 4 to column 89)
Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine

I printed out the alpha values to see what was going on, and alpha is nan from the very first iterations:

Chain 2 alpha: [nan,nan,nan,nan,nan] mu_alpha: 1.74325 sigma_alpha: 1.59339 
Chain 3 alpha: [nan,nan,nan,nan,nan] mu_alpha: 1.90013 sigma_alpha: 0.417693 
Chain 1 alpha: [nan,nan,nan,nan,nan] mu_alpha: 1.45901 sigma_alpha: 1.10584 
Chain 4 alpha: [nan,nan,nan,nan,nan] mu_alpha: 0.478545 sigma_alpha: 1.41995

After that I stopped receiving so many of the warnings, but the sampler seems stuck in some extremely high region of mu_alpha, with sigma_alpha stuck at 0.
I kept the program running but gave up after 10 hours - the original centered version took 30 minutes.

Chain 2 alpha: [nan,nan,nan,nan,nan] mu_alpha: 2.31385e+08 sigma_alpha: 0 
Chain 1 alpha: [nan,nan,nan,nan,nan] mu_alpha: -1.13414e+08 sigma_alpha: 0 
Chain 3 alpha: [nan,nan,nan,nan,nan] mu_alpha: -1.40449e+08 sigma_alpha: 0 
Chain 4 alpha: [nan,nan,nan,nan,nan] mu_alpha: 0.478545 sigma_alpha: 1.41995

Have I made any obvious mistakes in this non-centered parameterisation?
I have read that a centered version can sometimes work fine for large N (which I guess I have, with ~100K samples) - maybe that’s the case here?

One other question: after giving up on the non-centered version, I tried increasing adapt_delta instead. This did reduce divergences, to 0.25% with adapt_delta=0.99, but it also reduced the n_eff of some parameters from 1000 to 800. Is this an issue?

data {
  int<lower=0> N;
  int<lower=0> J;
  int<lower=0> n_terms;
  int sensor[N];
  row_vector[n_terms] x[N];
  real O3[N];
}
parameters {
  // Group parameters
  real mu_alpha;
  real<lower=0> sigma_alpha;
  vector<lower=0>[n_terms] sigma_beta;
  vector[n_terms] mu_beta;
  real<lower=0> mu_measurement_error;
  real<lower=0> sigma_measurement_error;

  // Observation level parameters
  vector[J] alpha_raw;
  vector[n_terms] beta[J];
  vector<lower=0>[J] measurement_error;
}
transformed parameters {
  // Observation level
  vector[J] alpha;
  print("alpha: ", alpha, " mu_alpha: ", mu_alpha, " sigma_alpha: ", sigma_alpha);
  alpha = mu_alpha + sigma_alpha * alpha_raw;
}
model {
  // Group level priors
  mu_alpha ~ normal(0, 10);
  sigma_alpha ~ cauchy(0, 5);
  mu_beta ~ normal(0, 2.5);
  sigma_beta ~ cauchy(0, 5);
  mu_measurement_error ~ cauchy(0, 1);
  sigma_measurement_error ~ cauchy(0, 5);

  // Observation level priors
  measurement_error ~ cauchy(mu_measurement_error, sigma_measurement_error);
  for (j in 1:J)
    beta[j] ~ normal(mu_beta, sigma_beta);   
  alpha_raw ~ std_normal(); 

  // Observation equation
  for (n in 1:N) {
    O3 ~ normal(alpha[sensor[n]] + x[n] * beta[sensor[n]], measurement_error[sensor[n]]);
  }
}

A few suggestions:

  • Are you sure the observation equation is right? At present the entire O3 vector is used on every loop. Perhaps you meant O3[n]?
  • Try a smaller value for the init argument (e.g. 1 or 0.1).
  • Try a Student-t with df > 1, or a normal, everywhere you currently have a cauchy; the cauchy tends to behave badly both numerically and in terms of what it implies as a prior.
  • If using cauchy, definitely run some prior predictive checks to make sure it matches your actual expectations; you might be surprised by the consequences of the cauchy’s heavy tail.
  • If using cauchy, try the reparameterization suggested in the Stan User’s Guide (SUG).
  • Instead of doing the non-centering by hand, use the newer <offset=..., multiplier=...> syntax, which leaves less room for implementation errors.
  • Your priors for the measurement error convey peak credibility at zero measurement error. Consider a prior peaked away from zero, which can be achieved elegantly by replacing the lower-bounded measurement_error parameter with an unbounded log_measurement_error parameter that is exponentiated where it is used in the likelihood: normal(..., exp(log_measurement_error[sensor[n]])).
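The offset/multiplier suggestion above, applied to the alpha intercepts, would look something like this sketch (it requires a reasonably recent Stan version; parameter names are taken from the original model):

```stan
parameters {
  real mu_alpha;
  real<lower=0> sigma_alpha;
  // Sampled on the standardized scale internally, but used as alpha directly
  vector<offset=mu_alpha, multiplier=sigma_alpha>[J] alpha;
}
model {
  // Written as the centered model, but non-centered under the hood
  alpha ~ normal(mu_alpha, sigma_alpha);
}
```

This removes alpha_raw and the hand-written transformed parameters block entirely, which is exactly where hand-rolled non-centering tends to go wrong.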

One issue is that you have multiple hierarchical models stacked together here. In particular, measurement_error and beta are also modeled hierarchically and may need different parameterizations.

The Cauchy population model on measurement_error is very, very bad: it will generally produce pathological funnel geometry that is nearly impossible to remedy even with careful parameterization. I strongly recommend starting with a normal population model here.
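Combining a normal population model with the log-scale measurement error mentioned earlier might look like the following sketch (the hyperparameter names here are illustrative, not from the original model):

```stan
parameters {
  real mu_log_error;
  real<lower=0> sigma_log_error;
  vector[J] log_measurement_error;  // unbounded, so no hard boundary at zero
}
model {
  mu_log_error ~ normal(0, 1);
  sigma_log_error ~ normal(0, 1);   // half-normal via the <lower=0> constraint
  log_measurement_error ~ normal(mu_log_error, sigma_log_error);
  // the likelihood then uses exp(log_measurement_error[sensor[n]]) as its scale
}
```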

Note also that monolithic centered and non-centered parameterizations are just two extremes; if the likelihood functions in each group are heterogeneous, then you may very well need to parameterize each group differently. See Hierarchical Modeling for more, including example code to help organize the parameterizations.
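One way to organize such a per-group mixture is to pass the indices of the groups to treat each way as data, along the lines of this sketch (the *_idx variables are hypothetical bookkeeping, not part of the original model):

```stan
data {
  int<lower=0> J;
  int<lower=0> J_ncp;                    // number of groups to non-center
  int<lower=1, upper=J> ncp_idx[J_ncp];  // their indices in 1:J
  int<lower=0> J_cp;                     // number of groups to keep centered
  int<lower=1, upper=J> cp_idx[J_cp];
}
parameters {
  real mu_alpha;
  real<lower=0> sigma_alpha;
  vector[J_ncp] alpha_ncp;  // standardized scale
  vector[J_cp] alpha_cp;    // natural scale
}
transformed parameters {
  vector[J] alpha;
  alpha[ncp_idx] = mu_alpha + sigma_alpha * alpha_ncp;
  alpha[cp_idx] = alpha_cp;
}
model {
  alpha_ncp ~ std_normal();
  alpha_cp ~ normal(mu_alpha, sigma_alpha);
}
```

Groups with many observations (a strongly identifying likelihood) usually do better centered, while sparse groups usually do better non-centered.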