Non-centered model not fitting

I’ve got a standard centered version of my relatively straightforward model working (unfortunately I can’t easily share the data).
However, it had 4.5% divergences with the default adapt_delta=0.8, so I tried to fit a non-centered version, since there are only 5 groups (J).

I started off by non-centering just the alpha group-level intercept, as in the model below.

However, I kept getting this warning:

Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 4 Exception: normal_lpdf: Location parameter is inf, but must be finite! (in '/tmp/Rtmp7SzXGt/model-7b4a715c15ec.stan', line 46, column 4 to column 89)
Chain 4 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine

I printed out the alpha values to see what was going on, and alpha is nan from the very first iterations:

Chain 2 alpha: [nan,nan,nan,nan,nan] mu_alpha: 1.74325 sigma_alpha: 1.59339 
Chain 3 alpha: [nan,nan,nan,nan,nan] mu_alpha: 1.90013 sigma_alpha: 0.417693 
Chain 1 alpha: [nan,nan,nan,nan,nan] mu_alpha: 1.45901 sigma_alpha: 1.10584 
Chain 4 alpha: [nan,nan,nan,nan,nan] mu_alpha: 0.478545 sigma_alpha: 1.41995

After that I stopped receiving so many of the warnings, but the sampler seems stuck in some extremely high region of mu_alpha, with sigma_alpha stuck at 0.
I kept the program running but gave up after 10 hours - the original centered version took 30 minutes.

Chain 2 alpha: [nan,nan,nan,nan,nan] mu_alpha: 2.31385e+08 sigma_alpha: 0 
Chain 1 alpha: [nan,nan,nan,nan,nan] mu_alpha: -1.13414e+08 sigma_alpha: 0 
Chain 3 alpha: [nan,nan,nan,nan,nan] mu_alpha: -1.40449e+08 sigma_alpha: 0 
Chain 4 alpha: [nan,nan,nan,nan,nan] mu_alpha: 0.478545 sigma_alpha: 1.41995

Have I made any obvious mistakes in this non-centered parameterisation?
I have read that a centered version can sometimes work fine for large N (which I guess I have, with ~100K samples) - maybe that’s the case here?

One other question: after giving up on the non-centered version, I tried increasing adapt_delta instead. This did reduce divergences, to 0.25% with adapt_delta=0.99, but it also reduced the n_eff of some parameters from 1000 to 800. Is this an issue?

data {
  int<lower=0> N;
  int<lower=0> J;
  int<lower=0> n_terms;
  int sensor[N];
  row_vector[n_terms] x[N];
  real O3[N];
}
parameters {
  // Group parameters
  real mu_alpha;
  real<lower=0> sigma_alpha;
  vector<lower=0>[n_terms] sigma_beta;
  vector[n_terms] mu_beta;
  real<lower=0> mu_measurement_error;
  real<lower=0> sigma_measurement_error;

  // Observation level parameters
  vector[J] alpha_raw;
  vector[n_terms] beta[J];
  vector<lower=0>[J] measurement_error;
}
transformed parameters {
  // Observation level
  vector[J] alpha;
  print("alpha: ", alpha, " mu_alpha: ", mu_alpha, " sigma_alpha: ", sigma_alpha);
  alpha = mu_alpha + sigma_alpha * alpha_raw;
}
model {
  // Group level priors
  mu_alpha ~ normal(0, 10);
  sigma_alpha ~ cauchy(0, 5);
  mu_beta ~ normal(0, 2.5);
  sigma_beta ~ cauchy(0, 5);
  mu_measurement_error ~ cauchy(0, 1);
  sigma_measurement_error ~ cauchy(0, 5);

  // Observation level priors
  measurement_error ~ cauchy(mu_measurement_error, sigma_measurement_error);
  for (j in 1:J)
    beta[j] ~ normal(mu_beta, sigma_beta);   
  alpha_raw ~ std_normal(); 

  // Observation equation
  for (n in 1:N) {
    O3 ~ normal(alpha[sensor[n]] + x[n] * beta[sensor[n]], measurement_error[sensor[n]]);
  }
}

A few suggestions:

  • Are you sure the observation equation is right? At present the entire O3 vector is used on every loop. Perhaps you meant O3[n]?
  • Try a smaller value for the init argument (e.g. 1 or 0.1).
  • Try a Student-t with df > 1, or a normal, everywhere you currently have a cauchy; the cauchy tends to behave badly both numerically and in terms of what it implies as a prior.
  • If using cauchy, definitely run some prior predictive checks to make sure it matches your actual expectations; you might be surprised by the consequences of the cauchy’s heavy tail.
  • If using cauchy, try the reparameterization suggested in the Stan User’s Guide (SUG).
  • Instead of doing the non-centering by hand, use the newer <offset=..., multiplier=...> syntax, which leaves less room for implementation errors.
  • Your priors for the measurement error convey peak credibility at zero measurement error. Consider a prior peaked away from zero, which can be achieved elegantly by replacing the lower-bounded measurement_error parameter with an unbounded log_measurement_error parameter that is exponentiated where it is used in the likelihood: normal(..., exp(log_measurement_error[sensor[n]])).
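The offset/multiplier suggestion above, applied to the alpha intercepts, would look something like this sketch (it requires a reasonably recent Stan version; parameter names are taken from the original model):

```stan
parameters {
  real mu_alpha;
  real<lower=0> sigma_alpha;
  // Sampled on the standardized scale internally, but used as alpha directly
  vector<offset=mu_alpha, multiplier=sigma_alpha>[J] alpha;
}
model {
  // Written as the centered model, but non-centered under the hood
  alpha ~ normal(mu_alpha, sigma_alpha);
}
```

This removes alpha_raw and the hand-written transformed parameters block entirely, which is exactly where hand-rolled non-centering tends to go wrong.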

One issue is that you have multiple hierarchical models stacked together here. In particular, measurement_error and beta are also modeled hierarchically and may need different parameterizations.

The Cauchy population model on measurement_error is very, very bad: it will generally produce pathological funnel geometry that is nearly impossible to remedy even with careful parameterization. I strongly recommend starting with a normal population model here.
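Combining a normal population model with the log-scale measurement error mentioned earlier might look like the following sketch (the hyperparameter names here are illustrative, not from the original model):

```stan
parameters {
  real mu_log_error;
  real<lower=0> sigma_log_error;
  vector[J] log_measurement_error;  // unbounded, so no hard boundary at zero
}
model {
  mu_log_error ~ normal(0, 1);
  sigma_log_error ~ normal(0, 1);   // half-normal via the <lower=0> constraint
  log_measurement_error ~ normal(mu_log_error, sigma_log_error);
  // the likelihood then uses exp(log_measurement_error[sensor[n]]) as its scale
}
```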

Note also that monolithic centered and non-centered parameterizations are just two extremes; if the likelihood functions in each group are heterogeneous, then you may very well need to parameterize each group differently. See Hierarchical Modeling for more, including example code to help organize the parameterizations.
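One way to organize such a per-group mixture is to pass the indices of the groups to treat each way as data, along the lines of this sketch (the *_idx variables are hypothetical bookkeeping, not part of the original model):

```stan
data {
  int<lower=0> J;
  int<lower=0> J_ncp;                    // number of groups to non-center
  int<lower=1, upper=J> ncp_idx[J_ncp];  // their indices in 1:J
  int<lower=0> J_cp;                     // number of groups to keep centered
  int<lower=1, upper=J> cp_idx[J_cp];
}
parameters {
  real mu_alpha;
  real<lower=0> sigma_alpha;
  vector[J_ncp] alpha_ncp;  // standardized scale
  vector[J_cp] alpha_cp;    // natural scale
}
transformed parameters {
  vector[J] alpha;
  alpha[ncp_idx] = mu_alpha + sigma_alpha * alpha_ncp;
  alpha[cp_idx] = alpha_cp;
}
model {
  alpha_ncp ~ std_normal();
  alpha_cp ~ normal(mu_alpha, sigma_alpha);
}
```

Groups with many observations (a strongly identifying likelihood) usually do better centered, while sparse groups usually do better non-centered.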