Divergences and low EBFMI in multi-normal model with measurement noise

I am trying to set up a model for inferring a covariance matrix among 2 (or more) variables when each variable is observed with measurement noise. This is my first attempt at a model:

N and B are two vectors of length n, \theta and \delta are the “true” values of these two variables and Sigma is the covariance matrix between them.

data {
int<lower=0> n;
row_vector[n] N;
row_vector[n] B;
}
transformed data {
int<lower=0> D = 2;
}
parameters {
vector[D] mu;

vector<lower=0, upper=pi()/2>[D] tau_unif;
cholesky_factor_corr[D] L_Omega;

real<lower=0> sigma_B;
real<lower=0> sigma_N;

matrix[D, n] theta_delta_alpha;
}
transformed parameters {
cholesky_factor_cov[D] L_Sigma;
vector<lower=0>[D] tau;

matrix[D, n] theta_delta;
row_vector[n] theta;
row_vector[n] delta;

tau = 2.5 * tan(tau_unif);
L_Sigma = diag_pre_multiply(tau, L_Omega);

for(n_i in 1:n) {
theta_delta[, n_i] = mu + L_Sigma * theta_delta_alpha[, n_i];
}

delta = theta_delta[2, ];
theta = theta_delta[1, ];
}
model {
mu ~ std_normal();
L_Omega ~ lkj_corr_cholesky(2);

to_vector(theta_delta_alpha) ~ std_normal();

sigma_N ~ std_normal();
sigma_B ~ std_normal();

N ~ normal(delta, sigma_N);
B ~ normal(theta, sigma_B);
}


I have attempted here to apply all of the reparameterization tricks mentioned in the manual. However, when I use the attached python code (run_joint_model_test.py (1005 Bytes) ) to generate fake data and run the models I get very poor sampling (even when boosting the adapt_delta to try to deal with divergences). Here are the warnings I got on a recent run of the model:

WARNING:pystan:n_eff / iter below 0.001 indicates that the effective sample size has likely been overestimated
WARNING:pystan:Rhat above 1.1 or below 0.9 indicates that the chains very likely have not mixed
WARNING:pystan:77 of 4000 iterations ended with a divergence (1.93 %).
WARNING:pystan:Try running with adapt_delta larger than 0.99 to remove the divergences.
WARNING:pystan:73 of 4000 iterations saturated the maximum tree depth of 10 (1.82 %)
WARNING:pystan:Run again with max_treedepth larger than 10 to avoid saturation
WARNING:pystan:Chain 1: E-BFMI = 0.0357
WARNING:pystan:Chain 2: E-BFMI = 0.0521
WARNING:pystan:Chain 3: E-BFMI = 0.047
WARNING:pystan:Chain 4: E-BFMI = 0.0141
WARNING:pystan:E-BFMI below 0.2 indicates you may need to reparameterize your model

I’m not necessarily seeing anything obvious in parallel plots/pairs plots, e.g.

Are there any obvious fixes that could help with sampling here?
Thanks so much for all the help!

This kind of model needs informative priors; half-cauchys aren’t going to cut it.

If sigma_N, sigma_B and tau are given as data your fake data works

INFO:pystan:No divergent transitions found.
INFO:pystan:No transitions that ended prematurely due to maximum tree depth limit
INFO:pystan:Chain 1: E-BFMI (= 1.07) equals or exceeds threshold of 0.2.
INFO:pystan:Chain 2: E-BFMI (= 0.857) equals or exceeds threshold of 0.2.
INFO:pystan:Chain 3: E-BFMI (= 0.926) equals or exceeds threshold of 0.2.
INFO:pystan:Chain 4: E-BFMI (= 0.886) equals or exceeds threshold of 0.2.
INFO:pystan:E-BFMI indicated no pathological behavior


(There are some spurious warnings because L_Omega[1,1] is fixed and hence has n_eff=NaN but that’s of no consequence.)
If they’re given narrow lognormal priors centered on the true value

tau ~ lognormal(0, 0.1);
sigma_N ~ lognormal(log(0.5), 0.1);
sigma_B ~ lognormal(log(0.5), 0.1);


it still works (although there are clear signs of things starting to break down.)

INFO:pystan:No divergent transitions found.
INFO:pystan:No transitions that ended prematurely due to maximum tree depth limit
INFO:pystan:Chain 1: E-BFMI (= 0.632) equals or exceeds threshold of 0.2.
INFO:pystan:Chain 2: E-BFMI (= 0.685) equals or exceeds threshold of 0.2.
INFO:pystan:Chain 3: E-BFMI (= 0.581) equals or exceeds threshold of 0.2.
INFO:pystan:Chain 4: E-BFMI (= 0.699) equals or exceeds threshold of 0.2.
INFO:pystan:E-BFMI indicated no pathological behavior


But even moderate lognormal(0,0.5) seems to cause divergencies.

The problem is that if sigma_N/B is small then the data constrain the correlation tightly but if sigma is large the the data are uninformative and posterior distribution of the correlation is mostly determined by the prior. The data can’t really tell you which is the case so the sampler tries to switch back and forth between these two very different distributions and cannot adapt properly to either.

2 Likes

Wow thank you so much! That makes a lot of sense