Hello,
the latest version of my model has started throwing some “2 of 4 chains have a NaN E-BFMI” warning. Can anybody provide any tips to help me track down the problem (let alone fixing it!).
My current modelling pipeline involves fitting the model to the first half of my data, and then using a generated_quantities call to make predictions for the held-out second half of the data.
If I inspect the fitted model object, I get the following summary for the main parameters:
variable | mean | median | sd | mad | q5 | q95 | rhat | ess_bulk | ess_tail |
---|---|---|---|---|---|---|---|---|---|
lp__ | -1.5e+04 | -1.4e+04 | 3.16 | 0.00 | -1.4e+04 | -1.4e+04 | 1 | 2019 | NA |
b_a[1] | 0.0e+00 | 0.0e+00 | 0.05 | 0.04 | -7.0e-02 | 7.0e-02 | 1 | 2268 | 1249 |
b_a[2] | 1.7e+00 | 1.7e+00 | 0.53 | 0.50 | 8.3e-01 | 2.6e+00 | 1 | 1205 | 1398 |
b_stick[1] | 6.9e-01 | 6.9e-01 | 0.21 | 0.21 | 3.4e-01 | 1.0e+00 | 1 | 720 | 1194 |
b_stick[2] | 2.9e+00 | 2.9e+00 | 0.41 | 0.39 | 2.2e+00 | 3.5e+00 | 1 | 1404 | 1444 |
rho_delta[1] | 1.2e+00 | 1.2e+00 | 0.08 | 0.07 | 1.1e+00 | 1.4e+00 | 1 | 859 | 1154 |
rho_delta[2] | 9.1e-01 | 9.1e-01 | 0.03 | 0.03 | 8.7e-01 | 9.5e-01 | 1 | 1462 | 1369 |
rho_psi[1] | -2.1e+00 | -2.1e+00 | 0.24 | 0.24 | -2.5e+00 | -1.7e+00 | 1 | 1136 | 1292 |
rho_psi[2] | -1.1e+00 | -1.1e+00 | 0.14 | 0.13 | -1.3e+00 | -8.8e-01 | 1 | 1812 | 1694 |
sigma_u[1] | 9.0e-02 | 8.0e-02 | 0.06 | 0.06 | 1.0e-02 | 1.9e-01 | 1 | 594 | 1010 |
sigma_u[2] | 8.3e-01 | 8.1e-01 | 0.15 | 0.14 | 6.2e-01 | 1.1e+00 | 1 | 1637 | 1582 |
sigma_u[3] | 3.0e-01 | 3.0e-01 | 0.06 | 0.06 | 2.2e-01 | 4.1e-01 | 1 | 1276 | 1326 |
sigma_u[4] | 8.9e-01 | 8.6e-01 | 0.21 | 0.19 | 6.2e-01 | 1.3e+00 | 1 | 1241 | 1485 |
sigma_u[5] | 2.1e+00 | 2.1e+00 | 0.45 | 0.42 | 1.5e+00 | 3.0e+00 | 1 | 1333 | 1594 |
sigma_u[6] | 1.6e+00 | 1.6e+00 | 0.33 | 0.30 | 1.2e+00 | 2.2e+00 | 1 | 1470 | 1139 |
sigma_u[7] | 9.0e-02 | 9.0e-02 | 0.02 | 0.02 | 6.0e-02 | 1.4e-01 | 1 | 1650 | 1557 |
sigma_u[8] | 4.3e-01 | 4.2e-01 | 0.14 | 0.14 | 2.1e-01 | 6.8e-01 | 1 | 924 | 763 |
If I look further down, there are a load of NA values for my variance-covariance matrix, but I am assuming this is normal behaviour?
|L_u[1,1] | 1.0e+00| 1.0e+00| 0.00| 0.00| 1.0e+00| 1.0e+00| NA| NA| NA|
|L_u[2,1] | -1.2e-01| -1.3e-01| 0.29| 0.30| -5.9e-01| 3.7e-01| 1| 400| 650|
|L_u[3,1] | 0.0e+00| 0.0e+00| 0.29| 0.29| -4.7e-01| 4.7e-01| 1| 437| 738|
|L_u[4,1] | -1.3e-01| -1.4e-01| 0.28| 0.30| -5.8e-01| 3.3e-01| 1| 551| 923|
|L_u[5,1] | -9.0e-02| -1.0e-01| 0.27| 0.28| -5.3e-01| 3.7e-01| 1| 537| 750|
|L_u[6,1] | 1.8e-01| 2.0e-01| 0.28| 0.28| -3.1e-01| 6.1e-01| 1| 561| 727|
|L_u[7,1] | 4.0e-02| 5.0e-02| 0.29| 0.32| -4.5e-01| 5.2e-01| 1| 704| 1195|
|L_u[8,1] | -5.0e-02| -5.0e-02| 0.30| 0.31| -5.3e-01| 4.7e-01| 1| 893| 1385|
|L_u[1,2] | 0.0e+00| 0.0e+00| 0.00| 0.00| 0.0e+00| 0.0e+00| NA| NA| NA|
|L_u[2,2] | 9.5e-01| 9.7e-01| 0.07| 0.04| 8.0e-01| 1.0e+00| 1| 790| NA|
If I check the test set results, I find that the accuracy is high and the model is doing a good job at predicting behaviour.
My r-hat statistics are generally good. 99% of them are < 1.01 , and the largest value is around 1.02.
My model is a little complex but the main code is below.
As always, I really appreciate this community.
parameters {
// these are the parameters we wish to fit to the data
////////////////////////////////////
// fixed effects
////////////////////////////////////
array[K] real b_a; // weights for class A compared to B
array[K] real b_stick; // stick-switch rates
array[K] real<lower = 0> rho_delta; // distance tuning
array[K] real rho_psi; // direction tuning
///////////////////////////////
// random effects
///////////////////////////////
// random effect variances:
// 4*K as we have four fixed effect parameters x K conditions
vector<lower=0>[4*K] sigma_u;
// declare L_u to be the Choleski factor of a correlation matrix
cholesky_factor_corr[4*K] L_u;
// random effect matrix
matrix[4*K,L] z_u;
}
transformed parameters {
/*
combine fixed and random effects
we do this here so that the code in the model{} block is easier to read
*/
// this transform random effects so that they have the correlation
// matrix specified by the correlation matrix above
matrix[4*K, L] u;
u = diag_pre_multiply(sigma_u, L_u) * z_u;
// create empty arrays for everything
array[K] vector[L] u_a, u_stick, u_delta, u_psi;
// calculate
for (kk in 1:K) {
u_a[kk] = to_vector(b_a[kk] + u[4*(kk-1)+1]);
u_stick[kk] = to_vector(b_stick[kk] + u[4*(kk-1)+2]);
u_delta[kk] = to_vector(rho_delta[kk] + u[4*(kk-1)+3]);
u_psi[kk] = to_vector(rho_psi[kk] + u[4*(kk-1)+4]);
}
}
model {
/////////////////////////////////////////////////////
// Define Priors
////////////////////////////////////////////////////
// priors for random effects
sigma_u ~ exponential(1);
L_u ~ lkj_corr_cholesky(1.5); // LKJ prior for the correlation matrix
to_vector(z_u) ~ normal(0, 1); // centred prior for random effects, so this should always be N(0,1)
// priors for fixed effects
for (kk in 1:K) {
target += normal_lpdf(b_a[kk] | prior_mu_b_a, prior_sd_b_a);
target += normal_lpdf(b_stick[kk] | prior_mu_b_stick, prior_sd_b_stick);
target += normal_lpdf(rho_delta[kk] | prior_mu_rho_delta, prior_sd_rho_delta);
target += normal_lpdf(rho_psi[kk] | prior_mu_rho_psi, prior_sd_rho_psi);
}
// create some variables
vector[n_targets] weights;
int t, z, x; // trial, person, and condition
//////////////////////////////////////////////////
// // step through data row by row and define LLH
//////////////////////////////////////////////////
for (ii in 1:N) {
t = trial[ii];
z = Z[t];
x = X[t];
weights = compute_weights(
u_a[x, z], u_stick[x, z], u_delta[x, z], u_psi[x, z],
to_vector(item_class[t]), S[ii], delta[ii], psi[ii],
found_order[ii], n_targets, remaining_items[ii],
d0);
// get likelihood of item selection
target += log(weights[Y[ii]]);
}
}