Rogue chain

Hello all,

I am running my model with 4 chains, and periodically I see one chain go temporarily rogue (like the green chain in the plot):

I see no divergent transitions and all diagnostics are great (except for the R-hat with the green chain).

Any ideas why this could be and how to solve it?

Oh wow, that’s a really interesting case of the diagnostics failing to catch a clear sampling failure. Tagging @avehtari so they see this.

I would suspect that the model has a non-identifiability, possibly a subtle one, and that you lucked out with inits that mostly stayed in one mode. Is there another parameter that shows similar step behaviour over those iterations? It looks like the parameter you’re showing has modes at 3 and 0; can you try sampling with init=0 and then again with init=3? I’d expect most chains to stay at 0 in the former and at 3 in the latter (though maybe not; there’s clearly enough overlap that chains initially at 3 can jump to 0, and presumably vice versa).
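To illustrate what I mean by chains staying where they are initialised, here is a toy sketch in plain Python. It is not your model and not Stan's sampler: it assumes a made-up one-dimensional target with two equally weighted normal modes at 0 and 3, and uses a simple random-walk Metropolis sampler. The names `log_density` and `metropolis_chain`, the mode width `sd=0.2` and the step size are all my own assumptions, and the modes are deliberately better separated than in your plot, where jumps clearly do happen.

```python
import math
import random


def log_density(x, modes=(0.0, 3.0), sd=0.2):
    """Unnormalised log density of an equal-weight two-mode normal mixture (toy target)."""
    terms = [math.exp(-0.5 * ((x - m) / sd) ** 2) for m in modes]
    # The tiny constant avoids log(0) far away from both modes.
    return math.log(sum(terms) + 1e-300)


def metropolis_chain(init, n_iter=2000, step=0.2, seed=1):
    """Random-walk Metropolis chain started at `init` (hypothetical helper)."""
    rng = random.Random(seed)
    x = init
    draws = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_density(proposal) - log_density(x)
        # Accept with probability min(1, density ratio).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        draws.append(x)
    return draws


chain_at_0 = metropolis_chain(init=0.0, seed=1)
chain_at_3 = metropolis_chain(init=3.0, seed=2)
mean_0 = sum(chain_at_0) / len(chain_at_0)
mean_3 = sum(chain_at_3) / len(chain_at_3)
print("mean of chain initialised at 0:", round(mean_0, 2))
print("mean of chain initialised at 3:", round(mean_3, 2))
```

In this toy each chain reports the mode it started in, so the two runs disagree even though both look fine on their own. If your runs with init=0 and init=3 behave the same way, that would point to a genuine second mode.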

Maybe also post the full model so we can see if the identifiability issue is obvious in the structure.

Also tagging in @Lu.Zhang

Can you clarify which diagnostics you ran, including the function calls and package versions?

There seems to be a valid second mode near 0. If those values are not sensible given what you know about the application, you could add a prior that reflects that application-specific information.

The example I presented was with init=0, but I see similar convergence behaviour when I don’t define init values.

Sure – I use cmdstan 2.25.0 through cmdstanpy 0.9.67.
I run /bin/diagnose and get the following output:

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory for all transitions.

Effective sample size satisfactory.

The following parameters had split R-hat greater than 1.05:
  risk_lt_pr[38,7], risk_lt_pr[3,12], risk_lt_pr[86,12], beta_lt_pr[86,12], risk_lt[86,12]
Such high values indicate incomplete mixing and biased estimation.
You should consider regularizating your model with additional prior information or a more effective parameterization.

Processing complete.
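For anyone following along, here is a rough sketch of why split R-hat picks up a chain that wanders off for a while. It uses the classic split R-hat formula in plain Python, which is not necessarily the exact computation CmdStan performs. The toy chains are invented: three chains sit near 3, and a fourth spends iterations 300 to 599 near 0. The function name `split_rhat` and all numbers are assumptions for illustration only.

```python
import math
import random


def split_rhat(chains):
    """Classic split R-hat for a list of equal-length chains (illustrative sketch)."""
    half = len(chains[0]) // 2
    # Split every chain into a first and a second half.
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    m, n = len(halves), half
    means = [sum(h) / n for h in halves]
    variances = [
        sum((x - mu) ** 2 for x in h) / (n - 1)
        for h, mu in zip(halves, means)
    ]
    grand_mean = sum(means) / m
    # Between-half and within-half variance estimates.
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    within = sum(variances) / m
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


rng = random.Random(0)
# Three well-behaved toy chains centred at 3.
good = [[rng.gauss(3.0, 0.3) for _ in range(1000)] for _ in range(3)]
# One toy chain that temporarily visits a second mode near 0.
rogue = [rng.gauss(0.0 if 300 <= i < 600 else 3.0, 0.3) for i in range(1000)]

rhat_good = split_rhat(good)
rhat_with_rogue = split_rhat(good + [rogue])
print("split R-hat, good chains only:", round(rhat_good, 3))
print("split R-hat, including rogue chain:", round(rhat_with_rogue, 3))
```

The excursion inflates the between-half variance relative to the within-half variance, which is what pushes the statistic past the 1.05 threshold mentioned in the output above.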

The parameter shown in the example in my first message is one of the risk_lt_pr.

I don’t have a prior in the typical sense for these parameters - they are sampled from a linear dynamic system.

Ah, my bad @avehtari, I misread the original report and missed that Rhat did indeed signal that sampling went awry. Had I seen that I would not have tagged you in.

@nerpa, can you post your model so we can track down the identifiability issue?

Hi @mike-lawrence, sorry for the delay!
This is the model block of my program:

model {
    A ~ normal(0.5, 0.05);
    to_vector(B) ~ normal(0, 0.05);
    to_vector(C) ~ normal(0, 0.05);
    to_vector(X1) ~ normal(0, 0.1);
    mu_d ~ normal(0, 0.1);
    sigma_r ~ normal(0, 0.1);
    real X[N,W,Xdim];

    for (s in 1:N) {
        for (w in 1:W) {

            if (w == 1) {
                X[s,w,] = to_array_1d(inv_logit(X1[s,]));
            } else {
                X[s,w,] = to_array_1d(inv_logit(diag_matrix(A) * to_vector(X[s,w-1,]) + B * to_vector(U[s,w-1,])));
            }

            risk_lt_pr[s,w] ~ normal(mu_d[1] + C[1,] * to_vector(X[s,w,]), sigma_r[1]);
            beta_lt_pr[s,w] ~ normal(mu_d[2] + C[2,] * to_vector(X[s,w,]), sigma_r[2]);

            vector[Tr_lt[s,w]] Utility1 = to_vector(hi_p_lt[s,w,:Tr_lt[s,w]]) .* pow(to_vector(hi_narr_lt[s,w,:Tr_lt[s,w]]), risk_lt[s,w]) + to_vector(low_p_lt[s,w,:Tr_lt[s,w]]) .* pow(to_vector(low_narr_lt[s,w,:Tr_lt[s,w]]), risk_lt[s,w]);
            vector[Tr_lt[s,w]] Utility2 = to_vector(hi_p_lt[s,w,:Tr_lt[s,w]]) .* pow(to_vector(hi_wide_lt[s,w,:Tr_lt[s,w]]), risk_lt[s,w]) + to_vector(low_p_lt[s,w,:Tr_lt[s,w]]) .* pow(to_vector(low_wide_lt[s,w,:Tr_lt[s,w]]), risk_lt[s,w]);

            choice_lt[s,w,:Tr_lt[s,w]] ~ bernoulli_logit(beta_lt[s,w] * (Utility1 - Utility2));

        }
    }
}
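In case the indexing is hard to read, here is my understanding of the latent-state recursion for a single subject, written as a small plain-Python sketch. It only mirrors the two X[s,w,] assignments above; the function names `inv_logit` and `latent_states` and the toy inputs are invented for illustration, and I am assuming U supplies one input vector per transition (W-1 rows for W states).

```python
import math


def inv_logit(z):
    """Logistic function, as in Stan's inv_logit."""
    return 1.0 / (1.0 + math.exp(-z))


def latent_states(x1, a_diag, b, u):
    """Forward recursion for one subject (illustrative sketch, not the Stan code).

    x1     : initial unconstrained state, length Xdim
    a_diag : diagonal of the transition matrix, length Xdim
    b      : input matrix as nested lists, Xdim rows by Udim columns
    u      : list of input vectors, one per transition (assumed W-1 rows)
    """
    # First state: inv_logit applied to X1, as in the w == 1 branch.
    states = [[inv_logit(v) for v in x1]]
    for u_prev in u:
        x_prev = states[-1]
        nxt = []
        for i in range(len(x_prev)):
            # Row i of B times the previous input vector.
            drive = sum(b[i][j] * u_prev[j] for j in range(len(u_prev)))
            # diag_matrix(A) * X reduces to an elementwise product.
            nxt.append(inv_logit(a_diag[i] * x_prev[i] + drive))
        states.append(nxt)
    return states


# Toy inputs, chosen only to exercise the function.
toy_states = latent_states(x1=[0.0], a_diag=[0.5], b=[[0.0]], u=[[1.0], [1.0]])
print(toy_states)
```

Every state is passed through inv_logit, so each element of X lies strictly between 0 and 1 before it enters the means of risk_lt_pr and beta_lt_pr.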

Any ideas about what might cause the identifiability issue would be very helpful!