Low MC error per chain, but no convergence of chains

Hi all,

Without getting too specific about my model (which is added at the end of this post) I am getting a result that I never had before. If I test-run my model on 1 chain, the MC error is very low, looking like it converges. However, when I do an actual run over multiple chains, the parameters do not converge. Looking at the traceplots, I can see that - indeed - per chain the MC error is very low again. Thus, this implies my model finds multiple solutions. See the attached picture;

These results thus imply that the results depend on my starting values. My question is thus very broad. What does this mean? Is it a lack of identification still? I don’t think throwing more iterations against it will help, right? Do you think this has to do with rotation? Even though I am not literally running a PCM, there are some similarities. And if so, would you now of an rotation example in stan (like Varimax) that I can easily implement.

Any help is welcome,

model_string = "
data {
 	int<lower=0>          N;       // number of observations
        int<lower=0>	      Nt;      // number of time periods
        int<lower=0>          Np;      // number of explanatory variables
        int<lower=0>          Nh;      // number of knots (only 1 layer)
        int<lower=0>          Ntest;   // size of testing dataset

	vector[N]             yVar;    // explained variable
	int<lower=0,upper=Nt> sell[N]; // period sold, sell = 1, ..., Nt
        matrix[N,Np]          x;       // explanatory variables

        vector[Ntest]         yTest;        // test y variable
        int<lower=0,upper=Nt> sTest[Ntest]; // test period sold    
        matrix[Ntest,Np]      xTest;        // testing dataset
transformed data {
        matrix [Np,Np] A = diag_matrix(rep_vector(1,Np));
        matrix [Np,Np-1] A_qr;
        for(i in 1:Np-1)
          A[Np,i]    = -1;
        A[Np,Np]     = 0;
        A_qr = qr_Q(A)[,1:(Np-1)];
parameters {
        vector[Nt-1]  innovMu;         // time series parameter
        real          fMu;             // first Mu, i.e. the constant.
	ordered[Nh]   beta;            // bias, CONSTRAINT #1 
        matrix<lower=-10,upper=10>[Np-1,Nh] omega_clean;   // weights in layer
        vector[Nh] lambda;             // estimates measurement Eq.
        real<lower=0,upper=2> sigMu;   // variance of innovations (mu)
        real<lower=0,upper=2> sigEps;  // RMSE of measurement Eq.
transformed parameters {
	vector[N]     yHat;
        vector[Nt]    mu;
	matrix[N,Nh]  h;
        matrix[Np,Nh] omega;   

        for(k in 1:Nh)
	  omega[,k]  = A_qr * omega_clean[,k];

        mu[1] = fMu;
        for(t in 2:Nt)
          mu[t] = mu[t-1] + innovMu[t-1]*sigMu;

	for(k in 1:Nh)
	  h[,k]   = x*omega[,k] + beta[k]; 
        yHat      = mu[sell] + (inv_logit(h)*100)*lambda; // inv_logit()*100; 
model {
	lambda            ~ normal(0,1);
	beta              ~ normal(0,1); 
        innovMu           ~ normal(0,0.1);
        fMu               ~ normal(0,10);
        sigEps            ~ normal(0,0.5);
        sigMu             ~ normal(0,0.5);
        for(k in 1:Nh)
	  sum(omega_clean[,k])  ~ normal(0,1/sqrt(1-inv(Np)));
	yVar                    ~ normal(yHat, sigEps);
generated quantities {
        vector[Ntest]    yPred;
  	matrix[Ntest,Nh] hTest;

        for(k in 1:Nh)
	  hTest[,k] = xTest*omega[,k] + beta[k]; 
        yPred       = yTest - (mu[sTest] + (inv_logit(hTest)*100)*lambda);
1 Like

Just a quick reply:

Sounds like the chains may get stuck in different modes. I think if the lp__ are similar this may indicate a “lack of identification”, if they are vastly different then this may not be the main issue.

@martinmodrak wrote something (which I have not read), maybe this helps: Identifying non-identifiability


Ha thanks,

actually, that post you recommended shows the exact same issue I have, plus some tips. I’ll delve into it.
Many thanks.

1 Like

There is discussion of potentially similar case with some links at Constraining Latent Factor model / baysian probabalisic matrix factorization to remove multimodality But AFAIK the problem there ended up unresolved (although post processing the samples with procrustes or something similar might work). The model you shared is a bit too complex for me to understand quickly, so cannot really comment in-depth.

1 Like

(Markov chain) Monte Carlo error is well-defined only if the Markov transition enjoys a central limit theorem, in which case multiple chains should quickly mix into the same equilibrium behavior. When a diagnostic like Rhat indicates non-equilibrium behavior like this it doesn’t make sense to talk about estimator error at all. For more see Markov Chain Monte Carlo in Practice.

Running longer Markov chains definitely won’t help. Exactly why Stan is getting stuck will depend on the specific degeneracies exhibited by your model. For some strategies see Identity Crisis.