Stan gets stuck partway through sampling


  data {

    // Gaussian model
    int W;                                                          // Number of weeks (typically 12)
    int N;                                                          // Number of subjects
    int Xdim;                                                       // Dimension of X - latent low dimensional structure of the phenotype

    // Exogenous variables
    int exo_q_num;                                                  // number of exogenous survey questions
    real U[N,W,exo_q_num];                                       // exogenous survey questions - missing weeks were linearly interpolated outside of Stan 

    // Intertemporal choice
    int<lower=0> idx_itc_obs[N,W];                            // Indices of weeks WITH data
    int<lower=1> P_itc;                                             // number of parameters
    int<lower=0> T_max_itc;                                         // Max number of trials across subjects across weeks
    int<lower=0> Tr_itc[N, W];                              // Number of trials for each subj for each week
    real<lower=0> delay_later[N, W, T_max_itc];             // Delay later - subj x weeks x trials
    real<lower=0> amount_later[N, W, T_max_itc];            // Amount later - subj x weeks x trials      
    real<lower=0> amount_sooner[N, W, T_max_itc];           // Amount sooner - subj x weeks x trials
    int<lower=-1, upper=1> choice_itc[N, W, T_max_itc];     // choice itc - 0 for instant reward, 1 for delayed reward - subj x weeks x trials
    
    // reward over the tasks
    real reward[N, W];

}

parameters {
    real itc_k_pr[N,W];                  // itc
    real itc_beta_pr[N,W];               // itc
    matrix[2,Xdim] C;
    matrix[N, Xdim] X1;
    real<lower=0, upper=1> eta[N]; //eta
}

transformed parameters {
    real<lower=0> itc_k[N,W];                // change for the sake of consistency  - k_itc
    real<lower=0> itc_beta[N,W];

    itc_k = exp(itc_k_pr);                        // itc    
    itc_beta = exp(itc_beta_pr);                // itc
}

model {
       
    to_vector(C) ~ normal(0, 0.05);
    to_vector(X1) ~ normal(0, 0.1);
    eta ~ normal(0.5, 0.1);// prior for eta
    real X[N,W,Xdim];
    real itc_grad_sum[N, W, 2];
    
     for (s in 1:2) {
        real reward_sum = 0;                
        for (w in 1:5) {
            reward_sum += reward[s, w];
            matrix[2, Tr_itc[s,w]] itc_grad ;
            
            if (w == 1) {
                X[s,w,] = to_array_1d(inv_logit(X1[s,]));
            } else {

                vector[Xdim] a = eta[s].*(reward[s, w] - reward_sum/w)* C' * to_vector(itc_grad_sum[s, w-1]);
                vector[Xdim] b = to_vector(X[s,w-1,]);
                X[s,w,] = to_array_1d( b + a); // 68
    
  
            }  
            itc_k_pr[s,w] ~    normal(C[1,] * to_vector(X[s,w,]),1);
            itc_beta_pr[s,w] ~ normal(C[2,] * to_vector(X[s,w,]),1);  
            
            vector[Tr_itc[s,w]] ev_later  = to_vector(amount_later[s,w,:Tr_itc[s,w]]) ./ (1 + itc_k[s,w] * to_vector(delay_later[s,w,:Tr_itc[s,w]]) / 7);
            vector[Tr_itc[s,w]] ev_sooner = to_vector(amount_sooner[s,w,:Tr_itc[s,w]]);
            choice_itc[s,w,:Tr_itc[s,w]] ~ bernoulli_logit(itc_beta[s,w] * (ev_later - ev_sooner));
            
            vector[Tr_itc[s,w]] itc_p = 1 ./ (1 + exp(- itc_beta[s,w] * (ev_later - ev_sooner)));
            vector[Tr_itc[s,w]] log_likelihood_p_grad = to_vector(choice_itc[s,w,:Tr_itc[s,w]]) ./ itc_p - (1 - to_vector(choice_itc[s,w,:Tr_itc[s,w]])) ./ (1 - itc_p );
            
            itc_grad[1,]= to_row_vector(log_likelihood_p_grad .* itc_p .* (1 - itc_p) * itc_beta[s,w] * exp(itc_k_pr[s,w]) .* (- ev_later .* to_vector(delay_later[s,w,:Tr_itc[s,w]])/7 ./ (1 + itc_k[s,w] * to_vector(delay_later[s,w,:Tr_itc[s,w]])/7))); // grad of k
            itc_grad[2,] = to_row_vector(log_likelihood_p_grad .* itc_p .* (1 - itc_p)*exp(itc_beta_pr[s, w]) .* (ev_later - ev_sooner)); // grad of beta

            vector[Tr_itc[s,w]] ones_rep = to_vector(rep_array(1, Tr_itc[s,w]));
            itc_grad_sum[s, w, ] = to_array_1d(itc_grad * ones_rep);
            
        }
    }                                     
} 


I tried to fit this model, but it gets stuck at "chain 1 - done: 100/210". It gets stuck at this point every time. Does anyone have an idea what could cause this, and what I could do to investigate further? Sorry, I am new to Stan and do not yet know much about the underlying theory.

Does it simply hang, or does it crash? What OS are you on? Can you pull up a process monitor (“Activity monitor” on OS X, “Task Manager” on Windows, htop on unix) to see if the process is still using CPU? Also check whether you’re running out of RAM at that point in the sampling.

It simply hangs. I am running it on Google Colab. How can I check the RAM and CPU usage there?
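
For anyone with the same question, here is a minimal sketch of a resource check that can be run from a notebook cell. It assumes a Linux runtime where `/proc/meminfo` is readable, and the names `parse_meminfo` and `resource_snapshot` are made up for this illustration. Since a notebook normally runs one cell at a time, it would have to be called before and after the sampling call, or from a background thread.

```python
import os


def parse_meminfo(text):
    """Parse the text of /proc/meminfo into a dict of {field: kilobytes}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def resource_snapshot(meminfo_path="/proc/meminfo"):
    """Collect CPU count, 1-minute load average, and memory figures.

    Fields that the platform cannot provide are simply left out.
    """
    snapshot = {"cpu_count": os.cpu_count()}
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_1min"] = os.getloadavg()[0]
        except OSError:
            pass  # load average not obtainable on this system
    if os.path.exists(meminfo_path):
        with open(meminfo_path) as fh:
            mem = parse_meminfo(fh.read())
        # /proc/meminfo reports kilobytes; convert to gigabytes
        snapshot["mem_total_gb"] = mem.get("MemTotal", 0) / 1024**2
        snapshot["mem_available_gb"] = mem.get("MemAvailable", 0) / 1024**2
    return snapshot


if __name__ == "__main__":
    print(resource_snapshot())
```

A load average near zero while the sampler appears to be running would suggest the process is idle rather than computing, and available memory close to zero would point to a RAM limit.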

I’m not familiar with Google Colab; does it let you run parallel chains, and is only one of them getting stuck? If it only permits you to run chains in serial and your first chain is getting stuck, then I wonder whether you are hitting a resource-usage limit on your Colab account.

I tried running it on my local machine, which runs the chains in parallel. Both chains get stuck at 100/210. So I think this may be because the gradient is getting stuck somewhere, but I am not sure how to check that.

I’ve never heard of sampling getting stuck like that. Can you post your data or a simulated data set that also gets stuck?

I have solved the problem! It was basically caused by NaN values appearing in the results. Thank you so much for your help!


Hi Evangeline,
I am facing the same situation. How exactly did you address the NaN issue? Did you have to write code that skips the chain whenever NaN results are generated?
Thanks,
Richie_M
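
The thread does not record how the NaN was fixed, so the following is only a guess at one possible source. In the model above, `log_likelihood_p_grad` divides by `itc_p` and by `1 - itc_p`, and the result is then multiplied by `itc_p .* (1 - itc_p)`. When the probability saturates to exactly 1 in floating point, that becomes infinity times zero, which is NaN. For choices coded 0 or 1, the product is algebraically equal to `choice - p`, which needs no division. The Python sketch below illustrates the arithmetic only; the function names are hypothetical and it is not the original poster's fix.

```python
import math


def inv_logit(x):
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def _ieee_div(a, b):
    # Python raises ZeroDivisionError for float division by zero, whereas
    # Stan follows IEEE rules and returns inf or nan. Emulate the latter.
    if b == 0.0:
        return math.nan if a == 0.0 else math.copysign(math.inf, a)
    return a / b


def naive_grad_factor(y, x):
    """Mirror of (y / p - (1 - y) / (1 - p)) * p * (1 - p) from the model."""
    p = 1.0 / (1.0 + math.exp(-x))
    dlogp = _ieee_div(y, p) - _ieee_div(1 - y, 1 - p)
    return dlogp * p * (1 - p)


def stable_grad_factor(y, x):
    """Algebraically equivalent form for y in {0, 1}: no division needed."""
    return y - inv_logit(x)


if __name__ == "__main__":
    # A large utility difference makes p round to exactly 1.0
    print(naive_grad_factor(0, 40.0))   # nan
    print(stable_grad_factor(0, 40.0))  # -1.0
```

In Stan, the same idea would amount to using the difference between the observed choice and `inv_logit(...)` in place of the two divisions. Whether that matches the cause in any particular model would need checking, for example by printing intermediate values from the model block.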