Divergent transitions after warmup

Hi there! I got some warning messages and have no idea how to deal with them. I have already increased adapt_delta to 0.90, but there are still some divergent transitions. Does this mean my model is not good? I'm also not sure whether this model can still be used for model comparison with other models. I'd really appreciate it if someone could help!

Here are the warning messages:
1: There were 163 divergent transitions after warmup.
2: Examine the pairs() plot to diagnose sampling problems.
3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help.
4: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help.


My model is a hierarchical model with some reparameterization in the code:

data {
  int<lower=1> nSubjects;
  int<lower=96,upper=100> nTrials[nSubjects];
  int<lower=0,upper=2> choice_green[nSubjects,100];
  int<lower=0,upper=2> advice_follow[nSubjects,100];
  int<lower=-1,upper=1> choice_acc[nSubjects,100];
  int<lower=-1,upper=1> advice_acc[nSubjects,100];
}

transformed data {
  vector[2] initv1;
  initv1 = rep_vector(0.0,2);
  vector[2] initv2;
  initv2 = rep_vector(0.0,2);
}

parameters {
  // group-level parameters
  // own experience
  real lr1_mu_raw;
  real tau1_mu_raw;
  real<lower=0> lr1_sd_raw;
  real<lower=0> tau1_sd_raw;
  // advice following experience
  real lr2_mu_raw;
  real tau2_mu_raw;
  real<lower=0> lr2_sd_raw;
  real<lower=0> tau2_sd_raw;
  // weighting for belief of advice
  real omega_mu_raw;
  real<lower=0> omega_sd_raw;
  
  // subject-level parameters
  // own experience
  vector[nSubjects] lr1_raw;
  vector[nSubjects] tau1_raw;
  // advice following experience
  vector[nSubjects] lr2_raw;
  vector[nSubjects] tau2_raw;
  
  // weighting for belief of advice
  vector[nSubjects] omega_raw;
}

transformed parameters {
  // subject-level parameters, transformed to their constrained scales
  vector<lower=0,upper=1>[nSubjects] lr1;
  vector<lower=0,upper=3>[nSubjects] tau1;
  vector<lower=0,upper=1>[nSubjects] lr2;
  vector<lower=0,upper=3>[nSubjects] tau2;
  
  vector<lower=0,upper=1>[nSubjects] omega;
  
  for (s in 1:nSubjects){
    lr1[s] = Phi_approx( lr1_mu_raw + lr1_sd_raw * lr1_raw[s] );
    tau1[s] = Phi_approx( tau1_mu_raw + tau1_sd_raw * tau1_raw[s] )*3;
    lr2[s] = Phi_approx( lr2_mu_raw + lr2_sd_raw * lr2_raw[s] );
    tau2[s] = Phi_approx( tau2_mu_raw + tau2_sd_raw * tau2_raw[s] )*3;
    
    omega[s] = Phi_approx( omega_mu_raw + omega_sd_raw * omega_raw[s] );
  }

}

model {
  lr1_mu_raw ~ normal(0,1);
  tau1_mu_raw ~ normal(0,1);
  lr1_sd_raw ~ cauchy(0,3);
  tau1_sd_raw ~ cauchy(0,3);
  
  
  lr1_raw ~ normal(0,1);
  tau1_raw ~ normal(0,1);
  
  lr2_mu_raw ~ normal(0,1);
  tau2_mu_raw ~ normal(0,1);
  lr2_sd_raw ~ cauchy(0,3);
  tau2_sd_raw ~ cauchy(0,3);
  
  lr2_raw ~ normal(0,1);
  tau2_raw ~ normal(0,1);
  
  omega_mu_raw ~ normal(0,1);
  omega_sd_raw ~ cauchy(0,3);
  omega_raw ~ normal(0,1);
  
  for (s in 1:nSubjects){
    vector[2] v1; // expectation of value for green card and blue card
    real pe1; // prediction error
    v1 = initv1;
    
    vector[2] v2; // expectation of value for following advice and not following advice
    real pe2; // prediction error
    v2 = initv2;
    
    for (t in 1:nTrials[s]){
      //combine own experience and advice-following experience
     choice_green[s,t] ~ categorical(omega[s]*softmax(tau2[s]*v2) + (1-omega[s])*softmax(tau1[s]*v1));
     
     // cognitive model
     //  own experience
     pe1 = choice_acc[s,t] - v1[choice_green[s,t]];
     v1[choice_green[s,t]] = v1[choice_green[s,t]] + lr1[s] * pe1;
     // advice following experience
     pe2 = advice_acc[s,t] - v2[advice_follow[s,t]];
     v2[advice_follow[s,t]] = v2[advice_follow[s,t]]+ lr2[s] * pe2;
    }
  }
}

generated quantities {
  real<lower=0,upper=1> lr1_mu;
  real<lower=0,upper=3> tau1_mu;
  real<lower=0,upper=1> lr2_mu;
  real<lower=0,upper=3> tau2_mu;
  
  real<lower=0,upper=1> omega_mu;
  
  real log_lik[nSubjects];
  
  lr1_mu = Phi_approx(lr1_mu_raw);
  tau1_mu = Phi_approx(tau1_mu_raw)*3;
  lr2_mu = Phi_approx(lr2_mu_raw);
  tau2_mu = Phi_approx(tau2_mu_raw)*3;
  
  omega_mu = Phi_approx(omega_mu_raw);
  
  { // local section; this saves time and space
    for (s in 1:nSubjects){
    vector[2] v1; // expectation of value for green card and blue card
    real pe1;
    v1 = initv1;
    
    vector[2] v2; // expectation of value for following advice and not following advice
    real pe2;
    v2 = initv2;
    
    log_lik[s] = 0;
    
    for (t in 1:nTrials[s]){
      log_lik[s] = log_lik[s] + categorical_lpmf(choice_green[s,t] | omega[s]*(softmax(tau2[s]*v2))+(1-omega[s])*(softmax(tau1[s]*v1)));
     
     // cognitive model
     //  own experience
     pe1 = choice_acc[s,t] - v1[choice_green[s,t]];
     v1[choice_green[s,t]] = v1[choice_green[s,t]] + lr1[s] * pe1;
     // advice following experience
     pe2 = advice_acc[s,t] - v2[advice_follow[s,t]];
     v2[advice_follow[s,t]] = v2[advice_follow[s,t]] + lr2[s] * pe2;
      }
    }
  }
}

Here is the rest of the code I use to run the model:

fit_rl1_3 <- stan(modelFile1,
                data   = dataList,
                chains = 4,
                iter   = 4000,
                warmup = floor(nIter/2),
                thin   = 1,
                init   = 'random',
                control=list(adapt_delta=0.90),
                seed   = 145015635)
print(fit_rl1_3)

It means your model is hard for HMC to fit. That doesn’t necessarily mean there’s anything wrong with the model mathematically or conceptually.

The initv declarations in transformed data can each be a one-liner, and you don't need the .0:

 vector[2] initv1 = rep_vector(0, 2);
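
Applied to both initial-value vectors, the whole transformed data block reduces to:

transformed data {
  vector[2] initv1 = rep_vector(0, 2);
  vector[2] initv2 = rep_vector(0, 2);
}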

It looks like you’re using Phi_approx. I’d suggest just using the logistic function, inv_logit, instead. The approx version already uses logistic tails, and all that’s being changed is a bit of scale.
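
For example, the subject-level transforms could be written as follows (a sketch; the * 3 scaling for the taus stays the same):

for (s in 1:nSubjects) {
  lr1[s] = inv_logit(lr1_mu_raw + lr1_sd_raw * lr1_raw[s]);
  tau1[s] = inv_logit(tau1_mu_raw + tau1_sd_raw * tau1_raw[s]) * 3;
  // ... same pattern for lr2, tau2, and omega
}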

This is going to be very very hard to identify:

omega[s] * softmax(tau2[s] * v2) + (1 - omega[s]) * softmax(tau1[s] * v1)

Not only is the sum not identified, but components like tau2[s] * v2 introduce multiplicative non-identifiability if there are parameters going into v2.
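
To see the multiplicative part, note that for any c > 0,

softmax(tau * v) == softmax((c * tau) * (v / c))

so tau and the scale of v only enter through their product; to the extent the learning rates shape the scale of v2, they can trade off against tau2.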

I imagine this is mostly where the problems are coming from. Do you really want to perform a mixture on the parameter side here?

I’d strongly suggest indenting everything in a standard way so it’s easier to see the nesting structure.
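
For example, here’s the trial loop from your model block with consistent indentation:

for (t in 1:nTrials[s]) {
  // combine own experience and advice-following experience
  choice_green[s, t] ~ categorical(omega[s] * softmax(tau2[s] * v2)
                                   + (1 - omega[s]) * softmax(tau1[s] * v1));

  // cognitive model: own experience
  pe1 = choice_acc[s, t] - v1[choice_green[s, t]];
  v1[choice_green[s, t]] += lr1[s] * pe1;

  // advice-following experience
  pe2 = advice_acc[s, t] - v2[advice_follow[s, t]];
  v2[advice_follow[s, t]] += lr2[s] * pe2;
}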

You can simplify the code by programming non-centered parameterizations this way:

vector<offset=mu_omega, multiplier=sigma_omega>[nSubjects] omega_full;
...
omega_full ~ normal(mu_omega, sigma_omega);

If you have multiple random effects, as you appear to, a sum-to-zero parameterization can help identify them. Otherwise, you can run into all sorts of computational problems, with slow convergence and only soft identification through the prior.
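
Here’s a minimal sketch of one standard way to impose a hard sum-to-zero constraint, using omega as the example (omega_raw_free is just a name I’m making up; the implied prior on the pinned last element deserves some thought):

parameters {
  vector[nSubjects - 1] omega_raw_free;
}
transformed parameters {
  // the last element is pinned so the whole vector sums to zero
  vector[nSubjects] omega_raw = append_row(omega_raw_free, -sum(omega_raw_free));
}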

I didn’t know what to call the variable because you actually define omega to be the normal cdf of what I called omega_full here.
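
Putting the offset/multiplier version together for omega (a sketch with my names; mu_omega and sigma_omega play the roles of your omega_mu_raw and omega_sd_raw, and offset/multiplier requires a reasonably recent Stan):

parameters {
  real mu_omega;
  real<lower=0> sigma_omega;
  vector<offset=mu_omega, multiplier=sigma_omega>[nSubjects] omega_full;
}
transformed parameters {
  vector<lower=0, upper=1>[nSubjects] omega = Phi_approx(omega_full);
}
model {
  mu_omega ~ normal(0, 1);
  sigma_omega ~ normal(0, 1);  // see the note on priors below
  omega_full ~ normal(mu_omega, sigma_omega);
}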

Unless you’re expecting really extreme values, I’d suggest lighter-tailed priors than the Cauchy. We’ve found Cauchy priors can let the values wander too far.
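
For example, the scale priors could be swapped for half-normals (the <lower=0> constraint is what makes these half-normal; student_t(3, 0, 1) is a middle ground if you want somewhat heavier tails):

lr1_sd_raw ~ normal(0, 1);
tau1_sd_raw ~ normal(0, 1);
// ... likewise for lr2_sd_raw, tau2_sd_raw, and omega_sd_raw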
