I am trying to fit a cognitive/computational model and then use its parameter estimates as predictors of a dependent variable in a multiple linear regression. In other words, I am trying to simultaneously estimate a cognitive model and a regression in which the cognitive-model parameters enter as predictors. I am doing this with cmdstanr. However, when I try to fit the model, I get a high number of divergent transitions:
```
Warning: 11310 of 16000 (71.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.

All 6 chains finished successfully.
Mean chain execution time: 967.8 seconds.
Total execution time: 2177.3 seconds.

Warning: 1911 of 12000 (16.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.
Warning: 4908 of 12000 (41.0%) transitions hit the maximum treedepth limit of 10.
See https://mc-stan.org/misc/warnings for details.
Warning: 5 of 6 chains had an E-BFMI less than 0.2.
See https://mc-stan.org/misc/warnings for details.
```
I have been reading about divergent transitions, and before I try things like reparameterizing the model, I wanted to ask whether anyone has other guesses about what is causing them. One concern I have is that the model is not identified, as I have not seen this type of model fit in Stan before. I am sharing my Stan code below and would appreciate any thoughts you may have. Thank you.
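For reference, here is the model in equation form (subject $i$, trial $t$):

$$
\begin{aligned}
SV_{it} &= \left(p_{it} - \frac{\beta_i A_{it}}{2}\right) 1.875^{\alpha_i},
& WTP_{it} &\sim \operatorname{Normal}\!\left(SV_{it},\ \sigma^{WTP}_{i}\right),\\
ext_i &\sim \operatorname{Normal}\!\left(Z_i \alpha_{regr} + X_i \beta_{regr},\ \sigma_{ext}\right),
& \beta_{regr} &\sim \text{multi-}t_{1}\!\left(0,\ N\,\sigma_{ext}^{2}\,(X^\top X)^{-1}\right),
\end{aligned}
$$

where $Z$ contains the intercept, $\alpha_i$ (risk sensitivity), and sex, and $X$ contains ADI, $\beta_i$ (ambiguity sensitivity), and their centered interaction.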
```stan
functions {
  real SV(real alpha, real beta, real p, real A) {
    return (p - ((beta * A) / 2)) * pow(1.875, alpha);
  }
}
// Note about reward amount being 1.875: Since WTP is scaled to be
// between 0 and 1, I'm scaling the reward amount so that it's still
// 1.875x greater than the max WTP option -> 16 WTP/30 R = 1 WTP/1.875 R
data {
  int<lower=1> N;  // number of subjects
  // Predictors
  vector[N] ones;  // 1s for intercept in design matrix
  vector[N] ADI;
  vector<lower=0, upper=1>[N] sex;
  // K (which is also a predictor)
  int<lower=1> Tr;  // max number of trials
  array[N] int<lower=1, upper=Tr> Tsubj;  // number of trials for each subject
  array[N, Tr] real<lower=0> Red_Amount;   // used to compute subjective prob of winning
  array[N, Tr] real<lower=0> Blue_Amount;  // used to compute subjective prob of losing
  array[N, Tr] real<lower=0> A;  // ambiguity level
  array[N, Tr] real<lower=-1, upper=1> WTP;  // willingness to pay
  // Dependent variable
  vector[N] ext;  // externalizing
}
transformed data {
  // p = subjective probability of winning
  array[N, Tr] real<lower=0> p;
  for (i in 1:N) {
    for (t in 1:Tsubj[i]) {
      if (Red_Amount[i, t] + Blue_Amount[i, t] == 0) {
        p[i, t] = 0;  // guard against division by zero
      } else {
        p[i, t] = Red_Amount[i, t] / (Red_Amount[i, t] + Blue_Amount[i, t]);
      }
    }
  }
  // Alternative: add a small constant to the denominator instead
  // for (i in 1:N) {
  //   for (t in 1:Tsubj[i]) {
  //     p[i, t] = Red_Amount[i, t] / (Red_Amount[i, t] + Blue_Amount[i, t] + 0.00001);
  //   }
  // }
}
parameters {
  // Regression parameters
  vector<lower=-4, upper=4>[3] alpha_regr;  // bounds match the uniform(-4, 4) prior below
  vector[3] beta_regr;  // regression weights w/ ZS prior
  real log_sigma2_ext;  // log of the regression error variance
  // Declare all parameters as vectors for vectorizing
  // Hyper (group-level) parameters
  vector[2] mu_pr;
  vector<lower=0>[2] sigma;  // group-level scales (SDs)
  vector<lower=0>[N] WTP_sigma;  // subject-level WTP noise
  // Subject-level raw parameters (for Matt trick)
  vector[N] z_alpha;
  vector[N] z_beta;
}
transformed parameters {
  vector[N] alpha;  // risk sensitivity
  vector[N] beta;   // ambiguity sensitivity
  vector[N] inter;  // centered ADI x beta interaction
  real<lower=0> sigma_ext;
  matrix[N, 3] Z;  // design matrix: intercept + covariates
  matrix[N, 3] X;  // design matrix for the ZS-prior weights
  matrix[3, 3] XtX_inverse;
  cov_matrix[3] Sigma0;
  // Transform subject-level raw parameters (non-centered)
  for (i in 1:N) {
    // Phi_approx maps to (0, 1); scaling by 2 puts alpha in (0, 2)
    alpha[i] = Phi_approx(mu_pr[1] + sigma[1] * z_alpha[i]) * 2;
    beta[i] = mu_pr[2] + sigma[2] * z_beta[i];
  }
  // Compute interaction term
  for (i in 1:N) {
    inter[i] = (ADI[i] - mean(ADI)) * (beta[i] - mean(beta));
  }
  // // Individual mean
  // for (i in 1:N) {
  //   mu_ext[i] = beta_0 + beta_lambda * lambda[i]
  //     + beta_ADI * ADI[i] + beta_lambdaxADI * inter[i];
  // }
  // Transform sigma_ext from the log-variance scale
  sigma_ext = sqrt(exp(log_sigma2_ext));
  // Covariates in the regression
  Z[1:N, 1] = ones;   // intercept
  Z[1:N, 2] = alpha;  // risk-taking
  Z[1:N, 3] = sex;
  X[1:N, 1] = ADI;
  X[1:N, 2] = beta;   // ambiguity sensitivity
  X[1:N, 3] = inter;
  // Relevant matrices
  XtX_inverse = inverse_spd(X' * X);
  // Prior covariance matrix of beta_regr (given sigma_ext)
  Sigma0 = N * sigma_ext^2 * XtX_inverse;
}
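// Note: X contains beta and inter, which are functions of the parameters,
// so Sigma0 = N * sigma_ext^2 * inverse(X'X) is recomputed (and inverted)
// at every leapfrog step.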
model {
  // Hyper-parameters
  mu_pr ~ normal(0, 1);
  sigma ~ normal(0, 5);
  // Cognitive-model priors
  z_alpha ~ normal(0, 1);
  z_beta ~ normal(0, 1);
  WTP_sigma ~ student_t(3, 0, 1);
  // Regression priors
  sigma_ext ~ student_t(3, 0, 1);
  // Psychological model
  for (i in 1:N) {
    real mu_WTP;
    for (t in 1:Tsubj[i]) {
      mu_WTP = SV(alpha[i], beta[i], p[i, t], A[i, t]);
      WTP[i, t] ~ normal(mu_WTP, WTP_sigma[i]);
    }
  }
  // Regression priors
  // Recall that uniform priors (on bounded parameters) don't need to be specified.
  // Jeffreys prior on sigma2_ext <-> uniform prior on log_sigma2_ext
  // Uniform (Jeffreys) prior on alpha_regr (intercept + covariates)
  // Zellner-Siow prior on the other regression weights
  // (multivariate Cauchy = multivariate t with df = 1)
  beta_regr ~ multi_student_t(1, [0, 0, 0], Sigma0);
  alpha_regr ~ uniform(-4, 4);
  // Regression
  ext ~ normal(Z * alpha_regr + X * beta_regr, sigma_ext);
}
generated quantities {
  vector[N] mu_ext;
  vector[N] log_lik = rep_vector(0, N);
  { // Local section
    for (i in 1:N) {
      // Conditional mean for the posterior predictive check
      mu_ext[i] = Z[i, ] * alpha_regr + X[i, ] * beta_regr;
      // Log-likelihood for the regression
      log_lik[i] = normal_lpdf(ext[i] | mu_ext[i], sigma_ext);
    }
  }
  // Predictions for the posterior predictive check
  array[N] real pred_ext = normal_rng(mu_ext, sigma_ext);
}
```
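Before reparameterizing the hierarchical part, one idea I could try is replacing the `multi_student_t` prior with its equivalent normal scale-mixture form, since a multivariate Cauchy is a multivariate normal whose scale is mixed over an inverse-gamma variable. Below is a minimal, standalone sketch of just that piece (prior only); note that `Sigma0` is passed as data here purely for illustration, whereas in my model it would still be built from `sigma_ext` and `X` in `transformed parameters`:

```stan
// Minimal standalone sketch: the Zellner-Siow / multivariate Cauchy prior
// written as a normal scale mixture. If z_regr ~ normal(0, 1) and
// g ~ inv_gamma(0.5, 0.5), then sqrt(g) * L * z_regr (with L * L' = Sigma0)
// has a multivariate Student-t distribution with 1 df and scale Sigma0.
data {
  int<lower=1> K;        // number of regression weights (3 in my model)
  cov_matrix[K] Sigma0;  // prior scale matrix (fixed here for illustration)
}
parameters {
  real<lower=0> g;       // mixing scale
  vector[K] z_regr;      // standardized regression weights
}
transformed parameters {
  vector[K] beta_regr = sqrt(g) * (cholesky_decompose(Sigma0) * z_regr);
}
model {
  g ~ inv_gamma(0.5, 0.5);
  z_regr ~ std_normal();
}
```

In the full model, `beta_regr` would then be dropped from the `parameters` block along with the `multi_student_t` statement.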
Here is how I am fitting the model with cmdstanr:

```r
Levy_clvm_fit <- Levy_clvm_mod$sample(
  data = data_list,
  seed = 905,
  chains = 8,
  parallel_chains = 4,
  iter_warmup = 3000,
  iter_sampling = 2000,
  adapt_delta = 0.99,
  refresh = 10  # print update every 10 iters
)
```
I am using Jeffreys-Zellner-Siow priors. My cognitive model is from Levy et al. (2010), and I am regressing a dependent variable called externalizing on the "ambiguity aversion" parameter. I appreciate any advice you might have to offer. Thanks!