I am trying to fit a cognitive/computational model and then use its parameter estimates as predictors of a dependent variable in a multiple linear regression. In other words, I am trying to simultaneously estimate a cognitive model and a regression in which the cognitive-model parameters enter as predictors. I am doing this with cmdstanr. However, when I try to fit the model, I get a high number of divergent transitions:
```
Warning: 11310 of 16000 (71.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.

All 6 chains finished successfully.
Mean chain execution time: 967.8 seconds.
Total execution time: 2177.3 seconds.

Warning: 1911 of 12000 (16.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.
Warning: 4908 of 12000 (41.0%) transitions hit the maximum treedepth limit of 10.
See https://mc-stan.org/misc/warnings for details.
Warning: 5 of 6 chains had an E-BFMI less than 0.2.
See https://mc-stan.org/misc/warnings for details.
```
I have been reading about divergent transitions, and before I try things like reparameterizing the model, I wanted to ask whether anyone has other guesses about what is causing them. One concern I have is that the model is not identified, as I have not seen this type of model fit in Stan before. I am sharing my Stan code below and would appreciate any thoughts you may have. Thank you.
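For reference, here is the model in equation form (subject $i$, trial $t$):

$$
\begin{aligned}
SV_{it} &= \left(p_{it} - \frac{\beta_i A_{it}}{2}\right) 1.875^{\alpha_i},
& WTP_{it} &\sim \operatorname{Normal}\!\left(SV_{it},\ \sigma^{WTP}_{i}\right),\\
ext_i &\sim \operatorname{Normal}\!\left(Z_i \alpha_{regr} + X_i \beta_{regr},\ \sigma_{ext}\right),
& \beta_{regr} &\sim \text{multi-}t_{1}\!\left(0,\ N\,\sigma_{ext}^{2}\,(X^\top X)^{-1}\right),
\end{aligned}
$$

where $Z$ contains the intercept, $\alpha_i$ (risk sensitivity), and sex, and $X$ contains ADI, $\beta_i$ (ambiguity sensitivity), and their centered interaction.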
```stan
functions {
  real SV(real alpha, real beta, real p, real A) {
    return (p - ((beta * A) / 2)) * pow(1.875, alpha);
  }
}
// Note about reward amount being 1.875: Since WTP is scaled to be
// between 0 and 1, I'm scaling the reward amount so that it's still
// 1.875x greater than the max WTP option -> 16 WTP/30 R = 1 WTP/1.875 R
data {
  int<lower=1> N;  // number of subjects
  // Predictors
  vector[N] ones;  // 1s for intercept in design matrix
  vector[N] ADI;
  vector<lower=0, upper=1>[N] sex;
  // K (which is also a predictor)
  int<lower=1> Tr;  // max number of trials
  array[N] int<lower=1, upper=Tr> Tsubj;  // number of trials for each subject
  array[N, Tr] real<lower=0> Red_Amount;   // used to compute subjective prob of winning
  array[N, Tr] real<lower=0> Blue_Amount;  // used to compute subjective prob of losing
  array[N, Tr] real<lower=0> A;  // ambiguity level
  array[N, Tr] real<lower=-1, upper=1> WTP;  // willingness to pay
  // Dependent variable
  vector[N] ext;  // externalizing
}
transformed data {
  // p = subjective probability of winning
  array[N, Tr] real<lower=0> p;
  for (i in 1:N) {
    for (t in 1:Tsubj[i]) {
      if (Red_Amount[i, t] + Blue_Amount[i, t] == 0) {
        p[i, t] = 0;  // guard against division by zero
      } else {
        p[i, t] = Red_Amount[i, t] / (Red_Amount[i, t] + Blue_Amount[i, t]);
      }
    }
  }
  // Alternative: add a small constant to the denominator instead
  // for (i in 1:N) {
  //   for (t in 1:Tsubj[i]) {
  //     p[i, t] = Red_Amount[i, t] / (Red_Amount[i, t] + Blue_Amount[i, t] + 0.00001);
  //   }
  // }
}
parameters {
  // Regression parameters
  vector<lower=-4, upper=4>[3] alpha_regr;  // bounds match the uniform(-4, 4) prior below
  vector[3] beta_regr;  // regression weights w/ ZS prior
  real log_sigma2_ext;  // log of the regression error variance
  // Declare all parameters as vectors for vectorizing
  // Hyper (group-level) parameters
  vector[2] mu_pr;
  vector<lower=0>[2] sigma;  // group-level scales (SDs)
  vector<lower=0>[N] WTP_sigma;  // subject-level WTP noise
  // Subject-level raw parameters (for Matt trick)
  vector[N] z_alpha;
  vector[N] z_beta;
}
transformed parameters {
  vector[N] alpha;  // risk sensitivity
  vector[N] beta;   // ambiguity sensitivity
  vector[N] inter;  // centered ADI x beta interaction
  real<lower=0> sigma_ext;
  matrix[N, 3] Z;  // design matrix: intercept + covariates
  matrix[N, 3] X;  // design matrix for the ZS-prior weights
  matrix[3, 3] XtX_inverse;
  cov_matrix[3] Sigma0;
  // Transform subject-level raw parameters (non-centered)
  for (i in 1:N) {
    // Phi_approx maps to (0, 1); scaling by 2 puts alpha in (0, 2)
    alpha[i] = Phi_approx(mu_pr[1] + sigma[1] * z_alpha[i]) * 2;
    beta[i] = mu_pr[2] + sigma[2] * z_beta[i];
  }
  // Compute interaction term
  for (i in 1:N) {
    inter[i] = (ADI[i] - mean(ADI)) * (beta[i] - mean(beta));
  }
  // // Individual mean
  // for (i in 1:N) {
  //   mu_ext[i] = beta_0 + beta_lambda * lambda[i]
  //     + beta_ADI * ADI[i] + beta_lambdaxADI * inter[i];
  // }
  // Transform sigma_ext from the log-variance scale
  sigma_ext = sqrt(exp(log_sigma2_ext));
  // Covariates in the regression
  Z[1:N, 1] = ones;   // intercept
  Z[1:N, 2] = alpha;  // risk-taking
  Z[1:N, 3] = sex;
  X[1:N, 1] = ADI;
  X[1:N, 2] = beta;   // ambiguity sensitivity
  X[1:N, 3] = inter;
  // Relevant matrices
  XtX_inverse = inverse_spd(X' * X);
  // Prior covariance matrix of beta_regr (given sigma_ext)
  Sigma0 = N * sigma_ext^2 * XtX_inverse;
}
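// Note: X contains beta and inter, which are functions of the parameters,
// so Sigma0 = N * sigma_ext^2 * inverse(X'X) is recomputed (and inverted)
// at every leapfrog step.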
model {
  // Hyper-parameters
  mu_pr ~ normal(0, 1);
  sigma ~ normal(0, 5);
  // Cognitive-model priors
  z_alpha ~ normal(0, 1);
  z_beta ~ normal(0, 1);
  WTP_sigma ~ student_t(3, 0, 1);
  // Regression priors
  sigma_ext ~ student_t(3, 0, 1);
  // Psychological model
  for (i in 1:N) {
    real mu_WTP;
    for (t in 1:Tsubj[i]) {
      mu_WTP = SV(alpha[i], beta[i], p[i, t], A[i, t]);
      WTP[i, t] ~ normal(mu_WTP, WTP_sigma[i]);
    }
  }
  // Regression priors
  // Recall that uniform priors (on bounded parameters) don't need to be specified.
  // Jeffreys prior on sigma2_ext <-> uniform prior on log_sigma2_ext
  // Uniform (Jeffreys) prior on alpha_regr (intercept + covariates)
  // Zellner-Siow prior on the other regression weights
  // (multivariate Cauchy = multivariate t with df = 1)
  beta_regr ~ multi_student_t(1, [0, 0, 0], Sigma0);
  alpha_regr ~ uniform(-4, 4);
  // Regression
  ext ~ normal(Z * alpha_regr + X * beta_regr, sigma_ext);
}
generated quantities {
  vector[N] mu_ext;
  vector[N] log_lik = rep_vector(0, N);
  { // Local section
    for (i in 1:N) {
      // Conditional mean for the posterior predictive check
      mu_ext[i] = Z[i, ] * alpha_regr + X[i, ] * beta_regr;
      // Log-likelihood for the regression
      log_lik[i] = normal_lpdf(ext[i] | mu_ext[i], sigma_ext);
    }
  }
  // Predictions for the posterior predictive check
  array[N] real pred_ext = normal_rng(mu_ext, sigma_ext);
}
```
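Before reparameterizing the hierarchical part, one idea I could try is replacing the `multi_student_t` prior with its equivalent normal scale-mixture form, since a multivariate Cauchy is a multivariate normal whose scale is mixed over an inverse-gamma variable. Below is a minimal, standalone sketch of just that piece (prior only); note that `Sigma0` is passed as data here purely for illustration, whereas in my model it would still be built from `sigma_ext` and `X` in `transformed parameters`:

```stan
// Minimal standalone sketch: the Zellner-Siow / multivariate Cauchy prior
// written as a normal scale mixture. If z_regr ~ normal(0, 1) and
// g ~ inv_gamma(0.5, 0.5), then sqrt(g) * L * z_regr (with L * L' = Sigma0)
// has a multivariate Student-t distribution with 1 df and scale Sigma0.
data {
  int<lower=1> K;        // number of regression weights (3 in my model)
  cov_matrix[K] Sigma0;  // prior scale matrix (fixed here for illustration)
}
parameters {
  real<lower=0> g;       // mixing scale
  vector[K] z_regr;      // standardized regression weights
}
transformed parameters {
  vector[K] beta_regr = sqrt(g) * (cholesky_decompose(Sigma0) * z_regr);
}
model {
  g ~ inv_gamma(0.5, 0.5);
  z_regr ~ std_normal();
}
```

In the full model, `beta_regr` would then be dropped from the `parameters` block along with the `multi_student_t` statement.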
Here is how I am fitting the model with cmdstanr:

```r
Levy_clvm_fit <- Levy_clvm_mod$sample(
  data = data_list,
  seed = 905,
  chains = 8,
  parallel_chains = 4,
  iter_warmup = 3000,
  iter_sampling = 2000,
  adapt_delta = 0.99,
  refresh = 10  # print update every 10 iters
)
```
I am using Jeffreys-Zellner-Siow priors. My cognitive model is from Levy et al. (2010), and I am regressing a dependent variable called externalizing on the "ambiguity aversion" parameter. I appreciate any advice you might have to offer. Thanks!