Hi!
It took me while to really get through the code but I think I got an idea of how that works.
(For reference, this paper helped me to understand how the covariance matrix is set up using the cholesky factor matrix)
I adapted the code or my model (see code below) skipping the subject level variables for now (note that I changed *visit*
for *cond*
for “condition”).
However, running this model I get warnings that I’ll have to get solved:
# 1: There were 9 divergent transitions after warmup. See
# http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
# to find out why this is a problem and how to eliminate them.
# 2: Examine the pairs() plot to diagnose sampling problems
#
# 3: The largest R-hat is NA, indicating chains have not mixed.
# Running the chains for more iterations may help. See
# http://mc-stan.org/misc/warnings.html#r-hat
# 4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
# Running the chains for more iterations may help. See
# http://mc-stan.org/misc/warnings.html#bulk-ess
# 5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
# Running the chains for more iterations may help. See
# http://mc-stan.org/misc/warnings.html#tail-ess
I ran the model with 4000 iterations (warmup: 1000). I could try to increase the number of iterations as suggested by the warning message but I’ll need to transfer this to our Linux server as running on my own computer took +13 hours…
Given the large number of parameters in the model I am quite confused as to what I could look at in the pairs()
plot.
I also wonder if I still have any obvious mistakes in the model specification and/or if there are other things I could try to optimize.
I have set no initial values for the parameters. Could it help to use mean values from a vb
model as initial values?
I’m also not sure about how to set the priors. In my simple (one session) model I set priors on the SD of the parameters to somewhat limit their range (e.g. simga[4] ~ cauchy(0, 1)
for the R
parameter to get values approximately in the range of [-30, 30]
. Where would I apply such priors in this model? Would that be the *_cond_s
parameters, i.e. the SD of the condition effect?
Did I understand correctly that the *_subj_s ~ student_t(2,0,2)
refers to a t distribution with nC-1
degrees of freedom (i.e. (number of conditions - 1) df) for subject specific SD?
I also have troubles interpreting the scale of the parameters. My first thought was that the *_m
parameters would correspond to the mu_*
parameters of my simple model in the placebo condition. However, comparing the summary statistics of these parameters show completely different values:
Mixed model:
> as.data.frame(summary(fit2, par = paste(params, '_m'), probs = c(0.025, 0.975))$summary)
mean se_mean sd 2.5% 97.5% n_eff Rhat
Arew_m -1.2689189 0.5469930 1.1553174 -3.13506823 1.170741 4.461070 1.336139
Apun_m -1.3192008 0.4809529 1.1011440 -3.06025171 1.120082 5.241831 1.261846
R_m 0.1496364 0.4710745 1.0429759 -1.77837267 2.170962 4.901956 1.290238
P_m 0.2059749 0.5774399 1.1430911 -1.92587057 2.323436 3.918755 1.421119
tau_m 0.5416581 0.1075133 0.4205064 0.02273234 1.578753 15.297513 1.080292
Simple model:
> as.data.frame(summary(rw5pI.plac$fit, par = paste('mu_', params), probs = c(0.025, 0.975))$summary)
mean se_mean sd 2.5% 97.5% n_eff Rhat
mu_Arew 0.29181257 0.008509647 0.23279699 3.099822e-02 0.9205625 748.3963 1.004831
mu_Apun 0.03047293 0.003672411 0.08013723 2.756524e-04 0.1235017 476.1744 1.003805
mu_tau 1.32378567 0.021963432 1.05626779 2.348340e-01 4.3081507 2312.8511 1.000700
mu_P 14.98569057 0.504753426 24.23932814 -3.370878e+01 64.3443769 2306.1237 1.000911
mu_R 3.34486372 0.591496086 27.32114081 -5.104459e+01 56.3401627 2133.5062 1.002821
I have not yet tried to run parameter recovery or other posterior predicitve checks. Maybe I’m still on a completely wrong path
Mixed model code
data {
int<lower=1> nS; // number of subjects
int<lower=1> nT_max; // max number of trials
int<lower=1> nC; // number of conditions
int<lower=0, upper=nT_max> nT_subj[nS, nC]; // subjects x n_trial
int<lower=-1, upper=2> choice[nS, nT_max, nC]; // subjects x choices x conditions
real outcome[nS, nT_max, nC]; // subjects x outcomes x conditions
real missing_cond[nS, nC]; // subjects x conditions (1=missing, 0=data available)
int kC; // number of contrasts on conditions
real cond_vars[nS, nC, kC]; // condition level matrix (contrasts on conditions)
real R_scale; // scale parameter of cauchy distribution used to scale sigma (range) of R parameter
real P_scale; // scale parameter of cauchy distribution used to scale sigma (range) of P parameter
}
transformed data {
vector[2] initV; // initial values for EV
initV = rep_vector(0.0, 2);
}
parameters {
// Declare all parameters as vectors for vectorizing
// Hyper(group)-parameters
// group level means (intercept for Placebo/control condition)
real Arew_m; // reward learning rate
real Apun_m; // punishment learning rate
real R_m; // reward sensitivity
real P_m; // reward sensitivity
real<lower=0> tau_m; // inverse temperature
// condition level effects (change of parameter estimate with treatment/condition)
vector[kC] Arew_cond_grp_m;
vector[kC] Apun_cond_grp_m;
vector[kC] R_cond_grp_m;
vector[kC] P_cond_grp_m;
vector[kC] tau_cond_grp_m;
// condition-level (within subject) SDs (sigma2_y) (overall SD for intercept?)
real<lower=0> Arew_cond_s;
real<lower=0> Apun_cond_s;
real<lower=0> R_cond_s;
real<lower=0> P_cond_s;
real<lower=0> tau_cond_s;
// SDs of condition-level effects (SD of treatment effect over participants)
// will be transformed with subjects' raw parameter to yield subject specific covarying deviation per condition
vector<lower=0>[kC+1] Arew_subj_s;
vector<lower=0>[kC+1] Apun_subj_s;
vector<lower=0>[kC+1] R_subj_s;
vector<lower=0>[kC+1] P_subj_s;
vector<lower=0>[kC+1] tau_subj_s;
// non-centered parameterization (NCP) variance effect per cond & subject
matrix[nS,nC] Arew_cond_raw;
matrix[nS,nC] Apun_cond_raw;
matrix[nS,nC] R_cond_raw;
matrix[nS,nC] P_cond_raw;
matrix[nS,nC] tau_cond_raw;
// NCP variance effect on subj-level effects
matrix[kC+1,nS] Arew_subj_raw;
matrix[kC+1,nS] Apun_subj_raw;
matrix[kC+1,nS] R_subj_raw;
matrix[kC+1,nS] P_subj_raw;
matrix[kC+1,nS] tau_subj_raw;
// Cholesky factors of correlation matrices for subj-level variances
cholesky_factor_corr[kC+1] Arew_subj_L;
cholesky_factor_corr[kC+1] Apun_subj_L;
cholesky_factor_corr[kC+1] R_subj_L;
cholesky_factor_corr[kC+1] P_subj_L;
cholesky_factor_corr[kC+1] tau_subj_L;
}
transformed parameters {
// transformed parameters
matrix<lower=0,upper=1>[nS, nC] Arew; // reward learning rate
matrix<lower=0,upper=1>[nS, nC] Apun; // punishment learning rate
matrix[nS, nC] R; // reward sensitivity
matrix[nS, nC] P; // reward sensitivity
matrix[nS, nC] tau; // inverse temperature
// additional inv_logit transform for learning rate to restrict to [0, 1]
matrix[nS, nC] Arew_normal; // reward learning rate
matrix[nS, nC] Apun_normal; // punishment learning rate
// Convert Cholesky factorized correlation matrix into SDs per visit-level effect
// creates matrix of nS rows and kC+1 correlated random variables
// (*_subj_raw defined as random variable by to_vector(*_subj_raw[,s]) ~ normal(0, 1); statements in model block)
matrix[nS,kC+1] Arew_vars = (diag_pre_multiply(Arew_subj_s, Arew_subj_L) * Arew_subj_raw)';
matrix[nS,kC+1] Apun_vars = (diag_pre_multiply(Apun_subj_s, Apun_subj_L) * Apun_subj_raw)';
matrix[nS,kC+1] R_vars = (diag_pre_multiply(R_subj_s, R_subj_L) * R_subj_raw)';
matrix[nS,kC+1] P_vars = (diag_pre_multiply(P_subj_s, P_subj_L) * P_subj_raw)';
matrix[nS,kC+1] tau_vars = (diag_pre_multiply(tau_subj_s, tau_subj_L) * tau_subj_raw)';
//create transformed parameters using non-centered parameterization for all model parameters
// add in subject and visit-level effects as shifts in means
for (s in 1:nS) {
// transformed parameter for each subject equals
// overall mean + // group level mean across conditions
// SD[cond]*raw[cond] + // group level SD * condition offset of each subject (variation of subjects around group mean)
// SD[subject]*raw[subject] // single value reflecting subject specific offset across conditions (i.e. for Placebo in case of it being the control condition)
Arew_normal[s,] = Arew_m + Arew_cond_s * Arew_cond_raw[s,] + Arew_vars[s,1];
Apun_normal[s,] = Apun_m + Apun_cond_s * Apun_cond_raw[s,] + Apun_vars[s,1];
R[s,] = R_m + R_cond_s * R_cond_raw[s,] + R_vars[s,1];
P[s,] = P_m + P_cond_s * P_cond_raw[s,] + P_vars[s,1];
tau[s,] = tau_m + tau_cond_s * tau_cond_raw[s,] + tau_vars[s,1];
//add subj- and visit-level effects
for (v in 1:nC) {
if (missing_cond[s,v]==0) { // no effect if condition is missing
// loop over contrasts
for (kk in 1:kC) {
// main effects of visit-level variables
// for each contrast add
// contrast specific offset + // effect of treatment (change in parameter value with treatment)
// subject specific offset // that was defined to be correlated with general subject specific offset (see above)
Arew_normal[s,v] += cond_vars[s,v,kk] * (Arew_cond_grp_m[kk] + Arew_vars[s,kk+1]);
Apun_normal[s,v] += cond_vars[s,v,kk] * (Apun_cond_grp_m[kk] + Apun_vars[s,kk+1]);
R[s,v] += cond_vars[s,v,kk] * (R_cond_grp_m[kk] + R_vars[s,kk+1]);
P[s,v] += cond_vars[s,v,kk] * (P_cond_grp_m[kk] + P_vars[s,kk+1]);
tau[s,v] += cond_vars[s,v,kk] * (tau_cond_grp_m[kk] + tau_vars[s,kk+1]);
} // end of contrast loop
} // end no effect if condition is missing
} // end of condition loop
//transform alpha to [0,1]
Arew[s,] = inv_logit(Arew_normal[s,]);
Apun[s,] = inv_logit(Apun_normal[s,]);
} // end of subject loop
}
model {
//define priors
// group level mean across conditions
Arew_m ~ normal(0, 1);
Apun_m ~ normal(0, 1);
R_m ~ normal(0, 1);
P_m ~ normal(0, 1);
tau_m ~ normal(0, 1);
// condition specific effect
Arew_cond_grp_m ~ normal(0, 1);
Apun_cond_grp_m ~ normal(0, 1);
R_cond_grp_m ~ normal(0, 1);
P_cond_grp_m ~ normal(0, 1);
tau_cond_grp_m ~ normal(0, 1);
Arew_cond_s ~ cauchy(0, 2);
Apun_cond_s ~ cauchy(0, 2);
R_cond_s ~ cauchy(0, 2);
P_cond_s ~ cauchy(0, 2);
tau_cond_s ~ cauchy(0, 2);
for (s in 1:nS) {
// distribution of subjects around group level mean
Arew_cond_raw[s,] ~ normal(0, 1);
Apun_cond_raw[s,] ~ normal(0, 1);
R_cond_raw[s,] ~ normal(0, 1);
P_cond_raw[s,] ~ normal(0, 1);
tau_cond_raw[s,] ~ normal(0, 1);
// distribution of of subject specific/random terms around subject mean
// (before transformation to correlated subject specific SD of condition effect)
to_vector(Arew_subj_raw[,s]) ~ normal(0, 1);
to_vector(Apun_subj_raw[,s]) ~ normal(0, 1);
to_vector(R_subj_raw[,s]) ~ normal(0, 1);
to_vector(P_subj_raw[,s]) ~ normal(0, 1);
to_vector(tau_subj_raw[,s]) ~ normal(0, 1);
}
Arew_subj_s ~ student_t(2,0,2); // nu is df, so number of conditions (nC) -1 ??? or rather (kC+1) - 1?
Apun_subj_s ~ student_t(2,0,2);
R_subj_s ~ student_t(2,0,3);
P_subj_s ~ student_t(2,0,3);
tau_subj_s ~ student_t(2,0,2);
Arew_subj_L ~ lkj_corr_cholesky(1);
Apun_subj_L ~ lkj_corr_cholesky(1);
R_subj_L ~ lkj_corr_cholesky(1);
P_subj_L ~ lkj_corr_cholesky(1);
tau_subj_L ~ lkj_corr_cholesky(1);
// subject loop and trial loop
for (i in 1:nS) {
int n_trials;
// condition loop
for (j in 1:nC) {
vector[2] ev; // expected value
real PE; // prediction error
real alpha; // effective learning rate (Arew or Apun, respectively)
real sensitivity;// effective sensitivity (R or P, respectively)
ev = initV;
n_trials = nT_subj[i, j];
if (n_trials <= 0) continue;
// trial loop
for (t in 1:n_trials) {
// compute action probabilities
choice[i, t, j] ~ categorical_logit(tau[i, j] * ev);
// prediction error
sensitivity = (outcome[i, t, j] == 1 ? R[i, j] : P[i, j]);
PE = (outcome[i, t, j] * sensitivity) - ev[choice[i, t, j]];
// value updating (learning)
alpha = (PE >= 0) ? Arew[i, j] : Apun[i, j];
ev[choice[i, t, j]] += alpha * PE;
}
}
}
}
generated quantities {
// For log likelihood calculation
real log_lik[nS, nC];
// correlation matrices for coefficients (transforms cholesky factor matrices back to correlation matrices of random/subject effects)
corr_matrix[kC+1] Arew_cor = multiply_lower_tri_self_transpose(Arew_subj_L);
corr_matrix[kC+1] Apun_cor = multiply_lower_tri_self_transpose(Apun_subj_L);
corr_matrix[kC+1] R_cor = multiply_lower_tri_self_transpose(R_subj_L);
corr_matrix[kC+1] P_cor = multiply_lower_tri_self_transpose(P_subj_L);
corr_matrix[kC+1] tau_cor = multiply_lower_tri_self_transpose(tau_subj_L);
// For posterior predictive check
real y_pred[nS, nC, nT_max];
real PE_pred[nS, nC, nT_max];
real ev_pred[nS, nC, nT_max, 2];
// Set all posterior predictions to 0 (avoids nSULL values)
for (i in 1:nS) {
for (j in 1:nC){
log_lik[i, j] = -1;
for (t in 1:nT_max) {
y_pred[i, j, t] = -1;
PE_pred[i, j, t] = -1;
ev_pred[i, j, t, 1] = -1;
ev_pred[i, j, t, 2] = -1;
}
}
}
{ // local section, this saves time and space
// subject loop
for (i in 1:nS) {
// condition loop
for (j in 1:nC) {
vector[2] ev; // expected value
real PE; // prediction error
int co;
real alpha;
real sensitivity;
int n_trials;
n_trials = nT_subj[i, j];
if (n_trials <= 0) continue;
// Initialize values
ev = initV;
ev_pred[i, j, 1, 1] = 0;
ev_pred[i, j, 1, 2] = 0;
log_lik[i, j] = 0;
// trial loop
for (t in 1:n_trials) {
// get choice of current trial (either 1 or 2)
co = choice[i, t, j];
// compute log likelihood of current trial
log_lik[i, j] += categorical_logit_lpmf(choice[i, t, j] | tau[i, j] * ev);
// generate posterior prediction for current trial
y_pred[i, j, t] = categorical_rng(softmax(tau[i, j] * ev));
// prediction error
sensitivity = (outcome[i, t, j] == 1 ? R[i, j] : P[i, j]);
PE = (outcome[i, t, j] * sensitivity) - ev[choice[i, t, j]];
PE_pred[i, j, t] = PE;
// value updating (learning)
alpha = (PE >= 0) ? Arew[i, j] : Apun[i, j];
ev[co] += alpha * PE;
if (t > 1) {
ev_pred[i, j, t] = ev_pred[i, j, t-1];
}
ev_pred[i, j, t, co] += alpha * PE;
}
}
}
}
}