We’ve got this dataset where different participants did a learning and decision making task based on symbols chosen, rewards and effort for each symbol option chosen, and we’re trying to assess their behaviour by considering several parameters such as learning rate.
I’ve attached an example dataset below with the first 10 values of each variable for the first participant master_result - Copy.csv (981 Bytes)
So we’re attempting to run this simple reinforcement learning model using rstan.
data {
int ntr; // number of trials
int nsub; // total number of participants
int ParticipantID[ntr]; // which subject (schedule) which data point belongs to
real Effort_left[ntr]; //z-score all real variables
real Effort_right[ntr];
real Prob_left[ntr];
real Prob_right[ntr];
real Rew_left[ntr];
real Rew_right[ntr];
int Symbol_L[ntr];
int Symbol_R[ntr];
int Button[ntr];
int Trial[ntr];//trial number
}
// The parameters accepted by the model.
parameters {
real<lower=0> invTemp[nsub]; //inverse temperature describing noise for each participant
real<lower=0,upper=1> b_learningRate_reward[nsub];
real<lower=0,upper=1> b_learningRate_effort[nsub];
// Transformed version of data variables; we want to know how much
// each individual takes into account each of the above variables relative
// to one of the variables (so we don't get confounded by noise)
// Will help us identify individual differences:
real b_rewMag_pr[nsub];
real b_effortMag_pr[nsub];
}
transformed parameters {
real k[nsub]; // sum of all absolute regression weights
real b_rewMag[nsub];
real b_effortMag[nsub];
real b_probability[nsub]; //this is our reference variable against which other variables are transformed/weighed
for (is in 1:nsub){
k[is] = fabs(1) + fabs(b_rewMag_pr[is])+fabs(b_effortMag_pr[is]);
b_probability[is] = -1/k[is];
b_rewMag[is] = b_rewMag_pr[is]/k[is];
b_effortMag[is] = b_effortMag_pr[is]/k[is];
}
}
// The model to be estimated. We model the output
// 'y' to be normally distributed with mean 'mu'
// and standard deviation 'sigma'.
model {
//Here the model will learn: 1. reward of symbol1, reward of symbol2, effort of symbol1, effort symbol2
real predicted_reward_perSymbol[ntr,2];//matrix of size [ntr,c(symbol1, symbol2)]
real predicted_effort_perSymbol[ntr,2];//matrix of size [ntr,c(symbol1, symbol2)]
real prediction_error; //temporary variable we'll keep overwriting
real util_L[ntr];
real util_R[ntr];
real invTxUtil[ntr];//taking into account the invTemp
// Priors
invTemp ~ cauchy(0,10);//cauchy(0,2)
// b_learningRate_reward ~ cauchy(0.5,0.3);//center on 0.5 or don't give prior bc range already very constrained
// b_learningRate_effort ~ cauchy(0.5,0.3);//center on 0.5 or don't give prior bc range already very constrained
b_rewMag_pr ~ cauchy(0,2);
b_effortMag_pr ~ cauchy(0,2);
// Learning loop
//Calculate prediction error for Reward and Effort
for (itr in 1:ntr) {
if (Trial[itr]<max(Trial)) { // if wo 0 which is the mean
//Compute prediction error for left option are not on the last trial
if (Trial[itr]==1) { //if this is the first trial, reset the predicted reward
predicted_reward_perSymbol[itr,1] = 0; //set first reward to 0 which is the mean
predicted_reward_perSymbol[itr,2] = 0; //set first reward t
predicted_effort_perSymbol[itr,1] = 0; //set first effort to 0 which is the mean
predicted_effort_perSymbol[itr,2] = 0; //set first effort t
}
//Compute prediction error for Left Option
if (Symbol_L[itr]<3) { //only do if symbol_L is 1 or 2, ignore if 3 or 4
//Reward
prediction_error = Rew_left[itr] - predicted_reward_perSymbol[itr,Symbol_L[itr]];
predicted_reward_perSymbol[itr+1,Symbol_L[itr]] = predicted_reward_perSymbol[itr,Symbol_L[itr]]+b_learningRate_reward[ParticipantID[itr]]*prediction_error;
//Effort
prediction_error = Effort_left[itr] - predicted_effort_perSymbol[itr,Symbol_L[itr]];
predicted_effort_perSymbol[itr+1,Symbol_L[itr]] = predicted_effort_perSymbol[itr,Symbol_L[itr]]+b_learningRate_effort[ParticipantID[itr]]*prediction_error;
} else { //if Symbol_L[itr] is 3 or 4
predicted_reward_perSymbol[itr+1,(3-Symbol_R[itr])] = predicted_reward_perSymbol[itr,(3-Symbol_R[itr])]; //no update
predicted_effort_perSymbol[itr+1,(3-Symbol_R[itr])] = predicted_effort_perSymbol[itr,(3-Symbol_R[itr])]; //no update
}
//Compute prediction error for right option
if (Symbol_R[itr]<3){ //only do if symbol_L is 1 or 2, ignore if 3 or 4
//Reward
prediction_error = Rew_right[itr] - predicted_reward_perSymbol[itr,Symbol_R[itr]];
predicted_reward_perSymbol[itr+1,Symbol_R[itr]] = predicted_reward_perSymbol[itr,Symbol_R[itr]]+b_learningRate_reward[ParticipantID[itr]]*prediction_error;
//Effort
prediction_error = Effort_right[itr] - predicted_effort_perSymbol[itr,Symbol_R[itr]];
predicted_effort_perSymbol[itr+1,Symbol_R[itr]] = predicted_effort_perSymbol[itr,Symbol_R[itr]]+b_learningRate_effort[ParticipantID[itr]]*prediction_error;
} else { //if Symbol_R[itr] is 3 or 4
predicted_reward_perSymbol[itr+1,(3-Symbol_L[itr])] = predicted_reward_perSymbol[itr,(3-Symbol_L[itr])]; //no update
predicted_effort_perSymbol[itr+1,(3-Symbol_L[itr])] = predicted_effort_perSymbol[itr,(3-Symbol_L[itr])]; //no update
}
}
}
//Decision loop
// compute utilities for choice_l and choice_r (remember special cases for options 3 and 4)
for (itr in 1:ntr) {
// Compute utiltiy for left option
if (Symbol_L[itr]<3){ //only do if symbol_L is 1 or 2
util_L[itr] = b_rewMag[ParticipantID[itr]]*predicted_reward_perSymbol[itr,Symbol_L[itr]] + b_effortMag[ParticipantID[itr]]*predicted_effort_perSymbol[itr,Symbol_L[itr]]
+ b_probability[ParticipantID[itr]]*Prob_left[itr];
} else { //if symbol is 3 or 4
util_L[itr] = b_rewMag[ParticipantID[itr]]*Rew_left[itr] + b_effortMag[ParticipantID[itr]]*Effort_left[itr]
+ b_probability[ParticipantID[itr]]*Prob_left[itr];
}
//Compute utility for right option
if (Symbol_R[itr]<3) { //only do if symbol_R is 1 or 2
util_R[itr] = b_rewMag[ParticipantID[itr]]*predicted_reward_perSymbol[itr,Symbol_R[itr]] + b_effortMag[ParticipantID[itr]]*predicted_effort_perSymbol[itr,Symbol_R[itr]]
+ b_probability[ParticipantID[itr]]*Prob_right[itr];
} else {
util_R[itr] = b_rewMag[ParticipantID[itr]]*Rew_right[itr] + b_effortMag[ParticipantID[itr]]*Effort_right[itr]
+ b_probability[ParticipantID[itr]]*Prob_right[itr];
} //if symbol is 3 or 4
invTxUtil[itr]= invTemp[ParticipantID[itr]]*(util_R[itr]-util_L[itr]); //(or maybe L-R)
// link utilities to actual choices
Button ~ bernoulli_logit(invTxUtil);
}//end model section
The initialization errors:
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: bernoulli_logit_lpmf: n[1] is 2, but must be in the interval [0, 1] (in 'model23f841761eee_RL_model1_version2' at line 185)
Chain 1:
Chain 1: Initialization between (-2, 2) failed after 100 attempts.
Chain 1: Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Error in sampler$call_sampler(c(args, dotlist)) : Initialization failed.
We have:
- checked all the values that go into the model, there are no NaN values
- defined all our variables, and added proper priors and constraints to our variables
It looks like it’s computing the way it should but some reason the bernoulli_logit function won’t accept the vector in the final line, We want to know why it’s failing to initialize, though we’ve properly defined and set all of our parameters.
Any help would be greatly appreciated, thank you!