# Stan code for fitting simple RL model

Dear Stan users,

Hi, I am trying to estimate the learning rates of a reinforcement learning (RL) model using Stan. The model is somewhat modified from the traditional one to fit my experimental task.

The equations of my model are:

SV(t) = SV(t-1) + α × (FB(t) - SV(t-1))   if trial > 1
SV(t) = α × FB(t)                          if trial = 1

The equation for the decision probability is:

Logit[P(Decision)] = β_SV × SV + β_OQ × OQ + c
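As a sanity check, the update and choice rules can be simulated outside Stan. Below is a minimal Python sketch of the same equations; the function names `simulate_sv` and `p_decision` and all parameter values are my own illustration, not part of the actual model code:

```python
import math

def simulate_sv(feedback, alpha):
    """Run the SV update rule over a sequence of feedback values.
    SV(1) = alpha * FB(1); SV(t) = SV(t-1) + alpha * (FB(t) - SV(t-1))."""
    sv = []
    for t, fb in enumerate(feedback):
        if t == 0:
            sv.append(alpha * fb)
        else:
            pe = fb - sv[-1]              # prediction error
            sv.append(sv[-1] + alpha * pe)
    return sv

def p_decision(sv, oq, beta_sv, beta_oq, c):
    """Logistic choice rule: Logit[P(Decision)] = beta_sv*SV + beta_oq*OQ + c."""
    logit = beta_sv * sv + beta_oq * oq + c
    return 1.0 / (1.0 + math.exp(-logit))

# Example: learning rate 0.5, feedback +1 on every trial
values = simulate_sv([1, 1, 1], alpha=0.5)
print(values)  # SV moves toward +1 as learning accumulates
print(p_decision(values[-1], oq=3, beta_sv=1.0, beta_oq=0.0, c=0.0))
```

With a constant feedback of +1 and α = 0.5, SV converges geometrically toward 1, which is a quick way to confirm the update rule behaves as intended.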

I tried to write this code based on Stan code from others who used RL models (though not exactly the same model as mine), but I am not confident about it, and it gives an error like:

Too many indexes for variable; variable name = PE; num dimensions given = 2; variable array dimensions = 0
error in 'modelfe43c8a760b_Only3Subjects' at line 65, column 16

63:       // prediction error
64:       if(t == 1){
65:       PE[i,t] = feedback[i,t] - 0;
^
66:       } else {


There may be other errors besides this one. I am not very familiar with R or Stan (I am more used to Matlab), so you may see some confusing parts in this code.
Below is my full code. I would greatly appreciate it if anyone could point out the wrong parts and how to fix them.
Thank you so much.

This data is for 3 individuals who have 75 trials each.

// model with alpha and beta
data {
  int<lower=1> N;                          // number of subjects
  int<lower=1> T;                          // number of trials
  int<lower=1> Tsubj[N];                   // number of trials per subject
  int<lower=-1, upper=1> choice[N, T];     // decision data
  real<lower=-1, upper=1> feedback[N, T];  // feedback data
  int<lower=1, upper=5> OQ[N, T];          // data of the variable named OQ other than feedback
}

transformed data {
  vector[1] initV;  // initial value for SV (Self-Value or Stimulus Value)
  initV = rep_vector(0.0, 1);
}

parameters {
  // Declare all parameters as vectors for vectorizing
  // Hyper(group)-parameters
  vector[3] mu_p;
  vector<lower=0>[3] sigma;

  // Subject-level raw parameters (for Matt trick)
  vector<lower=0, upper=1>[N] alpha_pr;  // learning rate [0, 1]
  vector[N] beta_FB_pr;                  // inverse temperature for feedback (outcome)
  vector[N] beta_OQ_pr;                  // inverse temperature for OQ
}

transformed parameters {
  // subject-level parameters
  vector<lower=0, upper=1>[N] alpha;
  vector[N] beta_FB;  // for FB
  vector[N] beta_OQ;  // for OQ

  for (i in 1:N) {
    alpha[i]   = Phi_approx( mu_p[1] + sigma[1] * alpha_pr[i] );
    beta_FB[i] = Phi_approx( mu_p[2] + sigma[2] * beta_FB_pr[i] ) * 5;
    beta_OQ[i] = Phi_approx( mu_p[3] + sigma[3] * beta_OQ_pr[i] ) * 5;
  }
}

model {
  // Hyperparameters
  mu_p  ~ normal(0, 1);
  sigma ~ normal(0, 1);

  // individual parameters
  alpha_pr   ~ normal(0, 1);
  beta_FB_pr ~ normal(0, 1);
  beta_OQ_pr ~ normal(0, 1);

  // subject loop and trial loop
  for (i in 1:N) {
    vector[1] sv;  // self-value (stimulus value)
    real PE;       // prediction error

    sv = initV;

    for (t in 1:Tsubj[i]) {

      // prediction error
      if (t == 1) {
        PE[i, t] = feedback[i, t] - 0;
      } else {
        PE[i, t] = feedback[i, t] - sv[i, t-1];
      }

      // value updating (learning)
      if (t == 1) {
        sv[i, t] = alpha[i] * feedback[i, t];
      } else {
        sv[i, t] = sv[i, t-1] + alpha[i] * PE[i, t];
      }

      // compute decision probabilities
      choice[i, t] ~ categorical_logit( beta_FB[i] * sv + beta_OQ[i] * OQ );

    }
  }
}

generated quantities {
}


To include mathematical notation in your post, put LaTeX syntax between two $ symbols, e.g., $p(\theta | y) \propto p(\theta) p(y | \theta)$.

The problem is that PE is being declared as a scalar but is being used as a 2D array. If it's just a local variable, then you probably just want:

real PE = t == 1 ? feedback[i, t] : feedback[i, t] - sv[i, t-1];
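To illustrate why a local scalar is enough here (in Python rather than Stan, purely as a sketch; the function name `trial_loop` is made up): the prediction error is consumed in the same iteration it is computed, so nothing needs to be stored per trial.

```python
def trial_loop(feedback, alpha):
    """Mirror of the corrected trial loop: PE is a plain local value
    overwritten each trial, never indexed as PE[i, t]."""
    sv = 0.0       # corresponds to initV
    history = []
    for t, fb in enumerate(feedback):
        # same logic as the one-line conditional expression in the reply
        pe = fb if t == 0 else fb - sv
        sv = alpha * fb if t == 0 else sv + alpha * pe
        history.append(sv)
    return history
```

This produces the same trajectory as the explicit if/else branches in the original model block, without ever needing a 2D PE array.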


Hi, lyoon,
Does your code work now?
I'm trying to fit an RL model with Stan, but I got the "divergent after warmup" warning,
and I tried the transform in your code:
alpha[i] = Phi_approx( mu_p[1] + sigma[1] * alpha_pr[i] );
beta_FB[i] = Phi_approx( mu_p[2] + sigma[2] * beta_FB_pr[i] ) * 5;
beta_OQ[i] = Phi_approx( mu_p[3] + sigma[3] * beta_OQ_pr[i] ) * 5;
and it doesn't work.