Dear Stan users,
Hi, I am trying to estimate the learning rates of a reinforcement learning model using Stan. The model is somewhat modified from the traditional one to fit my experimental task.
The equations of my model are:
$SV(t) = SV(t-1) + \alpha \, (FB(t) - SV(t-1))$ if trial > 1
$SV(t) = \alpha \, FB(t)$ if trial = 1
The equation for the decision probability is:
$\text{logit}[P(\text{Decision})] = \beta_{SV} \cdot SV + \beta_{OQ} \cdot OQ + c$
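For example, on the first two trials this gives $SV(1) = \alpha \, FB(1)$ and $SV(2) = SV(1) + \alpha \, (FB(2) - SV(1))$, i.e. the prediction error on trial 2 is $FB(2) - SV(1)$.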
I tried to write my code based on the Stan code of others who used RL models (though not exactly the same model as mine), but I am not confident about this code and it gives an error like:
Too many indexes for variable; variable name = PE; num dimensions given = 2; variable array dimensions = 0
error in 'modelfe43c8a760b_Only3Subjects' at line 65, column 16
63: // prediction error
64: if(t == 1){
65: PE[i,t] = feedback[i,t] - 0;
^
66: } else {
There may be other errors besides this one. I am not that familiar with R or Stan (I am more used to Matlab), so you may see some confusing parts in this code.
Below is my full code. I would greatly appreciate it if anyone could point out the wrong parts and how I can deal with them.
Thank you so much.
The data are from 3 subjects with 75 trials each.
// model with alpha and beta
data {
int<lower=1> N; // number of subjects
int<lower=1> T; // number of trials
int<lower=1> Tsubj[N]; // number of trials per subject
int<lower=-1,upper=1> choice[N,T]; // decision data
real<lower=-1, upper=1> feedback[N,T]; // feedback data
int<lower=1, upper=5> OQ[N,T]; // data of the variable named OQ other than feedback
}
transformed data {
vector[1] initV; // initial value for SV (Self-Value or Stimulus Value)
initV = rep_vector(0.0, 1);
}
parameters {
// Declare all parameters as vectors for vectorizing
// Hyper(group)-parameters
vector[3] mu_p;
vector<lower=0>[3] sigma;
// Subject-level raw parameters (for Matt trick)
vector<lower=0, upper=1>[N] alpha_pr; // learning rate [0, 1]
vector[N] beta_FB_pr; // inverse temperature for feedback (outcome)
vector[N] beta_OQ_pr; // inverse temperature for OQ
}
transformed parameters {
// subject-level parameters
vector<lower=0,upper=1>[N] alpha;
vector[N] beta_FB; // for FB
vector[N] beta_OQ; // for OQ
for (i in 1:N) {
alpha[i] = Phi_approx( mu_p[1] + sigma[1] * alpha_pr[i] );
beta_FB[i] = Phi_approx( mu_p[2] + sigma[2] * beta_FB_pr[i] ) * 5;
beta_OQ[i] = Phi_approx( mu_p[3] + sigma[3] * beta_OQ_pr[i] ) * 5;
}
}
model {
// Hyperparameters
mu_p ~ normal(0, 1);
sigma ~ normal(0, 1);
// individual parameters
alpha_pr ~ normal(0,1);
beta_FB_pr ~ normal(0,1);
beta_OQ_pr ~ normal(0,1);
// subject loop and trial loop
for (i in 1:N) {
vector[1] sv; // self-value (stimulus value)
real PE; // prediction error
sv = initV;
for (t in 1:(Tsubj[i])) {
// prediction error
if(t == 1){
PE[i,t] = feedback[i,t] - 0;
} else {
PE[i,t] = feedback[i,t] - sv[i,t-1];
}
// value updating (learning)
if(t == 1){
sv[i,t] = alpha[i]*feedback[i,t];
} else {
sv[i,t] = sv[i,t-1]+alpha[i]*PE[i,t];
}
// compute decision probabilities
choice[i,t] ~ categorical_logit( beta_FB[i]*sv + beta_OQ[i]*OQ );
}
}
}
generated quantities {
}
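My own guess (I may well be wrong) is that the indexing error comes from declaring PE as a single real but then writing PE[i,t], and similarly for sv, and that the choice statement should index OQ per trial and use a Bernoulli rather than a categorical likelihood, since my decision is binary. Below is a rough sketch of how I imagine the subject/trial loop in the model block could look with scalar sv and PE, assuming choice is coded 0/1 so that bernoulli_logit applies (I have left out the intercept c for now). Please correct me if this is not the right way to do it.

// Sketch only (not tested): subject/trial loop with scalar sv and PE,
// assuming choice[i,t] is coded 0/1 so that bernoulli_logit applies
for (i in 1:N) {
  real sv; // current stimulus value for subject i
  real PE; // prediction error on the current trial
  sv = 0;
  for (t in 1:Tsubj[i]) {
    // prediction error (on trial 1, sv is still 0, so PE = feedback - 0)
    PE = feedback[i,t] - sv;
    // value updating (learning)
    if (t == 1) {
      sv = alpha[i] * feedback[i,t];   // SV(1) = alpha * FB(1)
    } else {
      sv = sv + alpha[i] * PE;         // SV(t) = SV(t-1) + alpha * PE
    }
    // decision probability: logit(P) = beta_FB * SV + beta_OQ * OQ
    choice[i,t] ~ bernoulli_logit( beta_FB[i]*sv + beta_OQ[i]*OQ[i,t] );
  }
}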