Stan code for fitting simple RL model

Dear. Stan Users

Hi, I am trying to estimate learning rates of Reinforcement Learning model using Stan. The Reinforcement Learning model is somewhat modified from traditional one for my experimental task.

The equation of my model is :
SV(t)=𝑺𝑽(π’•βˆ’πŸ)+π’‚βˆ—(𝑭𝑩(𝒕)βˆ’π‘Ίπ‘½(π’•βˆ’πŸ)) if trial > 1
SV(t)=π’‚βˆ—π‘­π‘©(𝒕) if trial = 1
The equation for the decision probability is :
Logit[P(Decision)] = Ξ²(SV) xSV + Ξ²(OQ) xOQ+ c

I tried to make code with stan code of others who used RL model (but not exactly same one with mine) but I do not have confident about this code and it gives error like :
Too many indexes for variable; variable name = PE; num dimensions given = 2; variable array dimensions = 0
error in β€˜modelfe43c8a760b_Only3Subjects’ at line 65, column 16


63:       // prediction error 
64:       if(t == 1){
65:       PE[i,t] = feedback[i,t] - 0; 
                   ^
66:       } else {

Maybe there are other errors other than this one. I am not that familiar to R or Stan and instead more familiar to Matlab so that you may see some confusing parts in this code.
Below is the full code of mine. I would be greatly appreciated if anyone provides wrong parts and how I can deal with them.
Thank you so much.

This data is for 3 individuals who have 75 trials each.

// model with alpha and beta
data {
  int<lower=1> N; // number of subjects
  int<lower=1> T; // number of trials
  int<lower=1> Tsubj[N]; // number of trials per usbject    
  int<lower=-1,upper=1> choice[N,T]; // decision data
  real<lower=-1, upper=1> feedback[N,T];  // feedback data
  int<lower=1, upper=5> OQ[N,T]; // data of the variable named OQ other than feedback
}

transformed data {
  vector[1] initV;  // initial value for SV (Self-Value or Stimulus Value)
  initV = rep_vector(0.0, 1);
}

parameters {
// Declare all parameters as vectors for vectorizing
  // Hyper(group)-parameters  
  vector[3] mu_p;  
  vector<lower=0>[3] sigma;
    
  // Subject-level raw parameters (for Matt trick)
  vector<lower=0, upper=1>[N] alpha_pr;    // learning rate [0, 1]
  vector[N] beta_FB_pr;  // inverse temperature for feedback (outcome)
  vector[N] beta_OQ_pr; // inverse temperature for OQ
}

transformed parameters {
  // subject-level parameters
  vector<lower=0,upper=1>[N] alpha;
  vector[N] beta_FB; // for FB
  vector[N] beta_OQ; // for OQ
   
  for (i in 1:N) {
    alpha[i] = Phi_approx( mu_p[1] + sigma[1] * alpha_pr[i] );
    beta_FB[i] = Phi_approx( mu_p[2] + sigma[2] * beta_FB_pr[i] ) * 5;
    beta_OQ[i] = Phi_approx( mu_p[3] + sigma[3] * beta_OQ_pr[i] ) * 5;
  }
}

model {
  // Hyperparameters
  mu_p  ~ normal(0, 1); 
  sigma ~ normal(0, 1);  
  
  // individual parameters
  alpha_pr ~ normal(0,1);
  beta_FB_pr ~ normal(0,1);
  beta_OQ_pr ~ normal(0,1);
    
  // subject loop and trial loop
  for (i in 1:N) {
    vector[1] sv; // self-value (stimulus value)
	real PE;      // prediction error
	

    sv = initV;
    
   
    for (t in 1:(Tsubj[i])) {
    	
    	    
      // prediction error 
      if(t == 1){
      PE[i,t] = feedback[i,t] - 0; 
      } else {
      PE[i,t] = feedback[i,t] - sv[i,t-1]; 
      }

      // value updating (learning) 
      if(t == 1){
      	sv[i,t] = alpha[i]*feedback[i,t]; 
      	} else {
      	sv[i,t] = sv[i,t-1]+alpha[i]*PE[i,t];
      	}
      
      // compute decision probabilities
      choice[i,t] ~ categorical_logit( beta_FB[i]*sv + beta_OQ[i]*OQ ); 
     
	
    }
  }
}

generated quantities {
}

To include mathematical notation in your post put LaTeX syntax between two $ symbols, e.g.,
p(\theta | y) \propto p(\theta) p(y | \theta).

The probelm is that PE is being declared as a scalar but is being used as a 2D array. If it’s just a local variable, then you probably just want:

real PE = t == 1 ? feeback[i, t] : feedback[i, t] - sv[i, t-1];

Hi, lyoon,
Does your code work now?
I’m trying to fit a RL model with stan, but I got
the β€œdivergent after warmup” warning,
And I try the transform in your code:
alpha[i] = Phi_approx( mu_p[1] + sigma[1] * alpha_pr[i] );
beta_FB[i] = Phi_approx( mu_p[2] + sigma[2] * beta_FB_pr[i] ) * 5;
beta_OQ[i] = Phi_approx( mu_p[3] + sigma[3] * beta_OQ_pr[i] ) * 5;
and it doesn’t work.