Stan model 'RL_RW' does not contain samples

I want to use a Rescorla-Wagner reinforcement learning model to explore how subjects learn an underlying hierarchy with 4 levels. left[nTrials] and right[nTrials] are the stimuli, each with a latent rank in the hierarchy, presented in pairs. Subjects are asked to select the higher-ranked stimulus (choice[nTrials]); reward is 1 if the choice is correct and -1 if it is incorrect. But my model keeps failing with the following errors:

‘Stan model ‘RL_RW’ does not contain samples.’

Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: categorical_logit_lpmf: categorical outcome out of support is 4, but must be in the interval [1, 2] (in ‘model21585021459_RL_RW’ at line 30)

This is my Stan model code:

data {
  int<lower=1> nTrials;
  int<lower=1,upper=4> left[nTrials];
  int<lower=1,upper=4> right[nTrials];
  int<lower=1,upper=4> choice[nTrials];
  int<lower=-1,upper=1> reward[nTrials];
}

parameters {
  real<lower=0,upper=1> alpha;
  real<lower=0,upper=3> tau;
}

model {
  vector[4] V_4;
  vector[2] V_2;
  real pe_l;
  real pe_r;
  for (t in 1:nTrials) {
    choice[t] ~ categorical_logit(tau * V_2);

    // value update
    if ((choice[t] == left[t] && reward[t] == 1) || (choice[t] == right[t] && reward[t] == -1)) {

V_2 is a vector with only 2 entries, but you seem to use it in a choice rule for 4 options.
You need a vector of four values to choose among 4 options.

But on each trial, I only present 2 options to the subject.

If the choice in a particular trial is 4 but V_2 has only two values, this won't work, because the probability of choosing option 4 can't be calculated: categorical_logit only supports outcomes from 1 to the length of the vector it is given.
It should work if you rewrite the model so that in each trial V_2 holds the values of the two available options, and choice is always 1 for the first available option and 2 for the second.
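A minimal sketch of that recoding, under some assumptions: choice_lr is a hypothetical name that does not appear in the original post, choice[t] is assumed to always equal either left[t] or right[t], and V_2[1] and V_2[2] are assumed to hold the values of the left and right stimulus on trial t. The block would go between the data and parameters blocks:

```stan
transformed data {
  // Hypothetical helper, not part of the original model:
  // 1 if the left stimulus was chosen, 2 if the right one was.
  // Assumes choice[t] always matches left[t] or right[t].
  int<lower=1,upper=2> choice_lr[nTrials];
  for (t in 1:nTrials) {
    choice_lr[t] = (choice[t] == left[t]) ? 1 : 2;
  }
}
```

The sampling statement would then read choice_lr[t] ~ categorical_logit(tau * V_2); while the value update can keep using the original choice[t]. The declaration follows the array syntax used in the post; newer Stan versions write it as array[nTrials] int<lower=1,upper=2> choice_lr;.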

Sorry, I have just started learning Stan. Could you please help me modify the code directly?

Sorry, I don’t have time to work directly with the code.

Just briefly: it looks as if your code already takes care that only the two relevant action values are used.

One way to proceed is to leave the Stan code unchanged and modify the choice vector in the data, so that it is always 1 if the subject chose left and 2 if the subject chose right (assuming I understand your data structure correctly).

Thank you very much. The problem seems to be solved.

If one of my answers put you on the right path, you could mark it as the solution. 😀