How can I simulate data for my model? (Reinforcement Learning Model)

My intended Stan code is as follows:

data {
  int<lower=1,upper=39> nSubjects;
  int<lower=1,upper=100> nTrials;
  int<lower=1,upper=4> choice[nSubjects, nTrials];     
  real<lower=-1150, upper=100> reward[nSubjects, nTrials]; 
  matrix [nSubjects, nTrials] NumA[nSubjects, nTrials];
  matrix [nSubjects, nTrials] NumB[nSubjects, nTrials];
  matrix [nSubjects, nTrials] NumC[nSubjects, nTrials];
  matrix [nSubjects, nTrials] NumD[nSubjects, nTrials];
}

transformed data {
  vector[4] initV;  // initial values for V
  vector[4] initu;
  vector[1] initA;
  vector[1] initB;
  vector[1] initC;
  vector[1] initD;
  initV = rep_vector(0.0, 4);
  initu = rep_vector(0.0, 4);
  initA = rep_vector(0.0, 1);
  initB = rep_vector(0.0, 1);
  initC = rep_vector(0.0, 1);
  initD = rep_vector(0.0, 1);
}

parameters {
  // real<lower=0,upper=1>lr_mu;
  // real<lower=0,upper=1>tau_mu;
  
  // real<lower=0>lr_sd;
  // real<lower=0>tau_sd;
  
  real<lower=0,upper=1>   lr[nSubjects];
  real<lower=0,upper=2>   c[nSubjects];
  real<lower=0,upper=1>   A[nSubjects];
  real<lower=0,upper=5>   loss_aversion[nSubjects];
}

model {
  // lr_sd  ~  cauchy(0,1);
  /// tau_sd ~  cauchy(0,3);
  // lr     ~  normal(lr_mu,lr_sd);
  // tau    ~  normal(tau_mu,tau_sd);
 
}

generated quantities {
  matrix[nSubjects, nTrials] choice_sim[nSubjects, nTrials];
  matrix[nSubjects, nTrials] reward_sim[nSubjects, nTrials];
  for (s in 1:nSubjects) {
    vector[4] v; 
    vector[4] U;
    real pe;
    real theta;
    v     = initV;
    U     = initu;
    
    for (t in 1:nTrials) {
      // complete the for-loop here
      theta = pow(3,c[s])-1;
      choice[s,t] = softmax(theta*v);
      
      if(choice[s,t] == 1){
        reward[s,t] = IGT_netincome[1,NumA];
        NumA = NumA+1;
      }else if(choice[s,t] == 2){
        reward[s,t] = IGT_netincome[2,NumB];
        NumB = NumB+1;
      }else if(choice[s,t] == 3){
        reward[s,t] = IGT_netincome[3,NumC];
        NumC = NumC+1;
      }else{
        reward[s,t] = IGT_netincome[4,NumD];
        NumD = NumD+1;
      }
         
      if (reward[s,t] >= 0) {
        U[choice[s,t]] = pow(reward[s,t], A[s]);
      } else {
        U[choice[s,t]] = -loss_aversion[s] * pow(-reward[s,t], A[s]);
      }
      v[choice[s,t]] = v[choice[s,t]] + lr[s]*pe;
      pe = U[choice[s,t]] - v[choice[s,t]];
    }
  }
}

But I found that it is not applicable. Although it is based on the Rescorla-Wagner reinforcement learning model, it is set in the context of the Iowa Gambling Task (IGT). In brief, I would like to produce choice behaviour from the RL model, but each choice should be an integer from 1 to 4, because participants can only select among the four given decks. The reward is hidden beneath the card, and a reward is received after picking a deck. The deck-to-reward mapping is stored in a matrix, so a given choice produces a particular reward.
And as you know, the reward may be high or low, positive or negative, and it affects the choice probabilities on the next trial. The choice probabilities are described by a softmax rule.
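
Concretely, the choice rule I have in mind is a softmax over the four deck values, with the inverse temperature $\theta = 3^{c} - 1$ as in my code:

$$
P(\text{choice}_t = k) = \frac{\exp(\theta\, v_k)}{\sum_{j=1}^{4} \exp(\theta\, v_j)}, \qquad k \in \{1, 2, 3, 4\}.
$$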

So how can I simulate such a dataset? I am confused about where model fitting and parameter estimation end and model simulation begins. If you have specific suggestions on the code above, I would really appreciate it. Thank you in advance!

Can you run your model using “algorithm=fixed_param” and then extract the generated quantities for one iteration?
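
If you do that, the piece that actually produces integer choices from 1 to 4 is a categorical_rng draw from the softmax probabilities. A rough sketch of what the generated quantities block could look like (writing to a separate choice_sim variable instead of the choice data, and leaving out the deck payoff lookup and value update):

generated quantities {
  int choice_sim[nSubjects, nTrials];   // simulated choices, integers 1..4
  for (s in 1:nSubjects) {
    vector[4] v = rep_vector(0.0, 4);   // reward expectations per deck
    for (t in 1:nTrials) {
      real theta = pow(3, c[s]) - 1;    // inverse temperature, as in your code
      // draw an integer deck from the softmax choice probabilities
      choice_sim[s, t] = categorical_rng(softmax(theta * v));
      // ... look up the reward for the drawn deck and update v here ...
    }
  }
}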

What do you mean when you say your model is “not applicable”?


It is not 100% clear to me what this model is doing.
For this to be a reinforcement learning model, the model block needs code to update the reward expectations and to make a choice given those expectations. Your model block currently has neither.

Even if your goal is ultimately to fit IGT data, you could start by writing a model and simulating data for a simpler situation with only two options.
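
For example, a minimal two-option Rescorla-Wagner model for a single subject could look roughly like the sketch below (the variable names are illustrative and the priors are only placeholders, not a recommendation for your data):

data {
  int<lower=1> nTrials;
  int<lower=1, upper=2> choice[nTrials];   // which of the two options was chosen
  real reward[nTrials];                    // reward received on each trial
}
parameters {
  real<lower=0, upper=1> lr;   // learning rate
  real<lower=0> tau;           // inverse temperature
}
model {
  vector[2] v = rep_vector(0.0, 2);   // reward expectations, start at zero
  lr ~ beta(1, 1);
  tau ~ normal(0, 5);
  for (t in 1:nTrials) {
    // choice likelihood from the current expectations (softmax)
    choice[t] ~ categorical_logit(tau * v);
    // prediction error and update of the chosen option's expectation
    v[choice[t]] += lr * (reward[t] - v[choice[t]]);
  }
}

Once something like this recovers its parameters from simulated two-option data, extending it to four decks and the IGT payoff structure is much easier to debug.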

Also note that others have fit IGT data with Stan before. You could search for that and use it as a starting point. (Though these models are often more complex than a simple Rescorla-Wagner model.)


You should take a look at the IGT models of hBayesDM: http://ccs-lab.github.io/hBayesDM/reference/index.html

You can use them as starting points for working hierarchical models of the IGT. You should be able to get your intended model by simplifying them.

Unfortunately, the code relevant to the IGT is not available on GitHub; the links give a 404 Not Found error.

Alternatively, download the package in R and then look through the package contents.

Have you checked out the hBayesDM core files directory on GitHub? All Stan code is in the stan_files folder, and documentation is in the models folder (e.g. this IGT model is showing up fine for me).

OK, my first answer was a bit fast, because you actually have lots of what you need in the generated quantities.

What you seem to be missing is updating the thetas (the reward expectations). You define the prediction error here:

pe = U[choice[s,t]] - v[choice[s,t]];

But you are not updating theta after that. You need something like

theta[choice[s,t]] = theta[choice[s,t]] + alpha * pe;

where alpha is the learning rate, which should have lower and upper bounds of 0 and 1 and can be given a beta distribution as a prior. (I can’t see the learning rate in your parameter block, but maybe that’s my unfamiliarity with your code.)
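
For instance, the declaration and prior could be as simple as this (just a sketch, independent of the rest of your model):

parameters {
  real<lower=0, upper=1> alpha;   // learning rate constrained to [0, 1]
}
model {
  alpha ~ beta(2, 2);   // a mild prior that keeps alpha away from the boundaries
}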

Lastly, it looks like you are trying to implement something along the lines of prospect theory for the utility function. I would first try a simpler model without a complex utility function, and only move on to something more complicated after you have successfully recovered the parameters of the simpler model.