Hi all,

I am running a fairly standard reinforcement learning model with a decay rate on a data set. I fit the participant's choice (0 or 1, corresponding to the stimulus) with a softmax function whose output is used as the probability of a Bernoulli trial (1 if they select the second stimulus, 0 if they select the first).

When I run an otherwise identical model that omits the decay step, sampling works fine. But as soon as I decay the expected values by the time elapsed,

```
ev *= exp(-lambda_1*((times[t]-times[t-1])/(1000*3600*24)));
```

the likelihood statement

```
current_choice[t] ~ bernoulli(exp(beta_1*current[2])/(exp(beta_1*current[1])+exp(beta_1*current[2])));
```

generates the error:

```
Exception: bernoulli_lpmf: Probability parameter is nan, but must be finite!
```

I'm not able to figure out why this change to `ev` would generate the error. I am attaching the full code below.
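One toy calculation that does reproduce the nan outside Stan (I'm not sure this is exactly what happens in my fit, and the numbers here are invented): since `lambda_1` has no lower bound, a negative draw combined with a long elapsed time makes the decay factor enormous, `ev` overflows to inf, and the hand-written softmax evaluates inf / (inf + inf) = nan:

```python
import numpy as np

beta = 5.0
lam = -2.0       # lambda_1 is unconstrained, so negative draws can occur
dt_days = 400.0  # a long gap between two observations
ev = np.array([0.5, 0.5])

with np.errstate(over="ignore", invalid="ignore"):
    ev = ev * np.exp(-lam * dt_days)  # exp(800) overflows to inf
    p = np.exp(beta * ev[1]) / (np.exp(beta * ev[0]) + np.exp(beta * ev[1]))

print(ev, p)  # [inf inf] nan
```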

Thank you!

```
data {
  int Trial;
  int outcome[Trial];
  int choice[Trial];
  int stim1[Trial];
  int stim2[Trial];
  real times[Trial];
  int<lower=0, upper=1> current_choice[Trial];
}
transformed data {
  vector<lower=0, upper=3>[99] initV; // initial expected values
  vector[2] initVcur;
  initV = rep_vector(0.5, 99);
  initVcur = rep_vector(0.5, 2);
}
parameters {
  real<lower=0, upper=1> alpha_1;  // learning rate
  real<lower=0, upper=10> beta_1;  // inverse temperature
  real lambda_1;                   // decay rate
}
model {
  real PE;
  vector[99] ev;
  vector[2] current;
  current = initVcur;
  ev = initV;
  // priors
  alpha_1 ~ beta(1, 9);
  beta_1 ~ gamma(5, 1);
  lambda_1 ~ normal(0, 1);
  // first trial: no elapsed time yet, so no decay
  for (t in 1:1) {
    current[1] = ev[stim1[t]];
    current[2] = ev[stim2[t]];
    // softmax probability of choosing the second stimulus, fit to the actual choice
    current_choice[t] ~ bernoulli(exp(beta_1 * current[2]) / (exp(beta_1 * current[1]) + exp(beta_1 * current[2])));
    // prediction error
    PE = outcome[t] - ev[choice[t]];
    // value updating (learning)
    //ev[choice[t]] += ev[choice[t]] + alpha_1 * PE;
    ev[choice[t]] += alpha_1 * PE;
  }
  // remaining trials: decay, then choose, then learn
  for (t in 2:Trial) {
    // decay EV by elapsed time, converted from milliseconds to days
    ev *= exp(-lambda_1 * ((times[t] - times[t-1]) / (1000 * 3600 * 24)));
    current[1] = ev[stim1[t]];
    current[2] = ev[stim2[t]];
    current_choice[t] ~ bernoulli(exp(beta_1 * current[2]) / (exp(beta_1 * current[1]) + exp(beta_1 * current[2])));
    // prediction error
    PE = outcome[t] - ev[choice[t]];
    // value updating (learning)
    //ev[choice[t]] += ev[choice[t]] + alpha_1 * PE;
    ev[choice[t]] += alpha_1 * PE;
  }
}
```
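For reference, here is what I intend the likelihood to compute, written as a plain numpy forward pass (the function name and the two synthetic trials are just for illustration; indices are 0-based here, unlike Stan's 1-based arrays):

```python
import numpy as np

def choice_probs(alpha, beta, lam, stim1, stim2, choice, outcome, times_ms):
    """Forward pass of the model above: P(select second stimulus) per trial."""
    ev = np.full(99, 0.5)               # initial expected values
    p = np.empty(len(choice))
    for t in range(len(choice)):
        if t > 0:
            # decay EV by elapsed time, converted from milliseconds to days
            dt_days = (times_ms[t] - times_ms[t - 1]) / (1000 * 3600 * 24)
            ev = ev * np.exp(-lam * dt_days)
        v1, v2 = ev[stim1[t]], ev[stim2[t]]
        # softmax probability of choosing the second presented stimulus
        p[t] = np.exp(beta * v2) / (np.exp(beta * v1) + np.exp(beta * v2))
        # Rescorla-Wagner update of the chosen stimulus's value
        ev[choice[t]] += alpha * (outcome[t] - ev[choice[t]])
    return p

# two synthetic trials, one day apart
probs = choice_probs(0.1, 2.0, 0.1,
                     stim1=[0, 1], stim2=[1, 2], choice=[1, 2],
                     outcome=[1, 0], times_ms=[0.0, 86_400_000.0])
```

On trial 0 both stimuli share the initial value 0.5, so the probability is 0.5 by symmetry; trial 1 then reflects both the learning update and one day of decay.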