Hi,

I’m new to Stan (and modeling in general). I managed to build a simple reinforcement learning model with a learning rate and an inverse temperature, and now I’m trying to make the learning rate dynamic based on some literature. I don’t think I implemented it correctly; could someone look at my code and tell me what I need to change? I wanted to have eta (the learning rate) as an output and get a posterior for it, but I don’t know how to do that, because right now I’m just computing it deterministically rather than drawing it from any distribution. Basically, I want the learning rate to be computed as `eta = 1 / (epsilon + Nobservedoutcomes[i, stim]);`, where `epsilon` is a free parameter that sets the initial learning rate and `Nobservedoutcomes` counts how many times an outcome of that stimulus was observed on previous trials.

Also, I get warning messages:

1: There were 2913 divergent transitions after warmup. Increasing adapt_delta above 0.9 may help. [I’m rerunning the model with adapt_delta = 0.99 right now, but it’s taking ages.]

See

http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

2: Examine the pairs() plot to diagnose sampling problems

3: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.

Running the chains for more iterations may help. See

http://mc-stan.org/misc/warnings.html#bulk-ess

```
data {
  int<lower=0> t;                     // number of trials
  int<lower=0> N;                     // number of stimuli
  int<lower=0, upper=1> outcome[t];   // win or loss
  int<lower=0, upper=1> choice[t];    // choice data (1 = right stimulus chosen)
  int<lower=0, upper=1> feedback[t];  // feedback or no feedback
  int stim1[t];                       // left stimulus
  int stim2[t];                       // right stimulus
  int stimchosen[t];                  // chosen stimulus
  int Nobservedoutcomes[t, N];        // number of previously observed outcomes per stimulus
}
parameters {
  real betatransformed;
  real<lower=0> epsilon;  // initial value of the learning rate; constrained positive
                          // to match the gamma prior (an unconstrained epsilon lets
                          // the sampler propose negative values, causing divergences)
}
transformed parameters {
  real<lower=0> beta = exp(betatransformed);  // inverse temperature
}
model {
  real theta;  // probability of choosing the right stimulus
  real PE;     // prediction error
  int stim;    // index of the chosen stimulus
  real v[N];   // stimulus values
  real eta;    // dynamic learning rate

  for (i in 1:N)
    v[i] = 0.5;  // initial stimulus value

  // priors
  betatransformed ~ normal(0, 2);
  epsilon ~ gamma(1, 1);

  // trial loop
  for (i in 1:t) {
    // softmax decision rule: theta = probability of choosing the right stimulus
    theta = exp(beta * v[stim2[i]])
            / (exp(beta * v[stim2[i]]) + exp(beta * v[stim1[i]]));
    choice[i] ~ bernoulli(theta);
    stim = stimchosen[i];
    // only update values after feedback trials
    if (feedback[i] == 1) {
      PE = outcome[i] - v[stim];                         // prediction error
      eta = 1 / (epsilon + Nobservedoutcomes[i, stim]);  // dynamic learning rate
                                                         // (index i, not t, so the
                                                         // count matches the trial)
      v[stim] = v[stim] + eta * PE;
    }
  }
}
```
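To show what I’m after: since eta is a deterministic function of epsilon and the data, I think its posterior should come for free once epsilon is sampled, so here is my attempt at exposing it via a generated quantities block (just a sketch reusing the same data names, and I’m not sure this is the right approach):

```
generated quantities {
  matrix[t, N] eta_out;  // per-trial, per-stimulus learning rate
  for (i in 1:t)
    for (j in 1:N)
      eta_out[i, j] = 1 / (epsilon + Nobservedoutcomes[i, j]);
}
```

My understanding is that this gets evaluated once per posterior draw, so the samples of `eta_out` would give a posterior for the learning rate at every trial without changing the model itself. Is that right?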