Hello, I am trying to fit a parameter called the learning rate of a reinforcement learning model (Q-learning). The learning rate is bounded between 0 and 1.

I am using data from rats’ choices and replays (a brain phenomenon whose definition I am hoping I can skip for the purposes of this question) to fit the model parameter and test whether the learning rate for replay is different from zero.

fit_replays_1subj_simple.stan (1.8 KB)

The problem is that when I check parameter recovery by generating data in Python and then estimating the parameter with Stan, I see a consistent overestimation of the learning rate (alphaR), especially when the true learning rate used to generate the synthetic data is zero.

Am I doing something wrong? Please let me know if I should provide any further information.

I have tried both a symmetric flat prior on [-0.5, 0.5] and an asymmetric one on [0, 1], but both still overestimated the parameter.

A negative learning rate may be problematic in Q-learning, because it makes the Q-values grow exponentially, so I am not sure the symmetric option is a good one.
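
To illustrate the concern, here is a minimal sketch assuming the standard delta-rule update Q <- Q + alpha * (reward - Q), which may not match the attached model exactly. The function names are made up for this example. With alpha < 0 and a constant reward, the distance between Q and the reward is multiplied by (1 - alpha) > 1 on every step, so it grows geometrically.

```python
def q_update(q, reward, alpha):
    """One delta-rule step (assumed form): Q <- Q + alpha * (reward - Q)."""
    return q + alpha * (reward - q)


def run_updates(q0, reward, alpha, n_steps):
    """Apply the same update n_steps times and return the whole Q trajectory."""
    q = q0
    trajectory = [q]
    for _ in range(n_steps):
        q = q_update(q, reward, alpha)
        trajectory.append(q)
    return trajectory


# Negative alpha: Q moves away from the reward by a factor (1 - alpha) per step.
diverging = run_updates(q0=1.0, reward=0.0, alpha=-0.5, n_steps=10)
# Positive alpha in (0, 1): Q shrinks toward the reward instead.
converging = run_updates(q0=1.0, reward=0.0, alpha=0.5, n_steps=10)
```

Here `diverging` ends at 1.5 ** 10, while `converging` ends at 0.5 ** 10.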

Another idea is to put 50% of the prior mass at exactly zero and spread the rest over the interval from 0 to 1. Would that work better? If yes, how would I do it?
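
To make this idea concrete, here is a rough sketch of what such a prior would imply, not a tested solution. It assumes a hypothetical function `log_lik(alpha)` that returns the log-likelihood of the data at a given learning rate, a point mass of weight `spike_weight` at zero, and a Uniform(0, 1) prior for the remaining mass. Since the point mass is discrete, it is summed out rather than sampled, and the marginal likelihood of the continuous part is approximated on a grid. All names are illustrative.

```python
import math


def spike_posterior_prob(log_lik, n_grid=1000, spike_weight=0.5):
    """Posterior probability that alpha == 0 under a point-mass-plus-uniform prior.

    log_lik: hypothetical callable, alpha -> log-likelihood of the data.
    The continuous part of the prior is Uniform(0, 1); its marginal
    likelihood is approximated with a midpoint rule on n_grid points.
    """
    # Log of (prior weight of the point mass) * (likelihood at alpha = 0).
    log_spike = math.log(spike_weight) + log_lik(0.0)

    # Midpoint-rule estimate of the integral of the likelihood over (0, 1),
    # computed with the log-sum-exp trick for numerical stability.
    terms = [log_lik((i + 0.5) / n_grid) for i in range(n_grid)]
    m = max(terms)
    log_marginal = m + math.log(sum(math.exp(t - m) for t in terms) / n_grid)
    log_slab = math.log(1.0 - spike_weight) + log_marginal

    # Normalise the two components to get P(alpha == 0 | data).
    top = max(log_spike, log_slab)
    w_spike = math.exp(log_spike - top)
    w_slab = math.exp(log_slab - top)
    return w_spike / (w_spike + w_slab)
```

With a completely flat likelihood the function returns the prior weight of 0.5, and it moves toward 1 or 0 as the data favour alpha = 0 or alpha > 0 respectively.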

Thank you for your time, attention and energy.

Gratefully,

Homero