I have a reinforcement learning model with 2 choices per trial (coded as 1 or 2), for which I calculate expectancy values (i.e. one value for each of the two choices) per trial to predict the choice.
I have implemented this (extending a model from the hBayesDM package) using categorical_logit(), to which I can pass my 2 expectancy values (stored in vector[2] ev;) and also calculate the log likelihood and the predicted choice:
model {
...
choice[i, t, j] ~ categorical_logit(tau[i, j] * ev);
...
}
generated quantities {
real y_pred[nS, nC, nT_max];
real PE_pred[nS, nC, nT_max];
real ev_pred[nS, nC, nT_max, 2];
...
// compute log likelihood of current trial
log_lik[i, j] += categorical_logit_lpmf(choice[i, t, j] | tau[i, j] * ev);
// generate posterior prediction for current trial
y_pred[i, j, t] = categorical_rng(softmax(tau[i, j] * ev));
...
}
However, one of the choices is always associated with a higher chance of obtaining a reward (the “correct choice”), so I thought of changing this to a bernoulli_logit() model.
I recoded my choice to be 1 for the correct choice and 0 for the incorrect choice.
My issue, however, is which value to pass to bernoulli_logit(). I understand that categorical_logit() takes a vector with one real number for each of the N categories (is this the correct interpretation of $\beta \in \mathbb{R}^{N}$?), which is then transformed using $\operatorname{softmax}(\beta)$. So I can pass my vector[2] ev of expectancy values, multiplied by an inverse temperature value tau, to predict the choice.
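To make my reading concrete, this is how I understand the softmax transform inside categorical_logit(). It is only a sketch in my own notation, assuming $\beta = \tau \cdot \mathrm{ev}$ and $N = 2$ as in my model:

```latex
\operatorname{softmax}(\beta)_k = \frac{\exp(\beta_k)}{\sum_{n=1}^{N} \exp(\beta_n)},
\qquad
P(\text{choice} = k) = \operatorname{softmax}(\tau \cdot \mathrm{ev})_k,
\quad k \in \{1, 2\}
```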
The documentation (Stan function reference 12.2) states that bernoulli_logit() takes a “chance-of-success parameter” $\alpha$, where $\alpha \in \mathbb{R}$. Thus, I understand that this is a single value which is not necessarily a probability (i.e. not necessarily $\in [0, 1]$). But I cannot just use the expectancy value for the “correct choice” (i.e. ev[1]), since I must set it in relation to the expectancy value of the “incorrect choice” (ev[2]).
So one option would be to calculate vector[2] softmax_ev = softmax(tau * ev); and use real alpha = softmax_ev[1]; as input for bernoulli_logit().
Another option would be to calculate the difference between the 2 expectancy values (real diff_ev = ev[1] - ev[2];) and then use this as input to bernoulli_logit(). With this option I’m not sure how to include my inverse temperature value tau…
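For the second option, a short derivation might help pin down where tau would go. This is only a sketch, assuming the bernoulli_logit() model should reproduce the two-category categorical_logit() model above, with choice 1 as the “correct choice”; the subscripted $\mathrm{ev}_1$ and $\mathrm{ev}_2$ stand for ev[1] and ev[2]:

```latex
\begin{aligned}
P(\text{choice} = 1)
  &= \operatorname{softmax}(\tau \cdot \mathrm{ev})_1
   = \frac{\exp(\tau\,\mathrm{ev}_1)}{\exp(\tau\,\mathrm{ev}_1) + \exp(\tau\,\mathrm{ev}_2)} \\
  &= \frac{1}{1 + \exp\!\bigl(-\tau\,(\mathrm{ev}_1 - \mathrm{ev}_2)\bigr)}
   = \operatorname{logit}^{-1}\!\bigl(\tau\,(\mathrm{ev}_1 - \mathrm{ev}_2)\bigr)
\end{aligned}
```

If this is right, tau * (ev[1] - ev[2]) would already be on the logit scale that bernoulli_logit() expects, whereas softmax_ev[1] from the first option is a probability in [0, 1] and would seem to match bernoulli() rather than bernoulli_logit(). I would appreciate confirmation of this reasoning.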
I hope I have explained my question clearly! I am happy to provide more information if necessary!
Any help on how to solve this and/or any explanation of how best to model such a choice prediction is highly appreciated!
Thanks in advance!