I have a reinforcement learning model with 2 choices per trial (coded as `1` or `2`) for which I calculate expectancy values (i.e. 1 value for each of the two choices) per trial to predict the choice.

I have implemented this (extending a model from the hBayesDM package) using `categorical_logit()`, to which I can input my 2 expectancy values (stored in the `vector[2] ev;`) and also calculate the log likelihood and predicted choice:

```
model {
...
choice[i, t, j] ~ categorical_logit(tau[i, j] * ev);
...
}
generated quantities {
real y_pred[nS, nC, nT_max];
real PE_pred[nS, nC, nT_max];
real ev_pred[nS, nC, nT_max, 2];
...
// compute log likelihood of current trial
log_lik[i, j] += categorical_logit_lpmf(choice[i, t, j] | tau[i, j] * ev);
// generate posterior prediction for current trial
y_pred[i, j, t] = categorical_rng(softmax(tau[i, j] * ev));
...
}
```

However, one of the choices is always associated with a higher chance to obtain a reward (“correct choice”), thus, I thought to change this to a `bernoulli_logit()` model.

I recoded my choice to be `1` for correct choice and `0` for incorrect choice.

My issue, however, is which value to input to `bernoulli_logit()`. I understand that `categorical_logit()` takes a `vector` with one real number for each of `n` categories (is this the correct interpretation of $\theta \in \mathbb{R}^{N}$?) that is transformed using $\text{softmax}(\beta)$. So I can input my `vector[2] ev` of expectancy values multiplied by an inverse temperature value `tau` to predict the choice.

The documentation (Stan function reference 12.2) states that `bernoulli_logit()` takes a “chance-of-success parameter” $\alpha$ where $\alpha \in \mathbb{R}$. Thus, I understand that this is a single value which is not necessarily a probability value (i.e. $\in [0, 1]$). But I cannot just use the expectancy value for the “correct choice” (i.e. `ev[1]`) since I must set this in relation to the expectancy value of the “incorrect choice” (`ev[2]`).

So one option would be to calculate the `vector[2] softmax_ev = softmax(tau*ev);` and use `real alpha = softmax_ev[1]` as input for `bernoulli_logit()`.
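A quick numeric check of this option, sketched in plain Python rather than Stan. The values of `tau` and `ev` and the helper names are made up for illustration. The point it tries to show: `softmax_ev[1]` is already a probability, while `bernoulli_logit()` applies the inverse logit to its argument, so passing the softmax output would transform it a second time.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def inv_logit(x):
    """Logistic function, the transform bernoulli_logit applies to its argument."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

# Hypothetical values, chosen only for illustration.
tau = 2.0
ev = [0.6, 0.1]  # ev[0] plays the role of Stan's ev[1] (the "correct" choice)

# Probability of the correct choice under the two-category softmax.
p_correct = softmax([tau * v for v in ev])[0]

# The same quantity on the log-odds scale, which is what bernoulli_logit expects.
log_odds = logit(p_correct)

# What would happen if the probability itself were inverse-logit transformed again.
double_transformed = inv_logit(p_correct)

print(f"P(correct) from softmax:      {p_correct:.4f}")
print(f"log-odds of that probability: {log_odds:.4f}")
print(f"inv_logit applied to P:       {double_transformed:.4f}")
```

Because a probability lies in [0, 1], its inverse logit always lands between 0.5 and roughly 0.73, so with this option the predicted probability of the correct choice could never drop below chance. If the softmax output is used directly, `bernoulli()` (which takes a probability) looks like the matching distribution rather than `bernoulli_logit()`.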

Another option would be to calculate a difference between the 2 expectancy values (`real diff_ev = ev[1]-ev[2];`) and then use this as input to `bernoulli_logit()`. With this option I'm not sure how to include my inverse temperature value `tau`…
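Writing out the two-category softmax suggests where `tau` would go in this option. This is only a sketch, assuming that `ev[1]` belongs to the correct choice and that `tau` is a single positive scalar for the given subject and condition. Here $ev_1$ and $ev_2$ stand for `ev[1]` and `ev[2]`.

```latex
\begin{aligned}
\text{softmax}(\tau \cdot ev)_1
  &= \frac{e^{\tau\, ev_1}}{e^{\tau\, ev_1} + e^{\tau\, ev_2}} \\
  &= \frac{1}{1 + e^{-\tau\,(ev_1 - ev_2)}} \\
  &= \operatorname{logit}^{-1}\bigl(\tau\,(ev_1 - ev_2)\bigr)
\end{aligned}
```

Under these assumptions, `bernoulli_logit(tau[i, j] * diff_ev)` on the `0`/`1` coding would give the same choice probabilities as `categorical_logit(tau[i, j] * ev)` on the `1`/`2` coding, with `tau` simply scaling the difference.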

I hope I could explain my question intelligibly! I am happy to provide more information if necessary!

Any help on how to solve this and/or any explanation of how to best model such a choice prediction is highly appreciated!

Thanks in advance!