# RL model: error in evaluating the initial log probability

Here is my stan code below:

``````
data {
  int<lower=1> nSubjects;
  int<lower=1> nTrials;
  int<lower=1,upper=4> choice[nSubjects, nTrials];
  real<lower=-1150, upper=100> reward[nSubjects, nTrials];
}
transformed data {
  vector[4] initV;  // initial values for V
  vector[4] initu;  // initial values for U
  initV = rep_vector(0.0, 4);
  initu = rep_vector(0.0, 4);
}
parameters {
  real<lower=0,upper=1> lr[nSubjects];
  real<lower=0,upper=5> c[nSubjects];
  real<lower=0,upper=1> A[nSubjects];
  real<lower=0,upper=5> loss_aversion[nSubjects];
}
model {
  for (s in 1:nSubjects) {
    vector[4] v;
    vector[4] U;
    real pe;
    real theta;
    v = initV;
    U = initu;

    for (t in 1:nTrials) {
      if (reward[s,t] >= 0) {
        U[choice[s,t]] = pow(reward[s,t], A[s]);
      } else {
        U[choice[s,t]] = -loss_aversion[s] * pow(reward[s,t], A[s]);
      }

      theta = pow(3, c[s]) - 1;
      choice[s,t] ~ categorical_logit(theta * v);
      pe = U[choice[s,t]] - v[choice[s,t]];
      v[choice[s,t]] = v[choice[s,t]] + lr[s] * pe;
    }
  }
}
``````

And the error it reports is:

``````
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: categorical_logit_lpmf: log odds parameter is nan, but must be finite! (in 'model313873be45dd_PVL_Delta_learning_Model' at line 53)
``````

(Here line 53 is where `choice[s,t] ~ categorical_logit(theta*v)` is located.)

The model is a specific model for fitting RL theory to a gambling task. Similar models have previously run perfectly and fit my data, so I am confused why this one reports an error.

So far, I have tried the following ways to debug the model fitting:

1. Deleting `theta` and putting the expression directly inside `categorical_logit()`, but that is syntactically wrong.
2. Leaving `v` alone inside `categorical_logit()`, but it reports the same error.
3. Changing the constraint on `c` from [0,5] to [1,5], which reports the same error.

Hope someone can help me with my problem, though I am not very adept at understanding stan code. Thank you guys in advance!

I think the problem might be the second branch here:

``````
if (reward[s,t] >= 0) {
  U[choice[s,t]] = pow(reward[s,t], A[s]);
} else {
  U[choice[s,t]] = -loss_aversion[s] * pow(reward[s,t], A[s]);
}
``````

It doesn't make sense to raise a negative number to a fractional power. Maybe it should be `pow(-reward[s,t], A[s])` or something?
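For anyone else hitting this: the nan is easy to reproduce outside Stan. Below is a minimal NumPy sketch (not part of the model; the values are made up) showing that a negative base with a fractional exponent yields `nan`, which then propagates into the utilities and the `categorical_logit` argument, while taking `pow` of the magnitude keeps the loss utility finite:

```python
import numpy as np

reward = -50.0  # a loss trial
A = 0.5         # fractional utility exponent
loss_aversion = 2.0

# Original branch: pow of a negative base with a fractional exponent -> nan
with np.errstate(invalid="ignore"):
    bad = -loss_aversion * np.power(reward, A)
print(bad)  # nan

# Suggested fix: take pow of the magnitude, then apply the minus sign
good = -loss_aversion * np.power(-reward, A)
print(good)  # a finite negative utility
```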

Wow, it's a great hint! I will try it very soon! Thank you!

Guess what? It now runs without the error! Wow, thank you so much! I'd been stuck here for several days!

I’ve edited your post so it’s easier to read the code block. I’ve also marked the response you got as the solution. Please let me know if that is not the case.

Glad to see that @nhuurre was able to help! Thanks Niko!