Hi All,
I am trying to run model recovery for a Q-learning RL model (fitting the simulated data with the true and a wrong model, and comparing them using loo). Parameter recovery looks very good, and overall the model seems to predict the simulated data well. However, I get very bad diagnostics from the loo package… I saw this was discussed a few years ago, but I didn't find how to solve it. This is really frustrating, since my aim is to move on to more complicated RL models, but I can't seem to get the basic model recovery working.
Please - any thoughts on what I might be doing wrong here?
Thank you!
Nitzan

Here is my Stan code for the Q-learning model:
data {
//General fixed parameters for the experiment/models
int<lower = 1> Nsubjects; //number of subjects
int<lower = 1> Ntrials; //maximum number of trials per subject (without missing data). Used to form the subject x trials matrices.
int<lower = 1> Ntrials_per_subject[Nsubjects]; //number of trials left for each subject after data omission
int<lower = 2> Narms; //number of overall alternatives
int<lower = 2> Nraffle; //number of offers per trial
//Behavioral data:
//each variable being a subject x trial matrix
//the data is padded in make_standata function so that all subjects will have the same number of trials
int<lower = 0> action[Nsubjects,Ntrials]; //index of which arm was pulled coded 1 to 4
int<lower = 0> reward[Nsubjects,Ntrials]; //outcome of bandit arm pull
int<lower = 0> offer1[Nsubjects,Ntrials]; //index of the first arm offered on each trial
int<lower = 0> offer2[Nsubjects,Ntrials]; //index of the second arm offered on each trial
int<lower = 0> selected_offer[Nsubjects,Ntrials]; //which of the two offers was chosen (1 or 2)
}
transformed data{
int<lower = 1> Nparameters=2; //number of parameters
vector[Narms] Qvalue_initial; // initial values for Qvalues (defined here to avoid doing this many times across iterations)
Qvalue_initial = rep_vector(0.0, Narms);
}
parameters {
// Declare parameter vectors. The suffix "aux" indicates that the values are before transformation
//population level parameters
vector[Nparameters] mu_aux; //vector with the population level mean for each model parameter
vector<lower=0>[Nparameters] sigma_aux; //vector with the random effects scale (SD) for each model parameter
//individual level parameters (standardized, non-centered)
vector[Nsubjects] alpha_individal_aux;
vector[Nsubjects] beta_indvidial_aux;
}
transformed parameters {
//declare variables and parameters
vector<lower=0, upper=1>[Nsubjects] alpha;
vector<lower=0, upper=10>[Nsubjects] beta;
for (subject in 1:Nsubjects) {
alpha[subject] = Phi_approx(mu_aux[1] + sigma_aux[1] * alpha_individal_aux[subject]);
beta[subject] = Phi_approx(mu_aux[2] + sigma_aux[2] * beta_indvidial_aux[subject]) * 10;
}
}
model {
// population level priors (hyper-parameters)
mu_aux ~ normal(0, 1);
sigma_aux ~ normal(0, 0.5);
// individual level priors (subjects' parameters)
alpha_individal_aux ~ normal(0, 1);
beta_indvidial_aux ~ normal(0, 1);
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//Likelihood function per subject per trial
for (subject in 1:Nsubjects){
vector[Narms] Qcard;
vector[Nraffle] Qoffer;
Qcard=Qvalue_initial;
for (trial in 1:Ntrials_per_subject[subject]){
Qoffer[1]=Qcard[offer1[subject,trial]];
Qoffer[2]=Qcard[offer2[subject,trial]];
//likelihood function
selected_offer[subject, trial] ~ categorical_logit(beta[subject] * Qoffer);
//Qvalues update
Qcard[action[subject,trial]] += alpha[subject] * (reward[subject,trial] - Qcard[action[subject,trial]]);
}
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
}
generated quantities {
//define parameters that will be saved in the final output
real<lower=0, upper=1> mu_alpha;
real<lower=0, upper=10> mu_beta;
real log_lik[Nsubjects];
real y_pred[Nsubjects, Ntrials];
// Set all posterior predictions to -1 (avoids NULL values)
for (i in 1:Nsubjects) {
for (t in 1:Ntrials) {
y_pred[i, t] = -1;
}
}
mu_alpha=Phi_approx(mu_aux[1]);
mu_beta=Phi_approx(mu_aux[2])*10;
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//Likelihood function per subject per trial (placed in the generated quantities block to save time and memory)
{ //
for (subject in 1:Nsubjects) {
vector[Narms] Qcard;
vector[Nraffle] Qoffer;
Qcard=Qvalue_initial;
log_lik[subject] = 0;
for (trial in 1:Ntrials_per_subject[subject]){
//offer values
Qoffer[1]=Qcard[offer1[subject,trial]];
Qoffer[2]=Qcard[offer2[subject,trial]];
// compute log likelihood of current trial
log_lik[subject] += categorical_logit_lpmf(selected_offer[subject, trial] | beta[subject] * Qoffer);
// posterior prediction for the current trial: store the predicted probability of the chosen offer
//y_pred[subject, trial] = categorical_rng(softmax(beta[subject] * Qoffer));
y_pred[subject, trial] = softmax(beta[subject] * Qoffer)[selected_offer[subject, trial]];
//Qvalues update
Qcard[action[subject,trial]] += alpha[subject] * (reward[subject,trial] - Qcard[action[subject,trial]]);
}
}
}
}
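For context, this is roughly how I compute and compare loo in R (a minimal sketch; fit_true and fit_wrong are placeholder names for the rstan fits of the true and wrong model):

library(rstan)
library(loo)

# "log_lik" is the per-subject log-likelihood from the generated quantities block
ll_true  <- extract_log_lik(fit_true,  parameter_name = "log_lik", merge_chains = FALSE)
ll_wrong <- extract_log_lik(fit_wrong, parameter_name = "log_lik", merge_chains = FALSE)

# relative effective sample sizes, needed for the PSIS diagnostics
r_eff_true  <- relative_eff(exp(ll_true))
r_eff_wrong <- relative_eff(exp(ll_wrong))

loo_true  <- loo(ll_true,  r_eff = r_eff_true)
loo_wrong <- loo(ll_wrong, r_eff = r_eff_wrong)

print(loo_true)
loo_compare(loo_true, loo_wrong)

This is the output I get for the true model: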
Computed from 12000 by 20 log-likelihood matrix

         Estimate    SE
elpd_loo  -1875.5  83.0
p_loo        32.6   2.0
looic      3751.0 166.0
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                        Count Pct.   Min. n_eff
(-Inf, 0.5]  (good)      0     0.0%  <NA>
 (0.5, 0.7]  (ok)        0     0.0%  <NA>
   (0.7, 1]  (bad)      16    80.0%  21
   (1, Inf)  (very bad)  4    20.0%  9

See help('pareto-k-diagnostic') for details.