Dear @caesoma ,

Thank you very much for your reply. To be honest, I also think itâ€™s weird that shorter warmup perform better than longer warmup.

I would like to provide more details and would be nice to have your professional suggestions.

I am fitting a Reinforcement learning model to my data, as I would like to have trial-by-trial model regressors. However, there are 3 things should be noted:

- I have many missing data (nearly 1/3), but I cannot remove them, because in my experiment the trial sequence is meaningful.
- My DV is skin conductance reaction, which can only take positive values.
- My model is hierarchical, meaning that I include hyper parameter priors at the participant-level.

Following is my Stan code:

```
data {
int<lower=0> N_obs; // number of observed data points
int<lower=0> N_mis; // number of missing data points
array[N_obs]int<lower=1, upper=N_obs + N_mis> ii_obs; // indexes of the observed data
array[N_mis]int<lower=1, upper=N_obs + N_mis> ii_mis; // indexes of the missing data
int<lower=1> S; // number of participants
int<lower=1> Tri; // maximum number of trials per participant
array[S]int<lower=1, upper=Tri> subj_trial; // an array in which specify the trial numbers per subject
array[S, Tri]int<lower=1, upper=2> CS; // CS per trial for each subject
array[S, Tri]int<lower=0, upper=1> US; // US per trial for each subject
array[N_obs] real<lower=0> SCR_obs; // observed SCR data
}
transformed data{
int<lower=0> N_all = N_obs + N_mis;
vector[2] init;
init = rep_vector(0.5, 2);
}
parameters {
// missing data
array[N_mis] real<lower=0> SCR_mis; // missing SCR data
// Hierarchical mu parameters (at the population-level)
real mu_b0; // intercept
real mu_b1; // slope
real mu_sigma; // sigma
real mu_kappa; // scaling parameter for updating the learning rate trial by trial
real mu_eta; // learning rate
// sigma parameters
real<lower=0.001> sigma_b0; // from the hierarchical distribution of intercept
real<lower=0.001> sigma_b1; // from the hierarchical distribution of slope
real<lower=0.001> sigma_par; // from the hierarchical distribution of sigma
real<lower=0.001> sigma_kappa; // for kappa
real<lower=0.001> sigma_eta; // for the learning rate
// Normal distributions for non-center parameterization, i.e., Matt trick
real z_b0[S];
real z_b1[S];
real z_sigma[S];
real z_kappa[S];
real z_eta[S];
}
transformed parameters {
// data
array[N_all] real<lower=0> SCR;
SCR[ii_obs] = SCR_obs;
SCR[ii_mis] = SCR_mis;
// participant-level parameters
real b0[S];
real b1[S];
real<lower=0> sigma[S];
real<lower=0, upper=1> kappa[S];
real<lower=0, upper=1> eta[S];
for (s in 1:S) {
b0[s] = mu_b0 + z_b0[s]*sigma_b0;
b1[s] = mu_b1 + z_b1[s]*sigma_b1;
sigma[s] = exp(mu_sigma + z_sigma[s]*sigma_par);
kappa[s] = Phi_approx(mu_kappa + z_kappa[s]*sigma_kappa);
eta[s] = Phi_approx(mu_eta + z_eta[s]*sigma_eta);
}
}
model {
SCR_mis ~ gamma(2, 2);
mu_b0 ~ normal(0, 2);
mu_b1 ~ normal(0, 2);
mu_sigma ~ normal(0, 2);
mu_kappa ~ normal(0, 2);
mu_eta ~ normal(0, 2);
sigma_b0 ~ gamma(2, 2);
sigma_b1 ~ gamma(2, 2);
sigma_par ~ gamma(2, 2);
sigma_kappa ~ gamma(2, 2);
sigma_eta ~ gamma(2, 2);
z_b0 ~ normal(0, 1);
z_b1 ~ normal(0, 1);
z_sigma ~ normal(0, 1);
z_kappa ~ normal(0, 1);
z_eta ~ normal(0, 1);
/* trial-by-trial update */
for (s in 1:S) {
vector[2] q; // initial q values for CS+/CS-
q = init;
real eta_updated; // local variable to store the updated value of eta[s]
eta_updated = eta[s]; // Initialize the updated_eta with the original eta[s]
for (t in 1:subj_trial[s]) {
int row_number; // current row number, cause we are doing trial-by-trial update
row_number = sum(subj_trial[1:s-1]) + t;
target += normal_lpdf(SCR[row_number] | b0[s] + b1[s]*eta_updated, sigma[s]);
eta_updated = kappa[s]*abs(q[CS[s, t]] - US[s, t]) + (1 - kappa[s])*eta_updated;
q[CS[s, t]] += eta_updated*(US[s, t] - q[CS[s, t]]);
}
}
}
generated quantities {
// hierarchical parameter
real<lower=0, upper=1> transf_mu_kappa = Phi_approx(mu_kappa);
}
```

I actually have no idea whatâ€™s going wrong, I think there might be some issue with my Stan code that seriously hinder the model fit or sampling efficiency. And I am very much appreciated if you have any suggestions.

Best