The group-level parameters in the hierarchical model are estimated to be much smaller than the simulated true values

The task: there are 3 picture stimuli with different underlying ratings. On each trial, 2 of them are presented as a pair and the subject chooses one; choice[nSubjects,nTrials] holds the index of the chosen stimulus and nchoice[nSubjects,nTrials] the unchosen one. Feedback is 1 if the subject chose the higher-ranked stimulus and 0 otherwise (reward[nSubjects,nTrials]).
To simulate the subjects, alpha and gamma were drawn from a normal distribution with mean 0.4 and standard deviation 0.1, and tau from a normal distribution with mean 2 and standard deviation 1. Under the no-pooling model, the per-subject estimates do not differ too much from the true values (for alpha the largest discrepancy is about 0.2, and most estimates are off by roughly 0.05). Under the hierarchical model, however, the estimates for the 10 subjects are nearly identical to one another, and the group-level parameters come out very small compared to the true values (alpha_mu = 0.001, alpha_sd = 0.002; gamma_mu = 0.006, gamma_sd = 0.01; tau_mu = 0.0004, tau_sd = 1.19).
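For reference, the true subject-level parameters were drawn roughly like this (a minimal R sketch; the seed and object names are only illustrative, not my actual simulation script):

set.seed(1)                                            # illustrative seed
nSubjects  <- 10
alpha_true <- rnorm(nSubjects, mean = 0.4, sd = 0.1)   # learning rates
gamma_true <- rnorm(nSubjects, mean = 0.4, sd = 0.1)   # discount factors
tau_true   <- rnorm(nSubjects, mean = 2,   sd = 1)     # softmax inverse temperatures

The hierarchical Stan model is below.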

data {
  int<lower=1> nTrials;
  int<lower=1> nSubjects;
  int<lower=1,upper=3> choice[nSubjects,nTrials];
  int<lower=1,upper=3> nchoice[nSubjects,nTrials];
  int<lower=0,upper=1> reward[nSubjects,nTrials];
  }
transformed data {
  vector[3] initv;  // initial values for V
  initv = rep_vector(0.0, 3);
  }
parameters {
  // group parameters
  
  // alpha
  real <lower=0,upper=1> alpha_mu;
  real <lower=0> alpha_sd;
  
  // gamma
  real <lower=0,upper=1> gamma_mu;
  real <lower=0> gamma_sd;

  // tau
  real <lower=0,upper=3> tau_mu;
  real <lower=0> tau_sd;

  
  // per-subject deviations (non-centered parameterization)
  vector <lower=0,upper=1>[nSubjects] alpha_raw;
  vector <lower=0,upper=1>[nSubjects] gamma_raw;
  vector <lower=0,upper=3>[nSubjects] tau_raw;

  }

transformed parameters {  
  vector <lower=0,upper=1>[nSubjects] alpha;
  vector <lower=0,upper=1>[nSubjects] gamma;
  vector <lower=0,upper=3>[nSubjects] tau;

  
  alpha = Phi_approx(alpha_mu + alpha_sd * alpha_raw);  // maps onto (0,1)
  gamma = Phi_approx(gamma_mu + gamma_sd * gamma_raw);  // maps onto (0,1)
  tau   = Phi_approx(tau_mu + tau_sd * tau_raw) * 3;    // maps onto (0,3)
}


model {
  // priors on the group-level scales
  alpha_sd ~ cauchy(0, 1);
  gamma_sd ~ cauchy(0, 1);
  tau_sd ~ cauchy(0, 3);

  // priors on the per-subject deviations (truncated by their declared bounds)
  alpha_raw ~ normal(0, 1);
  gamma_raw ~ normal(0, 1);
  tau_raw ~ normal(0, 1);

  
  for (s in 1:nSubjects){
    vector[3] q = initv;   // stimulus values, reset for each subject
    vector[2] stim;        // values of the chosen and unchosen stimuli on this trial
    for (t in 1:nTrials){
      stim[1] = q[choice[s,t]];    // value of the chosen stimulus
      stim[2] = q[nchoice[s,t]];   // value of the unchosen stimulus
      1 ~ categorical_logit(tau[s] * stim);  // the chosen stimulus is category 1
      if (t < nTrials){

        q[choice[s,t]] = q[choice[s,t]] + alpha[s]*(reward[s,t]+
        gamma[s]*fmax(q[choice[s,t+1]],q[nchoice[s,t+1]])-q[choice[s,t]]);
          
        q[nchoice[s,t]] = q[nchoice[s,t]] + alpha[s]*(1-reward[s,t]+
        gamma[s]*fmax(q[choice[s,t+1]],q[nchoice[s,t+1]])-q[nchoice[s,t]]);
        
      } else {

        q[choice[s,t]] = q[choice[s,t]] + alpha[s]*(reward[s,t]-q[choice[s,t]]);
          
        q[nchoice[s,t]] = q[nchoice[s,t]] + alpha[s]*(1-reward[s,t]-q[nchoice[s,t]]);
          

        }
        
      }
    }
  }
generated quantities {
  real log_lik[nSubjects];
  int y_pred[nSubjects, nTrials];
  y_pred = rep_array(-999,nSubjects ,nTrials);
  
  for (s in 1:nSubjects){
    vector[3] q = initv;
    vector[2] stim;
    log_lik[s] = 0;
    for (t in 1:nTrials){
      stim[1]=q[choice[s,t]];
      stim[2]=q[nchoice[s,t]];
      
      log_lik[s]=log_lik[s]+categorical_logit_lpmf(1 |tau[s]*stim);
      y_pred[s,t] = categorical_logit_rng( tau[s] * stim ); 

      if (t < nTrials){

        q[choice[s,t]] = q[choice[s,t]] + alpha[s]*(reward[s,t]+
        gamma[s]*fmax(q[choice[s,t+1]],q[nchoice[s,t+1]])-q[choice[s,t]]);
          
        q[nchoice[s,t]] = q[nchoice[s,t]] + alpha[s]*(1-reward[s,t]+
        gamma[s]*fmax(q[choice[s,t+1]],q[nchoice[s,t+1]])-q[nchoice[s,t]]);
        
      } else {

        q[choice[s,t]] = q[choice[s,t]] + alpha[s]*(reward[s,t]-q[choice[s,t]]);
      
        q[nchoice[s,t]] = q[nchoice[s,t]] + alpha[s]*(1-reward[s,t]-q[nchoice[s,t]]);
          
        }
       }
      }
  }
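
To spell out what the transformed parameters block is doing for alpha, here is the same mapping written out in R (just a sketch: pnorm() stands in for Phi_approx(), the values of alpha_mu and alpha_sd are purely illustrative, and I ignore the bounds declared on alpha_raw):

nSubjects <- 10
alpha_mu  <- 0.4                       # illustrative group-level location (probit scale)
alpha_sd  <- 0.1                       # illustrative group-level scale
alpha_raw <- rnorm(nSubjects)          # standardized per-subject deviations
alpha     <- pnorm(alpha_mu + alpha_sd * alpha_raw)   # subject-level values in (0, 1)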

I don’t know what went wrong.

Can you show Stan code for a model that yields approximately correct inference, and then the code for the hierarchical model with comments indicating where the changes are?

Below is my no-pooling model; the only difference from the hierarchical model is that the parameters of the 10 subjects are independent. In the hierarchical model I added the group-level distribution of the individual parameters and used a non-centered parameterization (see the transformed parameters block). The true values for these 10 subjects are alpha = c(0.4283638, 0.4581328, 0.4358208, 0.3743409, 0.4110630, 0.3013664, 0.4923545, 0.3451772, 0.4093865, 0.4410609) and gamma = c(0.3974123, 0.3765190, 0.3807021, 0.4300113, 0.4758382, 0.4151602, 0.3841490, 0.3743457, 0.3740947, 0.4368509). The Stan code for the no-pooling model is as follows:

data {
  int<lower=1> nTrials;
  int<lower=1> nSubjects;
  int<lower=1,upper=3> nchoice[nSubjects,nTrials];
  int<lower=1,upper=3> choice[nSubjects,nTrials];
  int<lower=0,upper=1> reward[nSubjects,nTrials];
  }
transformed data {
  vector[3] initv;  // initial values for V
  initv = rep_vector(0.0, 3);
  }
parameters {

  
  vector <lower=0,upper=1>[nSubjects] alpha;
  vector <lower=0,upper=1>[nSubjects] gamma;
  vector <lower=0,upper=3>[nSubjects] tau;

  }




model {

  
  for (s in 1:nSubjects){
    vector[3] q = initv;
    vector[2] stim;
    for (t in 1:nTrials){
      stim[1]=q[choice[s,t]];
      stim[2]=q[nchoice[s,t]];
      1 ~ categorical_logit(tau[s]*stim);

      
      if (t < nTrials){
          q[choice[s,t]] = q[choice[s,t]] + alpha[s]*(reward[s,t]+
          gamma[s]*fmax(q[choice[s,t+1]],q[nchoice[s,t+1]])-q[choice[s,t]]);
          q[nchoice[s,t]] = q[nchoice[s,t]] + alpha[s]*(1-reward[s,t]+
          gamma[s]*fmax(q[choice[s,t+1]],q[nchoice[s,t+1]])-q[nchoice[s,t]]);
          
      } else {

          q[choice[s,t]] = q[choice[s,t]] + alpha[s]*(reward[s,t]-q[choice[s,t]]);
          q[nchoice[s,t]] = q[nchoice[s,t]] + alpha[s]*(1-reward[s,t]-q[nchoice[s,t]]);
          
        }
      }
    }
  }
generated quantities {
  real log_lik[nSubjects];
  int y_pred[nSubjects, nTrials];
  y_pred = rep_array(-999,nSubjects ,nTrials);
  
  for (s in 1:nSubjects){
    vector[3] q = initv;
    vector[2] stim;
    log_lik[s] = 0;
    for (t in 1:nTrials){
      stim[1]=q[choice[s,t]];
      stim[2]=q[nchoice[s,t]];
      
      log_lik[s]=log_lik[s]+categorical_logit_lpmf(1|tau[s]*stim);
      y_pred[s,t] = categorical_logit_rng( tau[s] * stim ); 

      if (t < nTrials){
          q[choice[s,t]] = q[choice[s,t]] + alpha[s]*(reward[s,t]+
          gamma[s]*fmax(q[choice[s,t+1]],q[nchoice[s,t+1]])-q[choice[s,t]]);
          q[nchoice[s,t]] = q[nchoice[s,t]] + alpha[s]*(1-reward[s,t]+
          gamma[s]*fmax(q[choice[s,t+1]],q[nchoice[s,t+1]])-q[nchoice[s,t]]);

      } else {

          q[choice[s,t]] = q[choice[s,t]] + alpha[s]*(reward[s,t]-q[choice[s,t]]);
          
          q[nchoice[s,t]] = q[nchoice[s,t]] + alpha[s]*(1-reward[s,t]-q[nchoice[s,t]]);

       }
      }
  }
}

The no-pooling model estimates alpha = c(0.4640870, 0.4289302, 0.3975612, 0.3077883, 0.4066003, 0.4019929, 0.5971809, 0.5489378, 0.5638818, 0.4368755) and gamma = c(0.3358349, 0.3548909, 0.4872146, 0.4788712, 0.5287833, 0.3814734, 0.3771508, 0.2378647, 0.5590828, 0.2509559). Admittedly, the estimation error for some subjects' parameters is fairly large even here, but the really striking result is from the hierarchical model, where the group-level parameter estimates come out very small.
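
For concreteness, this is how I compare the no-pooling estimates against the true values (plain R; the *_true and *_hat names just refer to the vectors listed above):

alpha_true <- c(0.4283638, 0.4581328, 0.4358208, 0.3743409, 0.4110630,
                0.3013664, 0.4923545, 0.3451772, 0.4093865, 0.4410609)
alpha_hat  <- c(0.4640870, 0.4289302, 0.3975612, 0.3077883, 0.4066003,
                0.4019929, 0.5971809, 0.5489378, 0.5638818, 0.4368755)
gamma_true <- c(0.3974123, 0.3765190, 0.3807021, 0.4300113, 0.4758382,
                0.4151602, 0.3841490, 0.3743457, 0.3740947, 0.4368509)
gamma_hat  <- c(0.3358349, 0.3548909, 0.4872146, 0.4788712, 0.5287833,
                0.3814734, 0.3771508, 0.2378647, 0.5590828, 0.2509559)

round(abs(alpha_hat - alpha_true), 3)   # per-subject absolute error in alpha
round(abs(gamma_hat - gamma_true), 3)   # per-subject absolute error in gamma
max(abs(alpha_hat - alpha_true))        # largest alpha error, about 0.2 as noted above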