Model failed to converge when using a mixture distribution combing Wiener and Uniform distribution

Hi all,

I am fitting a Hierarchical DDM using a mixture distribution by combing both Wiener and Uniform distributions. So the raw Stan script is from this threadhttps://discourse.mc-stan.org/t/unable-to-recover-mixture-between-wiener-diffusion-and-uniform-distribution/5612

Following is my adjusted Stan script and this script is syntactically correct:

functions { 
   real wiener_diffusion2_lpdf(real y, real drift, real boundary, 
                            real ndt, real bias, real lambda, 
                            int dec, real min_rt, real max_rt) {
  if (y < ndt) {
    return(log(lambda) + uniform_lpdf(y | min_rt, max_rt));
  } else {
    if (dec == 1) {
      return log_mix(lambda, 
        uniform_lpdf(y | min_rt, max_rt),
        wiener_lpdf(y | boundary, ndt, bias, drift)
      );
    } else {
      return log_mix(lambda, 
        uniform_lpdf(y | min_rt, max_rt),
        wiener_lpdf(y | boundary, ndt, 1 - bias, - drift)
      );
    }
  }
}
} 

data {
    int<lower=1> N_obs;       // Number of trial-level observations
    int<lower=1> nparts;      // Number of participants
    real y[N_obs];    // rt
    int<lower=1> participant[N_obs]; // participant index for each trial
    int<lower=0, upper=1> accuracy[N_obs]; // decisions (correct = 1, incorrect = 0)
    real<lower=0> minRT[nparts];           // minimum rt per participant
    real<lower=0> maxRT[nparts];           // maximum rt per participant
  
}

transformed data { 

} 

parameters {
    
    /* sigma paameter*/
    real<lower=0> deltasd;      // Between-participant variability in drift rate 
    real<lower=0> alphasd;      // Between-participant variability in boundary    
    real<lower=0> ressd;        // Between-participant variability in non-decision time
    real<lower=0> lamsd;        // Between-participant variability in lambda
    
    /* Hierarchical mu parameter*/                               
    real deltahier;  // Hierarchical drift rate
    real alphahier;             // Hierarchical boundary
    real reshier;               // Hierarchical Non-decision time
    real lamhier; // Hierarchical lambda
    
    /* participant-level main paameter*/
    vector[nparts] delta; // drift rate for each participant
    vector<lower=0>[nparts] alpha;         // Boundary for each participant
    vector<lower=0>[nparts] res;          // Non-decision time for each participant
    vector<lower=0, upper=1>[nparts] lam; // lambda for each participant
}

model {
    /* sigma paameter*/
    deltasd ~ gamma(1,1); 
    alphasd ~ gamma(1,1); 
    ressd ~ gamma(.1,1);    
    lamsd ~ gamma(.1,1);


    /* Hierarchical mu paameter*/                               

    deltahier ~ normal(2, 4); // drfit rate
    alphahier ~ normal(1, 2); // boundary 
    reshier ~ normal(.2,.4); // ndt
    lamhier ~ normal(.02, .01); //lambda
    
  
    /* participant-level main paameter*/
    for (p in 1:nparts) {
        
        delta[p] ~ normal(deltahier, deltasd); // drift rate
        alpha[p] ~ normal(alphahier, alphasd); // boundary
        res[p] ~ normal(reshier, ressd);      // ndt
        lam[p] ~ normal(lamhier, lamsd);      // lambda

    }
    
    // Wiener likelihood
    for (i in 1:N_obs) { 

        // Log density for DDM process
        target += wiener_diffusion2_lpdf(y[i] | delta[participant[i]], alpha[participant[i]], 
                            res[participant[i]], .5, lam[participant[i]], 
                            accuracy[i], minRT[participant[i]], maxRT[participant[i]]);
    } 
}

generated quantities { 
   vector[N_obs] log_lik; 
 
   // Wiener likelihood
    for (i in 1:N_obs) {
        // Log density for DDM process
         log_lik[i] = wiener_diffusion2_lpdf(y[i] | delta[participant[i]], alpha[participant[i]], 
                            res[participant[i]], .5, lam[participant[i]], 
                            accuracy[i], minRT[participant[i]], maxRT[participant[i]]);
    }
}

So I fitted this model with 4 chains, 1000 warmup and 2000 iteration/chain. But there are 4000 divergent transitions and the largerest Rhat value is like 60!!!.



Might anyone know what’s wrong? Any suggestions will be highly appreciated.