One high Pareto k value

Dear community,

I have a hierarchical DDM model with the following parameters:

  • One drift rate per stimulus, subject, and experimental condition
  • One boundary separation per subject and experimental condition
  • One starting point per subject and experimental condition
  • One non-decision time per subject and experimental condition

This is the only version of this model that doesn’t give any convergence problems and that has good parameter recovery.
I wanted to do cross-validation to make sure the model is not overfitting, so I used the loo package; however, I’m not completely sure what to make of the output:

[loo output: one observation with a high Pareto k; all other diagnostics look good]
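In case it’s useful, this is roughly how I computed it (a sketch; fit is my stanfit object and log_lik comes from the generated quantities block below):

library(rstan)
library(loo)

# Pointwise log-likelihood draws (iterations x chains x N)
log_lik <- extract_log_lik(fit, parameter_name = "log_lik",
                           merge_chains = FALSE)
# Relative effective sample sizes, used by the PSIS diagnostics
r_eff <- relative_eff(exp(log_lik))
loo_res <- loo(log_lik, r_eff = r_eff)
print(loo_res)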

My questions are:

  • Does that one high value for one observation indicate I can’t trust my posteriors?
  • Where do I go from here?
  • Are there better ways to run cross-validation in RStan, given that I can’t do posterior predictive checks because wiener_rng is not implemented yet?
  • Is there anything in the code that shows a gross mistake?

Here’s the model:


data {
  int<lower=1> N; // number of trials
  int<lower=1> S; // number of subjects
  vector<lower=0>[N] RT; // response times
  array[N] int<lower=0, upper=1> response; // choice made in that trial (1 for high reward)
  int<lower=1> K; // number of stimuli
  int<lower=1> C; // number of conditions
  matrix<lower=0>[C, S] min_rt; // minimum RT per condition and subject, upper bound for tau
  array[N] int<lower=1, upper=S> subject; // subject in that trial
  array[N] int<lower=1, upper=K> stimulus; // stimulus in that trial
  array[N] int<lower=1, upper=C> condition; // condition in that trial
}

transformed data { // fixed condition indices, to allow group comparison
  int CONDITION_VEHICLE = 1;
  int CONDITION_DRUG = 2;
}


parameters {
  
  array[C, S, K] real delta; // Drift rate per condition, subject, and stimulus
  matrix[C, K] d_mu;   // group level mean drift rate 
  matrix<lower=0>[C, K] d_sig; // variation of drift rates on the group level
  
  matrix<lower=0>[C, S] alpha;  // Distance betwen boundaries, per condition and subject
  vector<lower=0>[C] a_mu; // group level mean distance between boundaries
  vector<lower=0>[C] a_sig; // variation of distance between boundaries on the group level
  
  matrix<lower=0, upper=1>[C, S] beta;  // Starting bias
  vector<lower=0>[C] b_mu; // group level mean starting bias
  vector<lower=0>[C] b_sig; // variation of starting bias on the group level
  
  matrix<lower=0, upper=1>[C, S] tau_raw; // non-decision time as a fraction of min_rt
  
}


transformed parameters{
  matrix[C, S] tau = tau_raw .* min_rt; //transformation of tau into range
}


model {
  
  for (c in 1:C){
    for(s in 1:S){
      tau_raw[c, s] ~ beta(1, 1); //tau prior
    }}
  
  for (c in 1:C) {
    for(s in 1:S){
      for(k in 1:K){
        delta[c, s, k] ~ normal(d_mu[c,k],d_sig[c,k]); //delta prior
      }
    }}
  
  for (c in 1:C){
    for(s in 1:S){
      alpha[c, s] ~ normal(a_mu[c],a_sig[c])T[0,]; //alpha prior
    }}
    
  for (c in 1:C){
    for(s in 1:S){
      beta[c, s] ~ normal(b_mu[c],b_sig[c])T[0, 1]; //beta prior
    }}
    
  for (c in 1:C){ //delta mean and variation priors
    for(k in 1:K){
      d_mu[c, k] ~ normal(0, 1);
      d_sig[c, k] ~ inv_gamma(4, 3);
    }}
    
  
  for (c in 1:C) { //alpha mean and variation priors
    a_mu[c] ~ inv_gamma(4, 3);
    a_sig[c] ~ inv_gamma(4, 3);
  }
  
  for (c in 1:C) { //beta mean and variation priors
    b_mu[c] ~ inv_gamma(4, 3);
    b_sig[c] ~ inv_gamma(4, 3);
  }
  
  for (t in 1:N) {
    int c = condition[t]; // condition in that trial
    int s = subject[t];   // subject in that trial
    int k = stimulus[t];  // stimulus in that trial
    
    if (response[t] == 1) {
      RT[t] ~ wiener(
        alpha[c, s],
        tau[c, s],
        beta[c, s],
        delta[c, s, k]
      );
    } else {
      RT[t] ~ wiener(
        alpha[c, s],
        tau[c, s],
        1 - beta[c, s],
        -delta[c, s, k]
      );
    }
  }
}

generated quantities{
  
  array[C, K] real group_condition_delta;
  
  for (c in 1:C) {
    for (k in 1:K) {
      group_condition_delta[c, k] = d_mu[c, k];
    }
  }
  
  array[K] real drug_delta_effect; 
  for (k in 1:K){
    drug_delta_effect[k] = group_condition_delta[CONDITION_VEHICLE, k] - group_condition_delta[CONDITION_DRUG, k];
  }
  
  
  //Log likelihood calculation
  
  vector[N] log_lik;
  
  for (n in 1:N) {
      int c = condition[n];
      int s = subject[n];
      int k = stimulus[n];
    
      if(response[n]==1){
        log_lik[n] = wiener_lpdf(RT[n] | alpha[c, s], tau[c, s], beta[c, s], delta[c, s, k]);
     } else {
        log_lik[n] = wiener_lpdf(RT[n] | alpha[c, s], tau[c, s], 1-beta[c, s], -delta[c, s, k]);
    }
  }
}

I am aware that truncated distributions and matrices aren’t exactly best practice, but as I said above, this is the only version of the model that doesn’t give convergence problems.

Thank you for your help!!

Have a look at using LOO with moment-matching: Avoiding model refits in leave-one-out cross-validation with moment matching
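With rstan this can be done directly from the fitted object; a minimal sketch, assuming your stanfit object is called fit, your rstan version is recent enough to support moment matching (>= 2.21), and log_lik is kept in generated quantities:

library(rstan)

# The loo method for stanfit objects does the moment matching internally
loo_mm <- loo(fit, pars = "log_lik", moment_match = TRUE)
print(loo_mm)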

Thank you for your response! I have tried that, but the result is the same:
[loo output with moment matching: essentially unchanged, still one observation with a high Pareto k]

I’m guessing that means I shouldn’t trust my posterior, because there’s one observation that’s too influential?

I’m not sure where to go from here; all the other diagnostics look good, and so does parameter recovery.

Thank you again for your time.

No. That one high k value indicates that if you leave out the corresponding observation, the posterior changes so much that (Pareto smoothed) importance sampling may have a large error for the corresponding pointwise LOO-CV elpd. The high influence of that observation may be because it is a data entry error, or because it is an unlikely observation just by chance, or because the model is flexible (e.g. in a hierarchical model it is the only observation in a group), or because the model is misspecified. In all these cases, the posterior inference itself may still have been perfect, and the posterior can be trusted for parameter inference.

You could check whether that one observation has sensible values.
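For example, a sketch assuming your loo result is called loo_res and the data vectors you passed to Stan are still in your R session:

library(loo)

# Indices of the observations whose Pareto k exceeds the threshold
bad <- pareto_k_ids(loo_res, threshold = 0.7)
# Inspect the raw data for those trials
data.frame(RT = RT[bad], response = response[bad], subject = subject[bad],
           stimulus = stimulus[bad], condition = condition[bad])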

That high khat value indicates that the pointwise elpd_loo for that observation may have a high error, which could affect model comparison if the compared models have similar predictive performance. Based on the available information, though, there doesn’t seem to be bad overfitting.

Given the model, you could have many parameters (more than C × S × K of them), but as p_loo is about 26 and you have 646 observations, I assume either C × S × K is not big or the priors are good enough to avoid favoring overfitting.

You could take the posterior draws from Stan and use the RWiener package in R to sample from the predictive distribution.
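A rough sketch of that idea (the cell indices, the number of draws, and the object name fit are all placeholders):

library(rstan)
library(RWiener)

draws <- rstan::extract(fit) # posterior draws as arrays
cond <- 1; subj <- 1; stim <- 1 # pick one condition/subject/stimulus cell

# One predictive RT and choice per posterior draw (first 200 draws here);
# rwiener() returns a data frame with columns q (RT) and resp (boundary hit)
pp <- do.call(rbind, lapply(1:200, function(i) {
  rwiener(n = 1,
          alpha = draws$alpha[i, cond, subj],
          tau   = draws$tau[i, cond, subj],
          beta  = draws$beta[i, cond, subj],
          delta = draws$delta[i, cond, subj, stim])
}))

Since the model treats response == 1 as the upper boundary, resp == "upper" in the simulated data corresponds to response == 1, so you can compare the simulated RT distributions for each boundary against the observed ones.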
