One high Pareto k value

Dear community,

I have a hierarchical DDM model with the following parameters:

  • One drift rate per stimulus, subject, and experimental condition
  • One boundary separation per subject and experimental condition
  • One starting point per subject and experimental condition
  • One non-decision time per subject and experimental condition

This is the only version of this model that doesn’t give any convergence problems and that has good parameter recovery.
I wanted to do cross-validation to make sure the model is not overfitting, so I used the loo package; however, I’m not completely sure what to make of the output:

[loo output: one observation with a high Pareto k; all other diagnostics look good]
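In case it’s useful, this is roughly how I computed it (a sketch; fit is my stanfit object and log_lik comes from the generated quantities block below):

library(rstan)
library(loo)

# Pointwise log-likelihood draws (iterations x chains x N)
log_lik <- extract_log_lik(fit, parameter_name = "log_lik",
                           merge_chains = FALSE)
# Relative effective sample sizes, used by the PSIS diagnostics
r_eff <- relative_eff(exp(log_lik))
loo_res <- loo(log_lik, r_eff = r_eff)
print(loo_res)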

My questions are:

  • Does that one high value for one observation indicate I can’t trust my posteriors?
  • Where do I go from here?
  • Are there better ways to run cross-validation in RStan, given that I can’t do posterior predictive checks because wiener_rng is not implemented yet?
  • Is there anything in the code that shows a gross mistake?

Here’s the model:


data {
  int<lower=1> N; // number of trials
  int<lower=1> S; // number of subjects
  vector<lower=0>[N] RT; // response times
  array[N] int<lower=0, upper=1> response; // choice made in that trial (1 for high reward)
  int<lower=1> K; // number of stimuli
  int<lower=1> C; // number of conditions
  matrix<lower=0>[C, S] min_rt; // minimum RT per condition and subject, upper bound for tau
  array[N] int<lower=1, upper=S> subject; // subject in that trial
  array[N] int<lower=1, upper=K> stimulus; // stimulus in that trial
  array[N] int<lower=1, upper=C> condition; // condition in that trial
}

transformed data { // fixed condition indices, to allow group comparison
  int CONDITION_VEHICLE = 1;
  int CONDITION_DRUG = 2;
}


parameters {
  
  array[C, S, K] real delta; // Drift rate per condition, subject, and stimulus
  matrix[C, K] d_mu;   // group level mean drift rate 
  matrix<lower=0>[C, K] d_sig; // variation of drift rates on the group level
  
  matrix<lower=0>[C, S] alpha;  // Distance betwen boundaries, per condition and subject
  vector<lower=0>[C] a_mu; // group level mean distance between boundaries
  vector<lower=0>[C] a_sig; // variation of distance between boundaries on the group level
  
  matrix<lower=0, upper=1>[C, S] beta;  // Starting bias
  vector<lower=0>[C] b_mu; // group level mean starting bias
  vector<lower=0>[C] b_sig; // variation of starting bias on the group level
  
  matrix<lower=0, upper=1>[C, S] tau_raw; // non-decision time as a fraction of min_rt
  
}


transformed parameters{
  matrix[C, S] tau = tau_raw .* min_rt; //transformation of tau into range
}


model {
  
  for (c in 1:C){
    for(s in 1:S){
      tau_raw[c, s] ~ beta(1, 1); //tau prior
    }}
  
  for (c in 1:C) {
    for(s in 1:S){
      for(k in 1:K){
        delta[c, s, k] ~ normal(d_mu[c,k],d_sig[c,k]); //delta prior
      }
    }}
  
  for (c in 1:C){
    for(s in 1:S){
      alpha[c, s] ~ normal(a_mu[c],a_sig[c])T[0,]; //alpha prior
    }}
    
  for (c in 1:C){
    for(s in 1:S){
      beta[c, s] ~ normal(b_mu[c],b_sig[c])T[0, 1]; //beta prior
    }}
    
  for (c in 1:C){ //delta mean and variation priors
    for(k in 1:K){
      d_mu[c, k] ~ normal(0, 1);
      d_sig[c, k] ~ inv_gamma(4, 3);
    }}
    
  
  for (c in 1:C) { //alpha mean and variation priors
    a_mu[c] ~ inv_gamma(4, 3);
    a_sig[c] ~ inv_gamma(4, 3);
  }
  
  for (c in 1:C) { //beta mean and variation priors
    b_mu[c] ~ inv_gamma(4, 3);
    b_sig[c] ~ inv_gamma(4, 3);
  }
  
  for (t in 1:N) {
    int c = condition[t]; // condition in that trial
    int s = subject[t];   // subject in that trial
    int k = stimulus[t];  // stimulus in that trial
    
    if (response[t] == 1) {
      RT[t] ~ wiener(
        alpha[c, s],
        tau[c, s],
        beta[c, s],
        delta[c, s, k]
      );
    } else {
      RT[t] ~ wiener(
        alpha[c, s],
        tau[c, s],
        1 - beta[c, s],
        -delta[c, s, k]
      );
    }
  }
}

generated quantities{
  
  array[C, K] real group_condition_delta;
  
  for (c in 1:C) {
    for (k in 1:K) {
      group_condition_delta[c, k] = d_mu[c, k];
    }
  }
  
  array[K] real drug_delta_effect; 
  for (k in 1:K){
    drug_delta_effect[k] = group_condition_delta[CONDITION_VEHICLE, k] - group_condition_delta[CONDITION_DRUG, k];
  }
  
  
  //Log likelihood calculation
  
  vector[N] log_lik;
  
  for (n in 1:N) {
      int c = condition[n];
      int s = subject[n];
      int k = stimulus[n];
    
      if(response[n]==1){
        log_lik[n] = wiener_lpdf(RT[n] | alpha[c, s], tau[c, s], beta[c, s], delta[c, s, k]);
     } else {
        log_lik[n] = wiener_lpdf(RT[n] | alpha[c, s], tau[c, s], 1-beta[c, s], -delta[c, s, k]);
    }
  }
}

I am aware that truncated distributions and matrices aren’t exactly best practice, but as I said above, this is the only version of the model that doesn’t give convergence problems.

Thank you for your help!!

Have a look at using LOO with moment-matching: Avoiding model refits in leave-one-out cross-validation with moment matching
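With rstan this can be done directly from the fitted object; a minimal sketch, assuming your stanfit object is called fit, your rstan version is recent enough to support moment matching (>= 2.21), and log_lik is kept in generated quantities:

library(rstan)

# The loo method for stanfit objects does the moment matching internally
loo_mm <- loo(fit, pars = "log_lik", moment_match = TRUE)
print(loo_mm)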

Thank you for your response! I have tried that, but the result is the same:
[loo output with moment matching: essentially unchanged, still one observation with a high Pareto k]

I’m guessing that means I shouldn’t trust my posterior, because there’s one observation that’s too influential?

I’m not sure where to go from here; all the other diagnostics look good, and so does parameter recovery.

Thank you again for your time.

No. That one high k value indicates that if you leave out the corresponding observation, the posterior changes so much that (Pareto smoothed) importance sampling may have a large error for the corresponding pointwise LOO-CV elpd. The high influence of that observation may be because it is a data entry error, or because it is an unlikely observation just by chance, or because the model is flexible (e.g. in a hierarchical model it is the only observation in a group), or because the model is misspecified. In all these cases, the posterior inference itself may still have been perfect, and the posterior can be trusted for parameter inference.

You could check whether that one observation has sensible values.
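For example, a sketch assuming your loo result is called loo_res and the data vectors you passed to Stan are still in your R session:

library(loo)

# Indices of the observations whose Pareto k exceeds the threshold
bad <- pareto_k_ids(loo_res, threshold = 0.7)
# Inspect the raw data for those trials
data.frame(RT = RT[bad], response = response[bad], subject = subject[bad],
           stimulus = stimulus[bad], condition = condition[bad])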

That high khat value indicates that the pointwise elpd_loo for that observation may have a high error, which could affect model comparison if the compared models have similar predictive performance. Based on the available information, though, there doesn’t seem to be bad overfitting.

Given the model, you could have many parameters (more than C × S × K of them), but as p_loo is about 26 and you have 646 observations, I assume either C × S × K is not big or the priors are good enough to avoid favoring overfitting.

You could take the posterior draws from Stan and use the RWiener package in R to sample from the predictive distribution.
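A rough sketch of that idea (the cell indices, the number of draws, and the object name fit are all placeholders):

library(rstan)
library(RWiener)

draws <- rstan::extract(fit) # posterior draws as arrays
cond <- 1; subj <- 1; stim <- 1 # pick one condition/subject/stimulus cell

# One predictive RT and choice per posterior draw (first 200 draws here);
# rwiener() returns a data frame with columns q (RT) and resp (boundary hit)
pp <- do.call(rbind, lapply(1:200, function(i) {
  rwiener(n = 1,
          alpha = draws$alpha[i, cond, subj],
          tau   = draws$tau[i, cond, subj],
          beta  = draws$beta[i, cond, subj],
          delta = draws$delta[i, cond, subj, stim])
}))

Since the model treats response == 1 as the upper boundary, resp == "upper" in the simulated data corresponds to response == 1, so you can compare the simulated RT distributions for each boundary against the observed ones.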
