Interpret Pareto k diagnostic

I am new to Stan. I made 3 different Stan models, fit each to the same dataset of 1380 observations, and ranked them using loo::loo_compare(), but I am struggling with some of the diagnostics.

For two of the models I get a small percentage of Pareto k values that are not good (bad or very bad). I have read other posts on the matter and some documentation (LOO package glossary — loo-glossary • loo), but the interpretation seems very context/model specific from what I have seen/understood.
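
For reference, a minimal sketch of how the flagged observations can be listed (assuming loo_1, loo_2 and loo_3 are the objects returned by loo::loo() for each model, and using the usual 0.7 threshold):

library(loo)

# Counts and proportions of good / ok / bad / very bad k values
pareto_k_table(loo_2)

# Indices of the observations whose Pareto k exceeds 0.7
high_k_idx <- pareto_k_ids(loo_2, threshold = 0.7)
high_k_idx

# All k values plotted against observation index
plot(loo_2, diagnostic = "k")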

Can the models with bad/very bad Pareto k values still be used and compared to other models, or do they absolutely need to be changed?

All three models are basically the same model with increasing complexity (adding some parameters).

Model information:

Model 1 (2 parameters); loo output: loo_1

Model 2 (157 parameters); loo output: loo_2

Model 3 (311 parameters); loo output: loo_3

Here is the Stan code for model 3 as an example:

data { 
  // Size integer
  int<lower = 1> n;
  int<lower = 1> n_unique; // 154 unique groups
  // Vector data
  vector[n] bf;
  vector[n] bp;
  vector[n] ap;
  int unique_id[n];
}

transformed data {
  vector[n] log_bf = log(bf);
  vector[n] log_bp = log(bp);
  vector[n] log_ap = log(ap);
}

parameters {
  real mu_alp;  // 1 parameter
  vector[n_unique] alp; // 154 unique alp, 154 parameters
  real<lower = 0> sd_alp; // 1 parameter
  real<lower = 0> sigma;  // 1 parameter
  vector[n_unique] h_j; // 154 unique h_j, 154 parameters
  // total of 311 parameters
}

model {
  vector[n] mu;
  vector[n] unique_factor;

  // Priors:
  mu_alp ~ normal(-5,2);
  alp ~ normal(mu_alp, sd_alp);
  sd_alp ~ exponential(3);
  sigma ~ exponential(5);
  h_j ~ normal(-4,2);

  // Computing 

   unique_factor = alp[unique_id] + log_ap;
   
   mu = unique_factor + log_bp - log1p_exp(h_j[unique_id] + unique_factor + log_bp);

  log_bf ~ normal(mu, sigma);

}

generated quantities {
    vector[n] unique_factor;
    vector[n] log_bf_hat;
    vector[n] log_lik;
    vector[n] y_rep;
    real<lower = 0, upper = 1> Rsq_3;


  unique_factor = alp[unique_id] + log_ap;

  log_bf_hat = unique_factor + log_bp - log1p_exp(h_j[unique_id] + unique_factor + log_bp);
      
  for (i in 1:n) {

    log_lik[i] = normal_lpdf(log_bf[i] | log_bf_hat[i], sigma);

    y_rep[i] = normal_rng(log_bf_hat[i], sigma);

  }

  Rsq_3 = variance(log_bf_hat) / (variance(log_bf_hat) + square(sigma));

}
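
In case it matters, here is a sketch of how the log_lik matrix above is turned into the loo objects that go into loo_compare() (this assumes the models were fit with rstan and the fit for model 3 is called fit_3; the other two are analogous):

library(loo)

# Pointwise log-likelihood from the generated quantities block,
# kept as an iterations x chains x observations array
log_lik_3 <- extract_log_lik(fit_3, parameter_name = "log_lik", merge_chains = FALSE)

# Relative effective sample sizes, used by PSIS for the k diagnostics
r_eff_3 <- relative_eff(exp(log_lik_3))

loo_3 <- loo(log_lik_3, r_eff = r_eff_3)
print(loo_3)

# After doing the same for models 1 and 2:
loo_compare(loo_1, loo_2, loo_3)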

@avehtari just posted his Cross-validation FAQ, which contains a section on what to do if you have too many high Pareto k values.

The problem is that the approximation behind LOO is poor when the Pareto k values are high. The FAQ gives some possible actions, but you can always just do the cross-validation manually (though brute-force leave-one-out is usually prohibitively expensive).

Thank you for your reply @Bob_Carpenter !

I had seen the recommendations on what to do when there are too many high Pareto k values in other documentation, and the advice to look into p_loo for further information on the underlying problem. I had not seen the loo moment matching though, and am now looking into it!
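
If it helps anyone else reading this, here is a sketch of what I understand the call to look like (assuming an rstan fit object fit_3; if I understand the docs correctly, rstan's loo() method has a moment_match argument so the problematic observations are corrected without refitting):

# PSIS-LOO recomputed with moment matching for the high Pareto k observations
loo_3_mm <- rstan::loo(fit_3, pars = "log_lik", moment_match = TRUE)

# Check whether the bad / very bad counts have gone down
loo::pareto_k_table(loo_3_mm)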

Something else I have not found is how many high Pareto k values is too many. Is there any rule of thumb, perhaps something related to the number of observations? Is even one high Pareto k value too many, meaning the model needs reworking?

I can’t seem to find anything regarding an actual threshold for the number of high Pareto k values.

Thank you again!

As often, the answer is that it depends. As discussed in the FAQ, you may get high Pareto k values even when the model is correct; in that case the model doesn’t need reworking, but a different CV computation may be needed (as in the Roaches cross-validation demo). Also, since high Pareto k values indicate over-optimism, if a model with many high k values is already much worse than another model, there is no need to improve the computation for the worse model (e.g. the binomial model has many high k values, but it is so much worse, and also rejected by PPC, that there is no need to improve its CV computation). When high k values are due to a misspecified model or outliers, it depends on the use case how important it is to improve the model, as sometimes the conclusions would not change with a better model. If in doubt, just use K-fold-CV.
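
For reference, a rough sketch of what a manual K-fold-CV could look like with the loo helpers (the fold assignment keeps all observations from the same unique_id in the same fold; stan_model_3, make_training_data() and the "log_lik_heldout" generated quantity are hypothetical placeholders for the refitting step):

library(loo)
library(rstan)

K <- 10
n_draws <- 4000   # total posterior draws per refit (assumption)

# Assign the 154 groups to K folds so that a whole group is held out at once
fold <- kfold_split_grouped(K = K, x = unique_id)

log_lik_heldout <- matrix(NA, nrow = n_draws, ncol = n)

for (k in 1:K) {
  holdout <- fold == k
  # Refit on the training folds only (data preparation not shown), with a
  # generated quantities block that computes log_lik for the held-out rows
  fit_k <- sampling(stan_model_3, data = make_training_data(!holdout))
  log_lik_heldout[, holdout] <- extract_log_lik(fit_k, "log_lik_heldout")
}

# Pointwise K-fold elpd, which can then be compared with the PSIS-LOO results
elpd_kfold <- elpd(log_lik_heldout)
print(elpd_kfold)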