Large Pareto K Values in Cognitive Models

Hello everyone,

I have a question about the reliability of Pareto-k values in a hierarchical Wiener diffusion model I implemented in brms: is it acceptable to allow, say, <1% of data points to have high Pareto-k values, given that RT data in cognitive modeling are prone to outlying/influential observations?

One idea I had to explain the large Pareto-k values is that the Wiener diffusion model itself is not a good model of the data (and thus the model is misspecified). However, I would like to keep the Wiener diffusion model because of the theoretical context behind it.

If this is not acceptable, would it be better to use heuristic arguments to justify my model design? K-fold CV is not an option, since a single run of the DDM took nearly 7 hours. Finally, I am confused about how to calculate the number of nominal parameters (to compare with p_loo) for a hierarchical Wiener diffusion model. Below are the results of my LOO-CV as well as the model specification.

Computed from 8000 by 9376 log-likelihood matrix.

         Estimate    SE
elpd_loo  -4522.9 130.0
p_loo       649.3  21.0
looic      9045.9 260.1
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.7, 1.6]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     9332  99.5%   286     
   (0.7, 1]   (bad)        35   0.4%   <NA>    
   (1, Inf)   (very bad)    9   0.1%   <NA>    
See help('pareto-k-diagnostic') for details.
rt | dec(resp) ~ population * trial_type * block_type + (1 + trial_type + block_type | participant_id) 
bs ~ population * trial_type * block_type + (1 + trial_type + block_type | participant_id)
ndt ~ population + (1 | participant_id)
bias ~ population * trial_type + (1 + trial_type | participant_id)

Thank you in advance.

I think that posterior predictive checks would be more helpful for deciding whether the Wiener model is a good model of the data. In my experience, large Pareto k values can be related to the log-likelihood that is being used: I believe (but am not sure) that the brms log-likelihood conditions on all the random effects in the model, i.e., each person has 8 random effects (if I am counting correctly), all of which are treated as parameters. I think this leads to the p_loo of 649, while your nominal parameter count would probably be much lower because you wouldn't count the random effects as parameters.
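For example, a minimal posterior predictive check in brms could look like the sketch below (assuming your fitted model object is called fit; posterior prediction can be slow for the wiener family):

library(brms)

# Overlay densities of observed and replicated RTs
pp_check(fit, type = "dens_overlay", ndraws = 50)

# Split by condition to see where the misfit (if any) occurs
pp_check(fit, type = "dens_overlay_grouped", group = "trial_type", ndraws = 50)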

An alternative LOO computation would involve a likelihood that marginalizes out the random effects, but this will probably require some tricky integral approximations for this model. I worked on this problem for simpler, multivariate normal models in a paper that provides some more background on the issue (where "latent variables" play the role of "random effects").
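To make the idea concrete, here is a minimal sketch (all names hypothetical) of Monte Carlo integration over a single normal random effect, for a normal outcome rather than the Wiener likelihood; for the diffusion model you would replace dnorm() with the Wiener density and face a higher-dimensional integral:

# For each posterior draw s, approximate the marginal likelihood of
# observation i by averaging over fresh draws of the random effect:
#   p(y_i | theta_s) ~= mean_m p(y_i | u_m, theta_s), with u_m ~ N(0, tau_s)
marginal_loglik <- function(y, mu, tau, sigma, n_mc = 2000) {
  S <- length(mu)  # number of posterior draws
  ll <- matrix(NA_real_, nrow = S, ncol = length(y))
  for (s in seq_len(S)) {
    u <- rnorm(n_mc, 0, tau[s])  # draws from the random-effect distribution
    for (i in seq_along(y)) {
      ll[s, i] <- log(mean(dnorm(y[i], mu[s] + u, sigma[s])))
    }
  }
  ll  # an S x N matrix that could be passed to loo::loo()
}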

Based on the information provided, it is possible that the high Pareto-k values are due to a flexible model: for some observations, the posterior changes a lot when they are removed, even if the model is well specified. You should look at the specific observations with high Pareto-k values and use your domain expertise to assess whether these observations are plausible "outliers" and thus not well explained by the model.
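A quick way to find those observations (a sketch; your_data stands for the data frame used to fit the model):

library(loo)

loo_res <- loo(fit)
high_k <- pareto_k_ids(loo_res, threshold = 0.7)  # indices of observations with k > 0.7

# Inspect the flagged trials, e.g. for unusually fast or slow RTs
your_data[high_k, c("rt", "resp", "trial_type", "block_type", "participant_id")]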

You can try improving the LOO computation with moment matching, which is supported by brms, but given that some Pareto-k values are very high (k > 1), it may not be able to fix the issue.
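In brms this looks roughly as follows; note that moment matching requires all parameter draws to have been saved when fitting, so one refit with save_pars is needed (formula and your_data are placeholders for your own objects):

# Refit once with all draws saved (required for moment matching), then:
fit <- brm(formula, data = your_data, family = wiener(),
           save_pars = save_pars(all = TRUE))
loo(fit, moment_match = TRUE)

For the few observations with k > 1 you could additionally set reloo = TRUE, which refits the model without each of them, but at roughly 7 hours per refit that is probably impractical here.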

Marginalizing over the varying effects (aka "random effects") would help, but brms does not yet support it automatically, so you would need to implement the integration yourself. The roaches case study shows an example of how to do it in Stan code (though you could also do it in R, as brms usually does).
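For intuition, here is a sketch in R (hypothetical names) of the kind of per-observation integration the roaches case study does in Stan, shown for a single varying intercept and a single posterior draw; with a Wiener likelihood you would swap in the Wiener density (e.g., RWiener::dwiener) and, with several correlated varying effects per person, face a multidimensional integral:

# Marginal log-likelihood of one observation under one posterior draw,
# integrating out its varying intercept u ~ N(0, sd_u) by quadrature
marg_ll_one <- function(y_i, eta_i, sd_u, sigma) {
  f <- function(u) dnorm(y_i, eta_i + u, sigma) * dnorm(u, 0, sd_u)
  log(integrate(f, lower = -10 * sd_u, upper = 10 * sd_u)$value)
}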