I was looking through the excellent “Fitting the Cauchy distribution” case study by @betanalpha (https://betanalpha.github.io/assets/case_studies/fitting_the_cauchy.html), and I had a conceptual question I was hoping somebody could clear up for me.

In the case study, he uses “tail k_hat” diagnostics to verify that the effective sample sizes for the Cauchy parameters aren’t well-defined. The only time I’ve ever seen k_hat mentioned in this context is in Pareto-smoothed importance sampling (PSIS), as in this paper: https://arxiv.org/pdf/1507.02646.pdf

As far as I can tell there’s no importance sampling of any kind in the case study, but I assume the principle behind the use of k_hat is roughly the same: we fit generalized Pareto distributions to (the tails of) the posterior, and the fitted shape parameter gives an estimate of how many posterior moments exist. Is that correct?

If so, wouldn’t we expect the k_hats to be above 1 for the Cauchy distribution? Presumably (by analogy with the discussion in the PSIS paper), k_hats between 0.5 and 1 imply an infinite variance but a finite mean - and the Cauchy doesn’t even have a mean. Similarly, I tried fitting a student_t(2, 0, 1) and got no k_hat warnings, even though the second moment of that distribution is certainly infinite. Am I missing something in the interpretation here?
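For what it’s worth, here’s the kind of direct check I have in mind - a crude sketch using scipy’s maximum-likelihood GPD fit on upper-tail exceedances, not the regularized k_hat estimator from the PSIS paper. If my reading is right, the shape parameter should come out near 1 for Cauchy draws (no mean) and near 0.5 for student_t(2) draws (finite mean, infinite variance):

```python
# Rough tail-shape check: fit a generalized Pareto distribution (GPD) to
# exceedances over an upper-tail threshold and read off the shape k.
# Under the interpretation above, moments of order < 1/k are finite.
# NOTE: this is a plain MLE fit via scipy, not the regularized estimator
# of Vehtari et al.; tail_shape and tail_frac are my own ad hoc choices.
import numpy as np
from scipy import stats

def tail_shape(draws, tail_frac=0.1):
    """Fit a GPD to exceedances over the (1 - tail_frac) quantile; return shape."""
    x = np.sort(np.asarray(draws))
    cutoff = x[int((1 - tail_frac) * len(x))]
    exceedances = x[x > cutoff] - cutoff
    shape, _, _ = stats.genpareto.fit(exceedances, floc=0.0)  # fix loc at 0
    return shape

rng = np.random.default_rng(1)
cauchy_k = tail_shape(stats.cauchy.rvs(size=20000, random_state=rng))
t2_k = tail_shape(stats.t.rvs(df=2, size=20000, random_state=rng))
print(f"Cauchy tail shape:       {cauchy_k:.2f}")  # typically near 1
print(f"student_t(2) tail shape: {t2_k:.2f}")      # typically near 0.5
```

So a raw tail fit does seem to recover the “number of finite moments” story - which is why the absence of k_hat warnings for student_t(2) confuses me.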