Shortest Posterior/HPD Intervals for Highly Skewed Distributions

I am working with the following model:

model {
    y ~ binomial(N, p);
    p ~ beta(1, 1);
}

where p lies in [0, 1]. Often the MLE is very near 0 or very near 1. Furthermore, the estimates are often noisy, so their SEs are large and CI/CrIs wide. Here:

@andrewgelman nicely discusses approaches for computing HPD/SPIn, poiinting to his 2015 paper (which he no longer advocates, it seems). The SPIn R package has been removed from CRAN, but there is the HPDInterval package, however, the latter does not allow interval endpoints to be specified.

Right now, I compute central intervals (as is the default in Stan), but strongly believe HPD/SPIn would work better given my explanation above. Posterior densities in my case are highly skewed and estimates can attain values of 0 or 1, even with high sample sizes.

Has any progress been made on HPD/SPIn lately? Is @andrewgelman’s hacky implementation in the above post the best we have at this point? Was his code the one used in Gelman and Carpenter (2020)?

No, for highest density interval, McElreath’s rethinking package has a function for HPDI:

rethinking::HPDI

and there’s another one in

bayestestR::hdi

1 Like

Awhile back I started testing various implementations of three classes of interval estimators: quantile intervals, shortest intervals, and highest-density intervals (the last of which may be discontinuous). My goal was to update the corresponding implementations in ggdist. Still need to finish it (and I don’t expect the code to be that readable in its current state), but you can look if you want: GitHub - mjskay/interval-estimators

I don’t recall everything I had found in my initial explorations, except for a few salient things:

  1. For highest-density intervals, the implementation in distributional::hdr was best, and ggdist::hdi is now based on a modification of that implementation that uses ggdist::density_bounded to support bounded distributions. HDInterval::hdi was surprisingly bad (I think that implementation is simply incorrect).
  2. For shortest intervals, to my surprise SPIn, at least the implementation of it I found in bayestestr::spi, was beat by simpler methods like ggdist::hdci or coda::HDInterval (I believe the latter is also what rethinking uses).

I should revisit this sometime, finalize the analysis and write it up.

1 Like

@mjskay This looks promising and ggdist looks pretty cool too - I wasn’t previously aware of it!

I’m going to try running your implementation of the intervals, as well as @andrewgelman’s code snippet in the thread I link to.

1 Like