Shortest Posterior/HPD Intervals for Highly Skewed Distributions

jphill01 · June 28, 2024, 12:23am

I am working with the following model:

model {
    y ~ binomial(N, p);
    p ~ beta(1, 1);
}

where p lies in [0, 1]. Often the MLE is very near 0 or very near 1. Furthermore, the estimates are often noisy, so their SEs are large and CI/CrIs wide. Here:

@andrewgelman nicely discusses approaches for computing HPD/SPIn, poiinting to his 2015 paper (which he no longer advocates, it seems). The SPIn R package has been removed from CRAN, but there is the HPDInterval package, however, the latter does not allow interval endpoints to be specified.

Right now, I compute central intervals (as is the default in Stan), but strongly believe HPD/SPIn would work better given my explanation above. Posterior densities in my case are highly skewed and estimates can attain values of 0 or 1, even with high sample sizes.

Has any progress been made on HPD/SPIn lately? Is @andrewgelman’s hacky implementation in the above post the best we have at this point? Was his code the one used in Gelman and Carpenter (2020)?

ssp3nc3r · June 28, 2024, 12:41am

No, for highest density interval, McElreath’s rethinking package has a function for HPDI:

rethinking::HPDI

and there’s another one in

bayestestR::hdi

mjskay · June 28, 2024, 5:56am

Awhile back I started testing various implementations of three classes of interval estimators: quantile intervals, shortest intervals, and highest-density intervals (the last of which may be discontinuous). My goal was to update the corresponding implementations in ggdist. Still need to finish it (and I don’t expect the code to be that readable in its current state), but you can look if you want: GitHub - mjskay/interval-estimators

I don’t recall everything I had found in my initial explorations, except for a few salient things:

For highest-density intervals, the implementation in distributional::hdr was best, and ggdist::hdi is now based on a modification of that implementation that uses ggdist::density_bounded to support bounded distributions. HDInterval::hdi was surprisingly bad (I think that implementation is simply incorrect).
For shortest intervals, to my surprise SPIn, at least the implementation of it I found in bayestestr::spi, was beat by simpler methods like ggdist::hdci or coda::HDInterval (I believe the latter is also what rethinking uses).

I should revisit this sometime, finalize the analysis and write it up.

jphill01 · June 28, 2024, 6:13am

@mjskay This looks promising and ggdist looks pretty cool too - I wasn’t previously aware of it!

I’m going to try running your implementation of the intervals, as well as @andrewgelman’s code snippet in the thread I link to.

Topic		Replies	Views
Quantiles vs HPDI from `rstan::summary()` RStan	13	2802	May 30, 2019
Divergences and Quantifying HPD intervals Modeling	7	1107	October 13, 2020
CmdStanR v0.7.0 released Announcements cmdstanr	15	1000	December 19, 2023
Rstanarm: Posterior standard deviations and credible intervals for odds ratios and probabilities Modeling	2	363	November 9, 2023
Release of CmdStan 2.36.0 Announcements	2	404	December 10, 2024

Shortest Posterior/HPD Intervals for Highly Skewed Distributions

Related topics