Attaching labels estimating proportion of posterior density within a fixed interval in the ggdist package

joels · May 12, 2025, 4:09pm

That was just a quick-and-dirty numerical integration to get the fraction of the total area under the density curve subtended by the ROPE range. It’s equivalent to:

\int\limits_{xmin}^{xmax} density(x)dx \approx \sum\limits_{xmin}^{xmax}density(x)\Delta x

Where xmin and xmax are the ROPE endpoints, density(x) is the posterior density of .epred, and \Delta x is mean(diff(x)), where I took the mean because I figured there might be small differences between each \Delta x interval due to numerical imprecision. But I probably should have done:

rope.dens = sum(c(NA, diff(x)) * .epred, na.rm=TRUE)

Or maybe I should have used the y-value at the midpoint between each x value. And of course there are much more accurate algorithms for more exacting situations.

Just as a check, we expect the total area under the density curve to be equal to 1.0 for the whole distribution:

options(digits=10)

dens.hist %>% 
  group_by(Species, SepalLength_cat) %>% 
  # Get midpoint density values
  mutate(dx = c(NA, diff(x)), 
         .epred.mid = zoo::rollmean(.epred, k=2, na.pad=TRUE, align="right")) %>% 
  summarise(total.density = sum(c(NA, diff(x)) * .epred, na.rm=TRUE),
            total.density2 = sum(mean(diff(x)) * .epred),
            total.density.midpt = sum(.epred.mid * dx, na.rm=TRUE)) %>% 
  as.data.frame()

     Species SepalLength_cat total.density total.density2 total.density.midpt
1     setosa     cat2 - cat1  0.9999992093   0.9999994093        0.9999992198
2     setosa     cat3 - cat1  0.9999993745   0.9999995802        0.9999993834
3     setosa     cat3 - cat2  0.9999988950   0.9999990747        0.9999988836
4 versicolor     cat2 - cat1  0.9999997022   0.9999998813        0.9999996892
5 versicolor     cat3 - cat1  0.9999994202   0.9999996072        0.9999994200
6 versicolor     cat3 - cat2  0.9999987074   0.9999989102        0.9999987074
7  virginica     cat2 - cat1  0.9999987483   0.9999989148        0.9999986161
8  virginica     cat3 - cat1  0.9999984518   0.9999989102        0.9999985982
9  virginica     cat3 - cat2  0.9999990026   0.9999992023        0.9999989849

When we limit the x-range to be within the ROPE, we get the fraction of the total density that’s within the ROPE:

dens.hist %>% 
  mutate(dx = c(NA, diff(x)), 
         .epred.mid = zoo::rollmean(.epred, k=2, na.pad=TRUE, align="right")) %>% 
  filter(x >= min(rope.rng), x <= max(rope.rng)) %>% 
  group_by(Species, SepalLength_cat) %>% 
  summarise(rope.density = sum(mean(diff(x)) * .epred),
            rope.density2 = sum(c(NA, diff(x)) * .epred, na.rm=TRUE),
            rope.density.midpt = sum(.epred.mid * dx, na.rm=TRUE))

  Species    SepalLength_cat rope.density rope.density2 rope.density.midpt
1 setosa     cat2 - cat1      0.000000512   0.000000312        0.000000256
2 setosa     cat3 - cat1      0.585         0.585              0.582      
3 setosa     cat3 - cat2      0.208         0.204              0.211      
4 versicolor cat2 - cat1      0.347         0.346              0.344      
5 versicolor cat3 - cat1      0.216         0.216              0.213      
6 versicolor cat3 - cat2      0.884         0.884              0.882      
7 virginica  cat2 - cat1      0.358         0.357              0.356      
8 virginica  cat3 - cat1      0.143         0.143              0.141      
9 virginica  cat3 - cat2      0.584         0.584              0.581

You can see there are some small differences, so maybe the quickest fix is to just increase the number of points at which the density is calculated:

dens.hist2 = compCat %>% 
  group_by(Species, SepalLength_cat) %>% 
  group_split() %>% 
  map_df(~{
    # Use 4,096 density points instead of 512
    dens = density_unbounded(.x$.epred, adjust=adj, n=2^12)
    tibble(Species=.x$Species[1], SepalLength_cat=.x$SepalLength_cat[1],
           x=dens$x, .epred=dens$y)
  })

dens.hist2 %>% 
  mutate(dx = c(NA, diff(x)), 
         .epred.mid = zoo::rollmean(.epred, k=2, na.pad=TRUE, align="right")) %>% 
  filter(x >= min(rope.rng), x <= max(rope.rng)) %>% 
  group_by(Species, SepalLength_cat) %>% 
  summarise(rope.density = sum(mean(diff(x)) * .epred),
            rope.density2 = sum(c(NA, diff(x)) * .epred, na.rm=TRUE),
            rope.density.midpt = sum(.epred.mid * dx, na.rm=TRUE))

  Species    SepalLength_cat rope.density rope.density2 rope.density.midpt
1 setosa     cat2 - cat1      0.000000493   0.000000469        0.000000456
2 setosa     cat3 - cat1      0.587         0.587              0.587      
3 setosa     cat3 - cat2      0.207         0.207              0.207      
4 versicolor cat2 - cat1      0.348         0.348              0.348      
5 versicolor cat3 - cat1      0.214         0.214              0.214      
6 versicolor cat3 - cat2      0.883         0.883              0.883      
7 virginica  cat2 - cat1      0.360         0.360              0.360      
8 virginica  cat3 - cat1      0.143         0.143              0.143      
9 virginica  cat3 - cat2      0.582         0.582              0.581

Topic		Replies	Views
Some Bayesplot/ggplot2 questions relating to plotting priors and posteriors in same panel General bayesplot , cmdstanr	5	2034	February 1, 2022
Overlaying prior curves to multiple posteriors within the same plot Modeling bayesplot	9	1036	June 25, 2023
Shortest posterior intervals Algorithms techniques	31	3027	August 12, 2020
Brms forest plot brms meta-analysis	14	3116	March 11, 2021
How to best summarize this evidence General	13	2686	February 5, 2021

Attaching labels estimating proportion of posterior density within a fixed interval in the ggdist package

Related topics