Attaching labels estimating proportion of posterior density within a fixed interval in the ggdist package

That was just a quick-and-dirty numerical integration to get the fraction of the total area under the density curve subtended by the ROPE range. It’s equivalent to:

\int\limits_{xmin}^{xmax} density(x)dx \approx \sum\limits_{xmin}^{xmax}density(x)\Delta x

Where xmin and xmax are the ROPE endpoints, density(x) is the posterior density of .epred, and \Delta x is mean(diff(x)), where I took the mean because I figured there might be small differences between each \Delta x interval due to numerical imprecision. But I probably should have done:

rope.dens = sum(c(NA, diff(x)) * .epred, na.rm=TRUE)

Or maybe I should have used the y-value at the midpoint between each x value. And of course there are much more accurate algorithms for more exacting situations.

Just as a check, we expect the total area under the density curve to be equal to 1.0 for the whole distribution:

options(digits=10)

dens.hist %>% 
  group_by(Species, SepalLength_cat) %>% 
  # Get midpoint density values
  mutate(dx = c(NA, diff(x)), 
         .epred.mid = zoo::rollmean(.epred, k=2, na.pad=TRUE, align="right")) %>% 
  summarise(total.density = sum(c(NA, diff(x)) * .epred, na.rm=TRUE),
            total.density2 = sum(mean(diff(x)) * .epred),
            total.density.midpt = sum(.epred.mid * dx, na.rm=TRUE)) %>% 
  as.data.frame()
     Species SepalLength_cat total.density total.density2 total.density.midpt
1     setosa     cat2 - cat1  0.9999992093   0.9999994093        0.9999992198
2     setosa     cat3 - cat1  0.9999993745   0.9999995802        0.9999993834
3     setosa     cat3 - cat2  0.9999988950   0.9999990747        0.9999988836
4 versicolor     cat2 - cat1  0.9999997022   0.9999998813        0.9999996892
5 versicolor     cat3 - cat1  0.9999994202   0.9999996072        0.9999994200
6 versicolor     cat3 - cat2  0.9999987074   0.9999989102        0.9999987074
7  virginica     cat2 - cat1  0.9999987483   0.9999989148        0.9999986161
8  virginica     cat3 - cat1  0.9999984518   0.9999989102        0.9999985982
9  virginica     cat3 - cat2  0.9999990026   0.9999992023        0.9999989849

When we limit the x-range to be within the ROPE, we get the fraction of the total density that’s within the ROPE:

dens.hist %>% 
  mutate(dx = c(NA, diff(x)), 
         .epred.mid = zoo::rollmean(.epred, k=2, na.pad=TRUE, align="right")) %>% 
  filter(x >= min(rope.rng), x <= max(rope.rng)) %>% 
  group_by(Species, SepalLength_cat) %>% 
  summarise(rope.density = sum(mean(diff(x)) * .epred),
            rope.density2 = sum(c(NA, diff(x)) * .epred, na.rm=TRUE),
            rope.density.midpt = sum(.epred.mid * dx, na.rm=TRUE)) 
  Species    SepalLength_cat rope.density rope.density2 rope.density.midpt
1 setosa     cat2 - cat1      0.000000512   0.000000312        0.000000256
2 setosa     cat3 - cat1      0.585         0.585              0.582      
3 setosa     cat3 - cat2      0.208         0.204              0.211      
4 versicolor cat2 - cat1      0.347         0.346              0.344      
5 versicolor cat3 - cat1      0.216         0.216              0.213      
6 versicolor cat3 - cat2      0.884         0.884              0.882      
7 virginica  cat2 - cat1      0.358         0.357              0.356      
8 virginica  cat3 - cat1      0.143         0.143              0.141      
9 virginica  cat3 - cat2      0.584         0.584              0.581 

You can see there are some small differences, so maybe the quickest fix is to just increase the number of points at which the density is calculated:

dens.hist2 = compCat %>% 
  group_by(Species, SepalLength_cat) %>% 
  group_split() %>% 
  map_df(~{
    # Use 4,096 density points instead of 512
    dens = density_unbounded(.x$.epred, adjust=adj, n=2^12)
    tibble(Species=.x$Species[1], SepalLength_cat=.x$SepalLength_cat[1],
           x=dens$x, .epred=dens$y)
  })
dens.hist2 %>% 
  mutate(dx = c(NA, diff(x)), 
         .epred.mid = zoo::rollmean(.epred, k=2, na.pad=TRUE, align="right")) %>% 
  filter(x >= min(rope.rng), x <= max(rope.rng)) %>% 
  group_by(Species, SepalLength_cat) %>% 
  summarise(rope.density = sum(mean(diff(x)) * .epred),
            rope.density2 = sum(c(NA, diff(x)) * .epred, na.rm=TRUE),
            rope.density.midpt = sum(.epred.mid * dx, na.rm=TRUE)) 
  Species    SepalLength_cat rope.density rope.density2 rope.density.midpt
1 setosa     cat2 - cat1      0.000000493   0.000000469        0.000000456
2 setosa     cat3 - cat1      0.587         0.587              0.587      
3 setosa     cat3 - cat2      0.207         0.207              0.207      
4 versicolor cat2 - cat1      0.348         0.348              0.348      
5 versicolor cat3 - cat1      0.214         0.214              0.214      
6 versicolor cat3 - cat2      0.883         0.883              0.883      
7 virginica  cat2 - cat1      0.360         0.360              0.360      
8 virginica  cat3 - cat1      0.143         0.143              0.143      
9 virginica  cat3 - cat2      0.582         0.582              0.581 
1 Like