How to handle a large number of categories: posterior predictive check graphs

I am working with a dataset of 10,000 rows. I am using the bayesplot package for graphical posterior predictive checks of a categorical variable with 200+ categories, coded as integers 1, 2, 3, …, 290. I generated the graph with ppc_bars because I wanted to summarise all the data in one plot, as follows.


  1. Is there a way I can make the graph more interpretable while keeping all the discrete categories?

  2. Can I use a log scale, or split the plot into subplots that each cover a different set of categories, for example plot 1 (categories 1-20), plot 2 (categories 21-40), …, and then use facet_grid?

I would really appreciate pointers to resources for this problem, or suggestions for making the graph less busy and more intuitive.


Are there packages other than bayesplot for plotting PPCs that might help create a less busy graph?

I’d probably switch to ggplot2, where you have more control over the plot you’re making (thin lines, small points, splitting things across several graphs, etc.).

You might try reordering the categories by the median of some quantity (the predicted count per category, say), and also plotting y on a log scale.
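A minimal sketch of that idea, assuming you can pull out the observed categories as an integer vector `y` and the posterior predictive draws as a draws-by-observations matrix `yrep`. Both are simulated here, and every object name is made up for illustration. It counts categories per draw by hand, reorders the categories by the predicted median, and plots count + 1 on a log scale so that empty categories don't break the axis. The `linewidth` argument assumes ggplot2 3.4 or newer.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-ins (assumptions, not your data):
# y    = observed category per row, coded 1..n_cat
# yrep = n_draws x n_obs matrix of posterior predictive categories
n_cat <- 290; n_obs <- 10000; n_draws <- 200
cat_probs <- rgamma(n_cat, shape = 0.5)
cat_probs <- cat_probs / sum(cat_probs)
y <- sample(n_cat, n_obs, replace = TRUE, prob = cat_probs)
yrep <- matrix(
  sample(n_cat, n_draws * n_obs, replace = TRUE, prob = cat_probs),
  nrow = n_draws
)

# Count every category within every draw: an n_cat x n_draws matrix
rep_counts <- apply(yrep, 1, tabulate, nbins = n_cat)

ppc_df <- data.frame(
  category = factor(seq_len(n_cat)),
  observed = tabulate(y, nbins = n_cat),
  lo  = apply(rep_counts, 1, quantile, probs = 0.05),
  med = apply(rep_counts, 1, median),
  hi  = apply(rep_counts, 1, quantile, probs = 0.95)
)

# Order the x axis by predicted median count instead of by category code
ppc_df$category <- reorder(ppc_df$category, ppc_df$med)

p <- ggplot(ppc_df, aes(x = category)) +
  # 90% predictive interval per category, drawn thin
  geom_linerange(aes(ymin = lo + 1, ymax = hi + 1), linewidth = 0.3) +
  # observed counts as small points
  geom_point(aes(y = observed + 1), size = 0.6, colour = "red") +
  scale_y_log10() +
  labs(x = "Category (ordered by predicted median)", y = "Count + 1") +
  theme_minimal() +
  theme(axis.text.x = element_blank(), panel.grid.major.x = element_blank())

print(p)
```

Hiding the x tick labels is a judgment call: with 290 categories they are unreadable anyway, and the ordering carries most of the information.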


Unless I’m badly mistaken, the bayesplot output is a regular ggplot object, so you should still be able to add additional scales, layers, geoms, etc. to the base plot.
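As a rough illustration of that point, here is a sketch that builds a `ppc_bars` plot from simulated data and then adds ordinary ggplot2 components to it. The data and the name `pp_tweaked` are made up for the example. I've used a square-root scale rather than a log scale here, because the bars start at zero and zero has no finite position on a log axis.

```r
library(ggplot2)
library(bayesplot)

set.seed(2)
# Simulated stand-ins (assumptions): 40 categories to keep the example quick
n_cat <- 40; n_obs <- 500; n_draws <- 50
y <- sample(n_cat, n_obs, replace = TRUE)
yrep <- matrix(sample(n_cat, n_draws * n_obs, replace = TRUE), nrow = n_draws)

# The baseline bayesplot graph
pp_bayesplot <- ppc_bars(y, yrep)

# Add regular ggplot2 pieces on top. ggplot2 may print a message saying
# that an existing y scale is being replaced; that is expected here.
pp_tweaked <- pp_bayesplot +
  scale_y_sqrt() +
  labs(x = "Category", y = "Count (square-root scale)") +
  theme(legend.position = "bottom")

print(pp_tweaked)
```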

One possibility for subplots would be to write a function that maps categories to facet numbers, then facet with that.

split_categories = function(category) {
  recoded = dplyr::case_when( # Returns the RHS of the first true LHS
    category <= 20  ~ "A",
    category <= 60  ~ "B",
    category <= 200 ~ "C",
    TRUE ~ "D") # catches everything that doesn't meet a previous condition

  # Set the level order explicitly so the facets appear as A, B, C, D
  # regardless of the order in which the categories occur in the data
  factor(recoded, levels = c("A", "B", "C", "D"))
}

## Add facets to your previous plot
pp_bayesplot  + # This was your previous plot