How to handle a large number of categories: Posterior Predictive Check graphs

sam_learner · April 30, 2020, 12:20am

Background:
I am working with a dataset of 10,000 rows. I am using the bayesplot package for graphical posterior predictive checking of a categorical variable with 200+ categories, integer-coded as 1, 2, 3, …, 290. I generated the graph with ppc_bars because I wanted to summarise all the data in one plot, as follows.

Question

Is there a way I can make the graph more interpretable while keeping all the discrete categories?
Can I use a log scale, or make multiple subplots with each subplot showing a different set of categories (for example, plot 1 for categories 1-20, plot 2 for categories 21-40, …), and then use facet_grid?

I would really appreciate it if anyone could point me to resources for solving this problem, or suggest ways to make this graph less busy and more intuitive.

[Image: ppc, the ppc_bars plot of all categories]

sam_learner · April 30, 2020, 2:34pm

Are there packages other than bayesplot for plotting PPCs that might help create a less busy graph?

bbbales2 · April 30, 2020, 8:19pm

Probably switch to ggplot, where you have more control over the graphs you’re making (make lines thin and points small, break things up into different graphs, etc.).

You might try reordering the categories by some summary (the median of something, say), and also plotting y on a log scale.
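A minimal sketch of that suggestion in plain ggplot2, not a definitive recipe. The data here are simulated stand-ins for the observed outcome `y` and a matrix of posterior predictive draws `yrep`. The choice of "median predicted count" as the ordering summary, the 90% interval, and the `+ 1` offset (so that zero counts survive the log scale) are all assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)
n_cat   <- 290    # number of categories, as in the question
n_obs   <- 10000  # number of rows, as in the question
n_draws <- 100    # hypothetical number of posterior draws

# Simulated stand-ins for the real y and yrep
probs <- rgamma(n_cat, shape = 0.5)
probs <- probs / sum(probs)
y     <- sample(n_cat, n_obs, replace = TRUE, prob = probs)
yrep  <- matrix(sample(n_cat, n_draws * n_obs, replace = TRUE, prob = probs),
                nrow = n_draws)

# Count each category within each draw: result is n_cat x n_draws
rep_counts <- apply(yrep, 1, tabulate, nbins = n_cat)
# 5%, 50% and 95% quantiles of the predicted counts: result is 3 x n_cat
q <- apply(rep_counts, 1, quantile, probs = c(0.05, 0.5, 0.95))

summ <- data.frame(
  category = seq_len(n_cat),
  observed = tabulate(y, nbins = n_cat),
  lo  = q[1, ],
  med = q[2, ],
  hi  = q[3, ]
)

# Reorder the categories by their median predicted count
summ$category <- reorder(factor(summ$category), summ$med)

p <- ggplot(summ, aes(x = category)) +
  # thin interval lines and small points, as suggested above
  geom_linerange(aes(ymin = lo + 1, ymax = hi + 1),
                 linewidth = 0.3, colour = "grey40") +
  geom_point(aes(y = med + 1), size = 0.6, colour = "grey40") +
  geom_point(aes(y = observed + 1), size = 0.6, colour = "red") +
  scale_y_log10() +
  labs(x = "category (ordered by median predicted count)",
       y = "count + 1 (log scale)") +
  # 290 tick labels are unreadable, so drop them
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
```

Printing `p` draws the plot. With the categories sorted, an observed count (red) that falls outside its predictive interval stands out against the otherwise smooth trend.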

Christopher-Peterson · April 30, 2020, 9:39pm

Unless I’m badly mistaken, the bayesplot output is a regular ggplot object, so you should still be able to add additional scales, layers, geoms, etc. to the baseline.
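As a rough illustration of that point, the sketch below builds a `ppc_bars` plot from simulated data and then adds ordinary ggplot2 components to it. The helper `zoom_categories` is hypothetical (it is not part of bayesplot), and the simulated `y` and `yrep` are stand-ins for the real outcome and posterior predictive draws.

```r
library(bayesplot)
library(ggplot2)

set.seed(2)
n_cat   <- 290
n_obs   <- 1000  # kept small so the example runs quickly
n_draws <- 50    # hypothetical number of posterior draws

# Simulated stand-ins for the real y and yrep (integer category codes)
y    <- sample(n_cat, n_obs, replace = TRUE)
yrep <- matrix(sample(n_cat, n_draws * n_obs, replace = TRUE),
               nrow = n_draws)

# The baseline plot, covering all 290 categories
p_all <- ppc_bars(y, yrep)

# Hypothetical helper: show only category codes from..to of an existing plot
zoom_categories <- function(p, from, to) {
  p +
    coord_cartesian(xlim = c(from - 0.5, to + 0.5)) +
    ggtitle(sprintf("Categories %d-%d", from, to))
}

p_1_20  <- zoom_categories(p_all, 1, 20)
p_21_40 <- zoom_categories(p_all, 21, 40)
```

Note that `coord_cartesian()` only zooms the x-axis; the y-axis keeps the range of the full plot, so a `ylim` argument may be worth adding when the counts differ a lot between ranges.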

One possibility for subplots would be to write a function that maps categories to facet numbers, then facet with that.

split_categories = function(category) {
  recoded = dplyr::case_when( # Returns the RHS of the first true LHS
    category <= 20  ~ "A",
    category <= 60  ~ "B",
    category <= 200 ~ "C",
    TRUE            ~ "D") # catches everything that doesn't meet previous conditions

  # Set the levels explicitly so the facets are ordered A-D
  # even when the data are not sorted by category
  factor(recoded, levels = c("A", "B", "C", "D"))
}

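## Add facets to your previous plot
library(ggplot2) # provides facet_wrap()
pp_bayesplot + # This was your previous plot
  # scales = "free_x" lets each panel show only its own categories
  facet_wrap(~split_categories(your_category_variable), scales = "free_x")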
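For the equal-sized groups mentioned in the question (plot 1 for categories 1-20, plot 2 for categories 21-40, and so on), the mapping can be computed rather than written out by hand. This is a sketch only: `chunk_categories` is a hypothetical helper, the block size of 20 comes from the question's example, and the counts are simulated.

```r
library(ggplot2)

# Hypothetical helper: label each integer category code with the block of
# `size` consecutive codes it belongs to, e.g. 1-20, 21-40, ...
chunk_categories <- function(category, size = 20, n_cat = max(category)) {
  starts <- seq(1, n_cat, by = size)
  ends   <- pmin(starts + size - 1, n_cat)
  labels <- paste0(starts, "-", ends)
  idx    <- (category - 1) %/% size + 1
  # explicit levels keep the facets in numeric order
  factor(labels[idx], levels = labels)
}

set.seed(3)
# Simulated observed counts for 290 categories
counts <- data.frame(category = 1:290,
                     observed = rpois(290, lambda = 30))
counts$block <- chunk_categories(counts$category)

p <- ggplot(counts, aes(x = category, y = observed)) +
  geom_col() +
  # one panel per block of 20 categories, each with its own x range
  facet_wrap(~block, scales = "free_x", ncol = 3)
```

With 290 categories and blocks of 20 this gives 15 panels, the last one covering 281-290. A larger `size` trades fewer panels for busier ones.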
