Background:
I am working with a dataset of 10,000 rows. I am using the bayesplot package for graphical posterior predictive checking of a categorical variable
with 200+ categories, integer-coded as 1, 2, 3, …, 290. I generated the graph with ppc_bars
because I wanted to summarise all the data in one plot, as follows.
Question
- Is there a way to make the graph more interpretable while keeping all the discrete categories?
- Can I use a log scale, or make multiple subplots with each subplot showing a different set of categories, for example plot 1 (categories 1-20), plot 2 (categories 21-40), …, and then use facet_grid?
I would really appreciate it if anyone could point me to resources for solving this problem, or suggest ways to make this graph less busy and more intuitive.
Are there packages other than bayesplot for plotting PPCs that might help create a less busy graph?
Probably switch to ggplot, where you have more control over the graphs you’re making (make lines thin and points small, break things up into different graphs, etc.).
You might try reordering the categories by the median of some quantity, and also plotting y on a log scale.
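As a rough sketch of what that could look like in plain ggplot2: the code below assumes `y` is the vector of observed category codes and `yrep` is a draws-by-observations matrix of simulated codes (the same kind of inputs ppc_bars takes). The data here is simulated purely for illustration, and the names `p_true`, `ppc_df`, and the 5%/95% interval are my own choices, not anything from your model. Counts are shifted by 1 so that empty categories survive the log scale, and `linewidth` needs ggplot2 3.4.0 or later.

```r
library(ggplot2)

set.seed(1)
n_cat <- 290; n_obs <- 10000; n_draws <- 50

# Hypothetical stand-ins for your data and posterior predictive draws
p_true <- rexp(n_cat)
p_true <- p_true / sum(p_true)
y    <- sample(n_cat, n_obs, replace = TRUE, prob = p_true)
yrep <- matrix(sample(n_cat, n_obs * n_draws, replace = TRUE, prob = p_true),
               nrow = n_draws)

# Count each category in the data and in every draw (result: n_cat x n_draws)
obs_count <- tabulate(y, nbins = n_cat)
rep_count <- apply(yrep, 1, tabulate, nbins = n_cat)

# One row per category: observed count plus an interval over the draws
ppc_df <- data.frame(
  category = seq_len(n_cat),
  obs = obs_count,
  lo  = apply(rep_count, 1, quantile, probs = 0.05),
  mid = apply(rep_count, 1, median),
  hi  = apply(rep_count, 1, quantile, probs = 0.95)
)

# Thin lines, small points, categories sorted by observed count, log y axis
p <- ggplot(ppc_df, aes(x = reorder(factor(category), -obs))) +
  geom_linerange(aes(ymin = lo + 1, ymax = hi + 1),
                 linewidth = 0.3, colour = "grey40") +
  geom_point(aes(y = obs + 1), size = 0.5, colour = "firebrick") +
  scale_y_log10() +
  labs(x = "category (ordered by observed count)",
       y = "count + 1 (log scale)") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
```

Printing `p` draws the plot. Sorting by observed count is only one option; sorting by `mid` would order the categories by the predictive median instead.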
Unless I’m badly mistaken, the bayesplot output is a regular ggplot object, so you should still be able to add additional scales, layers, geoms, etc to the baseline.
One possibility for subplots would be to write a function that maps categories to facet numbers, then facet with that.
split_categories = function(category) {
  recoded = dplyr::case_when( # returns the RHS of the first true LHS
    category <= 20 ~ "A",
    category <= 60 ~ "B",
    category <= 200 ~ "C",
    TRUE ~ "D" # catches everything that doesn't meet the previous conditions
  )
  factor(recoded, levels = c("A", "B", "C", "D")) # set the facet order explicitly, independent of row order
}
## Add facets to your previous plot
pp_bayesplot + # this was your previous plot
  ggplot2::facet_wrap(~split_categories(your_category_variable)) # the variable should be a column in the plot's data
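If you would rather have equal-width groups like the ones in the question (1-20, 21-40, …), a small helper along these lines might do it. This is only a sketch: `bin_categories` and its `width` argument are names I made up, and it assumes the categories are positive integer codes.

```r
# Map integer category codes to labelled bins of equal width,
# e.g. width = 20 gives "1-20", "21-40", ...
bin_categories <- function(category, width = 20) {
  lower <- ((category - 1) %/% width) * width + 1
  upper <- lower + width - 1
  label <- paste0(lower, "-", upper)
  # Order the levels numerically by bin start, not alphabetically
  factor(label, levels = unique(label[order(lower)]))
}

levels(bin_categories(c(45, 3, 21, 290)))
# "1-20" "21-40" "41-60" "281-300"

# Possible usage (untested against your plot; assumes the category column
# in the plot's data is called `x`, check names(pp_bayesplot$data)):
# pp_bayesplot + ggplot2::facet_wrap(~bin_categories(x), scales = "free_x")
```

With 290 categories and a width of 20 this gives 15 panels; `scales = "free_x"` lets each panel show only its own range of categories rather than the full axis.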