# How to summarise graphical Posterior predictive checks in one graph?

Background:
I am working with the dataset that consists of 10,000 instances or rows. I am currently using bayesplot package for graphical posterior predictive checking of `location` variable which consists of 5 location names dummy encoded as 1,2,3,4,5. I have generated the graphs using `ppc_dens_overlay` and `ppc_hist` shown below.

Question:
Is there a way where I can summarize the PPC graphs into one graph i.e. one plot with Means + error bars ?

(I don’t want to use density overlay to summarize the ppc for a categorical variable. To me using histogram is more intuitive for a categorical variable moreover, if I am running the `ppc_hist` it generates so many histograms. I know I can slice to just plot small number of graphs (shown in image attached) but having it summarised in one graph is I think would be better for my understanding)

I would really appreciate if anyone can please point me to resources to refer for solving this problem.

1 Like

@jonah do you know how to do this in Bayesplot?

1 Like

I might be misunderstanding, but does `ppc_bars()` do what you want? There are some examples in the Examples section here:

3 Likes

It looks like you’re treating your categories 1:5 as a histogram (which assumes an underlying continuous variable), not as a bar plot (which treats them as discrete).

One possible solution that comes to mind would be to make a ggplot that combined `geom_bar()` or `geom_col()` for your real data and `geom_dotplot()`, `geom_violin()` or something similar for the replicates.
For example:

``````library(tidyverse)
library(ggforce) # for geom_sina(), which creates the nice jitter plots
set.seed(143)
# define the real data
real_data = tibble(
category = as.character(1:5),
N = c(20, 17, 12, 31, 62)) %>%
mutate(y_obs = N/sum(N))

total_N = sum(real_data\$N)

# Simulate some y-reps that would come from a PP check
sim = rmultinom(100, total_N, real_data\$y_obs)
rownames(sim) = c(1:5)
sim_data = as_tibble(sim, rownames = "category") %>%
gather(-category, key = "rep", value = "N") %>%
mutate(y_rep = N / total_N)

# Join data
full_data = left_join(real_data %>% select(-N), sim_data %>% select(-N),
by = c("category"))
# plot
ggplot(full_data, aes(x = category)) +
geom_col(aes(y = y_obs), position = "identity",
color = "black", fill = "white") +
geom_sina(aes(y = y_rep),  alpha = .4, seed = 120) +
theme_bw() + theme(panel.grid = element_blank())

``````
5 Likes

thanks so much for sharing the code snippet. really appreciate the help.

thanks for outlining that making histogram means assumes an underlying continuous variable. I have `location` as a categorical variable so I think using bar plots and treating them as discrete would be more suitable