Is there a way to do this in ggplot though? so I can build on all the nice facilities there? (for example I might want to calculate a function of 3 or 4 parameters combined, or over-plot densities with different colors or whatever
It can be done that way, but we are trying to make sure that bayesplot has sufficiently functionality. Can you elaborate what you don’t like about bayesplot::mcmc_hist(samps, regex_pars = "^bar") + ggplot2::whatever?
Also I have thousands of parameters. I probably want to take a random sample of indexes and then plot just that random sample. I also might want to overplot 2 or 3 different geom_densities with different colors for each index… stuff like that.
I don’t know if you are still looking for a clean way to do this sort of thing, but I’ve been working on a package aimed at exactly this kind of custom sample-munging-and-plotting problem: https://github.com/mjskay/tidybayes
It can extract common indices across multiple variables, among other things. It turns everything into a tidy data frame, which makes it easy to make custom ggplots. What you described would be something like this using that package:
m %>% # m is a stan fit or a matrix of samples
spread_samples(a[i], b[i], c[i]) %>%
mutate(x = exp(a + b^2/2)/c) %>%
ggplot(aes(x = x, fill = i, group = i)) +
geom_histogram()
(or however you wanted to set up your plot to distinguish samples at different indices)
The beauty of transforming samples into tidy data frames is that you can use vanilla ggplot to make most of the plots you need. This also makes sampling random indices straightforward using filter() from the tidyverse:
m %>%
spread_samples(a[i], b[i], c[i]) %>%
filter(i %in% sample(i, 5)) %>%
mutate(x = exp(a + b^2/2)/c) %>%
ggplot(aes(x = x, fill = i, group = i)) +
geom_histogram()
(N.B. I wrote but did not test these examples)
If the package is helpful to you I’d love to hear feedback! You can install it from github currently and I am planning a CRAN submission soon.
I tried ggmcmc awhile back and found that it didn’t quite do everything I wanted, hence tidybayes. :) It should work with rstanarm as well, though I haven’t tested it extensively as I have been using brm more than rstanarm lately. I wrote the base support for tidybayes to work with rstan, rstanarm, runjags, brms, MCMCglmm, and coda, and extending it to support other model types for the most part just involves giving it an implementation of one function (as_sample_tibble).
I have been mixing plain ggplot2, bayesplot, ggmcmc and my own implementations, as none of the approaches are really convenient in time series / state space settings. Basically in Stan output one would to need to somehow group together x[1,1], x[1,2] etc where the column index is time, or in more general setting one (I) could have a three dimensional array where say the first index is the state index, second is time and third is iteration. I have to check tidybayes as well. I am willing to contribute something relating to this to one of the packages but haven’t really figured out what would be the best base to start.
I’d love to get some better stuff for time series into our bayesplot package and always happy to get contributions! If anyone is interested in contributing I’m happy to discuss and provide any help you need (e.g. working with the package internals, or whatever).
tidybayes looks cool too! In addition to tidybayes allowing easier custom
plots in ggplot, it would be nice to be able to also pass tidy data
prepared by tidybayes into bayesplot plotting functions. I’m not sure if
that works as is but we should make that possible.
@helske I’d love to hear if tidybayes helps with your time series analyses. I think tidybayes::spread_samples might help; it can generate a tidy data frame based on parameters with an arbitrary number of indices. For example, spread_samples(m, x[i,j,k]) will give you a data frame with columns .chain, .iteration, x, i, j, and k, where the value of x on a given row is the value of the parameter x[i,j,k] on the given .chain and .iteration.
This also works on multiple variables with shared indices, and indices are not required to be of the same number for each variable. E.g., spread_samples(m, x[i,j], z[i]) will give a data frame with columns .chain, .iteration, x, i, j, and z, where the value of x on a given row is the value of the parameter x[i,j] on the given .chain and .iteration, and z is the value of z[i] on the given .chain and .iteration (so values of z will be repeated across rows with the same i, .chain, and .iteration but different j).
@jonah I love the idea of being able to pass tidy data back into bayesplot! Does bayesplot mostly take a matrix of samples? If so, the easiest approach might be for me to make a function to invert the spread_samples and gather_samples functions in tidybayes to turn tidy data frames back into matrices (or data frames with x[i]-ish names). I could imagine a nice workflow like:
m %>%
spread_samples(...) %>%
#some transformations of tidy samples %>%
unspread_samples(...) %>%
bayesplot::some_function(...)