Plotting Parameters using RStan and ggplot2?

I’d like to do something like

foo <- as.data.frame(samps)

ggplot(foo) + geom_histogram(aes(x=bar[33]))

to plot the 33rd element of parameter bar, but of course this doesn’t work because the angle brackets aren’t interpreted as a column name.

What’s the appropriate idiom here for looking at parameters that are defined as vectors in stan from within R?

bayesplot::mcmc_hist(samps, pars = "bar[33]")

Is there a way to do this in ggplot though? so I can build on all the nice facilities there? (for example I might want to calculate a function of 3 or 4 parameters combined, or over-plot densities with different colors or whatever

If you assign it to an object, you can do whatever ggplot2 things to it. Or just put a + afterward and go. Like

bayesplot::mcmc_hist(samps, pars = "bar[33]") + ggplot2::xlim(0, 10)

You need to use backticks:

ggplot(foo) + geom_histogram(aes(x=`bar[33]`))
2 Likes

Yes, backticks! thank you. I knew there had to be a way to do that sort of thing.

1 Like

Aha, also if you’re doing this programmatically and want to insert the index number via a program, you could do something like:

for(i in 1:10){
ggplot(data,aes_(x=as.name(sprintf(“bar[%d]”,i)))+geom_histogram()
}

as.name and aes_ are the keys to the castle there

2 Likes

It can be done that way, but we are trying to make sure that bayesplot has sufficiently functionality. Can you elaborate what you don’t like about bayesplot::mcmc_hist(samps, regex_pars = "^bar") + ggplot2::whatever?

I ultimately will want to do things like exp(a[33] + b[33]^2/2)/c[33] rather than just plotting individual params straight up.

Also I have thousands of parameters. I probably want to take a random sample of indexes and then plot just that random sample. I also might want to overplot 2 or 3 different geom_densities with different colors for each index… stuff like that.

I don’t know if you are still looking for a clean way to do this sort of thing, but I’ve been working on a package aimed at exactly this kind of custom sample-munging-and-plotting problem: https://github.com/mjskay/tidybayes

It can extract common indices across multiple variables, among other things. It turns everything into a tidy data frame, which makes it easy to make custom ggplots. What you described would be something like this using that package:

m %>%   # m is a stan fit or a matrix of samples
  spread_samples(a[i], b[i], c[i]) %>%
  mutate(x = exp(a + b^2/2)/c) %>%
  ggplot(aes(x = x, fill = i, group = i)) +
  geom_histogram()

(or however you wanted to set up your plot to distinguish samples at different indices)

The beauty of transforming samples into tidy data frames is that you can use vanilla ggplot to make most of the plots you need. This also makes sampling random indices straightforward using filter() from the tidyverse:

m %>%
  spread_samples(a[i], b[i], c[i]) %>%
  filter(i %in% sample(i, 5)) %>%
  mutate(x = exp(a + b^2/2)/c) %>%
  ggplot(aes(x = x, fill = i, group = i)) +
  geom_histogram()

(N.B. I wrote but did not test these examples)

If the package is helpful to you I’d love to hear feedback! You can install it from github currently and I am planning a CRAN submission soon.

2 Likes

tidybayes looks great! I might have to try it out. (I’ve already written my own personal tidying package for rstanarm models too.)

Just noting that ggmcmc is another one to watch in this space.

Cool!

I tried ggmcmc awhile back and found that it didn’t quite do everything I wanted, hence tidybayes. :) It should work with rstanarm as well, though I haven’t tested it extensively as I have been using brm more than rstanarm lately. I wrote the base support for tidybayes to work with rstan, rstanarm, runjags, brms, MCMCglmm, and coda, and extending it to support other model types for the most part just involves giving it an implementation of one function (as_sample_tibble).

1 Like

I have been mixing plain ggplot2, bayesplot, ggmcmc and my own implementations, as none of the approaches are really convenient in time series / state space settings. Basically in Stan output one would to need to somehow group together x[1,1], x[1,2] etc where the column index is time, or in more general setting one (I) could have a three dimensional array where say the first index is the state index, second is time and third is iteration. I have to check tidybayes as well. I am willing to contribute something relating to this to one of the packages but haven’t really figured out what would be the best base to start.

I’d love to get some better stuff for time series into our bayesplot package and always happy to get contributions! If anyone is interested in contributing I’m happy to discuss and provide any help you need (e.g. working with the package internals, or whatever).

tidybayes looks cool too! In addition to tidybayes allowing easier custom
plots in ggplot, it would be nice to be able to also pass tidy data
prepared by tidybayes into bayesplot plotting functions. I’m not sure if
that works as is but we should make that possible.

1 Like

@helske I’d love to hear if tidybayes helps with your time series analyses. I think tidybayes::spread_samples might help; it can generate a tidy data frame based on parameters with an arbitrary number of indices. For example, spread_samples(m, x[i,j,k]) will give you a data frame with columns .chain, .iteration, x, i, j, and k, where the value of x on a given row is the value of the parameter x[i,j,k] on the given .chain and .iteration.

This also works on multiple variables with shared indices, and indices are not required to be of the same number for each variable. E.g., spread_samples(m, x[i,j], z[i]) will give a data frame with columns .chain, .iteration, x, i, j, and z, where the value of x on a given row is the value of the parameter x[i,j] on the given .chain and .iteration, and z is the value of z[i] on the given .chain and .iteration (so values of z will be repeated across rows with the same i, .chain, and .iteration but different j).

@jonah I love the idea of being able to pass tidy data back into bayesplot! Does bayesplot mostly take a matrix of samples? If so, the easiest approach might be for me to make a function to invert the spread_samples and gather_samples functions in tidybayes to turn tidy data frames back into matrices (or data frames with x[i]-ish names). I could imagine a nice workflow like:

m %>%
  spread_samples(...) %>%
  #some transformations of tidy samples %>%
  unspread_samples(...) %>%
  bayesplot::some_function(...)