Unwanted variable name when using `cbind` with a `draws_df` object

jgcolman · October 7, 2023, 2:11pm

Using cmdstanr in RStudio, I have fitted a model as follows:

fit <- eight_schools_cp$sample(
  data = data_list, 
  chains = 1, 
  refresh = 100  
)

and can derive a dataframe containing the draws:

results_df <- fit$draws(format = "df")
colnames(results_df)

results_df is a draws_df object with the following column names:

[1] "lp__" "mu" "tau" "theta"
[5] ".chain" ".iteration" ".draw"

I am interested in plotting the draws of the logarithm of one variable, tau, against another, theta, showing divergent and non-divergent draws in different colours. To get my plot, I try to generate a data.frame along the following lines:

divergent <- fit$sampler_diagnostics()[,1,2]
plot_df <- results_df %>%
  dplyr::select(theta, tau) %>%
  cbind(status = divergent)

But there is something wrong with the column names:

colnames(plot_df)

[1] "theta" "tau" "1.divergent__"
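The renaming can be reproduced in base R, without Stan. This is a sketch under assumptions: it takes `divergent` to be a 3-D array (iterations x chains x variables) whose chain and variable dimensions are named `1` and `divergent__`, which is what the `1.divergent__` name points to, and the values for `theta`, `tau` and the divergence flags are made up. Because one argument is a data frame, `cbind()` dispatches to `cbind.data.frame()`, which calls `data.frame()`. That flattens the array into a single column named by pasting its dimnames together, and for a one-column argument that already carries a column name, `data.frame()` keeps that name and ignores the argument name `status`.

```r
# Hypothetical stand-in for fit$sampler_diagnostics()[, 1, 2]:
# 5 iterations x 1 chain x 1 variable, with posterior-style dimnames.
divergent <- array(
  c(0, 0, 1, 0, 0),
  dim = c(5, 1, 1),
  dimnames = list(iteration = NULL, chain = "1", variable = "divergent__")
)

# Made-up draws standing in for the selected theta and tau columns.
set.seed(1)
draws <- data.frame(theta = rnorm(5), tau = rexp(5))

# cbind() on a data frame calls data.frame(..., check.names = FALSE).
# as.data.frame() turns the 3-D array into one column named
# "1.divergent__" (chain and variable dimnames pasted with "."), and
# data.frame() prefers that existing name over the argument name `status`.
plot_df <- cbind(draws, status = divergent)
print(colnames(plot_df))  # "theta" "tau" "1.divergent__"
```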

For plotting I was hoping to use:

ggplot(plot_df, aes(x = theta, y = log(tau), colour = status)) +
  geom_point()

but that doesn’t work because status is not recognised.
If, however, I use `colour = 1.divergent__`, that doesn’t work either, because ggplot does not accept that name as written.

I assume, having read https://cran.r-project.org/web/packages/posterior/vignettes/posterior.html, that I could change the name of 1.divergent__ back to something sensible, but that requires me to know that the variable names are being changed.
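The name `1.divergent__` is awkward in `aes()` because it starts with a digit, which makes it a non-syntactic R name; the double underscore on its own is fine. Below is a minimal base-R sketch of three workarounds, again using a hypothetical 3-D array as a stand-in for `fit$sampler_diagnostics()[,1,2]` and made-up `theta`/`tau` values: quote the existing name with backticks (in ggplot2 that would be ``aes(colour = `1.divergent__`)``), rename the column after binding, or drop the array's dimensions with `as.vector()` before binding so that the `status` name is kept.

```r
# Hypothetical stand-in for fit$sampler_diagnostics()[, 1, 2]
divergent <- array(
  c(0, 0, 1, 0, 0),
  dim = c(5, 1, 1),
  dimnames = list(iteration = NULL, chain = "1", variable = "divergent__")
)
set.seed(1)
draws <- data.frame(theta = rnorm(5), tau = rexp(5))  # made-up draws

# "1.divergent__" is not syntactic because of the leading digit;
# "divergent__" is syntactic, so the underscores are not the problem.
make.names("1.divergent__")  # "X1.divergent__"
make.names("divergent__")    # "divergent__"

# Option 1: keep the generated name and quote it with backticks.
bad <- cbind(draws, status = divergent)
n_divergent <- sum(bad$`1.divergent__`)

# Option 2: rename the column after binding.
renamed <- bad
names(renamed)[names(renamed) == "1.divergent__"] <- "status"

# Option 3: strip dims/dimnames first; a plain vector has no column
# name of its own, so data.frame() uses the argument name `status`.
good <- cbind(draws, status = as.vector(divergent))
print(colnames(good))  # "theta" "tau" "status"
```

The third option is arguably the least fragile, since it does not rely on knowing in advance which name `data.frame()` will generate.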

I’d like to know why the variable name I use in cbind is being changed, and how to get round the problems that that change creates.

avehtari · October 7, 2023, 5:12pm

The sampler diagnostics are draws, too, so you can get them as a draws_df object.
You can use subset_draws and bind_draws to handle posterior objects so that the returned objects are still posterior objects.

So you could do something like:

library(posterior)  # provides subset_draws() and bind_draws()

divergent <- subset_draws(fit$sampler_diagnostics(format = "df"),
                          variable = "divergent__")
plot_df <- results_df |>
  subset_draws(variable = c("theta", "tau")) |>
  bind_draws(divergent)

EDIT: changed the code to match the original code more closely
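A self-contained sketch of the same approach, assuming the posterior package is installed. Since a real `fit` object needs CmdStan, it builds small `draws_df` objects with `draws_df()` from simulated values as stand-ins for `fit$draws(format = "df")` and `fit$sampler_diagnostics(format = "df")`. The point it illustrates is that `subset_draws()` and `bind_draws()` return a `draws_df` whose variable names are kept exactly as given.

```r
library(posterior)

set.seed(1)
n <- 100

# Simulated stand-ins (1 chain, 100 draws); all values are made up.
results_df <- draws_df(theta = rnorm(n), tau = rexp(n), mu = rnorm(n))
diagnostics_df <- draws_df(
  divergent__ = rbinom(n, 1, 0.05),
  treedepth__ = sample(3:5, n, replace = TRUE)
)

# Keep only the diagnostic of interest; the result is still a draws_df.
divergent <- subset_draws(diagnostics_df, variable = "divergent__")

# Bind along variables; names are not mangled as with cbind().
plot_df <- results_df |>
  subset_draws(variable = c("theta", "tau")) |>
  bind_draws(divergent)

print(variables(plot_df))  # "theta" "tau" "divergent__"
```

Because `divergent__` is a syntactic name, it can be used directly in `aes()`, for example `colour = factor(divergent__)`; wrapping the 0/1 values in `factor()` is a suggestion to get two discrete colours instead of a continuous scale.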

jgcolman · October 8, 2023, 1:03pm

Many thanks, Aki. That works really well.
