Using cmdstanr in RStudio, I have fitted a model as follows:
fit <- eight_schools_cp$sample(
  data = data_list,
  chains = 1,
  refresh = 100
)
and can derive a dataframe containing the draws:
results_df <- fit$draws(format = "df")
colnames(results_df)
results_df is a draws_df object with the following column names:
[1] "lp__"       "mu"         "tau"        "theta"
[5] ".chain"     ".iteration" ".draw"
I am interested in plotting the draws of the logarithm of one variable, tau, against another, theta, showing divergent and non-divergent draws in different colours. To get my plot I try to generate a data.frame along the following lines:
divergent <- fit$sampler_diagnostics()[,1,2]
plot_df <- results_df %>%
  dplyr::select(theta, tau) %>%
  cbind(status = divergent)
But there is something wrong with the column names:
colnames(plot_df)
[1] "theta"         "tau"           "1.divergent__"
For plotting I was hoping to use:
ggplot(plot_df, aes(x = theta, y = log(tau), colour = status)) +
  geom_point()
but that doesn’t work because status is not recognised.
If, however, I use ...colour = 1.divergent__ that doesn’t work either, I assume because the double underscores are not accepted by ggplot.
I assume, having read https://cran.r-project.org/web/packages/posterior/vignettes/posterior.html, that I could change the name of 1.divergent__ back to something sensible, but that requires me to know that the variable names are being changed.
I’d like to know why the variable name I use in cbind is being changed, and how to get round the problems that that change creates.
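
For reference, here is a minimal base R sketch that reproduces the same renaming without cmdstanr. It rests on an assumption: that fit$sampler_diagnostics()[, 1, 2] still comes back as a three-dimensional array carrying chain and variable dimnames. The names diag_arr and draws and all the values are made up for illustration.

```r
# Hypothetical stand-in for fit$sampler_diagnostics()[, 1, 2]:
# a 4 x 1 x 1 array that still carries chain and variable dimnames.
diag_arr <- array(
  c(0, 1, 0, 0),
  dim = c(4, 1, 1),
  dimnames = list(iteration = NULL, chain = "1", variable = "divergent__")
)

# Made-up draws, only so that there is something to bind to.
draws <- data.frame(theta = c(0.5, 1.2, -0.3, 2.0),
                    tau   = c(1.0, 0.1, 2.5, 0.7))

# cbind() on a data frame calls data.frame(), which converts the array to a
# one-column data frame named from its dimnames; that name replaces "status".
bad <- cbind(draws, status = diag_arr)
names(bad)   # "theta" "tau" "1.divergent__"

# Dropping the dimensions first leaves a plain vector, so the tag is kept.
good <- cbind(draws, status = as.vector(diag_arr))
names(good)  # "theta" "tau" "status"

# A factor makes a convenient discrete colour variable for ggplot2.
good$status <- factor(good$status, levels = c(0, 1),
                      labels = c("non-divergent", "divergent"))
```

If that assumption holds for the real object, wrapping the diagnostics in as.vector() before cbind should keep the status name. Separately, a name such as 1.divergent__ starts with a digit, so R does not parse it as an ordinary variable name; it can only be referred to inside aes() when wrapped in backticks, which may be why the second attempt failed.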