Tidybayes, posterior prediction, and cmdstanr

Apologies if this is inappropriate here, but I figured I would give it a shot.

I am trying to plot posterior prediction curves (or even the expectation of the prediction) much like the code here from @mjskay vignette on tidybayes:

#epred

mtcars %>%
  group_by(cyl) %>%
  data_grid(hp = seq_range(hp, n = 51)) %>%
  add_epred_rvars(m_mpg) %>%
  ggplot(aes(x = hp, color = ordered(cyl))) +
  stat_lineribbon(aes(ydist = .epred)) +
  geom_point(aes(y = mpg), data = mtcars) +
  scale_fill_brewer(palette = "Greys") +
  scale_color_brewer(palette = "Set2")

#pred
mtcars %>%
  group_by(cyl) %>%
  data_grid(hp = seq_range(hp, n = 101)) %>%
  add_predicted_rvars(m_mpg) %>%
  ggplot(aes(x = hp, color = ordered(cyl), fill = ordered(cyl))) +
  stat_lineribbon(aes(ydist = .prediction), .width = c(.95, .80, .50), alpha = 1/4) +
  geom_point(aes(y = mpg), data = mtcars) +
  scale_fill_brewer(palette = "Set2") +
  scale_color_brewer(palette = "Dark2")

I want to complete the same task with my own data but with posterior predictive draws from cmdstanr$generate_quantities. I know the documentation suggests using add_draws from interfaces outside of brms and rstanarm, but i cannot seem to get the syntax correct - I think it has something to do with ypred being an array of length N and I am having troubling summarizing this? I want something like this but I cannot seem to get spread_draws() and other functions to cooperate upstream.

dat%>%
  data_grid(agey = seq_range(agey, n = 1316)) %>%
  add_draws(...) %>%
  ggplot(aes(x = agey)) +
  stat_lineribbon(aes(ydist = .prediction), .width = c(.95, .80, .50), alpha = 1/4) +
  geom_point(aes(y = hdl), data = dat) +
  scale_fill_brewer(palette = "Set2") +
  scale_color_brewer(palette = "Dark2")
data{

  int N;
  vector[N] x;

}
parameters{

  real HDL_constant; // mean function
  real HDL_exponent; // mean function
  real HDL_offset; // mean function
  real HDL_noise_intercept; // sd function 
  real HDL_noise_slope; // sd function

}
generated quantities{

  array[N] real ypred = normal_rng(HDL_constant*pow(x,HDL_exponent) + HDL_offset, HDL_noise_intercept*(1 + x*HDL_noise_slope));

}

If youā€™re using generated quantities to produce ypred values conditional on the x variable input to the model, then the input to add_draws() should be two things:

  • A data frame with your predictor (x) in the same order as the values of ypred. Assuming the x variable you input into your cmdstanr model is also called x in your R environment, this could be data.frame(x = x).
  • A matrix where rows index draws from the model and the number of columns is equal to length(x), where each column contains the draws from ypred for the corresponding value of x. If your cmdstanr model is stored in a variable called fit, this would be fit$draws("ypred", format = "matrix").

Putting that together, you want something like:

data.frame(x = x) |>
  add_draws(fit$draws("ypred", format = "matrix")) |> 
  ...
1 Like

Hi @mjskay,

Thank you for this - I should have put this together. I get the following error below. Trying to deep dive purr::map2 and other, but cannot pinpoint yet the exact causeā€¦

data.frame(agey = dat$agey) %>%
  add_draws(fit_gq$draws("ypred", format = "matrix"))

Error in `map2()`:
ā„¹ In index: 2.
ā„¹ With name: .value.
Caused by error in `list_unchop()`:
! Can't combine `x[[1]]` <draws_matrix> and `x[[2]]` <draws_matrix>.
āœ– Some attributes are incompatible.
ā„¹ The author of the class should implement vctrs methods.
ā„¹ See <https://vctrs.r-lib.org/reference/faq-error-incompatible-attributes.html>.
Run `rlang::last_trace()` to see where the error occurred.

@mjskay Following up here for completeness -

It seems add_draws does not like the attributes attached to the draws_matrix coming from cmdstanr / posterior. I had to ā€˜transformā€™ to a base matrix in r (although not really?), eliminate the dimnames, and then I can get everything to run. It is not the most elegant solution, but when I went into some of the examples in the references (e.g., add_draws(posterior_epred(m_mpg, newdata = .))), the posterior_epred function returns a base r matrix with no dimnames. So i tried to recreate this with the draws from cmdstanr.

tdy <- fit_gq$draws(format = 'matrix')
tdy_mat <- as.matrix(tdy)
dimnames(tdy_mat) <- NULL


data.frame(agey = dat$agey) |>
  add_draws(tdy_mat) |> ggplot(aes(x = agey, y = HDL_L)) + 
  stat_lineribbon(aes(y = .value[,1])) + geom_point(data = dat) + 
  scale_fill_brewer(palette = "Reds") + xlab("Age[years]") + ylab("HDL[mm]") + 
  theme_classic() + theme(legend.position = "none")

2 Likes

Thanks for this!

Iā€™ve run into the same issue with having to manually set null dimnames, which makes creating processing functions clunky.

@mjskay Is the suggestion in the error ā€œThe author of the class should implement vctrs methodsā€ something that would assist with this processing?

This approach kind of gets around the issue:

Data |>
        add_draws(apply(PCR_Fit$draws("ypred"), 3, as.vector)) 
1 Like