Tidybayes, posterior prediction, and cmdstanr

cwolfe · May 7, 2024, 5:47pm

Apologies if this is inappropriate here, but I figured I would give it a shot.

I am trying to plot posterior prediction curves (or even the expectation of the prediction) much like the code here from @mjskay vignette on tidybayes:

#epred

mtcars %>%
  group_by(cyl) %>%
  data_grid(hp = seq_range(hp, n = 51)) %>%
  add_epred_rvars(m_mpg) %>%
  ggplot(aes(x = hp, color = ordered(cyl))) +
  stat_lineribbon(aes(ydist = .epred)) +
  geom_point(aes(y = mpg), data = mtcars) +
  scale_fill_brewer(palette = "Greys") +
  scale_color_brewer(palette = "Set2")

#pred
mtcars %>%
  group_by(cyl) %>%
  data_grid(hp = seq_range(hp, n = 101)) %>%
  add_predicted_rvars(m_mpg) %>%
  ggplot(aes(x = hp, color = ordered(cyl), fill = ordered(cyl))) +
  stat_lineribbon(aes(ydist = .prediction), .width = c(.95, .80, .50), alpha = 1/4) +
  geom_point(aes(y = mpg), data = mtcars) +
  scale_fill_brewer(palette = "Set2") +
  scale_color_brewer(palette = "Dark2")

I want to complete the same task with my own data but with posterior predictive draws from cmdstanr$generate_quantities. I know the documentation suggests using add_draws from interfaces outside of brms and rstanarm, but i cannot seem to get the syntax correct - I think it has something to do with ypred being an array of length N and I am having troubling summarizing this? I want something like this but I cannot seem to get spread_draws() and other functions to cooperate upstream.

dat%>%
  data_grid(agey = seq_range(agey, n = 1316)) %>%
  add_draws(...) %>%
  ggplot(aes(x = agey)) +
  stat_lineribbon(aes(ydist = .prediction), .width = c(.95, .80, .50), alpha = 1/4) +
  geom_point(aes(y = hdl), data = dat) +
  scale_fill_brewer(palette = "Set2") +
  scale_color_brewer(palette = "Dark2")

data{

  int N;
  vector[N] x;

}
parameters{

  real HDL_constant; // mean function
  real HDL_exponent; // mean function
  real HDL_offset; // mean function
  real HDL_noise_intercept; // sd function 
  real HDL_noise_slope; // sd function

}
generated quantities{

  array[N] real ypred = normal_rng(HDL_constant*pow(x,HDL_exponent) + HDL_offset, HDL_noise_intercept*(1 + x*HDL_noise_slope));

}

mjskay · May 8, 2024, 3:02am

If you’re using generated quantities to produce ypred values conditional on the x variable input to the model, then the input to add_draws() should be two things:

A data frame with your predictor (x) in the same order as the values of ypred. Assuming the x variable you input into your cmdstanr model is also called x in your R environment, this could be data.frame(x = x).
A matrix where rows index draws from the model and the number of columns is equal to length(x), where each column contains the draws from ypred for the corresponding value of x. If your cmdstanr model is stored in a variable called fit, this would be fit$draws("ypred", format = "matrix").

Putting that together, you want something like:

data.frame(x = x) |>
  add_draws(fit$draws("ypred", format = "matrix")) |> 
  ...

cwolfe · May 8, 2024, 3:23am

Hi @mjskay,

Thank you for this - I should have put this together. I get the following error below. Trying to deep dive purr::map2 and other, but cannot pinpoint yet the exact cause…

data.frame(agey = dat$agey) %>%
  add_draws(fit_gq$draws("ypred", format = "matrix"))

Error in `map2()`:
ℹ In index: 2.
ℹ With name: .value.
Caused by error in `list_unchop()`:
! Can't combine `x[[1]]` <draws_matrix> and `x[[2]]` <draws_matrix>.
✖ Some attributes are incompatible.
ℹ The author of the class should implement vctrs methods.
ℹ See <https://vctrs.r-lib.org/reference/faq-error-incompatible-attributes.html>.
Run `rlang::last_trace()` to see where the error occurred.

cwolfe · May 8, 2024, 2:19pm

@mjskay Following up here for completeness -

It seems add_draws does not like the attributes attached to the draws_matrix coming from cmdstanr / posterior. I had to ‘transform’ to a base matrix in r (although not really?), eliminate the dimnames, and then I can get everything to run. It is not the most elegant solution, but when I went into some of the examples in the references (e.g., add_draws(posterior_epred(m_mpg, newdata = .))), the posterior_epred function returns a base r matrix with no dimnames. So i tried to recreate this with the draws from cmdstanr.

tdy <- fit_gq$draws(format = 'matrix')
tdy_mat <- as.matrix(tdy)
dimnames(tdy_mat) <- NULL


data.frame(agey = dat$agey) |>
  add_draws(tdy_mat) |> ggplot(aes(x = agey, y = HDL_L)) + 
  stat_lineribbon(aes(y = .value[,1])) + geom_point(data = dat) + 
  scale_fill_brewer(palette = "Reds") + xlab("Age[years]") + ylab("HDL[mm]") + 
  theme_classic() + theme(legend.position = "none")

JLC · May 19, 2024, 2:44pm

Thanks for this!

I’ve run into the same issue with having to manually set null dimnames, which makes creating processing functions clunky.

@mjskay Is the suggestion in the error “The author of the class should implement vctrs methods” something that would assist with this processing?

This approach kind of gets around the issue:

Data |>
        add_draws(apply(PCR_Fit$draws("ypred"), 3, as.vector))

Topic		Replies	Views
Request to help understand an apparent discrepancy between tidybayes::add_predicted_draws and brms::posterior_predict brms brms	2	685	January 20, 2021
Associate posterior probablity checks with covariates - using cmdstanR Modeling	5	498	August 10, 2021
Function like rstanarm::posterior_predict() for models written in Rstan? Interfaces rstan	4	402	November 4, 2020
Prediction for a new observation Modeling	8	1781	March 16, 2018
Problem with predicting Posterior Predictive Distributions Modeling rstan , techniques	2	1134	October 22, 2021

Tidybayes, posterior prediction, and cmdstanr

Related topics