Slow cmdstanr/posterior vs. rstan summary

Hallo everybody.

I noticed that for big models cmdstanr seemed a lot slower in summarizing than rstan, so I run a test on a dummy model with N=100, 4 chains and 4000 warmup and 4000 sampling iterations and saved the result in .csv files.

data {
  int<lower=0> N;
}

parameters {
  vector[N] big_mat[N];
}

model {
  for (n in 1:N)
    big_mat[n] ~ std_normal();
}

It seems that the posterior::summarise_draws function is very slow.
I ran the following comparisons:

## Read-ins
> system.time({stanfit_by_csvs <- read_stan_csv(grep("\\.csv", dir(), value = TRUE))})
   user  system elapsed 
 70.012   0.583  70.583 
> system.time({cmdstan_fit_by_csvs <- as_cmdstan_fit(grep("\\.csv", dir(), value = TRUE))})
   user  system elapsed 
 14.876   3.111  15.497 
> system.time({read_cmdstan_csv_input <- read_cmdstan_csv(grep("\\.csv", dir(), value = TRUE))})
   user  system elapsed 
 15.340   2.615  14.551 

Rstan is slower to read in the files and create a fit object, but for the summary it’s much faster:

## Summaries
> system.time({rstan_summary <- rstan::summary(stanfit_by_csvs)})
   user  system elapsed 
 56.724   0.134  58.782 
> system.time({cmdstanr_summary <- cmdstan_fit_by_csvs$summary()})
   user  system elapsed 
232.343   0.158 232.512 
> system.time({posterior_summarise_draws <- posterior::summarise_draws(read_cmdstan_csv_input$post_warmup_draws)})
   user  system elapsed 
231.480   0.064 231.503 

I made a “manual” version and it’s a bit faster:

> system.time({manual_summary <- posterior::as_draws_df(read_cmdstan_csv_input$post_warmup_draws) %>%
+     pivot_longer(-c(.iteration, .chain, .draw), names_to = "variable") %>%
+     group_by(variable) %>%
+     summarize(
+       across(
+         value,
+         list(
+           mean = mean,
+           median = median,
+           sd = sd,
+           mad = mad,
+           q5 = function(x) quantile(x, 0.05),
+           q95 = function(x) quantile(x, 0.95),
+           rhat = posterior::rhat,
+           ess_bulk = posterior::ess_bulk,
+           ess_tail = posterior::ess_tail
+         ),
+         .names = "{.fn}"
+       )
+     )})
   user  system elapsed 
159.625   1.732 161.333 

So I’m wondering:

  • What’s going on?
  • How can rstan be so much faster?
    • Also: When rstan calculates the summaries, it (to me) magically stores the result somewhere and all subsequent calculations (including diagnostics) are very fast. How does that work and why can’t cmdstanr do that?
  • Is there a way to speed up the manual cmdstanr version (for example when I don’t want to work with rstan for some reason)?
  • Basically, what is the fastest way to get the summary statistics for a set of parameters of interest AND ALSO the sampler diagnostics?

One more strange thing:
All versions seem to differ in the calculated ESS (I didn’t put the cmdstanr_summary version here since it’s the same as the posterior_summarise_draws one). What is going on there??

> manual_summary %>% arrange(match(variable, posterior_summarise_draws$variable)) %>% select(variable, ess_bulk, ess_tail) %>% print(n = 5)
# A tibble: 10,001 × 3
  variable     ess_bulk ess_tail
  <chr>           <dbl>    <dbl>
1 lp__            4523.    6340.
2 big_mat[1,1]   38982.   10783.
3 big_mat[2,1]   37777.   10699.
4 big_mat[3,1]   36204.   10589.
5 big_mat[4,1]   36468.   10457.
# … with 9,996 more rows
> posterior_summarise_draws %>% select(variable, ess_bulk, ess_tail) %>% print(n = 5)
# A tibble: 10,001 × 3
  variable     ess_bulk ess_tail
  <chr>           <dbl>    <dbl>
1 lp__            4534.    6347.
2 big_mat[1,1]   39649.   10829.
3 big_mat[2,1]   38682.   10710.
4 big_mat[3,1]   36529.   10628.
5 big_mat[4,1]   36782.   10448.
# … with 9,996 more rows
> rstan_summary$summary %>% as_tibble(rownames = "variable") %>% arrange(match(variable, posterior_summarise_draws$variable)) %>% select(variable, n_eff) %>% print(n = 5)
# A tibble: 10,001 × 2
  variable      n_eff
  <chr>         <dbl>
1 lp__          4516.
2 big_mat[1,1] 39265.
3 big_mat[2,1] 38295.
4 big_mat[3,1] 36844.
5 big_mat[4,1] 36501.
# … with 9,996 more rows

@mitzimorris

I know some of the CmdStanR details so I am going to respond for those questions.

The $summary() function is just a wrapper to posterior::summarise_draws(). Storing the summaries is left to the users, so if you need them multiple times, just call

summary <- fit$summary()

and then access the summary object later on. Any subsequent call will be quick as it will just access the previously created. We are aware that some of the summaries for larger numbers of parameters are not the most efficient yet (for example Summarise draws is probibitively slow for large number of parameters · Issue #98 · stan-dev/posterior · GitHub).

Posterior package is implemented entirely in R, while rstan uses C++ code to compute some of the summaries (ESS and RHat for example), which is obviously faster.

As for the differences in ESS calculation, I am not really an expert there, but as far as I know, the algorithms in posterior are newer and up to date to the latest published research.

cmdstan_summary() just calls the C++ version available in CmdStan (bin/stansummary), so that is what is going on there.

1 Like

Thank you very much (again) for the quick answer. That answered all my question. So in summary, for now it is quicker to do the “manual” version, especially if I have less parameters of interest than are part of the original output. I just saw this post - Summary method slow for large models - #13 by zcai - addressing almost the same issue, but for some reason I didn’t find it earlier.

1 Like

Oh, if you only need summaries for a few parameteres, you can also specify which of those you need in the summary call:

fit$summary(c(“p1”, “p2))
3 Likes

Note that posterior::summarise_draws() has a .cores argument that will speed things up on a multi-core system.

4 Likes