Rstan::read_stan_csv throwing error with cmdstan models (versions 2.35)

Jesse_Wolfhagen · June 27, 2024, 7:13pm

Hello, I recently updated my cmdstanr and rstan packages after the newly released version of Stan (both now say Stan version 2.35.0). My workflow has been to run a model using cmdstan and then using the rstan::read_stan_csv() function on the sampling output (the result of stan_model$sample) so that I could then use the rstan::extract() function to get an array of the posterior samples for my variables.

rstan::read_stan_csv(model_samples$output_files())

After updating, I now get the following error, which has appeared periodically on other updates of Stan and cmdstanr:

Error in if (max(save_warmup) == 0L) { :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In parse_stancsv_comments(comments) : NAs introduced by coercion
2: In parse_stancsv_comments(comments) : NAs introduced by coercion
3: In parse_stancsv_comments(comments) : NAs introduced by coercion
4: In parse_stancsv_comments(comments) : NAs introduced by coercion

I had hoped updating rstan from the experimental branch would have resolved this but I still get the error after updating to 2.35.0 in rstan.

Is there a way to resolve this issue? Can I hard-code or tweak the text of a function to resolve the “max_warmup” variable?

andrjohns · June 27, 2024, 7:28pm

My workflow has been to run a model using cmdstan and then using the rstan::read_stan_csv() function on the sampling output (the result of stan_model$sample) so that I could then use the rstan::extract() function to get an array of the posterior samples for my variables.

That seems like an unnecessarily complex path if you just want an array of posterior samples. Is there a reason to prefer this approach over just using cmdstanr to read the samples? For example:

fit$draws(format = "draws_array")

Or one of the many other draws formats

Jesse_Wolfhagen · June 27, 2024, 7:35pm

I recognize it’s complicated but stanfit objects give me arrays that are grouped by named variable (vectorized variable?). Not clear on the best nomenclature, but something where I could say posterior$x to get a whole table of x[1], x[2], x[3], … whereas fit$draws(format = “draws_array”) or fit$draws(format = “df”) gives me something where I would have to separately call posterior$x[1], posterior$x[2], posterior$x[3].

Most of my post-processing code is designed around navigating these arrays (e.g., grabbing the sample for a particular specimen or for specimens from a particular analytical grouping).

If there’s a straightforward way to recreate that behavior with a draws_array (or draws_df), I would happily do that! But from what I’ve been looking at, it’d take either a lot of overhauling or writing my own wrapper to recreate that (which is what I’m currently working on)

andrjohns · June 27, 2024, 7:48pm

Yep that’s also trivial to do:

x <- fit$draws(variables="x")

Or:

draws <- fit$draws()

x <- posterior::subset_draws(draws, "x")

Jesse_Wolfhagen · June 27, 2024, 7:50pm

Ahhh, thank you so much!

Erik_Ringen · July 29, 2024, 8:06pm

I have also run into this error. In my case, I use read_stan_csv() so that I can get a “rethinking” style list of posterior samples:

extract_samples <- function(cmdstanrfit) {
  cmdstanrfit$output_files() |>
    rstan::read_stan_csv() |>
    rethinking::extract.samples()
}

The resulting object is a list where the dimensions of each object are intuitive to work with. For example my stan model has a JxJ matrix called lambda then previously I could run (does not work in current version of cmdstanr):

post <- extract_samples(cmdstanr_fit)
post$lambda

And it would return an array with dimensions [ndraws, J, J]. I know that I can do this for named parameters individually via:

post <- as_draws_rvars(fit$draws())
lambda <- draws_of(post$lambda)

But having to wrap draws_of() around every variable from the posterior is really a downgrade compared to the workflow I (and many others) have used for years, where all the draws are stored in a single “post” object that can be readily accessed.

andrjohns · July 30, 2024, 7:12am

If the end goal is a named list where each element is an array of posterior samples for a given variable, you could implement the extract_samples function for cmdstanr and posterior as:

extract_samples <- function(fit_obj) {
  vars <- fit_obj$metadata()$stan_variables
  draws <- posterior::as_draws_rvars(fit_obj$draws())
  
  lapply(vars, \(var_name){  
    posterior::draws_of(draws[[var_name]], with_chains = FALSE)
  }) |> setNames(vars)
}

Erik_Ringen · July 30, 2024, 1:03pm

This is exactly what I was looking for, thank you.

brock · November 10, 2024, 11:39am

To anyone else hitting this issue with legacy code and just want to run it without rewriting, I posted a guide here: `rstan` fails to read `cmdstanr` output .csv files · Issue #1133 · stan-dev/rstan · GitHub. @andrjohns if anything there seems different from what you would recommend, let me know and I’ll update the guide.

Topic		Replies	Views
Problem with read_stan_csv CmdStan	4	758	November 12, 2018
Problem using rstan::read_stan_csv RStan cmdstanr	9	1464	November 9, 2021
`rstan::read_stan_csv` Fails to Read CSV Output from cmdstan When `thin=` is Specified Developers	2	412	April 7, 2020
Error with read_stan_csv RStan	5	973	April 28, 2017
C++ exception rstan error, Stan model does not contain samples. Catalina 10.15.3, rstan 2.21.2 RStan fitting-issues	7	1009	November 9, 2020

Rstan::read_stan_csv throwing error with cmdstan models (versions 2.35)

Related topics