Hello, I recently updated my cmdstanr and rstan packages after the newly released version of Stan (both now say Stan version 2.35.0). My workflow has been to run a model using cmdstan and then using the rstan::read_stan_csv() function on the sampling output (the result of stan_model$sample) so that I could then use the rstan::extract() function to get an array of the posterior samples for my variables.
After updating, I now get the following error, which has appeared periodically on other updates of Stan and cmdstanr:
Error in if (max(save_warmup) == 0L) { :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In parse_stancsv_comments(comments) : NAs introduced by coercion
2: In parse_stancsv_comments(comments) : NAs introduced by coercion
3: In parse_stancsv_comments(comments) : NAs introduced by coercion
4: In parse_stancsv_comments(comments) : NAs introduced by coercion
I had hoped updating rstan from the experimental branch would have resolved this but I still get the error after updating to 2.35.0 in rstan.
Is there a way to resolve this issue? Can I hard-code or tweak the text of a function to resolve the “max_warmup” variable?
My workflow has been to run a model using cmdstan and then using the rstan::read_stan_csv() function on the sampling output (the result of stan_model$sample) so that I could then use the rstan::extract() function to get an array of the posterior samples for my variables.
That seems like an unnecessarily complex path if you just want an array of posterior samples. Is there a reason to prefer this approach over just using cmdstanr to read the samples? For example:
I recognize it’s complicated but stanfit objects give me arrays that are grouped by named variable (vectorized variable?). Not clear on the best nomenclature, but something where I could say posterior$x to get a whole table of x[1], x[2], x[3], … whereas fit$draws(format = “draws_array”) or fit$draws(format = “df”) gives me something where I would have to separately call posterior$x[1], posterior$x[2], posterior$x[3].
Most of my post-processing code is designed around navigating these arrays (e.g., grabbing the sample for a particular specimen or for specimens from a particular analytical grouping).
If there’s a straightforward way to recreate that behavior with a draws_array (or draws_df), I would happily do that! But from what I’ve been looking at, it’d take either a lot of overhauling or writing my own wrapper to recreate that (which is what I’m currently working on)
The resulting object is a list where the dimensions of each object are intuitive to work with. For example my stan model has a JxJ matrix called lambda then previously I could run (does not work in current version of cmdstanr):
post <- extract_samples(cmdstanr_fit)
post$lambda
And it would return an array with dimensions [ndraws, J, J]. I know that I can do this for named parameters individually via:
post <- as_draws_rvars(fit$draws())
lambda <- draws_of(post$lambda)
But having to wrap draws_of() around every variable from the posterior is really a downgrade compared to the workflow I (and many others) have used for years, where all the draws are stored in a single “post” object that can be readily accessed.
If the end goal is a named list where each element is an array of posterior samples for a given variable, you could implement the extract_samples function for cmdstanr and posterior as:
extract_samples <- function(fit_obj) {
vars <- fit_obj$metadata()$stan_variables
draws <- posterior::as_draws_rvars(fit_obj$draws())
lapply(vars, \(var_name){
posterior::draws_of(draws[[var_name]], with_chains = FALSE)
}) |> setNames(vars)
}