Cmdstanr fails to read its own csv files for large number of parameters

I am fitting a hierarchical hidden Markov model with cmdstanr. I am using data from 20 subjects, each with around 1000 trials (distributed across 20 blocks). The forward algorithm is implemented directly, hence I have to store about 20 x 20 x 100 variables for the forward variables (and an equal number for the backward and forward-backward smoothed ones, as I wish to estimate by-trial probabilities).

Fitting the model works just fine, and with a lower number of subjects/blocks/trials there is no issue whatsoever. However, when using the full dataset, cmdstanr cannot read back its own output files. In fact, it gets stuck in some obscure computation when trying to access any of the fitted model variables (such as with fit$draws(), and even when trying to use fit$save_object()). This is the case even if I use $optimize() instead of $sample(), even though the resulting output file from the fit (attached) has only a single line (but a lot of variables, obviously) and is only about 9 MB in size.

This is the cmdstanr output file from an optimize() run: https://dropfiles.org/asHh3fmL

Is this a known issue? Can I do anything to circumvent the issue? Currently, I am switching back to rstan.

  • Operating System: Linux (Debian 6.3.0-18)
  • CmdStan Version: 2.26.1
  • Compiler/Toolkit: GCC 6.3.0 20170516

Can you check the raw size of the CSVs?

It's just 9 MB for the optimize run. Here is the file: https://dropfiles.org/asHh3fmL

@ihrke I was able to reproduce this using the file you shared (thanks for that). The problem is happening when CmdStanR calls posterior::subset_draws() towards the end of cmdstanr::read_cmdstan_csv.

I made a branch that has a temporary fix for this when reading in the csv after optimization (I think ultimately we need to fix posterior::subset_draws()):

remotes::install_github("stan-dev/cmdstanr@temp-fix-optimize-csv")

This should get it to work with optimization (at least it allows me to use read_cmdstan_csv() successfully with the file you provided). Unfortunately I'm not sure where the problem is happening when you're using sampling, but if you share that csv I can probably track it down.

Edit: @ihrke I updated the branch to avoid using subset_draws() for sampling as well, so perhaps it will solve that for you too, but I'm not 100% sure.

Yeah, this is a known problem. A workaround for now is to use cmdstanr::read_cmdstan_csv() to read the csv files directly instead of using $draws().

Edit: @jonah notes that I'm wrong about this. Just for posterity, it is also currently the case that read_cmdstan_csv() works for sampling fits with large numbers of parameters, but $draws() does not.
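
For readers who want to try that workaround, here is a minimal sketch in R. The helper name read_fit_csv is made up for illustration; it only wraps fit$output_files() and cmdstanr::read_cmdstan_csv(). As the following replies point out, this did not help with the file in this thread before the fix.

```r
# Hypothetical helper (the name is an assumption, not part of cmdstanr):
# read the CmdStan CSV files directly instead of calling fit$draws().
read_fit_csv <- function(fit, variables = NULL) {
  # Paths of the CSV files that CmdStan wrote for this fit
  csv_files <- fit$output_files()
  # variables = NULL reads all variables; pass e.g. "gmu" to read only some
  cmdstanr::read_cmdstan_csv(csv_files, variables = variables)
}

# Usage (needs an existing fit object):
# csv <- read_fit_csv(fit, variables = "gmu")
# csv$post_warmup_draws   # present for sampling fits
# csv$point_estimates     # present for optimization fits
```

Restricting `variables` keeps the result small when only a few of the many parameters are needed.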

In this case it also seems to happen with read_cmdstan_csv() unfortunately.

Yeah, thanks for the reminder, I had forgotten about that. @rok_cesnovar We need to get back to that and figure something out.

Actually I'm now pretty sure that the problem with draws() is also related to this issue with posterior::subset_draws() that I just opened:

That's amazing, thank you! I will try it asap!

I tried using the branch with your fix and reading the csv now seems to work fine. I use the following convenience function to read the whole fit into memory (before storing it as an .RData file) and it is lightning fast (as opposed to taking ages before your fix).

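# Round-trip the fit through an RDS file so that everything is read into memory.
cmdstanr.resolve <- function(fit) {
  temp_rds_file <- tempfile(fileext = ".RDS")
  # save_object() reads the CSV contents into the object before saving it
  fit$save_object(file = temp_rds_file)
  readRDS(temp_rds_file)
}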

However, I cannot use the fit$summary() or fit$draws() methods on this object as I used to. The error I get is:

> mod_opt_probed.r$draws("gmu")
Error in `[.default`(private$draws_, , variables, drop = FALSE) : 
  subscript out of bounds
> mod_opt_probed.r$summary()
Error: Can't subset columns that don't exist.
x Columns `variable` and `mean` don't exist.
Run `rlang::last_error()` to see where the error occurred.

Oops, I may have broken draws() on that branch when I fixed the CSV reading. But I think this PR that we just merged in the posterior package will hopefully fix the problem. Can you try reinstalling both posterior and cmdstanr from master?

remotes::install_github("stan-dev/posterior")
remotes::install_github("stan-dev/cmdstanr")

and let me know if that fixes the problem? Sorry for the hassle, but thanks for helping us fix this!

Thanks, that seems to have fixed the problem! Thanks so much for your efforts and the incredibly fast fix!

That's great, thanks for trying that out. Glad it's working now!

@jsocolar I'm hopeful that with posterior::subset_draws() now fixed this will drastically improve the speed of $draws() with many parameters. I haven't done any rigorous testing yet though.
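
A rough way to check this without a fitted model is to time posterior::subset_draws() on a synthetic draws object with many variables. The sketch below is only an illustration: the function name time_subset, the variable names and the sizes are all made up, and it is not a rigorous benchmark.

```r
# Hypothetical timing helper (name and defaults are assumptions).
# Builds a draws_matrix with many variables and times selecting one of them.
time_subset <- function(n_draws = 10, n_vars = 50000) {
  # Synthetic draws: one column per variable, named like Stan array elements
  x <- matrix(
    rnorm(n_draws * n_vars),
    nrow = n_draws,
    dimnames = list(NULL, paste0("theta[", seq_len(n_vars), "]"))
  )
  draws <- posterior::as_draws_matrix(x)
  # Time only the subsetting step, which was the bottleneck discussed here
  system.time(posterior::subset_draws(draws, variable = "theta[1]"))
}

# Usage (requires the posterior package):
# time_subset()
# time_subset(n_vars = 250000)
```

Comparing the timing before and after updating posterior should show whether the fix is picked up.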

@jonah, can I test by updating posterior without rebuilding the R6 object, or do I need to rebuild after updating?

My hunch is that you'll need to rebuild the R6 object. Unfortunately, if we update a method, it doesn't update the methods associated with existing R6 objects.

However, there may be an alternative: do you by any chance still have the CSV files or were those just written to temp files? If you still have the CSV files associated with the old R6 object then you can recreate the R6 object without having to rerun the model using as_cmdstan_fit(paths_to_csv_files). Then the resulting fit object would use the latest draws method.
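
A minimal sketch of that approach, assuming the old fit object can still report its CSV paths. The helper name persist_and_rebuild and the directory argument are hypothetical; the cmdstanr calls used are $save_output_files(), $output_files() and as_cmdstan_fit().

```r
# Hypothetical helper (name is an assumption): move the CSV files out of the
# temporary directory, then recreate the fit object from them.
persist_and_rebuild <- function(fit, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  # Moves the CSV files to `dir` and updates the paths stored in `fit`
  fit$save_output_files(dir = dir)
  csv_files <- fit$output_files()
  # The new object is created by the currently installed cmdstanr,
  # so it carries the latest methods; the model is not rerun
  cmdstanr::as_cmdstan_fit(csv_files)
}

# Usage (needs an existing fit object):
# fit_new <- persist_and_rebuild(fit, dir = "stan_output")
# fit_new$draws("gmu")
```

Saving the CSV files to a permanent directory also means the fit can be rebuilt again after a later package update.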

Yeah, that's what I meant by rebuild :)
I'll go ahead and give it a crack.

Cool, thanks for trying. You might also try using format = "draws_list" when running as_cmdstan_fit(). According to @rok_cesnovar that's the most efficient format to use if there are a ton of parameters. (That will just affect how the draws are stored internally. If you then use draws() it will use the regular default of "draws_array" unless you specify a different format.)
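
For illustration, that could look like the sketch below. The wrapper name rebuild_as_draws_list is made up, and csv_files stands for whatever paths your own CSV files have.

```r
# Hypothetical wrapper (name is an assumption): recreate the fit object with
# the draws stored internally in the "draws_list" format.
rebuild_as_draws_list <- function(csv_files) {
  cmdstanr::as_cmdstan_fit(csv_files, format = "draws_list")
}

# Usage (csv_files is a character vector of paths to CmdStan CSV files):
# fit2 <- rebuild_as_draws_list(csv_files)
# fit2$draws("gmu")                         # returned in the default format
# fit2$draws("gmu", format = "draws_list")  # ask for a specific format
```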

My 3yo arrived home so I just got around to this.

$draws() is now blazing fast on a fit where it was previously unusable (250K parameters, now takes about 30 seconds, previously I killed it after 90 minutes).

Awesome, that's great news! Thanks for testing it out for us.