$draws() method in CmdStanR is still slow

jsocolar · November 25, 2020, 6:28pm

I’m working with a model with about 2.5e+6 parameters. At the moment, I’ve run one chain for 500 iterations post-warmup using CmdStanR. Reading those back into R with mymodel$draws() is taking way longer than expected.

Similar issues have been mentioned on this forum before, and they were supposed to have been fixed in CmdStanR version 0.2, which replaced vroom with data.table::fread for reading in the CSVs. However, I’m now running CmdStanR version 0.2.1 from GitHub and I am not seeing the expected performance.

In particular, my_output <- my_model$draws() has been running for the past half hour with no sign of progress (the R session continues to blaze away on the CPU however). On the other hand, thedata <- data.table::fread(filepath, skip = 545, header = F) finishes in about 9 seconds, and posterior::as_draws_array(thedata) finishes in about 2.5 minutes.
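For anyone who wants to try the same workaround, it can be sketched roughly as follows. This is only a sketch under a few assumptions: `read_draws_fast` and `csv_path` are hypothetical names, stripping the comment lines with `grep` requires a Unix-like system (it stands in for the hard-coded `skip = 545`, which is specific to my file), and the file is assumed to hold a single chain.

```r
library(data.table)
library(posterior)

# Hypothetical helper: read one Stan CSV file and convert it to a draws_array.
read_draws_fast <- function(csv_path) {
  # Drop the '#' comment lines (CmdStan writes config, adaptation, and timing
  # info as comments) so that fread only sees the header row and the draws.
  # Assumption: `grep` is available on the system.
  dt <- data.table::fread(cmd = paste("grep -v '^#'", shQuote(csv_path)))
  # Each column becomes a variable and each row a draw of a single chain.
  posterior::as_draws_array(dt)
}

# Usage (the path is a placeholder, not a real file):
# my_output <- read_draws_fast("path/to/output.csv")
```

Unlike `header = F`, this keeps the header row, so the variable names survive the conversion.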

Any suggestions? I can post a public link to the csv output file if that’s helpful.

rok_cesnovar · November 25, 2020, 6:48pm

Thanks for the report.

Would you be so kind as to post an issue on GitHub with the number of parameters × iterations and the timings? Can you also try upgrading to the latest posterior version? We made some improvements there last week, but that will probably not help with everything. I think there are other issues with bind_draws, which we use a bit too much.

jsocolar · November 25, 2020, 6:49pm

Sure thing.

walkerharrison · November 25, 2020, 7:33pm

You’ve already mentioned circumventing the built-in methods with fread and I assume you’d prefer a solution that returns you to the cmdstanr functions…but since you linked one of my original posts, I’ll point you to some code I wrote to take the output files and mold them into the same data structure that’s returned by the extract function from rstan, which I’m most familiar/comfortable with.

If you do use it, I’d replace bind_rows with data.table::rbindlist, and double check that the skip logic still applies to your files.

jsocolar · November 25, 2020, 7:39pm

So while working on a reproducible version for GitHub, I noticed something strange:
Since I have no idea how to rebuild the R6 object with its associated $draws() method, I figured that I’d provide a public link to the CSV file and then reproduce using cmdstanr::read_cmdstan_csv() …

…but this ran very fast. It’s just $draws that’s giving me the problem.

Before I file the issue, I want to check two things:

1. This model was fit under an older version of cmdstanr, and so the fit object was returned by an older version of cmdstanr. There’s no way that’s the crux of the problem, is there? (Unfortunately it’s several days of computation to re-fit the model using the new cmdstanr.)
2. I have no idea how to rebuild the R6 object with its associated $draws() method, as would be necessary to provide a reproducible example. Is there a way to do this, or is it OK to forgo a reproducible example in this case?

jonah · November 25, 2020, 7:40pm

Yeah I like the structure from extract() too. We’re planning to have something similar available for CmdStanR via the posterior package, but it’s not implemented in posterior yet. In the meantime this function you wrote is super helpful! (I’m even tempted to consider adding it to CmdStanR until we have it available in posterior.)

jonah · November 25, 2020, 7:41pm

Unfortunately yeah this could be the issue, although I don’t know for sure. The methods associated with the fitted model objects are the ones defined at the time the fitted model object was created, even if you subsequently install a later version of the package. That’s just how the R6 objects work.
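A toy illustration of that behaviour with plain R6, in case it helps. The `Fit` class here is made up and has nothing to do with CmdStanR’s real classes, and redefining the class within one session is only a stand-in for installing a newer package version:

```r
library(R6)

# "Old package version": the class as it was when the object was created.
Fit <- R6Class("Fit", public = list(
  draws = function() "old implementation"
))
old_fit <- Fit$new()

# "New package version": the class is redefined with a different method body.
Fit <- R6Class("Fit", public = list(
  draws = function() "new implementation"
))
new_fit <- Fit$new()

# Methods were copied into each object when $new() was called, so the
# existing object keeps the method it was created with.
old_fit$draws()  # "old implementation"
new_fit$draws()  # "new implementation"
```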

jsocolar · November 25, 2020, 7:43pm

That’s probably it then. Which is actually fortunate, rather than unfortunate!

rok_cesnovar · November 25, 2020, 7:44pm

Did you run fit$draws() or did you filter out any particular parameter?

jsocolar · November 25, 2020, 7:44pm

Just fit$draws()

walkerharrison · November 25, 2020, 7:46pm

By all means add it or iterate on it if you’d like. It probably needs more rigorous testing beyond my one (1) model…

jonah · November 25, 2020, 7:47pm

It’s also possible that it’s still too slow even with the latest version, but if you fit the model with a version before we switched to fread or made other improvements then that would certainly also make it slower than the latest version.

jsocolar · November 25, 2020, 7:48pm

I’ll report back eventually if it’s still too slow, but at least read_cmdstan_csv is blazing fast, so there’s an obvious workaround for anybody in a similar boat.
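Roughly, the workaround looks like the sketch below. `draws_from_csv` is a hypothetical helper name, and the code assumes the CSV files written by CmdStan are still on disk:

```r
library(cmdstanr)

# Hypothetical helper: get the post-warmup draws straight from the CSV files,
# bypassing the $draws() method attached to an older fit object.
draws_from_csv <- function(csv_files) {
  csv_contents <- cmdstanr::read_cmdstan_csv(csv_files)
  csv_contents$post_warmup_draws
}

# Usage (assumes `fit` is an existing CmdStanMCMC object):
# draws <- draws_from_csv(fit$output_files())
```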

rok_cesnovar · November 25, 2020, 7:58pm

One more thing. Was this just 1 chain or multiple chains?

jsocolar · November 25, 2020, 7:58pm

Just 1.

yizhang · December 17, 2020, 11:22pm

Not to hijack the thread, but is there a way to rebuild the R6 object after reading the CSV in cmdstanr? @rok_cesnovar

jonah · December 17, 2020, 11:31pm

Not yet but I’d like to add that. Can you open an issue for this if you have a chance? I have to step away from my computer for a little while now but I can make an issue later if you can’t do it now.

yizhang · December 18, 2020, 12:52am

Done here.
