$draws() method in CmdStanR is still slow

I’m working with a model with about 2.5e+6 parameters. At the moment, I’ve run one chain for 500 iterations post-warmup using CmdStanR. Reading those back into R with mymodel$draws() is taking way longer than expected.

Similar issues have been mentioned on this forum before, and they are supposed to have been fixed with CmdStanR version 0.2 which replaced vroom with data.table::fread for reading in the CSVs. However, I’m now running CmdStanR version 0.2.1 from GitHub and I am not seeing the expected performance.

In particular, my_output <- my_model$draws() has been running for the past half hour with no sign of progress (the R session continues to blaze away on the CPU however). On the other hand, thedata <- data.table::fread(filepath, skip = 545, header = F) finishes in about 9 seconds, and posterior::as_draws_array(thedata) finishes in about 2.5 minutes.

Any suggestions? I can post a public link to the CSV output file if that’s helpful.

Thanks for the report.

Would you be so kind as to post an issue on GitHub with the number of parameters × iterations and the timings? Can you also try upgrading to the latest posterior version? We made some improvements there last week, but that will probably not help with everything. I think there are other issues with bind_draws, which we use a bit too much.

Sure thing.

You’ve already mentioned circumventing the built-in methods with fread and I assume you’d prefer a solution that returns you to the cmdstanr functions…but since you linked one of my original posts, I’ll point you to some code I wrote to take the output files and mold them into the same data structure that’s returned by the extract function from rstan, which I’m most familiar/comfortable with.

If you do use it, I’d replace bind_rows with data.table::rbindlist, and double check that the skip logic still applies to your files.
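For anyone checking the skip logic on their own files, here is a minimal sketch of an alternative that avoids a hard-coded skip value by dropping every comment line before parsing. It is not the code referred to above; the tiny CSV written to a temporary file is a made-up stand-in for real CmdStan output, and reading the whole file with readLines is an assumption that may be too memory-hungry for very large outputs.

```r
library(data.table)

# Hypothetical stand-in for a Stan CSV: comment lines appear before the
# header, between the header and the draws, and after the draws.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "# model = example",
  "lp__,theta",
  "# Adaptation terminated",
  "-1.5,0.25",
  "-1.2,0.75",
  "# Elapsed Time: 0.01 seconds"
), csv_path)

# Drop comment and empty lines, then let fread parse what is left.
lines <- readLines(csv_path)
keep <- nzchar(lines) & !startsWith(lines, "#")
draws_dt <- fread(text = lines[keep])

print(draws_dt)
```

The resulting data.table keeps the header row as column names, so it does not depend on knowing how many comment lines a given CmdStan version writes.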

So while working on a reproducible version for GitHub, I noticed something strange.
Since I have no idea how to rebuild the R6 object with its associated $draws() method, I figured that I’d provide a public link to the CSV file and then reproduce using cmdstanr::read_cmdstan_csv().

…but this ran very fast. It’s just $draws() that’s giving me the problem.

Before I file the issue, I want to check two things:

  1. This model was fit under an older version of cmdstanr, and so the fit object was returned by an older version of cmdstanr. There’s no way that’s the crux of the problem, is there? (Unfortunately it’s several days of computation to re-fit the model using the new cmdstanr.)
  2. I have no idea how to rebuild the R6 object with its associated $draws() method as would be necessary to provide a reproducible example. Is there a way to do this, or is it ok to forego a reproducible example in this case?

Yeah I like the structure from extract() too. We’re planning to have something similar available for CmdStanR via the posterior package, but it’s not implemented in posterior yet. In the meantime this function you wrote is super helpful! (I’m even tempted to consider adding it to CmdStanR until we have it available in posterior.)

Unfortunately, yeah, this could be the issue, although I don’t know for sure. The methods associated with the fitted model objects are the ones defined at the time the fitted model object was created, even if you subsequently install a later version of the package. That’s just how R6 objects work.
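To illustrate the point about R6, here is a small sketch using the R6 package directly rather than CmdStanR. The class name `Fit`, the helper `make_generator`, and the returned strings are all made up for the example; redefining the generator stands in for installing a newer package version.

```r
library(R6)

# Hypothetical generator factory: each call mimics one "package version".
make_generator <- function(reader) {
  R6Class("Fit", public = list(
    draws = function() reader
  ))
}

# "Old version": create the generator and an object from it.
Fit <- make_generator("old reader")
old_fit <- Fit$new()

# "Upgrade": the generator is replaced, but old_fit already carries
# its own copy of the methods it was created with.
Fit <- make_generator("new reader")
new_fit <- Fit$new()

print(old_fit$draws())
print(new_fit$draws())
```

The object created before the redefinition keeps calling the method it was built with, which is why a fit object created under an older release would not pick up a faster reader.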

That’s probably it then. Which is actually fortunate, rather than unfortunate!

Did you run fit$draws() or did you filter out any particular parameter?

Just fit$draws()

By all means add it or iterate on it if you’d like. It probably needs some more rigorous testing beyond my one (1) model…

It’s also possible that it’s still too slow even with the latest version, but if you fit the model with a version before we switched to fread or made other improvements then that would certainly also make it slower than the latest version.

I’ll report back eventually if it’s still too slow, but at least read_cmdstan_csv is blazing fast, so there’s an obvious workaround for anybody in a similar boat.
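For anybody in that boat, the workaround might look roughly like the sketch below. The wrapper name `draws_from_csv` is made up, and the `post_warmup_draws` element name is an assumption about what read_cmdstan_csv returns in the version installed; check `names()` on the result if it differs.

```r
# Hypothetical helper: read the draws straight from the CSV files with the
# currently installed cmdstanr, bypassing the methods stored on an old fit object.
draws_from_csv <- function(csv_files) {
  out <- cmdstanr::read_cmdstan_csv(csv_files)
  # Assumed element name; inspect names(out) if this returns NULL.
  out$post_warmup_draws
}

# Usage, assuming `fit` is an existing fitted model object:
# my_draws <- draws_from_csv(fit$output_files())
```

Because the function is called through the installed package rather than through the stored object, it uses whatever CSV reader the current release ships with.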

One more thing. Was this just 1 chain or multiple chains?

Just 1.

Not to hijack the thread, but is there a way to rebuild the R6 object after reading the CSV in cmdstanr? @rok_cesnovar

Not yet but I’d like to add that. Can you open an issue for this if you have a chance? I have to step away from my computer for a little while now but I can make an issue later if you can’t do it now.

Done here.
