I’m working with a model with about 2.5 million (2.5e+6) parameters. So far I’ve run one chain for 500 post-warmup iterations using CmdStanR. Reading those draws back into R with my_model$draws() is taking far longer than expected.
In particular, my_output <- my_model$draws() has been running for the past half hour with no sign of progress (the R session is still blazing away on the CPU, however). On the other hand, thedata <- data.table::fread(filepath, skip = 545, header = F) finishes in about 9 seconds, and posterior::as_draws_array(thedata) finishes in about 2.5 minutes.
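For anyone who wants the same fread workaround without a file-specific skip = 545, here is a minimal sketch. It assumes a single-chain CmdStan CSV in which every non-data line (configuration, adaptation info, timing) starts with "#"; the helper name read_stan_csv_fast is hypothetical and not part of cmdstanr or posterior. Variable names keep CmdStan’s raw form (e.g. theta.1) rather than cmdstanr’s theta[1]. For very large files, readLines() itself is costly; on Unix-like systems fread(cmd = ...) with grep -v '^#' may be a faster alternative, though that is untested here.

```r
library(data.table)
library(posterior)

# Hypothetical helper (not part of cmdstanr or posterior): read ONE chain's
# CmdStan CSV into a draws_array without a file-specific skip count.
read_stan_csv_fast <- function(path) {
  lines <- readLines(path)
  # Assumption: every non-data line (config, adaptation info, timing)
  # starts with "#", so dropping them leaves the header row plus the draws.
  body <- lines[!startsWith(lines, "#")]
  dt <- fread(text = body)                   # header row is auto-detected
  df_draws <- as_draws_df(as.data.frame(dt)) # no .chain column -> one chain
  as_draws_array(df_draws)                   # iterations x chains x variables
}
```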
Any suggestions? I can post a public link to the CSV output file if that’s helpful.
Would you be so kind as to open an issue on GitHub with the number of parameters × iterations and the timings? Could you also try upgrading to the latest posterior version? We made some improvements there last week, but they probably won’t fix everything. I think there are other issues with bind_draws, which we use a bit too much.
You’ve already mentioned circumventing the built-in methods with fread, and I assume you’d prefer a solution that gets you back to the cmdstanr functions…but since you linked one of my original posts, I’ll point you to some code I wrote that takes the output files and molds them into the same data structure returned by rstan’s extract function, which is the one I’m most familiar and comfortable with.
If you do use it, I’d replace dplyr::bind_rows with data.table::rbindlist and double-check that the skip logic still applies to your files.
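A minimal sketch of that swap, assuming one CSV per chain in which comment lines start with "#". The helper name read_chains is hypothetical. rbindlist()’s idcol argument records which list element (here, which file, i.e. which chain) each row came from, which saves adding a chain column by hand.

```r
library(data.table)

# Hypothetical helper: read one CmdStan CSV per chain and stack them with
# data.table::rbindlist() rather than dplyr::bind_rows().
read_chains <- function(paths) {
  per_chain <- lapply(unname(paths), function(p) {
    lines <- readLines(p)
    fread(text = lines[!startsWith(lines, "#")])  # drop "#" comment lines
  })
  # use.names = TRUE matches columns by name; because the list is unnamed,
  # idcol adds an integer .chain column (1, 2, ...) giving each row's file.
  rbindlist(per_chain, use.names = TRUE, idcol = ".chain")
}
```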
So while working on a reproducible version for GitHub, I noticed something strange:
Since I have no idea how to rebuild the R6 object with its associated $draws() method, I figured I’d provide a public link to the CSV file and reproduce the problem using cmdstanr::read_cmdstan_csv()…
…but this ran very fast. It’s only $draws() that’s giving me trouble.
Before I file the issue, I want to check two things:
This model was fit with an older version of cmdstanr, so the fit object was created by that older version. There’s no way that’s the crux of the problem, is there? (Unfortunately, re-fitting the model with the new cmdstanr would take several days of computation.)
I have no idea how to rebuild the R6 object with its associated $draws() method, as would be necessary for a reproducible example. Is there a way to do this, or is it OK to forgo a reproducible example in this case?
Yeah, I like the structure from extract() too. We’re planning to have something similar available for CmdStanR via the posterior package, but it’s not implemented in posterior yet. In the meantime, this function you wrote is super helpful! (I’m even tempted to add it to CmdStanR until we have it available in posterior.)
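For readers who haven’t seen the linked function, here is a minimal sketch of the general idea (not that function itself). It takes a plain data frame of draws with cmdstanr-style column names such as theta[1], theta[2] and groups the columns by base name into a list resembling rstan::extract() output. The helper name extract_like is hypothetical, and it only handles scalars and one-dimensional vectors; matrix or higher-dimensional parameters would need their indices parsed and reshaped.

```r
# Hypothetical helper, not part of cmdstanr or posterior: group draw columns
# by base variable name into an rstan::extract()-style list.
# Scalars become a numeric vector (one value per draw); 1-D vectors become a
# draws x length matrix. Multi-index names like "Sigma[1,2]" are NOT reshaped.
extract_like <- function(draws) {
  draws <- as.data.frame(draws)
  base <- sub("\\[.*$", "", names(draws))        # "theta[2]" -> "theta"
  # levels = unique(base) keeps variables in their original column order
  cols_by_var <- split(names(draws), factor(base, levels = unique(base)))
  lapply(cols_by_var, function(cols) {
    m <- as.matrix(draws[, cols, drop = FALSE])
    if (ncol(m) == 1) unname(m[, 1]) else unname(m)
  })
}
```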
Unfortunately, yeah, this could be the issue, although I don’t know for sure. A fitted model object keeps the methods that were defined when it was created, even if you later install a newer version of the package. That’s just how R6 objects work.
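A small illustration with a toy class (the names Fit and draws here are stand-ins, not cmdstanr’s actual classes): redefining an R6 class generator, which is roughly what installing a newer package version amounts to, changes objects created afterwards but not objects that already exist, because each R6 object carries its own copies of its methods. The same applies to an object saved with saveRDS() and reloaded later, since the methods are serialized along with it.

```r
library(R6)

# Toy stand-in for an old package version: draws() has the "old" body.
Fit <- R6Class("Fit", public = list(
  draws = function() "v1 draws()"
))
old_fit <- Fit$new()   # object created under "v1"

# Toy stand-in for upgrading the package: the generator is redefined.
Fit <- R6Class("Fit", public = list(
  draws = function() "v2 draws()"
))
new_fit <- Fit$new()   # object created under "v2"

old_fit$draws()  # still returns "v1 draws()": methods were copied at creation
new_fit$draws()  # returns "v2 draws()"
```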
It’s also possible that it would still be too slow with the latest version, but if you fit the model with a version from before we switched to fread (or made other improvements), its $draws() method will certainly be slower than the latest one.
I’ll report back eventually if it’s still too slow, but at least read_cmdstan_csv is blazing fast, so there’s an obvious workaround for anybody in a similar boat.
Not yet, but I’d like to add that. Could you open an issue for it if you get a chance? I have to step away from my computer for a little while, but I can make the issue later if you can’t do it now.