I recently switched to using cmdstanr and am learning the differences from rstan, etc. I’ve run a multispecies distance sampling model and am having trouble saving the output with the suggested save_object() method. It is taking forever to save: the model ran in ~4 days and has now spent >2 days trying to save. This is on a supercomputer, and I gave the job 4 CPUs, each with 10 GB of memory. There should be 2K iterations saved in the posterior. It’s a fairly big model (1000 sites x 10 species x 30 years), but it seems absurd that saving should take this long given how quickly the model itself ran. Has anyone encountered similar issues, or have suggestions for troubleshooting the saving process?
I don’t know what the cause is, but you’re not the only person to have reported this. Maybe @Jonah or @mitzimorris knows the root cause; Mitzi and Jonah have worked on CmdStanPy and CmdStanR and know the most about them.
Is it really 2K iterations across all chains? Are you saving warmup? If it’s really 2K, then the total number of values stored is 2K iterations * 300K parameters per iteration (1000 * 10 * 30, assuming one parameter for each of those things you listed), which comes to 600M floating point values, or about 5GB at double precision.
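As a quick back-of-the-envelope check of that estimate in R:

```r
n_draws  <- 2000             # post-warmup draws across all chains
n_params <- 1000 * 10 * 30   # sites x species x years, one parameter each
n_draws * n_params * 8 / 1e9 # at 8 bytes per double: ~4.8 GB
```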
For troubleshooting, you can check whether CmdStan has successfully written the draws to one or more CSV files (one per chain). If so, you can move the files out of the temp directory, kill the process, and try again.
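If the R session that launched the model is still responsive, cmdstanr can locate and copy those files for you. A sketch (the "results" directory name is just a placeholder and must already exist):

```r
fit$output_files()                     # paths to the per-chain CSV files
fit$save_output_files(dir = "results") # copy them out of the temp directory
```

A fit object can later be rebuilt from the saved CSVs with cmdstanr::as_cmdstan_fit().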
If it’s taking a long time, it will be either the self$draws() step or the saveRDS() step. self$draws() makes sure the posterior draws have all been read in from the CSV files, and this can be pretty slow with lots of parameters/transformed parameters/generated quantities. But if the draws have already been read in (via a previous call to draws() or summary() or any method that requires the draws), then it won’t read them in again, in which case it would only be the saveRDS() part that’s time consuming. The ... lets you pass arguments to base::saveRDS(), so you could change the type of compression it uses, but I’m not sure whether that would help or hurt in this case.
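For example, compress is one of the base::saveRDS() arguments that can be forwarded this way (a sketch; whether either option is faster depends on your disk and CPU):

```r
fit$draws()  # trigger the CSV read explicitly, so it's clear which step is slow

fit$save_object(file = "fit.rds", compress = FALSE) # skip compression entirely
fit$save_object(file = "fit.rds", compress = "xz")  # slower, smaller file
```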
You can also try saving the object any other way you want. As long as you make sure everything you need has already been read into memory from the CSVs (draws, sampler diagnostics, etc.), you can use any means available to R users to save the object instead of fit$save_object(). There may be faster options that I’m not aware of.
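As one concrete alternative (a sketch; qs is a third-party package, not part of cmdstanr, but its serialization is often much faster than saveRDS() for large objects):

```r
library(qs)

fit$draws()               # make sure the draws are in memory...
fit$sampler_diagnostics() # ...along with the sampler diagnostics

qsave(fit, "fit.qs")      # save the whole fit object
fit2 <- qread("fit.qs")   # reload it later
```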
Following up on this: would there be a way of importing the CSV files into some other storage format, like Parquet, for really big models? Or maybe into an SQLite database? Having to load all the CSV files into memory can be a limitation for models of this size.
We haven’t implemented anything like that, but it’s definitely something we’re interested in. There is a design document that was approved (but not yet implemented) that mentions Parquet support for CmdStan:
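In the meantime, a manual conversion is possible. A rough sketch assuming the arrow package is installed; read.csv()’s comment.char argument skips the lines CmdStan writes starting with #:

```r
library(arrow)

# convert each chain's CSV to Parquet; this still reads one chain at a time
# into memory, so it eases but doesn't remove the memory limitation
for (csv in fit$output_files()) {
  draws <- read.csv(csv, comment.char = "#")
  write_parquet(draws, sub("\\.csv$", ".parquet", csv))
}
```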