Saving of CmdStanModel objects from cmdstanr

[macOS High Sierra, R 4.0, cmdstanr_0.0.0.9005]

After running a model with cmdstanr and saving it, I am sometimes unable to get the draws.

Specifically, I first do the following:

model = cmdstan_model("my_model.stan")
sf = model$sample(data = datalist)
save(sf, file = "my_fit.Rdata")

Later I load the model again (on the same machine) and try to get the draws:

load("my_fit.Rdata")
sf$draws()

This usually works, but sometimes I get the following error:

Error in read_sample_csv(files = self$output_files(include_failed = FALSE), :
Assertion on ‘output_file’ failed: File does not exist: ‘/var/folders/yj/ws4lqwt13ms_lqg7hjw4jj7r0000gn/T/RtmpHEFQLD/GammaMix_sWK-202006191532-1-82393a.csv’.

I first thought that I had figured it out: I needed to do something with the draws once before saving the CmdStanModel object, so that the draws are read from the csv file into the object. If one saves the CmdStanModel object before doing something with the draws, the resulting .Rdata file is much smaller. But this does not seem to work consistently, as I now have an .Rdata file from a model that is so large that it should include the draws, yet cmdstanr still wants to access the (temporary?) .csv file with the draws. As this file no longer exists, I get the error message above.

Is this expected behavior, or am I doing something wrong?
(I didn’t find information about saving CmdStanModel objects in the documentation).

Given the way rstan works, I think it would be best if a CmdStanModel object were always saved together with its draws and sampler diagnostics.

PS: I might also have updated cmdstanr between saving and loading the CmdStanModel objects. But I am not sure, and I think this should not lead to the problem I see.

Hi @Guido_Biele,

what you have experienced is intentional and follows from the way cmdstanr reads samples into memory.
To save users' RAM, we only read in samples when they are needed. Once the user has read the samples in, they stay in the fit object.
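
As a rough illustration of that read-on-demand behavior, here is a toy sketch in base R. It is not cmdstanr's actual code: make_lazy_fit and everything inside it are made-up names, and the real fit objects are more involved. It only shows why an object that has already read its draws survives the loss of the CSV file, while one that has not does not.

```r
# Toy stand-in for a fit object (hypothetical, NOT cmdstanr internals):
# it remembers where the CSV lives and only reads it on first request.
make_lazy_fit <- function(csv_path) {
  cache <- NULL                            # nothing is read at creation time
  list(
    draws = function() {
      if (is.null(cache)) {
        stopifnot(file.exists(csv_path))   # fails once the temp file is gone
        cache <<- read.csv(csv_path)       # keep the draws inside the object
      }
      cache
    }
  )
}

# Write a tiny fake "output CSV" to a temporary file
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(theta = c(0.2, 0.3, 0.25)), csv, row.names = FALSE)

fit_a <- make_lazy_fit(csv)   # draws never requested
fit_b <- make_lazy_fit(csv)
invisible(fit_b$draws())      # draws read into the object once

unlink(csv)                   # simulate the temporary file being cleaned up

draws_b <- fit_b$draws()      # still works: served from the cache
err_a <- tryCatch(fit_a$draws(), error = function(e) conditionMessage(e))
```

Here fit_b still returns its draws after the file is deleted, whereas fit_a can only report an error, which is the same pattern as in the original post.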

So if you did

fit$draws()
fit$sampler_diagnostics()
saveRDS(...)

The fit object would have everything.

A much nicer way of doing this is using

fit$save_object(file = temp_rds_file)

This is explained in detail in this great vignette by @jonah

There is an issue open that would hopefully make even saveRDS() automatically do that (by indexing and memory mapping the CSV file) but that is waiting for some issues to be resolved in some of the packages we use.
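
Until then, another option is to keep the CSV files somewhere permanent instead of the R session's temporary directory. The sketch below assumes a cmdstanr version that provides the $save_output_files() method (and, alternatively, the output_dir argument of $sample()); the directory name "stan_output" is just a placeholder.

```r
library(cmdstanr)

# Bernoulli example model that ships with CmdStan
file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
mod <- cmdstan_model(file)
data_list <- list(N = 10, y = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1))

# By default the output CSVs are written to a temporary directory
fit <- mod$sample(data = data_list, seed = 123)

# Move the CSVs to a permanent location (placeholder directory name);
# the paths stored in the fit object are updated to the new location.
out_dir <- "stan_output"
dir.create(out_dir, showWarnings = FALSE)
fit$save_output_files(dir = out_dir)

# The files the fit object points to should now live in out_dir
csv_ok <- file.exists(fit$output_files())

# Alternatively, pass output_dir = out_dir to mod$sample() from the start.
```

A fit object saved after this still depends on those CSV files staying where they are, so fit$save_object() remains the simpler choice when a single self-contained file is wanted.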

Is it possible that fit$save_object() does not save the sampler diagnostics?

It looks as if the sampler diagnostics are saved once sf$sampler_diagnostics() has been called.

Hm, could you share an example of how you used it?

This example works:

library("cmdstanr")

# Compile the Bernoulli example model that ships with CmdStan
file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
mod <- cmdstan_model(file)
data_list <- list(N = 20, y = c(0,1,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,1,0))
fit <- mod$sample(
  data = data_list,
  seed = 123,
  chains = 4
)

# Save the fit object, then drop it from the session
fit$save_object(file = "test.rds")
rm(fit)
gc()

# Reload the saved object and ask for the sampler diagnostics
fit2 <- readRDS("test.rds")
print(fit2$sampler_diagnostics())

and the diagnostics are shown properly.
It's completely possible that something else is broken, so if you have a shareable example that would be great.

Sorry for the false alarm,
not sure now where the error I got came from.
The example you provided also works for me.

No problem @Guido_Biele. You have been an extremely valuable pre-beta tester of cmdstanr, so please don’t hesitate to ask these types of questions!

+1, thanks for stress testing cmdstanr for us! It’s been very helpful.