My R list has several data objects, totaling 7 GB in R. The main list element is the matrix of independent variables, of size 5 million x 157. The following error is given when I try to sample in CmdStanR:
Error in collapse(tmp, inner = FALSE, indent = indent) : R character strings are limited to 2^31-1 bytes
This happens in CmdStanR before any sampling occurs; it appears there is a limit on the size of the character string R can build when converting data to JSON. I was able to duplicate the error with a direct jsonlite call on my list, data_all_stan, along these lines:
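# illustrative: a plain jsonlite::write_json call on the full list hits the
# same 2^31-1 byte string limit before anything reaches Stan
jsonlite::write_json(data_all_stan, "data_all_stan.json")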
I have 128 GB of RAM, so it is not a RAM limitation on my desktop. This appears to be a limitation that others have run into outside Stan as well when converting large objects to JSON. I tried both Windows and WSL.
I'm not sure if others have recommendations. For now, I am cutting down the data size by randomly selecting rows.
Hi,
this appears to be a bug/limitation of the current cmdstanr implementation, which relies on jsonlite::write_json. Could you try building a small reproducible example (e.g. by simulating a large dataset) and filing an issue at the stan-dev/cmdstanr issue tracker on GitHub?
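A rough sketch of such a reproducer, simulating data of roughly the size reported above (object and file names are illustrative, and this needs enough RAM to hold the ~6 GB matrix):

set.seed(1)
fake_data <- list(N = 5e6, K = 157,
                  X = matrix(rnorm(5e6 * 157), ncol = 157))
jsonlite::write_json(fake_data, tempfile(fileext = ".json"))
# expected to fail with the same "2^31-1 bytes" error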
I think the only workaround that does not require code changes to cmdstanr is for you to write the JSON file yourself (you can inspect the format by writing a smaller dataset) in a way that does not require constructing huge strings. Then you can call the model executable directly (see the MCMC Sampling chapter of the CmdStan User's Guide) and use cmdstanr::read_cmdstan_csv or cmdstanr::as_cmdstan_fit to read the results into R.
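A rough sketch of that flow, assuming the model is already compiled with cmdstanr and you have written a data.json by hand (the file names here are illustrative):

mod <- cmdstanr::cmdstan_model("model.stan")  # compile once as usual
# ... write data.json yourself, without building one giant string ...
system2(mod$exe_file(),
        args = c("sample",
                 "data", "file=data.json",
                 "output", "file=output.csv"))
fit <- cmdstanr::as_cmdstan_fit("output.csv")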
We currently have a ton of R code that’s built around the project, which would be a bit of a pain to port to Python. Perhaps one thing we could do is just write a lil intermediate script that handles the JSON-writing. I’ll try it out and post it here.
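A minimal sketch of what such an intermediate script might look like, assuming the only oversized element is a single numeric matrix X plus two integer scalars N and K (all names are illustrative; Stan's JSON format stores a matrix as an array of rows):

write_stan_json_chunked <- function(X, N, K, path, chunk_rows = 10000) {
  con <- file(path, open = "w")
  on.exit(close(con))
  # write the scalars, then open the matrix entry
  cat(sprintf('{"N": %d, "K": %d, "X": [', as.integer(N), as.integer(K)),
      file = con)
  for (start in seq(1, nrow(X), by = chunk_rows)) {
    end <- min(start + chunk_rows - 1, nrow(X))
    # format one chunk of rows at a time, so each string stays far below
    # the 2^31-1 byte limit
    rows <- apply(X[start:end, , drop = FALSE], 1,
                  function(r) paste0("[", paste(r, collapse = ","), "]"))
    if (start > 1) cat(",", file = con)
    cat(paste(rows, collapse = ","), file = con)
  }
  cat("]}", file = con)
}
# usage (illustrative): write_stan_json_chunked(X, nrow(X), ncol(X), "data.json")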
If anyone is coming to this late (jsonlite is still throwing the same error as of 2024), my solution to the problem was to write to file using the jsonlite::stream_out function, which batches lines into blocks of 500 and then writes to file without throwing the above error. Not quick, but simple.
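A rough sketch of that approach (stream_out() accepts data frames and writes NDJSON one pagesize-row batch at a time; the names here are illustrative, and note the line-delimited output is not the single JSON object CmdStan expects, so some reassembly is still needed):

library(jsonlite)
con <- file("X.ndjson", open = "w")
stream_out(as.data.frame(data_all_stan$X), con, pagesize = 500)
close(con)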
I was able to solve this error by using jsonlite::toJSON to write the list to a JSON file.
Assuming your data list is called dat:
json_txt <- jsonlite::toJSON(dat, auto_unbox = TRUE)
# auto_unbox converts length-one lists to single elements
json_file <- file.path(read_folder, "json_dat.json")
writeLines(json_txt, json_file)
stanfit <- model$sample(data = json_file, ...)
# replace ... with your actual sampling parameters
Also, don't use jsonlite::prettify, which seems to be what triggers the error.