Limits to JSON conversion for Large Data (R character strings are limited to 2^31-1 bytes)

klattery · September 8, 2021, 12:53am

My R list has several data objects, totaling 7GB in R. The main list object is the independent data matrix of size 5 million x 157. The following error is given when I try to sample in CmdStanR:

Error in collapse(tmp, inner = FALSE, indent = indent) :
R character strings are limited to 2^31-1 bytes

This happens in CmdStanR before any sampling occurs. It appears there is a limit to the amount of data that R can convert to JSON. I was able to reproduce the error with the following command (my list is data_stan_all):

write_stan_json(data_stan_all, file = file.path(dir_out, "data_stan_all.json"))

That gives the same error as above.

I have 128GB RAM, so it is not a RAM limitation on my desktop. Others outside Stan appear to have hit the same limitation when converting large data to JSON. I tried both Windows and WSL.
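A back-of-the-envelope check of why RAM does not help here. This is only a sketch, and it assumes the whole matrix ends up serialized into one R character string, which is what the error message suggests:

```r
# Assumption: the 5 million x 157 matrix is serialized into a single string.
n_values <- 5e6 * 157      # number of entries in the matrix
limit <- 2^31 - 1          # maximum number of bytes in one R character string

# Bytes available per entry, including the comma that separates entries.
bytes_per_value <- limit / n_values
bytes_per_value            # roughly 2.7
```

With fewer than 3 bytes per entry, delimiter included, almost any non-trivial numeric matrix of this size would exceed the string limit no matter how much memory is installed.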

Not sure if others have recommendations. For now, I am cutting down the file size by randomly selecting rows.

martinmodrak · September 13, 2021, 8:42am

Hi,
this appears to be a bug/limitation of the current cmdstanr implementation, which relies on jsonlite::write_json. Could you try building a small reproducible example (e.g. by simulating a large dataset) and filing an issue in the stan-dev/cmdstanr GitHub repository?

I think the only workaround that does not require code changes to cmdstanr is for you to write the JSON file yourself (you can inspect the format by writing a smaller dataset) in a way that does not require constructing large strings. Then you can call the model executable directly (see e.g. the "MCMC Sampling" chapter of the CmdStan User’s Guide) and use cmdstanr::read_cmdstan_csv or cmdstanr::as_cmdstan_fit to read the results into R.
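A minimal sketch of what writing the JSON yourself could look like in base R, streaming a matrix to disk a block of rows at a time so that no single string gets close to the limit. The function names (fmt_num, write_json_matrix, write_stan_json_streamed) and the chunk_rows parameter are hypothetical, not part of cmdstanr or jsonlite. The sketch assumes the data list holds only finite numeric scalars, vectors, and matrices, and it writes any length-1 element as a scalar, which would be wrong for a Stan vector or array of size 1.

```r
# Hypothetical helpers; not part of cmdstanr or jsonlite.

# Format finite numbers with 15 significant digits. Whole numbers print
# without a decimal point, so the same formatter serves int and real data.
# Note: 15 digits can drop the last bits of a double.
fmt_num <- function(x) {
  stopifnot(is.numeric(x), all(is.finite(x)))
  sprintf("%.15g", as.numeric(x))
}

# Write a matrix as a JSON array of row arrays, chunk_rows rows at a time.
write_json_matrix <- function(con, x, chunk_rows = 10000L) {
  stopifnot(is.matrix(x), nrow(x) >= 1L, ncol(x) >= 1L, chunk_rows >= 1L)
  cat("[", file = con)
  for (s in seq.int(1L, nrow(x), by = chunk_rows)) {
    e <- min(s + chunk_rows - 1L, nrow(x))
    block <- x[s:e, , drop = FALSE]
    # Character matrix with the same shape as the block (column-major fill).
    chr <- matrix(fmt_num(block), nrow = nrow(block))
    # Paste the columns together element-wise: one string per row.
    cols <- lapply(seq_len(ncol(chr)), function(j) chr[, j])
    rows <- do.call(paste, c(cols, sep = ","))
    if (s > 1L) cat(",\n", file = con)
    cat(paste0("[", rows, "]", collapse = ",\n"), file = con)
  }
  cat("]", file = con)
}

# Write a named list of numeric scalars, vectors, and matrices as one JSON object.
write_stan_json_streamed <- function(data, file, chunk_rows = 10000L) {
  stopifnot(is.list(data), !is.null(names(data)), all(nzchar(names(data))))
  con <- file(file, open = "wt")
  on.exit(close(con))
  cat("{\n", file = con)
  for (i in seq_along(data)) {
    if (i > 1L) cat(",\n", file = con)
    cat('"', names(data)[i], '": ', sep = "", file = con)
    x <- data[[i]]
    if (is.matrix(x)) {
      write_json_matrix(con, x, chunk_rows)
    } else if (is.numeric(x) && is.null(dim(x)) && length(x) == 1L) {
      cat(fmt_num(x), file = con)  # assumption: length 1 means a scalar
    } else if (is.numeric(x) && is.null(dim(x))) {
      cat("[", paste(fmt_num(x), collapse = ","), "]", sep = "", file = con)
    } else {
      stop("Unsupported element: ", names(data)[i])
    }
  }
  cat("\n}\n", file = con)
  invisible(file)
}
```

For the list in the original post, the call would be something like write_stan_json_streamed(data_stan_all, file.path(dir_out, "data_stan_all.json")). Besides calling the executable directly, the resulting path can also be passed as the data argument of the model's $sample() method, which accepts a path to a JSON file as well as a list. It is worth comparing the output against write_stan_json on a small dataset before trusting it on the full one.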

A similar problem was discussed in the topic "Brms limited memory issue while running on 15M data points" (unfortunately without a solution). The problem was also noted for jsonlite in the Stack Overflow question "R, convert large dataset into JSON" (once again without a solution).

lin.wang.idd.pasteur · February 18, 2022, 11:07am

Has this bug/limitation still not been solved?

cpfiffer · July 18, 2023, 12:10am

This is a pretty major issue on my side – has anyone found a workaround here?

mitzimorris · July 18, 2023, 4:22pm

No solution for brms, but for CmdStanR, might you consider switching to Python? With plotnine, a Python port of ggplot2, you can go pretty far.

cpfiffer · July 18, 2023, 5:52pm

We currently have a ton of R code that’s built around the project, which would be a bit of a pain to port to Python. Perhaps one thing we could do is just write a lil intermediate script that handles the JSON-writing. I’ll try it out and post it here.

mitzimorris · July 18, 2023, 6:11pm

I’ve found that ChatGPT 4 is pretty good at translating R to Python and Python to R.

shane_conneely · August 30, 2024, 12:00pm

If anyone is coming to this late (jsonlite is still throwing the same error in 2024): my solution was to write to file using the jsonlite::stream_out function, which batches lines into blocks of 500 and then writes them to file without throwing the above error. Not quick, but simple.

arya · October 27, 2024, 10:33pm

@shane_conneely thanks for sharing! Where exactly do you enter this option? Is it when calling sample in cmdstanr, or do you run regular cmdstan?

Amos · July 14, 2025, 9:19pm

Does anyone have a solution to this? It seems that stream_out only supports writing data.frames.
