Faster JSON serialization, simdjson

ariddell · April 8, 2020, 2:04pm

Looks like there’s a new JSON library that’s far faster than rapidjson: https://github.com/simdjson/simdjson. It’s reportedly 8.6 times faster than rapidjson.

I wonder how much slower this is than using binary serialization. In switching to protobuf, httpstan saw a ~30% increase in speed over ujson. But perhaps the speedup would have been marginal had we been using simdjson. Impressive!

Related to previous discussion here: Notes on Stan Output Serialization Options (YAML, Protobuf, Avro, CBOR)

rok_cesnovar · April 8, 2020, 2:13pm

Yep, simdjson is lightning fast, and I looked at it when working on the cmdstan JSON parser change.
However, it’s currently not suitable for cmdstan because it requires C++17.

ariddell · April 8, 2020, 2:19pm

That makes sense.

I probably should have used “deserialization” in the title. I was thinking about how simdjson might change some of our thinking about whether JSON is suitable as a storage format for output. That is, if simdjson makes reading large fits really fast, we would feel less pressure to consider formats like HDF or Apache Arrow.

Bob_Carpenter · April 9, 2020, 9:56pm

There’s also size. At 16-digit precision, JSON files will be more than twice as big as binary: each value takes 16 digit characters versus 8 bytes for a double, and then there are also the -, ., and e symbols.
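To make the size argument concrete, here is a back-of-the-envelope sketch in Python. It assumes a hypothetical writer that formats every draw in scientific notation with 16 significant digits (`format(v, ".15e")`) inside a JSON array, and compares that with packing the same values as raw little-endian float64 via `struct`. The helper names are made up for illustration, and real writers (for example, ones using shortest round-trip formatting) will give somewhat different ratios.

```python
import json
import struct


def to_json_text(values):
    # Hypothetical writer: 16 significant digits in scientific notation,
    # comma-separated inside a JSON array.
    return "[" + ",".join(format(v, ".15e") for v in values) + "]"


def to_binary(values):
    # 8 bytes per value: little-endian IEEE 754 float64, no separators.
    return struct.pack(f"<{len(values)}d", *values)


def text_vs_binary_bytes(values):
    """Return (json_bytes, binary_bytes) for the same list of doubles."""
    return len(to_json_text(values).encode("ascii")), len(to_binary(values))


if __name__ == "__main__":
    draws = [-1.2345678901234567e-05, 3.141592653589793, 1234.5678]
    text_bytes, binary_bytes = text_vs_binary_bytes(draws)
    # The text form still parses as JSON and agrees to ~16 significant digits.
    parsed = json.loads(to_json_text(draws))
    assert all(abs(a - b) <= 1e-15 * abs(b) for a, b in zip(parsed, draws))
    print(text_bytes, binary_bytes, text_bytes / binary_bytes)
```

With those three draws the text form is 68 bytes against 24 bytes of binary. A negative value with a two-digit exponent, such as `-1.234567890123457e-05,`, costs 23 bytes including its comma, nearly three times the 8-byte double.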
