"ServerStan" implementation language poll

Are the mass matrix elements not embedded as comments? That was the original intent, as bad an idea as that was.

I think everyone agrees that mass matrix and step size should be taken out of this format. Until then, we could use a fast reader for the draws and a separate reader to just fish out the step size and inverse mass matrix.

the problem is that they’re embedded as comments in the middle of the goddamn data rows - you have the CSV header row, then, if save_warmup is True, the warmup draws, then comments, then the sampling draws. this imposes a line-by-line processing strategy.
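To make the layout concrete, here is a minimal sketch of the line-by-line strategy in Python, using only the standard library. The sample text and the name `read_stan_csv` are made up for illustration, and the wording of the adaptation comments is an assumption about what the sampler writes. This is not any interface's actual reader.

```python
# Hypothetical sample with save_warmup on: header, warmup draws,
# adaptation comments, then sampling draws.
SAMPLE = """\
# model = example
lp__,accept_stat__,theta
-7.1,0.95,0.21
-6.9,0.99,0.25
# Adaptation terminated
# Step size = 0.9
# Diagonal elements of inverse mass matrix:
# 0.5
-6.8,0.91,0.30
-7.3,0.88,0.18
# Elapsed Time: 0.01 seconds
"""


def read_stan_csv(lines):
    """Single pass: separate comment lines from the header and draw rows."""
    header, draws, comments = None, [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("#"):
            # Comments can show up anywhere, so every line has to be checked.
            comments.append(line.lstrip("#").strip())
        elif header is None:
            header = line.split(",")  # first non-comment line is the header
        else:
            draws.append([float(x) for x in line.split(",")])
    return header, draws, comments


header, draws, comments = read_stan_csv(SAMPLE.splitlines())
```

Note that warmup and sampling draws end up in the same list here. Telling them apart would need the position of the adaptation block or the `num_warmup` setting, which this sketch does not track.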

even if the warmup draws aren’t saved, it seems that some readers don’t like comments anywhere except at the beginning of the file - @rok_cesnovar can correct me here


Yes, comments before or after the data are fine for every reader we tried. The ones between the header and the data, or between data rows, cause issues for almost all fast readers (at least in the R ecosystem).

On the Python side, pandas can skip comments even between the samples.

But collecting the samples means that the file needs to be iterated through a second time.
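A rough sketch of what the two-pass approach looks like, using only the standard library so it runs as-is. In practice the first pass could be something like `pandas.read_csv(path, comment="#")`. The sample text, the function names, and the exact wording of the adaptation comments are assumptions for illustration, not the ArviZ code.

```python
import csv
import io
import re

# Hypothetical sample: two parameters, adaptation info between draw rows.
SAMPLE = """\
lp__,theta,sigma
-7.1,0.21,1.10
-6.9,0.25,0.90
# Adaptation terminated
# Step size = 0.9
# Diagonal elements of inverse mass matrix:
# 0.5, 2.0
-6.8,0.30,1.00
-7.3,0.18,1.20
"""


def read_draws(text):
    """Pass 1: parse the draws, dropping every comment line."""
    data_lines = (ln for ln in io.StringIO(text) if not ln.startswith("#"))
    rows = csv.reader(data_lines)
    header = next(rows)
    return header, [[float(x) for x in row] for row in rows if row]


def read_adaptation(text):
    """Pass 2: fish the step size and inverse mass matrix out of the comments."""
    step_size, inv_metric = None, None
    lines = iter(text.splitlines())
    for line in lines:
        match = re.match(r"#\s*Step size\s*=\s*(\S+)", line)
        if match:
            step_size = float(match.group(1))
        elif line.startswith("#") and "inverse mass matrix" in line:
            # Assumed layout: the diagonal is on the next comment line.
            values = next(lines, "").lstrip("#")
            inv_metric = [float(x) for x in values.split(",")]
    return step_size, inv_metric


header, draws = read_draws(SAMPLE)
step_size, inv_metric = read_adaptation(SAMPLE)
```

The cost is exactly the one mentioned above: the file is walked twice, once per function.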

E.g. here is the latest implementation in ArviZ

The C++ implementation of a reader I wrote in the thread above just collects everything in one pass and was basically as fast as the fastest other lib-based solutions. Hard to beat just plowing through the file in one pass. That code could’ve been shared across Python/R/etc… and we wouldn’t have to have inconsistencies across interfaces. It could be re-written to be pretty easily maintainable since the only libs it uses are standard ones and the only touchy parts were iostream stuff. I’m a big fan of having a single implementation for simple things.