"ServerStan" implementation language poll

Bob_Carpenter · August 20, 2020, 9:29pm

Are the mass matrix elements not embedded as comments? That was the original intent, as bad an idea as that was.

I think everyone agrees that mass matrix and step size should be taken out of this format. Until then, we could use a fast reader for the draws and a separate reader to just fish out the step size and inverse mass matrix.
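A rough sketch of what the "separate reader" half could look like, in Python. It assumes the adaptation block uses the comment lines `# Step size = ...` and `# Diagonal elements of inverse mass matrix:` followed by one comma-separated comment line, and it only handles the diagonal metric case. The function name `read_adaptation_info` is made up for illustration and is not part of any Stan interface.

```python
import re


def read_adaptation_info(lines):
    """Scan Stan CSV lines; return (step_size, inv_metric_diag) taken from comments.

    Hypothetical helper: assumes a diagonal metric written as a single
    comma-separated comment line after the "Diagonal elements" marker.
    """
    step_size = None
    inv_metric = None
    expect_metric = False  # True when the next comment line holds the metric values
    for line in lines:
        if not line.startswith("#"):
            # Header or draw row: not of interest to this reader.
            expect_metric = False
            continue
        text = line.lstrip("#").strip()
        if expect_metric:
            inv_metric = [float(x) for x in text.split(",") if x.strip()]
            expect_metric = False
            continue
        match = re.match(r"Step size\s*=\s*(\S+)", text)
        if match:
            step_size = float(match.group(1))
        elif text.startswith("Diagonal elements of inverse mass matrix"):
            expect_metric = True
    return step_size, inv_metric
```

Something like `read_adaptation_info(open(path))` would then run alongside whatever fast CSV reader handles the draws, at the cost of touching the file twice.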

mitzimorris · August 20, 2020, 10:22pm

the problem is that they’re embedded as comments in the middle of the goddamn data rows - you have the CSV header row, then, if save_warmup is True, the warmup draws, then comments, then the sampling draws. this imposes a line-by-line processing strategy.

even if the warmup draws aren’t saved, it seems that some readers don’t like comments anywhere except at the beginning of the file - @rok_cesnovar can correct me here

rok_cesnovar · August 21, 2020, 4:22am

Yes, comments before or after the data are fine for any reader we tried. The ones between the header and data or between data rows cause issues for almost all fast readers (at least in the R ecosystem).

ahartikainen · August 21, 2020, 4:28am

On the Python side, pandas can skip comments even between the samples.

But collecting the samples means that the file needs to be iterated through a second time.

E.g. here is the latest implementation in ArviZ:

https://github.com/arviz-devs/arviz/blob/13ed70d5952c7c10a4f278c831195dd15b5173d1/arviz/data/io_cmdstan.py#L619


                key, value = match_str.group(1), match_str.group(2)
                results[key] = value
            elif match_empty:
                key = match_empty.group(1)
                results[key] = None

    results = {key: results[key] for key in sorted(results)}
    return results


def _read_output_file(path):
    comments = []

    # read comments
    with open(path, "rb") as f_obj:
        for line in f_obj:
            if line.startswith(b"#"):
                comments.append(line.decode("utf-8").strip())

    with open(path, "rb") as f_obj:
        data = pd.read_csv(f_obj, comment="#")

sakrejda · August 21, 2020, 1:26pm

The C++ implementation of a reader I wrote in the thread above just collects everything in one pass and was basically as fast as the fastest other lib-based solutions. Hard to beat just plowing through the file in one pass. That code could’ve been shared across Python/R/etc… and we wouldn’t have to have inconsistencies across interfaces. It could be re-written to be pretty easily maintainable since the only libs it uses are standard ones and the only touchy parts were iostream stuff. I’m a big fan of having a single implementation for simple things.
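To make the one-pass idea concrete, here is a minimal Python sketch (not the C++ reader mentioned above, and far slower than it would be). It assumes every non-comment line after the header is a row of plain numeric values; the name `read_stan_csv_one_pass` is hypothetical.

```python
import io


def read_stan_csv_one_pass(f_obj):
    """Collect header, draws, and comments in a single pass over a text file object.

    Illustrative only: comments may appear anywhere, including between
    warmup and sampling draws, and are kept in file order.
    """
    comments = []
    header = None
    rows = []
    for raw in f_obj:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            comments.append(line)
        elif header is None:
            header = line.split(",")  # first non-comment line is the CSV header
        else:
            rows.append([float(x) for x in line.split(",")])
    return header, rows, comments


# Usage with an in-memory file that has comments in the middle of the data rows.
example = io.StringIO(
    "# model = example\n"
    "lp__,theta\n"
    "-7.1,0.2\n"
    "# Adaptation terminated\n"
    "# Step size = 0.9\n"
    "-6.9,0.3\n"
)
header, rows, comments = read_stan_csv_one_pass(example)
```

The point of the design is that the position of the comments stops mattering: one loop sorts each line into the right bucket, so nothing has to be re-read.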
