"ServerStan" implementation language poll

Bob, I’ll send you some direct messages so we can keep this on topic.

HEY wait, it looks like direct messages have been turned off on this Discourse? I can’t find a way to send one.

Weird, direct messages are definitely enabled. (Just sent you one to test.)

lol so it sounds like you need python which doesn’t exactly fulfill the criteria we are looking for. Is py2exe one of those things that looks really nice at face value but is actually bad irl?

If we take one step back and look at the situation, requiring a python3
interpreter doesn’t seem much worse than requiring that make be
available (for compiling model-specific binaries). (Or does
CmdStan{Py,R} ship a copy of make?)

If getting rid of the python3 interpreter was a goal, however, I’m
persuaded it could be accomplished. The executable-maker ecosystem is
reasonably mature. I think PyInstaller is the current leader.

I still think the discussion of language is a red herring. If the
model-compilation story is bad, nobody is going to use the thing.

I agree with you that we want to get the process of using this thing to be completely seamless from a user’s perspective. It’s not obvious to me that Python’s build tools are easier to distribute and use than make; I’m on the other end of the spectrum where I’ve mostly used make and have little recent experience with the Python story there. In the absence of other information I’m happy to adopt a slightly watered-down version of your preference there.

A side note - I think we can solve @avehtari’s original complaint about CmdStanR lacking log_prob+gradient access with something like a “server mode” for CmdStan itself - @rok_cesnovar, do you have something about that written up somewhere? The basic idea is that there would be a new command-line option to a CmdStan-built model that reads the data and starts up a simple server listening for RPCs like sample and log_prob.

I think in my head the value-add of something like ServerStan on top of that would be the goals of HTTPStan (code modularity, basically?) along with completely encapsulating the toolchain required to build and run a Stan model end to end. So I think @ariddell and others here are right that the language doesn’t matter very much; there won’t be much code in it. What will matter is the install story - if it requires someone to download and install other packages manually then I think we’ve lost.

One reason the HTTPStan architecture isn’t attractive to me is that it still links the Stan model into the running Python interpreter. I’m not an expert on that process but I think that means that the user needs to correctly install and configure the exact(ish?) C/C++ compiler that was used to compile their current Python interpreter. Given that many users have multiple python interpreters and C compilers, this will never be easy. If @ariddell or others have a more precise statement of how easy or difficult that problem actually is I’d love to read it, maybe it’s not as bad as it sounds, but in my experience with Stan workshops it’s been a real pain in the ass on Windows at least. We’ve had Python users who don’t know R switch to R during the workshop for that reason alone.

Not true - I think it only needs a compatible compiler, not the exact one (and I’m not sure even that is required).

Currently, only the conda workflow is “supported” on Windows (https://pystan.readthedocs.io/en/latest/windows.html), but yes, I agree - for new conda users this might feel very confusing.

This depends how we “compile” the “binary”.

https://www.pyinstaller.org/
PyInstaller (also py2exe etc., I think) packs the whole Python interpreter + code + support libraries into one compressed file/folder. With the onefile option a single .exe file is created, but it is not a compiled exe: when executed, it first unpacks the whole bundle into a temporary folder and then runs the wanted Python script.
The minimal file size is around 10 MB, and with some numerical libraries added it grows to 50+ MB (and this is without any Stan files). Some antivirus programs might block that file.
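For concreteness, the basic workflow is just two commands. This is a sketch rather than a tested recipe: the entry-point name serverstan.py is hypothetical, though --onefile is a real PyInstaller flag.

```shell
# Install PyInstaller, then bundle interpreter + script + libraries into one file.
pip install pyinstaller
pyinstaller --onefile serverstan.py   # serverstan.py is a hypothetical entry point

# The result lands in dist/ (dist/serverstan, or dist/serverstan.exe on Windows):
# a self-extracting archive that unpacks to a temp dir on each launch.
```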

http://nuitka.net/pages/overview.html
There is also Nuitka, which does some magic to translate Python code to C; the result can then be compiled and packed into an executable.

Given that Stanc3 is currently developed in OCaml, I think it could be a good option for the ServerStan language.


@seantalts Yeah, I do have some ideas & drafts for what I think would be a nice gradual approach to “server stan”. Will post it later today - I just need to go over it again and add some details to clear up some of the misunderstandings. I haven’t posted it anywhere yet, because I already have too many open proposals that haven’t been closed, and I don’t want it to seem like I’m trying to be a smart ass everywhere and then not deliver on my promises :)

EDIT: didn’t have time to finish this yesterday, hopefully over the weekend.


Hey now, CSV I/O has been a bottleneck and has been updated over time by various people!


Ack, that comment was about our R-dump format and where I found it to be a bottleneck.

I’ve personally never found the CSV format in CmdStan to be a bottleneck. Where are you finding it to be?

make gets used for CmdStan. The specific things done by make for CmdStan could probably be replaced by something else.

My main concern is that we not tie ourselves down to binary interoperability with Python or R, which is where we’ve run into problems before. I want to move to C++17 and use all the Eigen and templating I want on the C++ side.

Mac OS X machines ship with Python 2.7.16.

@mitzimorris should know how hard it is to install Python 3.

I’d check the available OCaml networking and threading tools and that the people on the project who know OCaml have some interest in building web servers.


Some examples:

There’s more; nothing you can’t work around (see linked threads), but I wanted to be pedantic about the past issues because they have, at times, been a time sink for various people (me included)!

answering question:

does CmdStan{Py,R} ship a copy of make

thanks to @ahartikainen, CmdStanPy has a function install_cxx_toolchain, which has been implemented for Windows users.

as to how hard installing Python 3 is - as annoying as any other software install, and confusing because maybe it’s a conda thing, maybe not.


Thanks! That first case sounds like there’s room for improvement in our reader, but even 50s is a long time to read 2GB of data. The second case is presumably an R CSV reader issue (which @bgoodri said was very slow to deal with comment lines).

I think the cost to convert from high-precision ASCII is something like a factor of 20 or more over binary, so there’s definitely a lot of room for improvement. I can read 2GB of binary on my machine in under 1s.
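To see the gap on your own machine, here is a small stdlib-only Python sketch (illustrative, not a benchmark of CmdStan itself) that stores the same doubles two ways and reads both back; wrap the two read paths in timeit to measure the actual factor:

```python
import struct

# Illustrative data: the same doubles stored two ways.
values = [0.1 * i for i in range(100_000)]

# High-precision ASCII, one value per line, as a text reader would see it.
ascii_blob = "\n".join(repr(v) for v in values)

# Raw little-endian binary, 8 bytes per double.
binary_blob = struct.pack(f"<{len(values)}d", *values)

# Reading back: ASCII needs a float parse per value...
ascii_read = [float(line) for line in ascii_blob.split("\n")]

# ...while binary is a single bulk unpack.
binary_read = list(struct.unpack(f"<{len(values)}d", binary_blob))

assert ascii_read == binary_read  # same data, very different parse cost
print(len(binary_blob))  # 800000 bytes: exactly 8 bytes per double
```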

Networked file systems tend to be even slower than local spinny drives, so that’s another area for concern.

I didn’t take that to be pedantic at all. I was really asking about where the bottleneck was because I have never experienced it.

The CSV itself is not the problem, and neither are the comments per se. The problem for all fast R CSV reader packages (and I believe Python packages as well - Mitzi or Ari would know more) is the “# Adaptation terminated”, inverse mass matrix, and step size comments that are reported between the header and the sampling values. Those are difficult for fast readers. The comments before the header (the metadata) and the comments after the samples (timings) are not a problem at all.
So this:

lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta
# Adaptation terminated
# Step size = 0.822884
# Diagonal elements of inverse mass matrix:
# 0.417943
-7.1053,0.953597,0.822884,2,3,0,7.96837,0.364082
-7.27881,0.989561,0.822884,3,7,0,7.30662,0.390752
-7.11597,1,0.822884,1,1,0,7.27771,0.365875

If this info were presented in some other form, the output would be universally readable.

rstan uses utils::read.csv, which has no problem parsing this but is really slow. In cmdstanr we use a package called vroom, which is a lot faster (see Lightweight interfaces - keeping it light - #9 by rok_cesnovar) but not so lightweight a package. vroom is able to parse this format, but requires a lot of info about the incoming CSV to read it correctly (ballpark number of lines, initial number of commented lines, etc.). The fastest CSV-reading R package, data.table, which is also lightweight, cannot read this format.
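To make the problem concrete, here is a stdlib-only Python sketch (using the sample rows quoted above) of the line-by-line filtering that this format forces on a reader; it works, but it is exactly the per-line scan that fast columnar readers try to avoid:

```python
import csv

# The CmdStan output quoted above: comments land between the header and the draws.
raw = """lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta
# Adaptation terminated
# Step size = 0.822884
# Diagonal elements of inverse mass matrix:
# 0.417943
-7.1053,0.953597,0.822884,2,3,0,7.96837,0.364082
-7.27881,0.989561,0.822884,3,7,0,7.30662,0.390752
-7.11597,1,0.822884,1,1,0,7.27771,0.365875
"""

# Line by line: keep data rows, drop comment lines wherever they appear.
lines = [ln for ln in raw.splitlines() if not ln.startswith("#")]
rows = list(csv.reader(lines))
header, draws = rows[0], rows[1:]
print(len(header), len(draws))  # 8 3
```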

The first thread has links to a huge range of solutions by various people and, yeah, there’s definitely room for improvement!

Are the mass matrix elements not embedded as comments? That was the original intent, as bad an idea as that was.

I think everyone agrees that mass matrix and step size should be taken out of this format. Until then, we could use a fast reader for the draws and a separate reader to just fish out the step size and inverse mass matrix.

the problem is that they’re embedded as comments in the middle of the goddamn data rows - you have the CSV header row, then, if save_warmup is True, the warmup draws, then comments, then the sampling draws. this imposes a line-by-line processing strategy.

even if the warmup draws aren’t saved, it seems that some readers don’t like comments anywhere except at the beginning of the file - @rok_cesnovar can correct me here


Yes, comments before or after the data are fine for any reader we tried. The ones between the header and the data, or between data rows, cause issues for almost all fast readers (at least in the R ecosystem).

On the Python side, pandas can skip comments even between the samples.

But collecting the samples means that the file needs to be iterated through a second time.

E.g. here is the latest implementation in ArviZ.

The C++ implementation of a reader I wrote in the thread above just collects everything in one pass and was basically as fast as the fastest other lib-based solutions. It’s hard to beat plowing through the file in a single pass. That code could have been shared across Python/R/etc., and we wouldn’t have inconsistencies across interfaces. It could be rewritten to be pretty easily maintainable, since the only libraries it uses are standard ones and the only touchy parts were the iostream bits. I’m a big fan of having a single implementation for simple things.
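The one-pass idea is easy to sketch in Python (this is an illustration of the approach, not the C++ code from the thread; the comment-matching strings follow the CmdStan output quoted earlier): a single scan that routes comment lines to a metadata parser and data lines to the draws, so nothing is read twice.

```python
import csv
import io

def read_stan_csv(fileobj):
    """One pass over a CmdStan-style CSV: split comment metadata from draw rows."""
    header, draws, step_size, inv_mass = None, [], None, []
    in_mass_block = False
    for line in fileobj:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            if body.startswith("Step size"):
                step_size = float(body.split("=")[1])
                in_mass_block = False
            elif "inverse mass matrix" in body:
                in_mass_block = True  # numeric comment lines follow
            elif in_mass_block:
                try:
                    nums = [float(x) for x in body.split(",")]
                except ValueError:
                    in_mass_block = False  # non-numeric comment ends the block
                else:
                    inv_mass.extend(nums)
            continue
        fields = next(csv.reader([line]))
        if header is None:
            header = fields
        else:
            draws.append([float(x) for x in fields])
    return header, draws, step_size, inv_mass

sample = """lp__,accept_stat__,stepsize__,theta
# Adaptation terminated
# Step size = 0.822884
# Diagonal elements of inverse mass matrix:
# 0.417943
-7.1053,0.953597,0.822884,0.364082
-7.27881,0.989561,0.822884,0.390752
"""
header, draws, step, mass = read_stan_csv(io.StringIO(sample))
print(step, mass, len(draws))  # 0.822884 [0.417943] 2
```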