The CSV itself is not the problem, and neither are the comments as such. The problem for all fast R CSV reader packages (and I believe Python packages as well - Mitzi or Ari would know more) is the "# Adaptation terminated" line and the step size and inverse mass matrix values that are written as comments between the header and the sampling values. Those are difficult for fast readers. The comments before the header (the metadata) and the comments after the samples (timings) are not a problem at all.
So this:
lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta
# Adaptation terminated
# Step size = 0.822884
# Diagonal elements of inverse mass matrix:
# 0.417943
-7.1053,0.953597,0.822884,2,3,0,7.96837,0.364082
-7.27881,0.989561,0.822884,3,7,0,7.30662,0.390752
-7.11597,1,0.822884,1,1,0,7.27771,0.365875
If this info were presented in some other form, the file would be universally readable.
rstan uses utils::read.csv, which has no problem parsing this but is really slow. In cmdstanr we use a package called vroom, which is a lot faster (see Lightweight interfaces - keeping it light - #9 by rok_cesnovar), but it is not such a lightweight package. vroom is able to parse this format, but requires a lot of info about the incoming CSV to read it correctly (ballpark number of lines, initial number of commented lines, etc). The fastest CSV-reading R package, data.table, which is also lightweight, cannot read this format.
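
To illustrate why the mid-file comments matter, here is a minimal Python sketch (standard library only) of the kind of pre-filtering a reader has to do: pull the comment lines out first, then hand the remaining header and draws to an ordinary CSV parser. The function name split_stan_csv is hypothetical, and this is not how rstan, cmdstanr, or any Python interface actually does it. It only shows the extra pass that the embedded adaptation comments force on a reader.

```python
import csv
import io


def split_stan_csv(text):
    """Separate '#' comment lines from header/draw lines (hypothetical helper)."""
    comments, rows = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith("#"):
            # Keep the comment text without the leading '#'
            comments.append(stripped.lstrip("#").strip())
        else:
            rows.append(stripped)
    return comments, rows


# The excerpt from the post above, used as sample input.
sample = """lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta
# Adaptation terminated
# Step size = 0.822884
# Diagonal elements of inverse mass matrix:
# 0.417943
-7.1053,0.953597,0.822884,2,3,0,7.96837,0.364082
-7.27881,0.989561,0.822884,3,7,0,7.30662,0.390752
-7.11597,1,0.822884,1,1,0,7.27771,0.365875
"""

comments, rows = split_stan_csv(sample)

# With the comments removed, the rest is a plain CSV that any parser can read.
reader = csv.DictReader(io.StringIO("\n".join(rows)))
draws = [{name: float(value) for name, value in row.items()} for row in reader]
```

The cost is a full extra pass over the file before the fast parser can start, which is exactly what would be avoided if the adaptation info were stored somewhere other than between the header and the draws.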