Stan CSV file format

mitzimorris · August 1, 2020, 9:07pm

Just updated the CmdStan chapter on the Stan CSV file format: https://mc-stan.org/docs/cmdstan-guide/stan-csv.html

The difficulty of parsing the Stan CSV files using standard CSV parser packages came up here:

This inspired me to document the gory details of the sampler outputs. I hope this is useful to others and to the ongoing discussion of better and faster outputs from the sampler and other Stan services. As always, feedback is welcome, as is help fleshing out the descriptions of the outputs for the other methods.

ahartikainen · August 1, 2020, 9:47pm

If I ignore performance for a moment:

Parsing the header by grouping against whitespace is not optimal.

Maybe verbose names could be better?

mitzimorris · August 2, 2020, 3:16am

hi @ahartikainen, not sure I follow your comments - what’s the context here?

ahartikainen · August 2, 2020, 6:41am

The header has multiple levels. Sure, a human can read it easily, but parsing the lines with code is a bit harder: one needs to keep track of which block is currently open. And after a couple of copy-paste errors (made by a human), the whitespace grouping can disappear. Then one would need to know all the arguments for all the samplers to get back the correct structure.

Good:

# stan_version_major = 2
# stan_version_minor = 24
# stan_version_patch = 0
# model = bernoulli_model
# method = sample (Default)

Not bad, but

#   sample
#     num_samples = 100
#     num_warmup = 200
#     save_warmup = 1
#     thin = 1 (Default)
#     adapt
#       engaged = 1 (Default)
...

is basically the same as

#   sample.num_samples = 100
#   sample.num_warmup = 200
#   sample.save_warmup = 1
#   sample.thin = 1 (Default)
#   sample.adapt.engaged = 1 (Default)
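
To illustrate the equivalence above, here is a minimal Python sketch that turns the indented header comments into dotted keys. The function name `flatten_header` is hypothetical and is not part of CmdStan or any Stan interface. The sketch assumes that nesting is encoded only by leading whitespace after the `#`, and that block names are lines without an `=`.

```python
def flatten_header(lines):
    """Flatten indented '# key = value' comment lines into dotted keys.

    Assumption: nesting depth is given only by the whitespace after '#'.
    """
    flat = {}
    stack = []  # (indent, block name) pairs for the enclosing blocks
    for raw in lines:
        if not raw.startswith("#"):
            continue  # not a comment line, so not part of the header
        body = raw[1:].rstrip()
        text = body.strip()
        if not text:
            continue  # empty comment line
        indent = len(body) - len(body.lstrip())
        # Close every block that is at the same depth as this line or deeper.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if "=" in text:
            key, value = (part.strip() for part in text.split("=", 1))
            # Drop the "(Default)" marker so only the value remains.
            if value.endswith("(Default)"):
                value = value[: -len("(Default)")].strip()
            path = [name for _, name in stack] + [key]
            flat[".".join(path)] = value
        else:
            # A line without '=' opens a new block, e.g. "sample" or "adapt".
            stack.append((indent, text))
    return flat


header = [
    "# model = bernoulli_model",
    "# method = sample (Default)",
    "#   sample",
    "#     num_samples = 100",
    "#     thin = 1 (Default)",
    "#     adapt",
    "#       engaged = 1 (Default)",
]
flat = flatten_header(header)
for key, value in flat.items():
    print(f"{key} = {value}")
```

Note that the sketch depends entirely on the indentation: if the leading whitespace is lost, every key collapses to the top level, which is the fragility described above.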

mitzimorris · August 2, 2020, 3:47pm

took a look at the CmdStan code - implementing this would require a refactor of the argument handling code and it would be a lot of work - I’ve messed with that code before - it burned up about a week of dev time between me and Daniel Lee - not worth it.

almost all of the argument names are unique - with the exception of the keyword “file”, which is used for data block inputs, parameter init inputs, and algorithm outputs. flattening the argument names along the lines of your suggestion would lead to “data_file”, “init_file”, and “output_file”. note that the sample method already has the keyword “diagnostic_file”, which is a step in the right direction. at that point, whitespace wouldn’t matter.

also note that “init_file” for specific parameter inits would allow the user to also specify the init range for all other parameters - the services layer interface allows this; it’s a limitation of the current set of CmdStan argument names.

actually, you can specify “output diagnostic_file=foo.csv” for any method - not sure if any methods besides ‘sample’ do anything with it - maybe ‘vb’ does?
