Lack of precision/truncation of log posterior trace on cmdStan (but not PyStan)

caesoma · January 17, 2019, 8:27pm

I got some strange results when running the same Stan model on the same data with cmdStan, compared with previous PyStan runs.

Here's a run of the same model with each interface, side by side (ignoring the lack of convergence/mixing of this multi-channel GP model, which I posted about before):

It looked as if the cmdStan runs were stuck on the same proposal for hundreds of iterations, yet the traces of the model parameters looked "normal": they kept changing and exploring parameter space. This also only happened when I scaled the model up from 20 to 40 channels.

I initially thought it was a more serious problem with the inference itself, but now I'm convinced it is just how cmdStan writes the values. The .csv output has lp__ values like 2.17258e+07 (i.e., 21725800 when loaded back into Python), while PyStan gives values like 21725161.65147942. So the PyStan trace looks normal, while the cmdStan trace shows large steps and flat stretches lasting many iterations.
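A minimal C++ sketch of the suspected cause, assuming the CSV values are written through a `std::ostream` left at its default precision of 6 significant digits (consistent with the `2.17258e+07` format above). The helper names `format_sig` and `csv_round_trip` are hypothetical and not CmdStan code:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Write a double with the given number of significant digits, the way a
// std::ostream does by default (general notation, precision 6).
std::string format_sig(double x, int precision = 6) {
    std::ostringstream out;
    out << std::setprecision(precision) << x;
    return out.str();
}

// Simulate a CSV round trip: format the value as text, then parse it back.
double csv_round_trip(double x, int precision = 6) {
    std::istringstream in(format_sig(x, precision));
    double y = 0.0;
    in >> y;
    return y;
}
```

With 6 significant digits, `21725161.65147942` is written as `2.17252e+07`, and the distinct values 21725161.65147942 and 21725180.2 both come back as 21725200, so small moves in lp__ disappear and the trace moves in steps of 100. With 17 significant digits (`std::numeric_limits<double>::max_digits10`) a double survives the round trip exactly.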

My question is whether it is possible to make cmdStan use greater precision when writing the traces. Alternatively, just to be sure: could this be caused by anything else (assuming the sampler is working properly and the HMC proposals are actually exploring parameter space as expected)?

Thanks in advance.

mitzimorris · January 17, 2019, 8:58pm

you're correct - this could be changed in cmdstan, and at the very least we need to document the precision.

in the current implementation, the same writer is used to output all sampler values for one draw - sampler state plus values for all parameters, transformed parameters, and generated quantities variables. increasing the precision of one output column would require increasing the precision of all columns - this might slow down I/O and increase the size of output files. but it's doable.
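To illustrate why one column can't simply be given more digits, here is a hypothetical C++ sketch of a single writer that emits every column of a draw through one stream. The names `write_draw` and `draw_to_csv` are assumptions for illustration, not the actual CmdStan writer:

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical writer (not CmdStan's actual code): every column of one draw,
// from lp__ through the generated quantities, goes through the same stream.
void write_draw(std::ostream& out, const std::vector<double>& draw) {
    for (std::size_t i = 0; i < draw.size(); ++i) {
        if (i > 0) out << ',';
        out << draw[i];
    }
    out << '\n';
}

// The precision is a property of the stream, so raising it for lp__
// raises it for every other column in the row as well.
std::string draw_to_csv(const std::vector<double>& draw, int precision) {
    std::ostringstream out;
    out.precision(precision);
    write_draw(out, draw);
    return out.str();
}
```

For a draw `{21725161.65147942, 0.123456789, 3.0}`, precision 6 gives `2.17252e+07,0.123457,3`; at precision 17 every column gets longer, which is where the extra I/O and file size would come from.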

caesoma · January 17, 2019, 10:50pm

Thanks for the quick reply. For now I can always recompute the log posterior for each draw in whatever language I'm loading the cmdStan output into, or have another Stan interface evaluate the model with the fixed_param algorithm. I just wanted to make sure nothing was wrong internally, and that the developers were aware of any potential issue.

I don't know how often this problem would come up. I have a few thousand data points, which I guess is what makes the log posterior this large, and the problem appeared when I doubled the number of data points to use a larger subset of the data.
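The link between data size and the problem can be made concrete. Assuming |lp__| grows roughly in proportion to the number of data points (model-dependent), the absolute gap between values printable with a fixed number of significant digits grows with it. A small hypothetical C++ helper, `print_resolution`, computes that gap:

```cpp
#include <cmath>

// Approximate gap between neighbouring values that can be printed with
// `precision` significant digits at the magnitude of x (x must be nonzero).
double print_resolution(double x, int precision = 6) {
    int exponent = static_cast<int>(std::floor(std::log10(std::fabs(x))));
    return std::pow(10.0, exponent - (precision - 1));
}
```

At |lp__| around 2.17e7 the gap is 100 with 6 significant digits, whereas at a hypothetical |lp__| around 1e4 it would be 0.1. Fluctuations that are clearly visible in a smaller model can therefore fall below the printed resolution once the log density grows.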

Maybe generated quantities could be written with greater precision, since that block is optional and could serve as a workaround when this happens; or maybe there's a reason not to go in that direction. I'm sure you guys will figure it out.
Thanks again.

mitzimorris · January 18, 2019, 3:00pm

generated quantities is subject to exactly the same constraint as all other information - the problem is that all information from a draw (sampler state, params, and generated quantities variables) is output using exactly the same C++ output stream, and the precision is set on the output stream, not on the individual items.
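One way to picture "precision is set on the stream, not the items": in standard C++ the precision can be changed between items on one stream, but it is sticky state, so a per-column scheme has to set and restore it around each value. The sketch below is hypothetical (the function `write_draw_per_column` and the per-column precision vector are assumptions), not how cmdstan was implemented at the time:

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical finer-grained writer: each column gets its own precision.
// std::ostream precision is sticky, so the original value is saved and
// restored; otherwise the last column's setting would leak into later output.
void write_draw_per_column(std::ostream& out,
                           const std::vector<double>& draw,
                           const std::vector<int>& precisions) {
    if (draw.size() != precisions.size())
        throw std::invalid_argument("one precision per column is required");
    const std::streamsize saved = out.precision();
    for (std::size_t i = 0; i < draw.size(); ++i) {
        if (i > 0) out << ',';
        out.precision(precisions[i]);
        out << draw[i];
    }
    out << '\n';
    out.precision(saved);
}
```

With precisions `{10, 6, 6}` for the draw `{21725161.65147942, 0.123456789, 3.0}`, the row becomes `21725161.65,0.123457,3`, and the stream is left at its default precision of 6 afterwards.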

the solution is finer-grained outputs. there's been some discussion around this:

github.com/stan-dev/design-docs/blob/master/designs/0001-logger-io.md

**_Static, Logger-Style Output:_**

**Functional and Technical Specifications**

**_Motivation_**

This is a proposal for static, logger-style output to handle data and message output from the Stan services. The goal is to make it easier to do three things:

1.  Add a new type of data output to an existing service.
1.  Add a new algorithm, service, model method, etc.
1.  Write handlers in the interfaces.

The basis of the refactor will be to have a single, statically accessible output handler used for all output communication to interfaces.

**_Functional Specification_**

The proposal is to use a singleton to route all output in the style of traditional loggers.

**Types of output**

This file has been truncated.
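The singleton, logger-style routing described in the proposal can be sketched in C++ as follows. This is not the design doc's API (its details are truncated above); the class `OutputRouter` and its methods are hypothetical names used only to show the shape of the idea:

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical singleton router in the spirit of the proposal: services send
// every piece of output here, tagged with a type, and interfaces register
// handlers for the types they care about. All names are illustrative only.
class OutputRouter {
 public:
  using Handler = std::function<void(const std::string&)>;

  static OutputRouter& instance() {
    static OutputRouter router;  // initialized once, thread-safe since C++11
    return router;
  }

  void set_handler(const std::string& type, Handler handler) {
    handlers_[type] = std::move(handler);
  }

  // Output of a type with no registered handler is silently dropped.
  void emit(const std::string& type, const std::string& message) const {
    auto it = handlers_.find(type);
    if (it != handlers_.end()) it->second(message);
  }

 private:
  OutputRouter() = default;
  OutputRouter(const OutputRouter&) = delete;
  OutputRouter& operator=(const OutputRouter&) = delete;
  std::map<std::string, Handler> handlers_;
};
```

Because each output type goes to its own handler, an interface could in principle choose formatting (including numeric precision) per type instead of inheriting one stream's settings for everything.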
