Plot Posterior Samples


#1

Hi,

I’m looking for some advice on plotting and visualizing posterior samples from cmdstan.

In the past, I’ve loaded them into R using the read_stan_csv function from the rstan package. However, the sample file is sometimes several gigabytes, and R struggles.

Someone once mentioned, in passing, that they used gnuplot for this, and that sounds like a nice solution.

Does anybody have any experience using gnuplot to plot from a Stan samples CSV file? If so, how do you load the data, select which parameter to plot, etc.?

Thank You


#2

Importing into R works better if you first use sed to strip the comment lines, i.e. delete every line that starts with the ‘#’ character and write the rest to a file (I think it’s sed -e '/^#/d'), and then read that file with fread, the CSV reader from the data.table package. What kills R’s performance is all the fancy logic in read.table. I think read.table does much better even if you just set colClasses, but I don’t recall to what extent. In my experience, R will robustly and quickly allocate a few gigabytes to a data frame or matrix.
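The comment-stripping step described above can be sketched as a small shell function. This is a minimal sketch, not anything shipped with CmdStan: the function name strip_stan_comments and the file names are made up for illustration. It uses grep -v '^#', which keeps every line that does not start with '#' and is equivalent to sed '/^#/d'.

```shell
#!/bin/sh
# strip_stan_comments INPUT OUTPUT   (hypothetical helper name)
# Copy INPUT to OUTPUT, dropping every line that starts with '#'.
# What remains is the header row followed by the draws.
strip_stan_comments() {
    grep -v '^#' "$1" > "$2"
}
```

Calling strip_stan_comments output.csv samples.csv should leave a plain CSV that data.table::fread("samples.csv") can read directly.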


#3

Sakrejda,

That’s a nice idea. Data.table is great, but I hadn’t thought of using it for Stan samples.

Quickly, how would you do this with sample.csv files from multiple chains?

Thanks!


#4

I took a shot at that separately, since I pretty often work on a cluster or with modified CmdStan versions. The repo is at: https://github.com/sakrejda/stannis

The command you want is stannis::read_stan_set(root=output_dir, pattern=pattern_matching_your_chains)

I worked out loading big files separately, so that’s not integrated in there yet… my code reads the comment lines for metadata separately. The sed and data.table approach needs to be substituted into stannis::read_stan_data in place of the current read.table. I wasn’t going to share this anytime soon, but the code is pretty clean and it might save you some work. No docs yet.
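For anyone who would rather stay with the sed-and-data.table route from earlier in the thread, here is a rough sketch of merging several chain files into one CSV before loading it. It is not what stannis does. The function name combine_chains and the added "chain" column are assumptions for illustration, and the chain number simply follows the order of the files on the command line.

```shell
#!/bin/sh
# combine_chains OUTPUT CHAIN_FILE...   (hypothetical helper name)
# Merge several CmdStan CSV files into OUTPUT, dropping comment lines,
# keeping one header row, and prefixing each draw with its chain number.
combine_chains() {
    out=$1
    shift
    awk '
        FNR == 1  { chain++; in_header = 1 }   # first line of a new file
        /^#/      { next }                     # skip comment lines
        in_header {                            # first non-comment line is the header
            if (chain == 1) print "chain," $0  # keep the header from the first file only
            in_header = 0
            next
        }
        { print chain "," $0 }                 # a draw: prefix with the chain number
    ' "$@" > "$out"
}
```

For example, combine_chains all_chains.csv output_1.csv output_2.csv would produce a single file whose first column identifies the chain.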


#7

@betanalpha uses gnuplot with CmdStan, so he should be able to tell you how to do it. You’re not going to get the BayesPlot or ShinyStan level of built-in plotting, but Michael seems to manage.


#8

Gnuplot ignores the non-CSV lines in the CmdStan output, so you can treat it just like a CSV of samples. Then it’s just a question of what you want to visualize and looking up the appropriate plot style in the Gnuplot manual. You know, to first order.
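As a starting point, a trace plot of one parameter might look something like the sketch below. It is a shell function that prints a gnuplot script. The function name make_trace_script and the file names are made up, and the script assumes a reasonably recent gnuplot (version 5 or later) built with the pngcairo terminal. The comma separator has to be set explicitly, and giving the column as a quoted name asks gnuplot to match it against the header row. Pseudocolumn 0 is the draw index.

```shell
#!/bin/sh
# make_trace_script CSV PARAM PNG   (hypothetical helper name)
# Print a gnuplot script that draws a trace plot of the column named PARAM
# from the CmdStan output file CSV and writes it to the image file PNG.
make_trace_script() {
    cat <<EOF
set datafile separator ","
set terminal pngcairo size 900,400
set output "$3"
set xlabel "draw"
set ylabel "$2"
plot "$1" using 0:"$2" with lines notitle
EOF
}
```

Something like make_trace_script output.csv lp__ trace.png | gnuplot would then render the plot. Other views, such as scatter plots of two parameters, come down to changing the using clause and the plot style.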


#9

Thanks Bob and betanalpha, very helpful stuff!