Plot Posterior Samples

Hi,

I’m looking for some advice on plotting and visualizing posterior samples from cmdstan.

In the past, I’ve loaded them into r, using the read_stan_csv command from the rstan package. However, sometimes the sample file is several gigabytes, and R struggles.

Someone once mentioned, in passing, that they used gnuplot for this, and that sounds like a nice solution.

Does anybody have an experience with using gnuplot to plot from a stan samples csv file? If so, how do you load that data, select which parameter to plot, etc?

Thank You

1 Like

Importing into R works better if you sed first to strip comment lines (well, sed for not matching the leading ‘#’ character and then print matches to a file (I think it’s sed -p -e 's/^[^#]//') and then use the csv file-reader from data.table package. What kills R’s performance is all the fancy logic in read.table… I think even if you set colClasses in R it does much better but I don’t recall to what extent. R will robustly and quickly allocate a few gigabytes to a data frame or matrix in my experience.

Sakrejda,

That’s a nice idea. Data.table is great, but I hadn’t thought of using it for Stan samples.

Quickly, how would do this with sample.csv from multiple chains?

Thanks!

I took a shot at that separately since I do stuff on a cluster or modified CmdStan versions pretty often, the repo is at: https://github.com/sakrejda/stannis

The command you want is stannis::read_stan_set(root=output_dir, pattern=pattern_matching_your_chains)

I worked out loading big files separately so that’s not integrated in there yet… my code reads the comment lines for metadata separately. The sed and data.table function needs to be substituted into stannis::read_stan_data for the current read.table. I wasn’t going to share this anytime soon but the code is pretty clean and it might save you some work. No doc yet.

1 Like

@betanalpha uses gnuplot with CmdStan, so he should be able to tell you how to do it. You’re not going to get the BayesPlot or ShinyStan level of built-in plotting, but Michael seems to manage.

1 Like

Gnuplot ignores the non csv lines in the CmdStan output so you can treat it just like a csv of samples. Then it’s just a question of what you want to visualize and looking up the appropriate plot style in the Gnuplot manual. You know, to first order.

2 Likes

Thanks Bob and betanalpha, very helpful stuff!

1 Like

Hey @betanalpha , this might be a tall order considering that this is from a thread a few years ago but do you have an examples of code where you’re plotting CmdStan with Gnuplot? I was thinking about seeing what is possible in terms of Bayesplot like plots for CmdStan. Thanks again and any feedback or help is much appreciated. I was also thinking of using Gnuplot in conjunction with feedgnuplot.

gnuplot is generic plotting tool so it can plot just about every common visualization, including but very much not limited to scatter plots, histograms, and contour plots. Because the CmdStan output is in CSV format (modulo the headers which don’t affect most common line tools for selecting column values like awk, cut, and gnuplot itself) using gnuplot with CmdStan is as straightforward as

set datafile separator ","
plot "output.csv" using 8:9

Is there something in specific you were having trouble with?

1 Like

Awesome, thanks again @betanalpha for this. It helps me out to get started.
In terms of difficulties, I’m not exactly there yet but I’m sure I’ll get there. I was just looking for some examples to get started and just understanding how to work gnuplot in general with regard to CmdStan.

I don’t have any general examples at hand but here are some specific example of plotting histograms and scatter plots, although they’re a little complicated by the fact that they’re plotting differences between rows which require some extra work.


```#!/bin/bash

light="#DCBCBC"
light_highlight="#C79999"
mid="#B97C7C"
mid_highlight="#A25050"
dark="#8F2727"
dark_highlight="#7C0000"

# Histograms
file="gauss.output.csv"

gnuplot << END

set datafile separator ','

set terminal pdfcairo transparent enhanced size 12in, 8in font 'Times,60'
set output 'energy_gauss.pdf'

set border 3

set title ""

stats "$file" using 7 nooutput
mean = STATS_mean

set xtics nomirror scale 0
set xlabel 'E - <E>'
set xrange[-60:60]

set ytics nomirror scale 0
set ylabel ''
set format y ''
set yrange[0:0.05]

set label "{/Symbol p}_{ {/Symbol D}E}" at 17, 0.03 center tc rgb "$dark_highlight"
set label "{/Symbol p}_{ E}" at 24, 0.01 center tc rgb "$light_highlight"

binwidth = 1
set boxwidth binwidth
bin(x, width) = width * floor(x / width) + 0.5 * binwidth

weight = 1.0 / (binwidth * 10000)

set style fill transparent solid 0.5

sigma=0.5
k=200
anal_mean=sigma*k

E_old=0

plot \
"$file" using (temp=E_old-\$7, E_old=\$7, bin(temp, binwidth)):(weight) smooth freq \
with boxes lt 1 lc rgb "$dark" fs fill border lc rgbcolor "$dark_highlight" notitle, \
"$file" using (bin(\$7 - mean, binwidth)):(weight) smooth freq \
with boxes lt 1 lc rgb "$light" fs fill border lc rgbcolor "$light_highlight" notitle
#exp(-0.5 * (x + anal_mean) / sigma + (0.5 * k - 1) * log(x + anal_mean) \
#- 0.5 * k * log(2 * sigma) - lgamma(0.5 * k)) w l lw 3 lc rgb "#666666" notitle

END

gnuplot << END

set datafile separator ","

set terminal pdfcairo transparent enhanced size 12in, 8in font 'Times,60'
set output "energy_funnel_ncp.pdf"

set border 3

set title ""

stats "$file" using 7 nooutput
mean = STATS_mean

set xtics nomirror scale 0
set xlabel 'E - <E>'
set xrange[-30:30]

set ytics nomirror scale 0
set ylabel "{/Symbol t}"
set log y
set yrange[1e-4:1e2]

plot \
"< awk -F, '{ if (\$6 == 0) print \$0 }' $file" \
u (\$7 - mean):9 w p pt 7 ps 0.5 lc rgb "$light_highlight" notitle, \
"< awk -F, '{ if (\$6 == 1) print \$0 }' $file" \
u (\$7 - mean):9 w p pt 7 ps 0.5 lc rgb "green" notitle

END
2 Likes

Awesome, thanks again for this @betanalpha !

1 Like