Extracting output of stansummary as an array

philipm · July 16, 2020, 11:23am

I am just getting started with the PyStan interface, having previously used CmdStan. I am a big fan of the stansummary program that is part of CmdStan and have identified the equivalent in PyStan (i.e., the stansummary method for objects of type StanFit4Model).

Is there a way of extracting the n_eff column as an array? I would like to be able to identify which parameters have the smallest n_eff values.

At the moment the only way I can see of doing this is to examine the contents of the string that is returned by the stansummary method.

Would I be better off switching to one of the other interfaces such as RStan for this kind of work?

ahartikainen · July 16, 2020, 11:32am

We have the summary method, which contains the same information. Its output can be converted to a pandas DataFrame with a little code.
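A minimal sketch of that conversion follows. It assumes PyStan 2.x, where `fit.summary()` returns a dict with the keys `summary` (a 2-D array), `summary_rownames` and `summary_colnames` (which include `n_eff` and `Rhat`). The helper names are hypothetical, and the dict in the test is a made-up stand-in for a real fit.

```python
import numpy as np
import pandas as pd


def summary_to_dataframe(s):
    """Hypothetical helper: turn PyStan 2's fit.summary() dict into a DataFrame.

    Assumes the keys 'summary' (2-D array of statistics),
    'summary_rownames' (parameter names) and 'summary_colnames'
    (statistic names such as 'mean', 'n_eff', 'Rhat').
    """
    return pd.DataFrame(
        np.asarray(s["summary"], dtype=float),
        index=list(s["summary_rownames"]),
        columns=list(s["summary_colnames"]),
    )


def smallest_n_eff(s, k=5):
    """Hypothetical helper: the k parameters with the smallest n_eff, smallest first."""
    return summary_to_dataframe(s)["n_eff"].nsmallest(k)


# Usage with a real PyStan 2.x fit:
#   print(smallest_n_eff(fit.summary(), k=3))
```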

For the updated diagnostics, I recommend using ArviZ and its summary function:

import arviz as az
summary = az.summary(fit)
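`az.summary` returns a pandas DataFrame indexed by parameter name, so you can pick out the parameters with the smallest effective sample size by sorting it. The sketch below assumes a recent ArviZ in which the bulk-ESS column is called `ess_bulk`. `lowest_ess` is a hypothetical helper, and the DataFrame in the test is a stand-in for real output.

```python
import pandas as pd


def lowest_ess(summary: pd.DataFrame, k: int = 5, column: str = "ess_bulk") -> pd.DataFrame:
    """Hypothetical helper: the k rows of an az.summary() table with the smallest ESS."""
    # nsmallest sorts ascending on `column` and keeps the first k rows.
    return summary.nsmallest(k, column)


# Usage:
#   import arviz as az
#   summary = az.summary(fit)
#   print(lowest_ess(summary, k=3)[["ess_bulk", "ess_tail", "r_hat"]])
```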

philipm · July 16, 2020, 2:35pm

Thanks. Yes, that does answer my question.

I have also identified another solution, which is to use the --csv_file argument in CmdStan. This generates a file that is relatively easy to read into a pandas data frame. For example:

import pandas as pd
df = pd.read_csv(name, comment='#')  # name: path of the CSV written via --csv_file

I don’t know if I am missing things in the documentation again, but there were a couple of things I wasn’t able to do with ArviZ that I was able to do with CmdStan / stansummary:

- Evaluate N_Eff/s (effective sample size per second).
- Pass a CSV file of samples as input and produce a summary table.
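ESS per second is the effective sample size divided by the sampling wall time. The sketch below assumes you already have ESS values (for example, the `ess_bulk` column from `az.summary`) and the sampling time of each chain in seconds, with warm-up excluded. It divides by the total time summed over chains. That is one reasonable convention, so check whether it matches the one you want. `ess_per_second` is a hypothetical helper.

```python
def ess_per_second(ess, sampling_seconds):
    """Hypothetical helper: divide each ESS value by the total sampling time.

    ess: mapping from parameter name to effective sample size.
    sampling_seconds: sampling wall time of each chain, in seconds
    (warm-up excluded); the times are summed over chains.
    """
    total = sum(sampling_seconds)
    if total <= 0:
        raise ValueError("total sampling time must be positive")
    return {name: value / total for name, value in ess.items()}


# Example: two chains that each sampled for 8 seconds.
rates = ess_per_second({"omega0_c1": 95.0, "zeta": 640.0}, [8.0, 8.0])
# rates["zeta"] is 640 / 16 = 40.0
```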

The CSV file that I want to analyze has the following structure:

#     algorithm = lmc
#       lmc
#         engine = smmala
lp__,omega0_c1,omega0_c2,sd_in_c1,sd_in_c2,zeta
6161.36,81.4509,40.4473,99.4843,9.97418,0.201897
6165.26,79.3476,40.3356,97.1401,9.97287,0.188205
...
#
#  Elapsed Time: 0 seconds (Warm-up)
#                8.01562 seconds (Sampling)
#

I tried the following commands in Python:

import arviz
fit = arviz.from_cmdstan("output.csv")

This produced the following error message:

  File "/home/philipm/.local/lib/python3.6/site-packages/arviz/data/io_cmdstan.py", line 499, in _process_configuration
    "thin": thin,
UnboundLocalError: local variable 'thin' referenced before assignment
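The traceback suggests that from_cmdstan's configuration parser did not find a `thin` setting in this file's header. One possible workaround is to read the draws with pandas and pass plain arrays to `az.from_dict`, which expects arrays shaped `(chain, draw, ...)`. The sketch below assumes each CSV holds one chain and treats every column as a scalar parameter. Dropping `lp__` from the posterior group is an illustrative choice. `draws_to_posterior_dict` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def draws_to_posterior_dict(chains, drop=("lp__",)):
    """Hypothetical helper: stack per-chain DataFrames into {name: (chain, draw) array}.

    chains: list of DataFrames, one per chain, with identical columns
    (e.g. as returned by pd.read_csv(path, comment="#")).
    drop: columns to leave out of the posterior group.
    """
    if not chains:
        raise ValueError("need at least one chain")
    if len({len(df) for df in chains}) != 1:
        raise ValueError("all chains must have the same number of draws")
    columns = [c for c in chains[0].columns if c not in drop]
    return {name: np.stack([df[name].to_numpy() for df in chains]) for name in columns}


# Usage (requires ArviZ):
#   import arviz as az
#   chains = [pd.read_csv(p, comment="#") for p in ["output.csv"]]
#   idata = az.from_dict(posterior=draws_to_posterior_dict(chains))
#   print(az.summary(idata))
```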

ariddell · July 16, 2020, 3:04pm

If you’re willing to read some docstrings, you can probably get something working.

PyStan has a function, pystan.chains.ess, which returns the effective sample size for a single parameter given several chains.
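The kind of estimator such a function implements can be sketched in NumPy. This is not PyStan's code. It is a simplified version of the multi-chain, autocorrelation-based ESS described in the Stan Reference Manual, using Geyer's initial positive sequence. It omits the monotone adjustment and the rank normalization used by newer diagnostics. `ess_sketch` is a hypothetical name.

```python
import numpy as np


def _autocovariance(x):
    """Biased autocovariance of a 1-D array at lags 0..n-1, computed via FFT."""
    n = len(x)
    xc = x - x.mean()
    f = np.fft.rfft(xc, n=2 * n)  # zero-pad to avoid circular wrap-around
    return np.fft.irfft(f * np.conjugate(f), n=2 * n)[:n] / n


def ess_sketch(chains):
    """Hypothetical helper: ESS of one scalar parameter.

    chains: 2-D array of shape (n_chains, n_draws).
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    if n < 4:
        raise ValueError("need at least 4 draws per chain")
    acov = np.array([_autocovariance(c) for c in chains])
    # Mean within-chain variance W (unbiased) and the pooled variance estimate.
    w = np.mean(acov[:, 0] * n / (n - 1))
    b_over_n = np.var(chains.mean(axis=1), ddof=1) if m > 1 else 0.0
    var_plus = w * (n - 1) / n + b_over_n
    if var_plus <= 0:
        raise ValueError("draws have zero variance")
    # Combined autocorrelation estimate at each lag.
    rho = 1.0 - (w - acov.mean(axis=0)) / var_plus
    rho[0] = 1.0
    # Geyer's initial positive sequence: add lag pairs while their sum is > 0.
    # tau = 1 + 2 * sum_{t>=1} rho_t = -1 + 2 * sum over pairs (rho_2k + rho_2k+1).
    tau = -1.0
    t = 0
    while t + 1 < n:
        pair = rho[t] + rho[t + 1]
        if pair <= 0:
            break
        tau += 2.0 * pair
        t += 2
    return m * n / tau
```

For independent draws the estimate should land near the total number of draws. For strongly autocorrelated chains it should be much smaller.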

philipm · July 16, 2020, 3:56pm

I am interested in the ESS per second as well as the ESS. I am getting what I need from CmdStan now, so I think it is best to treat this as resolved.

ahartikainen · July 16, 2020, 4:13pm

That is a bug in ArviZ. I need to fix it.

I think from_cmdstan stores the sampling time in the posterior group's attributes. I need to check this.
