Extracting output of stansummary as an array

I am just getting started with the PyStan interface having previously used CmdStan. I am a big fan of the stansummary program that is part of CmdStan and have identified the equivalent in PyStan (i.e., the stansummary method for objects of type StanFit4Model).

Is there a way of extracting the n_eff column as an array? I would like to be able to identify which parameters have the smallest n_eff values.

At the moment the only way I can see of doing this is to examine the contents of the string that is returned by the stansummary method.

Would I be better off switching to one of the other interfaces such as RStan for this kind of work?

We have the summary method, which contains the same information. Its output can be converted to a pandas DataFrame with a little code.
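As a rough sketch of that "little code": in PyStan 2, fit.summary() should return a dict-like object whose "summary" entry is a 2-D array and whose "summary_rownames" and "summary_colnames" entries label its rows and columns. Those key names are assumed here and are worth checking against your PyStan version. The helper n_eff_sorted and the stand-in dictionary below are hypothetical, and the numbers are illustrative only.

```python
def n_eff_sorted(summary):
    """Return (parameter, n_eff) pairs, smallest n_eff first.

    `summary` is assumed to look like the result of fit.summary():
    a mapping with "summary", "summary_rownames" and "summary_colnames".
    """
    colnames = list(summary["summary_colnames"])
    rownames = list(summary["summary_rownames"])
    n_eff_index = colnames.index("n_eff")
    pairs = [
        (name, float(row[n_eff_index]))
        for name, row in zip(rownames, summary["summary"])
    ]
    return sorted(pairs, key=lambda pair: pair[1])


# Hypothetical stand-in for fit.summary(); values are made up for illustration.
example_summary = {
    "summary": [
        [0.20, 0.01, 0.05, 310.0, 1.00],
        [81.4, 0.90, 2.10, 45.0, 1.03],
    ],
    "summary_colnames": ["mean", "se_mean", "sd", "n_eff", "Rhat"],
    "summary_rownames": ["zeta", "omega0_c1"],
}
worst_first = n_eff_sorted(example_summary)
```

With a real fit you would call n_eff_sorted(fit.summary()) instead. If you prefer a DataFrame, the same three pieces can be passed to pandas as pd.DataFrame(s["summary"], index=s["summary_rownames"], columns=s["summary_colnames"]) and sorted with sort_values("n_eff").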

To get the updated diagnostics, I recommend using ArviZ and its summary function.

import arviz as az
summary = az.summary(fit)

Thanks. Yes, that does answer my question.

I have also identified another solution, which is to use the --csv_file argument in CmdStan. This generates a file that is relatively easy to read into a pandas data frame. E.g.,

import pandas as pd
df = pd.read_csv(name, comment='#')

I don’t know if I am missing things in the documentation again but there were a couple of things I wasn’t able to do with arviz that I was able to do with CmdStan / stansummary.

  • Evaluate N_Eff/s.
  • Pass a CSV file of samples as input and produce a summary table.

The CSV file that I want to analyze has the following structure:

#     algorithm = lmc
#       lmc
#         engine = smmala
lp__,omega0_c1,omega0_c2,sd_in_c1,sd_in_c2,zeta
6161.36,81.4509,40.4473,99.4843,9.97418,0.201897
6165.26,79.3476,40.3356,97.1401,9.97287,0.188205
...
#
#  Elapsed Time: 0 seconds (Warm-up)
#                8.01562 seconds (Sampling)
#
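For reference, a file with this layout can be split into draws and sampling time using only the standard library. The sketch below assumes the timing comment always has the form "<number> seconds (Sampling)", as in the sample above. The names read_stan_csv and ess_per_second are hypothetical helpers, not part of CmdStan, PyStan or ArviZ. The ESS value itself still has to come from elsewhere (e.g., the N_Eff column of stansummary).

```python
import csv
import io
import re

# Matches the timing comment, e.g. "#   8.01562 seconds (Sampling)".
SAMPLING_RE = re.compile(r"([0-9.eE+-]+)\s+seconds\s+\(Sampling\)")


def read_stan_csv(text):
    """Return (draws, sampling_seconds) parsed from CSV text.

    draws maps each column name to a list of floats; sampling_seconds
    is None if no "(Sampling)" timing comment is found.
    """
    data_lines = []
    sampling_seconds = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            match = SAMPLING_RE.search(stripped)
            if match:
                sampling_seconds = float(match.group(1))
            continue  # comment lines never contain draws
        data_lines.append(stripped)
    reader = csv.DictReader(io.StringIO("\n".join(data_lines)))
    draws = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            draws[name].append(float(value))
    return draws, sampling_seconds


def ess_per_second(ess, sampling_seconds):
    """Effective sample size divided by the sampling time in seconds."""
    return ess / sampling_seconds


# The two complete rows from the sample above, without the elided part.
sample_text = """\
#     algorithm = lmc
lp__,omega0_c1,omega0_c2,sd_in_c1,sd_in_c2,zeta
6161.36,81.4509,40.4473,99.4843,9.97418,0.201897
6165.26,79.3476,40.3356,97.1401,9.97287,0.188205
#
#  Elapsed Time: 0 seconds (Warm-up)
#                8.01562 seconds (Sampling)
#
"""
draws, sampling_seconds = read_stan_csv(sample_text)
```

Given an ESS for a parameter, ess_per_second(ess, sampling_seconds) then gives the rate. This only uses the sampling time; whether warm-up time should be included is a choice left to the reader.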

I tried the following commands in Python:

import arviz
fit = arviz.from_cmdstan("output.csv")

And I got the following error message:

  File "/home/philipm/.local/lib/python3.6/site-packages/arviz/data/io_cmdstan.py", line 499, in _process_configuration
    "thin": thin,
UnboundLocalError: local variable 'thin' referenced before assignment

If you’re willing to read some docstrings, you can probably get
something working.

PyStan has a function pystan.chains.ess which returns the effective
sample size for a single parameter given several chains.

I am interested in the ESS per second as well as the ESS. I am getting what I need from CmdStan now, so I think it is best to treat this as resolved.

That is a bug in ArviZ. I need to fix it.

I think from_cmdstan reads the sampling time into the posterior attributes. I need to check this.