Summaries of named variables in CmdStanPy?

Bob_Carpenter · March 17, 2023, 8:41pm

Is there any way to get a summary of a collection of variables by name like in RStan? The problem I have now is that summary only provides a few of the variables, but I want to investigate all of them other than y_rep(), which is most of the output. For example, I have a model with about 15 parameters and when I use .summary() on the fit object, I see this:

>>> fit.summary()

                   Mean      MCSE     StdDev           5%          50%          95%    N_Eff   N_Eff/s     R_hat
name                                                                                                            
lp__       -2314.430000  3.336650  37.751100 -2374.560000 -2316.120000 -2249.310000  128.008  0.219058  1.003970
u[1]           0.186183  0.021229   0.464021    -0.605620     0.201469     0.919997  477.777  0.817612  0.999643
u[2]          -0.157828  0.021724   0.592679    -1.125520    -0.156358     0.789620  744.299  1.273710  1.001500
u[3]          -0.785473  0.032570   0.509819    -1.596770    -0.788237     0.116052  245.017  0.419294  1.008590
u[4]           0.215903  0.019871   0.504749    -0.626732     0.232238     1.028460  645.209  1.104130  0.999067
...                 ...       ...        ...          ...          ...          ...      ...       ...       ...
y_rep[556]     0.907912  0.030717   0.923840    -0.622677     0.957355     2.328460  904.546  1.547930  0.999005
y_rep[557]     1.303210  0.031271   0.978371    -0.220924     1.328180     2.809030  978.847  1.675080  0.999449
y_rep[558]    -0.924537  0.030419   0.902428    -2.368720    -0.849168     0.507195  880.110  1.506120  1.000300
y_rep[559]     2.318210  0.042204   1.152030     0.370854     2.294010     4.204340  745.123  1.275120  0.999000
y_rep[560]    -0.073558  0.031857   0.917968    -1.513320    -0.132271     1.425860  830.315  1.420900  0.999807
[4026 rows x 9 columns]

There was a previous discussion here that just seems to punt on the issue and eventually recommends arviz: Question, do any of the interfaces allow a return summary of parameter(s) sized like in Stan?

In this topic: Cmdstanpy - extracting indexed parameters like log_likelihood

@ahartikainen suggests this indirect route:

summ = fit.summary()
rows = [key for key in summ.index if str(key).startswith("log_lik")]
print(summ.loc[rows])

That works, but it’s very brute force, walking over every key and doing a string comparison. But the real problem is that I’ll never remember how to do it next time (hence this post):

> fitsum = fit.summary()
> rows = [key for key in fitsum.index if str(key).startswith("alpha")]
> print(fitsum.loc[rows])

>>> print(fitsum.loc[rows])
          Mean     MCSE   StdDev       5%      50%      95%     N_Eff  N_Eff/s    R_hat
name                                                                                   
alpha  0.61041  0.07444  0.52416  0.07281  0.46768  1.61039  49.57603  0.08484  1.11267

What I would like to do is what I could do in RStan,

fit.summary(pars = ['alpha'])

WardBrian · March 18, 2023, 3:32am

The return value of summary is a pandas dataframe with every parameter. The fact that you’re only seeing some of them is just an artifact of how this table is printed in the REPL.
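A minimal sketch of that point, using a stand-in DataFrame rather than a real fit.summary() result (the variable names and values below are made up for illustration). It assumes only standard pandas display behavior: the default repr elides the middle rows, while to_string() or a lifted display.max_rows prints everything.

```python
import pandas as pd

# Stand-in for fit.summary(): more rows than pandas prints by default.
names = (
    ["lp__"]
    + [f"u[{i}]" for i in range(1, 6)]
    + [f"y_rep[{i}]" for i in range(1, 101)]
)
fitsum = pd.DataFrame(
    {"Mean": 0.0, "R_hat": 1.0},
    index=pd.Index(names, name="name"),
)

# Every row is stored in the frame, even though print(fitsum) elides the middle.
print(len(fitsum))

# to_string() renders all rows regardless of the display limit.
full_text = fitsum.to_string()

# Alternatively, lift the row limit temporarily and print as usual.
with pd.option_context("display.max_rows", None):
    print(fitsum)
```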

The output here is actually pandas parsing the CSV stansummary outputs, so the pull request "Add argument to only output some parameters from `stansummary`" (WardBrian, Pull Request #1143, stan-dev/cmdstan, GitHub) would allow what you’re requesting directly. You can also just apply standard pandas indexing to the currently returned table.

WardBrian · March 20, 2023, 2:04pm

In particular, if you know the precise name of the variable (e.g. it’s a scalar like alpha, not one with [...] afterwards), you can use

fitsum[fitsum.index == 'alpha']

If you want to do a search, I think the following is what Pandas would recommend over the list comprehension:

fitsum[fitsum.index.str.startswith("my_variable")]
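One thing to watch with a prefix search is that "alpha" also matches a variable called "alpha_raw". As a rough sketch of an RStan-like pars selection, the helper below (summary_for is a hypothetical name, not part of CmdStanPy) strips the "[...]" suffix from each row label and compares the base name exactly. It assumes element names look like name[i] or name[i,j], as in the output above, and uses a made-up stand-in table.

```python
import pandas as pd

def summary_for(fitsum, pars):
    """Return the summary rows for the variables named in pars (hypothetical helper)."""
    # Drop any "[...]" index suffix to recover the base variable name.
    base = fitsum.index.str.split("[").str[0]
    return fitsum[base.isin(pars)]

# Stand-in for fit.summary(); names and values are illustrative only.
names = ["lp__", "alpha", "alpha_raw", "u[1]", "u[2]", "y_rep[1]"]
fitsum = pd.DataFrame(
    {"Mean": range(len(names))},
    index=pd.Index(names, name="name"),
)

# The prefix search picks up alpha_raw as well as alpha.
prefix_rows = fitsum[fitsum.index.str.startswith("alpha")]
print(list(prefix_rows.index))

# The exact-name helper keeps alpha and every element of u, nothing else.
selected = summary_for(fitsum, ["alpha", "u"])
print(list(selected.index))
```

With a real fit, the call would be summary_for(fit.summary(), ["alpha", "u"]).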

ahartikainen · March 20, 2023, 2:27pm

Or maybe use filter

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.filter.html

fitsum.filter(like="my_variable", axis=0)

or

fitsum.filter(regex=r"my_variable\[[0-3]\]", axis="index")
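A caveat with the regex form: Stan element names contain literal square brackets, and an unescaped [1-3] in a regular expression is a character class, so the brackets need escaping to select specific indices. The sketch below uses a made-up stand-in table (the row names are illustrative) to show the difference.

```python
import re

import pandas as pd

# Stand-in for fit.summary(); names and values are illustrative only.
names = ["my_variable[1]", "my_variable[2]", "my_variable[7]", "my_variable2", "other[1]"]
fitsum = pd.DataFrame(
    {"Mean": range(len(names))},
    index=pd.Index(names, name="name"),
)

# Unescaped: [1-3] is a character class, so this matches "my_variable2"
# but none of the "my_variable[i]" rows.
unescaped = fitsum.filter(regex="my_variable[1-3]", axis="index")

# Escaped: \[ and \] match the literal brackets; ^ and $ anchor the whole name.
pattern = "^" + re.escape("my_variable") + r"\[[1-3]\]$"
escaped = fitsum.filter(regex=pattern, axis="index")

print(list(unescaped.index))
print(list(escaped.index))
```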
