Question, do any of the interfaces allow a return summary of parameter(s) sized like in Stan?

spinkney · May 26, 2021, 10:11am

Wondering if any interface gives the option to return a summary like mean, median, etc. in the same format as written in the Stan program? I know it’s not too hard to munge into these but I’m getting tired of doing it! It seems like something I may have missed that is already available.

For example,

matrix[5, 5] x[2]

would return in R as a list of size 2 where each element in the list is a 5 x 5 matrix? I get that this conversion from Stan types to language X can be a bit ambiguous. Even in R, should it be an array of 2 x 5 x 5? Well, I want to know if anyone has taken the time to think through and make these decisions already :).

For n-d arrays that contain reals, vectors, matrices it seems a natural return in R is a nested list until you get to the real, vector, matrix. Arrays that are just arrays return arrays. Reals, vectors, matrices just return same size as themselves.
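The munging described above can be sketched roughly as follows. This is only an illustration in Python, not something any interface ships: the helper `nest_summary`, the made-up means, and the assumption that the flat summary uses bracketed names such as `x[1,2,3]` are all hypothetical.

```python
import re
from itertools import product


def nest_summary(flat, name):
    """Fold entries like 'x[1,2,3]' of a flat summary dict into nested lists."""
    pattern = re.compile(rf"^{re.escape(name)}\[([\d,]+)\]$")
    indexed = {}
    for key, value in flat.items():
        match = pattern.match(key)
        if match:
            # Stan indices are 1-based; shift to 0-based for Python.
            idx = tuple(int(i) - 1 for i in match.group(1).split(","))
            indexed[idx] = value
    if not indexed:
        return flat[name]  # scalar parameter: nothing to fold
    n_dims = len(next(iter(indexed)))
    # Infer the shape from the largest index seen along each dimension.
    shape = tuple(max(idx[d] for idx in indexed) + 1 for d in range(n_dims))

    def build(prefix):
        if len(prefix) == n_dims:
            return indexed[prefix]
        return [build(prefix + (i,)) for i in range(shape[len(prefix)])]

    return build(())


# Made-up posterior means for `matrix[2, 3] x[2]`, stored under flat names.
flat_means = {
    f"x[{a},{r},{c}]": 100 * a + 10 * r + c
    for a, r, c in product(range(1, 3), range(1, 3), range(1, 4))
}
flat_means["sigma"] = 0.5

x_mean = nest_summary(flat_means, "x")  # list of 2, each a 2 x 3 nested list
```

Here the nesting goes all the way down to the scalars; stopping at the matrix level, as suggested above, would mean converting the innermost two levels into a matrix type of the host language.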

mike-lawrence · May 26, 2021, 12:44pm

I think maybe you’re looking for the newish rvars format

spinkney · May 26, 2021, 2:24pm

rvars is cool! I’ll play around with it. Everyone should check out "rvar: The Random Variable Datatype" in the posterior package docs (mc-stan.org).

It seems like it will make doing predictions on new data much easier too.

mitzimorris · May 26, 2021, 2:30pm

hi, the CmdStanPy CmdStanMCMC object has methods stan_variable and stan_variables. The latter gives you a dict over all the model variables.

Funko_Unko · May 26, 2021, 2:53pm

I did also think of this, but @spinkney wanted means/medians etc.

But, once one has a numpy array it’s just a matter of np.mean(x,axis=0) or such, so I guess that’s great.
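As a rough illustration of the `np.mean(x, axis=0)` idea, here is a sketch on synthetic data. It assumes the draws sit on the first axis and the remaining axes follow the Stan declaration `matrix[5, 5] x[2]`; the array `x_draws` is a random stand-in, not output from a real fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the draws of `matrix[5, 5] x[2]`:
# axis 0 indexes the draw, the other axes mirror the Stan declaration.
x_draws = rng.normal(size=(1000, 2, 5, 5))

# Reducing over axis 0 leaves summaries with the Stan shape (2, 5, 5).
x_mean = np.mean(x_draws, axis=0)
x_median = np.median(x_draws, axis=0)
x_q05, x_q95 = np.quantile(x_draws, [0.05, 0.95], axis=0)

print(x_mean.shape)
```

Each summary keeps the shape of the Stan variable, so `x_mean[0]` is the 5 x 5 mean of the first matrix in the array.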

mitzimorris · May 26, 2021, 3:09pm

we (Jonah, Rok, and I) try to keep CmdStanPy and CmdStanR as similar as possible, but CmdStanPy is more minimal - by design - because, as you note, once you’ve got a numpy array, then you can use numpy to get the stats.
my assumption, possibly wrong, was that Python users will already be using numpy to analyze their data. also, there’s ArviZ for downstream processing.

spinkney · May 26, 2021, 3:12pm

is each object in the dict the same size as the Stan object? So, if I have mat = matrix[5, 5] and vec = vector[10], would the Python dict have keys mat and vec, where I ask for mat and it returns a 5 x 5 x iters array? Then I’d just do np.mean or whatever to get the average?

mitzimorris · May 26, 2021, 3:17pm

exactly.

plus, the CmdStanMCMC object provides metadata giving you the dimensions of all variables as well, plus a mapping from variables to columns in the draws array - properties stan_vars_dims et al.
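To show how such metadata can be used, here is a minimal sketch that folds flat columns back into Stan-shaped arrays. The dicts `dims` and `cols`, the helper `shaped_draws`, and the random `flat` array are hypothetical stand-ins rather than the actual CmdStanPy properties, and the sketch assumes the columns are laid out in column-major order, as in CmdStan CSV output.

```python
import numpy as np

# Hypothetical metadata: Stan dimensions and column range for each variable.
dims = {"mat": (5, 5), "vec": (10,)}
cols = {"mat": (0, 25), "vec": (25, 35)}

n_draws = 200
rng = np.random.default_rng(1)
flat = rng.normal(size=(n_draws, 35))  # stand-in for a (draws, columns) array


def shaped_draws(name):
    """Return the draws of one variable with shape (draws, *stan_dims)."""
    start, stop = cols[name]
    block = flat[:, start:stop]
    # order="F" keeps the draws on axis 0 and makes the first Stan index
    # vary fastest across columns (column-major layout).
    return block.reshape((n_draws, *dims[name]), order="F")


mat = shaped_draws("mat")      # shape (200, 5, 5)
vec = shaped_draws("vec")      # shape (200, 10)
mat_mean = mat.mean(axis=0)    # 5 x 5 posterior mean
```

With this layout, `mat[d, i, j]` corresponds to column `i + 5 * j` of the `mat` block for draw `d`.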

mike-lawrence · May 26, 2021, 4:05pm

Also FYI, I’m working on a CmdStan-CSV-to-InferenceData converter that I’m hoping to have done this week. (Well, an R-based converter; pretty sure there’s an existing Python-based one.)

OriolAbril · May 26, 2021, 7:33pm

I think arviz.summary basically does what you want already. InferenceData always preserves the dimensions of the data/parameters, which is one of the main reasons ArviZ is built on top of xarray.

Here is one example using the 8 schools model:

import arviz as az
idata = az.load_arviz_data("centered_eight")  
# arviz includes converters from pystan, cmdstanpy or cmdstan
summary = az.summary(idata, fmt="xarray")

The result is an xarray.Dataset (like the groups in InferenceData):

<xarray.Dataset>
Dimensions:  (metric: 9, school: 8)
Coordinates:
  * school   (school) object 'Choate' 'Deerfield' ... "St. Paul's" 'Mt. Hermon'
  * metric   (metric) <U9 'mean' 'sd' 'hdi_3%' ... 'ess_bulk' 'ess_tail' 'r_hat'
Data variables:
    mu       (metric) float64 4.093 3.372 -2.118 10.4 ... 250.3 642.9 1.027
    theta    (metric, school) float64 6.026 4.724 3.576 ... 1.016 1.016 1.014
    tau      (metric) float64 4.089 3.001 0.5692 9.386 ... 78.97 53.6 1.071

It reduces the chain and draw dimensions and adds the metric one, while keeping all other dimensions untouched, preserving their shapes, names, and coordinate values.

mitzimorris · May 31, 2021, 11:11pm

yes, there’s a Python one in ArviZ
