Question: do any of the interfaces return a summary of parameters shaped as in Stan?

Wondering if any interface gives the option to return a summary (mean, median, etc.) in the same shape as declared in the Stan program? I know it's not too hard to munge the output into these shapes, but I'm getting tired of doing it! It seems like something that may already be available and that I've just missed.

For example, given the declaration

```stan
matrix[5, 5] x[2];
```

Would this return in R a list of length 2, where each element of the list is a 5 x 5 matrix? I get that this conversion from Stan types to language X can be a bit ambiguous. Even in R, should it instead be a 2 x 5 x 5 array? Anyway, I want to know whether anyone has already taken the time to think this through and make these decisions :).

For n-d arrays of reals, vectors, or matrices, a natural return type in R seems to be a nested list down to the level of the real, vector, or matrix. Arrays that are just arrays return arrays. Reals, vectors, and matrices return objects of the same size as themselves.
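For anyone who still has to munge by hand, here is a minimal Python sketch of the mapping described above. It assumes the draws arrive as a 2-D NumPy array (draws x columns) with flat, 1-based column names in the bracket style `x[1,2,3]`; CmdStan's own CSV header uses dots instead (e.g. `x.1.2.3`), so the regex would need adjusting there. `summarize_like_stan` is a hypothetical helper, not part of any interface. It parses indices from the names rather than relying on column order and returns each variable as an array of its declared shape; `list(result["x"])` then gives the nested-list view, e.g. a list of two 5 x 5 matrices for `matrix[5, 5] x[2]`.

```python
import re
import numpy as np

# Matches "mu" or "x[1,2,3]": a Stan identifier with optional 1-based indices.
_COLUMN = re.compile(r"^([A-Za-z][A-Za-z0-9_]*)(?:\[([0-9,]+)\])?$")


def summarize_like_stan(names, draws, stat=np.mean):
    """Hypothetical helper: apply `stat` over the draws axis (axis 0) for every
    column, then regroup the results into one array per variable, shaped as declared."""
    grouped = {}
    for col, name in enumerate(names):
        match = _COLUMN.match(name)
        if match is None:
            raise ValueError(f"unrecognized column name: {name!r}")
        var, idx = match.group(1), match.group(2)
        # Convert Stan's 1-based indices to NumPy's 0-based indices.
        index = tuple(int(i) - 1 for i in idx.split(",")) if idx else ()
        grouped.setdefault(var, []).append((index, col))

    result = {}
    for var, entries in grouped.items():
        ndim = len(entries[0][0])
        if ndim == 0:
            # Scalar variable: a single summary value.
            result[var] = float(stat(draws[:, entries[0][1]], axis=0))
            continue
        # Infer the declared dimensions from the largest index seen on each axis.
        shape = tuple(max(index[d] for index, _ in entries) + 1 for d in range(ndim))
        summary = np.empty(shape)
        for index, col in entries:
            summary[index] = stat(draws[:, col], axis=0)
        result[var] = summary
    return result


# Example: a scalar `mu` and a `matrix[2, 2] x[2]`. Columns are listed in
# column-major order (first index varying fastest), but the helper does not
# depend on the order.
rng = np.random.default_rng(1)
names = ["mu"] + [f"x[{a},{i},{j}]" for j in (1, 2) for i in (1, 2) for a in (1, 2)]
draws = rng.normal(size=(1000, len(names)))
means = summarize_like_stan(names, draws)
x_as_list = list(means["x"])  # nested-list view: two 2 x 2 matrices
```

Passing `stat=np.median` gives medians in the same shapes.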


I think you may be looking for the newish rvar format (from the posterior package).


rvars are cool! I'll play around with them. Everyone should check out "rvar: The Random Variable Datatype" in the posterior package docs (mc-stan.org).

It seems like it will make doing predictions on new data much easier too.
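rvar itself is an R class in the posterior package; the sketch below is only a NumPy analogue of the idea, not the rvar API. If the draws of each parameter are kept as an array whose first axis indexes draws and whose remaining axes match the Stan declaration, predictions for new data reduce to one broadcasted expression, giving a full predictive distribution per new observation. The names `alpha_draws`, `beta_draws`, and `X_new` and the simulated values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_coef, n_new = 4000, 3, 5

# Hypothetical posterior draws: first axis indexes draws, the rest match Stan's shapes.
alpha_draws = rng.normal(1.0, 0.1, size=n_draws)            # real alpha
beta_draws = rng.normal(0.5, 0.2, size=(n_draws, n_coef))   # vector[3] beta
X_new = rng.normal(size=(n_new, n_coef))                     # new design matrix

# Linear predictor for every draw and every new row: shape (n_draws, n_new).
mu_new = alpha_draws[:, None] + beta_draws @ X_new.T

# Summaries over the draws axis leave one value per new observation.
pred_mean = mu_new.mean(axis=0)                              # shape (n_new,)
pred_interval = np.quantile(mu_new, [0.05, 0.95], axis=0)   # shape (2, n_new)
```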


Hi, the CmdStanPy CmdStanMCMC object has the methods stan_variable and stan_variables; the latter gives you a dict over all the model variables.


I also thought of this, but @spinkney wanted means, medians, etc.

But once you have a NumPy array, it's just a matter of np.mean(x, axis=0) or similar, so I guess that's great.
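A minimal sketch of that step. In CmdStanPy, `fit.stan_variables()` returns a dict mapping each variable name to a NumPy array whose first axis indexes draws, followed by the variable's declared dimensions. Here a hand-built stand-in dict (hypothetical shapes and random values) replaces a real fit so the snippet runs without CmdStan, and `summarize` is a hypothetical helper. Reducing over axis 0 leaves every summary in the shape declared in the Stan program.

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws = 4000

# Stand-in for fit.stan_variables(): name -> array of shape (draws, *stan_dims).
stan_vars = {
    "mat": rng.normal(size=(n_draws, 5, 5)),  # matrix[5, 5] mat
    "vec": rng.normal(size=(n_draws, 10)),    # vector[10] vec
    "sigma": rng.exponential(size=n_draws),   # real sigma
}


def summarize(variables, quantiles=(0.05, 0.5, 0.95)):
    """Hypothetical helper: per-variable mean, sd and quantiles over the draws axis."""
    out = {}
    for name, draws in variables.items():
        out[name] = {
            "mean": draws.mean(axis=0),
            "sd": draws.std(axis=0, ddof=1),
            # np.quantile puts the quantile axis first: (len(quantiles), *stan_dims).
            "quantiles": np.quantile(draws, quantiles, axis=0),
        }
    return out


summary = summarize(stan_vars)
```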

We (Jonah, Rok, and I) try to keep CmdStanPy and CmdStanR as similar as possible, but CmdStanPy is more minimal by design because, as you note, once you've got a NumPy array, you can use NumPy to get the stats.
My assumption, possibly wrong, was that Python users will already be using NumPy to analyze their data. There's also ArviZ for downstream processing.


Is each object in the dict the same shape as the Stan object? So, if I declare matrix[5, 5] mat and vector[10] vec, would the Python dict have keys mat and vec, so that asking for mat returns a 5 x 5 x iters array? Then I'd just use np.mean or whatever to get the average?

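Exactly, except that the draws dimension comes first: mat has shape iters x 5 x 5, so np.mean(mat, axis=0) gives the 5 x 5 average.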

Plus, the CmdStanMCMC object provides metadata giving you the dimensions of all variables, as well as the mapping from variables to columns in the draws array: the properties stan_vars_dims et al.
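With per-variable dimensions and a column range, a flat draws array can be reshaped directly. This sketch assumes, as a stated assumption rather than CmdStanPy's documented contract, that a variable's columns are contiguous and written in Stan's column-major order (first index varying fastest). `unflatten`, `start`, and `dims` are hypothetical stand-ins for whatever the metadata provides. Because `order="F"` reads and writes column-major on both sides, the draws axis stays first and each draw's flat row is unpacked column-major into the declared shape.

```python
import numpy as np


def unflatten(draws_2d, start, dims):
    """Hypothetical helper: return one variable's draws with shape (n_draws, *dims).

    Assumes the variable occupies columns start .. start + prod(dims) - 1, in
    column-major order (first Stan index varies fastest)."""
    n_draws = draws_2d.shape[0]
    size = int(np.prod(dims))
    block = draws_2d[:, start:start + size]
    # With order="F" and three dims, flat column c maps to index (a, i, j)
    # where c = a + dims[0] * (i + dims[1] * j), i.e. column-major over dims.
    return block.reshape((n_draws, *dims), order="F")


# Example: columns for `matrix[5, 5] x[2]` after one leading scalar column (lp__).
rng = np.random.default_rng(3)
names = ["lp__"] + [
    f"x[{a},{i},{j}]" for j in range(1, 6) for i in range(1, 6) for a in range(1, 3)
]
draws_2d = rng.normal(size=(200, len(names)))
x = unflatten(draws_2d, start=1, dims=(2, 5, 5))  # shape (200, 2, 5, 5)
x_mean = x.mean(axis=0)                             # shape (2, 5, 5)
```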


Also FYI, I'm working on a CmdStan-CSV-to-InferenceData converter that I'm hoping to have done this week. (Well, an R-based converter; I'm pretty sure there's an existing Python-based one.)


I think arviz.summary basically does what you want already. InferenceData always preserves the dimensions of the data/parameters, which is one of the main reasons ArviZ is built on top of xarray.

Here is one example using the 8 schools model:

```python
import arviz as az

idata = az.load_arviz_data("centered_eight")
# arviz includes converters from pystan, cmdstanpy or cmdstan
summary = az.summary(idata, fmt="xarray")
```

The result is an xarray.Dataset (like the groups in InferenceData):

```
<xarray.Dataset>
Dimensions:  (metric: 9, school: 8)
Coordinates:
  * school   (school) object 'Choate' 'Deerfield' ... "St. Paul's" 'Mt. Hermon'
  * metric   (metric) <U9 'mean' 'sd' 'hdi_3%' ... 'ess_bulk' 'ess_tail' 'r_hat'
Data variables:
    mu       (metric) float64 4.093 3.372 -2.118 10.4 ... 250.3 642.9 1.027
    theta    (metric, school) float64 6.026 4.724 3.576 ... 1.016 1.016 1.014
    tau      (metric) float64 4.089 3.001 0.5692 9.386 ... 78.97 53.6 1.071
```

It reduces over the chain and draw dimensions and adds a metric dimension, while leaving all other dimensions untouched: their shapes, names, and coordinate values are all preserved.
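The same shape bookkeeping can be mimicked with plain NumPy. This is not ArviZ's implementation, only an illustration: the draws for theta are assumed to be stored as (chain, draw, school), equal-tailed 5%/95% quantiles stand in for ArviZ's HDI bounds, and the metric list is a hypothetical subset. Each statistic is computed over the chain and draw axes and the results are stacked along a new leading metric axis, so the school axis survives unchanged.

```python
import numpy as np

rng = np.random.default_rng(4)
n_chain, n_draw, n_school = 4, 500, 8
# Hypothetical posterior draws for theta with dims (chain, draw, school).
theta = rng.normal(size=(n_chain, n_draw, n_school))

metrics = ["mean", "sd", "q5", "q95"]
sample_axes = (0, 1)  # chain and draw are the dimensions being reduced
theta_summary = np.stack(
    [
        theta.mean(axis=sample_axes),
        theta.std(axis=sample_axes, ddof=1),
        np.quantile(theta, 0.05, axis=sample_axes),
        np.quantile(theta, 0.95, axis=sample_axes),
    ]
)  # shape (metric, school), like theta's entry in the xarray summary
```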


Yes, there's a Python one in ArviZ.
