Replacement for permuted=TRUE (RStan 3 / PyStan 3)

What should be the method of accessing samples in RStan 3 / PyStan 3? I’m assuming we’re all happy with getting rid of permuted (see https://github.com/stan-dev/rstan/issues/345)

So we have a fit for eight schools and we are interested in getting samples of real eta[8];

We have our fit object, upon which we use extract.

extracted <- extract(fit)
draws <- extracted$eta

What shape is draws? Is it (num_chains, num_samples, 8)? Sounds right to me but this will be an adjustment for users.

Yes, we should definitely eliminate the concept of permuted.

The function name extract is bad in R because it clashes with an extract function in one of the hadleyverse packages. Actually, stand-alone functions in general are bad these days because there are 10000 packages to clash with and increasing by a few hundred every month.

So, I am pretty strongly in favor of

eta_draws <- fit$eta

rather than something like

eta_draws <- fit$extract("eta")

If you wanted all of the parameters, then you would be doing something with a specialization of a common generic function like as.matrix, as.data.frame, as.list, etc.

Also, eta_draws would be a instance of a S4 type that mostly mirrors the types in the Stan language, so you can do things like transform it to the unconstrained space.

As to what shape eta_draws is, I think it should be the same shape as in the Stan declaration, plus one additional trailing dimension that is iterations x chains. For things like convergence diagnostics, we need the ability to apply functions to the chains separately but for most things, we want to lump the chains together. Both uses are easy to accomplish in R because R is just storing it as a long numeric vector anyway and the dimensions are just metadata (attributes). So, you can get either shape without copying the values.

1 Like

And for all that is good in this world stop stripping away the sampler diagnostics. At this point I would also prefer having iteration and chain number added as extra columns so that we can completely avoid any ambiguity over the order of the samples.

2 Likes

This all sounds good. Default shape is the question: num_chains x num_samples x param_dim or num_samples x param_dim x num_chains ? (just thinking of scalar and vector params for now) Is there a right answer here?

1 Like

It doesn’t matter provided that there are functions to collapse the default output into parameter vs iteration.

Any suggestions for a function name to do this? Do we want to do this by default, assuming the vast majority of users will not care about draws by chain?

I think param_dims x (iterations_chains). This is different from the way it has been done, but it gives the opportunity to the calling function (mostly in packages) to set rownames and / or colnames easily so that those identifiers show up in the output rather than beta[7,3].

Depends on the desired workflow. Would the R-hat or E-BFMI calculation be done on the fit object or the extracted samples object? If the former then chain id information can be hidden in the fit object and extracted with special functions if desired by the user.

Both, I think. But typically the user would invoke fit$Rhat() or whatever and that would call Rhat on each variable that was saved, which would calculated it on each margin. For R-hat, you usually want it on everything, but for something like mean() you might want it for theta but not rho.

But we need to define a Rhat, n_eff, etc. S4 methods for the StanTypes in case someone does something like

exp_theta <- exp(fit$theta)
n_eff(exp_theta)

Are we going to store raw draws as an array or do we put them in to some subclass (python here). If we store them in a “draws” class we could in principle define function that need the chain info. This is something similar that PyMC3 does (MultiTrace object).

Some examples.

fit.theta.draws # raw array
fit.theta.rhat # value
fit.theta.chain_0 # array
fit.theta.mean() # value
fit.theta.mean(chains=True) # value array
fit.theta.ci(alpha=0.05)
1 Like

I think we all agree that these fit objects should support all sorts of functionality and reshaping operations. Sounds like this will not be a problem in R or Python.

I do think people should get the same default result in RStan and PyStan, so the question is what do we want

# R
fit$eta

and

# Python
fit['eta']  # and/or fit.theta

to return?

Seems like there are a few options:

  • Option A: num_chains x num_samples x param_dims
  • Option B-I: rearrangements of the dimensions above (but A seems natural to me)
  • Option J: num_samples x param_dims # chains collapsed
  • Option K: param_dims x num_samples # cahins collapsed
  • Option L: no default. You must use something like the PyMC multitrace attributes to specify more precisely what shape you want. So fit['theta'].draws for raw access and fit['theta'].collapsed for

Option J is identical to current behavior in Python (except the draws are permuted).

A significant utility of this kind of output is transportability to other plotting/analysis packages so we want to make the default output as simple as possible. If the user specifies by parameter name then it should be a single concatenated list/column of samples. But the user should also be able to specify multiple parameters, so I don’t like the fit$param_name pattern. Rather we should have something like fit$samples('name1', 'name2'). Then fit$samples with no named argument would return all of the samples including the sampler parameters and additional columns for chain number and iteration number (these will be especially useful if we ever end up with ragged chains). From this output R-hat and other multi chain diagnostics can all be calculated independently of Stan (for example, with bayesplot).

If people wanted more elaborate structures then there could be fit$samples_by_chain methods and so on.

In particular I think we should make the fitting as separate from the diagnostics/analysis as possible, moving away from the monolithic pattern of PyMC. Not only is this more modular it allows us to put more stuff in bayesplot hence get other MCMC tools to use the proper diagnostics.

Related:

stan::mcmc::chains.hpp stores chains in num_chains x num_samples x num_flat_params (memory in column-major order).

See: https://github.com/stan-dev/stan/blob/develop/src/stan/mcmc/chains.hpp

Ok, all this sounds good. Re: multiple params, so with Python we can do

fit['mu', 'eta']

without difficulty. What would you like this to return?

A dictionary with keys “mu”, “eta”, “chain”, “iter”. Or just a dictionary with keys “mu” and “eta” if people don’t want the chain and iteration info there by default. Actually I would be fine with “chain” and “iter” just being selectable parameter names in the call itself. So one could due fit['mu', 'eta', 'chain', 'tier']. And the sampler parameters would be available in the same way: fit['eta', 'divergent'].

This orientation (and constructs like fit$eta) is usually optimal for people (like Andrew) who want to do probabilistic programming stuff with the output. Other orientations are more suited for convergence diagnostics and so forth, but those tasks will almost always be handled by our own functions, in which case the objects can be reshaped internally.

For RStan, there are going to be S4 classes that mirror the possible types in the Stan language. These S4 classes contain an array slot to hold the raw draws. There hasn’t been much progress lately, but what progress there is is under rstan3/ on GitHub, for example
https://github.com/stan-dev/rstan/blob/develop/rstan3/R/AllClass.R#L255

Umm, don’t strip them, but perhaps don’t keep them in the same structure with the same methods. The sampler diagnostics aren’t draws, draws aren’t sampler diagnostics. We typically want quantiles, mean, sample variance, R-hat, estimated effective samples, and e-bfmi over the draws. We don’t want all of that over sample parameters.

This discussion overlaps with the discussion on breaking apart the writers.

If this were done object-oriented, then the return type of something like fit$eta would be something like a draws class that would have methods like r-hat, quantile, etc, whereas something like fit$tree_depth__ wouldn’t. I’m not suggesting these things be given separate classes in R or Python… I’m just trying to illustrate that they are different things.

+1.

@bgoodri and @ariddell, thoughts on how you want to deal with subsets of parameters? That’s currently done in RStan and PyStan, so I don’t want that to be an afterthought as this is being designed.

I think it makes sense to not keep all elements of a symmetric matrix. And there are mechanisms for letting users specify which parameters to keep / omit. Does the fit object keep both the full param dims and the subset around? Is there going to be a way to tell when a run is from a particular program? Two runs may have different param dims depending on runtime arguments.