Replacement for permuted=TRUE (RStan 3 / PyStan 3)

I think we would still have pars and include arguments to the fitting functions, as we have had in RStan for a long time. But the implementation of that has always been somewhat fake: it actually stores the excluded parameters (and computes their means) and only drops them before returning. There should be some common mechanism in the Stan library to accumulate the marginal means and variances of everything, and for the interface to know to skip storing parameters that are not wanted.

That makes a lot of sense. Any guidance on what would make sense to move into Stan?

For means and variances, it should be easy to call the corresponding accumulator right after write is called. We need to decide how to handle these accumulated means and variances once the sampler is done. I guess CmdStan would write them to a separate file, while PyStan and RStan would store them in the objects being returned.
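For concreteness, the accumulator called after each write could be a Welford-style running-moments estimator. This is a minimal sketch; the class and method names are hypothetical, not the actual Stan API (the Stan math library has a similar utility, if I recall correctly):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a per-parameter running mean/variance
// accumulator, updated once per iteration right after the writer
// is called. Uses Welford's algorithm so no draws need to be stored.
class moments_accumulator {
 public:
  explicit moments_accumulator(std::size_t n_params)
      : n_(0), mean_(n_params, 0.0), m2_(n_params, 0.0) {}

  // Update with one draw (the flat vector the writer receives).
  void add_draw(const std::vector<double>& draw) {
    ++n_;
    for (std::size_t i = 0; i < draw.size(); ++i) {
      const double delta = draw[i] - mean_[i];
      mean_[i] += delta / n_;                  // running mean update
      m2_[i] += delta * (draw[i] - mean_[i]);  // Welford's M2 update
    }
  }

  double mean(std::size_t i) const { return mean_[i]; }
  double sample_variance(std::size_t i) const { return m2_[i] / (n_ - 1); }

 private:
  std::size_t n_;
  std::vector<double> mean_, m2_;
};
```

The interfaces could then return `mean(i)` / `sample_variance(i)` for the excluded parameters without ever storing their draws.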

It seems like skipping the storing of some things has to be up to the interfaces. More generally, it should be easy-ish for the interfaces to take in the long vector of parameters and store parts of it in different holders and / or omit parts of it.

I'd mildly prefer that stan::services handle the exclusion of parameters.

As for permuted=TRUE, I'm not sure we reached a resolution. Maybe we need some user testing?

For now, internally, I'm going to have the fit object store the chains as a num_chains x num_samples x num_flat_params array (column-major in memory), since that's the precedent set by stan::mcmc::chains. As Ben has mentioned, it's easy to give users a view into this array with a different shape.

This is completely incorrect. We have iterations, which are sequential states of a Markov chain. The states themselves are elements of the sample space (along with dependent values, including constrained parameters, transformed parameters, and generated quantities). Then we have the sequential states of the sampler itself, which can be considered as an extended sample space.

We very much want to compute expectations of the sampler states just as we would with the parameter states. That's where we get average accept probabilities, and even E-BFMI. Even Rhat is valid, since the sampler behavior should be the same for all chains when the samplers are performing well.
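To make the point concrete, these expectations are computed directly from the sampler's own draws. Here is a minimal sketch of the empirical E-BFMI (sum of squared first differences of the energy over the centered sum of squares, per chain); the function name is illustrative, not an existing Stan function:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch: empirical E-BFMI computed from the energy__ draws of one
// chain, following the usual estimator
//   sum_i (E_i - E_{i-1})^2 / sum_i (E_i - mean(E))^2.
double e_bfmi(const std::vector<double>& energy) {
  double mean = 0.0;
  for (double e : energy) mean += e;
  mean /= energy.size();

  double num = 0.0, denom = 0.0;
  for (std::size_t i = 0; i < energy.size(); ++i) {
    if (i > 0) {
      const double d = energy[i] - energy[i - 1];
      num += d * d;  // squared first differences of energy
    }
    const double c = energy[i] - mean;
    denom += c * c;  // centered sum of squares
  }
  return num / denom;
}
```

This is exactly the kind of quantity that requires treating sampler states like parameter states: it is an expectation over the energy__ draws.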

And this is before considering plotting applications where we need to be able to correlate the sampler parameters with the sample space parameters.

If I understand correctly, I had already anticipated this. In the rstan3 prototype, every quantity that is saved can be a diagnostic or come from one of the blocks in the Stan language. So we would store energy, for example, pretty much like any other real scalar, except with type = diagnostic. So you could do fit$energy, fit$samples("energy"), fit$diagnostics(), …

I would be okay with this, but the default has to include everything. We need to facilitate users acknowledging and playing around with the diagnostics so that they better understand both when they can trust their fits and how to go about debugging bad fits.

We don't just have elements of the sample space or the extended sample space. There are also things like divergent__, stepsize__, and treedepth__ (which is a function of n_leapfrog__). But let's focus on the technical aspects of the design.

Hereā€™s what we have per iteration:

  • iteration number
  • unconstrained parameters
  • sampling parameters that are part of the extended sample space
  • additional information about the iteration from the sampling algorithm (e.g. divergent__)
  • constrained parameters
  • transformed parameters
  • generated quantities

There's no reason all of these have to be jammed into the same structure. I know it's tempting, but let's move away from building god objects. We're currently jamming the sampling parameters, the additional information about the iteration, and the constrained parameters + transformed parameters + generated quantities into one std::vector<double>, when some of these things actually live in separate structures within the code base and should just be written out separately. For the things that don't, maybe we can also make them modular. We're currently concatenating std::vectors, and we're also promoting things like tree_depth__ to double.

By making these modular, we can also add more information in the future. We will also have a better way to deal with new algorithms. If anyone has an argument against this sort of modularity, I'd be interested in hearing it.
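To illustrate the proposal, here is a minimal sketch of separate, appropriately typed writers for each kind of per-iteration output, instead of one concatenated std::vector<double>. All names here are hypothetical, not the actual stan::callbacks interfaces:

```cpp
#include <cassert>
#include <vector>

// Sketch: one holder per kind of per-iteration output, each with its
// own typed write method, so nothing gets concatenated and integer
// quantities like treedepth__ are not promoted to double.
struct iteration_writers {
  std::vector<std::vector<double>> sampler_params;    // e.g. stepsize__, energy__
  std::vector<std::vector<int>> iteration_info;       // e.g. divergent__, treedepth__
  std::vector<std::vector<double>> model_quantities;  // constrained params, tp, gq

  void write_sampler_params(const std::vector<double>& v) {
    sampler_params.push_back(v);
  }
  void write_iteration_info(const std::vector<int>& v) {
    iteration_info.push_back(v);
  }
  void write_model_quantities(const std::vector<double>& v) {
    model_quantities.push_back(v);
  }
};
```

The sampler would call each writer separately per iteration; an interface could still bundle the three holders into one returned object if it wants to.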

Well, I disagree COMPLETELY. This is not some game of pulling a fun pattern out of a textbook, talking about modularity being fun, and hating on god objects.

A complete state of the sampler CANNOT be reconstructed without all of this information, which is why the underlying C++ spits it all out TOGETHER. This naive goal of splitting everything up is why debugging poorly fitting models in RStan and PyStan is a complete and utter mess right now. A reasonably robust analysis requires a ridiculous amount of bespoke code to get at the right information, and only the most dedicated users will end up doing their analyses correctly. This alone should be enough to motivate why splitting up the information is a terrible design decision.

The number one priority should be providing users with the information they need, and so long as HMC is the default algorithm in Stan then all of this information is going to be necessary.

Mike, please keep it civil. I'm not trying to troll you and there's no need for this sort of antagonism.

Want to reread what I posted and comment on the technical points? It seems like you missed all of that. Nothing I mentioned goes counter to what you said and it will make what you want to do easier.


Just to summarize, I see several issues.

  1. Do we have one index (for the draw) or two indexes (for the draw and the chain)?
  2. Do we keep the underlying shapes of the Stan object in the output? That is, if I have declared a matrix type in Stan, do I get that 2D structure back in the results?
  3. Do we throw the diagnostics together with the parameters in a combined structure?

For (1): I don't ever need the chain ID. But all the internals, like ShinyStan, are going to need it, and I'd rather have it be explicit. Can we have a version that collapses and a version that doesn't? One thing I don't like about the current extract() is that the output shape changes based on whether permuted=TRUE or permuted=FALSE.

For (2): I do like that the output of extract has the structure.

For (3): I found it easier back when it was all combined. The whole "god object" thing is about combining heterogeneous structures into a global blackboard-like object, but that's not what we're talking about here. We're talking about mapping names to values on an iteration-by-iteration basis. These names have many flavors:

  • parameters (continuous)
  • transformed parameters (continuous)
  • generated quantities (discrete and continuous)
  • diagnostics (boolean, discrete, and continuous)

Right now, if there's a categorical discrete output in generated quantities, it makes no sense at all to compute its mean (well, maybe if it's ordinal).

If we do wind up separating these things somehow in the output, then it'd be nice to have exactly the same interface to pull the diagnostics out as the parameters, transformed parameters, and generated quantities.

Re 1): What I am trying to emphasize is that in R, if, for example,

x <- 1:30

you could do

dim(x) <- c(5,2,3)
x

which results in

, , 1

     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

, , 2

     [,1] [,2]
[1,]   11   16
[2,]   12   17
[3,]   13   18
[4,]   14   19
[5,]   15   20

, , 3

     [,1] [,2]
[1,]   21   26
[2,]   22   27
[3,]   23   28
[4,]   24   29
[5,]   25   30

Or you can do

dim(x) <- c(5,6)
x

which results in

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    6   11   16   21   26
[2,]    2    7   12   17   22   27
[3,]    3    8   13   18   23   28
[4,]    4    9   14   19   24   29
[5,]    5   10   15   20   25   30

So, if the output is stored as the Stan dimensions x iterations x chains, that is the same as storing it as the Stan dimensions x draws. Users will tend to prefer iterations x chains collapsed into draws, and convergence diagnostics functions can easily chop draws into iterations x chains without copying large objects.

Re 2): I agree that it should be Stan dimensions x something, and I think only Allen has voiced disagreement, on the grounds that stan::mcmc::chains.hpp has expected things to be chains x iterations x unknowns. But stan::mcmc::chains.hpp is not user-facing and could be changed anyway. I think if an R user wanted something with the unknowns flattened, they would usually be doing as.matrix or as.data.frame, which almost have to be draws x unknowns. Arguments could be made various ways as to what as.array should do. Leading with the Stan dimensions makes it easy to append more chains at the end.

Re 3): To me, this is a CmdStan problem that has metastasized into a flamewar. For RStan and PyStan, only one object is going to be returned when someone does foo <- fit$sample(), but foo is going to have a field for each piece of information that comes back from the C++. So, for RStan and PyStan, it is just a question of how to access it and what gets shown by default in the summary (my opinion is that there should be warnings, like there are now, when there are divergences, transitions that hit max_treedepth__, or chains with low BFMI, but the user doesn't need to see the mean, mcmc_se, sd, quantiles, etc. for things like n_leapfrog__). For CmdStan, it is an actual issue whether to write things to one file or several. But even StataStan / JuliaStan / MatlabStan / MathematicaStan, which wrap CmdStan, can be programmed to read from one file or many pretty easily.

For @Bob_Carpenter's (1), the underlying library should track both indices if it's running multiple chains itself, which may become possible with C++11. Until then, I think it makes sense that the chain index is tracked in the interfaces.

For @bgoodri's (1), I missed what the user preference was, even though you wrote it out as "Users will tend to prefer iterations x chains collapsed into draws." For your example, which one would R users prefer?

For (2), it sounds like we're in agreement. We absolutely have the freedom to reimplement stan::mcmc::chains.hpp – I think I hacked that together in a weekend early on and I didn't really expect it to last this long.

For (3), first off, sorry that it's become a flamewar. There's no reason for it.

I agree with @bgoodri. In the underlying library, it should be easy to separate the output along the lines Bob suggested, with the additional sampler parameters included. Right now, we're concatenating std::vector<double>s in order to write these things out. I'm just suggesting we don't, and instead write them separately. See mcmc_writer.hpp.

@Bob_Carpenter, in CmdStan, there are actually two files – the second being diagnostic.csv. It defaults to not being written, but when it is, it contains the sampler params and the unconstrained values.

And, as @bgoodri said, even though the output may be written separately, it'll be kept together in RStan and PyStan. In CmdStan, it'll probably still be kept in one output file even if there are separate writers for it.

No, I did not miss anything. You completely ignored what I am saying, which is evident, for example, in your use of the phrase "extended sampling space" as some mysterious new space that doesn't include sampler parameters such as divergences.

Modularity implies a decomposition into independent units, and that is fundamentally not what we have in MCMC. To be formal, we have two base sources of information at each iteration:

  • sampler information (chain, iteration, diagnostics)
  • unconstrained parameters

then from the latter we have the sequence

  • unconstrained parameters -> constrained parameters -> generated quantities.

Altogether we have the dependencies:

sampler_info
  ^
  |
  v
unconstrained_params -> constrained_params -> generated_quantities

These can be presented modularly to the user, or implemented modularly in the underlying code, only when the dependencies can be ignored, and this is very much not the case. Firstly, as I noted above but everyone seems to be ignoring, expectations of sampler parameters are useful in diagnosing problems. Secondly, correlations between unconstrained and constrained parameters and generated quantities are critical in diagnosing poor fits.

Importantly, this is not a theoretical argument. Putting together case studies and exercises for workshops is a mess because the current RStan and PyStan APIs treat the sampler information and the parameters as independent, splitting the information up and making it a massive pain to reconstitute into a robust workflow. I can't imagine that anyone using those interfaces is running HMC robustly at the moment.

To comment on something @bgoodri mentioned in a later post, canned warnings are insufficient. Not only are the warnings hard to control (they behave differently in the R app, RStudio, knitr, etc.), they cannot be recovered after the fact and they censor information. More utility functions provided to the user can compensate for the former, but any such implementation will have to tackle reconstituting the information, which again argues that the information is too tightly coupled to be "modular". The latter requires that the coupled information be readily available to the user, which again demonstrates the coupling of the information.

And as everything is currently implemented, there are no problems with adding additional information, either in terms of new sampler diagnostics or new parameters. The code readily handles the different sampler diagnostics in dynamic HMC and static HMC without any custom code. Everything is fused together and passed along with the correct coupling.

I completely agree. I think for RStan/PyStan we should/will have

fit$sample()  # or whatever the final name is

return the draws for everything (i.e., not separating out sampler diagnostics into a separate object with a separate access method).

Having to call get_sampler_params and work with the object it returns certainly discourages all but the most advanced users from interacting with the sampler information at all. Returning everything in the same object will also make some of the shinystan and bayesplot code simpler, as they currently treat the parameter draws and sampler diagnostic draws separately in the code as a result of RStan.


Do we have some marker / special name by which one can separate parameter names from sampler diagnostic names?

Double underscore. So, e.g., energy__ is not a valid name in the parameters block of a Stan program.
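For illustration, an interface could exploit that convention to split a header into model quantities and sampler diagnostics with a simple suffix check. This is a hypothetical helper, not an existing Stan function:

```cpp
#include <cassert>
#include <string>

// Sketch: names ending in a double underscore are reserved for sampler
// output (energy__, divergent__, ...), so a suffix check is enough to
// tell diagnostics apart from model quantities in a CSV header.
bool is_diagnostic_name(const std::string& name) {
  return name.size() >= 2 && name.compare(name.size() - 2, 2, "__") == 0;
}
```

An interface reading a header row could route each column to the appropriate holder based on this predicate.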


I think all trailing double underscores are disallowed in the language, but I'd need to check the manual again to be sure.

We wouldnā€™t need to have those as Stan keywords if we fixed the underlying Stan library.

What we don't have are easy ways of accessing the unconstrained parameters.

The naming conventions are not ideal. In RStan3, the type slot of a StanType can be

  1. data
  2. parameter
  3. transformed parameter
  4. generated quantity
  5. diagnostic

Things like energy__ would be of type diagnostic, and you can access their posterior draws the same way you would access the posterior draws of anything else, i.e. draws$energy.
