Hey all,
I wanted to break this topic out of the larger interface roadmap draft. We need to decide what the posterior draws returned by the fit objects will look like. I asked Ben about replacing `extract`, and he wrote the following:
For RStan3, I don’t think there should be a standalone extract function (or many standalone functions for that matter) because functions with names like extract conflict with functions that have the same name in other packages.
There could be an `$extract()` method for the Reference Class that holds the output, but I think it would be kind of unnecessary. I think users would be able to do `foo$theta` to get the draws of `theta` rather than `foo$extract(pars = "theta")` or something like that. If the user wanted all or most of the parameters, then `as.matrix`, `as.data.frame`, etc. would presumably be more useful.
So, I could go either way on an `$extract` method, but the main points (which I think were agreed upon in 2014) were that
- The permute argument should die
- The order should be StanDimensions x Chains x MainIterations so that it is easy to add more iterations to the back and so that you don’t have to do anything if you want to analyze the thing with the distinction between chains and main iterations erased.
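To make the two advantages in the last point concrete, here is a minimal pure-Python sketch. It assumes one parameter's draws live in a single flat buffer read in column-major (R/Fortran) order, the way an R array is stored; the helpers `offset`, `append_iteration`, and `merged_offset` are hypothetical and not part of RStan.

```python
# Hypothetical helpers (not an RStan API). One parameter's draws are kept in a
# single flat list read in column-major (R-style) order with dimensions
# StanDimensions x Chains x MainIterations; all indices are 0-based.


def offset(d, c, i, n_dims, n_chains):
    """Flat position of draw [d, c, i]; the first index varies fastest."""
    return d + n_dims * (c + n_chains * i)


def append_iteration(flat, new_draws):
    """Append one main iteration for every chain, in place.

    new_draws[c][d] is the draw for Stan dimension d in chain c. Because the
    iteration index varies slowest, the new block goes at the end of the
    buffer and no existing element has to move.
    """
    for chain_draws in new_draws:
        flat.extend(chain_draws)
    return flat


def merged_offset(d, k, n_dims):
    """Flat position of [d, k] in a StanDimensions x (Chains * Iterations)
    view. With k = c + n_chains * i this equals offset(d, c, i, ...), so
    pooling chains and iterations is a reinterpretation, not a copy."""
    return d + n_dims * k


n_dims, n_chains = 2, 2                    # e.g. real theta[2]; two chains
flat = [11.1, 11.2, 12.1, 12.2]            # iteration 1 of chains 1 and 2
append_iteration(flat, [[21.1, 21.2], [22.1, 22.2]])  # iteration 2
n_iter = len(flat) // (n_dims * n_chains)

# All draws of the second Stan dimension, chains and iterations pooled.
pooled_dim2 = [flat[merged_offset(1, k, n_dims)]
               for k in range(n_chains * n_iter)]
print(pooled_dim2)  # [11.2, 12.2, 21.2, 22.2]
```

Adding iterations is a plain append at the end, and erasing the chain/iteration distinction only changes how the same buffer is indexed.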
Do you all still agree on this direction? Below I’ll attempt to write it up declaratively, as it would appear in the final roadmap document. Places where I’m filling in specifics that weren’t specified anywhere and could use comment are marked with question marks. I was unsure about most of these details, so please correct me as needed.
Returning results
The language interfaces will return an object representing the results of fitting a Stan model (henceforth called a “StanFit” object, though the name isn’t important). It allows attribute access to each of the named parameters, transformed parameters, and generated quantities produced by running a Stan inference algorithm on data. The draws will be stored flat (?) such that the first dimension is the MainIteration number, the second dimension is the chain number, and the remaining dimensions are the StanDimensions (? is this right or backwards? see the example below). For example, suppose we have a two-element array parameter `real theta[2];`
and the following fit result containing two iterations on two chains:
iteration | chain | theta.1 | theta.2 |
---|---|---|---|
1 | 1 | 11.1 | 11.2 |
1 | 2 | 12.1 | 12.2 |
2 | 1 | 21.1 | 21.2 |
2 | 2 | 22.1 | 22.2 |
The result would be a one-dimensional (flat) array: `fit.theta = [11.1, 11.2, 12.1, 12.2, 21.1, 21.2, 22.1, 22.2]`.
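One way to probe the “right or backwards?” question: this flat vector is both the row-major (C order, NumPy’s default) flattening of an Iterations × Chains × StanDimensions array and the column-major (R/Fortran order) flattening of a StanDimensions × Chains × MainIterations array. The small pure-Python check below rebuilds it both ways from the table’s values; the variable names are illustrative only.

```python
# The example table: (iteration, chain) -> [theta.1, theta.2].
table = {
    (1, 1): [11.1, 11.2],
    (1, 2): [12.1, 12.2],
    (2, 1): [21.1, 21.2],
    (2, 2): [22.1, 22.2],
}
n_iter, n_chain, n_dim = 2, 2, 2

# Row-major flattening of Iterations x Chains x StanDimensions:
# the last index (Stan dimension) varies fastest.
row_major = [table[(i, c)][d]
             for i in range(1, n_iter + 1)
             for c in range(1, n_chain + 1)
             for d in range(n_dim)]

# Column-major flattening of StanDimensions x Chains x MainIterations:
# the first index (Stan dimension) varies fastest.
col_major = [None] * (n_dim * n_chain * n_iter)
for d in range(n_dim):
    for c in range(n_chain):
        for i in range(n_iter):
            col_major[d + n_dim * (c + n_chain * i)] = table[(i + 1, c + 1)][d]

print(row_major == col_major)  # True
print(row_major)
```

This suggests the MainIteration-first description above and the StanDimensions-first order quoted earlier describe the same memory layout and differ only in which indexing convention is used to read it.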
Each interface will also provide methods for converting the results into a 2-D matrix or data-frame-style object with columns for chain ID, iteration ID, and each Stan dimension of each parameter, essentially like the table above. There will be no options to change the ordering (i.e., no `permuted` or `permute` flag), and there will be a separate method to retrieve the warmup samples (?).
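A minimal Python sketch of the object described in this section, under stated assumptions: the class name `StanFitSketch` and its methods `to_rows` and `warmup_draws` are hypothetical, draws use the StanDimensions × Chains × MainIterations column-major layout, and a list of dicts stands in for the R `data.frame` or pandas `DataFrame` a real interface would more likely return. It shows attribute access (`fit.theta`), conversion to the iteration/chain/`theta.1`/`theta.2` table above, and warmup draws kept behind a separate method.

```python
# Hypothetical sketch, not an existing RStan/PyStan interface. Each quantity is
# stored as a flat, column-major StanDimensions x Chains x MainIterations list.


class StanFitSketch:
    def __init__(self, draws, sizes, n_chains, n_iter, warmup=None):
        self._draws = draws            # name -> flat list of post-warmup draws
        self._sizes = sizes            # name -> number of scalar elements
        self.n_chains = n_chains
        self.n_iter = n_iter
        self._warmup = warmup or {}    # kept apart; no permute/permuted flag

    def __getattr__(self, name):
        # Only called when normal lookup fails, so fit.theta reaches here.
        draws = self.__dict__.get("_draws", {})
        if name in draws:
            return draws[name]
        raise AttributeError(name)

    def warmup_draws(self, name):
        """Warmup draws are retrieved through a separate method."""
        return self._warmup[name]

    def to_rows(self):
        """Long table: one dict per (iteration, chain), one key per scalar."""
        rows = []
        for i in range(self.n_iter):
            for c in range(self.n_chains):
                row = {"iteration": i + 1, "chain": c + 1}
                for name, flat in self._draws.items():
                    size = self._sizes[name]
                    for d in range(size):
                        row[f"{name}.{d + 1}"] = flat[d + size * (c + self.n_chains * i)]
                rows.append(row)
        return rows


fit = StanFitSketch(
    draws={"theta": [11.1, 11.2, 12.1, 12.2, 21.1, 21.2, 22.1, 22.2]},
    sizes={"theta": 2},
    n_chains=2,
    n_iter=2,
)
print(fit.theta)
for row in fit.to_rows():
    print(row)
```

Because the ordering is fixed, `to_rows` only needs the offset formula; there is no reordering step that a `permuted` flag would otherwise control.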
//cc @bgoodri @ariddell @jonah @paul.buerkner @ahartikainen @Bob_Carpenter @mitzimorris
I’m particularly curious whether we want them to be flattened or to have the `theta[iter][chain][standim1][standim2][...]` structure. Thanks all!