I think R: fit$theta
/ Python: fit['theta']
/ fit.theta
are quite common (at least in dynamical languages)
Just think about it as the parent class exists at installation time which has some basic methods defined and the child class is created at runtime that has additional accessor methods for the parameters / transformed parameters / generated quantities that were actually declared in the Stan program.
Gotcha, so essentially have we described the behavior of as.Matrix(fit)
w.r.t. the ordering of the flattened output / raw storage? And we’ll have fit.theta
or fit$theta
for accessing unflattened StanDimensions with different chains merged in / flattened (ie. like that wiki page says, theta = fit["theta"] # shape (param_stan_dimensions, num_draws * num_chains) NEW!
).
I’ll add that in.
interesting - it seems like foo
is basically just apply
with the correct MARGIN
chosen, right? I guess we could add a method that did that to the fit object, maybe fit$apply(FUN)
(and fit$apply_chains(FUN)`)? What do you think of that, @bgoodri and @ariddell?
We’re going to overload common functions that take all draws and return a scalar, so that median(foo$theta)
would have the same dimensions as theta
in the Stan program.
That works, but isn’t it easier to just add the apply
method to fit
? Is Andrew somewhat alone in wanting to do it that way? I also think in Python it would be suboptimal to monkey patch numpy (or whatever) functions in this way for this use case, is that true @ariddell or @ahartikainen? If that’s a no-go then fit$apply
might be more cross-platform amenable?
You can do apply(foo$theta, ...)
as well, but I think we would want to overload mean
, median
, etc. to do that. No one wants the median across both the main MCMC iterations and elements of theta
. I don’t think foo(fit, "theta", FUN = median)
is a thing.
In numpy ufuncs
we have axis
. Stan stats functions should be wrapped as ufuncs.
for fit.apply(fun, ...)
maybe, what functions should it eat? Any that takes 1D array?
This would probably be a small for loop, (parallelized with ProcessPool / ThreadPool (/dask?).
Just to reinforce this discussion . . . I was fitting some Stan models recently, and again I wanted to grab the fitted parameters and posterior simulations. Going through and grabbing with as.matrix() or extract() is annoying. And, more to the point, it takes what should be the very natural statistical step of grabbing estimates, and turning it into a laborious and distracting computational step involving lists and arrays.
So I think this set of functions would be a huge benefit for Bayesian workflow using Stan.
Yep, I think so. It would basically just provide the information about which axes to perform the function over. It could be the underlying thing that an overloaded median
et al call to actually do the computation. @bgoodri what do you think about such a method on a fit object?
Yes, then it would be similar we currently have for ArviZ wrap_xarray_ufunc
which is quite flexible (it wraps ufuncs which can take wanted dimensions; our rhat / ess etc use this)
I don’t see why we need something like an apply
method that unlike the apply
function also takes a character string indicating what parameter it applies to when someone can just do apply(fit$theta, MARGIN = 1, FUN = mean)
if they want to apply the mean
function over the first dimension.
And that will get them means of theta flattened across all chains? Seems fine to me. I guess there’s some nicety to having things like that be discoverable via methods, but that also seems like pretty standard R, so.
You can flatten over whatever dimension(s) you want, although most of the common reduction functions would flatten over main iterations and chains.
@andrewgelman: what would you like to see instead of extract
and as.matrix
?
I’m not sure what you mean by “grabbing estimates”. Doesn’t extract
give you the posterior draws? There are other functions in RStan to extract means, etc., if you mean point estimates.
Do you know this vignette on dealing with fit objects?
https://cran.r-project.org/web/packages/rstan/vignettes/stanfit-objects.html
My guess is that this is what @andrewgelman means when he says:
Bob:
From earlier in the thread:
What I’d generally like is pointwise summaries; e.g., if theta is a 2 x 3 x 4 array, then fit$theta would return a n_sims x 2 x 3 x 4 array, and foo(fit, “theta”, median) would return a 2 x 3 x 4 array.
Just to further clarify:
-
Yes, extract gives the posterior draws, so I think extract(fit)$theta will return that n_sims x 2 x 3 x 4 array. But I had the feeling we were being discouraged from using “extract”.
-
I’d like that foo function so that foo(fit, “theta”, median) would return a 2 x 3 x 4 array. Currently, “foo” does not exist, and it’s awkward to grab these. Again, one of the challenges here is that I’d like to do this for arrays, not just scalar parameters, whcih is why as.matrix will not solve this problem.
Two more things:
1a. We can do extract(fit)$theta to get that n_sims x 2 x 3 x 4 array. I’d also thought it could make sense to do it as extract(fit, “theta”). But for some reason now when we do it that way in R, we have to do extract(fit, “theta”)$theta. That is, extract returns a list, and even fi only one named parameter is extracted, it still makes it a list with one item; hence as a user I need to do extract(fit, “theta”)$theta or extract(fit, “theta”)[[1]]. Maybe that’s just the way R is, it’s no big deal.
1b. I’d also like extract_one_draw(fit, i) which returns the list of all the parameters for one draw (the i’th draw). For example, if alpha is a scalar, beta is a vector of length 10, and theta is a 2 x 3 x 4 array, then
extract_one_draw(fit, 34)$alpha is a scalar
extract_one_draw(fit, 34)$beta is a vector of length 10
extract_one_draw(fit, 34)$theta is a 2 x 3 x 4 array,
all corresponding to the same simulation draw.
These are equivalent to extract(fit)$alpha[34], extract(fti)$beta[34,], and extract(fit)$theta[34,], but the point is that we can grab them all at once. This is super-important for propagating uncertainty and for probabilistic programming in general.
Because no one will know what margins to use:
Gotcha. So I think maybe the solution here is to view the interface roadmap as a set of minimal functions required and let @bgoodri or other RStan contributors like @jonah decide if they’d like to add the functions @andrewgelman is asking for. I think this roadmap unification is an attempt to find the maximal common ground between the interfaces that their leads are already thinking about rather than a top-down prescription.
[edit] I have to say, I am sympathetic to the desire to have the extra fit$apply
method just because it can apply the right margins automatically and it doesn’t cost us anything.