[Interface roadmap] fit objects and `extract`

I don’t like that we would start to remove dimensions if they are 1. That is going to be real pain in the long run (because we don’t have named dimensions).

For example, user takes a parameter and transforms it to array with shape of (20, 3, 4, 4000) and shares it with another user.

Was the original shape 20,3 or 20,3,4?

If a parameter has a dimension of 1 somewhere then there should be an array index for it. I was just saying that functions in the interface can keep the distinction between the last two dimensions when doing diagnostics and ignore the distinction between the last two dimensions when doing summaries, but that is only possible without reallocating memory when the chains and main iterations are the last two dimensions.

Oh yes, my mistake. You have 1000 vs 4000.

I’m still confused about what this N x K x 4 x 1000 notation corresponds to, especially since multiplication is commutative 😛. Since you said my example is backwards I’m guessing that might be corresponding to 11.1, 11.2, 21.1, 21.2, 12.1, 12.2, 22.1, 22.2 for the table above, is that right?

For a parameter declared as

real theta[2];

the resulting array would be 2 \times \mbox{chains} \times \mbox{iterations}. In R and Python, the first six iterations on two chains would look like

> array(LETTERS[1:24], dim = c(2, 2, 6), 
        dimnames = list(theta = 1:2, chain = 1:2, iteration = 1:6))
, , iteration = 1

     chain
theta 1   2  
    1 "A" "C"
    2 "B" "D"

, , iteration = 2

     chain
theta 1   2  
    1 "E" "G"
    2 "F" "H"

, , iteration = 3

     chain
theta 1   2  
    1 "I" "K"
    2 "J" "L"

, , iteration = 4

     chain
theta 1   2  
    1 "M" "O"
    2 "N" "P"

, , iteration = 5

     chain
theta 1   2  
    1 "Q" "S"
    2 "R" "T"

, , iteration = 6

     chain
theta 1   2  
    1 "U" "W"
    2 "V" "X"

where, for example, U & V are the realization of theta on the sixth main iteration of the first chain.

Okay, I’m not good enough at R to easily parse that, let me try popping back to just a table and a list to try to work with the example. So here you’re using letters instead of numbers and saying the list is just A-X in order, [A, B, C, D, E, F, G, H...]. We’d parse that list as the following table such that the thetas are nested within chain which are nested within iteration like so:

iteration chain theta.1 theta.2
1 1 A B
1 2 C D
2 1 E F
2 2 G H

Is that right?

Here is some viz converning this issue. It’s in row-major, but basically same as F, just in another direction.

https://eli.thegreenplace.net/2015/memory-layout-of-multi-dimensional-arrays

Thanks for posting the nice visualizations. I just want to make sure can explicitly write out our own example for the roadmap doc that is exactly what we want so we make sure we are all on the same page and we make things easier for implementers.

1 Like

Ah, there’s something I really like from the link that I’ll add to the doc - we can talk about which indices “change the fastest.” Here I think we are saying that StanDimension changes fastest, chain 2nd fastest, and iteration 3rd fastest.

Right now, as.array() on a stanfit gives an array of iterations x chains x parameters.

m <- rstanarm::stan_glm(Sepal.Length ~ 1, gaussian(), iris)
dimnames(as.array(m$stanfit))
#> $iterations
#> NULL
#> 
#> $chains
#> [1] "chain:1" "chain:2" "chain:3" "chain:4"
#> 
#> $parameters
#> [1] "(Intercept)"   "sigma"         "mean_PPD"      "log-posterior"

1 Like

Yes, what we are doing now is how it has been done since the 1990s and we want to change it for Stan3.

Cool! So reading between the lines a little here it sounds like this post got the ordering right. I’ll write something up with that and also describe it with that “which index changes fastest” idea.

Thanks all!

I would suggest asking @andrewgelman, who often has opinions on interface issues.

On the specific topic of fit$extract(...) as extract(fit, ...), they’re interchangeable to me.

On the other hand, fit$theta feels odd to me as it feels like the structure of the fit object changes.

I’m all trial and error and str() in R, so it doesn’t matter that much to me.

As for StanDimensions x Chains x MainIterations, yes, that makes most sense to me (assuming we mean C-style indexing here, where we first provide a Stan dimension (picking out a single scalar parameter) then get a collection of chains for that parameter). It lets you get the chain x iterations for each parameter dimension, which is what we need for the posterior.

I almost never look at my own chains, so it’d be nice to also have the collapsed option of getting StanDimensions x Draws. It doesn’t need to be permuted.

Is the Stan fit object in R going to be like the current one in that it includes input, model binary, etc.?

What’s the purpose of flattening theta[2] (a size 2 array or 2D vector)? I won’t ever have a use for that. What I think we need is be able to give a parameter (say in string form with indexes as it’s done now, say theta[1]) and get either

  1. N chains, each consisting of M draws of theta[1], or

  2. N * M draws of theta[1].

I’ve never needed (1).

What I do find clumsy is having to use paste to manipulate strings:

print(fit, pars = c("alpha", paste("theta[", 1:5, "]"))
1 Like

The theta element of fit would be read-only.

No, I don’t think so. I’m not entirely sure because I think we will want to support a workflow where the user asks for additional main iterations after the first batch of iterations has been obtained, but we may just require the user to pass back the first batch.

Not much. I think the interfaces would be much better if users could access things with the Stan dimensions preserved. It is more if the user wants all of the parameters flattened into a matrix.

I’d like to have fit$theta because this comes up a lot! I’d also like to do something like foo(fit$theta, median) or foo(fit, “theta”, median) that would grab the median over the simulation draws (or it could be foo(fit, “theta”, mean) or foo(fit, “theta”, quantile, c(0.25, 0.75)) etc. Currently I waste a lot of time in R grabbing simulations, then trying to figure out how to use “apply” or whatever to get summaries. This is a particular issue if theta is a vector or matrix or array. What I’d generally like is pointwise summaries; e.g., if theta is a 2 x 3 x 4 array, then fit$theta would return a n_sims x 2 x 3 x 4 array, and foo(fit, “theta”, median) would return a 2 x 3 x 4 array.

I’m also thinking the default would be to concatenate all chains together, but of course there should also be an option to grab one chain at a time, which could come up for diagnostics or whatever.

1 Like

I meant that the signature of the object varies based on which variables are in the Stan program. Not that it’s mutable. Though I’m happy it’s not mutable.

My background in Java and C++ makes this seem weird, as they have fixed signatures defined at compile time.

I think R: fit$theta/ Python: fit['theta']/ fit.theta are quite common (at least in dynamical languages)

Just think about it as the parent class exists at installation time which has some basic methods defined and the child class is created at runtime that has additional accessor methods for the parameters / transformed parameters / generated quantities that were actually declared in the Stan program.

1 Like

Gotcha, so essentially have we described the behavior of as.Matrix(fit) w.r.t. the ordering of the flattened output / raw storage? And we’ll have fit.theta or fit$theta for accessing unflattened StanDimensions with different chains merged in / flattened (ie. like that wiki page says, theta = fit["theta"] # shape (param_stan_dimensions, num_draws * num_chains) NEW!).

I’ll add that in.

interesting - it seems like foo is basically just apply with the correct MARGIN chosen, right? I guess we could add a method that did that to the fit object, maybe fit$apply(FUN) (and fit$apply_chains(FUN)`)? What do you think of that, @bgoodri and @ariddell?