Interface roadmap - last draft before ratification vote

After a few template defaults come into play, tracking down the source of bugs can become ridiculously difficult. (This is from someone who loves templates…) It also seems to reduce the effectiveness of unit tests in ensuring that code is structured properly.


Okay, just to make this concrete: we’re saying that fit[i,] can satisfy the extract_one_draw interface Andrew is asking for, so that if we have two parameters, theta[3] and phi[4], then fit[i,] has length two, with the first element an array of length 3 and the second an array of length 4, right? Versus as.matrix(fit)[i,], which would have returned something of length 7 in this case. Do I have that right?
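To make the contrast concrete, here’s a minimal Python sketch (the names and values are purely illustrative, not an actual interface) of a structured draw versus the flattened row:

```python
# Hypothetical sketch: a single draw for theta[3] and phi[4],
# structured (one element per parameter) vs. flattened (one row).
draw = {
    "theta": [0.1, 0.2, 0.3],    # array of length 3
    "phi": [1.0, 2.0, 3.0, 4.0]  # array of length 4
}

# fit[i,] in the proposal: two elements, keeping the Stan shapes
assert len(draw) == 2
assert len(draw["theta"]) == 3
assert len(draw["phi"]) == 4

# as.matrix(fit)[i,]: one flat row of length 3 + 4 = 7
flat_row = draw["theta"] + draw["phi"]
assert len(flat_row) == 7
```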

As far as large projects go, there are 3 new interfaces on there and a new compiler. I’ll add a note about GPU compiler support for reducing transfers. I’ll also add a section about extending the Stan language to cover closures, ragged arrays, and sparse matrices, and whatever else was in your grant (?) since it seems likely we’ll get to that this year. The initial topics were a result of the people who were able to commit to coming to a roadmap meeting.

One topic I’d like to put on a future roadmap less focused on the interfaces would be extending the Stan language such that folks can write and distribute functions in Stan that are as efficient as they would be in C++. I was thinking changes like that would have required more participants than were interested in engaging with the process, but maybe some of them don’t require much conversation and can be done online.

I’ll go through and do another update pass when I hear back about fit[i,] and hopefully that’s pretty close to the end of this :)

Yes, fit[i,] can return a list of arrays and as.matrix(fit)[i,] can return a vector.


So in Python, fit[i] should return a dictionary of ndarrays?

And something else would return an ndarray in a table format.

fit.theta[i] would still return an ndarray?

It sounds like ndarrays can’t support the raggedness required for that, right? So in Python maybe there’s a fit.dataframe() or fit.matrix() (or whatever) version that has a square matrix where parameters have been flattened, but fit[i] returns nested Python lists bottoming out in Python’s equivalent of whatever the underlying type is: matrix -> 2D ndarray, vector/row vector/real[] -> 1D ndarray, scalar -> scalar, etc.?

Another way to say that would be to map Stan arrays to Python lists and the rest to their obvious numpy equivalent.
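As a sketch of that mapping (the function name is hypothetical, row-major ordering is assumed for illustration, and nested lists stand in for ndarrays to keep the example dependency-free):

```python
# Illustrative sketch only, not a proposed PyStan API.
def structure_draw(flat, dims):
    """Reshape a flat list of values into nested lists given Stan dims.

    dims == [] means a scalar; dims == [n] a 1-D array; dims == [n, m]
    a matrix, and so on. Row-major ordering is assumed here.
    """
    if not dims:  # scalar (real/int)
        return flat[0]
    n = dims[0]
    size = 1
    for d in dims[1:]:
        size *= d
    return [structure_draw(flat[k * size:(k + 1) * size], dims[1:])
            for k in range(n)]

# real alpha; vector[3] beta; matrix[2,2] Tau;
assert structure_draw([0.5], []) == 0.5
assert structure_draw([1, 2, 3], [3]) == [1, 2, 3]
assert structure_draw([1, 2, 3, 4], [2, 2]) == [[1, 2], [3, 4]]
```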

I just realized I’m still confused about the interface (And I’m going to remove any notes about the internal storage order from the roadmap doc; the API unification shouldn’t be about internals).

@bgoodri So we’re saying that fit[i,] is iteration first it seems. What is fit$theta[i,] returning now in this world? Is it the first dimension of theta or the first iteration of theta?

I went through and did another big update pass; I think the main thing left to resolve is fit[i] vs fit$theta.

fit[i,] or fit[i] or whatever isn’t really iteration first because Andrew wants it to be a list of parameters that have the same dimensions as in the Stan program. So, if i were 4000, it would be pulling out the last main iteration of alpha, beta, gamma, etc.

fit$theta[i,] would really only work if theta were a (specialized) vector or row_vector in the Stan program, in which case it would return all 4000 draws for the i-th element.


If I understand what @andrewgelman and @bgoodri are saying then here’s a concrete example of what fit[i] would be. Suppose in Stan we have

real alpha;
row_vector[2] beta;
matrix[3,3] Tau;

In R fit[i] would return a single draw of the parameters structured in a list like this:

draw_i <- fit[i]

print(draw_i)
$alpha
[1] 0.5

$beta
     [,1] [,2]
[1,]  0.2  0.8

$Tau
      [,1]  [,2]  [,3]
[1,] -0.29  1.52  0.34
[2,] -0.49  1.77 -0.29
[3,]  0.38 -0.68 -0.09

str(draw_i)
List of 3
 $ alpha: num 0.5
 $ beta : num [1, 1:2] 0.2 0.8
 $ Tau  : num [1:3, 1:3] -0.29 -0.49 0.38 1.52 1.77 -0.68 0.34 -0.29 -0.09

So indexing fit directly is in some sense iteration first (really iteration only) in that the index i refers to a single iteration, but there’s actually no iteration (or chain) dimension to the returned object.

But I suppose it could be confusing to users that directly indexing the fit object works differently than directly indexing fit$parameter_name, so that’s definitely something to consider.

real alpha;
row_vector[2] beta;
matrix[3,3] Tau;

For PyStan I would really like to be explicit about the shapes.

draw_i = fit[i]
draw_i["alpha"] # scalar
draw_i["beta"].shape == (1,2)
draw_i["Tau"].shape == (3,3)

Currently, we would have

draw_i["beta"].shape == (2,)

which is the same as for vector[2].

But I’m not sure whether we should also force users to use explicit structure for the input. Currently, row_vector and vector can take the same input.
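To illustrate the ambiguity (pure-Python nested lists stand in for ndarrays here, and the helper function is hypothetical):

```python
# Under current behavior, vector[2] and row_vector[2] both come back
# with shape (2,); with explicit shapes they would differ.
def shape(x):
    """Shape of a nested list, analogous to ndarray.shape."""
    s = []
    while isinstance(x, list):
        s.append(len(x))
        x = x[0]
    return tuple(s)

current = [0.2, 0.8]           # vector[2] and row_vector[2] today: (2,)
explicit_row = [[0.2, 0.8]]    # row_vector[2] as a 1x2 matrix: (1, 2)
explicit_col = [[0.2], [0.8]]  # vector[2] as a 2x1 matrix: (2, 1)

assert shape(current) == (2,)
assert shape(explicit_row) == (1, 2)
assert shape(explicit_col) == (2, 1)
```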

When sparse matrices land, I think PyStan should use scipy.sparse or the sparse module: https://sparse.pydata.org/en/latest/

For ragged arrays, I’m not currently sure what the best option would be.

cc. @ariddell


Also, I suppose it doesn’t have to be fit[i] if we think that will be confusing. We could have a method, e.g. fit$iteration(i), that accomplishes the same thing. Whatever it’s called (it doesn’t have to be iteration), it would avoid the standalone-function problem while also avoiding any confusion introduced by indexing the fit object. It’s also more consistent with the “everything is a method” direction.
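A minimal Python sketch of what such a method could look like (the class and method names are hypothetical, not a proposed API):

```python
# A method returning one structured draw, so nothing hangs on how
# indexing the fit object itself behaves.
class Fit:
    def __init__(self, draws):
        # draws: list of dicts, one per iteration, parameter name -> value
        self._draws = draws

    def iteration(self, i):
        """Return draw i as a dict keyed by parameter name."""
        return self._draws[i]

fit = Fit([{"alpha": 0.5, "beta": [0.2, 0.8]},
           {"alpha": 0.7, "beta": [0.1, 0.9]}])
assert fit.iteration(1)["alpha"] == 0.7
assert fit.iteration(0)["beta"] == [0.2, 0.8]
```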


Agree with Chris here. I think default template parameter values kick off a whole other set of template instantiation logic that can get weird. The ones Sean mentioned below are probably okay since we tend to define those manually at the call site, though IMO I prefer template code to be more verbose.


What makes you want to have different shapes for row and column vectors? My impression from most users is that they don’t really care about the row / column distinction and will use whatever makes the Stan code easier, so I think forcing those to be different would be a disservice to that type of user.

@bgoodri, @andrewgelman, @ariddell / @ahartikainen how do you feel about fit$iteration(i) vs. fit[i]?

Shape issues are hard. I’m not saying we can’t ignore them, but there must be a way to extract parameters in the correct shape.

With iteration (like extract_one_draw) it is quite trivial to fix the shapes, but tbh advanced Python folks rarely iterate and instead use broadcasting, e.g. a dot product of shape1 against shape2 along some specific dimension. This all goes wrong really fast when there are no deterministic rules for the shapes.

fit.iteration() might be more flexible. If we go with fit[i], what should happen with slicing: fit[:i]?
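One possible answer, sketched with hypothetical names: integer indexing returns a single draw, and a slice simply delegates to the underlying list of draws.

```python
# Sketch only: one option for fit[i] vs. fit[:i] semantics.
class Fit:
    def __init__(self, draws):
        self._draws = draws  # list of per-iteration dicts

    def __getitem__(self, key):
        if isinstance(key, slice):
            return self._draws[key]  # fit[:i] -> list of draw dicts
        return self._draws[key]      # fit[i]  -> one draw dict

fit = Fit([{"alpha": a} for a in (0.1, 0.2, 0.3, 0.4)])
assert fit[2] == {"alpha": 0.3}
assert fit[:2] == [{"alpha": 0.1}, {"alpha": 0.2}]
```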

“row vectors and column vectors are equivalent” is a deterministic rule for shapes, haha. What’s the case for distinguishing between them? numpy itself doesn’t care and lets you dot product two arrays of the same shape.


(I wrote something here; the point is just to say that things can get hard. I’m not saying we need to go with explicit order.)

Stan code

parameters {
  vector[3] arr1;
  row_vector[3] arr2;
}

1D arrays

arr1 = np.array([1,2,3])
arr2 = np.array([1,2,3])

print(arr1.dot(arr2), arr1.dot(arr2.T), arr1.T.dot(arr2), arr1.T.dot(arr2.T))
# 14 14 14 14

2D array

arr1 = np.array([[1],[2],[3]])
arr2 = np.array([[1,2,3]])

print(arr1.dot(arr2))
# [[1 2 3]
#  [2 4 6]
#  [3 6 9]]

print("Shape:", arr1.dot(arr2).shape)
# Shape: (3, 3)

print(arr1.T.dot(arr2.T))
# [[14]]

print("Shape:", arr1.T.dot(arr2.T).shape)
# Shape: (1, 1)

Broadcasting

arr1 = np.repeat(np.expand_dims(np.array([[1],[2],[3]]), axis=2), 200, axis=2)
arr1.shape
# (3, 1, 200)

arr2 = np.repeat(np.expand_dims(np.array([[1,2,3]]), axis=2), 200, axis=2)
arr2.shape
# (1, 3, 200)

print(np.einsum("ijn,jkn->ikn", arr1, arr2)[:, :, 0])
# [[1 2 3]
#  [2 4 6]
#  [3 6 9]]

print("Shape:", np.einsum("ijn,jkn->ikn", arr1, arr2).shape)
# Shape: (3, 3, 200)

And then

vector[3] arr1[10];
row_vector[3] arr2[10];

Should the output shape be
(10,3) & (10,3)
or
(10,3,1) & (10,1,3)

Also, truncating dimensions is not cool with complex models :)

So you’re suggesting that row vectors and column vectors both be represented with two dimensions (what TF would call “rank 2”), essentially as a matrix with either one row or one column? And this is because if we’re encoding nested arrays as essentially numpy tensors in a fit object, it helps when you end up needing to do tensor operations on them?

I say strong maybe here :)

Just saying that we need to be careful with the input and output shapes: they should either follow the common strategy of each language (R, Python, Julia) or behave the same across all the languages (column- or row-major?).

Seems like a reasonable thing to think about. Anyone else have input on this idea that the interfaces should represent column and row vectors as matrices with one row or one column, given this previous post? Interface roadmap - last draft before ratification vote - #77 by ahartikainen

/cc @bgoodri @ariddell @jonah @Bob_Carpenter