New array declaration syntax

The current syntax for declaring an array of matrices is not ideal.

matrix[I,J] mat[N];

This declares mat as an array with N elements, each element being an I × J matrix.

The problem is that the type is split into matrix[I,J] and [N]. A pull request on the stanc3 repo proposes to allow a contiguous type and deprecate the current array-size-after-identifier syntax.


The proposed syntax for the type is simply matrix[I,J][N]. This is logical (because T[N] is a size N array of elements of type T) but potentially confusing (because the dimensions are in a different order when indexing elements as mat[n,i,j]). I didn't find any discussion of this choice so I'm opening a poll here.

How to declare an array of I × J matrices?

  • matrix[I,J][N] mat;
  • matrix[N,I,J] mat;
  • matrix[N;I,J] mat;
  • matrix[N|I,J] mat;
  • matrix[N][I,J] mat;


The second option is there just to catch people who aren't paying attention. cholesky_factor_cov[2,2,2] would be ambiguous because currently cholesky_factor_cov can have either one or two dimensions (a one-dimension declaration denotes a square matrix, as it does for cholesky_factor_corr, cov_matrix and corr_matrix).
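
To make that ambiguity concrete, here is a small sketch, in Stan's existing syntax, of the one- and two-dimension forms mentioned above. The variable names and sizes are made up for illustration.

```stan
parameters {
  // One size: each of these is a square 3 x 3 matrix.
  cholesky_factor_corr[3] L_corr;
  cov_matrix[3] Sigma;
  corr_matrix[3] Omega;
  cholesky_factor_cov[3] L_square;

  // Two sizes: cholesky_factor_cov also accepts a rectangular
  // M x N form (with M >= N), here 4 x 3.
  cholesky_factor_cov[4, 3] L_rect;
}
```

With both forms allowed, a declaration like cholesky_factor_cov[2,2,2] under the second poll option would not say whether it means a size-2 array of 2 x 2 factors or a 2 x 2 array of square factors of size 2.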

One thing that may be worth considering is compatibility with the proposed ragged array syntax.

row_vector[{3, 2}] w = { [a, b, c], [d, f] };

It's not clear to me if the plan is to allow composing ragged and rectangular dimensions:

// ???
row_vector[2][{3, 2}] u = { { [a, b], [c, d], [d, f] }, { [g, h], [i, j] } };
// ?!?!?!
real[2, {3, 2}] v = { { {a, b, c}, {d, f} }, { {g, h, i}, {j, k} } };
1 Like

Will there be a corresponding change from

int x[N];

to

int[N] x;

Also I think it should be matrix[N][I,J] mat_array because you'd reference it mat_array[n][i,j] (I think? This might not be true. Regardless, my vote is to make the order respect how you would refer to an element.)

1 Like

Yes.

I was going to add that option but figured no one would want it or at least people who want matrix[I,J][N] would object vehemently.

Both mat_array[n][i,j] and mat_array[n,i,j] are valid and refer to the same element.
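
A minimal sketch of that equivalence, written in the declaration syntax that is current at the time of this thread. The sizes and fill values are invented for illustration.

```stan
transformed data {
  int N = 3;
  int I = 2;
  int J = 2;
  matrix[I, J] mat_array[N];  // array of N matrices, each I x J

  // Fill each entry with a value that encodes its own indices.
  for (n in 1:N) {
    for (i in 1:I) {
      for (j in 1:J) {
        mat_array[n, i, j] = 100 * n + 10 * i + j;
      }
    }
  }

  // Both index forms read the same element:
  // array index first, then row and column.
  print("mat_array[2][1, 2] = ", mat_array[2][1, 2]);
  print("mat_array[2, 1, 2] = ", mat_array[2, 1, 2]);
}
```

Either way the array index n comes before the matrix indices i, j when accessing, which is the opposite of the order in which the sizes appear in the declaration.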

I think it would be confusing to need to put the array index in different places when declaring and when accessing. If I could recant my vote for matrix[I, J][N] I would.

2 Likes

I agree. I would find that the most logical.

2 Likes

Thanks for working on this!

I think it's worth adding this option to the poll if it's possible to edit polls (or start a new poll). I think I'd lean towards voting for it for the reason given by @anon75146577.

1 Like

Just so other people don't need to read the git discussion, here is @Bob_Carpenter reasoning out matrix[i,j][n]. https://github.com/stan-dev/stanc3/pull/560#issuecomment-645433810

And yeah. This is complicated. A really bad suggestion would be to make it something explicit like array<matrix[I,J], N> and array<int, N> instead. (Maybe with square brackets.)

Thanks go to @rybern; it's his pull request.

Discourse said I couldn't edit it after five minutes, but deleting it and making a new one in its place worked.

1 Like

As usual Bob makes good points. Now I'm torn.

It is frustrating when people are always right :p I think I like the current version more than any of the proposals OR the more verbose array<type, dimensions> syntax (even though that's a bit of a BIG change, it is generalizable to arbitrarily complex types).

Actually. Maybe a different question: What is the proposed tuple syntax and can we harmonize this with that?

Tuple syntax is just

(vector[3], vector[3]) vec_t; // vec_t.1 and vec_t.2 are vectors

Tuples require contiguous types but there's no overlap otherwise.

so glad to see tuples coming along!

agree with Bob on this one.

I see people find Bob's arguments persuasive. I agree, to an extent. In particular, it is the reason I consider matrix[N][I,J] unworkable. However, if I thought it was obviously the right solution I wouldn't have started this thread.

Keep in mind that the pull request does not really implement the general rule that T[N] is always an array of T for any T. Types like real[2][3] and vector[2][3][4] are not allowed; they must be substituted by the synonyms real[3,2] and vector[2][4,3].
What I would like to have is to add vector[4,3|2] as a synonym for vector[2][4,3] and disallow the latter, just like the current PR disallows vector[2][3][4].
In my opinion making vector[N|K] shorthand for (vector[K])[N] is no worse than making real[N,M] shorthand for (real[M])[N].

I think it is the right solution within the category of solutions being considered, but worse than the current practice. And I think the swapping of the indices will lead to no end of woe.

I might have missed this info, but is the implementation of tuples waiting on this, as in it can't be implemented with what we currently have? I might have misunderstood.

Seems likely to get messy to me. What are the arguments against explicitly calling for an array, such as
array[3] matrix[2,2] myFancyArray
? Is it just the extra typing? This would retain correct ordering and minimise confusion, I think... and would also function more similarly to lists in R.

4 Likes

I initially thought matrix[N][I,J] mat; too but switched to matrix[I,J][N] mat; after seeing Bob show the multidimensional array case. I think the single-dimension array looks cleaner with the first, but once the array has more than one dimension I prefer Bob's option and generality wins.

Yes, entirely correct. See the comment above: New array declaration syntax - #12 by nhuurre

1 Like

I like this a lot. The (minimal) extra typing seems like a small price to pay for the clarity (the types are clear and so is the order of indexing):

array[K1] vector[N] x1;              // x1[k1] is an N-vector 
array[K1,K2] vector[N] x2;           // x2[k1,k2] is an N-vector
array[K1,K2,K3] matrix[I,J] x3;      // x3[k1,k2,k3] is an IxJ matrix

@nhuurre @Bob_Carpenter @rybern (tagging people who were reviewing the PR) are there good reasons to avoid this?

6 Likes