New array declaration syntax

nhuurre · June 17, 2020, 6:44pm

The current syntax for declaring an array of matrices is not ideal.

matrix[I,J] mat[N];

This declares that mat is an array with N elements, each element being an I×J matrix.

The problem is that the type is split into matrix[I,J] and [N]. A pull request on the stanc3 repo proposes to allow a contiguous type and deprecate the current array-size-after-identifier syntax.

The proposed syntax for the type is simply matrix[I,J][N]. This is logical (because T[N] is a size N array of elements of type T) but potentially confusing (because the dimensions are in a different order when indexing elements as mat[n,i,j]). I didn’t find any discussion of this choice so I’m opening a poll here.

How to declare an array of I×J matrices?

matrix[I,J][N] mat;
matrix[N,I,J] mat;
matrix[N;I,J] mat;
matrix[N|I,J] mat;
matrix[N][I,J] mat;

The second option is there just to catch people who aren’t paying attention. cholesky_factor_cov[2,2,2] would be ambiguous because currently cholesky_factor_cov can have either one or two dimensions (a one-dimension declaration denotes a square matrix, as it does for cholesky_factor_corr, cov_matrix and corr_matrix).

One thing that may be worth considering is compatibility with the proposed ragged array syntax.

row_vector[{3, 2}] w = { [a, b, c], [d, f] };

It’s not clear to me whether the plan is to allow composing ragged and rectangular dimensions:

// ???
row_vector[2][{3, 2}] u = { { [a, b], [c, d], [d, f] }, { [g, h], [i, j] } };
// ?!?!?!
real[2, {3, 2}] v = { { {a, b, c}, {d, f} }, { {g, h, i}, {j, k} } };

anon75146577 · June 17, 2020, 6:54pm

Will there be a corresponding change from

int x[N];

to

int[N] x;

Also, I think it should be matrix[N][I,J] mat_array because you’d reference it as mat_array[n][i,j] (I think? This might not be true. Regardless, my vote is to make the order respect how you would refer to an element.)

nhuurre · June 17, 2020, 6:57pm

Yes.

I was going to add that option but figured no one would want it or at least people who want matrix[I,J][N] would object vehemently.

Both mat_array[n][i,j] and mat_array[n,i,j] are valid and refer to the same element.
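
A minimal sketch of that equivalence, written in the array-size-after-identifier syntax that is current as of this thread. The sizes and the zero fill are arbitrary choices for illustration only:

```stan
transformed data {
  int N = 3;
  int I = 2;
  int J = 2;
  // array of N matrices, each I x J, filled with zeros for illustration
  matrix[I, J] mat_array[N] = rep_array(rep_matrix(0.0, I, J), N);

  // both index forms pick out the same scalar: matrix 2, row 1, column 2
  real a = mat_array[2][1, 2];
  real b = mat_array[2, 1, 2];

  // partial indexing peels off the array dimension first
  matrix[I, J] m = mat_array[2];     // the whole second matrix
  row_vector[J] r = mat_array[2, 1]; // first row of the second matrix
}
```

In every case the array index comes before the matrix indices, which is the ordering the declaration proposals are being compared against.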

anon75146577 · June 17, 2020, 7:01pm

I think it would be confusing to need to put the array index in different places when declaring and when accessing. If I could recant my vote for matrix[I, J][N] I would.

rok_cesnovar · June 17, 2020, 7:04pm

I agree. I would find that the most logical.

jonah · June 17, 2020, 7:07pm

Thanks for working on this!

I think it’s worth adding this option to the poll if it’s possible to edit polls (or start a new poll). I think I’d lean towards voting for it for the reason given by @anon75146577.

anon75146577 · June 17, 2020, 7:11pm

Just so other people don’t need to read the git discussion, here is @Bob_Carpenter reasoning out matrix[i,j][n]. https://github.com/stan-dev/stanc3/pull/560#issuecomment-645433810

And yeah. This is complicated. A really bad suggestion would be to make it something explicit like array<matrix[I,J], N> and array<int, N> instead. (Maybe with square brackets)

nhuurre · June 17, 2020, 7:11pm

Thanks go to @rybern; it’s his pull request.

Discourse said I couldn’t edit it after five minutes but deleting it and making a new one in its place worked.

jonah · June 17, 2020, 7:13pm

As usual Bob makes good points. Now I’m torn.

anon75146577 · June 17, 2020, 7:16pm

It is frustrating when people are always right :p I think I like the current version more than any of the proposals OR the more verbose array<type, dimensions> syntax (even though that’s a bit of a BIG change, it is generalizable to arbitrarily complex types)

anon75146577 · June 17, 2020, 7:17pm

Actually. Maybe a different question: What is the proposed tuple syntax and can we harmonize this with that?

nhuurre · June 17, 2020, 7:21pm

Tuple syntax is just

(vector[3], vector[3]) vec_t; // vec_t.1 and vec_t.2 are vectors

Tuples require contiguous types but there’s no overlap otherwise.

github.com

stan-dev/design-docs/blob/a215ea3e78a87e7582e2b77736028b599bdbda2f/designs/tuples_structs.md

- Feature Name: Tuples and Structs
- Start Date: 2020-04-23
- RFC PR: ??
- Stan Issue:

NOTE:  THIS HASN'T BEEN MERGED INTO A PROPER RFC FORMAT;  IT WAS JUST
COPIED FROM THE WIKI WHERE IT WAS FIRST POSTED

## Declaring Tuples

A declaration would look something like this:

```
(T1, ..., TN) x;
```

where `T1` through `TN` are sized type specifications.

<b>Issue:</b>  To make this fly, we need to have a way of declaring sized types contiguously.  Right now, declarations like `int x[3]` split the `int` and `[3]`.  Mitzi's working on this as part of the general refactor of the underlying type system.

mitzimorris · June 17, 2020, 7:53pm

so glad to see tuples coming along!

agree with Bob on this one.

nhuurre · June 17, 2020, 8:22pm

I see people find Bob’s arguments persuasive. I agree – to an extent. In particular, it is the reason I consider matrix[N][I,J] unworkable. However, if I thought it was obviously the right solution I wouldn’t have started this thread.

Keep in mind that the pull request does not really implement the general rule that T[N] is always an array of T for any T. Types like real[2][3] and vector[2][3][4] are not allowed; they must be substituted by the synonyms real[3,2] and vector[2][4,3].
What I would like to have is to add vector[4,3|2] as a synonym for vector[2][4,3] and disallow the latter, just like the current PR disallows vector[2][3][4].
In my opinion making vector[N|K] shorthand for (vector[K])[N] is no worse than making real[N,M] shorthand for (real[M])[N].
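
To make the dimension ordering concrete, here is a sketch in the array-size-after-identifier syntax that is current as of this thread. The comments map each declaration to the spellings discussed above; the sizes and zero fills are arbitrary:

```stan
transformed data {
  // 3 arrays of 2 reals each; the PR spells this type real[3,2]
  real x[3, 2] = rep_array(0.0, 3, 2);

  // 4 x 3 array of 2-vectors; vector[2][4,3] in the PR,
  // vector[4,3|2] in the alternative suggested here
  vector[2] v[4, 3] = rep_array(rep_vector(0.0, 2), 4, 3);

  real x1[2] = x[1];         // one element of x is itself an array of 2 reals
  vector[2] v11 = v[1, 1];   // array indices come first when indexing
  real s = v[1, 1, 2];       // the vector index comes last
  int d[3] = dims(v);        // array dimensions first, then the vector size
}
```

Indexing always runs from the outermost array dimension to the innermost container, so any contiguous spelling of the type has to choose whether to mirror that order or the "T[N] is an array of T" rule.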

anon75146577 · June 17, 2020, 8:28pm

I think it is the right solution within the category of solutions being considered, but worse than the current practice, and the swapping of the indices will lead to no end of woe.

rok_cesnovar · June 18, 2020, 9:52am

I might have missed this info, but is the implementation of tuples waiting on this, as in it can’t be implemented with what we currently have? I might have misunderstood.

Charles_Driver · June 18, 2020, 11:25am

Seems likely to get messy to me. What are the arguments against explicitly calling for an array, such as
array[3] matrix[2,2] myFancyArray
? Is it just the extra typing? This would retain correct ordering and minimise confusion, I think… and it would also function more like lists in R.

spinkney · June 18, 2020, 12:13pm

I initially thought matrix[N][I,J] mat; too but switched to matrix[I,J][N] mat; after seeing Bob show the multidimensional array case. I think the single-dimension array looks cleaner with the first, but once the array has more than one dimension I prefer Bob’s option, and generality wins.

mitzimorris · June 18, 2020, 3:10pm

yes, entirely correct - see comment above: New array declaration syntax - #12 by nhuurre

jonah · June 18, 2020, 4:29pm

I like this a lot. The (minimal) extra typing seems like a small price to pay for the clarity (the types are clear and so is the order of indexing):

array[K1] vector[N] x1;              // x1[k1] is an N-vector 
array[K1,K2] vector[N] x2;           // x2[k1,k2] is an N-vector
array[K1,K2,K3] matrix[I,J] x3;      // x3[k1,k2,k3] is an IxJ matrix

@nhuurre @Bob_Carpenter @rybern (tagging people who were reviewing the PR) are there good reasons to avoid this?
