Ragged array functional specification (proposal)

Bob_Carpenter · December 21, 2017, 6:01am

This discussion is deprecated. For the latest discussion, see:

Now that tuples are well under way, ragged arrays will be next. To start, I created a simple functional specification for ragged arrays (with some input from Mitzi and residual text from the last time I tried to write out the spec):

GitHub

Home

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details. - stan-dev/stan

Comments welcome (which is why I put it here rather than the developer-post-only section).

serenejiang · September 25, 2018, 4:11am

I am a new user of Stan. I was looking at your ragged array spec tutorials on github. However, I was running into a lot of troubles when implementing your code, even the simplest example like

int<lower = 0> N;
int<lower = 0>[N] M;
int[M] y;

R gave me a syntax error like below:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

error in ‘model174444bd7dc11_raggedArray_example’ at line 4, column 6

 2:   int<lower = 0> N;
 3:   int<lower = 0> M [N];
 4:   int[M] y; 
         ^
 5: }

PARSER EXPECTED:
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model ‘raggedArray_example’ due to the above error.

Do you know why? Thanks!

Bob_Carpenter · October 3, 2018, 6:40pm

It’s just a proposal that hasn’t been implemented yet! Maybe in a year or two we’ll have proper ragged arrays. Until then, the manual contains instructions on how to code ragged arrays.

jjramsey · October 3, 2018, 8:18pm

It occurs to me that if you didn’t have to specify the sizes of data arrays, vectors, and matrices in advance, then any data array declaration could be treated as a possibly ragged array declaration. For example, an array of vectors of varying length might be declared as, say, vector foo[];. This would have the added advantage of mirroring the syntax already used for specifying array types in function declarations. Of course, this would mean that the various Stan interfaces would have to infer the intended dimensions of these data arrays, but judging from the “dimensional mismatch” error messages that I’ve seen, it looks like this is already being done anyway.

That said, you’d still need to specify the dimensions of parameter arrays, vectors, and matrices in advance, so you’d still need a ragged array specification for that.

bgoodri · October 3, 2018, 8:22pm

Because it is not implemented yet.

Bob_Carpenter · October 8, 2018, 3:42pm

That’s right. It would be a radical change in terms of how variables are defined. It’d be much easier to do for local variables as @bgoodri has often pointed out—the problem with parameters is that we need rectangular output for HMC, R-hat, etc.

At run time, every variable with a value is sized and it can never be reassigned to a variable of a different size.

Right. There are currently three different type syntaxes:

block variables: +constraints, +sizes
local variables: -constraints, +sizes
function arguments: -constraints, -sizes

We need the constraints on transformed parameters for semantics and we need the sizes on data for the way we do I/O and we need consistent numbers/locations of parameters for all of our posteriors. So we probably don’t want to change block variable declarations.

Function arguments already do what you want.

So let’s look at local variables. What would the size be of a vector declared like this:

for (n in 1:10) {
  vector x; 
  int n = rows(x); // ???

Anything other than zero would be totally arbitrary. So let’s assume they start at size zero. So now what happens when we do this:

vector x;
x[1] = 3;  // error!  x is of size 0

It seems like we’d either need to change our semantics to be like R and resize whenever you get an index out of bound or we’d have to have a way of setting the size. Maybe we have a x.set_rows(int) method that works like C++ std::vector equivalent and trims if it’s longer or expands with default values if it’s shorter. That’s really inefficient (see R), but perhaps doable. The efficiency hit wouldn’t be too bad as we do bounds checking on indexes anyway. We don’t have anything else object-oriented like this or mutable like this for functions, but it might be possible.

Or we could just treat the above as an error. If it’s declared without a size, the only legal thing to do is assign. So then we’d need something like partial size declarations,

int[N, ] x;

for a ragged array of size N with potentially different numbers of elements in each row.

mitzimorris · October 8, 2018, 6:30pm

variable name conflict - also, is this about local variables, and the for loop here is just to set up the local variable - it’s not about changing the size of the vector w/ each look iteration or anything like that is it?

Bob_Carpenter · October 8, 2018, 9:12pm

Sorry for the confusing example—the variable duplication was a mistake and the loop’s a red herring.

The issue would be what to do with this:

vector x;
int k = rows(x);

I think n has to be zero here—that is, x has to start out size zero. But then how do we add elements to it?

The bigger question’s whether we can get rid of sizing on local variables.

Bob_Carpenter · October 8, 2018, 9:14pm

Or whether we want to do size checks on assignment or just allow things to get ragged or change sizes. I believe this is what @bgoodri was suggesting in the special case that the container was initially sized zero.

Topic		Replies	Views
Proposal for ragged containers Developers	5	1021	August 2, 2019
Ragged array example in user's guide is wrong? Modeling	3	727	December 29, 2021
Ragged array expressions Modeling	17	1770	October 22, 2018
Is Ragged Array allowed in Stan? General	20	2798	March 13, 2023
Ragged arrays bug in the language Developers bug	10	1237	May 28, 2021

Ragged array functional specification (proposal)

error in ‘model174444bd7dc11_raggedArray_example’ at line 4, column 6

Related topics