Arrays of vectors in Stan

Hi,

May I ask whether Stan now allows us to define arrays of vectors with different lengths, or do we still need to use padding?

Thanks!

No, arrays of vectors still require the individual vectors to all be of the same length.
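
For reference, here is a minimal sketch of the padding workaround for an array of vectors. Everything in it is an assumption made up for illustration: the names K, M, len, y_padded, mu and sigma, the normal likelihood, and the placeholder priors. It uses the newer array[...] declaration syntax; on older Stan releases the padded data would be declared as vector[M] y_padded[K] instead.

```stan
data {
  int<lower=1> K;                       // number of vectors
  int<lower=1> M;                       // length of the longest vector
  array[K] int<lower=0, upper=M> len;   // true length of each vector
  array[K] vector[M] y_padded;          // entries past len[k] are padding and never read
}
parameters {
  vector[K] mu;
  real<lower=0> sigma;
}
model {
  // placeholder priors, only here so the sketch is a complete model
  mu ~ normal(0, 5);
  sigma ~ normal(0, 2);
  for (k in 1:K) {
    // slice off the padding before using the k-th vector
    y_padded[k, 1:len[k]] ~ normal(mu[k], sigma);
  }
}
```

The padding values themselves can be anything finite (zeros are typical), since the slice 1:len[k] keeps them out of the likelihood.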

the design doc PR for ragged arrays is here: Ragged Structures by bob-carpenter · Pull Request #7 · stan-dev/design-docs · GitHub

the promise of the new Stanc3 compiler was that it would make it easy to add this to the language, so calling all Stanc3 devs - this is a much needed feature - however quotidian the coding of it might be.

recently I needed ragged arrays, and so I had to use a padded array plus an additional vector of lengths. Stan’s multi-indexing lets you do this, but it’s not pretty:

  /* ...
   * For example, a series of four coefficients phi[1:4] for a
   * disconnected graph containing 1 singleton would have
   * adjacency array {{1, 2}, {2, 3}}, signaling that coefficient 1
   * is adjacent to coefficient 2, and 2 is adjacent to 3,
   * component size array {3, 1}, and (zero-padded) component members
   * array of arrays { { 1, 2, 3, 0}, {4, 0, 0, 0} }.
   ...
   */

  real standard_icar_disconnected_lpdf(vector phi,
                                       int[ , ] adjacency,
                                       int[ ] comp_size,
                                       int[ , ] comp_members) {
    // bunch of checks omitted
    real total = 0;
    for (n in 1:size(comp_size)) {
      if (comp_size[n] > 1)
        total += -0.5 * dot_self(phi[adjacency[1, comp_members[n, 1:comp_size[n]]]] -
                                 phi[adjacency[2, comp_members[n, 1:comp_size[n]]]])
          + normal_lpdf(sum(phi[comp_members[n, 1:comp_size[n]]]) | 0, 0.001 * comp_size[n]);
    }
    return total;
  }
}
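
When the ragged structure is just a collection of vectors of different lengths, an alternative to padding is to concatenate everything into one flat vector, pass the group sizes alongside it, and pick out each group with segment(). The sketch below assumes that layout. The names K, N, s, y, mu and sigma and the normal likelihood are hypothetical and are not taken from the ICAR code above. It uses the newer array[...] declaration syntax.

```stan
data {
  int<lower=0> K;             // number of groups (the ragged "rows")
  int<lower=0> N;             // total number of entries across all groups
  array[K] int<lower=0> s;    // size of each group; should sum to N
  vector[N] y;                // all groups concatenated in order
}
parameters {
  vector[K] mu;
  real<lower=0> sigma;
}
model {
  int pos = 1;                // start of the current group within y
  for (k in 1:K) {
    // segment(y, pos, s[k]) is the s[k] entries of y starting at index pos
    segment(y, pos, s[k]) ~ normal(mu[k], sigma);
    pos += s[k];
  }
}
```

With this layout, three groups of sizes 3, 2 and 1 would be passed as N = 6 and s = {3, 2, 1}, with y holding the six values back to back.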

also, tuples, structs: [WIP] tuples and structs rough proposal by bob-carpenter · Pull Request #24 · stan-dev/design-docs · GitHub

Thanks so much!

That’s very helpful!

This should not be too difficult to code up in stanc3. One part of it is actually already done: the new array syntax

array[...] real;

My estimate is actually that there is more to do for the back end than for stanc3.
For example

  • I am assuming we want ragged arrays to work with Stan Math functions that accept 2+D arrays?
    If we do, we need to test that all Stan Math functions that take real[,,], real[,,,], …, vector[,], … and all other 2+ dimensional arrays work with ragged arrays. This is mostly checking that the apply_* stuff in Stan Math works with std::vectors of std::vectors of unequal length. We need to add a bunch of tests. In Math we have at least some basic tests for all the functions currently used in Stan. If we don't for any of them, it's not intentional.

  • check that lub_constrain and other constrain functions support ragged arrays, add tests

  • JSON I/O changes, plus Rdump I/O if that is also required (we haven't officially deprecated it for CmdStan, which is the only interface that uses it, though we should). This should be somewhat straightforward.

@rybern is already working on this and has a solution where you can do the following:

functions {
  // Nested array/tuple unsized types
  (int, array[] (int, real))
    f(int x, array[,] (array[] (int, int), array[,] int) x2)
  {}
}
transformed data {
  // nested array/tuple sized types and literals
  (int, real, array[2] (int, int)) a = (1, 2.5, {(1,2), (3,4)});

  // simple indexing
  int b = (1, 2.5).1;

  // sized/unsized types unify
  array[2,2] (array[2] (int, int), array[2,2] int) c;
  (int, array[3] (int, real)) d = f(2, c);

  // complex nesting indexing
  array[5] (array[10,5] (int, array[1,2,3] real), real) e;
  real f = e[5].1[10,5].2[1,2,3];
}

The PR is here: Add tuples to the language by rybern · Pull Request #675 · stan-dev/stanc3 · GitHub
There is a prerequisite PR to allow

real a = 5, b = 6;
real c = 5, d, e = 7;

here: New syntax for multiple identifiers/definitions in one declaration statement by rybern · Pull Request #670 · stan-dev/stanc3 · GitHub

totally awesome! looking forward to seeing this happen soon!

the following is valid JSON:

{
	"ragged": [
		[1, 2, 3],
		[4, 5],
		[6]
	]
}

do you mean that we need to change CmdStan’s io::json lib? happy to help.

Yeah, I think the reading-in part should work fine as it's valid JSON. I am not so sure about the var_context/json_data parts of it. Bob also mentions that in the design doc.

At minimum we need tests for var_context to see if it breaks anywhere. I think there are at least some checks that would throw somewhere.