Arrays of vectors in Stan

stan_beginer · January 29, 2021, 12:57am

Hi,

May I ask whether Stan now allows us to define arrays of vectors with different lengths, or do we still need to use padding?

Thanks!

andrjohns · January 29, 2021, 2:50am

No, arrays of vectors still require the individual vectors to all be of the same length.
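For reference, a minimal sketch of the padding workaround the question refers to. All names here (K, M, len, y_padded, mu, sigma) and the priors are made up for illustration, and the sketch assumes a Stan version that accepts the array[...] declaration syntax:

```stan
data {
  int<lower=1> K;                        // number of vectors (hypothetical name)
  int<lower=1> M;                        // padded length = length of the longest vector
  array[K] int<lower=0, upper=M> len;    // true length of each vector
  array[K] vector[M] y_padded;           // entries past len[k] are padding
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 5);                     // placeholder priors, assumed for the sketch
  sigma ~ normal(0, 5);
  for (k in 1:K) {
    // Only the first len[k] entries are used, so the padding
    // values never enter the likelihood.
    head(y_padded[k], len[k]) ~ normal(mu, sigma);
  }
}
```

The padding values themselves can be anything that satisfies the declared type, since they are never read by the model block.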

mitzimorris · January 29, 2021, 4:13pm

the design doc PR for ragged arrays is here: Ragged Structures by bob-carpenter · Pull Request #7 · stan-dev/design-docs · GitHub

the promise of the new Stanc3 compiler was that it would make it easy to add this to the language, so calling all Stanc3 devs - this is a much-needed feature, however quotidian the coding of it might be.

recently I needed ragged arrays, and so I had to use a padded array plus an additional vector of lengths. Stan’s multi-indexing lets you do this, but it’s not pretty:

  /* ...
   * For example, a series of four coefficients phi[1:4] for a
   * disconnected graph containing 1 singleton would have
   * adjacency array {{1, 2}, {2, 3}}, signaling that coefficient 1
   * is adjacent to coefficient 2, and 2 is adjacent to 3,
   * component size array {3, 1}, and (zero-padded) component members
   * array of arrays { { 1, 2, 3, 0}, {4, 0, 0, 0} }.
   ...
   */

  real standard_icar_disconnected_lpdf(vector phi,
                                       int[ , ] adjacency,
                                       int[ ] comp_size,
                                       int[ , ] comp_members) {
    // bunch of checks omitted
    real total = 0;
    for (n in 1:size(comp_size)) {
      if (comp_size[n] > 1)
        total += -0.5 * dot_self(phi[adjacency[1, comp_members[n, 1:comp_size[n]]]] -
                                 phi[adjacency[2, comp_members[n, 1:comp_size[n]]]])
          + normal_lpdf(sum(phi[comp_members[n, 1:comp_size[n]]]) | 0, 0.001 * comp_size[n]);
    }
    return total;
  }
}
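One way to make the repeated comp_members[n, 1:comp_size[n]] expression easier to read is to wrap it in a small helper. This is only a sketch: ragged_row is a hypothetical function name, not part of Stan, and it is written in the newer array[...] syntax rather than the int[ , ] syntax used above. The data values are the ones from the comment in the example.

```stan
functions {
  // Hypothetical helper: return the first sizes[n] entries of row n,
  // i.e. the row with its zero padding stripped off.
  array[] int ragged_row(array[,] int padded, array[] int sizes, int n) {
    return padded[n, 1:sizes[n]];
  }
}
transformed data {
  // Zero-padded component members and component sizes from the example above.
  array[2, 4] int comp_members = {{1, 2, 3, 0}, {4, 0, 0, 0}};
  array[2] int comp_size = {3, 1};
  for (n in 1:2) {
    // Prints the members of each component without the padding.
    print("component ", n, ": ", ragged_row(comp_members, comp_size, n));
  }
}
```

Since this program has no parameters, it would need to be run with the fixed-parameter sampler to see the printed output.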

also, tuples, structs: [WIP] tuples and structs rough proposal by bob-carpenter · Pull Request #24 · stan-dev/design-docs · GitHub

stan_beginer · January 29, 2021, 4:36pm

Thanks so much!

stan_beginer · January 29, 2021, 4:36pm

That’s very helpful!

rok_cesnovar · January 31, 2021, 12:18pm

This should not be too difficult to code up in stanc3. One part of it is actually already done: the new syntax

array[...] real;

My estimate is that there is actually more to do for the back end than for stanc3. For example:

- I am assuming we want ragged arrays to be usable with Stan Math functions that accept 2+D arrays? If we do, we need to test that all Stan Math functions that take in real[,,], real[,,,], …, vector[,], vector[,], … and all other 2+ dimensional arrays work with ragged arrays. This is mostly checking that the apply_* stuff in Stan Math works with std::vectors of std::vectors of unequal length. We need to add a bunch of tests. In Math we have at least some basic tests for all the functions currently used in Stan. If we don't for any of them, it's not intentional.
- Check that lub_constrain and the other constrain functions support ragged arrays, and add tests.
- JSON I/O changes, plus Rdump I/O if also required (we haven't officially deprecated it for CmdStan, which is the only interface that uses it; we should though). This should be somewhat straightforward.

@rybern is already working on this and has a solution where you can do the following:

functions {
  // Nested array/tuple unsized types
  (int, array[] (int, real))
    f(int x, array[,] (array[] (int, int), array[,] int) x2)
  {}
}
transformed data {
  // nested array/tuple sized types and literals
  (int, real, array[2] (int, int)) a = (1, 2.5, {(1,2), (3,4)});

  // simple indexing
  int b = (1, 2.5).1;

  // sized/unsized types unify
  array[2,2] (array[2] (int, int), array[2,2] int) c;
  (int, array[3] (int, real)) d = f(2, c);

  // complex nesting indexing
  array[5] (array[10,5] (int, array[1,2,3] real), real) e;
  real f = e[5].1[10,5].2[1,2,3];
}

The PR is here: [Outdated] Add tuples to the language by rybern · Pull Request #675 · stan-dev/stanc3 · GitHub
There is a prerequisite PR to allow

real a = 5, b = 6;
real c = 5, d, e = 7;

here: New syntax for multiple identifiers/definitions in one declaration statement by rybern · Pull Request #670 · stan-dev/stanc3 · GitHub

mitzimorris · January 31, 2021, 5:27pm

totally awesome! looking forward to seeing this happen soon!

the following is valid JSON:

{
	"ragged": [
		[1, 2, 3],
		[4, 5],
		[6]
	]
}

do you mean that we need to change CmdStan’s io::json lib? happy to help.

rok_cesnovar · January 31, 2021, 5:40pm

Yeah, I think the reading-in part should work fine as it’s valid JSON. I am not so sure about the var_context/json_data parts of it. Bob also mentions that in the design doc.

At minimum we need tests for var_context to see if it breaks anywhere. I think there are at least some checks that would throw somewhere.
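Until ragged input is supported, the ragged value from the JSON example earlier in the thread can be passed in rectangular form instead: the rows laid end to end in one vector, plus an array of row sizes, sliced with segment(). A rough sketch, where the names N, K, y, s, mu and sigma and the priors are assumptions for illustration only:

```stan
data {
  int<lower=0> N;             // total number of values, e.g. 6
  int<lower=0> K;             // number of ragged rows, e.g. 3
  vector[N] y;                // rows laid end to end, e.g. [1, 2, 3, 4, 5, 6]
  array[K] int<lower=0> s;    // row sizes, e.g. {3, 2, 1}; should sum to N
}
parameters {
  vector[K] mu;               // one location per ragged row (assumed model)
  real<lower=0> sigma;
}
model {
  int pos = 1;                // start index of the current row within y
  mu ~ normal(0, 5);          // placeholder priors
  sigma ~ normal(0, 5);
  for (k in 1:K) {
    // segment(y, pos, s[k]) is row k of the ragged structure.
    segment(y, pos, s[k]) ~ normal(mu[k], sigma);
    pos += s[k];
  }
}
```

The corresponding input would then be ordinary rectangular JSON with the four entries N, K, y and s, which the existing JSON reader already handles.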
