Ragged array expressions

The Stan manual says that ragged arrays are not allowed – they must be rectangular. Does this also apply to intermediate expressions that get passed to functions, but not assigned to a variable? That manual seems to imply that the answer is yes, these must also be rectangular, yet the following Stan code works just fine for me. Can I count on this continuing to work, or is it an accident that this works?

functions {
  int sum_nrows(matrix [] M) {
    int n = 0;
    int N = size(M);
    for (i in 1:N)
      n += rows(M[i]);
    return n;
  }
  int sum_ncols(matrix [] M) {
    int n = 0;
    int N = size(M);
    for (i in 1:N)
      n += cols(M[i]);
    return n;
  }
  matrix block_diag(matrix [] M) {
    int nr = sum_nrows(M);
    int nc = sum_ncols(M);
    matrix[nr, nc] result = rep_matrix(0, nr, nc);
    int N = size(M);
    int i = 0;
    int j = 0;
    for (k in 1:N) {
      int nrk = rows(M[k]);
      int nck = cols(M[k]);
      result[(i+1):(i+nrk), (j+1):(j+nck)] = M[k];
      i += nrk;
      j += nck;
    }
    return result;
  }
}
data {
}
transformed data {
  matrix[1, 2] A = [[1.0, 2.0]];
  matrix[2, 1] B = [[3.0], [4.0]];
  matrix[2, 2] C = [[5.0, 6.0], [7.0, 8.0]];
}
parameters {
  real x;
}
transformed parameters {
  matrix [5, 5] M1 = block_diag({ x * A, x * B, x * C});
}
model {
  x ~ normal(0.0, 1.0);
}

Both I think

To clarify if it’s not obvious from the example: the use case is when I want to write a function that takes a variable number of matrix arguments of different shapes, or a variable number of vector arguments of different lengths. The only way I’ve found to do this is to write a function that takes a ragged array of matrices or ragged array of vectors as its argument. I’m happy with that solution, as long as I can be sure that future versions of Stan won’t “fix” this loophole in the requirement that arrays be rectangular.

I get it. My guess is that if it were feasible to check sizes in user-defined functions, the the Stan language would have done so years ago when it introduced user-defined functions. But it wasn’t feasible, which is why you can just declare matrix in a signature instead of something like matrix[N, K]. Same with matrix [].

That has always been one of my frustrations with the Stan language because C++ is fine with a std::vector<Eigen::Matrix<var, Eigen::Dynamic, Eigen::Dynamic> > even if the matrices have different sizes at runtime, but the Stan parser won’t allow that to be assigned to a symbol. I actually proposed a one-line patch to Stan that would allow Eigen things declared as size zero to get resized but it got rejected in order to implement ragged arrays of matrices properly in the Stan language. Years later, we’re still not there, although Mitzi is working hard on it. My guess is that ragged arrays of matrices will get implemented in the Stan language before we figure out how to prohibit ragged arrays of matrices in function signatures.

No! This is a bug. I’m pinging @mitzimorris to create an issue, self assign it, and fix it.

Right. They’re not known until runtime.

Please don’t “fix” this until you have an alternative way of providing the same functionality. Without it there is no way to write a general function that creates a block diagonal matrix from a variable number of matrices of different shapes.

Serious question: is it really a good idea to prioritize fixing a feature-bug that a long-time user is using for their work when we don’t have any record if this causing a bug for somebody else? @Kevin_Van_Horn is the one taking the risk here, we’re clear we’re not supporting this. My POV is that we all have better things to do.

BTW, here’s some context on the issue that I hope will provide some motive for supporting arrays with heterogeneous element shape. The two Stan functions I’ve written that really need this are a function to create a block diagonal matrix from the blocks, and a function to append vectors, both of which take an arbitrary number of arguments. These are used in creating state-space models for time-series analysis.

The bigger picture is that I’m in the middle of creating a concise language for defining Bayesian state-space models, which compiles to Stan. Rather than directly specify the matrices that define the SSM, you define an SSM as the composition of primitive SSMs, and the compiler then works out the needed matrix expressions. In compiling to Stan, my compiler does the following:

  • Type, shape, and bounds inference for variables. The user supplies type, shape, and bounds for fixed model inputs (things that would go in the data block), but not for variables defined in the body of the model; the compiler figures these out.

  • Automatically put variables in the appropriate Stan model block. For each “let” variable (e.g., x = some_expression) or “draw” variable (e.g. x ~ some_distribution) the compiler figures out which Stan model block it belongs in.

  • Non-centered parameterization. For distributions with scale or scale and location parameters, the compiler automatically reparameterizes to get distributions with unit scale and zero offset.

  • Substantial support for quasiperiodicity.

For example the following program defines a SSM that is the sum of a random walk component, a quasiperiodic component with period 7, and white noise:

def main(sigma_q_scale, sigma_h_scale: real{0.0,},
         mu_a: real, sigma_a: real{0.0,},
	 rho_mean: real{0.0,1.0}, sigma_p_scale: real{0.0,})
  =
  sigma_q ~ half_cauchy(sigma_q_scale);
  sigma_h ~ half_normal(sigma_h_scale);
  rho ~ exponential_mt(rho_mean, 1.0);
  sigma_p ~ half_normal(sigma_p_scale);
  accum(wn(sigma_q), mu_a, sigma_a) + wn(sigma_h)
  + qp(7.0, 0.7, 6, rho, sigma_p)

(exponential_mt is a mean-parameterized, truncated exponential distribution. qp defines a quasiperiodic component, with rho controlling the degree of departure from strict periodicity. wn defines a white-noise component. accum essentially takes a time series model and integrates it.)

The current iteration of the compiler transforms the above into the following Stan program:

functions {
  // definition for block_diag
  // definition for exponential_mt_rate
  // definition for scale_matrices
  // definition for vec_append
}
data {
  int<lower = 1> N;
  vector[N] y;
  real<lower = 0.0> sigma_q_scale_;
  real<lower = 0.0> sigma_h_scale_;
  real mu_a_;
  real<lower = 0.0> sigma_a_;
  real<lower = 0.0, upper = 1.0> rho_mean_;
  real<lower = 0.0> sigma_p_scale_;
}
transformed data {
  matrix[1, N] yy = to_matrix(y');
  vector[7] Z_0_ = to_vector({1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0});
  vector[7] a1_0_ = to_vector({mu_a_, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0});
}
parameters {
  real<lower = 0.0> raw_0_;
  real<lower = 0.0> raw_1_;
  real<lower = 0.0, upper = 1.0> rho_;
  real<lower = 0.0> raw_2_;
}
transformed parameters {
}
model {
  real sigma_q_ = raw_0_ * sigma_q_scale_;
  real sigma_h_ = raw_1_ * sigma_h_scale_;
  real sigma_p_ = raw_2_ * sigma_p_scale_;
  vector[6] csigma2s_0_ =
    square(sigma_p_) *
    to_vector({0.6182923634858543, 0.6182923634858543,
               0.27566250834204353, 0.27566250834204353,
               0.10604512817210211, 0.10604512817210211});
  real rhoSqr_0_ = square(rho_);
  real H_0_ = square(sigma_h_);
  matrix[1 + 3 * 2, 1 + 3 * 2] T_0_ =
    block_diag({
      rep_matrix(1.0, 1, 1),
      block_diag(
        scale_matrices(
          sqrt(1.0 - rhoSqr_0_),
          {to_matrix({0.6234898018587336, 0.7818314824680297,
                      (-0.7818314824680297), 0.6234898018587336}, 2, 2),
           to_matrix({(-0.22252093395631434), 0.9749279121818236,
                      (-0.9749279121818236), (-0.22252093395631434)}, 2, 2),
           to_matrix({(-0.900968867902419), 0.43388373911755823,
                       (-0.43388373911755823), (-0.900968867902419)}, 2, 2)
          }
        )
      )
    });
  matrix[1 + 6, 1 + 6] Q_0_ =
    diag_matrix(
      vec_append({
        to_vector({square(sigma_q_)}),
        rhoSqr_0_ * csigma2s_0_
      })
    );
  matrix[1 + 6, 1 + 6] P1_0_ =
    diag_matrix(vec_append({to_vector({square(sigma_a_)}), csigma2s_0_}));
  raw_0_ ~ cauchy(0.0, 1.0) T[0.0, ];
  raw_1_ ~ normal(0.0, 1.0) T[0.0, ];
  rho_ ~ exponential(exponential_mt_rate(rho_mean_)) T[, 1.0];
  raw_2_ ~ normal(0.0, 1.0) T[0.0, ];
  yy ~ gaussian_dlm_obs(to_matrix(Z_0_'), T_0_, rep_matrix(H_0_, 1, 1), Q_0_, a1_0_, P1_0_);
}   

As you can see, there are still some optimizations I need to include (constant folding in shape expressions, and lifting any subexpression having no parameter dependence into the transformed data block), but it’s automating a lot of the work that would otherwise go into creating such a model.

I’m doing this work for my employer Adobe, but I have gotten permission to make it an open-source project. Expect an initial release by the end of the year.

This thread is related to this discussion on ragged array design.

In this case, one of two things needs to happen:

  • Fix the bug.

  • Change the spec.

We can’t have the language not implementing its spec!

If we change the spec, we have to make sure that all of our algorithms do the right thing for ragged arrays—we may be doing things like assuming they are rectangular somewhere.

The current behavior apparently allows hacking around all size requirements on local variables by embedding everything in a 1D array. We have to decide whether we want that or not.

Maybe @mitzimorris will know if this got “fixed” in the 2.19 language refactor. I’m guessing not as it probably still relies on the old assign behavior.

Sorry about that. At least I’m no longer overpromising. All of our timelines are now “we’ll get there when we get there.”

A lot of mistakes were made in the language refactor, primarily doing it in too large a piece and not keeping to a proper refactoring. It wound up taking a year rather than three months.

What that tells me is that unless we get more programming help on the language, we’re not going anywhere fast. Maybe three years for ragged arrays at our current pace of development. Hopefully we’ll get tuples in a year. We could reorder if people are happy with ragged array spec and do that first.

Just want to reiterate that, from my perspective, I’m happy as long as there is SOME way of writing a user-defined function that takes a variable number of arguments of the same type but possibly different shapes. Ragged arrays are just one way of doing that.

just filed an issue for this: https://github.com/stan-dev/stan/issues/2659

here’s a minimal example:

functions {
  void foo(vector [] VS) {
    print("foo");
  }
}
transformed data {
  vector[2] A = [1.0, 2.0]';
  vector[3] B = [3.0, 4.0, 5.0]';
  vector[4] C = [6.0, 7.0, 8.0, 9.0]';
  foo( { A, B, C });
}

pared down from your obviously quite useful program.
confirms that this compiles and runs.

Thanks.

The question’s now whether we want to fix this issue or change the spec. I’m not sure what’ll happen if we do something like to_matrix() on a ragged array. There are lots of operations like that we’d have to check for consistency with raggedness.

The only officially supported (i.e., that matches the spec as currently written) approach that we’ve figured out is the one described in the ragged structure chapter of the manual. For example, suppose you have a function that applies another function foo to each element of a ragged array of matrices, e.g.,

void bar(matrix[] xs) {
  for (matrix x : xs) {
    ... foo(x) ...
  }
}

This can be coded by flattening the values of xs into a vector xvs (each in column major order, I believe), with ms and ns giving the sizes:

void bar(real[] xvs, int[] ms, int[] ns) {
  int start = 0;
  for (i in 1:size(ms)) {
    int size = ms[i] * ns[i];
    matrix x = to_matrix(xvs[start : start + size - 1], ms[i], ns[i]);
    ... foo(x) ...
    start += size;
  }
}

That solution isn’t really workable for my use case. The arguments to block_diag() can themselves be arbitrary matrix-valued expressions, so what your solution means is that I can’t do a function call at all, I have to generate some complex inline code. I suppose that it’s in principle doable, but it tremendously complicates things.

There is a ton of code in the Math library that depends on all nested arrays having the exact same length. So to say that this is a bug or accidental feature that works is a little misleading. So I wouldn’t recommend using this generally as a target for a compiler especially since it will probably be pretty difficult to ensure that you don’t generate code that ends up breaking Stan, even if we don’t end up fixing this any time soon.

Let me make clear the one and only usage pattern in which I have used heterogeneous nested arrays: they are all of the form

f({e1, ..., en})

where f is a user-defined function. The definition of f() looks like

return_type f(T [] x) {
  // all references to x are of the form x[i]
  ...
}

where t is either matrix or vector. So no heterogeneous array ever gets passed to a Stan function.

I’ve submitted a feature request proposing that user-defined functions be allowed to have variable-length argument lists, with a very simple implementation based on this same scheme I’ve been using to simulate them.

Thanks! That looks like something we should definitely do. We’re slammed right now on the language with rebuilding it with a new parser, so I’m not sure when we’ll get to it. Having said that, we probably won’t get to fixing that bug any time soon, either—but we’ll definitely fix it at some point, so you can’t rely on it longer term.

Sorry that we can’t support the current broken state of things—as @seantalts says, ragged arrays will break a lot of our internal functions if they’re allowed, so we want to get rid of them.