What is the idiomatic way to check for logical constraints in input data?

Suppose I want to have something like this in data:

int n;
array[n] int a;
int s = sum(a);
vector[s] x;
vector[s] y;

But this is not possible, because assignment statements aren’t possible within data.

I had an idea to instead do something like:

data {
  int n;
  array[n] int a;
  int s;  // redundant input
  vector[s] x;
  vector[s] y;
}
transformed data {
  if (s != sum(a)) {
    fatal_error("does not add up!");
  }
}

Another option would be:

data {
  int n;
  array[n] int a;
  vector[sum(a)] x;
  vector[sum(a)] y;
}
transformed data {
  int s = sum(a);
}

But both feel a little clunky, and they would also be inefficient if the sum were replaced by an expensive computation or if I had to repeat the pattern many times over.

Would either of the options be considered more idiomatic or preferred for a good reason? Or is there a better way I don’t know of? Or does an optimizer in the compiler recognize that it’s a pure function and compute it only once?


Your second approach is the preferred approach because it can’t fail by inconsistency of s.

I agree it’s clunky. If the function gets complicated, then you need to write it as a user-defined function.
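For illustration, here is a minimal sketch of the second approach with the size computation moved into a user-defined function. The name total_size is hypothetical and its body is just a stand-in for whatever more complicated computation you have; the lower bounds on n and a are also my assumption, not something from the original post.

```stan
functions {
  // Hypothetical helper: total length implied by the group sizes in a.
  // Replace the body with the real (possibly more involved) computation.
  int total_size(array[] int a) {
    return sum(a);
  }
}
data {
  int<lower=0> n;
  array[n] int<lower=0> a;   // assumed non-negative so the sizes are valid
  vector[total_size(a)] x;   // size derived from a, so it cannot disagree with it
  vector[total_size(a)] y;
}
transformed data {
  int s = total_size(a);     // stored once for reuse in later blocks
}
```

The function is still called once per size declaration, but the logic lives in one place, so repeating the pattern only means repeating the call.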

The data and transformed data blocks are only executed once as the data is read in. They don’t require autodiff, so it’s all C++ primitives, which are super-duper fast. The I/O to bring data in from memory will be slower than summing an in-cache vector, so you won’t even be able to measure the slowdown here without a very careful instrumentation effort. It might take an extra microsecond or two if those vectors are 10K long.

OK, thanks for the comment! :)