We’ll be adding ragged arrays to Stan this year, so we should be able to remove the hacks for ragged arrays and replace them with proper solutions.
At that point, I’m going to argue very strongly that we should fix the existing bugs that don’t respect sizes.
For example, this code
print("m = ", to_matrix({ rep_array(1.0, 100), { 0.0 }}));
will read 99 values past the end of the one-element second row, straight from raw memory:
Chain 1: m = [[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[0,4.34241e-311,1.49167e-154,1.49167e-154,1.99766e-314,6.94786e-310,6.94786e-310,6.94786e-310,6.62048e-322,0,4.94066e-323,2.12899e-314,4.94066e-324,2.12899e-314,1.4822e-323,2.12899e-314,1.97626e-323,2.12899e-314,2.12899e-314,9.88131e-324,2.12899e-314,2.47033e-323,2.12899e-314,2.12899e-314,3.45846e-323,2.129e-314,2.47033e-323,2.12899e-314,4.44659e-323,2.12899e-314,2.12899e-314,5.43472e-323,2.12901e-314,5.92879e-323,3.45846e-323,2.129e-314,2.47033e-323,2.12899e-314,2.12899e-314,2.47033e-323,2.12899e-314,3.45846e-323,2.12899e-314,2.12899e-314,6.91692e-323,2.12899e-314,7.90505e-323,7.41098e-323,2.96439e-322,2.12899e-314,8.39912e-323,2.12899e-314,9.38725e-323,2.12899e-314,9.88131e-323,2.12899e-314,8.89318e-323,2.129e-314,1.03754e-322,2.12899e-314,1.13635e-322,2.12899e-314,2.12899e-314,1.23516e-322,2.12901e-314,1.28457e-322,8.89318e-323,2.129e-314,1.03754e-322,2.12899e-314,2.12899e-314,1.92686e-322,2.12899e-314,2.12899e-314,2.12899e-314,1.03754e-322,2.12898e-314,0,1.99766e-314,6.94786e-310,6.94786e-310,6.94786e-310,1.38338e-322,0,6.94786e-310,6.94786e-310,6.94786e-310,6.94786e-310,6.94786e-310,6.94786e-310,6.94786e-310,6.94786e-310,6.94786e-310,6.94786e-310,6.94786e-310,6.94786e-310,6.94786e-310,6.94786e-310,6.94786e-310,6.94786e-310]]
This’ll likely lead to segfaults on other platforms. If you reverse the order of the rows in the array, you get a 1 x 1 output from to_matrix.
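To make the missing check concrete, here is a rough C++ sketch of the kind of validation I have in mind. The names check_rectangular and to_matrix_checked are made up for illustration and are not functions in Stan or its math library; the sketch also uses a flat column-major std::vector as a stand-in for a real matrix type so that it stays self-contained.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Stan): throws unless every row of x
// has the same number of elements as the first row.
inline void check_rectangular(const char* function,
                              const std::vector<std::vector<double>>& x) {
  if (x.empty()) {
    return;
  }
  const std::size_t cols = x[0].size();
  for (std::size_t i = 1; i < x.size(); ++i) {
    if (x[i].size() != cols) {
      // Report 1-based row indexes, as Stan users would expect.
      throw std::invalid_argument(
          std::string(function) + ": row " + std::to_string(i + 1)
          + " has size " + std::to_string(x[i].size())
          + ", but row 1 has size " + std::to_string(cols));
    }
  }
}

// Hypothetical stand-in for to_matrix: validates the input, then copies it
// into a flat column-major buffer holding rows * cols values.
inline std::vector<double> to_matrix_checked(
    const std::vector<std::vector<double>>& x) {
  check_rectangular("to_matrix", x);
  const std::size_t rows = x.size();
  const std::size_t cols = rows == 0 ? 0 : x[0].size();
  std::vector<double> m(rows * cols);
  for (std::size_t j = 0; j < cols; ++j) {
    for (std::size_t i = 0; i < rows; ++i) {
      m[j * rows + i] = x[i][j];  // safe: every row has cols elements
    }
  }
  return m;
}
```

With a check like this, the ragged input from the example above would raise an exception naming the offending row instead of reading past the end of the short row.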
Runtime assignment is also missing size checks, so you can do this in a current Stan program:
int a[2, 2] = { { 1, 2 }, { 3, 4 } };
a[1] = {1, 3, 4}; // a is no longer 2 x 2!
print("a = ", a);
which will print (from RStan):
Chain 1: a = [[1,3,4],[3,4]]
The intention in the language was for the size of a variable not to change once it’s created. So this is technically a bug w.r.t. the intention of only supporting rectangular data structures.
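As a sketch of what enforcing that intention could look like at runtime, the following C++ helper refuses any assignment that would change the size of the left-hand side. The name assign_same_size is hypothetical and this is not the code Stan generates; it only illustrates the check, assuming Stan arrays are represented as std::vector.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not Stan's generated code): copies rhs into lhs only
// when both have the same number of elements, so lhs never changes size.
template <typename T>
void assign_same_size(std::vector<T>& lhs, const std::vector<T>& rhs,
                      const char* name) {
  if (lhs.size() != rhs.size()) {
    throw std::invalid_argument(
        std::string("assigning to ") + name + ": left-hand side has size "
        + std::to_string(lhs.size()) + ", but right-hand side has size "
        + std::to_string(rhs.size()));
  }
  lhs = rhs;
}
```

Applied to the example above, assigning the three-element array to a[1] would throw and leave a as a 2 x 2 array. Note that this only compares the outermost dimension; arrays of arrays on the right-hand side would need the same check applied recursively.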
Our functions are not designed to work with this kind of raggedness, as they assume all inputs are rectangular and don’t raise exceptions if they’re not. I’m guessing some of them will just segfault on other platforms.
Our sampler’s also not prepared to deal with this kind of resizing. Stan currently allows this buggy block to compile:
transformed parameters {
  real a[2, 2] = { { 1.0, 2 }, { 3.0, 4 } };
  a[1] = { 1.0, 2, 3 };
  print("a = ", a);
}
How many elements are in a? 4 according to our MCMC output, but 5 according to the print statement.
I don’t believe we should allow this kind of inconsistency in the language.