New slicing rules break my code

chvandorp · October 25, 2021, 5:57pm

Hi all,

I wonder if anyone also ran into the following problem: In some of my Stan models, I want to insert a constant c into a vector x at an arbitrary index i, e.g, x = [1,2,3], c=0, and i=2, then y = [1,0,2,3] is the result. How I solved this previously is:

data {
    int<lower=1> n; // length of vector y
    int<lower=1, upper=n> i; // index
}

parameters { 
    vector[n-1] x;
}

transformed parameters {
    vector[n] y = append_row(x[:i-1], append_row(rep_vector(0.0, 1), x[i:]));
}

model {
    y ~ normal(0, 1);
}

and this used to work for any i = 1,2,\dots, n. However, due to a recent update, this is no longer allowed for i=1 and i=n. I get the error (with n=4 and i=4)

vector[min] indexing: accessing element out of range. index 4 out of range; expecting index to be between 1 and 3

Is this change in behavior intentional? Does anyone have a nice one-line fix?

andrjohns · October 25, 2021, 11:40pm

If you’ve specified n=4 and i=4, then x only has three elements:

vector[n-1] x;

So there’s no fourth element to access (as i=4):

x[i:]

chvandorp · October 25, 2021, 11:57pm

Hi Andrew, thanks for responding! Yes, but the idea is that x[i:] is then an empty vector. Like for instance in Python, and earlier versions of Stan…

rfc · October 26, 2021, 12:24am

example of desired behaviour in python3:

>>> x = list(range(5))
>>> x
[0, 1, 2, 3, 4]
>>> x[4]
4
>>> x[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> x[5:]
[]
>>> x[slice(5, len(x), 1)]   # longform version of slice, using a slice object of form (start, end, step)
[]

I’m not familiar with Stan, but here’s my guess at a construction to work around the problem - replace x[i:] with something like tail(x, max(n-i, 0)) – i.e. select n-i items from the tail of the array, but if i is equal or greater than n length of the array, select 0 items:

transformed parameters {
    vector[n] y = append_row(x[:i-1], append_row(rep_vector(0.0, 1), tail(x, max(n-i, 0))));
}

See 6.10 Slicing and blocking functions | Stan Functions Reference and 2.3 Bound functions | Stan Functions Reference .

andrjohns · October 26, 2021, 12:25am

Ah I see what you mean. Yes this technically shouldn’t be allowed, as you’re attempting to index an element of the vector that doesn’t exist.

@rok_cesnovar is this something we should allow for now but with a deprecation warning or similar? This seems like a similar issue to rstanarm's ragged array construction

rfc · October 26, 2021, 12:30am

Ah I see what you mean. Yes this technically shouldn’t be allowed, as you’re attempting to index an element of the vector that doesn’t exist.

Throughout python & numpy ecosystem, instead of defining a slice of an array over bounds that select no elements as “error” it instead can be thought of being defined as “return the array containing all of those selected elements – i.e. the empty array, since the slice didn’t select any elements”. I reckon the latter definition is far more useful to users – it is perfectly well defined, and provided the rest of the functions in the standard library work as you’d expect with size 0 arrays, it doesn’t create problems.

FWIW, it looks like Julia also implements the same convention of slices that select zero elements return empty arrays, and are not errors (with the confusing differences that Julia chose 1-based indexing and includes the end index in a slice range, rather than excluding it)

julia> x = [0, 1, 2, 3, 4]
julia> x[99:5]
Int64[]

jsocolar · October 26, 2021, 2:48am

For what it’s worth, the OP’s code runs for i = 1 and n = 4 in cmdstan versions up to 2.25 but fails from 2.26 on. It makes no difference whether or not I turn off range checks in make/local.

rfc · October 26, 2021, 6:39am

Out of curiosity I had a read through Stan code to see if I could figure out where the range checks might be happening. Here’s a bit of a guess:

stanc3’s frontend parses the i: of x[i:] into the internal representation [Upfrom e] where e is the representation of the expression for i.
stanc3’s stan math backend transforms the Upfrom e into a C++ code fragment of the form index_min(%a), in this case i reckon it would spit out a index_min constructor call taking an integer (the start index) wrapped in a call to rvalue, which are both defined in stan model
this resolves to the definition of rvalue for a tail slice as defined in stan model:

github.com

stan-dev/stan/blob/673d376b72accc00fa55d6b64ca7490837724f39/src/stan/model/indexing/rvalue.hpp#L195-L212

    
      
          /**
           * Return a tail slice of a vector
           *
           * Types:  vector[min:N] = vector
           *
           * @tparam Vec An Eigen vector or `var_value<T>` where `T` is an eigen vector.
           * @param[in] x vector
           * @param[in] name Name of variable
           * @param[in] idx An indexing from a specific minimum index to the end out
           *  of a bottom row of a matrix
           * @throw std::out_of_range If any of the indices are out of bounds.
           */
          template <typename Vec, require_vector_t<Vec>* = nullptr,
                    require_not_std_vector_t<Vec>* = nullptr>
          inline auto rvalue(Vec&& x, const char* name, index_min idx) {
            stan::math::check_range("vector[min] indexing", name, x.size(), idx.min_);
            return x.tail(x.size() - idx.min_ + 1);
          }

from the error message, it looks like the code is failing the runtime check stan::math::check_range("vector[min] indexing", name, x.size(), idx.min_);
but if that check wasn’t there, or was disabled, we’d execute x.tail(x.size() - idx.min_ + 1);, which looks like Eigen code

I had a play with Eigen’s tail method separately from stan. It appears that calls of the form x.tail(0) evaluate to size 0 arrays. Whereas something like x.tail(-1) results in an Eigen assert failing on an attempted out of range access.

I’m not familiar with the code so it’s hard to speculate how exactly this might have worked before. Is there some layer that used to clamp index values at runtime, but no longer does so? Maybe the check_range checks were toggled off?

wds15 · October 26, 2021, 7:58am

@stevebronder wanna take a look?

chvandorp · October 26, 2021, 2:08pm

To argue in favor of returning empty vectors instead of throwing exceptions, I wanted to give another example that I’ve used, to demonstrate how this kind of indexing can be useful and natural. This probably relates to ragged data structures that @andrjohns mentioned (but I have no experience with rstanarm).

Suppose that we have N subjects and for each subject we have a number of observations, some of which may be missing. The number of missing observations K_n differs between subjects and crucially can be zero for some subjects. We want to model these missing observations explicitly using a vector vector[sum(K)] x. The missing observations specific for subject n can be selected from x using

vector[K[n]] y = x[sum(K[:n-1])+1:sum(K[:n])];

To put this in a working Stan model (pre v2.26, thanks @jsocolar):

data {
    int N; // number of subjects
    int<lower=0> K[N]; // number of missing observations for each subject
}

parameters {
    vector[sum(K)] x; // concatenated missing observations
    //real x[sum(K)];
}

model {
    for ( n in 1:N ) {
        // get the part of x corresponding to subject n
        
        vector[K[n]] y = x[sum(K[:n-1])+1:sum(K[:n])];
        //real y[K[n]] = x[sum(K[:n-1])+1:sum(K[:n])];
        
        // do something with y...
    }
    
    x ~ normal(0,1);
}

If we now choose N=5 and K = [1,2,0,4,3], everything works fine, also with the newer versions. Notice that subject 3 has no missing observations. However, if we organize our data such that subject 1 or subject n has no missing observations, we get an exception: (try K = [0,1,2,4,3] or K = [1,2,4,3,0]).

By the way: while doing this I realized that the case K[:0] should also throw an exception under the new rules, but it doesn’t. In turns out that this is only an issue for vectors. The code that is commented out using arrays instead of vectors does work.

In any case, I do think that the user should have been warned before this feature was removed from the language. This could potentially lead to a lot of problems when other users update their Stan version. If you agree I can report an issue on github…

WardBrian · October 26, 2021, 3:17pm

I would bet this behavior changed via Add overloads of rvalue() for efficient slicing by SteveBronder · Pull Request #2965 · stan-dev/stan · GitHub

My two cents on this is that this is the correct behavior. A Stan array is must closer to a C array than a Python list semantically, and I think this error and behavior are correct

For the record, head and tail work as you desire when size==0:

transformed parameters {
    vector[n] y = append_row(head(x, i-1), append_row(rep_vector(0.0, 1), tail(x, n-i)));
}

I believe this version of your program works for i=1…4 and n=4 as you intend

wds15 · October 27, 2021, 7:17am

So it is about putting 0 in the first or last position of K which causes the 2.28 version to throw over? Declaring 0-sized data structures has always been possible and this was to me always a very useful feature to have, which I would assume was and is intended.

To me this sounds like a bug and not intended behaviour (but it seems we need a discussion around it). Can you please file an issue so that we can discuss on GitHub the matter? It’s hard to say where to file it… maybe the Stan repo is the right one?

rok_cesnovar · October 27, 2021, 7:30am

I am not sure I completely follow, what the issue is.

If I understand correctly you have

vector[3] x; // or
row_vector[3] y; // or
array[3] real z;

and then you do x[4:] it does not work, as well as y[4:], but z[4:] works?

By not work I mean throws an error.

If I am getting this correctly I think it should either not allow it in both cases or if we want to allow it, we should allow it for vectors, but not arrays. The former are Eigen vector and those allow this behavior I believe.

If [:0] works I would say that is a bug not an intended feature.

syclik · October 27, 2021, 1:16pm

+1. There should be consistency.

I also believe Stan should throw an error if indexed with something out of bounds. The preference I have is of having unintended behaviors minimized. If out of bounds indexing returned 0-length containers and we are able to handle those seamlessly, if this is truly a programming error, Stan won’t indicate that it is. I definitely understand the convenience of not emitting an error, but I believe the benefit of failing early outweighs the convenience (in this case).

chvandorp · October 27, 2021, 1:19pm

So it is about putting 0 in the first or last position of K which causes the 2.28 version to throw over?

Yes (in version >=2.26). In the example, sum(K[:1]) == sum({0}) = 0, and then we get vector[0] y = x[1:0], which is an empty vector.

chvandorp · October 28, 2021, 3:54pm

FYI: I created an issue on github: new vector slicing bound checks break code · Issue #3076 · stan-dev/stan · GitHub

Topic		Replies	Views
Greetings all, my stan code displays this problem Modeling	4	372	October 8, 2022
Slicing in Stan using double colon Modeling	6	467	July 7, 2023
Exception error in STAN program Modeling fitting-issues , specification	11	674	August 11, 2020
Help understanding what's happening in this code General	3	481	October 8, 2019
Vectorization of real valued parameter Modeling	8	585	January 9, 2020

New slicing rules break my code

Related topics