Stan equivalent of R's %in% function


#1

Hi all -

I needed a version of R’s %in% function for Stan, so I decided to post the code here in case anyone finds it useful. It checks whether an integer matches any element of an array of integers, which is useful for determining whether an element index is in a pre-specified list of indices. It is only vectorized one way (over the list of possible matches), rather than two ways as R’s is.

test_R_in.stan (790 Bytes)
R_in.R (326 Bytes)

Here’s the Stan code, and demonstration R code is in the attached files:

functions {
  // function that is comparable to R's %in% function
  // pos is the value to test for matches
  // array pos_var is the 1-dimensional array of possible matches
  // returns pos_match=1 if pos matches at least one element of pos_var and pos_match=0 otherwise
  // example code: 
  // if (r_in(3, {1,2,3,4})) will take the true branch, since r_in returns 1
  int r_in(int pos,int[] pos_var) {
    int pos_match;
    int all_matches[size(pos_var)];
    
    for (p in 1:(size(pos_var))) {
      all_matches[p] = (pos_var[p]==pos);
    }
    
    if(sum(all_matches)>0) {
      pos_match = 1;
      return pos_match;
    } else {
      pos_match = 0;
      return pos_match;
    }
    
  }
}
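
For anyone who needs the two-way behaviour of R’s %in%, here is a rough sketch of how it could look. The name r_in_each is hypothetical and is not in the attached files. It uses the same array syntax as the code above; newer Stan releases write these declarations as `array[] int` instead.

```stan
functions {
  // hypothetical helper, not part of the attached files:
  // returns one 0/1 entry per element of pos, where entry i is 1
  // if pos[i] matches at least one element of pos_var
  int[] r_in_each(int[] pos, int[] pos_var) {
    int matches[size(pos)];

    for (i in 1:size(pos)) {
      matches[i] = 0;
      for (p in 1:size(pos_var)) {
        if (pos[i] == pos_var[p]) {
          matches[i] = 1;
        }
      }
    }
    return matches;
  }
}
```

With this, r_in_each({1,5}, {1,2,3,4}) should give {1,0}, mirroring c(1,5) %in% 1:4 in R.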

#2

That’s great! Would it be possible to use the size() function and avoid the third argument?


#3

Yes! Thanks for the tip. Code updated.


#4

Nice! @Bob_Carpenter I’m curious, is this also how you would code this up?

Somewhat related: A while ago I started planning and fiddling with an R package of tested user defined functions in the Stan language, which I hope to resume soonishly (will be on GitHub, contributions welcome). Functions would be exposed to R so users can try them out in R before using them in Stan programs, and they would be able to be added to Stan programs easily. This seems like a good candidate for one of the early entries.


#5

Sure thing Jonah. I think that kind of package would be useful if we can find good categories for functions so that people can find what they are looking for, i.e., data-munging vs. statistical vs. arithmetic functions.


#6

A package as you suggest would be really useful. I have a huge utils.stan file by now, which makes my life so much simpler when coding up some data munging or whatever in Stan.

What I am saying is that I would try to contribute to this.


#7

I’m not sure how ‘hot’ the code paths using this would ever be, but here are some considerations that should optimize the code considerably:

functions {
  // r_in(3, {1,2,3,4}) will evaluate as 1
  int r_in(int pos, int[] pos_var) {
    for (p in 1:size(pos_var)) {
      if (pos_var[p] == pos) {
        // can return immediately, as soon as we find a match
        return 1;
      }
    }
    return 0;
  }
}

Two things have changed:

  1. There is definitely no need to sum everything, as that essentially makes you traverse the array twice (once to fill all_matches, once to sum it), on top of doing more work (the summing itself).

  2. You can exit as soon as you find a match; there is no need to keep scanning, since we know at least one is present. If you get through the entire loop without exiting, there were no matches, so return 0. If you do need to iterate over the entire thing (say, to return the number of times the value appears in the array), store the count in a separate variable and increment it on every match, rather than summing a whole bunch of 0’s.
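
For the counting case in point 2, something along these lines should work. This is only a sketch; the name r_in_count is made up for illustration, and the array syntax follows the original post (newer Stan releases use `array[] int`).

```stan
functions {
  // hypothetical helper: number of elements of pos_var equal to pos
  int r_in_count(int pos, int[] pos_var) {
    int n_match;

    n_match = 0;
    for (p in 1:size(pos_var)) {
      if (pos_var[p] == pos) {
        // increment a running count instead of storing and summing 0/1 flags
        n_match = n_match + 1;
      }
    }
    return n_match;
  }
}
```

Here there is no early exit, because every element has to be inspected, but the array is still traversed only once.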

In general, ‘tricks’ such as using sum to test for presence (within R itself) are artifacts of R’s inherent slowness with explicit loops: in R, explicitly looping and returning early will actually be slower than just doing a vectorized sum. In C++ (which is what Stan compiles to), this is not the case, so you don’t always need to think in terms of contorting things to be vectorized.

Finally, I haven’t been keeping up with Stan these days, so I’m not sure whether it’s valid, but you might want to consider changing the function signature so that the return value is a boolean rather than 0/1.

Cheers


#8

Thanks much! Faster code is always appreciated. I’ll update the post.


#9

Also re: return value, the Stan manual says boolean functions return 0 or 1, so I’m just going off of that.
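
In practice the int return seems to be enough, since the 0/1 value can go straight into an if condition or be stored and added up. Here is a minimal sketch of that. The data names N, K and fixed_idx are hypothetical (they are not in the attached files), and the array syntax matches the code above (newer Stan releases write `array[K] int` instead).

```stan
functions {
  int r_in(int pos, int[] pos_var) {
    for (p in 1:size(pos_var)) {
      if (pos_var[p] == pos) {
        return 1;
      }
    }
    return 0;
  }
}
data {
  int<lower=0> N;                       // hypothetical: number of elements
  int<lower=0> K;                       // hypothetical: number of listed indices
  int<lower=1, upper=N> fixed_idx[K];   // hypothetical: pre-specified indices
}
transformed data {
  int is_fixed[N];   // 1 if index n appears in fixed_idx, 0 otherwise
  int n_fixed;       // how many of 1..N appear in fixed_idx

  n_fixed = 0;
  for (n in 1:N) {
    is_fixed[n] = r_in(n, fixed_idx);   // stored as a plain int
    if (is_fixed[n]) {                  // a nonzero int counts as true
      n_fixed = n_fixed + 1;
    }
  }
}
```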


#10

I’d have coded it the way @dpastoor did for exactly the reasons he gave.

That’s all we have for boolean in Stan to date. I’d like to add a proper boolean type. It’d be a subtype of int the way int is a subtype of real so as not to break backward compatibility.