Stan equivalent of R's %in% function

saudiwin · April 12, 2018, 6:37pm

Hi all -

I needed a version of R’s %in% function for Stan, so I decided to post the code here in case anyone finds it useful. It checks whether an integer matches any element of a list of integers, which is useful for determining whether an element index is in a pre-specified list of indices. It is vectorized only one way (over the list of possible index matches), rather than two ways as R’s is.

test_R_in.stan (790 Bytes)
R_in.R (326 Bytes)

Here’s the Stan code, and demonstration R code is in the attached files:

functions {
  // function that is comparable to R's %in% function
  // pos is the value to test for matches
  // array pos_var is the 1-dimensional array of possible matches
  // returns pos_match=1 if pos matches at least one element of pos_var and pos_match=0 otherwise
  // example code: 
  // if(r_in(3,{1,2,3,4})) will evaluate as TRUE
  int r_in(int pos,int[] pos_var) {
    int pos_match;
    int all_matches[size(pos_var)];
    
    for (p in 1:(size(pos_var))) {
      all_matches[p] = (pos_var[p]==pos);
    }
    
    if(sum(all_matches)>0) {
      pos_match = 1;
      return pos_match;
    } else {
      pos_match = 0;
      return pos_match;
    }
    
  }
}
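
For readers who want the two-way behaviour that R's %in% has, here is a minimal sketch, not part of the original post. The name r_in_each is hypothetical, and the array declarations follow the syntax used in this thread; newer Stan releases write array types differently (array[] int), so the signature may need adjusting.

```stan
functions {
  // Hypothetical helper (not from the original post): tests every element
  // of pos against pos_var and returns one 0/1 flag per element of pos.
  int[] r_in_each(int[] pos, int[] pos_var) {
    int out[size(pos)];
    for (i in 1:size(pos)) {
      out[i] = 0;                       // assume no match until one is found
      for (p in 1:size(pos_var)) {
        if (pos_var[p] == pos[i]) {
          out[i] = 1;
          break;                        // stop scanning pos_var for this element
        }
      }
    }
    return out;
  }
}
```

Under these assumptions, r_in_each({1, 5}, {1, 2, 3, 4}) would return {1, 0}, mirroring c(1, 5) %in% 1:4 in R.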

aaronjg · April 12, 2018, 9:06pm

That’s great! Would it be possible to use the size() function and avoid the third argument?

saudiwin · April 13, 2018, 1:14am

Yes! Thanks for the tip. Code updated.

jonah · April 13, 2018, 2:00am

Nice! @Bob_Carpenter I’m curious, is this also how you would code this up?

Somewhat related: A while ago I started planning and fiddling with an R package of tested user defined functions in the Stan language, which I hope to resume soonishly (will be on GitHub, contributions welcome). Functions would be exposed to R so users can try them out in R before using them in Stan programs, and they would be able to be added to Stan programs easily. This seems like a good candidate for one of the early entries.

saudiwin · April 13, 2018, 12:16pm

Sure thing Jonah. I think that kind of package would be useful if we can find good categories for functions so that people can find what they are looking for, i.e., data-munging vs. statistical vs. arithmetic functions.

wds15 · April 13, 2018, 12:52pm

A package like the one you suggest would be really useful. I have a huge utils.stan file by now, which makes my life so much simpler when coding up data munging or whatever in Stan.

What I am saying is that I would try to contribute to this.

dpastoor · April 15, 2018, 6:20pm

saudiwin:

functions {
  // function that is comparable to R's %in% function
  // pos is the value to test for matches
  // array pos_var is the 1-dimensional array of possible matches
  // returns pos_match=1 if pos matches at least one element of pos_var and pos_match=0 otherwise
  // example code:
  // if(r_in(3,{1,2,3,4})) will evaluate as TRUE
  int r_in(int pos,int[] pos_var) {
    int pos_match;
    int all_matches[size(pos_var)];

    for (p in 1:(size(pos_var))) {
      all_matches[p] = (pos_var[p]==pos);
    }

    if(sum(all_matches)>0) {
      pos_match = 1;
      return pos_match;
    } else {
      pos_match = 0;
      return pos_match;
    }
  }
}

I’m not sure how ‘hot’ the code paths where you’d use this would ever be, but here are some changes that optimize the code considerably:

functions {
  // if(r_in(3,{1,2,3,4})) will evaluate as 1
  int r_in(int pos,int[] pos_var) {
    for (p in 1:(size(pos_var))) {
      if (pos_var[p]==pos) {
        // can return immediately, as soon as we find a match
        return 1;
      }
    }
    return 0;
  }
}

Two things changed:

There is no need to sum everything: that makes you traverse the array twice (once to fill all_matches, once to sum it), on top of the extra work of the summation itself.
You can exit as soon as you find a match. There is no need to keep scanning, because we already know at least one match is present; if you get through the entire loop without returning, there were no matches, so return 0. If you do need to iterate over the entire array (say, to return the number of times the value occurs), keep a counter and increment it on every match rather than summing a whole bunch of 0’s.
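
The counting variant mentioned in the second point could look roughly like this. It is only a sketch: the function name r_count is made up for illustration, and the array syntax matches the rest of this thread rather than newer Stan releases.

```stan
functions {
  // Hypothetical helper: number of elements of pos_var equal to pos.
  int r_count(int pos, int[] pos_var) {
    int n_match;
    n_match = 0;
    for (p in 1:size(pos_var)) {
      if (pos_var[p] == pos) {
        n_match = n_match + 1;   // increment on each match, no temporary array
      }
    }
    return n_match;
  }
}
```

Here the loop has to visit every element, so there is no early return, but it still avoids allocating and summing an intermediate array. For example, r_count(3, {1, 3, 3, 4}) returns 2.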

In general, ‘tricks’ such as using sum to test for presence are artifacts of R’s slow explicit loops: in R, looping explicitly and returning early will actually be slower than just doing a vectorized sum. In C++ (which Stan programs compile to), this is not the case, so you don’t always need to think in terms of contorting your code to be vectorized.

Finally, I haven’t been keeping up with Stan these days, so I’m not sure whether it’s valid, but you might want to consider changing the function signature so that the return value is a boolean rather than 0/1.

Cheers

saudiwin · April 16, 2018, 1:24am

Thanks much! Faster code is always appreciated. I’ll update the post.

saudiwin · April 16, 2018, 1:26am

Also re: return value, the Stan manual says boolean functions return 0 or 1, so just going off of that.
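
As an illustration of the 0/1 return value being used directly as a condition, here is a small sketch. The data names N, idx, K, and keep are placeholders invented for this example, and the declarations use the array syntax from this thread.

```stan
functions {
  // early-return membership test, as suggested earlier in the thread
  int r_in(int pos, int[] pos_var) {
    for (p in 1:size(pos_var)) {
      if (pos_var[p] == pos) {
        return 1;
      }
    }
    return 0;
  }
}
data {
  int<lower=0> N;
  int idx[N];      // placeholder: indices to test
  int<lower=0> K;
  int keep[K];     // placeholder: pre-specified list of indices
}
transformed data {
  int n_keep;
  n_keep = 0;
  for (n in 1:N) {
    // the int result (0 or 1) is used directly as the if condition
    if (r_in(idx[n], keep)) {
      n_keep = n_keep + 1;
    }
  }
  print("matched ", n_keep, " of ", N, " indices");
}
```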

Bob_Carpenter · April 17, 2018, 1:24am

I’d have coded it the way @dpastoor did for exactly the reasons he gave.

That’s all we have for boolean in Stan to date. I’d like to add a proper boolean type. It’d be a subtype of int the way int is a subtype of real so as not to break backward compatibility.
