Improving the efficiency of *_dot_product for trinary-and-redundant matrices

The order dependence you describe above made me suspect something in the compiler optimizations. Here is what I get on my machine with stanc --O1 and the default make/local (i.e. O=3).

# A tibble: 10 × 2
   name                value
   <fct>               <dbl>
 1 yesdiff_order1_mike  2.41
 2 yesdiff_order0_mike  2.45
 3 nodiff_order0_cdp    2.98
 4 yesdiff_order1_cdp   3.08
 5 yesdiff_order0_cdp   3.13
 6 cdp_alone            3.22
 7 nodiff_order1_cdp    3.34
 8 nodiff_order0_mike  12.0 
 9 mike                13.6 
10 nodiff_order1_mike  15.0 

If I do this again with O=0 and no stanc optimizations, I get:

# A tibble: 10 × 2
   name                value
   <fct>               <dbl>
 1 mike                 2.38
 2 yesdiff_order1_mike  2.62
 3 yesdiff_order0_mike  2.67
 4 nodiff_order1_mike   2.84
 5 nodiff_order0_mike   2.88
 6 yesdiff_order1_cdp   3.41
 7 nodiff_order0_cdp    3.45
 8 nodiff_order1_cdp    3.48
 9 cdp_alone            3.49
10 yesdiff_order0_cdp   3.63

which is… weird. All the mikes are now faster than the cdps, and the overall variance is way lower. At this point I think I know what’s going on.

So I ran with stanc optimizations but still O=0 in make/local and got:

# A tibble: 10 × 2
   name                value
   <fct>               <dbl>
 1 yesdiff_order1_mike  2.80
 2 yesdiff_order0_mike  3.00
 3 yesdiff_order1_cdp   3.53
 4 cdp_alone            3.54
 5 nodiff_order1_cdp    4.00
 6 nodiff_order0_cdp    4.01
 7 yesdiff_order0_cdp   4.43
 8 mike                12.5 
 9 nodiff_order0_mike  17.3 
10 nodiff_order1_mike  17.4 

I’m willing to call this essentially identical to the first run, since I had other processes running at the time.

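Basically, it seems like the weird results you were observing are entirely due to edge cases in the stanc compiler optimizations: the slow mike variants show up whenever the stanc optimizations are on, whether make/local has O=3 or O=0, and they disappear when the stanc optimizations are off. I’m going to do some digging into what is happening, and probably open a bug report based on it, but that’s the answer.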
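To make the comparison across the three runs explicit, here is a rough Python sketch that takes the values from the tibbles above and flags which models slow down whenever stanc optimizations are enabled. The run labels, the 2x cutoff, and the choice of the unoptimized run as the baseline are my own assumptions for illustration, not part of the original benchmark script.

```python
# Values copied from the three tibbles above (units as reported there).
# Run labels are hypothetical names chosen for this sketch.
runs = {
    "stanc_O1_make_O3": {
        "yesdiff_order1_mike": 2.41, "yesdiff_order0_mike": 2.45,
        "nodiff_order0_cdp": 2.98, "yesdiff_order1_cdp": 3.08,
        "yesdiff_order0_cdp": 3.13, "cdp_alone": 3.22,
        "nodiff_order1_cdp": 3.34, "nodiff_order0_mike": 12.0,
        "mike": 13.6, "nodiff_order1_mike": 15.0,
    },
    "no_stanc_opt_make_O0": {
        "mike": 2.38, "yesdiff_order1_mike": 2.62,
        "yesdiff_order0_mike": 2.67, "nodiff_order1_mike": 2.84,
        "nodiff_order0_mike": 2.88, "yesdiff_order1_cdp": 3.41,
        "nodiff_order0_cdp": 3.45, "nodiff_order1_cdp": 3.48,
        "cdp_alone": 3.49, "yesdiff_order0_cdp": 3.63,
    },
    "stanc_O1_make_O0": {
        "yesdiff_order1_mike": 2.80, "yesdiff_order0_mike": 3.00,
        "yesdiff_order1_cdp": 3.53, "cdp_alone": 3.54,
        "nodiff_order1_cdp": 4.00, "nodiff_order0_cdp": 4.01,
        "yesdiff_order0_cdp": 4.43, "mike": 12.5,
        "nodiff_order0_mike": 17.3, "nodiff_order1_mike": 17.4,
    },
}

BASELINE = "no_stanc_opt_make_O0"  # the run without stanc optimizations
THRESHOLD = 2.0                    # arbitrary cutoff for "much slower"


def slowdown(run_name, model):
    """Ratio of a model's value in run_name to its value in the baseline run."""
    return runs[run_name][model] / runs[BASELINE][model]


optimized_runs = [name for name in runs if name != BASELINE]

# A model counts as affected if it is much slower in every stanc-optimized run.
affected = sorted(
    model
    for model in runs[BASELINE]
    if all(slowdown(run, model) > THRESHOLD for run in optimized_runs)
)

for model in affected:
    ratios = ", ".join(f"{run}: {slowdown(run, model):.1f}x" for run in optimized_runs)
    print(f"{model}: {ratios}")
```

Only mike and the two nodiff_*_mike models cross the cutoff, and they do so in both stanc-optimized runs regardless of the O setting, which is what points at stanc rather than the C++ compiler.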
