Improving the efficiency of *_dot_product for trinary-and-redundant matrices

WardBrian · January 23, 2023, 3:52pm

The order dependence you describe above made me suspect that something in the compiler optimizations was responsible. Here is what I get on my machine with stanc --O1 and the default make/local (i.e., O=3).

# A tibble: 10 × 2
   name                value
   <fct>               <dbl>
 1 yesdiff_order1_mike  2.41
 2 yesdiff_order0_mike  2.45
 3 nodiff_order0_cdp    2.98
 4 yesdiff_order1_cdp   3.08
 5 yesdiff_order0_cdp   3.13
 6 cdp_alone            3.22
 7 nodiff_order1_cdp    3.34
 8 nodiff_order0_mike  12.0 
 9 mike                13.6 
10 nodiff_order1_mike  15.0

If I do this again with O=0 and no stanc optimizations, I get:

# A tibble: 10 × 2
   name                value
   <fct>               <dbl>
 1 mike                 2.38
 2 yesdiff_order1_mike  2.62
 3 yesdiff_order0_mike  2.67
 4 nodiff_order1_mike   2.84
 5 nodiff_order0_mike   2.88
 6 yesdiff_order1_cdp   3.41
 7 nodiff_order0_cdp    3.45
 8 nodiff_order1_cdp    3.48
 9 cdp_alone            3.49
10 yesdiff_order0_cdp   3.63

which is… weird. All the mikes are now faster than the cdps, and the overall spread is much smaller. At this point I think I know what’s going on.

So I ran with stanc optimizations but still O=0 in make/local and got:

# A tibble: 10 × 2
   name                value
   <fct>               <dbl>
 1 yesdiff_order1_mike  2.80
 2 yesdiff_order0_mike  3.00
 3 yesdiff_order1_cdp   3.53
 4 cdp_alone            3.54
 5 nodiff_order1_cdp    4.00
 6 nodiff_order0_cdp    4.01
 7 yesdiff_order0_cdp   4.43
 8 mike                12.5 
 9 nodiff_order0_mike  17.3 
10 nodiff_order1_mike  17.4

I’m willing to call that essentially identical to the first run, since I had other processes running, etc.

Basically, it seems like the weird results you were observing are 100% due to edge cases in the stanc compiler optimizations: the slow mike variants only show up when the stanc optimizations are on, regardless of the C++ optimization level. I’m going to do some digging into what is happening, and will probably open a bug report based on it, but that’s the answer.
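
To make the pattern across the three runs easier to see, here is a small Python sketch (not part of the original benchmark) that just re-reads the numbers from the tables above. The 10.0 cutoff for "slow" and the helper names `slow_variants` and `spread` are assumptions made up for illustration; the timings themselves are copied verbatim from the tables.

```python
# Timings copied from the three tables above (same units as reported there).
# Each key is (stanc optimizations on?, C++ optimization level O from make/local).
RUNS = {
    (True, 3): {
        "yesdiff_order1_mike": 2.41, "yesdiff_order0_mike": 2.45,
        "nodiff_order0_cdp": 2.98, "yesdiff_order1_cdp": 3.08,
        "yesdiff_order0_cdp": 3.13, "cdp_alone": 3.22,
        "nodiff_order1_cdp": 3.34, "nodiff_order0_mike": 12.0,
        "mike": 13.6, "nodiff_order1_mike": 15.0,
    },
    (False, 0): {
        "mike": 2.38, "yesdiff_order1_mike": 2.62,
        "yesdiff_order0_mike": 2.67, "nodiff_order1_mike": 2.84,
        "nodiff_order0_mike": 2.88, "yesdiff_order1_cdp": 3.41,
        "nodiff_order0_cdp": 3.45, "nodiff_order1_cdp": 3.48,
        "cdp_alone": 3.49, "yesdiff_order0_cdp": 3.63,
    },
    (True, 0): {
        "yesdiff_order1_mike": 2.80, "yesdiff_order0_mike": 3.00,
        "yesdiff_order1_cdp": 3.53, "cdp_alone": 3.54,
        "nodiff_order1_cdp": 4.00, "nodiff_order0_cdp": 4.01,
        "yesdiff_order0_cdp": 4.43, "mike": 12.5,
        "nodiff_order0_mike": 17.3, "nodiff_order1_mike": 17.4,
    },
}

SLOW_THRESHOLD = 10.0  # hypothetical cutoff, chosen only for illustration


def slow_variants(timings, threshold=SLOW_THRESHOLD):
    """Return the names whose timing exceeds the threshold, sorted by name."""
    return sorted(name for name, value in timings.items() if value > threshold)


def spread(timings):
    """Ratio of the slowest to the fastest variant within one run."""
    return max(timings.values()) / min(timings.values())


for (stanc_opt, cxx_level), timings in RUNS.items():
    print(
        f"stanc optimizations={stanc_opt}, O={cxx_level}: "
        f"slow={slow_variants(timings)}, max/min={spread(timings):.2f}"
    )
```

Grouping the runs this way shows that the same three variants (mike and the two nodiff mikes) cross the cutoff in both runs with stanc optimizations on, and none do in the run with them off, whatever O is set to in make/local.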
