The order dependence you describe above made me suspect something in the compiler optimizations. Here is what I get on my machine with `stanc --O1` and the default `make/local` (aka `O=3`):
```
# A tibble: 10 × 2
   name                value
   <fct>               <dbl>
 1 yesdiff_order1_mike  2.41
 2 yesdiff_order0_mike  2.45
 3 nodiff_order0_cdp    2.98
 4 yesdiff_order1_cdp   3.08
 5 yesdiff_order0_cdp   3.13
 6 cdp_alone            3.22
 7 nodiff_order1_cdp    3.34
 8 nodiff_order0_mike  12.0
 9 mike                13.6
10 nodiff_order1_mike  15.0
```
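For reference, this is roughly the setup I'm toggling between runs — a sketch of `make/local`, with the two knobs I'm changing marked in comments:

```
# make/local in the CmdStan directory (a sketch)
STANCFLAGS += --O1   # stanc-level optimizations; delete this line for the "no stanc optimizations" runs
O = 3                # C++ compiler optimization level (the default); set O = 0 to turn it off
```

After changing either knob I do a `make clean-all` and rebuild the model so the new flags actually take effect.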
If I do this again with `O=0` and no stanc optimizations, I get:
```
# A tibble: 10 × 2
   name                value
   <fct>               <dbl>
 1 mike                 2.38
 2 yesdiff_order1_mike  2.62
 3 yesdiff_order0_mike  2.67
 4 nodiff_order1_mike   2.84
 5 nodiff_order0_mike   2.88
 6 yesdiff_order1_cdp   3.41
 7 nodiff_order0_cdp    3.45
 8 nodiff_order1_cdp    3.48
 9 cdp_alone            3.49
10 yesdiff_order0_cdp   3.63
```
which is… weird. All the `mike`s are now faster than the `cdp`s, and the overall variance is way lower. At this point I think I know what's going on.
So I ran with stanc optimizations but still `O=0` in `make/local` and got:
```
# A tibble: 10 × 2
   name                value
   <fct>               <dbl>
 1 yesdiff_order1_mike  2.80
 2 yesdiff_order0_mike  3.00
 3 yesdiff_order1_cdp   3.53
 4 cdp_alone            3.54
 5 nodiff_order1_cdp    4.00
 6 nodiff_order0_cdp    4.01
 7 yesdiff_order0_cdp   4.43
 8 mike                12.5
 9 nodiff_order0_mike  17.3
10 nodiff_order1_mike  17.4
```
I'm willing to call that essentially identical to the first run, given that I had other processes running, etc. The slow `mike` timings show up whenever `--O1` is on, regardless of the C++ optimization level, so it seems like the weird results you were observing are 100% due to edge cases in the stanc compiler optimizations. I'm going to do some digging into what is happening, and will probably open a bug report based on it, but that's the answer.
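As a first step in that digging, diffing the C++ that stanc emits with and without optimizations should show what `--O1` is actually rewriting. A sketch (the file name `mike.stan` is a guess at whichever model regresses; `stanc` is the binary in CmdStan's `bin/` directory):

```
stanc --O0 mike.stan --o mike_O0.hpp    # no stanc optimizations (the default)
stanc --O1 mike.stan --o mike_O1.hpp    # with stanc optimizations
diff -u mike_O0.hpp mike_O1.hpp | less  # inspect what --O1 changed in the generated C++
```

If something obviously pathological shows up there, it should make for a concrete bug report.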