That's only a couple percent variation, which is normal for any kind of speed comparison unless you take extreme measures to make sure no background processes ever run.
I'm starting to think the memory bandwidth bound is the real explanation. The 12.6GB/s figure is for a single core, so 4 cores should have higher aggregate bandwidth. I haven't measured it, but it may be several times higher.
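One rough way to check whether aggregate bandwidth grows with the number of processes is a small copy benchmark like the sketch below. It is not what produced the 12.6GB/s number; the function names `copy_bandwidth` and `aggregate_bandwidth`, the 64 MB buffer size, and the repeat count are all assumptions chosen for illustration. Summing per-process bests also assumes the workers overlap in time, so treat the result as an estimate only.

```python
import time
from multiprocessing import Pool


def copy_bandwidth(size_mb=64, repeats=5):
    """Best observed copy bandwidth (GB/s) for a single process.

    Hypothetical helper: the buffer should be much larger than the
    last-level cache so the copy actually goes through main memory.
    """
    n = size_mb * 1024 * 1024
    src = bytearray(n)
    dst = bytearray(n)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        dst[:] = src  # bulk copy: reads n bytes and writes n bytes
        best = min(best, time.perf_counter() - t0)
    return 2 * n / best / 1e9


def aggregate_bandwidth(workers, size_mb=64):
    """Sum of per-process bandwidths with `workers` processes running."""
    with Pool(workers) as pool:
        return sum(pool.map(copy_bandwidth, [size_mb] * workers))


if __name__ == "__main__":
    for workers in (1, 2, 4):
        print(f"{workers} process(es): {aggregate_bandwidth(workers):.1f} GB/s")
```

If the 4-process total is well below 4 times the single-process number, the cores are competing for memory bandwidth, which would be consistent with the explanation above.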
That's right. The variation looks normal. There shouldn't be any random scheduling problem.
And what happens if you start 4 processes of ad_advertisement with cmdstan? Does it behave differently from RStan?