I used Linux `perf` to measure where the CPU spends its time when sampling from the model with both cmdstan 2.27.0 and cmdstan 2.28.1.
```shell
# n.b. "bench" is the name of the model
perf record -F 99 ./bench sample data file=bench-data.json output file=/dev/null random seed=1234
perf report
```
The `perf report` output shows the percentage of CPU time spent in each function. The raw output is a bit overwhelming, but aggregating the time by shared object for each cmdstan version reveals a big difference in the amount of time spent in libc-2.31.so:
| Shared Object | cmdstan 2.27.0 | cmdstan 2.28.1 |
|---|---|---|
| [kernel.kallsyms] | 6.05% | 4.15% |
| bench | 65.33% | 44.51% |
| libc-2.31.so | 19.15% | 38.42% |
| libm-2.31.so | 0.21% | 0.03% |
| libpthread-2.31.so | 0.27% | 0.30% |
| libstdc++.so.6.0.28 | 8.68% | 12.42% |
| total | 99.69% | 99.83% |
When we drill down and compare the CPU time spent in the top five functions inside libc-2.31.so, there’s quite a big difference in `__GI___printf_fp_l`:
| Shared Object | Symbol | cmdstan 2.27.0 | cmdstan 2.28.1 |
|---|---|---|---|
| libc-2.31.so | __GI___printf_fp_l | 7.24% | 17.45% |
| libc-2.31.so | __memmove_avx_unaligned_erms | 4.93% | 4.24% |
| libc-2.31.so | hack_digit | 1.53% | 3.59% |
| libc-2.31.so | __vfprintf_internal | 1.21% | 2.98% |
| libc-2.31.so | __mpn_mul_1 | 0.82% | 1.44% |
Both `__printf_fp_l` and `hack_digit` appear to be internal parts of glibc used to print floating-point values (see `printf_fp.c` in the bminor/glibc GitHub mirror at tag glibc-2.31).
This suggests cmdstan 2.28.1 might be spending roughly twice as much time as cmdstan 2.27.0 printing floating-point values! Is this expected? Superficially, the two output.csv files are the same size, so it doesn’t look like users are getting twice as many printed floats in 2.28.1 in exchange for roughly double the CPU time.
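File size alone could hide a change in how values are formatted (e.g. more digits per value but fewer padding characters), so one quick sanity check is to compare the average field width in the two CSVs, skipping Stan’s `#` comment lines and the header row. A sketch; the `output-2.27.csv` / `output-2.28.csv` names are hypothetical:

```shell
# Mean characters per CSV field, skipping comment lines and the header.
avg_field_width() {
  awk -F, '
    /^#/ { next }                           # skip Stan comment lines
    !header_seen { header_seen = 1; next }  # skip the CSV header row
    { for (i = 1; i <= NF; i++) total += length($i); n += NF }
    END { if (n) printf "%.2f\n", total / n }
  ' "$1"
}

for f in output-2.27.csv output-2.28.csv; do  # hypothetical file names
  if [ -f "$f" ]; then
    printf '%s: ' "$f"
    avg_field_width "$f"
  fi
done
```

If 2.28.1 really is formatting more significant digits per value, that should show up here even when the total file sizes happen to match.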