Nonlinear increase in step time with more rows of data -- possible bug?

I’m kind of confused by this. I’m using rstan 2.18.1. Everything in the model (a hierarchical state space model) stays the same; only the loops over the N rows of data get longer. With 5400 rows of data (180 subjects) a gradient calculation took ~0.17 s, while with 5800 rows (200 subjects) it took ~0.9 s. Below 5400 rows, the time changes roughly linearly, as I would expect. These differences in gradient calculation time are not spurious: they are at least roughly mirrored by actual sampling performance with a fixed, low treedepth, and I only started looking into it because of unexpectedly slow performance. The Windows 7 64-bit PC has heaps of RAM and is only allocating 500 MB or so per chain anyway. This nonlinearity a) doesn’t manifest using Ubuntu on my laptop, b) seems more pronounced with the -O3 compiler flag, c) doesn’t occur with rstan 2.17.4, and d) isn’t specific to the data (I tried dropping rows from both the front and the back). I can’t post the data, but if needed I can generate some and try to reproduce the issue…
edit: now it is occurring with 2.17.4…
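
For scale, here is a quick back-of-the-envelope check of the timings quoted above, as a minimal Python sketch. The function names are made up for illustration, and the only inputs are the approximate figures from the post (5400 rows at ~0.17 s, 5800 rows at ~0.9 s).

```python
def linear_extrapolation(base_rows, base_seconds, new_rows):
    """Time expected for new_rows if cost were strictly proportional to rows."""
    return base_seconds * new_rows / base_rows


def slowdown_vs_linear(base_rows, base_seconds, new_rows, new_seconds):
    """Observed time divided by the linear prediction (1.0 means perfectly linear)."""
    return new_seconds / linear_extrapolation(base_rows, base_seconds, new_rows)


# Approximate gradient calculation times quoted in the post above.
expected = linear_extrapolation(5400, 0.17, 5800)
factor = slowdown_vs_linear(5400, 0.17, 5800, 0.9)
print(f"expected under linear scaling: {expected:.3f} s")
print(f"observed / expected: {factor:.1f}x")
```

Under strictly linear scaling the larger data set would need roughly 0.18 s per gradient, so the observed ~0.9 s is about 5 times slower than proportional.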

Is that with StanHeaders 2.17.x? There are potentially a lot of changes between 2.17 and 2.18.

Yes, though the fact that it reported the ‘correct’, faster gradient calculation time once with the full data after downgrading does make me wonder. I have the same C++14 Makevars setup in both cases; perhaps I should change that?

I don’t think the C++14 flag by itself should make much difference but it is worth a try. Also, are you using clang++ or g++?

On the laptop it is gcc:

Reading specs from /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_4/lib/gcc/x86_64-unknown-linux-gnu/5.5.0/specs
Target: x86_64-unknown-linux-gnu
Configured with: …/configure --with-isl=/home/linuxbrew/.linuxbrew/opt/isl@0.18 --with-bugurl= --prefix=/home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_4 --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-5 --with-gmp=/home/linuxbrew/.linuxbrew/opt/gmp --with-mpfr=/home/linuxbrew/.linuxbrew/opt/mpfr --with-mpc=/home/linuxbrew/.linuxbrew/opt/libmpc --enable-stage1-checking --enable-checking=release --enable-lto --with-build-config=bootstrap-debug --disable-werror --with-pkgversion='Homebrew gcc 5.5.0_4' --with-boot-ldflags='-static-libstdc++ -static-libgcc ' --disable-nls --disable-multilib
Thread model: posix
gcc version 5.5.0 (Homebrew gcc 5.5.0_4)

Well, this was a wonderfully confusing time-waster that seems to have just evaporated. After fully correcting the package for the older rstan 2.17.4, the problem no longer exists. Then, after upgrading back to rstan 2.18.1, the problem still doesn’t exist. My only guesses are that a) it will come back, or b) rstan 2.18.1 might previously have been built from source, perhaps under some semi-correct make setup that led to this weirdness.

I think this increase in time could be explained by the different speeds of the memory caches. See this excellent series of blog posts discussing why memory access time increases nonlinearly.
It is highly recommended reading for anyone running speed tests with Stan or any other software (non-parallel or parallel).
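
The cache argument can be tried out with a rough pointer-chasing sketch like the following (Python, standard library only). This is an illustration of the general effect, not a reproduction of the Stan model: `random_cycle` and `ns_per_access` are hypothetical helper names, the sizes are arbitrary, and interpreter overhead in Python will blur the steps that a C++ version would show more sharply.

```python
import random
import time


def random_cycle(n, seed=0):
    """Return a list p where following i -> p[i] visits all n slots in one cycle
    (Sattolo's algorithm), so consecutive accesses jump around unpredictably."""
    rng = random.Random(seed)
    p = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i guarantees a single n-cycle
        p[i], p[j] = p[j], p[i]
    return p


def ns_per_access(n, steps=200_000):
    """Average nanoseconds per dependent lookup in a working set of n entries."""
    p = random_cycle(n)
    i = 0
    start = time.perf_counter()
    for _ in range(steps):
        i = p[i]  # each lookup depends on the previous one
    elapsed = time.perf_counter() - start
    return elapsed / steps * 1e9


if __name__ == "__main__":
    # Arbitrary sizes chosen to span from cache-resident to well beyond cache.
    for n in (1_000, 10_000, 100_000, 1_000_000):
        print(f"{n:>9} entries: {ns_per_access(n):6.1f} ns per access")
```

On many machines the time per access creeps up as the working set outgrows each cache level, but the exact numbers depend on the hardware, so treat any single run as indicative only.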


That logic makes a lot of sense, though I don’t know how it would apply to this case, as I’d have thought we were already out in RAM with the reduced data sets. But maybe I’m lacking imagination :)