I have a fairly complicated model (~120 lines in the whole .stan file). The input JSON is about 8 MB, and the matrices I use during the computation are roughly four times the size of the input. However, when I run optimize/sample with the compiled Stan program, it very quickly eats all the memory (up to 20 GB) and then crashes with a bare “Killed” message. I don’t really understand where this is coming from: as I say, the structures I’m using aren’t nearly that big themselves, but I also don’t understand what other memory Stan needs. What steps can I take to understand why the memory consumption is so high?
Are you locally scoping the matrices?
I kind of managed to solve the problem by vectorizing a lot of the code. For example, I wanted to compute a per-row cumulative sum of some matrix, which exploded the memory when done with for loops, but I figured I could achieve the same logic by multiplying by a triangular matrix precomputed in transformed data (see the sketch below). That reduced the memory consumption significantly, even though it feels… less readable?
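A minimal sketch of the pattern (with made-up names and a toy model, not my actual code):

```stan
data {
  int<lower=1> N;
  int<lower=1> K;
}
transformed data {
  // Upper-triangular matrix of ones: Y * cumsum_mat gives row-wise
  // cumulative sums. Built once here, with plain doubles, outside autodiff.
  matrix[K, K] cumsum_mat = rep_matrix(0, K, K);
  for (k in 1:K) {
    for (j in k:K) {
      cumsum_mat[k, j] = 1;
    }
  }
}
parameters {
  matrix[N, K] Y;
}
transformed parameters {
  // One matrix multiply instead of a double loop of scalar additions,
  // so far fewer nodes end up on the autodiff stack.
  matrix[N, K] Y_cumsum = Y * cumsum_mat;
}
model {
  to_vector(Y) ~ std_normal();
}
```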
Anyway, I still don’t understand what caused the ~1000x memory increase for the same logic. I suppose it has something to do with autodiff, but I find it confusing, so mainly I wanted to ask about tools or techniques that would help me understand what causes the memory to be allocated, rather than help with fixing a specific piece of code.
One (somewhat indirect) way to observe Stan’s autodiff memory usage is with the built-in profile blocks. Targeted use of these can help you find which parts of your model are particularly heavy on the AD system when you run into problems like this, though, as you’ve noticed, vectorizing and doing as much computation as you can in transformed data is almost always a good idea.
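For example, a toy model (all names here are made up, just to show the syntax) split into named profile sections might look like this:

```stan
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // Each profile block is timed and its autodiff usage tracked separately.
  profile("priors") {
    mu ~ normal(0, 1);
    sigma ~ normal(0, 1);
  }
  profile("likelihood") {
    y ~ normal(mu, sigma);
  }
}
```

CmdStan writes the results to the CSV given by the `profile_file` output argument; besides timings, each named block also gets chain_stack / no_chain_stack counts, which (roughly) tell you how many entries that block pushed onto the autodiff stack, and that is a pretty direct proxy for where the memory is going.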