Future rstan is broken with clang++

@wds15 and I have been unable to figure out why clang++ incorrectly compiles a branch of RStan that works with the master branches of Stan Math and Stan Libraries. The usual behavior of Stan is that after the model is compiled, sampling will use essentially a constant amount of RAM, which is what happens when everything is compiled with g++ (at least on Linux). However, if everything is compiled with clang++, sampling will keep slowly consuming more RAM until you run out, even though recover_memory is getting called and even though none of the tools we have used so far has found evidence of a memory leak.

There is a wiki page with various people’s experience at

and a mostly clean C++ file that compiles but will eat all your RAM if you run MCMC: mvn_orange.cpp (31.4 KB).

If there is someone who knows how to use Instruments that comes with Xcode, that might be helpful. Or someone with Linux that knows how to work with objdump or similar.

Unfortunately, we don’t have any good alternatives right now. The RStan on CRAN (which does header-only compilation, except for SUNDIALS) won’t work with Catalina, but the default compiler for Catalina is clang++, so non-trivial models will exhaust the RAM if we use the CRTP pattern. Also, RStudio Cloud hasn’t worked with RStan on CRAN for a couple of releases because we can’t do header-only compilation within the 1GB limit users are given. We could get under that if we use CRTP, but only if the user’s model is compiled with clang++, which would then exceed the 1GB limit when sampling is called.


Mamma Mia! Few Qs

  1. This is for both mac and linux?

  2. What version(s) of clang++ have you tried out?

  3. Is this only for rstan (and not cmdstan or pystan)?

Yes.

Several. Me personally 7 and 9 on Linux. @wds15 did 6, 7, and 8 on Mac.

Not CmdStan. No one has tried PyStan (I don’t know how close PyStan is to being able to compile the services in advance, which seems to be a necessary condition to see this behavior from clang++ with RStan).

Is rstan compiled as a .so? The below may be helpful at least in understanding the issue with exceptions, but not with the mem issue

Yes. We are banking on the fact that using CRTP avoids the problem with exceptions. So the problems are that clang++ and g++ compile the same code differently; clang++ creates a much bigger .so object than g++ (which was not the case for rstan <= 2.19.x); clang++-compiled code eats all the RAM despite recover_memory() being called, yet escapes memory-leak detection by all the static analysis and instrumenting tools we tried; and clang++ calculates the same log_prob value as g++ but returns a vector of zeros for the gradient.

I am still working on this. So far I have instrumented the model generated by RStan and plugged it into CmdStan. Doing so gives me a nicely sampling program. It appears that there is indeed some nasty R/C++ interaction going off the rails here. This makes it a lot harder to debug.

(I never wanted to learn so much Rcpp stuff, but, oh well, …)


I would guess that as well due to the fact that compiling rstan with g++ and the model with clang++ is sufficient to trigger the problems, so it is likely a problem with the compiled model. On the other hand, the only Rcpp stuff is returning a pointer to the instantiated C++ object, which has the right address and is of the right type because the log_prob and write_array methods work.

Is it possible that with clang++, the autodiff tree is put on the heap somewhere by the user’s model rather than by rstan’s stan_fit class? And then rstan can’t find it? Something like that is my only explanation for why the memory explodes even though I put print statements into recover_memory.hpp saying that it is being called, and for why, when I call grad_log_prob, it returns a vector of zeros.

OK. I put print statements after
https://github.com/stan-dev/math/blob/develop/stan/math/rev/mat/functor/gradient.hpp#L46
and they come up on the zero-th iteration like

x_var =
-0.552004 -1.90983 -0.436813 -1.83706 -0.53393 0.0264679 1.22124 -1.36579 -1.0082 0.239205
x_var.adj()=
0 0 0 0 0 0 0 0 0 0

So, why is .adj() being initialized to zeros instead of ones?

With -fsanitize=undefined, and the model compiled via either g++ or clang++, I am getting:

/home/ben/r-devel/library/StanHeaders/include/stan/math/rev/core/operator_addition.hpp:17:37: runtime error: member access within address 0x5565bc1d7778 which does not point to an object of type ‘vari’
0x5565bc1d7778: note: object is of type ‘stan::math::vari’
00 00 00 00 00 cc fe b0 1c 7f 00 00 b8 87 a3 3c ac cf e2 3f 00 00 00 00 00 00 00 00 00 cc fe b0
^~~~~~~~~~~~~~~~~~~~~~~
vptr for ‘stan::math::vari’
/home/ben/r-devel/library/StanHeaders/include/stan/math/rev/scal/fun/exp.hpp:14:58: runtime error: member access within address 0x5565bc1d7778 which does not point to an object of type ‘vari’
0x5565bc1d7778: note: object is of type ‘stan::math::vari’
00 00 00 00 00 cc fe b0 1c 7f 00 00 b8 87 a3 3c ac cf e2 3f 00 00 00 00 00 00 00 00 00 cc fe b0
^~~~~~~~~~~~~~~~~~~~~~~
vptr for ‘stan::math::vari’
/home/ben/r-devel/library/StanHeaders/include/stan/math/rev/core/operator_multiplication.hpp:17:25: runtime error: member access within address 0x5565bc1d7790 which does not point to an object of type ‘vari’
0x5565bc1d7790: note: object is of type ‘stan::math::vari’
00 00 00 00 00 cc fe b0 1c 7f 00 00 f4 33 4d 53 4d c0 e9 bf 00 00 00 00 00 00 00 00 00 cc fe b0
^~~~~~~~~~~~~~~~~~~~~~~
vptr for ‘stan::math::vari’
/home/ben/r-devel/library/StanHeaders/include/stan/math/rev/core/operator_addition.hpp:17:25: runtime error: member access within address 0x5565bc1d7760 which does not point to an object of type ‘vari’
0x5565bc1d7760: note: object is of type ‘stan::math::vari’
00 00 00 00 00 cc fe b0 1c 7f 00 00 dc 70 3b c5 42 f3 f2 bf 00 00 00 00 00 00 00 00 00 cc fe b0
^~~~~~~~~~~~~~~~~~~~~~~
vptr for ‘stan::math::vari’
/home/ben/r-devel/library/StanHeaders/include/stan/math/rev/scal/fun/value_of_rec.hpp:17:58: runtime error: member access within address 0x5565bc1d7790 which does not point to an object of type ‘vari’
0x5565bc1d7790: note: object is of type ‘stan::math::vari’
00 00 00 00 00 cc fe b0 1c 7f 00 00 f4 33 4d 53 4d c0 e9 bf 00 00 00 00 00 00 00 00 00 cc fe b0
^~~~~~~~~~~~~~~~~~~~~~~
vptr for ‘stan::math::vari’
/home/ben/r-devel/library/StanHeaders/include/stan/math/rev/scal/fun/value_of_rec.hpp:17:58: runtime error: member access within address 0x5565bc1d77a8 which does not point to an object of type ‘vari’
0x5565bc1d77a8: note: object is of type ‘stan::math::vari’
00 00 00 00 00 cc fe b0 1c 7f 00 00 54 30 73 f4 c4 81 f5 bf 00 00 00 00 00 00 00 00 00 cc fe b0
^~~~~~~~~~~~~~~~~~~~~~~
vptr for ‘stan::math::vari’
/home/ben/r-devel/library/StanHeaders/include/stan/math/rev/scal/fun/value_of.hpp:23:54: runtime error: member access within address 0x5565bc1d7790 which does not point to an object of type ‘vari’
0x5565bc1d7790: note: object is of type ‘stan::math::vari’
00 00 00 00 00 cc fe b0 1c 7f 00 00 f4 33 4d 53 4d c0 e9 bf 00 00 00 00 00 00 00 00 00 cc fe b0
^~~~~~~~~~~~~~~~~~~~~~~
vptr for ‘stan::math::vari’
/home/ben/r-devel/library/StanHeaders/include/stan/math/rev/core/precomputed_gradients.hpp:73:23: runtime error: member access within address 0x5565bc1d7790 which does not point to an object of type ‘vari’
0x5565bc1d7790: note: object is of type ‘stan::math::vari’
00 00 00 00 00 cc fe b0 1c 7f 00 00 f4 33 4d 53 4d c0 e9 bf 00 00 00 00 00 00 00 00 00 cc fe b0
^~~~~~~~~~~~~~~~~~~~~~~
vptr for ‘stan::math::vari’
/home/ben/r-devel/library/StanHeaders/include/stan/math/rev/core/operator_addition.hpp:23:18: runtime error: member access within address 0x5565bc1d7760 which does not point to an object of type ‘vari’
0x5565bc1d7760: note: object is of type ‘stan::math::vari’
00 00 00 00 00 cc fe b0 1c 7f 00 00 dc 70 3b c5 42 f3 f2 bf 00 00 00 00 00 00 00 00 00 cc fe b0
^~~~~~~~~~~~~~~~~~~~~~~
vptr for ‘stan::math::vari’
/home/ben/r-devel/library/StanHeaders/include/stan/math/rev/core/operator_multiplication.hpp:23:18: runtime error: member access within address 0x5565bc1d7838 which does not point to an object of type ‘vari’
0x5565bc1d7838: note: object is of type ‘stan::math::vari’
20 05 a0 bf 00 cc fe b0 1c 7f 00 00 d0 34 9c 74 dd b8 c8 3f d0 34 9c 74 dd b8 c8 bf e8 14 d0 ab
^~~~~~~~~~~~~~~~~~~~~~~
vptr for ‘stan::math::vari’
/home/ben/r-devel/library/StanHeaders/include/stan/math/rev/scal/fun/exp.hpp:15:29: runtime error: member access within address 0x5565bc1d7778 which does not point to an object of type ‘vari’
0x5565bc1d7778: note: object is of type ‘stan::math::vari’
84 d5 e1 3f 00 cc fe b0 1c 7f 00 00 b8 87 a3 3c ac cf e2 3f 00 00 00 00 00 00 00 00 00 cc fe b0
^~~~~~~~~~~~~~~~~~~~~~~
vptr for ‘stan::math::vari’
/home/ben/r-devel/library/StanHeaders/include/stan/math/rev/core/operator_addition.hpp:24:18: runtime error: member access within address 0x5565bc1d7778 which does not point to an object of type ‘vari’
0x5565bc1d7778: note: object is of type ‘stan::math::vari’
84 d5 e1 3f 00 cc fe b0 1c 7f 00 00 b8 87 a3 3c ac cf e2 3f 14 57 aa 3e ef d3 dc bf 00 cc fe b0
^~~~~~~~~~~~~~~~~~~~~~~
vptr for ‘stan::math::vari’

Does this mean anything to anybody?

So, why is .adj() being initialized to zeros instead of ones?

That’s the default for adj_, no? Maybe the chain method is not being called? It may be useful to print out the type, though I’m not sure it will print the right type in this context, since the type is really a pointer to a vari and we want the dependent type.

Using the undefined option turns on a bunch of other checks; the vptr for ‘stan::math::vari’ message comes from the vptr check:

-fsanitize=vptr : Use of an object whose vptr indicates that it is of the wrong dynamic type, or that its lifetime has not begun or has ended. Incompatible with -fno-rtti . Link must be performed by clang++ , not clang , to make sure C++-specific parts of the runtime library and C++ standard libraries are present.

Can you post the model / hpp file?

Trying to look up what causes that is taking me to a Stack Overflow post and a part of the gcc manual that may be useful, but I’m still looking around

(See the "Problems with exceptions" section below as well.)
https://gcc.gnu.org/wiki/Visibility

It feels like not handling the virtual methods correctly could cause both the wrong values and mem errors?

OK, I guess initializing the .adj() to all zeros is expected, but I am even more in the dark as to why it is returned as all zeros. The example I was doing was just 8 schools from the wiki page in the OP after compiling with -fsanitize=undefined and linking with -lubsan.

I agree that undefined behavior could well explain both the wrong answers and the memory consumption, as well as why it works in some cases (g++, CmdStan) but not others (clang++, RStan).

Is your print statement after .chain() is called? If so then it’s definitely an issue. It sounds like the .chain() method of whatever this is ends up calling the vari chain method (i.e. doing nothing), which would also mean the vptr error is a true error instead of a false positive like in the link above.
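To make that hypothesis concrete, here is a minimal toy sketch of a reverse sweep. It is not Stan Math code: `ToyVari`, `ToyMultiply`, and `toy_gradient` are hypothetical names invented for illustration. It only shows that if the base-class `chain()` (a no-op) runs in place of the derived override, the input adjoints stay at their default of zero even though the value itself is computed correctly, which would look like the symptom reported above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy node (not Stan Math's vari): a value, an adjoint that
// starts at zero, and a virtual chain() that does nothing by default.
struct ToyVari {
  double val;
  double adj = 0.0;  // adjoints default to zero
  explicit ToyVari(double v) : val(v) {}
  virtual ~ToyVari() = default;
  virtual void chain() {}  // base version propagates nothing
};

// Node for a * b; its chain() pushes the adjoint back to both operands.
struct ToyMultiply : ToyVari {
  ToyVari* a;
  ToyVari* b;
  ToyMultiply(ToyVari* a_, ToyVari* b_)
      : ToyVari(a_->val * b_->val), a(a_), b(b_) {}
  void chain() override {
    a->adj += adj * b->val;
    b->adj += adj * a->val;
  }
};

// Computes d(x*y)/dx and d(x*y)/dy with a reverse sweep over a tiny tape.
// With use_virtual_dispatch == false the base chain() is called explicitly,
// mimicking a situation where the override is never reached.
std::vector<double> toy_gradient(double x, double y,
                                 bool use_virtual_dispatch) {
  ToyVari a(x);
  ToyVari b(y);
  ToyMultiply f(&a, &b);
  std::vector<ToyVari*> tape = {&a, &b, &f};
  f.adj = 1.0;  // seed the output adjoint
  for (auto it = tape.rbegin(); it != tape.rend(); ++it) {
    if (use_virtual_dispatch) {
      (*it)->chain();
    } else {
      (*it)->ToyVari::chain();
    }
  }
  return {a.adj, b.adj};
}
```

Calling `toy_gradient(3.0, 4.0, true)` gives the expected partial derivatives, while `toy_gradient(3.0, 4.0, false)` returns all zeros. Whether anything like this is what actually happens in the clang++ build is an open question in this thread.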

What version of ubsan are you using? It would be good to check if it’s the version that was patched in the above link. If not, then the runtime vptr error could be related to the above link (maybe, idk?)

Just got off work, I’ll try to look at this tmrw. Do you have a branch I can run to try this out? I think it would be best to just slam open R with a gdb session and figure out what all the objects look like

Just a quick sanity check, does this happen with clang++ and cmdstan?

No. Not on macOS Mojave for me… all good with CmdStan.


I suppose it is an open question as to whether chain is getting called, but I am putting the print statements before and after grad is called in

It is sort of annoying to get all the pieces set up, but instructions are on the wiki page in the OP.

I am still unclear about what is going on, but I am at least able to trigger a segfault, which allows me to inspect things with a debugger.

So when you add

-DSTAN_MATH_REV_CORE_INIT_CHAINABLESTACK_HPP -g -O0

to the CXX14FLAGS, then the automatic initialization of the AD stack is not done. In this case you have to initialize the AD tape at the right position in your code. I have then inserted

stan::math::ChainableStack ad_stack;

into the command function of stan_fit.cpp. This way the AD stack will be initialized whenever it is needed… but this leads to segfaults which are super weird. So you can run

R -d lldb

to run R inside the clang lldb debugger. There I merely start a

run -f mvn_sample.R

This R script then compiles the mvn Stan model and starts sampling. It immediately crashes on me with this message:

> post  <- sampling(sm, init=list(init), data=data, seed=1234, chains=1)
INITIALIZING STACK
INIT STACK_ALLOC

SAMPLING FOR MODEL 'mvn_orange' NOW (CHAIN 1).
NOT INITIALIZING STACK
AGAIN INITIALIZING STACK
INIT STACK_ALLOC
Process 39240 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x88)
    frame #0: 0x00000001470ace24 file9948134ca6c9.so`stan::math::stack_alloc::alloc(this=0x0000000000000048, len=24) at stack_alloc.hpp:174:20
   171 	  //inline void* alloc(size_t len) {
   172 	  void* alloc(size_t len) {
   173 	    // Typically, just return and increment the next location.
-> 174 	    char* result = next_loc_;
   175 	    next_loc_ += len;
   176 	    // Occasionally, we have to switch blocks.
   177 	    if (unlikely(next_loc_ >= cur_block_end_)) {
Target 0: (R) stopped.
(lldb) p len
(size_t) $0 = 24
(lldb)

I have inserted a few prints to ensure that the stack does get initialized (as does the stack_alloc). However, the alloc command fails badly. It looks as if the stack_alloc members are not initialized, but that can’t be right since the constructor has been called.
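One possible reading of the trace above is that the small addresses (`this=0x0000000000000048`, fault at `address=0x88`) are member offsets added to a null pointer, rather than uninitialized members. The sketch below is only an illustration under loud assumptions: `ToyStackAlloc` and `ToyAdStack` are hypothetical structs, and their member order is assumed for the sake of the arithmetic, not confirmed by anything in this thread. On a platform where `std::vector` is 24 bytes and pointers are 8 bytes, the allocator would sit at offset 0x48 inside the enclosing struct and `next_loc` at 0x40 inside the allocator, which add up to 0x88.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an arena allocator; the member order is an
// assumption made for illustration only.
struct ToyStackAlloc {
  std::vector<char*> blocks;
  std::vector<std::size_t> sizes;
  std::size_t cur_block;
  char* cur_block_end;
  char* next_loc;
};

// Hypothetical stand-in for an autodiff stack that owns the allocator
// after three vectors (again an assumed layout).
struct ToyAdStack {
  std::vector<void*> var_stack;
  std::vector<void*> var_nochain_stack;
  std::vector<void*> var_alloc_stack;
  ToyStackAlloc memalloc;
};

// Byte offset of the allocator inside the enclosing stack object.
std::size_t memalloc_offset() {
  ToyAdStack s{};
  return static_cast<std::size_t>(reinterpret_cast<char*>(&s.memalloc) -
                                  reinterpret_cast<char*>(&s));
}

// Byte offset of next_loc inside the allocator.
std::size_t next_loc_offset() {
  ToyStackAlloc a{};
  return static_cast<std::size_t>(reinterpret_cast<char*>(&a.next_loc) -
                                  reinterpret_cast<char*>(&a));
}

// Address that reading next_loc would touch if the enclosing stack object
// lived at `base`; base == 0 models a null pointer to the stack.
std::uintptr_t faulting_address(std::uintptr_t base) {
  return base + memalloc_offset() + next_loc_offset();
}
```

If the real layout resembles this, printing the pointer to the enclosing AD stack object in lldb (not just the allocator's members) would be a cheap way to check whether it is null at the point of the crash.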

I don’t get what’s going on here.

I am willing to believe this is a super-nasty clang compiler bug and we should return to the old way of compiling things for clang. I admit that this is strange given that things work for cmdstan with clang.

Thinking about it a bit, we probably have issues with global variables in the shared dynamic rstan.so thing. At least this is something I would follow up. Another route to debug this could be via the RInside package:

http://dirk.eddelbuettel.com/code/rinside.html

This allows one to use R within C++. Thus, one can write a C++ program which starts the Stan program through invoking R. In turn we should be able to debug this properly.

Can you push the code that triggers the segfault to the for_2.20 branch of rstan and I will update the wiki so that people can choose between the unlimited RAM consumption or the segfault.

Also, can the segfault be triggered just by calling the $grad_log_prob method of the model_base reference class in R without going through the command C++ function?

I agree that the global variables in rstan.so sounds crashy, although I canā€™t explain why g++ manages to compile something that works perfectly.

Can you do that stack allocation stuff inside the new_model function that returns a pointer to the instantiated model?

Will do all that in a moment. I just stumbled over this:

And we are likely hit by this problem. So when you have multiple shared libraries munged together, the globals are a problem. I think this is our situation. What would resolve it is to have the stan_fit & command function around as a pre-compiled object file which is statically linked into the model .so file, and this is loaded into the R process. This way things get pre-compiled, but the model is a single translation unit of its own and everything is pre-built. Let me know what you think about that.
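The duplicated-globals scenario cannot really be reproduced in a single file, so the following is only a toy simulation of it. Everything here is hypothetical: `ToyTape`, `ModelLib`, and `DriverLib` are invented names, and the two template instantiations merely stand in for two shared objects that each ended up with a private copy of the same global. It shows how a cleanup routine can be called on every iteration and still free nothing, because it clears a different copy than the one being written to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy tape. Each Tag yields a distinct static instance,
// standing in for the private copy of a global that a shared object
// might end up with.
template <typename Tag>
struct ToyTape {
  static std::vector<double>& nodes() {
    static std::vector<double> storage;
    return storage;
  }
};

struct ModelLib {};   // stands in for the compiled model's shared object
struct DriverLib {};  // stands in for the pre-compiled driver library

// The "model" records autodiff nodes on its own copy of the tape.
void model_log_prob(std::size_t n_nodes) {
  for (std::size_t i = 0; i < n_nodes; ++i) {
    ToyTape<ModelLib>::nodes().push_back(0.0);
  }
}

// The "driver" clears its own copy, which the model never wrote to.
void driver_recover_memory() { ToyTape<DriverLib>::nodes().clear(); }

// Runs n_iter iterations and returns how many nodes the model's tape
// still holds afterwards.
std::size_t leaked_nodes_after(std::size_t n_iter,
                               std::size_t nodes_per_iter) {
  ToyTape<ModelLib>::nodes().clear();  // start from a clean state
  for (std::size_t i = 0; i < n_iter; ++i) {
    model_log_prob(nodes_per_iter);
    driver_recover_memory();  // called every iteration, frees nothing useful
  }
  return ToyTape<ModelLib>::nodes().size();
}
```

In this toy, memory grows linearly with the iteration count even though the cleanup is called each time. Statically linking the driver code into the model's shared object, as proposed above, would correspond to both functions using the same `Tag`.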


I love it (not sure I totally understand though). Definitely worth trying. I actually did try linking the user’s model to rstan.so at one point, but I guess it would have been dynamically.
