Future rstan is broken with clang++

So I just upgraded to Catalina and I first installed R 3.6.1. Next I installed all packages required to run rstan, including Rcpp. I did so using the binary packages. Then I tried the exception Rcpp example you shared, @bgoodri. It did fail as it showed an “unkown exception”. Then I also tried to get rstan to sample the mvn example - it failed with very strange errors. However, I figured that installing everything from source could help here… and it does!

So I reinstalled a fresh R and installed all packages from source instead of using the binary packages. Now the example with the exception works just fine and the mvn_orange example also just samples fine. I am using the vanilla clang++ of the system.

In short - I don’t think that we need to do anything for Catalina. Users just need to install from source their R packages. That’s not so convenient, but likely something which we don’t have in our hands… if I should try another model, let me know (I just took 8schools which works just fine as well)

2 Likes

Can we provide unofficial (non-cran) binary that users could use?

What happens when you have Rcpp, rstan, etc. installed from source but you load a completely unrelated C++ package like readr before running MCMC?

Whow.

I installed readr as binary and then executed the exception example you shared (loading readr installed as binary and then Rcpp installed from source) => and it failed with “unknown reason”.

Then I swapped back the readr to the source compiled version and the exception thing worked again.

Next, I reinstalled readr as binary, then loaded first Rcpp (installed from source) and after that the readr. Then again the exception thing failed.

So it looks as if loading any C++ based package which has not been compiled from source on Catalina breaks the system.

2 Likes

We have an additional problem in that the for_2.20 branch of rstan is not currently installable with StanHeaders 2.19.x on CRAN. I get this error when doing so about a call to a deleted constructor at line 60 of autodiffstackstorage.hpp .

It would be good if we could work around this on the for_2.20 branch of rstan because StanHeaders 2.21.0 is not compatible with rstan 2.19.x on CRAN for deeper reasons. You will have to pull in the most recent changes.

Just to confirm: You want to install the rstan 2.21 into a R installation with StanHeaders 2.19.x … and you expect that to work?

Why would that be necessary? I mean, wouldn’t we add to rstan 2.21 a pre-requisite of a 2.21 StanHeaders such that the above situation would never arise?

Or did you mean something else?

Well, we either have to upload StanHeaders first or rstan first. So, one of them has to pass all the R CMD check with the other as it is on CRAN. In this case, I think it is easier to start with rstan (that has the 2.21 Stan library copied into its src/ directory), but we have one compiler error from Stan Math 2.19.

You gimme a lot of home-work!

The Catalina compile problem is solved.

I just fixed rstan 2.21 so that it does compile and install whenever StanHeaders 2.19 is around. There was a left-over reference to ChainableStack which is handled in the background by stan-math.

However, with the setup of a rstan 2.21 and StanHeaders 2.19 I cannot compile any more a Stan model, because of:

In file included from filefc7f18865831.cpp:743:
/Users/sebi/R/2019-11-01-transient/rstan/include/rstan_next/stan_fit.hpp:7:10: fatal error: 'stan/model/model_base.hpp' file not found
#include <stan/model/model_base.hpp>
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
17 warnings and 10 errors generated.
make: *** [filefc7f18865831.o] Error 1

which makes sense, since this header was introduced in 2.20 of Stan. Not sure how you (or if you) want to address this. If you don’t mind I will add me as author of rstan as of now? Looks like I have to be blamed for things around this if in doubt.

EDIT: I noticed that the latest for_2.20 stops working due to missing symbols related to stan_fit. Turns out that in commit 7bda1eb106920e52c131173952b2a55aa0e3491c the empty destructor for stan_fit got removed from stan_fit.cpp - but we need that declaration since this results otherwise in a missing symbol.

So now everything is working. We can

  • install rstan 2.21 into a R installation with StanHeaders 2.19 - but then you cannot compile any models as explained above
  • install rstan 2.21 into a R installation with StanHeaders 2.21 … and everything works; even on macOS Catalina
2 Likes

Yeah, I have to copy the Stan 2.21 library and algorithms into the rstan sources in order to compile new models, while hopefully compiling old models. I’ll take a look. Thanks.

Sounds good.

BTW, what do you think about the Catalina story?

We could include into the rstan plugin thing for the PKG_LDFLAGS variable the hard-coding of the ABI library as I suggested in the other thread. This should make things work for rstan with XCode’s clang or with clang7 from CRAN. So I think it would make rstan more robust. My suggestion would be to make this only for Catalina.

It does sound good. I would like to hear from a CRAN / Mac expert though. Do you have a link to the thread on R-SIG-Mac? If possible, we should do it for Catalina and any Mac OS post-Catalina.

Assuming your fix is solid, we have to come up with some Makevars logic to do it in packages that come with Stan programs in addition to the rstan plugin.

My message is awaiting approval for posting as I am not signed up to that list. I am curious what they say…I really think this is an Apple clang bug.

I did that. Upon further review, I can patch the two version.hpp files to work around our _REENTRANT and Eigen plugin problems

which would allow a more straightforward upload of StanHeaders 2.21 to CRAN followed by rstan 2.21 without having to worry about making StanHeaders 2.19 and rstan 2.21 work together.

However, none of the reverse dependencies of StanHeaders are ready for TBB, so they throw stuff like

** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘RBesT’ in dyn.load(file, DLLpath = DLLpath, …):
unable to load shared object ‘/tmp/RBesT.Rcheck/00LOCK-RBesT/00new/RBesT/libs/RBesT.so’:
/tmp/RBesT.Rcheck/00LOCK-RBesT/00new/RBesT/libs/RBesT.so: undefined symbol: _ZN3tbb8internal26task_scheduler_observer_v37observeEb
Error: loading failed
Execution halted
ERROR: loading failed

if there is a TBB on the system path or else a compiler error about not finding any <tbb/*.h> due to lack of LinkingTo: RcppParallel.

You mentioned before that you can ifdef things to make the TBB stuff optional for now. Can you push those onto the StanHeaders_2.21 branch of Stan Math

and, if necessary, the StanHeaders_2.21 branch of Stan Library

If you need those branches in a StanHeaders, you can source

https://raw.githubusercontent.com/stan-dev/rstan/for_2.20/StanHeaders/install-github.R

and call it like

install_StanHeaders(rstan_branch = "for_2.20", 
                    math_branch = "StanHeaders_2.21",
                    library_branch = "StanHeaders_2.21")

The last two commits from me to stan-math StanHeaders_2.21 branch do this:

  • The TBB is not used at all in cases whenever STAN_THREADS is not defined. The TBB is used if STAN_THREADS is defined.
  • The stan::math::lgamma now always uses the boost implementation as I was not able to get things to work otherwise. It appears as if _REENTRANT gets somewhere define after including cmath. If that happens, then we have a catastrophe which causes cmath to not export lgamma_r, but stan-math will think it is defined. This means a ~5% performance it or so for Linux&macOS for models using this function heavily.

With the above changes I can have StanHeaders 2.21.0 and rstan 2.19.2 installed and successfully compile & load RBesT.

For the lgamma stuff I suggest that we take the small performance hit for now and tell people to use the new _REENTRANT flag when they build their package (so put it in src/Makevars). As alternative we may introduce a STAN_HAS_LGAMMA_R macro which people can set if they are sure that lgamma_r is available.

Before I forget: Have you backported the boost::math::lgamma fixes which avoid overflow for large arguments of lgamma? That would be needed to get finite results from that function for arguments beyond 10^{25}.

Thanks for adding me to the author list.

OK. I am getting the same results now. Which boost::math::lgamma commit(s) do I need to backport to rstan?

It’s this commit:

You need to be able to run the stan-math tests for lgamma. These do call lgamma from boost directly and they only work if you have a patched boost in use.

The boost files involved are

  • lib/boost_1.69.0/boost/math/special_functions/detail/lanczos_sse2.hpp
  • lib/boost_1.69.0/boost/math/special_functions/gamma.hpp

BTW… we will not get the full benefits of faster compilation as we do not distribute the pre-compiled headers (for good reason as these blow up the binary by ~100MB). However, what would you say about the following approach: The standard is to not use the pre-compiled headers, but whenever people build the binary rstan package on their systems and define NOT_CRAN=true then we build during the compilation phase the pre-compiled headers which we use then during building the models. This way only people who build rstan from source in a specific will get this feature. This would be a quick and dirty way to include this functionality in a semi-official way and we can improve further later on. Makes sense?

OK, I can supercede those Boost files.

I do think having an option for precompiled headers would be good, but we may need to make a meta-header in rstan to facilitate that.

Ok… I will have a look, but that’s optional for releasing the first 21 I suppose.

Let’s see what Mac r experts say :

https://stat.ethz.ch/pipermail/r-sig-mac/2019-November/013116.html

1 Like

I have yet another fix for the Eigen plugin problem

that uses the zzz.R file to set PKG_CPPFLAGS. In principle, we could also put -D_REENTRANT there, and it would be set for everything except OpenMx and nlmixr (which do not use rstan and therefore do not load StanHeaders) but I have separate PRs for those two packages.

Hm… I tried the pre-compiled headers thing for rstan. So I did generate a model_header.hpp.gch which I did feed into clang accordingly. For the mvn_orange example I get these compilation times:

  • cmdstan 2.21: 17.5s
  • rstan 2.19: 40s
  • rstan 2.21: 27.2s
  • rstan 2.21 with pre-compiled: 23s

So cmdstan is clearly best and rstan 2.21 is a huge win over 2.19, but adding pre-compiled headers to rstan is not that much of a gain right now. We probably need to review in detail to see where the difference is coming from, but I would say we just leave it as is for rstan for the moment.