Compiling fails with STAN_CPP_OPTIMS=TRUE

I am trying to optimize some Stan models using cmdstan-2.30.1.
When I use the following make command, everything works well:


However, if I introduce the STAN_CPP_OPTIMS flag, everything fails:

I am not sure what the difference between these two optimization flags is, but I would love to know whether I should be using STAN_CPP_OPTIMS, and how I might be able to solve the errors that occur.

Since there are so many errors, I only copied some of the terminal output here:
stan_cpp_optims errors.txt (9.1 KB)

Many thanks for your help!

I think this is actually due to an out-of-date compiler. What output do you get from:

g++ -v

Thanks, Andrew!

The output I get is this:

Using built-in specs.
Target: x86_64-pc-linux-gnu
Configured with: ./configure --prefix=/n/helmod/apps/centos7/Core/gcc/9.3.0-fasrc01 --program-prefix= --exec-prefix=/n/helmod/apps/centos7/Core/gcc/9.3.0-fasrc01 --bindir=/n/helmod/apps/centos7/Core/gcc/9.3.0-fasrc01/bin --sbindir=/n/helmod/apps/centos7/Core/gcc/9.3.0-fasrc01/sbin --sysconfdir=/n/helmod/apps/centos7/Core/gcc/9.3.0-fasrc01/etc --datadir=/n/helmod/apps/centos7/Core/gcc/9.3.0-fasrc01/share --includedir=/n/helmod/apps/centos7/Core/gcc/9.3.0-fasrc01/include --libdir=/n/helmod/apps/centos7/Core/gcc/9.3.0-fasrc01/lib64 --libexecdir=/n/helmod/apps/centos7/Core/gcc/9.3.0-fasrc01/libexec --localstatedir=/n/helmod/apps/centos7/Core/gcc/9.3.0-fasrc01/var --sharedstatedir=/n/helmod/apps/centos7/Core/gcc/9.3.0-fasrc01/var/lib --mandir=/n/helmod/apps/centos7/Core/gcc/9.3.0-fasrc01/share/man --infodir=/n/helmod/apps/centos7/Core/gcc/9.3.0-fasrc01/share/info
Thread model: posix
gcc version 9.3.0 (GCC)

Hmm, it seems like there’s something strange with your toolchain. This error:

stan/lib/stan_math/stan/math/prim/eigen_plugins.h:18:25: error: expected type-specifier
 using double_return_t = std::conditional_t<std::is_const<std::remove_reference_t<T>>::value,

happens when a compiler is too old to fully support the C++14 standard, but yours is more than new enough. Is this a computing cluster/server where you have to load a compiler module before compiling code? Is there a chance that step was missed with the cpp_optims code?

Hi Andrew,
I’m embarrassed to say you are probably right, and I might have missed this step earlier :O
Rerunning it with the appropriate loading of the gcc module, I get the following error:

--- Compiling, linking C++ code ---
g++ -std=c++1y -pthread -D_REENTRANT -Wno-sign-compare -Wno-ignored-attributes -DSTAN_THREADS -I stan/lib/stan_math/lib/tbb_2020.3/include -flto -fuse-linker-plugin -fdevirtualize-at-ltrans -fweb -fivopts -ftree-loop-linear -floop-strip-mine -floop-block -floop-nest-optimize -ftree-vectorize -ftree-loop-distribution -funroll-loops -floop-unroll-and-jam -fsplit-loops -fvisibility=hidden -fvisibility-inlines-hidden -DSTAN_NO_RANGE_CHECKS -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.9 -I stan/lib/stan_math/lib/boost_1.78.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials -DBOOST_DISABLE_ASSERTS -DSTAN_NO_RANGE_CHECKS -c -Wno-ignored-attributes -fweb -fivopts -ftree-loop-linear -floop-strip-mine -floop-block -floop-nest-optimize -ftree-vectorize -ftree-loop-distribution -funroll-loops -floop-unroll-and-jam -fsplit-loops -fvisibility=hidden -fvisibility-inlines-hidden -flto -fuse-linker-plugin -fdevirtualize-at-ltrans -x c++ -o /net/rcstorenfs02/ifs/rc_labs/gershman_lab/users/Stan/1_code/shared_code/rdm_hierarchical2/rdm_hierarchical_super_model_untransformedParams_posneglrnexoCombined_deltaExp_bConstrained_TauBound_LitPriors_exoFrac_optim2.o /net/rcstorenfs02/ifs/rc_labs/gershman_lab/users/Stan/1_code/shared_code/rdm_hierarchical2/rdm_hierarchical_super_model_untransformedParams_posneglrnexoCombined_deltaExp_bConstrained_TauBound_LitPriors_exoFrac_optim2.hpp
/net/rcstorenfs02/ifs/rc_labs/gershman_lab/users/Stan/1_code/shared_code/rdm_hierarchical2/rdm_hierarchical_super_model_untransformedParams_posneglrnexoCombined_deltaExp_bConstrained_TauBound_LitPriors_exoFrac_optim2.hpp:1: sorry, unimplemented: make: *** [/net/rcstorenfs02/ifs/rc_labs/gershman_lab/users/Stan/1_code/shared_code/rdm_hierarchical2/rdm_hierarchical_super_model_untransformedParams_posneglrnexoCombined_deltaExp_bConstrained_TauBound_LitPriors_exoFrac_optim2] Error 1

Does this give any useful information for the source of the error?
Thanks again for being so helpful!

Unfortunately that doesn’t seem to contain a full error message. Can you try cleaning and rebuilding cmdstan with the new gcc loaded?

And another debugging step, does this only occur with your model or does it occur for you with the bernoulli example model as well?

Thanks for these suggestions, Andrew.
I tried cleaning and rebuilding like so:

make clean-all
make build

and also tested the example bernoulli model, but this still resulted in the same error.

I recently ran into a similar issue in a cluster environment because STAN_CPP_OPTIMS enabled some features that my compiler there did not support. In particular, the error was caused by the -floop family of flags; the message was something like: sorry, unimplemented: Graphite loop optimizations cannot be used.

It may just be that the STAN_CPP_OPTIMS flag is not well suited to your exact environment, but you can manually enable the flags it contains one by one and see which ones work.
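
One way to try the flags one by one without rebuilding a Stan model each time is to compile an empty program with each flag and see which ones the compiler rejects. This is only a rough sketch: the flag list is copied from the failing compile line earlier in the thread, the helper name flag_supported is made up for illustration, and a flag that passes this probe could still fail in the full build (for example at link time with -flto).

```shell
#!/bin/sh
# Probe which optimization flags the active compiler accepts.
# CXX defaults to g++; override it to test a different compiler.
CXX="${CXX:-g++}"

# Hypothetical helper: returns 0 if "$CXX" compiles an empty
# program with the given flag, non-zero otherwise.
flag_supported() {
  printf 'int main() { return 0; }\n' |
    "$CXX" -x c++ -c "$1" -o /dev/null - >/dev/null 2>&1
}

# Flags taken from the STAN_CPP_OPTIMS compile line in this thread.
for flag in -flto -fweb -fivopts -ftree-loop-linear -floop-strip-mine \
            -floop-block -floop-nest-optimize -ftree-vectorize \
            -ftree-loop-distribution -funroll-loops -floop-unroll-and-jam \
            -fsplit-loops; do
  if flag_supported "$flag"; then
    echo "ok:          $flag"
  else
    echo "unsupported: $flag"
  fi
done
```

Run it in the same shell session in which the compiler module is loaded, so that it probes the same g++ that make will use.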

Hi Brian,

Thanks for pointing this out. I think I must be doing something wrong in trying to set the CXXFLAGS myself. I followed this example in two ways:

  1. Within the make command:
    make STAN_THREADS=TRUE CXXFLAGS+= -fvectorize -ftree-vectorize -fslp-vectorize -ftree-slp-vectorize -fno-standalone-debug -fstrict-return -funroll-loops -flto=full -fstrict-vtable-pointers -fforce-emit-vtables STAN_NO_RANGE_CHECKS=TRUE STANCFLAGS=--O1 /Path/To/File/bernoulli_example

  2. In the make/local file:
    CXXFLAGS+= -fvectorize -ftree-vectorize -fslp-vectorize -ftree-slp-vectorize -fno-standalone-debug -fstrict-return -funroll-loops -flto=full -fstrict-vtable-pointers -fforce-emit-vtables

However, it seems like none of these options are being detected properly:

make: vectorize: No such file or directory
make: tree-vectorize: No such file or directory
make: slp-vectorize: No such file or directory
make: tree-slp-vectorize: No such file or directory
make: no-standalone-debug: No such file or directory
make: strict-return: No such file or directory
make: unroll-loops: No such file or directory
make: lto=full: No such file or directory
make: strict-vtable-pointers: No such file or directory
make: force-emit-vtables: No such file or directory
make: *** No rule to make target `force-emit-vtables'.  Stop.

I would appreciate your feedback on what it is that I’m doing wrong.

On the command line, you cannot use +=, and I believe you have to use quotes, like this:

make STAN_THREADS=TRUE CXXFLAGS="-fvectorize -ftree-vectorize -fslp-vectorize -ftree-slp-vectorize -fno-standalone-debug -fstrict-return -funroll-loops -flto=full -fstrict-vtable-pointers -fforce-emit-vtables" STAN_NO_RANGE_CHECKS=TRUE STANCFLAGS=--O1 /Path/To/File/bernoulli_example

I don’t know why the snippet you posted would not have worked in make/local.
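
To illustrate why the quotes matter: the shell splits an unquoted command line on spaces before make ever sees it, so each flag arrives as a separate word. make then most likely reads a word such as -fvectorize as its own -f option (use this file as the makefile), which would be consistent with the "vectorize: No such file or directory" messages above. The sketch below uses a made-up helper, show_args, purely to print the words a command would receive; it does not call make.

```shell
#!/bin/sh
# Hypothetical helper: print each argument a command would receive,
# one per line, wrapped in brackets so the boundaries are visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Unquoted: the shell passes three separate words, so the flags are
# no longer part of the CXXFLAGS assignment.
show_args CXXFLAGS+= -ftree-vectorize -funroll-loops

# Quoted: the whole assignment arrives as a single word.
show_args CXXFLAGS="-ftree-vectorize -funroll-loops"
```

The first call prints three bracketed lines and the second prints one, which is the difference between the failing command and the quoted one.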

I see. Since this also failed, I introduced each flag separately.

Using -ftree-vectorize, -ftree-slp-vectorize, -funroll-loops, or -flto=full got this error:

--- Compiling, linking C++ code ---
fatal error: stan/model/model_header.hpp: No such file or directory
    3 | #include <stan/model/model_header.hpp>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

For other flags I simply get this error:
g++: error: make: ***

Could it be that the environment I’m working in does not support any of these options?