Example of use of AutodiffStackSingleton (STAN_THREADS-related)

Thanks for checking this.

Just tested it and I’m getting an immediate segfault again, this time at:

125           instance_ = new AutodiffStackStorage();

originating in init_chainablestack.hpp as before. This is the first branch, the one taken when is_initialized is false:

    if (!is_initialized) {
      is_initialized = true;
      instance_ = new AutodiffStackStorage();
      return true;
    }

If I define STAN_MATH_REV_CORE_INIT_CHAINABLESTACK_HPP, the segfault just happens later, at the same line, when I try to instantiate the struct.

This is at least a different sort of problem, right? How can the assignment itself be causing a segfault?
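For anyone following along, here is a minimal sketch of the pattern under discussion: a thread_local pointer that is filled in lazily by the first object constructed on a thread. The names StackStorage, StackSingleton, owns_instance and ownership_demo are hypothetical stand-ins, not the Stan Math API, and the init logic is simplified from the excerpt above. It is only meant to show that the assignment to instance_ is a write into thread-local storage, which is where toolchain and linking differences could plausibly matter.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-thread AD stack storage.
struct StackStorage {
  int n_vars = 0;
};

// Hypothetical, simplified version of the lazily initialized singleton.
class StackSingleton {
 public:
  StackSingleton() : own_instance_(init()) {}
  ~StackSingleton() {
    // Only the object that allocated the storage frees it.
    if (own_instance_) {
      delete instance_;
      instance_ = nullptr;
    }
  }
  StackSingleton(const StackSingleton&) = delete;
  StackSingleton& operator=(const StackSingleton&) = delete;

  bool owns_instance() const { return own_instance_; }
  static StackStorage* instance() { return instance_; }

 private:
  // Returns true if this call allocated the storage for the current thread.
  static bool init() {
    static thread_local bool is_initialized = false;
    if (!is_initialized || instance_ == nullptr) {
      is_initialized = true;
      instance_ = new StackStorage();  // write into thread-local storage
      return true;
    }
    return false;
  }

  static thread_local StackStorage* instance_;
  const bool own_instance_;
};

thread_local StackStorage* StackSingleton::instance_ = nullptr;

// The first object on a thread owns the storage; later ones only share it.
void ownership_demo() {
  assert(StackSingleton::instance() == nullptr);
  {
    StackSingleton first;
    StackSingleton second;
    assert(first.owns_instance());
    assert(!second.owns_instance());
    assert(StackSingleton::instance() != nullptr);
  }
  assert(StackSingleton::instance() == nullptr);
}
```

In this sketch nothing is allocated until the first StackSingleton is constructed on a thread, so the pointer itself needs no dynamic initialization.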

My nm output again:

000000000005aac0 W stan::math::AutodiffStackSingleton<stan::math::vari, stan::math::chainable_alloc>::AutodiffStackStorage::AutodiffStackStorage()
000000000005aac0 W stan::math::AutodiffStackSingleton<stan::math::vari, stan::math::chainable_alloc>::AutodiffStackStorage::AutodiffStackStorage()
0000000000000008 u stan::math::AutodiffStackSingleton<stan::math::vari, stan::math::chainable_alloc>::instance_
0000000000054930 W stan::math::AutodiffStackSingleton<stan::math::vari, stan::math::chainable_alloc>::~AutodiffStackSingleton()
0000000000054930 W stan::math::AutodiffStackSingleton<stan::math::vari, stan::math::chainable_alloc>::~AutodiffStackSingleton()

u means: “The symbol is a unique global symbol. This is a GNU extension to the standard set of ELF symbol bindings. For such a symbol the dynamic linker will make sure that in the entire process there is just one symbol with this name and type in use.”

I wonder if this discussion might be useful. It is an answer to the SO question “Is Meyers’ implementation of the Singleton pattern thread safe?”: https://stackoverflow.com/a/1661787

The Meyers singleton is what we used before. It’s safe and works, but it is slow and blows up on Windows.
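As a point of comparison, a Meyers-style singleton can be sketched as below, assuming a thread_local function-local static. StackStorage, thread_stack and meyers_demo are hypothetical names used only for illustration. Because the object has a non-trivial constructor, the compiler generally has to check on each call whether it has been constructed yet, which is the usual explanation for the overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-thread AD stack storage.
struct StackStorage {
  std::vector<double> vars;
  StackStorage() { vars.reserve(1024); }  // non-trivial constructor
};

// Meyers-style accessor: the object is constructed on first use in each
// thread, and every call goes through the accessor function.
inline StackStorage& thread_stack() {
  static thread_local StackStorage storage;
  return storage;
}

// Repeated calls on the same thread return the same object.
void meyers_demo() {
  StackStorage& a = thread_stack();
  StackStorage& b = thread_stack();
  assert(&a == &b);
  const std::size_t before = a.vars.size();
  a.vars.push_back(1.0);
  assert(thread_stack().vars.size() == before + 1);
}
```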

What function from Stan services kicks off the sampling? Can you point me there?

The function is hmc_nuts_diag_e_adapt in stan/services/sample/hmc_nuts_diag_e_adapt.hpp.

I’m happy with anything that fixes this. I would also like to know what the cause of the segfault is. If it’s a bug against gcc/clang or against Python’s method of linking, there’s a very large community that stands to benefit from a fix.


Looks like clang might work, see https://github.com/stan-dev/pystan/pull/626

I’ve been using gcc 7. I’ll check whether gcc 8 or gcc 9 segfault. If they don’t, that would be a nice fix.

Looks like things work fine in clang 7 with Stan 2.20 (no need for the additional guards). As this primarily concerns PyStan 3 (where STAN_THREADS is not optional), I’m happy just dropping support for older versions of gcc on unix.

It would be nice to know whether or not gcc 8 or gcc 9 segfault. I’ll look into this.

Sounds good. Possibly older gcc versions still work, no?

Another route, which may be kinder to the gcc in question, is to put an instance of ChainableStack into the generate_transitions function in the Stan services utilities (and also use the define to disable the automatic initialization of the global stack). That’s where AD is going to be used, I think. This would be a temporary hack of Stan.

In any case, I will file an issue + PR to fix the static init problem. Sorry for that, but this was very subtle.

I don’t think we have a solution for gcc on unix. From what I’ve seen I don’t think patching stan services will work, as the segfault happens immediately, in init_chainablestack.hpp. Clang seems to work fine. That is, your documented procedure of creating an instance of ChainableStack works. No need to do anything in services there.

I’ll test gcc 8 and 9 today.

I would patch the services and add the define I referred to earlier. The define disables global initialization entirely, and the patched services then make everything safe, since nothing relies on static initialization at all.

Sounds good! I’m happy to test the patched services.

(Also, I did verify that the segfault occurs on gcc 9.)

If the assumption holds that within each thread you create an instance of the model and then sample from it, then I would suggest including an instance of stan::math::ChainableStack in the model_base class.
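A rough sketch of that idea, using hypothetical stand-ins rather than the real Stan classes: ThreadStack plays the role of stan::math::ChainableStack, and ModelBase, NormalModel and run_chains are made up for illustration. The only point is that a model constructed inside a thread brings that thread's stack with it, so no global or static initialization is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-thread AD stack storage.
struct StackStorage {
  std::vector<double> vars;
};

// Hypothetical RAII handle: the first one built on a thread allocates
// that thread's storage and frees it again on destruction.
class ThreadStack {
 public:
  ThreadStack() : owner_(instance_ == nullptr) {
    if (owner_) instance_ = new StackStorage();
  }
  ~ThreadStack() {
    if (owner_) {
      delete instance_;
      instance_ = nullptr;
    }
  }
  ThreadStack(const ThreadStack&) = delete;
  ThreadStack& operator=(const ThreadStack&) = delete;
  static StackStorage* instance() { return instance_; }

 private:
  static thread_local StackStorage* instance_;
  const bool owner_;
};

thread_local StackStorage* ThreadStack::instance_ = nullptr;

// Hypothetical model base: holding the handle as a member ties the
// stack's lifetime to the model's lifetime on the constructing thread.
class ModelBase {
 public:
  virtual ~ModelBase() = default;
  virtual double log_prob(double theta) const = 0;

 private:
  ThreadStack stack_;
};

class NormalModel : public ModelBase {
 public:
  double log_prob(double theta) const override {
    assert(ThreadStack::instance() != nullptr);
    ThreadStack::instance()->vars.push_back(theta);  // record on this thread's stack
    return -0.5 * theta * theta;
  }
};

// Runs one "chain" per thread; returns how many values each thread
// recorded on its own stack.
std::vector<std::size_t> run_chains(int n_threads, int n_iter) {
  std::vector<std::size_t> counts(n_threads, 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < n_threads; ++t) {
    workers.emplace_back([&counts, t, n_iter] {
      NormalModel model;  // created inside the thread, per the assumption
      for (int i = 0; i < n_iter; ++i) model.log_prob(0.1 * i);
      counts[t] = ThreadStack::instance()->vars.size();
    });
  }
  for (auto& w : workers) w.join();
  return counts;
}
```

The sketch breaks if a model is constructed on one thread and evaluated on another, which is exactly the case where the assumption above does not hold.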

If that assumption does not hold, then a better route could be to create these instances in the gradient functions which are part of src/stan/model/gradient.hpp in stan.

(I really hope that in the future we move to the Intel TBB, which will handle the initialization of thread resources automatically.)