The use of a pointer is motivated by performance reasons for the
threading case. When a TLS is used, initialization with a constant
expression at compile time is required for fast access to the
TLS. As the autodiff storage struct is non-POD, its initialization
is a dynamic expression at compile time. These dynamic expressions
are wrapped, in the TLS case, by a TLS wrapper function which slows
down its access. Using a pointer instead allows initialization at
compile time to nullptr, which is a compile-time
constant. In this case, the compiler avoids the use of a TLS
wrapper function.
The part where it says that, because the autodiff storage struct is non-POD, its initialization is a dynamic expression makes intuitive sense. I’m not sure I understand the pointer workaround, though, or why it works. I’m kind of a noob at this, so I was wondering if someone would be willing to explain. Perhaps I missed another part of the documentation where it is explained in greater detail. If so, please forgive my carelessness.
Understanding why we ended up with the design we have now required me to use godbolt and look at the assembler code.
In short, global TLS objects that are non-POD are accessed through some nasty wrapper code by default. The wrapper checks for a previous initialization and then either returns the existing copy or creates one, and this check is done on every access. Using a pointer, which is POD, avoids all of that, at the cost of having to initialize things explicitly and in the right order.
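To make the difference concrete, here is a minimal sketch of the two variants. It is not the Stan Math code: `Storage`, `StorageOwner`, `direct_storage`, `storage_ptr`, and the two `push_*` functions are hypothetical names made up for illustration. Pasting both access functions into godbolt should show the extra initialization check on the non-POD path, though the exact assembler depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a non-POD autodiff storage struct.
struct Storage {
  std::vector<double> values;
  Storage() { values.reserve(16); }  // non-trivial constructor
};

// (a) Non-POD thread-local object: it needs dynamic initialization, so the
// compiler typically guards accesses with an "already initialized?" check.
thread_local Storage direct_storage;

// (b) Thread-local pointer: nullptr is a constant initializer, so the
// variable can be read directly, without any initialization check.
thread_local Storage* storage_ptr = nullptr;

// Hypothetical RAII helper: the price of (b) is that every thread must
// create one of these before it touches storage_ptr.
struct StorageOwner {
  bool owns = false;
  StorageOwner() {
    if (storage_ptr == nullptr) {
      storage_ptr = new Storage();
      owns = true;
    }
  }
  ~StorageOwner() {
    if (owns) {
      delete storage_ptr;
      storage_ptr = nullptr;
    }
  }
  StorageOwner(const StorageOwner&) = delete;
  StorageOwner& operator=(const StorageOwner&) = delete;
};

// Access through the non-POD thread-local object (guarded path).
std::size_t push_direct(double x) {
  direct_storage.values.push_back(x);
  return direct_storage.values.size();
}

// Access through the thread-local pointer (plain pointer load).
std::size_t push_via_pointer(double x) {
  assert(storage_ptr != nullptr);  // caller must have set up a StorageOwner
  storage_ptr->values.push_back(x);
  return storage_ptr->values.size();
}
```

In this sketch the "initialize things in order" cost shows up as the rule that a `StorageOwner` has to exist on the current thread before `push_via_pointer` is called.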
Thank you for your response! It’s greatly appreciated.
My motivation for digging in is a deeply unhealthy obsession with the minor details of everything I use, which probably comes from years of having algorithms break unexpectedly back when I was an undergraduate research assistant (and then having to explain to profs why I was unable to meet deadlines because I couldn’t figure out why certain algorithms were failing or weren’t converging), plus an insatiable level of curiosity. I get sucked into pretty much every rabbit hole I come across, to the great annoyance of people around me.
I wish I could mark two posts as being solutions, but I can’t so I will instead thank you both for your input. I will also try out godbolt because it seems really cool.
He’s not the only one. We try very hard to make sure there’s nothing in the math library that only one person understands. @rok_cesnovar was heavily involved in this, and @stevebronder and @bbbales2 have been heavily involved in our autodiff refactoring. I was the one who built all this stuff in the first place, and I still understand what we’re doing now pretty well.
some tutorials on initialization order in C++ and what is/isn’t guaranteed, and
some tutorials on singleton patterns and how to use that initialization order and protect your code in different situations.
Sadly, the language is undefined at the edges, so much of the defensive programming is to deal with undefined behavior in the language spec.
The reason we need thread-local storage is that our autodiff stack for reverse mode is a global static variable. Declaring the global autodiff stack storage variable as thread local makes a copy in each thread rather than sharing a single static variable across threads. As others have pointed out, the pointer is so we can be lazy (in the technical sense, not the slacker sense; this was a lot of work!) in initializing.
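A small sketch of the "one copy per thread" behavior described above, again with made-up names (`Stack`, `stack_ptr`, `WorkerReport`, `run_worker`) rather than anything from the Stan Math library. It assumes each thread points its own thread-local pointer at its own storage before using it.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the reverse-mode autodiff stack.
struct Stack {
  std::vector<double> vars;
};

// One pointer per thread; constant-initialized to nullptr in every thread.
thread_local Stack* stack_ptr = nullptr;

// What a worker thread observed while it ran.
struct WorkerReport {
  bool started_null;  // was stack_ptr still nullptr when the thread began?
  std::size_t size;   // number of entries the worker pushed on its own stack
};

// Runs a worker that sets up and fills its own stack with n entries.
WorkerReport run_worker(int n) {
  WorkerReport report{false, 0};
  std::thread worker([&report, n] {
    report.started_null = (stack_ptr == nullptr);
    Stack local;         // this thread's own storage
    stack_ptr = &local;  // only the worker's copy of the pointer changes
    for (int i = 0; i < n; ++i) {
      stack_ptr->vars.push_back(static_cast<double>(i));
    }
    report.size = stack_ptr->vars.size();
    stack_ptr = nullptr;  // reset before 'local' goes out of scope
  });
  worker.join();
  return report;
}
```

If the calling thread has already pointed `stack_ptr` at its own `Stack`, that pointer and its contents are unchanged after `run_worker` returns, because the worker only ever sees its own copy.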