Math library namespaces: Request For Comments

Hey all,

There’s been some offline conversation (mostly between Bob and myself) about math library namespaces and I wanted to bring the conversation out to the broader community for comments. The problem we’re noticing is that as it currently stands, the math library has a large surface area and there are many components that are public that are more like implementation details, that we would like to 1) be able to change without downstream warning and 2) put less effort into documenting. Things that fit into this category include a lot of the metaprogramming stuff in the math library, things like VectorView (or really most stuff in /meta/).

The C++ language doesn’t have an amazing solution to this, and it seems like most people in the community just designate certain namespaces as non-public, called something like detail or internal (we actually have both of these in our code already).
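For readers who have not seen the convention, here is a minimal sketch of what such a split usually looks like. The library name mylib and the function names are hypothetical, chosen only for illustration; they are not taken from the Stan math library.

```cpp
// Hypothetical library "mylib"; the names are illustrative only.
namespace mylib {
namespace internal {
// Implementation detail: by convention, may change without notice.
constexpr int square_impl(int x) { return x * x; }
}  // namespace internal

// Public API: documented and kept stable.
constexpr int square(int x) { return internal::square_impl(x); }
}  // namespace mylib

// Checked at compile time: the public entry point forwards to the helper.
static_assert(mylib::square(3) == 9, "square forwards to the internal helper");
```

Nothing in the language stops a user from calling mylib::internal::square_impl directly; the namespace only signals intent.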

That said, Bob pointed out that we might have several different math library user profiles, and there’s an open question about how to categorize these and how closely to model our internal namespaces after how we think our users are categorized. One could imagine a strict hierarchy, where some users always want more of the internals than others, or you could imagine functional slices of the library relevant to different users, etc. For example, some users just want to describe and fit models using the existing functionality and some users might want to create their own distributions and even algorithms.

What do you all think is a good split? My gut instinct (being pretty new and not knowing any external users of the math library) would be to treat everything used in the Stan language as public and everything else as internal (or detail, I have no preference on the name).

-Sean

I can imagine two kinds of users of the math library. First, people might want to use the autodiff types, the corresponding implementations of the math functions, and the autodiff functionals to evaluate derivatives of functions written in C++. This would most closely correspond to what shows up in the Stan language, but would also include the tools necessary to use it as stand-alone C++ code. Second, people might want to implement their own functions/distributions with some of our metaprograms to give vectorization or otherwise efficient computation.

So from my perspective that naively suggests

1: var, fvar types, and autodiff functionals.
2: Internal autodiff code that implements the functionals.
3: Math library implementations for var, fvar.
4: Metaprograms.

1 and 3 can also be split up into var and fvar, which might end up being useful when we start trying to control when we admit first-order and higher-order autodiff (for example, maybe pragma’ing on “using namespace (f)var;” lines).

All 4 (or 6) of these categorizations seem pretty flat and generally fall into “user-facing code” or “internal development tools” buckets. I’d be happy just having a bunch of namespaces mirroring the directory organization that we already have, separating the library into these basic components.

I cannot pretend that this is good practice, since it is a personal trick, but more than a year ago I dropped the Internal:: and Detail:: namespaces in favor of something much simpler. I simply add __internal__ in front of anything that is an implementation detail:

Examples:

auto __internal__foo() noexcept
{
}

auto foo() noexcept
{
    return __internal__foo();
}

Another one:

struct __internal__Slice_Tag
{
};
template <typename... OBJ>
struct All_Are_Slices :
    __internal__All_ExactTypes_Inherit_Tag<__internal__Slice_Tag, OBJ...>
{
};

At first glance this can seem quite radical, but my experience with it has been good. The main advantages are:

  1. all operations like searching, or switching something between public and internal, are much easier than their counterparts using namespaces, because in most cases a simple text or regex search is sufficient (no need for C++ semantics).
  2. you are sure not to introduce “argument-dependent lookup” bugs or compile-time errors when switching public/internal, as you do not modify your C++ namespace hierarchy.
  3. this notation is straightforward to read and understand (no need to know that you are in a namespace Internal {} block or that there is a “using Internal”…)

Comments:

  1. A tricky bug from ADL (point 2) was my initial motivation for the “__internal__” notation
  2. I still use namespaces to split logical parts, I only banished the internal:: or detail:: ones

Vincent

nb: 3 attempts to have a post with the underscores: _ ↔ &#95; code !

I’m all for two namespaces (math and internal) for marking what is API and what can change without notice. If there are more than that I have to keep looking up where stuff lives. I’m intrigued by Michael’s idea of dicing it more finely but I don’t know that it would help anyone unless we can really make these things modular and independently documented.

We probably wouldn’t bring in all of math::internal at once.

What kind of ADL issues are you talking about?

That’s what I was suggesting. That for a user defining a new function, a lot of the metaprogramming is going to be required.

At the very highest level, it’s just all the functions. Those are what’s documented in the Stan manual. With those, you can do autodiff with the functionals and never worry about the var and fvar types (at least in theory—in practice, you’d want to know what was going on).

I will try to write down the complete story in a note this week and post the link here once it is done. The example I want to show is one where the code’s behavior changes (without any compile-time error) depending on whether some parts of the code belong to an internal:: namespace or not.
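To make this kind of silent change concrete, here is a small sketch. It is not the example from the note mentioned above; all names (lib, Public, Hidden, describe) are hypothetical. It relies on the rule that ADL searches the innermost enclosing namespace of the argument's type, so a type placed in lib::internal no longer brings overloads from lib into play.

```cpp
// Generic fallback at global scope (hypothetical).
template <typename T>
constexpr int describe(T) { return 0; }

namespace lib {
struct Public {};
namespace internal {
struct Hidden {};
}  // namespace internal

// Specific overloads, both declared in lib (not in lib::internal).
constexpr int describe(Public) { return 1; }
constexpr int describe(internal::Hidden) { return 2; }
}  // namespace lib

// Unqualified calls: ordinary lookup finds ::describe, ADL adds candidates.
constexpr int on_public_type() { return describe(lib::Public{}); }
constexpr int on_internal_type() { return describe(lib::internal::Hidden{}); }

// lib::Public lives in lib, so ADL finds lib::describe(Public).
static_assert(on_public_type() == 1, "specific overload chosen");
// lib::internal::Hidden only makes ADL search lib::internal, so the
// overload in lib is never considered and the fallback wins silently.
static_assert(on_internal_type() == 0, "generic fallback chosen");
```

Both calls compile; only the result differs, which is what makes this class of bug hard to spot.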

I finally wrote my note about the internal:: namespace and put it on GitHub. It is quite long and I do not know if it is worth reading. Maybe a better alternative is Andrzej’s C++ blog post. My main point is that having many namespaces increases code complexity and requires more attention to side effects. The alternative to internal:: that I currently use is a simple __internal__ prefix. I do not claim it is the way to go; I just wanted to say that I am happy with it, explain why, and share this piece of information.

-Vincent

I didn’t understand the argument not to use an internal namespace. Of course that user “full of goodwill” you mention will break argument-dependent lookup (ADL)—that just follows from how ADL works. There are lots of ways to break C++ code, so I don’t quite see the point here. The whole point of ADL is to be able to write code that can deal with new namespaces and new implementations without having to worry about them ahead of time—if you lock down a qualified function call, it won’t work.
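The trade-off between unqualified and qualified calls can be sketched as follows. The names (lib, cost, user::Matrix) are hypothetical and only illustrate the mechanism: an unqualified call lets a type's own namespace supply an implementation, while a qualified call pins the call to one function.

```cpp
namespace lib {
namespace internal {
// Generic default (hypothetical helper).
template <typename T>
constexpr int cost(const T&) { return 1; }
}  // namespace internal

// Unqualified call: ADL may add overloads from the argument's namespace.
template <typename T>
constexpr int cost_unqualified(const T& x) {
  using internal::cost;
  return cost(x);
}

// Qualified call: ADL is suppressed, always the internal helper.
template <typename T>
constexpr int cost_qualified(const T& x) { return internal::cost(x); }
}  // namespace lib

namespace user {
struct Matrix {};
// A user-supplied implementation for a new type in a new namespace.
constexpr int cost(const Matrix&) { return 7; }
}  // namespace user

static_assert(lib::cost_unqualified(user::Matrix{}) == 7, "user overload found via ADL");
static_assert(lib::cost_qualified(user::Matrix{}) == 1, "qualified call ignores it");
static_assert(lib::cost_unqualified(3) == 1, "no ADL for int: default is used");
```

Which behavior is wanted depends on whether the function is meant to be a customization point or a fixed implementation detail.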

Technically those double underscores are illegal identifiers, but we’ve never had problems with them. We use something like that in the generated Stan code, and because all the compilers seem OK with our non-compliant code, fixing it hasn’t been a priority.

Thanks for reading; since I also presented the context, the text is quite long. My main argument is this: the goal of introducing an internal namespace is to clearly identify and separate 1/ what is an implementation detail from 2/ what can be safely reused (the interface to document…). This is clear at the point of definition: a name is either inside an internal {} block or not. But when you read code elsewhere, the identifier does not always carry its internal:: qualification (because of ADL, for instance). Hence you cannot immediately tell what relies on unstable internal details and what relies on the stable interface. Thus, IMHO, the initial goal is not fulfilled. As this mechanism introduces an extra layer of complexity with side effects, I am just wondering if it is worth it. I personally chose a simpler prefix approach, like __internal__, which always clearly shows what is internal, without any side effects or extra complexity. That is what I wanted to say.

I was aware that, according to the standard (§17.6.4.3.2), any name containing a double underscore “is reserved to the implementation for any use.” As you remark, it works, but it is perhaps not my best idea, and I should certainly use a prefix like internal_ or impl_ instead.
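As a small sketch of the same trick with a prefix that is not reserved, assuming impl_ is the prefix chosen (the function and tag names simply mirror the earlier example and are hypothetical):

```cpp
// impl_ contains no double underscore and does not start with an
// underscore, so it is not reserved to the implementation.
struct impl_Slice_Tag {};

constexpr int impl_foo() noexcept { return 42; }

// Public function forwarding to the prefixed implementation detail.
constexpr int foo() noexcept { return impl_foo(); }

static_assert(foo() == 42, "public function forwards to the prefixed helper");
```

A plain text search for impl_ still finds every use of an implementation detail, which was the main selling point of the original prefix.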

Thanks for the summary. I understand what you’re worried about now and agree it could be a problem if you have unqualified internal functions used with the ADL.

Enforcing the convention that we explicitly qualify internal:: on all internal function calls would have the same effect as adding __internal__ to all of the function names, but it wouldn’t allow internal:: to be omitted accidentally.

I think we can cut off the whole concern by never putting functions that would be invoked by ADL in the internal namespace. Do you have an example in Stan where you think it’s going to happen? I’m pretty sure we’re exclusively using ADL for autodiff types. So the ADL will look into stan::math because that’s where the autodiff scalar types stan::math::var and stan::math::fvar are defined. Will that also go down into stan::math::internal if we have things there? I would very much be against defining any classes in the internal namespace that get exposed externally and thus wouldn’t be used in a fully-qualified form internally.
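On the question of whether ADL descends into a nested namespace: it does not, unless the nested namespace is an inline namespace. A minimal sketch, using a hypothetical lib::var as a stand-in for an autodiff scalar type (this is not the real stan::math::var):

```cpp
namespace lib {
struct var { int v; };

// Found by ADL for arguments of type lib::var.
constexpr int f(var x) { return x.v; }

namespace internal {
// Not found by ADL on lib::var: lib::internal is not an
// associated namespace of a type declared in lib.
constexpr int g(var x) { return x.v + 1; }
}  // namespace internal
}  // namespace lib

// Unqualified call from outside lib: ADL finds lib::f.
constexpr int call_f() { return f(lib::var{1}); }

// An unqualified g(lib::var{1}) here would fail to compile,
// so the internal function has to be named in full.
constexpr int call_g() { return lib::internal::g(lib::var{1}); }

static_assert(call_f() == 1, "lib::f found through ADL");
static_assert(call_g() == 2, "lib::internal::g needs qualification");
```

So functions placed in an internal namespace are only reachable by qualified name (or a using-declaration) when the argument types live in the parent namespace.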

Thank you for your comment, yes that was my concern.

To conclude with my opinion (and it is no more than an opinion):
If “internal” is meant to tag implementation details with a red flag, then I think the prefix works better, since its intention is always clear.
At the same time, I understand the reluctance to use this approach, as it is more of a C idiom than a mainstream C++ practice.

That said I agree with all your analysis. I do not know Stan code in detail but each time I had a look I found it very well done.

I can add that I am not a big fan of many nested namespaces either. The primary role of a namespace is to avoid name conflicts. Projects usually already have an implicit but natural organization in terms of file and directory structure. IMHO too many nested namespaces mainly bring costly redundancy, with the risk of divergence.

-Vincent

As you noted, namespaces are also for controlling argument-dependent lookup. I think you’re right about the __internal__ thing being more like C. I spent two years coding C and what I like about C++ is that it lets you avoid a lot of the pain of C (while introducing a whole new level of pain with templates).

Starting from Java, Daniel and I included more namespaces than was usual in C++. We’ve backed off that recently when we refactored the math library into its own module.