Keeping libraries binary-compatible with old versions is hard. Recently, GCC was in the unenviable situation of having to switch its std::string implementation.1 GCC used inline namespaces and ABI tags to minimize the extent of breakage and to ensure that old and new versions could only be combined in a safe way. Here, we'll have a look at those mitigation techniques: what they are and how they work.

First, some background: GCC used to have a copy-on-write implementation for std::string. However, C++11 does not allow this anymore because of new iterator and reference invalidation rules. So, GCC 5.1 introduced a new implementation of std::string. The new version was not binary-compatible with the old one: exchanging strings between old (pre-5.1) and new code would crash. We say the application binary interface (ABI) changed.

To understand the consequences of this, let's look at several scenarios:

  • A program uses only code compiled with GCC < 5.1: only old strings, works
  • A program uses only code compiled with GCC >= 5.1: only new strings, works
  • A program mixes code compiled with different GCC versions: both types of strings exist in the program, it could crash.

Let's dig into the last bullet point. When would it crash? Whenever "new" code accesses an "old" string or vice versa. Here are some examples where f() is compiled with an older GCC version and called from "new" code:

void f(int i);                 // (a) safe, no std::string involved
void f(const std::string& s);  // (b) will crash when f accesses s
std::string f();               // (c) will crash when the returned string is used

Given how common std::string is, such crashes would happen frequently if "old" and "new" code were combined. Unfortunately, it's surprisingly easy to end up with a program with some pre-5.1 parts and some newer ones. It's sufficient to link a "new" executable to an "old" library. Given the bad consequences, the GCC developers needed to solve this.

The solution is to change the symbol names of the GCC 5.1 std::string and all functions using it. Because the linker uses symbol names to resolve function calls into libraries, this would cause link-time errors for cases (b) and (c) while (a) would still work. Exactly the intended behavior.

How could this be done? The mechanism that converts C++ names into symbol names is called name mangling. The generated symbol names contain information about namespaces, function names, argument types, etc. Putting the new std::string into a separate namespace would change the symbol names of all functions accepting std::string as argument. But that's crazy—std::string needs to be in namespace std, right?

The solution for this is inline namespaces. All classes, functions, and templates declared in an inline namespace are automatically imported into the parent namespace. Their mangled name still references the original location, though.

Here's what GCC 5.1 does:

namespace std {
    inline namespace __cxx11 {
        template<typename _CharT, ...>  basic_string;
        typedef basic_string<char>      string;
    }
}

Looking at f(const std::string& s), what would the symbol names be when compiled with older and newer GCC versions?

Older GCCGCC 5.1
Symbol name_Z1fRKSs_Z1fRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Decoded symbol name2f(std::basic_string<char, ...> const&)f(std::__cxx11::basic_string<char, ...> const&)

Indeed, the symbol names are different and the linker would give an error if GCC 5.1 code called f(const std::string &s) from a library compiled with an older GCC version. This solves the compatibility problem for all functions taking std::string as argument.

One problem remains: case (c) from above, std::string f(). The return value type of a function is not part of its mangled name.3 Thus, the symbol name wouldn't change for functions that return std::string, which could still lead to runtime crashes.

GCC 5.1 solves this with the use of ABI tags. From the documentation:

The abi_tag attribute can be applied to a function, variable, or class declaration. It modifies the mangled name of the entity to incorporate the tag name, in order to distinguish the function or class from an earlier version with a different ABI

[...]

When a type involving an ABI tag is used as the type of a variable or return type of a function where that tag is not already present in the signature of the function, the tag is automatically applied to the variable or function.

GCC applies the attribute __abi_tag__ ("cxx11")) to the std::__cxx11 namespace. This affects all classes therein, including the new version of std::string. The ABI tag propagates to all functions that return a string and changes their symbol name.

Let's look at the symbol names for std::string f() for different compiler versions:

Older GCCGCC 5.1
Symbol name_Z1fv_Z1fB5cxx11v
Decoded symbol name2f()f[abi:cxx11]()

Again, the symbol name is different, and the "cxx11" ABI tag is applied to std::string f() with GCC 5.1. This completes the second part of GCC's solution for the std::string ABI change.

These two methods to induce symbol name changes for anything using std::string make the migration path to GCC 5.1 much easier. If the program links correctly it should work, and be safe from difficult-to-debug runtime errors. There would be more to discuss about the GCC 5.1 changes, such as how libstdc++.so still exports the old std::string implementation for compatibility. But this post has gone on too long already, so let's leave it at that :-)


1 GCC 5.1 also introduced a new version of std::list. It is handled analogously to std::string.
2 Decoded using c++filt from the binutils package.
3 The return value type doesn't need to be part of the symbol name because it doesn't get used for overload resolution.


Comments