GCC compatibility: inline namespaces and ABI tags

Keeping libraries binary-compatible with old versions is hard. Recently, GCC was in the unenviable situation of having to switch its std::string implementation.1 GCC used inline namespaces and ABI tags to minimize the extent of breakage and to ensure that old and new versions could only be combined in a safe way. Here, we'll have a look at those mitigation techniques: what they are and how they work.

First, some background: GCC used to have a copy-on-write implementation for std::string. However, C++11 does not allow this anymore because of new iterator and reference invalidation rules. So, GCC 5.1 introduced a new implementation of std::string. The new version was not binary-compatible with the old one: exchanging strings between old (pre-5.1) and new code would crash. We say the application binary interface (ABI) changed.

To understand the consequences of this, let's look at several scenarios:

  • A program uses only code compiled with GCC < 5.1: only old strings, works
  • A program uses only code compiled with GCC >= 5.1: only new strings, works
  • A program mixes code compiled with different GCC versions: both kinds of strings exist in the program; it can crash.

Let's dig into the last bullet point. When would it crash? Whenever "new" code accesses an "old" string or vice versa. Here are some examples where f() is compiled with an older GCC version and called from "new" code:

void f(int i);                 // (a) safe, no std::string involved
void f(const std::string& s);  // (b) will crash when f accesses s
std::string f();               // (c) will crash when the returned string is used

Given how common std::string is, such crashes would happen frequently if "old" and "new" code were combined. Unfortunately, it's surprisingly easy to end up with a program with some pre-5.1 parts and some newer ones. It's sufficient to link a "new" executable to an "old" library. Given the bad consequences, the GCC developers needed to solve this.

The solution is to change the symbol names of the GCC 5.1 std::string and all functions using it. Because the linker uses symbol names to resolve function calls into libraries, this would cause link-time errors for cases (b) and (c) while (a) would still work. Exactly the intended behavior.

How could this be done? The mechanism that converts C++ names into symbol names is called name mangling. The generated symbol names contain information about namespaces, function names, argument types, etc. Putting the new std::string into a separate namespace would change the symbol names of all functions accepting std::string as argument. But that's crazy—std::string needs to be in namespace std, right?

The solution for this is inline namespaces. All classes, functions, and templates declared in an inline namespace are automatically imported into the parent namespace. Their mangled name still references the original location, though.
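
For illustration, here's a minimal sketch of how inline namespaces behave (lib, v2, and widget are made-up names):

namespace lib {
    inline namespace v2 {
        struct widget {};
    }
}

lib::widget w1;     // OK: members of v2 are visible in lib
lib::v2::widget w2; // spelling out the inline namespace also works
// In mangled names, however, widget is always lib::v2::widget.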

Here's what GCC 5.1 does:

namespace std {
    inline namespace __cxx11 {
        template<typename _CharT, ...> class basic_string;
        typedef basic_string<char>      string;
    }
}

Looking at f(const std::string& s), what would the symbol names be when compiled with older and newer GCC versions?

Older GCC:
    Symbol name:          _Z1fRKSs
    Decoded symbol name2: f(std::basic_string<char, ...> const&)

GCC 5.1:
    Symbol name:          _Z1fRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
    Decoded symbol name2: f(std::__cxx11::basic_string<char, ...> const&)

Indeed, the symbol names are different and the linker would give an error if GCC 5.1 code called f(const std::string &s) from a library compiled with an older GCC version. This solves the compatibility problem for all functions taking std::string as argument.

One problem remains: case (c) from above, std::string f(). The return value type of a function is not part of its mangled name.3 Thus, the symbol name wouldn't change for functions that return std::string, which could still lead to runtime crashes.

GCC 5.1 solves this with the use of ABI tags. From the documentation:

The abi_tag attribute can be applied to a function, variable, or class declaration. It modifies the mangled name of the entity to incorporate the tag name, in order to distinguish the function or class from an earlier version with a different ABI

[...]

When a type involving an ABI tag is used as the type of a variable or return type of a function where that tag is not already present in the signature of the function, the tag is automatically applied to the variable or function.

GCC applies the attribute __abi_tag__("cxx11") to the std::__cxx11 namespace. This affects all classes therein, including the new version of std::string. The ABI tag propagates to all functions that return a string and changes their symbol name.
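
For illustration, the attribute can also be applied manually. A minimal sketch (Widget and make_widget are made-up names):

struct __attribute__((abi_tag("v2"))) Widget {};

// The tag propagates to functions returning Widget. With GCC, this
// should mangle to something like _Z11make_widgetB2v2v:
Widget make_widget();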

Let's look at the symbol names for std::string f() for different compiler versions:

Older GCC:
    Symbol name:          _Z1fv
    Decoded symbol name2: f()

GCC 5.1:
    Symbol name:          _Z1fB5cxx11v
    Decoded symbol name2: f[abi:cxx11]()

Again, the symbol name is different, and the "cxx11" ABI tag is applied to std::string f() with GCC 5.1. This completes the second part of GCC's solution for the std::string ABI change.

These two methods to induce symbol name changes for anything using std::string make the migration path to GCC 5.1 much easier. If the program links correctly it should work, and be safe from difficult-to-debug runtime errors. There would be more to discuss about the GCC 5.1 changes, such as how libstdc++.so still exports the old std::string implementation for compatibility. But this post has gone on too long already, so let's leave it at that :-)


1 GCC 5.1 also introduced a new version of std::list. It is handled analogously to std::string.
2 Decoded using c++filt from the binutils package.
3 The return value type doesn't need to be part of the symbol name because it doesn't get used for overload resolution.



Building External Libraries with Boost Build

I recently ran into the problem of having to use an external library not available for our distribution. Of course, it would be possible to build Debian or RPM packages, deploy them, and then link against those. However, building sanely-behaving library packages is tricky.

The easier way is to include the source distribution in a sub-folder of the project and integrate it into our build system, Boost Build. It seemed prohibitive to create Boost Build rules to build the library from scratch (from *.cpp to *.a/*.so) because the library had fairly complex build requirements. The way to go was to leverage the library's own build system.

Easier said than done. I only arrived at a fully integrated solution after seeking help on the Boost Build Mailing List (thanks Steven, Vladimir). This is the core of it:

path-constant LIBSQUARE_DIR : . ;

make libsquare.a
    : [ glob-tree *.c *.cpp *.h : generated-file.c gen-*.cpp ]
    : @build-libsquare
    ;
actions build-libsquare
{
    ( cd $(LIBSQUARE_DIR) && make )
    cp $(LIBSQUARE_DIR)/libsquare.a $(<)
}

alias libsquare
    : libsquare.a                                        # sources
    :                                                    # requirements
    :                                                    # default-build
    : <include>$(LIBSQUARE_DIR) <dependency>libsquare.a  # usage-requirements
    ;

The first line defines a filesystem constant for the library directory, so the build process becomes independent of the working directory.

The next few lines tell Boost Build about a library called libsquare.a, which is built by calling make. The cp command copies the library to the target location expected by Boost Build ($(<)). glob-tree defines source files that should trigger library rebuilds when changed, taking care to exclude files generated by the build process itself.

The alias command defines a Boost Build target to link against the library. The real magic is the <dependency>libsquare.a directive: it is required because the library build may produce header files used by client code. Thus, build-libsquare must run before compiling any dependent C++ files. Adding libsquare.a to an executable's dependency list won't quite enforce this: it only builds libsquare.a in time for the linking step (*.o -> executable), but not for the compilation (*.cpp -> *.o). In contrast, the <dependency>libsquare.a directive propagates to all dependent build steps, including the compilation, and induces the required dependencies.
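
With that in place, consuming the library looks like any other Boost Build dependency (a sketch; app and main.cpp are placeholder names):

exe app
    : main.cpp libsquare
    ;

The usage-requirements of the alias automatically add the include path and the build-order dependency to every target that depends on it.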

I created an example project on GitHub that demonstrates this. It builds a simple executable relying on a library built via make. The process is fully integrated in the Boost Build framework and triggered by a single call to bjam (or b2).

If you need something along these lines, give this solution a try!


Remarks on enable_shared_from_this

std::enable_shared_from_this is a template base class that allows derived classes to get a std::shared_ptr to themselves. This can be handy, and it's not something that C++ classes can normally do. Calling std::shared_ptr<T>(this) is not an option as it creates a new shared pointer independent of the existing one, which leads to double destruction.

The caveat is that before calling the shared_from_this() member function, a shared_ptr to the object must already exist, otherwise undefined behavior results. In other words, the object must already be managed by a shared pointer.

This presents an interesting issue. When using this technique, there are member functions (those that rely on shared_from_this()) that can only be called if the object is managed via a shared_ptr. This is a rather subtle requirement: the compiler won't enforce it. If violated, the object may even work at runtime until a problematic code path is executed, which may happen rarely – a nice little trap. At the very least, this should be prominently mentioned in the class documentation. But frankly, relying on the documentation to communicate such a subtle issue sounds wrong.

The correct solution is to let the compiler enforce it. Make the constructors private and provide a static factory method that returns a shared_ptr to a new instance. Take care to delete the copy constructor and the assignment operator to prevent anyone from obtaining non-shared-pointer-managed instances this way.
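
A minimal sketch of that pattern (Widget is a made-up name):

#include <memory>

class Widget : public std::enable_shared_from_this<Widget> {
public:
    // The only way to create instances: always shared_ptr-managed.
    static std::shared_ptr<Widget> create() {
        // std::make_shared cannot reach the private constructor,
        // so construct the shared_ptr directly.
        return std::shared_ptr<Widget>(new Widget());
    }

    Widget(const Widget&) = delete;
    Widget& operator=(const Widget&) = delete;

    std::shared_ptr<Widget> self() {
        return shared_from_this();  // safe: see create()
    }

private:
    Widget() = default;
};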

Another point worth mentioning about enable_shared_from_this is that the member functions it provides, shared_from_this() and weak_from_this(), are public. Not only the object itself can retrieve its owning shared_ptr, everyone else can too. Whether this is desirable is essentially an API design question and depends on the context. Note that private inheritance is not the way to restrict access: the shared_ptr constructors only initialize the internal weak reference when they find an unambiguous, publicly accessible enable_shared_from_this base, so a private base breaks shared_from_this() altogether. Instead, keep the inheritance public and hide the inherited functions behind a private using-declaration, as sketched below.
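
A minimal sketch of that restriction (Session is a made-up name):

class Session : public std::enable_shared_from_this<Session> {
public:
    void start() {
        auto self = shared_from_this();  // member functions retain access
        // ...
    }

private:
    // Hide the inherited function from outside callers
    // (likewise for weak_from_this() in C++17):
    using std::enable_shared_from_this<Session>::shared_from_this;
};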

Overall, enable_shared_from_this is an interesting tool, if a bit situational. However, it requires care to use safely, in a way that prevents derived classes from being used incorrectly.


Boost Range Highlights

Last week, I presented Boost Range for Humans: documentation for the Boost Range library with concrete code examples. This week we'll talk about some of the cool features in Boost Range.

irange

boost::irange() is the C++ equivalent to Python's range() function. It returns a range object containing an arithmetic series of numbers:

boost::irange(4, 10)    -> {4, 5, 6, 7, 8, 9}
boost::irange(4, 10, 2) -> {4, 6, 8}

Together with indexed() (see below), it serves as a range-based alternative to the classic C for loop.
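
For example (a sketch, assuming the header <boost/range/irange.hpp>):

#include <boost/range/irange.hpp>

// Instead of: for (int i = 0; i < 10; ++i)
for (int i : boost::irange(0, 10)) {
    // i takes the values 0 through 9
}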

combine

boost::combine() takes two or more input ranges and creates a zipped range – a range of tuples where each tuple contains corresponding elements from each input range.

The input ranges must have equal size.

#include <boost/range/combine.hpp>
#include <boost/tuple/tuple.hpp>
#include <string>
#include <vector>

std::string str = "abcde";
std::vector<int> vec = {1, 2, 3, 4, 5};
for (const auto & zipped : boost::combine(str, vec)) {
    char c; int i;
    boost::tie(c, i) = zipped;

    // Iterates over the pairs ('a', 1), ('b', 2), ...
}

Algorithms

Most if not all algorithms from the C++ standard library that apply to containers (via begin/end iterator pairs) have been wrapped in Boost Range. Examples include copy(), remove(), sort(), count(), find_if().
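
A short sketch (assuming the umbrella header <boost/range/algorithm.hpp>):

#include <boost/range/algorithm.hpp>
#include <vector>

std::vector<int> v = {3, 1, 4, 1, 5};
boost::sort(v);                  // instead of std::sort(v.begin(), v.end())
auto ones = boost::count(v, 1);  // number of elements equal to 1
auto it   = boost::find_if(v, [](int n) { return n > 3; });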

Adaptors

Adaptors are among the most interesting concepts Boost Range has to offer.

There are generally two ways to use adaptors: function syntax and pipe syntax. The former is handy for simple cases, while the latter allows chaining data transformations into an easy-to-read pipeline.

#include <boost/range/adaptor/filtered.hpp>

bool is_even(int n) { return n % 2 == 0; }
const std::vector<int> vec = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

// Function-style call:
for (int i : boost::adaptors::filter(vec, is_even)) { ... }

// Pipe-style call:
for (int i : vec | boost::adaptors::filtered(is_even)) { ... }

To see the power of the latter syntax, consider a transformation pipeline:

#include <boost/range/adaptor/filtered.hpp>
#include <boost/range/adaptor/map.hpp>
#include <boost/range/adaptor/transformed.hpp>

int square(int n) { return n*n; }
std::map<int, int> input_map = { ... };

using namespace boost::adaptors;
auto result = input_map | map_values
                        | filtered(is_even)
                        | transformed(square);

indexed

The boost::adaptors::indexed() adaptor warrants special mention: it is analogous to Python's enumerate(). Given a range, it gives access to the elements as well as their index within the range. Boost 1.56 or higher is required for this to work properly.
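
A sketch of its use (the index()/value() accessors need Boost 1.56+):

#include <boost/range/adaptor/indexed.hpp>
#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> names = {"alice", "bob"};
for (const auto& element : names | boost::adaptors::indexed(0)) {
    std::cout << element.index() << ": " << element.value() << '\n';
}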

accumulate

boost::accumulate() by default sums all the items in an input range, but other reduction functions can be supplied as well. Together with range adaptors, this makes map-reduce pipelines easy to write.

#include <boost/range/numeric.hpp>
#include <functional>
std::vector<int> vec = {1, 2, 3, 4, 5};
int product = boost::accumulate(vec, 1, std::multiplies<int>());
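
Combined with an adaptor, a small map-reduce pipeline might look like this (a sketch reusing square() from the adaptors example and the headers above):

#include <boost/range/adaptor/transformed.hpp>

std::vector<int> vec = {1, 2, 3, 4, 5};
int sum_of_squares = boost::accumulate(
    vec | boost::adaptors::transformed(square), 0);  // 1+4+9+16+25 == 55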

as_literal

boost::as_literal() may be less a highlight and more of a crutch, but it bears mentioning still. Boost Range functions accept a wide variety of types, among them strings. C++ style strings (std::string) always work as expected, but with character arrays (char[]), there is an ambiguity as to whether the argument should be interpreted as array (including the terminal '\0' character) or as string (excluding the terminator).

By default, Boost Range treats them as arrays, which they are, after all. In practice, this is often a pitfall for newcomers. If any string-related range operations don't work as expected, this is a common reason.

To force the library to treat character arrays as strings, they can be wrapped in an as_literal() call. Alternatively, the C strings can be cast to std::string as well.
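
A sketch of the difference (string literals like "abc" have array type const char[4]; boost::size is from <boost/range/size.hpp>):

#include <boost/range/as_literal.hpp>
#include <boost/range/size.hpp>

auto n1 = boost::size("abc");                    // 4: array view, includes '\0'
auto n2 = boost::size(boost::as_literal("abc")); // 3: string view, excludes '\0'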

Summary

There are several interesting aspects about Boost Range. It plays very well with C++11's range-based for loops and makes code operating on containers much easier to write and (most importantly) read. In addition, it makes it possible to lay out data processing pipelines a lot more clearly.

Container iteration and modification becomes as easy as it is in modern scripting languages, which is a huge, huge step for the C++ language.

Let's hope that C++17 brings similar capabilities in the standard library. Until then, Boost Range is the way to go, so check out the docs and try it yourself.


Boost Range for Humans

Boost Range encapsulates the common C++ pattern of passing begin/end iterators around by combining them into a single range object. It makes code that operates on containers much more readable. One wonders why such functionality was not included in the C++ standard library in the first place, and indeed, similar ideas could be added to C++17, see N4128 and Ivan Cukic's Meeting C++ presentation. In my opinion, Boost Range is something that every C++ programmer should know about.

The library is reasonably well documented, but I was often missing concrete code examples and an explicit mention of which headers are required for which function. Since this presumably happens to other people as well, I invested the time to change that situation.

Thus, I present Boost Range for Humans. It contains working example code for every function in Boost Range, along with required headers and links to the official documentation and the latest source code. I hope it will make Boost Range more accessible and further its adoption.

Next week, we'll look into some of the highlights of what Boost Range can offer.


Debugging riddle of the day

One of our services failed to start on a test system (Ubuntu 12.04 on amd64). The stdout/stderr log streams contained only the string “Permission denied” – less than helpful. strace showed that the service tried to create a file under /run, which it doesn't have write permissions to. This caused it to bail out:

open("/run/some_service", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = -1 EACCES (Permission denied)

Grepping the source code and configuration files for /run didn't turn up anything that could explain this open() call. Debugging with gdb gave further hints:

Breakpoint 2, 0x00007ffff73e3ea0 in open64 () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff73e3ea0 in open64 () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7bd69bf in shm_open () from /lib/x86_64-linux-gnu/librt.so.1
#2  0x0000000000400948 in daemonize () at service.cpp:93
#3  0x00000000004009ac in main () at main.cpp:24
(gdb) p (char*)$rdi
$1 = 0x7fffffffe550 "/run/some_service"
(gdb) frame 2
#2  0x0000000000400948 in daemonize () at service.cpp:93
93          int fd = shm_open(fname.c_str(), O_RDWR | O_CREAT, 0644);
(gdb) p fname
$2 = {...., _M_p = 0x602028 "/some_service"}}

The open("/run/some_service", ...) was caused by an shm_open("/some_service", ...).

This code is working on other machines, why does it fail on this particular one? Can you figure it out? Bonus points if you can explain why it is trying to access /run and not some other directory. You might find the shm_open() man page and source code helpful.

I'll be waiting for you.

Caaaaaaaaat!

The solution is pretty evident after examining the Linux version of shm_open(). By default, it tries to create shared memory files under /dev/shm. If that doesn't exist, it will pick the first tmpfs mount point from /proc/mounts.

In Ubuntu 12.04, /dev/shm is a symlink to /run/shm. On this machine the symlink was missing, which caused shm_open() to go hunting for a tmpfs filesystem, and /run happened to be the first one in /proc/mounts.
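
For the record, restoring the link is a one-liner (assuming the Ubuntu 12.04 layout described above):

ln -s /run/shm /dev/shm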

Re-creating the symlink solved the problem. Why it was missing in the first place is still unclear. In the aftermath, we're also improving the error messages in this part of the code to make such issues easier to diagnose.


Mixing C runtime library versions

When compiling code with Microsoft's Visual C/C++ compiler, a dependency on Microsoft's C runtime library (CRT) is introduced. The CRT can be linked either statically or dynamically, and comes in several versions (different version numbers as well as in debug and release variants).

Complications arise when libraries linked into the same program use different CRTs. This happens if they were compiled with different compiler versions, or with different compiler flags (static/dynamic CRT linkage or release/debug switches). In theory, this could be made to work, but in practice it is asking for trouble:

  • If the CRT versions differ (either version number or debug/release flag), you can't reliably share objects generated by CRT A with any code that uses CRT B. The reason is that the two CRTs may have a different memory layout (structure layout) for that object. The memory location that CRT A wrote the object size to might be interpreted by CRT B as a pointer, leading to a crash when CRT B tries to access that memory.
  • Even if the same CRT version is included twice (once statically, once dynamically linked), they won't share a heap. Both CRTs track the memory they allocate individually, but they don't know anything about objects allocated by the other CRT. If CRT A tries to free memory allocated by CRT B, that causes heap corruption as the two CRTs trample on each other's feet (see the sketch below this list). While you can freely share objects between the CRTs in this case, you have to be careful whenever memory is allocated or freed. This can sometimes be managed when writing C code, but is very hard to do correctly in C++ (where e.g. pushing to a vector can cause a reallocation of its internal buffer).
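
A hypothetical sketch of the hazard (make_array and the module names are made up):

// dll_a.cpp -- built against CRT A
extern "C" __declspec(dllexport) int* make_array() {
    return new int[16];           // allocated on CRT A's heap
}

// main.cpp -- built against CRT B
extern "C" __declspec(dllimport) int* make_array();

int main() {
    int* p = make_array();
    delete[] p;                   // frees via CRT B: wrong heap, corruption
}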

Accordingly, having multiple CRTs in the same process is fragile at best. When mixing CRTs, there are no tools to check whether objects are shared in a way that's problematic, and manual tracking is subtle, easy to get wrong, and unreliable. Mistakes will lead to difficult-to-diagnose bugs and intermittent crashes, generally at the most inconvenient times.

To keep your sanity, ensure that all code going into your program uses the same CRT.1 Consequently, all program code, as well as all libraries, need to be compiled from scratch using the same runtime library options (/MD or /MT). Pre-compiled libraries are a major headache, because they force the use of a specific compiler version to get matching CRT version requirements. If multiple pre-compiled libraries use different CRT versions, there may not be any viable solution at all.

This situation will improve with the runtime library refactoring in VS2015, which promises CRT compatibility across subsequent compiler versions. Thus, this inconvenience should mostly be solved in the future.


1 Dependency Walker can be used to list all dynamically linked CRT versions. I'm not sure whether that can be done for statically linked CRTs; I generally avoid those.


Profiling

Profiling is hard. Measuring the right metric and correctly interpreting the obtained data can be difficult even for relatively simple programs.

For performance optimization, I'm a big fan of the poor man's profiler: run the binary to analyze under a debugger, periodically stop the execution, get a backtrace and continue. After doing this a few times, the hotspots will become apparent. This works amazingly well in practice and gives a reliable picture of where time is spent, without the danger of skewed results from instrumentation overhead.
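
On Linux, this can be scripted with gdb along these lines (a sketch; program-to-profile stands for your binary, and the sample count and interval are arbitrary):

for i in $(seq 10); do
    gdb --batch -ex bt -p $(pidof program-to-profile)
    sleep 1
done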

Sometimes it's nice to get a more fine-grained view. That is, not only find the hotspot, but get an overview of how much time is spent where. That's where 'real' profilers come in handy.

Under Windows, I like the built-in "Event Tracing for Windows" (ETW), which produces files that can be analyzed with Xperf/Windows Performance Analyzer. It is a really well-thought-out system, and the Xperf UI offers amazing analysis capabilities. Probably the best place to start reading up on this is ETW Central.

Under Linux, I haven't found a profiler I can really recommend, yet. gprof and sprof are both ancient and have severe limitations. OProfile may be nice, but I haven't had a chance to use it yet, as it wasn't available for my Ubuntu LTS release.

I have used Callgrind from the Valgrind toolkit in combination with the KCachegrind GUI analyzer. I typically invoke it like this:

valgrind --tool=callgrind --callgrind-out-file=callgrind-cpu.out ./program-to-profile
kcachegrind callgrind-cpu.out

Callgrind works by instrumenting the binary under test. It slows down program execution, often by a factor of 10. Further, it only measures CPU time, so sleeping times are not included. This makes it unsuitable for programs that wait a significant amount of time for network or disk operations to complete. Despite these drawbacks, it's pretty handy if CPU time is all that you're interested in.

If blocking times are important (as they are for so many modern applications - we generally spend less time computing and more time communicating), gperftools is a decent choice. It includes a CPU profiler that can be run in real-time sampling mode, and the results can be viewed in KCachegrind. It is recommended to link libprofiler.so into the binary to analyze, but using LD_PRELOAD works decently well:

CPUPROFILE_REALTIME=1 CPUPROFILE=prof.out LD_PRELOAD=/usr/lib/libprofiler.so ./program-to-profile
google-pprof --callgrind ./program_to_profile prof.out > callgrind-wallclock.out
kcachegrind callgrind-wallclock.out

If it works, this gives a good overall profile of the application. Unfortunately, it sometimes fails: on amd64, there are sporadic crashes from within libunwind. It's possible to just ignore those and rerun the profile; usable data is obtained about half the time.

The more serious problem is that CPUPROFILE_REALTIME=1 causes gperftools to use SIGALRM internally, conflicting with any applications that want to use that signal for themselves. Looking at the profiler source code, it should be possible to work around this limitation with the undocumented CPUPROFILE_PER_THREAD_TIMERS and CPUPROFILE_TIMER_SIGNAL environment variables, but I couldn't get that to work yet.

You'd think that perf has something to offer in this area as well. Indeed, it has a CPU profiling mode (with nice flamegraph visualizations) and a sleeping time profiling mode, but I couldn't find a way to combine the two to get a real-time profile.

Overall, there still seems to be room for a good, reliable real-time sampling profiler under Linux. If I'm missing something, please let me know!