Building External Libraries with Boost Build

I recently ran into the problem of having to use an external library not available for our distribution. Of course, it would be possible to build Debian or RPM packages, deploy them, and then link against those. However, building sanely-behaving library packages is tricky.

The easier way is to include the source distribution in a sub-folder of the project and integrate it into our build system, Boost Build. It seemed prohibitive to create Boost Build rules to build the library from scratch (from *.cpp to *.a/*.so) because the library had fairly complex build requirements. The way to go was to leverage the library's own build system.

Easier said than done. I only arrived at a fully integrated solution after seeking help on the Boost Build Mailing List (thanks Steven, Vladimir). This is the core of it:

path-constant LIBSQUARE_DIR : . ;

make libsquare.a
    : [ glob-tree *.c *.cpp *.h : generated-file.c gen-*.cpp ]
    : @build-libsquare
    ;
actions build-libsquare
{
    ( cd $(LIBSQUARE_DIR) && make )
    cp $(LIBSQUARE_DIR)/libsquare.a $(<)
}

alias libsquare
    : libsquare.a                                        # sources
    :                                                    # requirements
    :                                                    # default-build
    : <include>$(LIBSQUARE_DIR) <dependency>libsquare.a  # usage-requirements
    ;

The first line defines a filesystem constant for the library directory, so the build process becomes independent of the working directory.

The next few lines tell Boost Build about a library called libsquare.a, which is built by calling make. The cp command copies the library to the target location expected by Boost Build ($(<)). glob-tree defines source files that should trigger library rebuilds when changed, taking care to exclude files generated by the build process itself.

The alias command defines a Boost Build target to link against the library. The real magic is the <dependency>libsquare.a directive: it is required because the library build may produce header files used by client code. Thus, build-libsquare must run before compiling any dependent C++ files. Adding libsquare.a to an executable's dependency list won't quite enforce this: it only builds libsquare.a in time for the linking step (*.o → executable), but not for the compilation (*.cpp → *.o). In contrast, the <dependency>libsquare.a directive propagates to all dependent build steps, including the compilation, and induces the required dependencies.
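
To see how this plays out, here is a hypothetical executable target (main.cpp stands in for the real client code). Linking against the libsquare alias applies its usage requirements to every build step of the executable:

exe main
    : main.cpp libsquare   # usage requirements propagate to main.cpp's compilation
    ;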

I created an example project on GitHub that demonstrates this. It builds a simple executable relying on a library built via make. The process is fully integrated in the Boost Build framework and triggered by a single call to bjam (or b2).

If you need something along these lines, give this solution a try!

Working with angles (is surprisingly hard)

Due to the periodic nature of angles, especially the discontinuity at 2π / 360°, working with them presents some subtle issues. For example, the angles 5° and 355° are "close" to each other (only 10° apart), but are numerically quite different. In a geographic (GIS) context, this often surfaces when working with geometries spanning Earth's date line at -180°/+180° longitude, which often requires multiple code paths to obtain the desired result.

I've recently run into the problem of computing averages and differences of angles. The difference should grow linearly as the angles move apart: it should be 0 if they are equal, negative if the second angle lies to one side of the first, and positive if it lies on the other side (whether that's left or right depends on whether the angles increase in the clockwise or the counter-clockwise direction). Getting that right, and coming up with test cases to prove it's correct, was quite interesting. As Bishop notes in Pattern Recognition and Machine Learning, it's often simpler to perform operations on angles in a 2D (x, y) space and then back-transform to angles using the atan2() function. I've used that for the averaging function; the difference is calculated using modulo arithmetic.

Here's the Python version of the two functions:

import math


def average_angles(angles):
    """Average (mean) of angles

    Return the average of an input sequence of angles. The result is between
    ``0`` and ``2 * math.pi``.
    If the average is not defined (e.g. ``average_angles([0, math.pi])``),
    a ``ValueError`` is raised.
    """

    x = sum(math.cos(a) for a in angles)
    y = sum(math.sin(a) for a in angles)

    # This check only fires if both sums cancel exactly. With floating-point
    # inputs, mathematically opposite angles may slip through (e.g.
    # math.sin(math.pi) is ~1.2e-16, not 0).
    if x == 0 and y == 0:
        raise ValueError(
            "The angle average of the inputs is undefined: %r" % angles)

    # To get outputs from -pi to +pi, delete everything but math.atan2() here.
    return math.fmod(math.atan2(y, x) + 2 * math.pi, 2 * math.pi)


def subtract_angles(lhs, rhs):
    """Return the signed difference between angles lhs and rhs

    Return ``(lhs - rhs)``; the value will be within ``[-math.pi, math.pi)``.
    Both ``lhs`` and ``rhs`` may either be zero-based (within
    ``[0, 2*math.pi]``), or ``-pi``-based (within ``[-math.pi, math.pi]``).
    """

    return math.fmod((lhs - rhs) + math.pi * 3, 2 * math.pi) - math.pi
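
As a quick sanity check of the wraparound behavior (a usage sketch; the printed values are approximate):

# 5° and 355° are only 10° apart, despite the large numeric difference:
print(math.degrees(subtract_angles(math.radians(5), math.radians(355))))   # ~ 10.0
print(math.degrees(subtract_angles(math.radians(355), math.radians(5))))   # ~ -10.0

# Their average is ~0° (mod 360°), not the arithmetic mean of 180°:
print(math.degrees(average_angles([math.radians(5), math.radians(355)])))  # ~ 0.0 or ~ 360.0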

The code, along with test cases, can also be found in this GitHub Gist. Translating these functions to other languages should be straightforward: sin(), cos(), fmod() and atan2() are pretty ubiquitous.

Resource management with Python

There should be one – and preferably only one – obvious way to do it. (The Zen of Python)

There are multiple ways to manage resources with Python, but only one of them is safe, reliable and Pythonic.

Before we dive in, let's examine what resources can mean in this context. The most obvious examples are open files, but the concept is broader: it includes locked mutexes, started client processes, or a temporary directory change using os.chdir(). The common theme is that all of these require some sort of cleanup that must reliably be executed in the future. The file must be closed, the mutex unlocked, the process terminated, and the current directory must be changed back.

So the core question is: how to ensure that this cleanup really happens?

Failed solutions

Manually calling the cleanup function at the end of a code block is the most obvious solution:

f = open('file.txt', 'w')
do_something(f)
f.close()

The problem with this is that f.close() will never be executed if do_something(f) raises an exception. So we'll need a better solution.

C++ programmers see this and try to apply the C++ solution: RAII, where resources are acquired in an object's constructor and released in the destructor:

class MyFile(object):
    def __init__(self, fname):
        self.f = open(fname, 'w')

    def __del__(self):
        self.f.close()

my_f = MyFile('file.txt')
do_something(my_f.f)
# my_f.__del__() automatically called once my_f goes out of scope

Apart from being verbose and a bit un-Pythonic, it's also not necessarily correct. __del__() is only called once the object's refcount reaches zero, which can be prevented by reference cycles or leaked references. Additionally, until Python 3.4 some __del__() methods were not called during interpreter shutdown.
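
To illustrate the refcount problem (a toy example building on the MyFile class from above):

a = MyFile('a.txt')
b = MyFile('b.txt')
a.partner = b
b.partner = a  # reference cycle: neither refcount can reach zero

del a, b  # no __del__() yet; the objects are kept alive by the cycle.
          # Before Python 3.4, the cyclic garbage collector refused to
          # collect cycles involving objects with __del__() altogether.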

A workable solution

The way to ensure that cleanup code is called in the face of exceptions is the try ... finally construct:

f = open('file.txt', 'w')
try:
    do_something(f)
finally:
    f.close()

In contrast to the previous two solutions, this ensures that the file is closed no matter what (short of an interpreter crash). It's a bit unwieldy, especially when you think about try ... finally statements sprinkled all over a large code base. Fortunately, Python provides a better way.

The correct solution™

The Pythonic solution is to use the with statement:

with open('file.txt', 'w') as f:
    do_something(f)

It is concise and correct even if do_something(f) raises an exception. Nearly all built-in classes that manage resources can be used in this way.
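
Files are not special here. For instance, the locks from the threading module are context managers as well (update_shared_state() is a hypothetical stand-in for real work):

import threading

lock = threading.Lock()

with lock:                   # acquired on entry
    update_shared_state()    # hypothetical critical section
# released on exit, even if an exception was raised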

Under the covers, this functionality is implemented using objects known as context managers, which provide __enter__() and __exit__() methods that are called at the beginning and end of the with block. While it's possible to write such classes manually, an easier way is to use the contextlib.contextmanager decorator.
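
Written out manually, a context manager class equivalent to the decorator-based example below might look like this (acquire_resource() and release_resource() are the same placeholders used below):

class ManagedResource(object):
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        self.r = acquire_resource(self.name)
        return self.r  # bound to the name given after ``as``

    def __exit__(self, exc_type, exc_value, traceback):
        release_resource(self.r)
        return False  # do not suppress exceptions raised in the block

with ManagedResource('file.txt') as r:
    do_something(r)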

from contextlib import contextmanager

@contextmanager
def managed_resource(name):
    r = acquire_resource(name)
    try:
        yield r
    finally:
        release_resource(r)

with managed_resource('file.txt') as r:
    do_something(r)

The contextmanager decorator turns a generator function (a function with a yield statement) into a factory for context managers. This way it is possible to make arbitrary code compatible with the with statement in just a few lines of Python.

Note that try ... finally is used as a building block here. In contrast to the previous solution, it is hidden away in a utility resource manager function, and doesn't clutter the main program flow, which is nice.

If the client code doesn't need to obtain an explicit reference to the resource, things are even simpler:

@contextmanager
def managed_resource(name):
    r = acquire_resource(name)
    try:
        yield
    finally:
        release_resource(r)

with managed_resource('file.txt'):
    do_something()

Sometimes the argument comes up that this makes it harder to use those resources in interactive Python sessions – you can't wrap your whole session in a gigantic with block, after all. The solution is simple: just call __enter__() on the context manager manually to obtain the resource:

cm_r = managed_resource('file.txt')
r = cm_r.__enter__()
# Work with r...
cm_r.__exit__(None, None, None)

The __exit__() method takes three arguments; passing None for all of them is fine here (they are used to pass exception information, where applicable). Another option in interactive sessions is to not call __exit__() at all, if you can live with the consequences.

Wrap Up

Concise, correct, Pythonic. There is no reason to ever manage resources in any other way in Python. If you aren't using the with statement yet, start now!

Remarks on enable_shared_from_this

std::enable_shared_from_this is a template base class that allows derived classes to get a std::shared_ptr to themselves. This can be handy, and it's not something that C++ classes can normally do. Calling std::shared_ptr<T>(this) is not an option as it creates a new shared pointer independent of the existing one, which leads to double destruction.

The caveat is that before calling the shared_from_this() member function, a shared_ptr to the object must already exist; otherwise, the behavior is undefined (since C++17, shared_from_this() throws std::bad_weak_ptr in this case). In other words, the object must already be managed by a shared pointer.
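
A minimal sketch of the caveat (Widget is a made-up example class):

#include <memory>

struct Widget : std::enable_shared_from_this<Widget> {
    std::shared_ptr<Widget> self() { return shared_from_this(); }
};

int main()
{
    auto ok = std::make_shared<Widget>();
    auto p = ok->self();  // fine: *ok is already owned by a shared_ptr

    Widget w;             // not owned by any shared_ptr
    // w.self();          // undefined behavior (throws std::bad_weak_ptr
                          // since C++17)
}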

This presents an interesting issue. When using this technique, there are member functions (those that rely on shared_from_this()) that can only be called if the object is managed via a shared_ptr. This is a rather subtle requirement: the compiler won't enforce it. If violated, the object may even work at runtime until a problematic code path is executed, which may happen rarely – a nice little trap. At the very least, this should be prominently mentioned in the class documentation. But frankly, relying on the documentation to communicate such a subtle issue sounds wrong.

The correct solution is to let the compiler enforce it. Make the constructors private and provide a static factory method that returns a shared_ptr to a new instance. Take care to delete the copy constructor and the assignment operator to prevent anyone from obtaining non-shared-pointer-managed instances this way.
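
A sketch of such a factory (again using a made-up Widget class):

#include <memory>

class Widget : public std::enable_shared_from_this<Widget> {
public:
    static std::shared_ptr<Widget> create()
    {
        // std::make_shared cannot reach the private constructor,
        // so allocate directly
        return std::shared_ptr<Widget>(new Widget);
    }

    Widget(const Widget&) = delete;
    Widget& operator=(const Widget&) = delete;

private:
    Widget() = default;
};

int main()
{
    auto w = Widget::create();        // the only way to obtain a Widget
    auto w2 = w->shared_from_this();  // safe by construction
    // Widget stack_w;                // error: constructor is private
}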

Another point worth mentioning about enable_shared_from_this is that the member functions it provides, shared_from_this() and weak_from_this(), are public: not only the object itself can retrieve its owning shared_ptr, everyone else can, too. Whether this is desirable is essentially an API design question and depends on the context. Note that private inheritance cannot be used to restrict access here: the shared_ptr constructors enable the whole mechanism only if the enable_shared_from_this base class is unambiguous and accessible, which makes public inheritance mandatory. What does work is inheriting publicly and re-declaring the inherited functions as private with using-declarations.
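
A minimal sketch of that approach (Widget as before; weak_from_this() requires C++17):

#include <memory>

class Widget : public std::enable_shared_from_this<Widget> {
    // Re-declared as private: outside callers can no longer name these
    // through Widget (a cast to the base class would still work, though).
    using std::enable_shared_from_this<Widget>::shared_from_this;
    using std::enable_shared_from_this<Widget>::weak_from_this;

public:
    // ... the rest of the class, which may still call shared_from_this()
    // internally ...
};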

Overall, enable_shared_from_this is an interesting tool, if a bit situational. However, it requires care to use safely, in a way that prevents derived classes from being used incorrectly.

published May 21, 2016
tags c++