The Kelly Criterion: Comparison with Expected Values

This is the final post in a four-part series exploring the Kelly criterion.

In a previous post, we looked at the Kelly formula, which maximizes the long-term growth of wealth in a series of gambles where winnings are continually re-invested. Is this equivalent to maximizing the expected return in each game? It turns out that the answer is "no". In this post we'll look into the reasons for this and discover the pitfalls of expected values.

We will look at the same game as in the previous post:

$$ \frac{V_1}{V_0} = (1+lr_W)^W(1+lr_L)^{(1-W)} $$

with the variables:

  • \(V_0, V_1\): the available money before and after the first round
  • \(l\): fraction of available money to bet in each round (the variable to optimize)
  • \(r_W, r_L\): return on win and loss, 0.4 and -1 in our example (i.e. 40% of wager awarded on win, otherwise 100% of wager lost)
  • \(W\): Random variable describing our chances to win; valued \(1\) with \(p=0.8\), \(0\) with \(p=0.2\)

The Kelly formula obtained from maximizing \(\log V_1/V_0\) tells us to invest 30% of our capital in such a gamble. Let's see what the result is if we maximize the expected value \(E[V_1/V_0]\) instead.

This is trivial by hand, but we'll use SymPy, because we can:

import sympy as sp
import sympy.stats as ss
sp.init_printing()

l = sp.symbols('l')             # Define the symbol/variable l
W = ss.Bernoulli('W', 0.8)      # Random variable, 1 with p=0.8, else 0
def f1(W):                      # Define f1 = V_1/V_0
    return (1 + 0.4*l)**W * (1 - l)**(1-W)
ss.E(f1(W))                     # Calculate the expected value

Evaluating this gives:

$$ E[\frac{V_1}{V_0}] = 1 + 0.12l $$

Uh-hum... so the expected return has no maximum, but grows linearly with increasing \(l\). Essentially, this approach advises you to bet all your money, and more if you can borrow it at negligible interest rates.

Could the problem be that we only look at a single round? Let's examine the expected return after playing 10 rounds:

# Note: We can not do f10 = f1(W)**10, since we need independent samples
W_list = [ss.Bernoulli('W_%d' % i, 0.8) for i in range(10)]
f10 = sp.prod([f1(Wi) for Wi in W_list])
sp.expand(ss.E(f10))
The result is:

$$ E[\frac{V_{10}}{V_0}] = 6 \cdot 10^{-10} l^{10} + 5 \cdot 10^{-8} l^9 + ... + 1.2 l + 1.0 $$

All the coefficients of the polynomial are positive, so there is no maximum for \(l \geq 0\). What's going on?

Time to dig deeper. Let's say we bet all of our money each round. If we lose just once, all of our money is gone. After 10 rounds playing this strategy, the probability of total loss is:

$$ p(\text{at least one loss in 10 games}) = 1 - 0.8^{10} = 0.89 $$

So 89% of the time we would lose all our money. However, the expected return after 10 rounds at \(l=1\) is:

$$ E[\frac{V_{10}}{V_0}]_{l=1} = 3.1 $$

So on average we'd have \$3.10 after 10 rounds for every dollar initially bet, but about 90% of the time we'd lose everything. Strange.
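Both numbers are easy to verify with a couple of lines of plain Python (a quick sanity check; the variable names are mine):

p_win = 0.8

# Probability of at least one loss in 10 rounds -- at l=1, that means total loss
print(1 - p_win**10)   # ~0.893

# Rounds are independent, so E[V_10/V_0] = E[V_1/V_0]**10
print(1.12**10)        # ~3.106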

The big reveal

Things become clearer when we look in more detail at the calculation of \(E[V_{10}/V_0]\):

\begin{align*} E\left[\frac{V_{10}}{V_0}\right]_{l=1} ={}& 1.4^0\, 0^{10} \times \binom{10}{0} 0.8^0\, 0.2^{10} \\ {}+{}& 1.4^1\, 0^9 \times \binom{10}{1} 0.8^1\, 0.2^9 \\ {}+{}& 1.4^2\, 0^8 \times \binom{10}{2} 0.8^2\, 0.2^8 \\ & \;\dots \\ {}+{}& 1.4^9\, 0^1 \times \binom{10}{9} 0.8^9\, 0.2^1 \\ {}+{}& 1.4^{10}\, 0^0 \times \binom{10}{10} 0.8^{10}\, 0.2^0 \tag{*} \end{align*}

The expected value is the sum of probability-weighted outcomes (\(1.4\) and \(0\) are the per-round outcomes for win and loss; the binomial coefficients count the possible orderings of wins and losses). Since a single loss results in the loss of all money, the only non-zero term in the sum is the starred one, which occurs with about 11% probability, at a value gain of \(1.4^{10} \approx 29\). This high gain is enough to drag the expected return up to 3.1. When more than 10 rounds are played, these numbers become more extreme: the winning probability plummets, but the winning payoff skyrockets, dragging the expected return further upwards.
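We can reproduce the sum (*) in a few lines of Python. Since 0**0 == 1 in Python, only the all-wins term survives:

from math import comb

expected = sum(
    1.4**k * 0**(10 - k)                    # outcome after k wins and 10-k losses
    * comb(10, k) * 0.8**k * 0.2**(10 - k)  # probability of exactly k wins
    for k in range(11)
)
print(expected)            # ~3.106, all of it from the k=10 term
print(0.8**10, 1.4**10)    # ~0.107 probability, ~28.9 payoff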

This is a bit reminiscent of the St. Petersburg paradox, in that arbitrarily small probabilities can drag the expected return to completely different (read "unrealistic") values.

Different kinds of playing

The Kelly approach builds on the assumption that you play with all your available wealth as base capital, and tells you what fraction of that amount to invest. It requires reinvestment of your winnings. Obviously, investing everything in one game (\(l=1\)) is insane, since a single loss would bankrupt you. However, following Kelly's strategy is the fastest way to grow total wealth.

The expected-value approach of "invest everything you have" is applicable in a different kind of situation. Let's say you can play only one game per day, have a fixed gambling budget each day, and thus are barred from reinvesting your wins. If you invest your full daily gambling budget, you may win or lose, but over the long run you will average a daily return of \(1.4 \cdot 0.8 = 1.12\) for every dollar invested. The more you can invest per day, the higher your wins, hence the pressure towards large \(l\) values.
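A small simulation illustrates this play style (a sketch, with the daily budget normalized to one dollar):

import random

days = 1_000_000
# Win pays 1.4 per dollar with p=0.8; the bet is lost otherwise
total = sum(1.4 if random.random() < 0.8 else 0.0 for _ in range(days))
print(total / days)   # ~1.12 average daily return per dollar, no reinvestment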

In a way, Kelly optimizes for the highest probability of large returns when re-investing winnings, while the expected value strategy optimizes for large returns, even if the probability is very low.

A dubious game

Should you play a game where the winning probability \(p\) is \(10^{-6}\), but the winning return \(r_W\) is \(2\cdot 10^6\)? Mathematically it seems like a solid bet with a 100% return on investment in the long run. The question is whether you can reach "the long run". Can you afford to play the game a million times? If not, you'll most likely lose money. If you can afford to play a few million times, it becomes a nice investment indeed.

Kelly would tell you to invest only a very small fraction of your total wealth into such a game. The expected-value formalism advises to invest as much as possible, which for most people is bad advice even when playing with a fixed daily budget and no reinvestment (i.e. the expected-value play style).
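For concreteness, plugging these numbers into the Kelly formula from the first post (with \(r_L = -1\)):

p, r_w = 1e-6, 2e6
l_opt = p - (1 - p) / r_w   # Kelly formula for a game with r_L = -1
print(l_opt)                # ~5e-07, i.e. half a millionth of your wealth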

This is an interesting example for two reasons:

  • It demonstrates one of the ways in which the "rich become richer" - the game has high returns, but also a significant barrier to entry.
  • It demonstrates a downside of both the Kelly and the expected-value approach. The two strategies are optimal in their respective use cases in the limit of infinitely many games; for finitely many games, however, they may give bad advice, especially regarding very low-probability winning scenarios.

Conclusion

So much for the relationship between the Kelly strategy and the expected return. For me, it was striking how two seemingly similar approaches ("maximize the moneys") lead to such different results, and how unintuitive the expected value can be in the face of outliers.

If you're interested in a Jupyter Notebook containing several interactive plots that really helped me understand this material, you can find it here.

The Kelly criterion

Over the course of this blog post series, we looked at the classical Kelly criterion in the first post, and how it can be extended to situations such as stock buying, with multiple parallel investment opportunities, in the second post. Next, we investigated the origin of the logarithm in the Kelly formula in the third post, before finishing up with the current discussion about expected values.

Surely, there's more to say about the Kelly criterion. If you want to leave your thoughts, please do so in the comments below!


The Kelly Criterion: Where does the logarithm come from?

This is the third post in a four-part series exploring the Kelly criterion.

The neat thing about the derivations in the last two posts is that they give a motivation for "optimizing the logarithm of wealth". The logarithm is not put in by decree, but is a mathematical technicality that arises from the repeated betting process! Kelly mentions this in his original paper:

At every bet he maximizes the expected value of the logarithm of his capital. The reason has nothing to do with the value function which he attached to his money, but merely with the fact that it is the logarithm which is additive in repeated bets and to which the law of large numbers applies.

This argument is very general. Let's say we model the wealth \(V_n\) after \(n\) rounds of betting based on the initial wealth \(V_0\) in terms of a function \(f(R, l) = V_{n+1} / V_n\) as

$$ V_n = V_0 f(R, l)^n \tag 1 $$

where \(R\) is a random variable describing the possible outcomes of the game, and \(l\) is the fraction of available money to invest in each round. Then we can derive the following formula for the best \(l\) value:

$$ l_{opt} = \operatorname*{argmax}_l \prod_j f(r_j, l)^{p_j n} = \operatorname*{argmax}_l \sum_j p_j \log f(r_j, l) = \operatorname*{argmax}_l E \left[ \log \frac{V_{n+1}}{V_n} \right] \tag 2 $$

where \(r_j\) are the possible investment outcomes and \(p_j\) are the associated probabilities. The crucial change from random variable \(R\) to outcomes and probabilities \(r_j\) and \(p_j\) is justified by the law of large numbers. Based on the exponential nature of the formula, switching to a logarithmic view feels very natural.

Consequently, neither the details of the game — represented by the random variable \(R\) — nor the exact form of the per-round return \(f\) matter. Any iterative scheme with reinvestment of profits should be representable in the form of equation (1), leading to the logarithm in solution (2). Beyond the origin of the logarithm, this analysis also shows the universality of the Kelly derivation.
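As a quick sanity check of equation (2), a short SymPy sketch for the coin-flip game used throughout this series (\(p=0.8\), \(r_W=0.4\), \(r_L=-1\)) recovers the familiar 30% betting fraction:

import sympy as sp

l = sp.symbols('l')
# E[log f(R, l)] for the example game
g = 0.8 * sp.log(1 + 0.4*l) + 0.2 * sp.log(1 - l)
print(sp.solve(sp.diff(g, l), l))   # [0.300000000000000]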

Unfortunately, this argument is mostly skipped in online discussions. Often the logarithm is not justified at all, or it is treated as "genius from the 1950s says: use \(\log\)". Sometimes the result is also linked to utility theory, which posits that having twice the money is not twice as useful. While utility theory may be true, reasonable people can disagree on their utility function — exactly how useful more or less money is to them. However, Kelly's result is not grounded in utility, and the \(\log\) does not represent logarithmic utility of money. Consequently, even people who disagree on their utility function should agree that the Kelly criterion is the fastest way to gain wealth.

I hope this post shed some light on the reasoning behind the Kelly decision scheme. If you're interested in a Jupyter Notebook containing several interactive plots that really helped me understand this material, you can find it here.

In the next post, we'll take a closer look at the relation of the Kelly criterion to expected values.


The Kelly Criterion: Multiple Investment Opportunities

This is the second post in a four-part series exploring the Kelly criterion.

After the last post introduced the Kelly criterion and its application in deciding how much to invest in a single gamble, we'll investigate whether Kelly can help us choose between multiple investment opportunities.

We'll start with a mathematical model:

$$ V_1 = V_0 \left( (1+r_0)(1 - \sum_i l_i) + \sum_i l_i(1 + r_i) \right) $$

Simple, right? :-)

  • \(V_0, V_1\) are the available money before and after the first round, as before.
  • \(r_0\) is the risk-free return rate. This allows us to model opportunity costs (could put the money into a bank account instead of investing, \(r_0 = 0.004\)), or inflation (non-invested money loses value over time \(r_0 = -0.02\)).
  • \(l_i\) is the fraction of the available money invested in stock \(i\).
  • \(r_i\) is a random variable describing returns for stock \(i\).

The first term \((1+r_0) (1 - \sum_i l_i)\) represents all the money not invested in any stock, which earns the risk-free return rate \(r_0\). The second term \(\sum_i l_i(1 + r_i)\) represents the outcomes of the investments in the individual stocks \(i\).

We will re-formulate a bit to simplify the expression:

$$ \frac{V_1}{V_0} = (1+r_0) - \sum_i l_i (1+r_0) + \sum_i l_i(1 + r_i) = (1+r_0) + \sum_i l_i(r_i - r_0) $$

We can further simplify by merging the \(l_i\) and \(r_i\) into the vectors \(\vec l\) and \(\vec r\), with \(\vec 1\) being a vector where all elements are \(1\):

$$ \frac{V_1}{V_0} = (1+r_0) + \vec l \cdot (\vec r - r_0 \vec 1) $$

Let's move from a single game round to \(n\) rounds:

$$ \frac{V_n}{V_0} = \left( (1+r_0) + \vec l \cdot (\vec r - r_0 \vec 1) \right)^n $$

Now we are again in a position to follow Kelly's prescription: invoke the law of large numbers and find the \(\vec l\) that maximizes the gain \(V_n/V_0\).

For large \(n\), we can approximate:

$$ \frac{V_n}{V_0} \approx \prod_j \left( (1+r_0) + \vec l \cdot (\vec r_j - r_0 \vec 1) \right)^{p_j n} $$

In this step we switched from the random variable \(\vec r\) to its outcomes \(\vec r_j\). Each possible outcome \(\vec r_j\) occurs with probability \(p_j\). We justify this switch with the law of large numbers: the outcome \(\vec r_j\) will be observed proportionally to its probability, \(p_j n\) times (for large \(n\)). Consequently, we will have \(p_j n\) factors involving \(\vec r_j\) in the overall product, with \(j\) iterating over all potential outcomes of \(\vec r\).

Note that since \(\vec r_j\) is vector-valued, \(p_j\) is a joint probability distribution.

At this point we are nearly finished. We are looking for the vector \(\vec l_{opt}\) that maximizes \({V_n}/{V_0}\). In analogy to the last post, we arrive at:

$$ \vec l_{opt} = \operatorname*{argmax}_{\vec l} \frac{V_n}{V_0} = \operatorname*{argmax}_{\vec l} \log \frac{V_n}{V_0} = \operatorname*{argmax}_{\vec l} \sum_j p_j \log \left( (1+r_0) + \vec l \cdot (\vec r_j - r_0 \vec 1) \right) $$

This equation is not analytically solvable, but may be approximated as a quadratic programming problem as described in a paper by Vasily Nekrasov.
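To get a concrete feel for the optimization, here is a minimal numerical sketch (with made-up numbers for two stocks, not taken from the paper) that maximizes the sum above directly using scipy:

import numpy as np
from scipy.optimize import minimize

r0 = 0.004                        # risk-free return rate
# Hypothetical joint outcomes r_j for two stocks, with probabilities p_j
outcomes = np.array([[0.3, 0.1], [0.3, -0.2], [-0.4, 0.1], [-0.4, -0.2]])
probs = np.array([0.4, 0.2, 0.2, 0.2])

def neg_log_growth(l):
    per_round = (1 + r0) + outcomes @ l - r0 * l.sum()
    return -np.sum(probs * np.log(per_round))

res = minimize(neg_log_growth, x0=np.array([0.1, 0.1]), bounds=[(0, 1), (0, 1)])
print(res.x)   # optimal fractions l_i to invest in each stock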

Conclusion

It should be obvious that the Kelly criterion is applicable in a wide range of scenarios, from gambling to investment decisions to the question of whether to buy insurance. If you're interested in a Jupyter Notebook containing several interactive plots that really helped me understand this material, you can find it here.

In the next post we'll discuss the origins of the logarithm in the Kelly formula.


The Kelly Criterion: Introduction

This is the first post in a four-part series exploring the Kelly criterion.

The Kelly criterion is a formula used to determine the optimal size of a series of bets in order to maximize wealth. It is often described as optimizing the logarithm of wealth, and will do better than any other strategy in the long run.

Let's look at a simple gamble: when you play, you win with probability \(p\) and lose with probability \(q = 1-p\). If you win, you get back your initial bet plus a fraction \(r\) of that bet. If you lose, your bet is forfeited. The Kelly formula calculating the optimal fraction of your available wealth to bet is:

$$ l_{opt} = \frac{pr - q}{r} = \frac{pr - (1-p)}{r} = p - \frac{1-p}{r} $$

Here is a small demo to explore the formula:

Kelly calculation demo

[Interactive demo: sliders for the winning probability \(p\) and the profit on win \(r\), showing the optimal fraction of wealth to bet in this game.]
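In case the interactive demo doesn't render where you're reading this, a minimal Python version of the computation behind it (mirroring the demo's computeKelly function) could look like:

def compute_kelly(p, r):
    """Optimal fraction of wealth to bet, for win probability p and profit fraction r."""
    return p - (1 - p) / r

print(compute_kelly(0.8, 0.4))   # ~0.3: bet 30% of your wealth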

The main takeaways are:

  • In each game you invest a fraction \(l\) of the total amount of money you have. If you play the game from the demo (\(p=0.8, r=0.4\)) and you have $100 available, you should invest $30 (since \(l=0.3\)). If you win, you will have $112 (\(100 + 30 \times 0.4\)) and invest $33.60 (\(112 \times 0.3\)) in the second round. If you lost the first round, you'd have $70 and invest $21 in the second round. If you cannot reinvest your earnings, Kelly does not apply.
  • This formula holds when a large number of games are played. If you play only a few rounds of a game, it may not apply.
  • When it applies, the Kelly strategy will generate the most wealth out of your initial starting capital.

Derivation of the Kelly criterion

We will take a short dive into mathematics to understand how Kelly came up with this formula. Let's compute the wealth \(V_1\) after the first round of gaming, when starting from an initial wealth \(V_0\):

$$ V_{1} = V_{0} (1 + Wlr_W + (1-W)lr_L) $$

Here, \(l\) is the fraction of money bet, \(r_W\) is the profit on win, \(r_L\) is the fraction of the bet forfeited on loss (in the above scenario, \(r_L=-1\)). \(W\) is a random variable that's \(1\) with probability \(p\) and \(0\) with \(1-p\). So with probability \(p\) we'll have \(W=1\), giving \(V_{1,win} = V_0 (1+lr_W)\), with probability \(1-p\) we'll get \(V_{1,loss} = V_0 (1+lr_L)\).

We can reformulate the expression in a way that will show an important generalization:

$$ V_1 = V_0 (1+lr_W)^W(1+lr_L)^{(1-W)} $$

The two expressions look very different, but they are equivalent since \(W\) can only take on the values \(0\) and \(1\). If \(W=1\) the second term will vanish since \(x^0 = 1\), and we will get the same result for \(V_{1,win}\). The same consideration holds for \(V_{1,loss}\).

If we repeat this game \(n\) times, we arrive at

$$ V_n = V_0 (1+lr_W)^{W_n}(1+lr_L)^{(n-W_n)} $$

with \(W_n\) being the number of wins in those \(n\) games. Essentially, each won round adds a factor \((1+lr_W)\) to the expression, each lost round \((1+lr_L)\).

Now comes a crucial step: how many wins do we expect after playing many games? Asked mathematically, what value will \(W_n\) take for large \(n\)? Because of the law of large numbers, we expect \(W_n = pn\). So for \(p=0.8\) and \(n=10000\) we would expect to win 8000 out of the 10000 games.

Let's plug this into our formula:

$$ V_n = V_0 (1+lr_W)^{pn}(1+lr_L)^{(1-p)n} $$

The law of large numbers allowed us to eliminate the random variables \(W\), \(W_n\) and instead obtain an explicit formula in terms of \(p, n, r_W, r_L\). Since we are interested in maximizing our profit, we are looking for \(l_{opt}\) — the \(l\) value that maximizes \({V_n}/{V_0}\).

$$ l_{opt} = \operatorname*{argmax}_l \frac{V_n}{V_0} = \operatorname*{argmax}_l \log \frac{V_n}{V_0} = \operatorname*{argmax}_l \left[ p \log(1 + l r_W) + (1-p) \log(1 + l r_L) \right] $$

We are free to introduce the logarithm and drop the \(n\) factor because this does not change the value of \(l_{opt}\).1 Differentiating the logarithmic expression and setting the derivative to zero gives:
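$$ \frac{p\, r_W}{1 + l r_W} + \frac{(1-p)\, r_L}{1 + l r_L} = 0 $$

Multiplying by both denominators and solving for \(l\) yields: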

$$ l_{opt} = -\frac{p}{r_L} - \frac{1-p}{r_W} $$

This is identical to the formula given in the introduction after substituting \(r_L = -1\).

Extension to games with more than two possible outcomes

We can extend the Kelly argument to games that have a greater number of possible outcomes. Such games could be:

  • A game with 6 different rewards selected by roll of a 6-sided die, where rewards may be negative (money lost)
  • An investment in a stock, where we have a continuous range of outcomes.

If we have multiple outcomes \(r_i\), each happening with probability \(p_i\), we obtain:

$$ l_{opt} = \operatorname*{argmax}_l \sum_i p_i \log(1 + l r_i) $$

While this equation is not easily solvable analytically, it is trivial to solve numerically. This allows us to extend the Kelly approach to a new kind of game with some interesting practical applications.
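For instance, a minimal numerical sketch for the die game (with made-up rewards) using scipy:

import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical die game: six equally likely returns per unit bet
r = np.array([-1.0, -0.5, 0.0, 0.2, 0.5, 1.5])
p = np.full(6, 1 / 6)

res = minimize_scalar(lambda l: -np.sum(p * np.log(1 + l * r)),
                      bounds=(0, 0.99), method='bounded')
print(res.x)   # optimal fraction of wealth to bet each round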

If you're interested in a Jupyter Notebook containing several interactive plots that really helped me understand the material outlined in this post, you can find it here.

The Kelly formalism can be generalized further still, taking into account multiple parallel investment opportunities. We'll look at this in the next post.


1 The logarithm is a strictly monotonically increasing function, so \(\alpha \gt \beta \Leftrightarrow \log \alpha \gt \log \beta\). Thus it doesn't change the location of any maxima: \(\operatorname*{argmax} f(x) = \operatorname*{argmax} \log f(x)\). See here for more information about properties of \(\operatorname*{argmax}\).


Exploring GDB's Python API with Jupyter

GDB — the most common console debugger on Linux systems — has a Python API for adding new debugger commands, writing pretty-printers for complex data structures, and automating debugging tasks.

While Python scripts can be loaded from files, it is nice to interactively explore the API or the debugged program. IPython would be perfect for the job, but starting it directly inside GDB doesn't work well. Fortunately, it's easy to launch an IPython kernel and connect with an external Jupyter console.

Launching the kernel from the gdb prompt:

(gdb) python
>import IPython; IPython.embed_kernel()
>end

This gives the following message:

To connect another client to this kernel, use:
    --existing kernel-12688.json

We can start Jupyter on a separate terminal and connect to this kernel:

$ jupyter console --existing kernel-12688.json
In [1]: gdb.newest_frame().name()
Out[1]: 'main'

The GDB Python API is then available from the gdb module within this Python session. To get started, I'd suggest the API documentation or this series of tutorials.
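As a tiny taste of the API, here is a sketch of a custom debugger command (the name bt-depth is made up for this example):

import gdb

class BacktraceDepth(gdb.Command):
    """Print the depth of the current call stack."""

    def __init__(self):
        super().__init__("bt-depth", gdb.COMMAND_USER)

    def invoke(self, arg, from_tty):
        depth, frame = 0, gdb.newest_frame()
        while frame is not None:
            depth, frame = depth + 1, frame.older()
        print("Stack depth:", depth)

BacktraceDepth()   # instantiating registers the command with GDB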

Currently, only the console client can connect to existing kernels. Support in the Notebook or in JupyterLab is tackled in this GitHub issue. Even with the limited capabilities of the console client, it's a great way to explore the API and to tackle more complicated debugging problems that require automation to solve.


PyDays 2017

PyDays 2017 was Austria's first conference dedicated to the Python programming language. It took place on May 5 and May 6 and was graciously hosted by the Linuxwochen Wien at FH Technikum in Vienna. It was great on many levels: meeting new people excited about Python and where it's headed, over 20 talks, interesting hallway conversations, etc.

I helped out with the organization of the conference, mostly taking care of catering. We (the organization team) are very happy with how everything went. The atmosphere was welcoming and open, the quality of the talks and workshops was very good, attendance was great, and people seemed to have a really good time. From an organizational perspective, the talks mostly stayed on time, the audio/video hardware worked fine, and the PyDays booth was well-received. Giving out snacks, coffee, soda, and cake during breaks worked out nicely. The feedback, both at and after the conference, was very positive.

In addition to co-organizing the conference, I also gave a talk about Dask, a Python library for parallel computing. It provides data structures similar to NumPy arrays or Pandas DataFrames that process operations in parallel and scale to out-of-memory datasets. Dask is exciting since it provides the analytical power and familiar mental model of DataFrames, but handles hundreds of gigabytes of data and allows seamless scaling from a single laptop to a computing cluster. The talk was well-received, and there was a lot of interest at the end and in the hallway afterwards. The slides are online, if you're interested as well.

Finally, I'd like to thank my co-organizers, Claus, Kay, Helmut and Sebastian, as well as the people who helped out during the conference, for all their great work to make the PyDays a success. I'd also like to thank our sponsors: the Python Software Foundation, the Python Software Verband e.V., T-Mobile, UBIMET, and Jetbrains.

Overall it was great to see so much community interest in Python. If you're passionate about Python as well, join us on the Python Austria Slack and our meetups!


GCC compatibility: inline namespaces and ABI tags

Keeping libraries binary-compatible with old versions is hard. Recently, GCC was in the unenviable situation of having to switch its std::string implementation.1 GCC used inline namespaces and ABI tags to minimize the extent of breakage and to ensure that old and new versions could only be combined in a safe way. Here, we'll have a look at those mitigation techniques: what they are and how they work.

First, some background: GCC used to have a copy-on-write implementation for std::string. However, C++11 does not allow this anymore because of new iterator and reference invalidation rules. So, GCC 5.1 introduced a new implementation of std::string. The new version was not binary-compatible with the old one: exchanging strings between old (pre-5.1) and new code would crash. We say the application binary interface (ABI) changed.

To understand the consequences of this, let's look at several scenarios:

  • A program uses only code compiled with GCC < 5.1: only old strings, works
  • A program uses only code compiled with GCC >= 5.1: only new strings, works
  • A program mixes code compiled with different GCC versions: both types of strings exist in the program, it could crash.

Let's dig into the last bullet point. When would it crash? Whenever "new" code accesses an "old" string or vice versa. Here are some examples where f() is compiled with an older GCC version and called from "new" code:

void f(int i);                 // (a) safe, no std::string involved
void f(const std::string& s);  // (b) will crash when f accesses s
std::string f();               // (c) will crash when the returned string is used

Given how common std::string is, such crashes would happen frequently if "old" and "new" code were combined. Unfortunately, it's surprisingly easy to end up with a program with some pre-5.1 parts and some newer ones. It's sufficient to link a "new" executable to an "old" library. Given the bad consequences, the GCC developers needed to solve this.

The solution is to change the symbol names of the GCC 5.1 std::string and all functions using it. Because the linker uses symbol names to resolve function calls into libraries, this would cause link-time errors for cases (b) and (c) while (a) would still work. Exactly the intended behavior.

How could this be done? The mechanism that converts C++ names into symbol names is called name mangling. The generated symbol names contain information about namespaces, function names, argument types, etc. Putting the new std::string into a separate namespace would change the symbol names of all functions accepting std::string as argument. But that's crazy—std::string needs to be in namespace std, right?

The solution for this is inline namespaces. All classes, functions, and templates declared in an inline namespace are automatically imported into the parent namespace. Their mangled name still references the original location, though.

Here's what GCC 5.1 does:

namespace std {
    inline namespace __cxx11 {
        template<typename _CharT, ...> class basic_string;
        typedef basic_string<char>      string;
    }
}

Looking at f(const std::string& s), what would the symbol names be when compiled with older and newer GCC versions?

  • Older GCC: symbol name _Z1fRKSs, which decodes2 to f(std::basic_string<char, ...> const&)
  • GCC 5.1: symbol name _Z1fRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE, which decodes2 to f(std::__cxx11::basic_string<char, ...> const&)

Indeed, the symbol names are different and the linker would give an error if GCC 5.1 code called f(const std::string &s) from a library compiled with an older GCC version. This solves the compatibility problem for all functions taking std::string as argument.

One problem remains: case (c) from above, std::string f(). The return value type of a function is not part of its mangled name.3 Thus, the symbol name wouldn't change for functions that return std::string, which could still lead to runtime crashes.

GCC 5.1 solves this with the use of ABI tags. From the documentation:

The abi_tag attribute can be applied to a function, variable, or class declaration. It modifies the mangled name of the entity to incorporate the tag name, in order to distinguish the function or class from an earlier version with a different ABI

[...]

When a type involving an ABI tag is used as the type of a variable or return type of a function where that tag is not already present in the signature of the function, the tag is automatically applied to the variable or function.

GCC applies the attribute __abi_tag__("cxx11") to the std::__cxx11 namespace. This affects all classes therein, including the new version of std::string. The ABI tag propagates to all functions that return a string and changes their symbol name.

Let's look at the symbol names for std::string f() for different compiler versions:

  • Older GCC: symbol name _Z1fv, which decodes2 to f()
  • GCC 5.1: symbol name _Z1fB5cxx11v, which decodes2 to f[abi:cxx11]()

Again, the symbol name is different, and the "cxx11" ABI tag is applied to std::string f() with GCC 5.1. This completes the second part of GCC's solution for the std::string ABI change.

These two methods to induce symbol name changes for anything using std::string make the migration path to GCC 5.1 much easier. If the program links correctly it should work, and be safe from difficult-to-debug runtime errors. There would be more to discuss about the GCC 5.1 changes, such as how libstdc++.so still exports the old std::string implementation for compatibility. But this post has gone on too long already, so let's leave it at that :-)


1 GCC 5.1 also introduced a new version of std::list. It is handled analogously to std::string.
2 Decoded using c++filt from the binutils package.
3 The return value type doesn't need to be part of the symbol name because it doesn't get used for overload resolution.



Building External Libraries with Boost Build

I recently ran into the problem of having to use an external library not available for our distribution. Of course, it would be possible to build Debian or RPM packages, deploy them, and then link against those. However, building sanely-behaving library packages is tricky.

The easier way is to include the source distribution in a sub-folder of the project and integrate it into our build system, Boost Build. It seemed prohibitive to create Boost Build rules to build the library from scratch (from *.cpp to *.a/*.so) because the library had fairly complex build requirements. The way to go was to leverage the library's own build system.

Easier said than done. I only arrived at a fully integrated solution after seeking help on the Boost Build Mailing List (thanks Steven, Vladimir). This is the core of it:

path-constant LIBSQUARE_DIR : . ;

make libsquare.a
    : [ glob-tree *.c *.cpp *.h : generated-file.c gen-*.cpp ]
    : @build-libsquare
    ;
actions build-libsquare
{
    ( cd $(LIBSQUARE_DIR) && make )
    cp $(LIBSQUARE_DIR)/libsquare.a $(<)
}

alias libsquare
    : libsquare.a                                        # sources
    :                                                    # requirements
    :                                                    # default-build
    : <include>$(LIBSQUARE_DIR) <dependency>libsquare.a  # usage-requirements
    ;

The first line defines a filesystem constant for the library directory, so the build process becomes independent of the working directory.

The next few lines tell Boost Build about a library called libsquare.a, which is built by calling make. The cp command copies the library to the target location expected by Boost Build ($(<)). glob-tree defines source files that should trigger library rebuilds when changed, taking care to exclude files generated by the build process itself.

The alias command defines a Boost Build target to link against the library. The real magic is the <dependency>libsquare.a directive: it is required because the library build may produce header files used by client code. Thus, build-libsquare must run before compiling any dependent C++ files. Adding libsquare.a to an executable's dependency list won't quite enforce this: it only builds libsquare.a in time for the linking step (*.o → executable), but not for the compilation (*.cpp → *.o). In contrast, the <dependency>libsquare.a directive propagates to all dependent build steps, including the compilation, and induces the required dependencies.

I created an example project on GitHub that demonstrates this. It builds a simple executable relying on a library built via make. The process is fully integrated in the Boost Build framework and triggered by a single call to bjam (or bb2).

If you need something along these lines, give this solution a try!


Working with angles (is surprisingly hard)

Due to the periodic nature of angles, especially the discontinuity at 2π / 360°, working with them presents some subtle issues. For example, the angles 5° and 355° are "close" to each other, but are numerically quite different. In a geographic (GIS) context, this often surfaces when working with geometries spanning Earth's date-line at -180°/+180° longitude, which often requires multiple code paths to obtain the desired result.

I've recently run into the problem of computing averages and differences of angles. The difference should increase linearly with the separation between the angles: it should be 0 if they are equal, negative if the second angle lies to one side of the first, and positive if it lies on the other side (whether that's left or right depends on whether the angles increase in the clockwise or the counter-clockwise direction). Getting that right, and coming up with test cases to prove that it's correct, was quite interesting. As Bishop notes in Pattern Recognition and Machine Learning, it's often simpler to perform operations on angles in a 2D (x, y) space and then transform back to angles using the atan2() function. I've used that for the averaging function; the difference is calculated using modulo arithmetic.

Here's the Python version of the two functions:

def average_angles(angles):
    """Average (mean) of angles

    Return the average of an input sequence of angles. The result is between
    ``0`` and ``2 * math.pi``.
    If the average is not defined (e.g. ``average_angles([0, math.pi])``),
    a ``ValueError`` is raised.
    """

    x = sum(math.cos(a) for a in angles)
    y = sum(math.sin(a) for a in angles)

    if x == 0 and y == 0:
        raise ValueError(
            "The angle average of the inputs is undefined: %r" % angles)

    # To get outputs from -pi to +pi, delete everything but math.atan2() here.
    return math.fmod(math.atan2(y, x) + 2 * math.pi, 2 * math.pi)


def subtract_angles(lhs, rhs):
    """Return the signed difference between angles lhs and rhs

    Return ``(lhs - rhs)``; the value will be within ``[-math.pi, math.pi)``.
    Both ``lhs`` and ``rhs`` may either be zero-based (within
    ``[0, 2*math.pi]``), or ``-pi``-based (within ``[-math.pi, math.pi]``).
    """

    return math.fmod((lhs - rhs) + math.pi * 3, 2 * math.pi) - math.pi
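
A couple of example calls (converted to degrees for readability):

import math

print(math.degrees(average_angles([0.0, math.pi / 2])))
# 45.0

print(math.degrees(subtract_angles(math.radians(5), math.radians(355))))
# ~10.0: 5° lies 10° "ahead" of 355°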

The code, along with test cases, can also be found in this GitHub Gist. Translating these functions to other languages should be straightforward: sin()/cos()/fmod()/atan2() are pretty ubiquitous.