dh_virtualenv and long package names (FileNotFound error)

Interesting tidbit about Linux:

A maximum line length of 127 characters is allowed for the first line in a #! executable shell script.

from man execve

Should be enough, right?

Wrong.

Well, not if you are using dh_virtualenv with long package names, anyway:

Installing pip...
  Error [Errno 2] No such file or directory while executing command /tmp/lo...me/bin/easy_install /usr/share/python-virtualenv/pip-1.1.tar.gz
...Installing pip...done.
Traceback (most recent call last):
  File "/usr/bin/virtualenv", line 3, in <module>
virtualenv.main()
  File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 938, in main
never_download=options.never_download)
  File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 1054, in create_environment
install_pip(py_executable, search_dirs=search_dirs, never_download=never_download)
  File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 643, in install_pip
filter_stdout=_filter_setup)
  File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 976, in call_subprocess
cwd=cwd, env=env)
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
Traceback (most recent call last):
  File "/usr/bin/dh_virtualenv", line 106, in <module>
sys.exit(main() or 0)
  File "/usr/bin/dh_virtualenv", line 83, in main
deploy.create_virtualenv()
  File "/usr/lib/python2.7/dist-packages/dh_virtualenv/deployment.py", line 112, in create_virtualenv
subprocess.check_call(virtualenv)
  File "/usr/lib/python2.7/subprocess.py", line 511, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['virtualenv', '--system-site-packages', '--setuptools', 'debian/long-package-name/usr/share/python/long-package-name']' returned non-zero exit status 1
make: *** [binary-arch] Error 1
dpkg-buildpackage: error: fakeroot debian/rules binary gave error exit status 2

dh_virtualenv is used to create Debian packages that include Python virtualenvs. It is one of the better ways of packaging Python software, especially if there are Python dependencies that are not available in Debian or Ubuntu. When building a .deb package, it creates a virtualenv in a location such as:

/<build-directory>/debian/<packagename>/usr/share/python/<packagename>

This virtualenv has several tools under its bin/ directory, and they all have the absolute path of the virtualenv's Python interpreter hard-coded in their #! shebang line:

#!/<build-directory>/debian/<packagename>/usr/share/python/<packagename>/bin/python

Given that <build-directory> often contains the package name as well, it's easy to overflow the 128 byte limit of the #! shebang line. In my case, with a ~30 character package name, the path length grew to 160 characters!

Consequently, the kernel couldn't find the Python executable anymore, and running any of the tools from the bin/ directory gave an ENOENT (file not found) error. This is what happened when virtualenv tried to install pip during the initial setup. The root cause of this error is not immediately obvious, to say the least.

To check whether this affects you, check the line length of any script with wc:

head -n 1 /path/to/virtualenv/bin/easy_install | wc -c

If that's larger than 128, it's probably the cause of the problem.

The fix is to change the package name and/or the build location to something shorter. The alternative would be to patch the Linux kernel, which – depending on your preferences – sounds either fun or really unpleasant. Suit yourself!


Behold the power of perf-tools

perf-tools is a collection of scripts for system-wide tracing on Linux. It's really, really cool. It's what the perf command should have included from day one, but didn't.

It is packaged in Debian and Ubuntu, but those versions miss some key features. As perf-tools consists of shell scripts (no compilation necessary), I recommend using the GitHub version directly:

git clone https://github.com/brendangregg/perf-tools.git

Two tools that are included are execsnoop and opensnoop, which trace new program executions and open() calls across the whole system.

$ sudo ./execsnoop
TIME        PID   PPID ARGS
21:12:56  22898  15674 ls --color=auto -la
21:12:56  22899  15674 git rev-parse --is-inside-work-tree
21:12:56  22900  15674 git rev-parse --git-dir
...

$ sudo ./opensnoop
Tracing open()s. Ctrl-C to end.
COMM             PID      FD FILE
opensnoop        22924   0x3 /etc/ld.so.cache
gawk             22924   0x3 /usr/lib/locale/locale-archive
top              15555   0x8 /proc/1/stat
...

Maybe the most interesting tool is uprobe. It's magic: it traces function calls in arbitrary user-space programs. With debugging symbols available, it can trace practically every function in a program. Without them, it can trace exported functions or arbitrary code locations (specified by raw address). It can also trace library code, e.g. libc). Having these possibilities on a production system without any prior setup is staggering.

$ sudo user/uprobe -F -l /tmp/a.out | grep quicksort
_Z9quicksortN9__gnu_cxx17__normal_iteratorIPiSt6vectorIiSaIiEEEES5_
$ sudo user/uprobe -F p:/tmp/a.out:_Z9quicksortN9__gnu_cxx17__normal_iteratorIPiSt6vectorIiSaIiEEEES5_
Tracing uprobe _Z9quicksort[snip] (p:_Z9quicksort[snip] /tmp/a.out:0x8ba). Ctrl-C to end.
   a.out-23171 [000] d... 1860355.891238: _Z9quicksort[snip]: (0x80488ba)
   a.out-23171 [000] d... 1860355.891353: _Z9quicksort[snip]: (0x80488ba)
   ...

(To demangle the C++ function names, use the c++filt tool.)

perf-tools really shows the power of the Linux perf/ftrace infrastructure, and make it usable for the broad masses. There are several other tools that analyze latency and cache hit rates, trace kernel functions, and much more. To finally have such functionality in Linux is fabulous!


Running strace for multiple processes

Just a quick note about strace, the ancient Linux system-call tracing tool.

It can trace multiple processes at once: simply start it with multiple -p arguments (the numbers give the processes' PIDs):

sudo strace -p 2916 -p 2929 -p 2930 -p 2931 -p 2932 -o /tmp/strace.log

This is great for tracing daemons which use separate worker processes, for example Apache with mpm-prefork.


Plotting maps with Folium

Data visualization in Python is a well solved problem by now. Matplotlib and it's prettier cousin Seaborn are widely used to generate static graphs. Bokeh generates HTML files with interactive, JavaScript-based graphs. It's a great way of sharing data with other people who don't have a Python development environment ready. Several other libraries exist for more specialized purposes.

What has been missing for a long time was good map libraries. Plotting capabilities were fine, but basemap support of the existing libraries was very limited. For example, the popular Matplotlib-basemap has great plot types (contour maps, heatmaps, ...) but can't show any high-resolution maps: it only has country/state shapes or whole-world images. Consequently, it's useless for drawing city or street level maps, unless you want to set up your own tile server (you don't).

Along comes Folium, a library that generates interactive maps in HTML format based on Leaflet.js. It supports, among others, OpenStreetMap and MapBox base layers which look great and provide enough details for large-scale maps.

Here is an example that shows some GPS data I cleaned up with a Kalman filter:

def plot(points, center):
    map_osm = folium.Map(location=center, zoom_start=16, max_zoom=23)
    map_osm.line(locations=points)
    map_osm.create_map(path='folium-example.html')

Here's what it looks like. I find it pretty neat, especially given that it took only 3 lines of code to create:

 


Mixing C runtime library versions

When compiling code with Microsoft's Visual C/C++ compiler, a dependency on Microsoft's C runtime library (CRT) is introduced. The CRT can be linked either statically or dynamically, and comes in several versions (different version numbers as well as in debug and release variants).

Complications arise when libraries linked into the same program use different CRTs. This happens if they were compiled with different compiler versions, or with different compiler flags (static/dynamic CRT linkage or release/debug switches). In theory, this could be made to work, but in practice it is asking for trouble:

  • If the CRT versions differ (either version number or debug/release flag), you can't reliably share objects generated by CRT A with any code that uses CRT B. The reason is that the two CRTs may have a different memory layout (structure layout) for the that object. The memory location that CRT A wrote the object size to might be interpreted by CRT B as a pointer, leading to a crash when CRT B tries to access that memory.
  • Even if the same CRT version is included twice (once statically, once dynamically linked), they won't share a heap. Both CRTs track the memory they allocate individually, but they don't know anything about objects allocated by the other CRT. If CRT A tries to free memory allocated by CRT B, that causes heap corruption as the two CRTs trample on each others feet. While you can freely share objects between CRTs, you have to be careful whenever memory is allocated or freed. This can sometimes be managed when writing C code, but is very hard to do correctly in C++ (where e.g. pushing to a vector can cause a reallocation of its internal buffer).

Accordingly, having multiple CRTs in the same process is fragile at best. When mixing CRTs, there are no tools to check whether objects are shared in a way that's problematic, and manual tracking is subtle, easy to get wrong, and unreliable. Mistakes will lead to difficult-to-diagnose bugs and intermittent crashes, generally at the most inconvenient times.

To keep your sanity, ensure that all code going into your program uses the same CRT.1 Consequently, all program code, as well as all libraries, need to be compiled from scratch using the same runtime library options (/MD or /MT). Pre-compiled libraries are a major headache, because they force the use of a specific compiler version to get matching CRT version requirements. If multiple pre-compiled libraries use different CRT versions, there may not be any viable solution at all.

This situation will improve with the runtime library refactoring in VS2015, which promises CRT compatibility to subsequent compiler versions. Thus, this inconvenience should mostly be solved in the future.


1 Dependency Walker can be used to list all dynamically linked CRT versions. I'm not sure whether that can be done for statically linked CRTs, I generally avoid those.


Debian's most annoying warning message: "Setting locale failed"

You ssh into a server or you enter a chroot, and the console overflows with these messages:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "en_US:en",
        LC_ALL = (unset),
        LC_TIME = "en_US.utf8",
        LC_CTYPE = "de_AT.UTF-8",
        LC_COLLATE = "C",
        LC_MESSAGES = "en_US.utf8",
        LANG = "de_AT.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory

Gaaaaarrrrh!

Fortunately the fix is easy (adjust the locale names for your situation):

locale-gen en_US.UTF-8 de_AT.UTF-8

Finally, peace on the console.


Profiling

Profiling is hard. Measuring the right metric and correctly interpreting the obtained data can be difficult even for relatively simple programs.

For performance optimization, I'm a big fan of the poor man's profiler: run the binary to analyze under a debugger, periodically stop the execution, get a backtrace and continue. After doing this a few times, the hotspots will become apparent. This works amazingly well in practice and gives a reliable picture of where time is spent, without the danger of skewed results from instrumentation overhead.

Sometimes it's nice to get a more fine-grained view. That is, not only find the hotspot, but get an overview how much time is spent where. That's where 'real' profilers come in handy.

Under Windows, I like the built-in "Event Tracing for Windows" (ETW), which produces files that can be analyzed with Xperf/Windows Performance Analyzer. It is a really well thought out system, and the Xperf UI is amazing in the analyzing abilities that it offers. Probably the best place to start reading up on this is ETW Central.

Under Linux, I haven't found a profiler I can really recommend, yet. gprof and sprof are both ancient and have severe limitations. OProfile may be nice, but I haven't had a chance to use it yet, as it wasn't available for my Ubuntu LTS release.

I have used Callgrind from the Valgrind toolkit in combination with the KCachegrind GUI analyzer. I typically invoke it like this:

valgrind --tool=callgrind --callgrind-out-file=callgrind-cpu.out ./program-to-profile
kcachegrind callgrind-cpu.out

Callgrind works by instrumenting the binary under test. It slows down program execution, often by a factor of 10. Further, it only measures CPU time, so sleeping times are not included. This makes it unsuitable for programs that wait a significant amount of time for network or disk operations to complete. Despite these drawbacks, it's pretty handy if CPU time is all that you're interested in.

If blocking times are important (as they are for so many modern applications - we generally spend less time computing and more time communicating), gperftools is a decent choice. It includes a CPU profiler that can be run in real-time sampling mode, and the results can viewed in KCachegrind. It is recommended to compile libprofiler.so into the binary to analyze, but using LD_PRELOAD works decently well:

CPUPROFILE_REALTIME=1 CPUPROFILE=prof.out LD_PRELOAD=/usr/lib/libprofiler.so ./program-to-profile
google-pprof --callgrind ./program_to_profile prof.out > callgrind-wallclock.out
kcachegrind callgrind-wallclock.out

If it works, this gives a good overall profile of the application. Unfortunately, it sometimes fails: on amd64, there are sporadic crashes from within libunwind. It's possible to just ignore those and rerun the profile, at least interesting data is obtained 50% of the time.

The more serious problem is that CPUPROFILE_REALTIME=1 causes gperftools to use SIGALARM internally, conflicting with any applications that want to use that signal for themselves. Looking at the profiler source code, it should be possible to work around this limitation with the undocumented CPUPROFILE_PER_THREAD_TIMERS and CPUPROFILE_TIMER_SIGNAL environment variables, but I couldn't get that to work yet.

You'd think that perf has something to offer in this area as well. Indeed, it has a CPU profiling mode (with nice flamegraph visualizations) and a sleeping time profiling mode, but I couldn't find a way to combine the two to get a real-time profile.

Overall, there still seems to be room for a good, reliable real-time sampling profiler under Linux. If I'm missing something, please let me know!


Returning generators from with statements

Recently, an interesting issue came up at work that involved a subtle interaction between context managers and generator functions. Here is some example code demonstrating the problem:

@contextlib.contextmanager
def resource():
    """Context manager for some resource"""

    print("Resource setup")
    yield
    print("Resource teardown")


def _load_values():
    """Load a list of values (requires resource to be held)"""

    for i in range(3):
        print("Generating value %d" % i)
        yield i


def load_values():
    """Load values while holding the required resource"""

    with resource():
        return _load_values()

This is the output when run:

>>> for val in load_values(): pass
Resource setup
Resource teardown
Generating value 0
Generating value 1
Generating value 2

Whoops. The resource is destroyed before the values are actually generated. This is obviously a problem if the generator depends on the existence of the resource.

When you think about it, it's pretty clear what's going on. Calling _load_values() produces a generator object, whose code is only executed when values are requested. load_values() returns that generator, exiting the with statement and leading to the destruction of the resource. When the outer for loop (for val) comes around to iterating over the generator, the resource is long gone.

How do you solve this problem? In Python 3.3 and newer, you can use the yield from syntax to turn load_values() into a generator as well. The execution of load_values() is halted at the yield from point until the child generator is exhausted, at which point it is safe to dispose of the resource:

def load_values():
    """Load values while holding the required resource"""

    with resource():
        yield from _load_values()

In older Python versions, an explicit for loop over the child generator is required:

def load_values():
    """Load values while holding the required resource"""

    with resource():
        for val in _load_values():
            yield val

Still another method would be to turn the result of _load_values() into a list and returning that instead. This incurs higher memory overhead since all values have to be held in memory at the same time, so it's only appropriate for relatively short lists.

To sum up, it's a bad idea to return generators from under with statements. While it's not terribly confusing what's going on, it's a whee bit subtle and not many people think about this until they ran into the issue. Hope this heads-up helps.


A better way for deleting Docker images and containers

In one of my last posts, I described the current (sad) state of managing Docker container and image expiration. Briefly, Docker creates new containers and images for many tasks, but there is no good way to automatically remove them. The best practice seems to be a rather hack-ish bash one-liner.

Since this wasn't particularly satisfying, I decided to do something about it. Here, I present docker-cleanup, a Python application for removing containers and images based on a configurable set of rules.

This is a rules file example:

# Keep currently running containers, delete others if they last finished
# more than a week ago.
KEEP CONTAINER IF Container.State.Running;
DELETE CONTAINER IF Container.State.FinishedAt.before('1 week ago');

# Delete dangling (unnamed and not used by containers) images.
DELETE IMAGE IF Image.Dangling;

Clear, expressive, straight-forward. The rule language can do a whole lot more and provides a readable and intuitive way to define removal policies for images and containers.

Head over to GitHub, give it a try, and let me know what you think!


Using Python slice objects for fun and profit

Just a quick tip about the hardly known slice objects in Python. They are used to implement the slicing syntax for sequence types (lists, strings):

s = "The quick brown fox jumps over the lazy dog"

# s[4:9] is internally converted (and equivalent) to s[slice(4, 9)].
assert s[4:9] == s[slice(4, 9)]

# 'Not present' is encoded as 'None'
assert s[20:] == s[slice(20, None)]

slice object can be used in normal code too, for example for tracking regions in strings: instead of having separate start_idx and end_idx variables (or writing a custom class/namedtuple) simply roll the indices into a slice.

# A column-aligned table:
table = ('REPOSITORY   TAG      IMAGE ID       CREATED       VIRTUAL SIZE',
         '<none>       <none>   0987654321AB   2 hours ago   385.8 MB',
         'chris/web    latest   0123456789AB   2 hours ago   385.8 MB',
        )
header, *entries = table

# Compute the column slices by parsing the header. Gives a list of slices.
slices = find_column_slices(header)

for entry in entries:
    repo, tag, id, created, size = [entry[sl].strip() for sl in slices]
    ...

This is mostly useful when the indices are computed at runtime and applied to more than one string.

More generally, slice objects encapsulate regions of strings/lists/tuples, and are an appropriate tool for simplifying code that operates on start/end indices. They provide a clean abstraction, make the code more straight-forward and save a bit of typing.