Exploring GDB's Python API with Jupyter

GDB, the most common console debugger on Linux systems, has a Python API for adding new debugger commands, writing pretty-printers for complex data structures, and automating debugging tasks.
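To give a flavour of what such a script looks like, here is a minimal pretty-printer sketch. The `point` struct and the names `PointPrinter` and `lookup_point` are made up for illustration; the parts that come from GDB are the `to_string()` convention, the `gdb.pretty_printers` list, and the `gdb.Value` / `gdb.Type` accessors.

```python
class PointPrinter:
    """Pretty-printer for a hypothetical C struct `point { int x; int y; }`."""

    def __init__(self, val):
        self.val = val

    def to_string(self):
        # A gdb.Value of struct type supports field access by name.
        return "point(x=%d, y=%d)" % (int(self.val["x"]), int(self.val["y"]))


def lookup_point(val):
    # GDB calls every registered lookup function with the value to print;
    # returning None means "this printer does not handle that type".
    if val.type.strip_typedefs().tag == "point":
        return PointPrinter(val)
    return None


try:
    import gdb  # only importable inside GDB's embedded Python interpreter
    gdb.pretty_printers.append(lookup_point)
except ImportError:
    pass  # outside GDB the classes can still be imported and tested
```

With this loaded, printing a `point` variable in GDB would go through `PointPrinter.to_string()` instead of the default struct dump.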

While Python scripts can be loaded from files, it is nice to explore the API or the debugged program interactively. IPython would be perfect for the job, but starting it directly inside GDB doesn't work well. Fortunately, it's easy to launch an IPython kernel inside GDB and connect to it from an external Jupyter console.

The kernel can be launched from the gdb prompt:

(gdb) python
>import IPython; IPython.embed_kernel()
>end

This prints the following message:

To connect another client to this kernel, use:
    --existing kernel-12688.json

We can start Jupyter in a separate terminal and connect to this kernel:

$ jupyter console --existing kernel-12688.json
In [1]: gdb.newest_frame().name()
Out[1]: 'main'

The GDB Python API is then available from the gdb module within this Python session. To get started, I'd suggest the API documentation or this series of tutorials.
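As a first experiment in the console, here is a small sketch that walks the call stack. The helper name `frame_names` is my own; it relies only on `Frame.name()` and `Frame.older()` from the GDB API.

```python
def frame_names(frame):
    """Collect function names from the given frame out to the outermost one."""
    names = []
    while frame is not None:
        # Frame.name() returns None when GDB cannot determine the function.
        names.append(frame.name() or "??")
        # Frame.older() returns the caller's frame, or None at the outermost frame.
        frame = frame.older()
    return names


# Inside the Jupyter console connected to GDB:
#   frame_names(gdb.newest_frame())
```

Taking the frame as a parameter, rather than calling `gdb.newest_frame()` inside the helper, keeps it usable with `gdb.selected_frame()` as well.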

Currently, only the console client can connect to existing kernels. Support in the Notebook or in JupyterLab is being tracked in this GitHub issue. Even with the limited capabilities of the console client, it's a great way to explore the API and to tackle more complicated debugging problems that require automation to solve.

Debugging riddle of the day

One of our services failed to start on a test system (Ubuntu 12.04 on amd64). The stdout/stderr log streams contained only the string “Permission denied”, which was less than helpful. strace showed that the service tried to create a file under /run, where it has no write permission. This caused it to bail out:

open("/run/some_service", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0644) = -1
    EACCES (Permission denied)

Grepping the source code and configuration files for /run didn't turn up anything that could explain this open() call. Debugging with gdb gave further hints:

Breakpoint 2, 0x00007ffff73e3ea0 in open64 () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff73e3ea0 in open64 () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7bd69bf in shm_open () from /lib/x86_64-linux-gnu/librt.so.1
#2  0x0000000000400948 in daemonize () at service.cpp:93
#3  0x00000000004009ac in main () at main.cpp:24
(gdb) p (char*)$rdi
$1 = 0x7fffffffe550 "/run/some_service"
(gdb) frame 2
#2  0x0000000000400948 in daemonize () at service.cpp:93
93          int fd = shm_open(fname.c_str(), O_RDWR | O_CREAT, 0644);
(gdb) p fname
$2 = {...., _M_p = 0x602028 "/some_service"}}

The open("/run/some_service", ...) was caused by an shm_open("/some_service", ...).

This code works on other machines, so why does it fail on this particular one? Can you figure it out? Bonus points if you can explain why it is trying to access /run and not some other directory. You might find the shm_open() man page and source code helpful.

I'll be waiting for you.

The solution is pretty evident after examining the Linux version of shm_open(). By default, it tries to create shared memory files under /dev/shm. If that doesn't exist, it will pick the first tmpfs mount point from /proc/mounts.

In Ubuntu 12.04, /dev/shm is a symlink to /run/shm. On this machine the symlink was missing, which caused shm_open() to go hunting for a tmpfs filesystem, and /run happened to be the first one in /proc/mounts.
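The lookup can be modelled in a few lines of Python. This is only a sketch of the behaviour as described in this post, not glibc's actual code; `pick_shm_dir`, `shm_path` and the sample mount table are made up for illustration.

```python
def pick_shm_dir(mounts_text, dev_shm_usable):
    """Return the directory a "/dev/shm, else first tmpfs" lookup would choose."""
    if dev_shm_usable:
        return "/dev/shm"
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, filesystem type, options, ...
        if len(fields) >= 3 and fields[2] == "tmpfs":
            return fields[1]
    return None


def shm_path(name, shm_dir):
    # shm_open() names start with a slash; the file ends up below shm_dir.
    return shm_dir + "/" + name.lstrip("/")


# Illustrative mount table (not copied from the affected machine).
SAMPLE_MOUNTS = """\
proc /proc proc rw,noexec,nosuid,nodev 0 0
udev /dev devtmpfs rw,mode=0755 0 0
tmpfs /run tmpfs rw,noexec,nosuid,mode=0755 0 0
none /run/shm tmpfs rw,nosuid,nodev 0 0
"""

# With /dev/shm missing, the first tmpfs entry wins:
print(shm_path("/some_service", pick_shm_dir(SAMPLE_MOUNTS, dev_shm_usable=False)))
# prints /run/some_service
```

Note that `/dev` itself is mounted as `devtmpfs` in this sample, so it is skipped, and `/run` is picked before `/run/shm` simply because it appears earlier in the list.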

Re-creating the symlink solved the problem. Why it was missing in the first place is still unclear. In the aftermath, we're also improving the error messages in this part of the code to make such issues easier to diagnose.