A neat Python debugger command

pdb is a console-mode debugger built into Python. Out of the box, it has basic features like variable inspection, breakpoints, and stack frame walking, but it lacks more advanced capabilities.

Fortunately, it can be customized with a .pdbrc file in the user's home directory. Ned Batchelder has several helpful commands in his .pdbrc file:

  • pl: print local variables
  • pi obj: print the instance variables of obj
  • ps: print the instance variables of self

Printing instance variables is great for quickly inspecting objects, but it shows only one half of the picture. What about the class-side of objects? Properties and methods are crucial for understanding what can actually be done with an object, in contrast to what data it encapsulates.

Since I couldn't find a readily available pdb command for listing class contents, I wrote my own:

# Print contents of an object's class (including bases).
alias pc for k,v in sorted({k:v for cls in reversed(%1.__class__.__mro__) for k,v in cls.__dict__.items() if cls is not object}.items()): print("%s%-20s= %-80.80s" % ("%1.",k,repr(v)))

pc lists the contents of an object's class and its base classes. Typically, these are the properties and methods supported by the object. It is used like this:

# 'proc' is a multiprocessing.Process() instance.
(Pdb) pc proc
...
proc.daemon              = <property object at 0x036B9A20>
proc.exitcode            = <property object at 0x036B99C0>
proc.ident               = <property object at 0x036B9A50>
proc.is_alive            = <function BaseProcess.is_alive at 0x033E4618>
proc.join                = <function BaseProcess.join at 0x033E45D0>
proc.name                = <property object at 0x036B99F0>
proc.pid                 = <property object at 0x036B9A50>
proc.run                 = <function BaseProcess.run at 0x033E4A98>
proc.start               = <function BaseProcess.start at 0x033E4DB0>
proc.terminate           = <function BaseProcess.terminate at 0x033E4DF8>

Note the difference to pi, which lists the contents of the proc instance:

(Pdb) pi proc       # In contrast, here is the image dictionary.
proc._args          = ()
proc._config        = {'authkey': b'\xd0\xc8\xbd\xd6\xcf\x7fo\xab\x19_A6\xf8M\xd4\xef\x88\xa9;\x99c\x9
proc._identity      = (2,)
proc._kwargs        = {}
proc._name          = 'Process-2'
proc._parent_pid    = 1308
proc._popen         = None
proc._target        = None

In general, pc focuses on the interface while pi examines the state of the object. The two complement each other nicely. Especially when working with an unfamiliar codebase, pc is helpful for quickly figuring out how to use a specific class.

pc works with both Python 2 and Python 3 (on Python 2 it only shows new-style classes). Add it to your .pdbrc and give it a try. Let me know what you think!

Python's GIL and atomic reference counting

I love Python because it's an incredibly fun, expressive and productive language. However, it's often criticised for being slow. I think the correct answer to that is two-fold:

  • Use an alternative Python implementation for impressive speed-ups. For example, PyPy is on average 7 times faster than the standard CPython.
  • Not all parts of a program have to be blazingly fast. Use Python for all non-performance-critical areas, such as the UI and database access, and drop to C or C++ only when it's required for for CPU intensive tasks. This is easy to achieve with language binding generators such as CFFI, SWIG or SIP.

A further performance-related issue is Python's Global Interpreter Lock (GIL), which ensures that only one Python thread can run at a single time. This is a bit problematic, because it affects PyPy as well, unless you want to use its experimental software transactional memory support.

Why is this such a big deal? With the rise of multi-core processors, multithreading is becoming more important as well. This not only affects performance on large servers, it impacts desktop programs and is crucial for battery life on mobile phones (race to idle). Further, other programming languages make multi-threaded programming easier and easier. C, C++, and Java have all moved to a common memory model for multithreading. C++ has gained std::atomic, futures, and first-class thread support. C# has async and await, which is proposed for inclusion in C++ as well. This trend will only accelerate in the future.

With this in mind, I decided to investigate the CPython GIL. Previous proposals for its removal have failed, but I thought it's worth a look — especially since I couldn't find any recent attempts.

The results were not encouraging. Changing the reference count used by Python objects from a normal int to an atomic type resulted in a ~23% slowdown on my machine. This is without actually changing any of the locking. This penalty could be moderated for single-threaded programs by only using atomic instructions once a second thread is started. This requires a function call and an if statement to check whether to use atomics or not in the refcount hot path. Doing this still results in an 11% slowdown in the single-threaded case. If hot-patching was used instead of the if, a 6% slowdown remained.

refcount slowdown
Atomic refcounts slow down the overall Python execution speed by >20%.

The last result looks promising, but is deceiving. Hot-patching would rely on having a single function to patch. Alas, the compiler decided to mostly inline the Py_INCREF/Py_DECREF function calls. Disabling inlining of these functions gives a 16% slowdown, which is worse than the "call + if" method. Furthermore, hot-patching is probably not something that could be merged in CPython anyway.

So what's the conclusion? Maybe turning Py_INCREF and Py_DECREF into functions and living with the 11% slowdown of the single-threaded case would be sell-able, if compelling speed-ups of multithreaded workloads could be shown. It should be possible to convert one module at a time from the GIL to fine-grained locking, but performance increases would only be expected once at least a couple of core modules are converted. That would take a substantial amount of work, especially given the high risk that the resulting patches wouldn't be accepted upstream.

Where does this leave Python as a language in the multi-threaded world? I'm not sure. Since PyPy is already the solution to Python's performance issue, perhaps it can solve the concurrency problem as well with its software transactional memory mode.

PS: My profiling showed that the reference counting in Python (Py_INCREF() and Py_DECREF()) takes up to about 5–10% of the execution time of benchmarks (not including actual object destruction), crazy!

Subclassing C++ in Python with SIP

I use SIP in MapsEvolved to generate bindings for interfacing Python with C++. I really like SIP due to its straight-forward syntax that mostly allows just copying class definitions over from C++. Further, it's really well thought out and contains support for a number of advanced use cases.

One such feature is implementing a C++ interface in Python. The resulting class can then even be passed back to C++, and any methods called on it will be forwarded to the Python implementation. Sweet!

Here is an example I originally wrote for this Stack Overflow question. It illustrates how ridiculously easy it is to get this working:

visitor.h:

class EXPORT Node {
public:
    int getN() const;
    ...
};
struct EXPORT NodeVisitor {
    virtual void OnNode(Node *n) = 0;
};
void visit_graph_nodes(NodeVisitor *nv);

visitor.sip:

%Module pyvisit

%ModuleHeaderCode
#include "visitor.h"
%End

class Node {
public:
    int getN() const;
    ...
};

struct NodeVisitor {
    virtual void OnNode(Node* n) = 0;
};

void visit_graph_nodes(NodeVisitor *nv);

Using it from Python:

>>> import pyvisit
>>> class PyNodeVisitor(pyvisit.NodeVisitor):
>>>     def OnNode(self, node):
>>>         print(node.getN())
>>> pnv = PyNodeVisitor()
>>> visit_graph_nodes(pnv)
1
2
3
...

Here, the C++ function visit_graph_nodes() calls the Python method pnv.OnNode() for every node in its (internal) graph. A zip file with the full working source code of this example can be downloaded here.

The subclassing capabilities of SIP don't stop at interfaces, either. It's possible to derive from any C++ class, abstract or not, inheriting (or overriding) existing method implementations as needed. This gives a lot of flexibility and makes it easy to have classes with some parts implemented in C++, and others being in Python.