Tuesday, July 21, 2009

What success would look like

This post will summarize what I consider the criteria for success for projects such as pypy and psyco. I was going to say that this post will be a brief summary, but interesting ideas intruded :-)

You could say that this post distills a series of my questions and answers given by the psyco folks several years ago. As I consider these questions now, I'm surprised by how my intuitions seem to have changed in the interval.

To state the conclusions first: to be considered successful, both psyco and pypy would at least have to meet the performance of CPython, Python's default interpreter.

As the jit article should make clear, there is every reason to believe that this goal can be accomplished. Indeed, jits routinely succeed in this sense. And its not hard to see why. Even though CPython is written in C, it executes the exact same code every time it interprets a particular bytecode. It's ignorant of context, and stays ignorant. This ignorance is costly: it means that it must execute potentially unnecessary tests on the type of of arguments to the bytecode, including arguments on the stack. Worse, there is no way to optimize operations for arguments of known types, as discussed in the runtime-specialization article.

So it seems that out-performing CPython is a fairly reasonable goal to attempt.

However, there same strategies could conceivably produce gains that seem counter-intuitive to old-timers like me. That is, one can imagine situations in which runtime specialization will do better than even C or assembly languages codes that do no take advantage of information available at run time.

I have a hunch there is an important Aha lurking here. Before trying to say what that is, let me recommend to you the following paper:
Transparent Dynamic Optimization:The Design and Implementation of Dynamo

When I first read this paper, I thought that what they had accomplished was clearly impossible, namely to use software to speed up hardware. All my intuitions said that hardware operations were orders of magnitude faster than software operations, and for that reason, trying to intervene with software was doomed to failure. When I read the paper, my first thought was, "how did anyone think to do something this clever?"

But my intuitions were quite wrong. True, software is much slower than hardware, but in some sense this is irrelevant. In some cases, optimizing the instructions to be presented to the CPU is so valuable that it is worth significant extra effort, even if that effort takes place in software.

This is pretty close to the heart of the matter, so let's look at this from several other perspectives. Originally, I had thought of the situation as a matter of accounting. This isn't wrong, but it may be misleading. Considered as an accounting problem, one asks whether the optimization effort (done by software) can be "amortized" over the potential savings. Considered in this way, perhaps the answer is less clear than it might be. But notice, there is ample experimental evidence that the effort can be amortized.

But what if I simply stop fixating on the notion that software is much slower than hardware? Is there another point of view that makes the desired speedups seem not so outlandish? My intuition tells me there is, although it's not completely clear.

Let's try some ideas and see where they lead.

First, let's focus on the improved code, not on how long it takes to improve the code. Clearly, the improved code could be arbitrarily better than the unimproved code. For example, the improved code could be in a loop that is executed a huge number of time.

Second, let's not assume that "dispatching" to improved code or caching the improved code, or stitching together the improved code are necessarily time consuming operations. We might find clever ways of doing these operations very quickly. We might even find ways of eliminating these "overhead" operations entirely.

Lets go back again and focus on the improved code. As a thought experiment, lets ask, "How good could the improved code get?" The beauty of thought experiments is that we can rely on magic as needed. So let's assume some garden-variety magic and assume that all the overhead that formerly we were so worried about has magically just disappeared. Is this such an outrageous thing to assume? I think not. You could imagine it as being a species of loop unwinding combined with type specific specializations.

Furthermore, lots of information can be held implicitly in specialized code. The more I think about this, the more it seems like the master key to optimization. Indeed, all specialized code could be considered to be (or have) implicit information simply by virtue of the fact that the specialized code is a valid code improvement. In other words, if we execute the improved code there are a raft of underlying assumptions about context that must, in fact, be true for the improved code to be valid.

But this implicit information might indeed be sufficient to perform further operations. In particular, it may be sufficient to eliminate explicit overhead operations such as fetching code from a cache, linking to other improved code snippets, etc., etc.

In other words, we might be able to make any knowledge-based "magic" a reality.

I'm not entirely convinced that this is the best demonstration that large speedups are possible, but I think it may be good enough. OTOH, I have one of those feelings that an even better point of view is lurking nearby. For example, one could propose the following hypothesis:

Given enough optimization, we can treat all implicit knowledge as if it were explicitly available at no additional cost.

This still doesn't seem like a perfectly convincing argument, but it will have to do for now...I think you can sense the excitement promised by psyco and (even more) in pypy.

Edward

No comments:

Post a Comment