Friday, December 18, 2009

Converting Leo to Python 3K: surprises

This post will conclude the discussion of how I converted Leo to Python 3k. In fact, the work was mostly complete several weeks ago. Vacation got in the way of writing up these notes.

The biggest surprise was that almost all the surprises were good surprises. Aside from the complications previously discussed, porting Leo was a matter of running Leo and fixing the obvious problems reported by Python itself.

To recap, it took just a few days to get Leo to the point at which Leo could run its own unit tests with Python 3k. After that, it took just a few more hours, spread over several days, to get all the unit tests to pass.

Yesterday I completed the one and only "tricky" part of the port. This was a library problem. I won't bore you with the details: the problem arose from a misreading of the Python 3k docs. Today, as I was writing this post, I discovered my mistake and was able to clean up an ugly workaround.

So that's it. Porting a major app like Leo from Python 2.x to Python 3.x was almost completely straightforward.

Edward

Sunday, December 6, 2009

Converting Leo to Python 3k: tactics

In this posting I'll discuss how to create code that will compile on both Python 2.x and Python 3.x. This is mostly straightforward, but there are some interesting tactical details.

1. The g.isPython3 constant

Leo uses the isPython3 constant to distinguish between old (Python 2.x) and new (Python 3.x) code:

import sys
isPython3 = sys.version_info >= (3,0,0)

This constant is defined in only one place, in Leo's leoGlobals module, known as 'g' throughout Leo's code. All of Leo's modules import this module before any other imports as follows:

import leo.core.leoGlobals as g

Leo distinguishes old and new code this way:

if g.isPython3:
    << new code >>
else:
    << old code >>

Naturally, we want to keep this kind of code to a minimum by refactoring common instances of this pattern into new functions. In Leo, the obvious place for these new functions is in g, the leoGlobals module.
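Here is a minimal sketch of what such a refactoring looks like, assuming a g-style module. The isPython3 line mirrors the definition above; the helper name isUnicode is hypothetical, chosen only to illustrate how a version test can be hidden inside one function so that callers never test isPython3 themselves.

```python
import sys

# Computed once, in the spirit of Leo's g.isPython3 constant.
isPython3 = sys.version_info >= (3, 0, 0)


def isUnicode(s):
    """Hypothetical g-style helper: True if s is a text (unicode) string.

    Callers use this instead of repeating the isPython3 test themselves.
    """
    if isPython3:
        return isinstance(s, str)
    else:
        # 'unicode' exists only on Python 2.x; this branch never runs on 3.x.
        return isinstance(s, unicode)  # noqa: F821
```

Both branches compile on 2.x and 3.x, because an undefined name like unicode is only an error if the line actually executes.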

2. Print statements and g.pr

Print statements with a trailing comma pose a special problem. Such statements are syntactically invalid in Python 3.x, so there is *no way* to use the pattern above. Late last year I converted almost all of Leo's print statements to the g.pr function. This conversion was the single biggest change to Leo's code base for Python 3.x.

Theoretically, I only needed to eliminate print statements with trailing commas, but even "normal" print statements pose problems. In Python 3.x, 'print(a,b)' produces the same output as 'print a,b' produces in Python 2.x. But in Python 2.x, 'print(a,b)' prints the *tuple* (a,b).
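The difference is easy to see. This sketch is meant to be run with Python 3.x; the values of a and b are made up for illustration. It captures what print(a, b) writes, compares it with a line built by string formatting (which gives the same text on every interpreter), and shows the tuple text that Python 2.x would print for the same source.

```python
import io
import sys

a, b = 'can not open', 'leo.py'

# Building the line yourself gives the same text on every interpreter.
line = '%s %s' % (a, b)

# Capture what print(a, b) writes. On 3.x it is the two values separated by
# a space; on 2.x the same source prints the tuple ('can not open', 'leo.py').
captured = io.StringIO()
saved_stdout = sys.stdout
sys.stdout = captured
try:
    print(a, b)
finally:
    sys.stdout = saved_stdout

sys.stdout.write(captured.getvalue())  # can not open leo.py
sys.stdout.write(str((a, b)) + '\n')   # ('can not open', 'leo.py'), the 2.x output
```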

Yes, there is a workaround: rewrite 'print a,b' as 'print("%s %s" % (a,b))'. But I thought this was too ugly. Besides, defining g.pr gave me the opportunity to use a clever convention that supports automatic translation of some, but not all, arguments. The convention is just this: g.pr translates only the *odd* arguments (the first, third, fifth, and so on). For example:

g.pr(a,b,c,d,e)

prints the translation of a, c and e, but prints b and d as they are, untranslated. To suppress the translation of the first argument, use an empty string as the first argument:

g.pr('',a)

IMO, this convention is much better than the typical convention of using _(a) to denote the translation of a. Besides being cleaner visually, it is often exactly what is desired anyway. For example:

g.pr('can not open', fileName)

does exactly the right thing: we want to translate 'can not open' but we should not translate fileName!

In short, removing print statements from Leo makes a virtue out of necessity. A few print statements do remain in Leo, but they are just for debugging and are seldom used. It doesn't matter whether those statements produce the same results in Python 3.x as they do in Python 2.x.

One more detail. There is no way to use 'print' in the g.pr function so that it compiles on both Python 2.x and Python 3.x. Instead, the g.pr function uses sys.stdout.write. Alas, sys.stdout.write is not precisely equivalent to print in all situations, but this cannot be helped.
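A minimal sketch of a g.pr-style function follows. It is not Leo's actual code: the translate hook and the CATALOG dictionary are hypothetical stand-ins for a real message catalog, and joining arguments with single spaces is an assumption. It does show the two ideas above: only the odd arguments are translated (an empty first argument acts as a placeholder), and output goes through sys.stdout.write, so the function compiles on both 2.x and 3.x.

```python
import sys

# Hypothetical message catalog; Leo's real translation machinery is not shown.
CATALOG = {'can not open': "impossible d'ouvrir"}


def translate(s):
    """Hypothetical translation hook: look s up, falling back to s itself."""
    return CATALOG.get(s, s)


def pr(*args):
    """Sketch of a g.pr-style print function that compiles on 2.x and 3.x.

    The 1st, 3rd, 5th... arguments are translated; the others are printed
    as-is. An empty first argument is only a placeholder and is dropped.
    """
    parts = []
    for i, arg in enumerate(args):
        if i == 0 and arg == '':
            continue  # pr('', a): the placeholder keeps a untranslated
        # enumerate counts from 0, so even indices are the odd arguments.
        if i % 2 == 0:
            parts.append(translate(str(arg)))
        else:
            parts.append(str(arg))
    # No print statement or function: sys.stdout.write works on both versions.
    sys.stdout.write(' '.join(parts) + '\n')
```

With this catalog, pr('can not open', 'can not open') writes "impossible d'ouvrir can not open", while pr('', 'can not open') writes the string untranslated.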

3. Unicode constants

Unicode constants of the form u'whatever' are invalid in Python 3.x. The workaround is code such as the following:

if isPython3:
    f = str
else:
    f = unicode
aConstant = f('whatever')

This kind of code is common enough to have been refactored:

aConstant = g.toUnicode('whatever')
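For illustration, here is one way such a helper could be written. This is a sketch, not Leo's implementation: the signature toUnicode(s, encoding='utf-8') and the choice to replace undecodable bytes rather than raise are assumptions. On 2.x a plain 'whatever' literal is a byte string, so it gets decoded; on 3.x it is already text and is returned unchanged.

```python
import sys

isPython3 = sys.version_info >= (3, 0, 0)

if isPython3:
    text_type = str
else:
    text_type = unicode  # noqa: F821  (exists only on Python 2.x)


def toUnicode(s, encoding='utf-8'):
    """Sketch of a g.toUnicode-style helper (signature assumed).

    Text is returned unchanged; byte strings are decoded with `encoding`,
    replacing undecodable bytes with U+FFFD instead of raising.
    """
    if isinstance(s, text_type):
        return s
    if isinstance(s, bytes):  # 'bytes' is an alias of str on 2.6+
        return s.decode(encoding, 'replace')
    return text_type(s)
```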

I'll be saying more about unicode issues in later posts.

Friday, December 4, 2009

Converting Leo to Python 3k: strategy

In the first post of this series, I stated the goal of this project: to create a common code base that will run Leo on both Python 2.x and Python 3.x. I am confident this goal will be accomplished because the following 3-step strategy guarantees success:

Step 1: Create a common code base that compiles on both Python 2.6 and Python 3.1. I completed this step yesterday, although one or two minor issues remain. Leo continues to run with Python 2.6 as it always has, and all of Leo's existing unit tests pass with the new code base when run with Python 2.6.

Step 2: Run the new code using Python 3.1, fixing bugs as they are found, until Leo executes its startup code without crashing. I completed this step yesterday as well.

Step 3: Run all of Leo's unit tests using Python 3.1. When this step is complete, there will be little or nothing left to do: Leo's unit tests cover almost all the essential features of Leo, except for some gui-related issues that already appear to work properly.
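Step 1 can be checked mechanically: a file "compiles" if Python's built-in compile() accepts it, which catches syntax such as print statements with trailing commas without running any code. The sketch below uses hypothetical helper names (compiles and check_files are not part of Leo). It reports every file that fails to compile under whichever interpreter runs it, so calling it once under Python 2.6 and once under Python 3.1 checks both halves of step 1.

```python
def compiles(source, filename='<string>'):
    """Return (True, None) if source compiles under the running interpreter,
    otherwise (False, the SyntaxError)."""
    try:
        compile(source, filename, 'exec')
    except SyntaxError as err:  # 'as' syntax is valid on 2.6 and 3.x
        return False, err
    return True, None


def check_files(paths):
    """Compile each file without running it; return [(path, error), ...]."""
    failures = []
    for path in paths:
        with open(path, 'rb') as f:
            source = f.read()
        ok, err = compiles(source, path)
        if not ok:
            failures.append((path, err))
    return failures
```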

This strategy is guaranteed to work: porting Leo to Python 3.x is a low-risk project.

Only tactical details remain. Most details need no further comments, but I'll discuss two tactical issues in further posts:

A. How Leo's code can be made to compile under both Python 2.6 and Python 3.1.

B. How Leo handles unicode in both Python 2.6 and Python 3.1.

The second is more important theoretically, the first more difficult practically.

P.S. A word or two about timing. Leo depends on Qt through PyQt, so I had to wait until PyQt supported Python 3.x before I could attempt to finish steps 2 and 3 above. PyQt supported Python 3.x as of version 4.6, and it is now at version 4.6.2, so I can be fairly confident that PyQt will handle Python 3.x well.

I restarted this project yesterday after reading a comment praising Python 3.x. My guess is that 2010 will be the year the Python world moves decisively to Python 3.x.

Converting Leo to Python 3k: goals

This is the first in a series of posts that will describe how I am converting my Leo app to run on Python 3k. I will be writing these posts before the project is complete so that the details are fresh in my mind. But the intention is to say something that will be generally useful to anyone contemplating a similar project: I'll keep Leo-specific details to a minimum. My emphasis will be on strategy and tactics, not code-level details.

The one and only goal of this project is straightforward: the final product will be code that compiles and runs on both Python 2.x and Python 3.x without any modification whatsoever. A common code base is an absolute requirement, for two reasons:

1. Leo is under active development using bzr. It would be intolerable to attempt to support multiple code bases for any length of time.

2. Python's 2to3 code-conversion tool does not begin to have the sophistication needed to automatically convert Leo's code. But even if 2to3 did have the smarts, it would be odious to add an intermediate code-conversion step every time anyone changed Leo's code base.

This goal is just a bit ambitious, but I have absolute confidence that it can be accomplished. I'll tell you why in the next post.