Sunday, December 6, 2009

Converting Leo to Python 3k: tactics

In this posting I'll discuss how to create code that will compile on both Python 2.x and Python 3.x. This is mostly straightforward, but there are some interesting tactical details.

1. The g.isPython3 constant

Leo uses the g.isPython3 constant to distinguish between old (Python 2.x) and new (Python 3.x) code:

isPython3 = sys.version_info >= (3,0,0)

This constant is defined in only one place, in Leo's leoGlobals module, known as 'g' throughout Leo's code. All of Leo's modules import this module before any other imports as follows:

import leo.core.leoGlobals as g

Leo distinguishes old and new code this way:

if g.isPython3:
    << new code >>
else:
    << old code >>

Naturally, we want to keep this kind of code to a minimum by refactoring common instances of this pattern into new functions. In Leo, the obvious place for these new functions is in g, the leoGlobals module.
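
As a sketch of this kind of refactoring, a version-dependent check can be wrapped once in a helper so that callers never test isPython3 themselves. The isString helper and the string_types name below are hypothetical, chosen only for illustration; they are not necessarily what leoGlobals defines.

```python
import sys

isPython3 = sys.version_info >= (3, 0, 0)

if isPython3:
    # Python 3.x: every text string is a str.
    string_types = (str,)
else:
    # Python 2.x: text may be str or unicode; basestring covers both.
    string_types = (basestring,)  # noqa: F821 (name exists only in Python 2.x)

def isString(s):
    """Hypothetical helper: True if s is a text string on either version."""
    return isinstance(s, string_types)
```

Callers then write isString(x) everywhere, and the version test lives in exactly one place.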

2. Print statements and g.pr

Print statements with a trailing comma pose a special problem. Such statements are syntactically invalid in Python 3.x, so there is *no way* to use the pattern above. Late last year I converted almost all of Leo's print statements to the g.pr function. This conversion was the single biggest change to Leo's code base for Python 3.x.

Theoretically, I only needed to eliminate print statements with trailing commas, but even "normal" print statements pose problems. In Python 3.x, 'print(a,b)' produces the same output as 'print a,b' produces in Python 2.x. But in Python 2.x, 'print(a,b)' prints the *tuple* (a,b).

Yes, there is a workaround: translate 'print a,b' to "print('%s %s' % (a,b))". But I thought this was too ugly. Besides, defining g.pr gave me the opportunity to use a clever convention that supports automatic translation of some, but not all, arguments. The convention is just this: g.pr translates only the *odd* arguments, that is, the first, third, fifth and so on. For example:

g.pr(a,b,c,d,e)

prints the translation of a, c and e, but prints b and d as they are, untranslated. To suppress the translation of the first argument, use an empty string as the first argument:

g.pr('',a)

Imo, this convention is much better than the typical convention of using _(a) to denote the translation of a. Besides being cleaner visually, it is often exactly what is desired anyway. For example:

g.pr('can not open', fileName)

does exactly the right thing: we want to translate 'can not open' but we should not translate fileName!
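
The convention can be sketched roughly as follows. This is not Leo's actual g.pr: the translate function, its lookup table and the file keyword are all assumptions made for illustration, and the output goes through the stream's write method rather than a print statement.

```python
import sys

def translate(s):
    """Hypothetical stand-in for a real translation lookup (e.g. gettext)."""
    table = {'can not open': 'no se puede abrir'}
    return table.get(s, s)

def pr(*args, **keys):
    """Sketch: translate the 1st, 3rd, 5th... arguments; leave the rest alone."""
    out = keys.get('file', sys.stdout)  # assumed keyword, handy for testing
    parts = []
    for i, arg in enumerate(args):
        s = '%s' % (arg,)               # convert any argument to text
        if i % 2 == 0:                  # index 0, 2, 4 = odd-numbered argument
            s = translate(s)
        if s:                           # skip the empty '' placeholder
            parts.append(s)
    out.write(' '.join(parts) + '\n')

pr('can not open', 'myFile.txt')        # message translated, file name untouched
pr('', 'can not open')                  # '' placeholder suppresses translation
```

Because the position of an argument decides whether it is translated, call sites need no _() wrappers at all.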

In short, removing print statements from Leo makes a virtue out of necessity. A few print statements do remain in Leo, but they are just for debugging and are seldom used. It doesn't matter whether those statements produce the same results in Python 3.x as they do in Python 2.x.

One more detail. There is no way to use 'print' in the g.pr function so that it compiles on both Python 2.x and Python 3.x. Instead, the g.pr function uses sys.stdout.write. Alas, sys.stdout.write is not precisely equivalent to print in all situations, but this can not be helped.

3. Unicode constants

Unicode constants of the form u'whatever' are invalid in Python 3.x. The workaround is code such as the following:

if isPython3:
    f = str
else:
    f = unicode
aConstant = f('whatever')

This kind of code is common enough to have been refactored:

aConstant = g.toUnicode('whatever')
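
A minimal sketch of what such a helper might look like follows. The real g.toUnicode may differ in both signature and behavior; the encoding parameter and the text_type name here are assumptions.

```python
import sys

isPython3 = sys.version_info >= (3, 0, 0)

if isPython3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (name exists only in Python 2.x)

def toUnicode(s, encoding='utf-8'):
    """Sketch: return s as a unicode text string on either Python version."""
    if isinstance(s, text_type):
        return s                     # already unicode text
    if isinstance(s, bytes):         # bytes is an alias for str in Python 2.6+
        return s.decode(encoding)
    return text_type(s)              # fall back to ordinary conversion

aConstant = toUnicode('whatever')
```

On Python 2.x the plain literal 'whatever' is a byte string and gets decoded; on Python 3.x it is already text and is returned unchanged.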

I'll be saying more about unicode issues in later posts.

2 comments:

  1. Personally, I'd suggest dropping print entirely and using sys.stdout, specifically passing in the handle to be printed to.

    I wasn't always a fan of it, but have been doing it in my code for a long while. It makes testing a *helluva* lot easier among other things (since you can pass in a StringIO and collect the output for validation) and avoids the py2k/py3k issue entirely.

  2. Martin v. Löwis presented an interesting approach to maintaining one code base that runs on both Python 2 and Python 3 at this year's German Zope Users Group (DZUG) conference. The code is maintained in Python 2 and modified until it runs on Python 3 without further modifications after it is transformed by 2to3. The changes to the code base are few, and the code is simpler because you don't need many conditionals or helper functions to support Python 2 and 3. The version-specific changes are made by automated transformations with 2to3.
    He gave some examples by refactoring some core Zope packages and their dependencies to work on Python 2 and 3.

    He and some Zope guys worked on extensions that run the transformation inside setuptools/distutils, so that it is completely transparent to the user, and on support for our test chain.

    You can find his (German) presentation with links to the ported packages here:
    http://www.zope.de/redaktion/dzug/tagung/2009/die-vortraege/portierung_zodb_python_3-1.pdf/view

    I think he gave a similar presentation at EuroPython or PyCon US.
