Friday, November 26, 2010
Announcing Leo 4.8 final
Leo 4.8 final is now available here.
Leo is a text editor, data organizer, project manager and much more. Look here for more information.
The highlights of Leo 4.8
Leo now uses the simplest possible sentinel lines in external files. External files with sentinels now look like Emacs org-mode files.
Leo's Qt gui now supports drag and drop. This was one of the most frequently requested features.
Improved abbreviation commands. You now define abbreviations in Leo settings nodes, not external files.
@url nodes may contain urls in body text. This allows headlines to contain summaries: very useful.
Leo now uses PyEnchant to check spelling.
Leo can now open multiple files from the command line.
Leo's ancient Tangle and Untangle commands are now deprecated. This will make Leo easier for newbies to learn.
Leo now shows "Resurrected" and "Recovered" nodes. These protect data and show how data have changed. These fix several long-standing data-related problems.
A new "screenshots" plugin for creating slide shows with Leo. I used this plugin to create Leo's introductory slide shows.
A better installer.
Many bug fixes.
Links
Leo
Forum
Download
Quotes
Monday, November 22, 2010
Science-based education
From an obituary of Georges Charpak (1924-2010) appearing in Science, 29 October, 2010:
"A team formed under the aegis of the French Academy, and it soon defined a strategy and tactics. The strategy was founded on a single priority: to use science to support the child's mental development. This meant inculcating a taste for questioning, a sense of observation, intellectual rigor, practice with reasoning, modesty in the face of facts, an ability to distinguish between true and false, and an attachment to logical and precise language."
Presumably these values, and values they are, are deeply troubling to some. How else to explain the anti-science attitudes so distressingly prevalent in the United States these days?
Edward
"A team formed under the aegis of the French Academy, and it soon defined a strategy and tactics. The strategy was founded on a single priority: to use science to support the child's mental development. This meant inculcating a taste for questioning, a sense of observation, intellectual rigor, practice with reasoning, modesty in the face of facts, an ability to distinguish between true and false, and an attachment to logical and precise language."
Presumably these values, and values they are, are deeply troubling to some. How else to explain the anti-science attitudes so distressingly prevalent in the United States these days.
Edward
Monday, November 15, 2010
Announcing Leo 4.8 release candidate 1
Leo 4.8 rc1 is now available here.
Leo is a text editor, data organizer, project manager and much more. Look here for more information.
The highlights of Leo 4.8
Leo now uses the simplest possible sentinel lines in external files. External files with sentinels now look like Emacs org-mode files.
Leo's Qt gui now supports drag and drop. This was one of the most frequently requested features.
Improved abbreviation commands. You now define abbreviations in Leo settings nodes, not external files.
@url nodes may contain urls in body text. This allows headlines to contain summaries: very useful.
Leo now uses PyEnchant to check spelling.
Leo can now open multiple files from the command line.
Leo's ancient Tangle and Untangle commands are now deprecated. This will make Leo easier for newbies to learn.
Leo now shows "Resurrected" and "Recovered" nodes. These protect data and show how data have changed. These fix several long-standing data-related problems.
A new "screenshots" plugin for creating slide shows with Leo. I used this plugin to create Leo's introductory slide shows.
A better installer.
Many bug fixes.
Links
Leo
Forum
Download
Quotes
Friday, November 5, 2010
Announcing Leo 4.8 beta 1
Leo 4.8 beta 1 is now available here.
Leo is a text editor, data organizer, project manager and much more. Look here for more information.
The highlights of Leo 4.8
Leo now uses the simplest possible sentinel lines in external files. External files with sentinels now look like Emacs org-mode files.
Leo's Qt gui now supports drag and drop. This was one of the most frequently requested features.
Improved abbreviation commands. You now define abbreviations in Leo settings nodes, not external files.
@url nodes may contain urls in body text. This allows headlines to contain summaries: very useful.
Leo now uses PyEnchant to check spelling.
Leo can now open multiple files from the command line.
Leo's ancient Tangle and Untangle commands are now deprecated. This will make Leo easier for newbies to learn.
Leo now shows "Resurrected" and "Recovered" nodes. These protect data and show how data have changed. These fix several long-standing data-related problems.
A new "screenshots" plugin for creating slide shows with Leo. I used this plugin to create Leo's introductory slide shows.
A better installer.
Many bug fixes.
Links
Leo
Forum
Download
Quotes
Sunday, September 5, 2010
SQLite is serious about testing!
The How SQLite is Tested page is the most interesting discussion of software testing I have ever seen.
Sunday, August 29, 2010
Just say no to demagoguery
The sorry state of the American right wing is nowhere more on view than with the so-called ground-zero mosque. One has to wonder what kind of America they believe in.
Hatred, racism, xenophobia, fear-mongering and demagoguery seem to be the real guiding principles.
Does the Tea Party spell trouble for Democrats? Not likely. Many Republicans and swing voters will be revolted by the Tea Party's crude populism. The Tea Party seems more likely to split the GOP and unify the Dems.
The men behind the hate and lies
Everyone who cares even a little bit about truth and accountability in politics will be interested in this piece from the New Yorker.
The Koch brothers are the driving force behind both the Tea Party and climate denial. They have contributed hundreds of millions of dollars to these odious ventures.
Saturday, August 14, 2010
Sharing code in Leo scripts, part deux
For years I have wanted Leo scripts to be able to share code directly. Now they can--simply, intuitively, dynamically, in a Leonine way.
exec(g.findTestScript(c,h)) is a big breakthrough in Leo scripting; the previous post buried the lead.
To recap, suppose a set of related @test nodes (or any other set of Leo scripts) want to share class definitions in a node whose headline is 'x'. To get these definitions, each node just starts with::
exec(g.findTestScript(c,'x'))
After this one line, the script can use all the class names defined in x without qualification. Furthermore, if I change the definitions in x, these changes immediately become available to all the scripts that use them.
This one-liner is a big step forward in Leonine programming.
Friday, August 13, 2010
Adding code to scripts, the Leo way
All of Leo's unit tests reside in @test nodes in a single Leo outline. Leo's users will understand the benefits of this approach: it is easy to organize tests, and run them in custom batches. For example, I can run all failing unit tests by creating a node called 'failing tests', and then drag clones of the failing @test nodes so they are children of the 'failing tests' node. I then select that node and hit Alt-4, Leo's run-unit-tests-locally command. This executes all the unit tests in that node only.
Unit tests can often be simplified by sharing common code. Suppose, for example, that I want my unit tests to have access to this class::
class Hello:
    def __init__(self,name='john'):
        self.name = name
        print('hello %s' % name)
Before yesterday's Aha, I would have defined the class Hello in an external file, and then imported the file. For example, a complete unit test (in an @test node) might be::
import leo.core.leoTest as leoTest
h = leoTest.Hello('Bob')
assert h.name == 'Bob'
Aside: Leo's users will know that putting this code in an @test node makes it an official unit test. Leo automatically creates a subclass of unittest.TestCase from the body text of any @test node.
Importing code this way works, but it's a static, plodding solution. To change class Hello, I have to switch to another file, make the changes and save that file, and reload the outline that uses it. I've been wanting a better solution for years. Yesterday I saw the answer: it's completely dynamic, it's totally simple and it's completely Leonine.
The idea is this. Suppose the node '@common code for x tests' contains a list of nodes, each of which defines a class or function to be shared by unit tests. A unit test can gain access to the compiled code in these nodes as follows::
p = g.findNodeAnywhere(c,'@common code for x tests')
script = g.getScript(c,p)
exec(script)
h = Hello('Bob')
assert h.name == 'Bob'
Let's look at these lines:
1. The first line finds the node whose headline is '@common code for x tests'. As usual in a Leo script, 'c' and 'g' are predefined. 'c' is bound to the Leo outline itself, and 'g' is bound to Leo's globals module, leo.core.leoGlobals.
2. The second line converts this node and all its descendants into a script. g.getScript handles Leo's section references and @others directives correctly--I can use all of Leo's code-organization features as usual.
3. The third line executes the script in the context of the unit test. This defines Hello in the @test node, that is, in the unit test itself! There is no need to qualify Hello. The actual test (the last two lines) can be::
h = Hello('Bob')
assert h.name == 'Bob'
That's all there is to it. Naturally, I wanted to make this scheme a bit more concise, so I created the g.findTestScript function, defined as follows::
def findTestScript(c,h):
    p = g.findNodeAnywhere(c,h)
    return p and g.getScript(c,p)
The unit test then becomes::
exec(g.findTestScript(c,'@common code for x tests'))
h = Hello('Bob')
assert h.name == 'Bob'
This shows, I think, the power of leveraging outlines with scripts. It would be hard even to think of this in emacs, vim, Eclipse, or IDLE.
The difference in the new work-flow is substantial. Any changes I make in the common code instantly become available to all the unit tests that use it. I can modify shared code and run the unit tests that depend on it without any "compilation" step at all. I don't even have to save the outline that I'm working on. Everything just works.
Edward
Sunday, August 8, 2010
Leo in a nutshell
I have struggled for years to explain why Leo is interesting. Here is my latest attempt. I think it looks a bit better than usual :-)
Leo combines outlines, data, files and scripting in a unique way. As a result, it takes some time to get the Leo Aha. This page introduces Leo's features and argues that Leo truly is a unique tool.
Outlines and organization: Leo's outlines are far more flexible and powerful than any other outline you have ever used, for at least three reasons:
1. Unlike other browsers, you, not the browser, are in complete control of the outline. You can organize it however you like, and Leo will remember what you have done and show it to you just that way when you come back next time. If you don't think this is important, you have never used Leo :-)
2. Leo outlines may look like other outlines, but in fact Leo outlines are views of a more general underlying graph structure. Nodes in Leo's outlines may appear in many places in the same outline. We call such nodes clones. Using clones, it is easy to create as many views of the data in the outline as you like. In effect, Leo becomes a supremely flexible filing cabinet: any outline node may be filed anyplace in this cabinet.
3. Leo outlines are intimately connected to both external files and Python scripting, as explained next.
External files: Any outline node (and its descendants) can be "connected" to any file on your file system. Several kinds of connections exist. The three most common kinds are:
1. @edit: Leo reads the entire external file into the @edit node's body text.
2. @auto: Leo parses the external file and creates an outline that shows the structure of the external file, just as in typical class browsers.
3. @file: Leo makes a two-way connection between the @file node (and its descendants) and the external file. You can update the external file by writing the Leo outline connected to it, or you can update the outline by changing the external file. Moreover, you can easily control how Leo writes nodes to the file: you can rearrange how Leo writes nodes. To do all this, Leo uses comments in the external file, called sentinels, that represent the outline structure in the external file itself (a sketch appears below).
All of these connections allow you to share external files with others in a collaborative environment. With @file, you can also share outline structure with others. Thus, a single Leo outline can contain an entire project with dozens or even hundreds of external files. Using Leo, you never have to open these files by hand, Leo does so automatically when it opens the Leo outline. Leo is a unique new kind of IDE.
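For illustration only, here is roughly what a small Python @file might look like on disk. The sentinel syntax shown is a from-memory sketch of the simplified 4.8 format, and the node ids are made up; the point is only that comments carry the outline structure::

#@+leo-ver=5-thin
#@+node:ekr.20101126.1: * @file hello.py
#@+others
#@+node:ekr.20101126.2: ** say_hello
def say_hello(name):
    print('hello %s' % name)
#@-others
#@-leo

Note how the '*' level markers resemble Emacs org-mode stars, as the 4.8 highlights mention.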
Scripting: Every outline node can contain Python scripts. Moreover, each node in a Leo outline is a programmable object, which is easily available to any Leo script. Furthermore, the structure of the outline is also easily available to any script. Thus, nodes can contain programs, or data, or both!
Furthermore, Leo's headlines provide a natural place to indicate the type of data contained in nodes. By convention, @test in a headline denotes a unit test, @command creates a new Leo command, and @button creates a script button, that is, a Python script that can be applied to any node in an outline!
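As a small sketch of what this means in practice, here is a script runnable from Leo's execute-script command, where c (the outline) and g (leo.core.leoGlobals) are predefined. It walks every node and logs the headlines of all @test nodes; c.all_positions, p.h, p.level and g.es are part of Leo's scripting API, though the exact calls you need may vary::

# List every @test node in the outline, indented by outline level.
for p in c.all_positions():
    if p.h.startswith('@test'):
        g.es('%s%s' % ('  ' * p.level(), p.h))  # g.es writes to Leo's log pane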
Unifying scripting, data and outline structure creates a new world. We use the term Leonine to denote the Leo-centric (outline-centric) view of programming, data and scripting. Here are some of the implications of this new world:
Data organization: Leo's clones allow unprecedented flexibility in organizing data. Leo outlines have been used as an entirely new kind of database, one that is easily scriptable. As my brother has shown, it is possible to design Leo outlines so that parts of the outline are SQL queries!
Design: With Leo, you always see the big picture, and as many of the details as you like. This makes outlines ideal for representing designs. In fact, Leo outlines don't just represent designs, they are the designs. For example, all of Leo's source code resides in just a few Leo outlines. There is no need for separate design tools because creating a Leo outline simultaneously embodies both the design and the resulting code. Furthermore, Leo outlines can also represent input data to other design tools.
Programming: It's much easier to program when the design is always easily visible. Nodes provide the perfect way to organize large modules, classes and functions. Nodes also provide unlimited room to save as many details and notes as you like, without cluttering your overall view of the task, or rather tasks, at hand.
Testing: Leo is a supremely powerful unit-testing framework:
1. You can make a node a unit test simply by putting @test at the start of its headline. Leo will then automatically generate all the blah-blah-blah needed to turn the node's script into a fully-functional unit test. Oh yes, the headline becomes the name of the unit test.
2. Unit tests can use data in children of @test nodes. Typical tests put input data in one child node, and the expected results of running the test in another child node. The test simply compares the actual and expected results (see the sketch after this list).
3. You can easily run all the tests in the outline, or just those in the selected part of the outline. Because tests reside in nodes, you can use clones to organize tests in as many ways as you like. For example, it is trivial to run only those tests that are failing.
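Here is a hedged sketch of point 2. It assumes an @test node with exactly two children, input first and expected output second; as in any Leo script, c, g and p are predefined, and I assume p is bound to the @test node itself::

# Hypothetical @test body: compare actual and expected results.
my_transform = str.strip                 # stand-in for the real code under test
children = list(p.children())
input_p, expected_p = children[0], children[1]
actual = my_transform(input_p.b)         # p.b is a node's body text
assert actual == expected_p.b, repr(actual)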
Maintenance and support: Leo's ability to contain multiple views of data is precisely what is needed when maintaining any large project. For every new support task and every new bug, a new (plain) task node will contain all the data needed for that task: notes, test data, whatever. Furthermore, when fixing bugs, the task node can contain clones of all classes, methods or functions related to the bug. Fixing a node in the task node fixes the node in the external file! And as always, you can use all of Leo's outlining features (including clones) to organize your task nodes.
Organization everywhere: Have you noticed that Leo's organizational prowess applies to everything? Indeed, you can use outlines and clones in new ways to organize files, projects, data, design, programming, testing, and tasks. Leo doesn't need lots of features--outlines, clones and scripts suffice. The more complex your data, designs, program and tasks, the better Leo is suited to them.
Scripting everything: Let's step back a moment. A single outline can contain databases, designs, actual computer code, unit tests, test scripts and task nodes. But Leo scripts will work on any kind of node. Thus, it is easy to run scripts on anything! Examples:
- Data: The @kind convention for headlines tells scripts what a node contains without having to parse the node's contents. The possibilities are endless.
- Design: scripts can verify properties of design based on either the contents of design nodes or their outline structure.
- Coding: scripts routinely make massive changes to outlines. Scripts and unit tests can (and do!) verify arbitrarily complex properties of outlines.
- Testing: scripts can (and do!) create @test nodes themselves.
- Maintenance: scripts could gather statistics about tasks using simple @kind conventions.
Tuesday, July 27, 2010
A design for inc-lint, an incremental pylint
This paper discusses the essential features of the design of an incremental pylint, or **inc-lint** for short. It discusses only those aspects that are essential to the success of the project. That is, it is the highest-level design.
This design borrows some features from a previous prototype of a new pylint, which in this paper I'll call **new-lint**. New-lint used a data-driven algorithm (I called it a sudoku-like algorithm) to do lint-like checking. Many features of this data-driven algorithm will reappear below.
New-lint was an “interesting” failure. It showed that a data-driven approach to lint-like checking is feasible. Alas, its performance was comparable to that of pylint. This is strong evidence, imo, that the performance of pylint cannot be significantly improved without a major change in strategy.
To get significantly better performance, an **incremental** approach must be used. Such an algorithm computes diffs between old and new versions of files and generates the minimum needed additional checks based on those diffs. My intuition is that inc-lint could be 10 to 100 times faster than pylint in many situations. Inc-lint should be fast enough so that it can be run any time a program changes.
As an extreme example of an incremental approach, inserting a comment into a program should require *no* additional analysis at all. The only work would be to notice that the ast's (parse trees) of the changed file have not changed. More commonly, changes that do not alter the data defined by a module can have no global effects on the program. Inc-lint would check only the changed file. But these checks will happen in the presence of cached data about all other parts of the program, so we can expect such checks to be much faster than pylint's checks.
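A sketch of that 'no additional analysis' check, using Python's standard ast module (inc-lint's actual machinery is undecided; this only illustrates the idea): two sources that differ only in comments or whitespace parse to identical trees::

import ast

def same_tree(old_src, new_src):
    '''True if two sources have identical parse trees,
    e.g. when only comments or formatting changed.'''
    try:
        return ast.dump(ast.parse(old_src)) == ast.dump(ast.parse(new_src))
    except SyntaxError:
        return False

assert same_tree('x = 1', 'x = 1  # a new comment')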
Inc-lint seemed impossible
It was far from obvious that inc-lint was feasible. Indeed, the difficulties seemed overwhelming. Aside from adding or deleting comments, any change to a Python file can have ripple effects throughout an entire program. What kind of bookkeeping could possibly keep track of all such changes? For example, diffs based on ast's could not possibly work: the number of cases to consider would be too large. Building incremental features into pylint also seemed hopeless. The present pylint algorithms are extremely complex—adding more complexity to pylint would be a recipe for failure. In spite of these difficulties, a new design gradually emerged.
Global and Local analysis
The essential first step was to accept the fact that some checks must be repeated every time a file changes. These checks include checks that depend on the exact order of statements in a file. For example, the check that a variable is used before being defined is such a check. The check that a 'break' statement appears in an appropriate context is another such check. Otoh, many other checks, including *all* data-flow checks, do *not* depend on the order in which definitions appear in files.
The distinction between order-dependent and order-independent checks is the key organizing principle of the design. This led almost immediately to the fundamental distinction of the design: local analysis and global analysis.
**Local analysis** depends on the order of statements in a Python file. Inc-lint completely redoes local analysis for a file any time that file changes. Local analysis performs all checks that depend on the exact form of the parse (ast) trees. As we shall see, the output of local analysis is data that does *not* depend on the order of statements in the parse tree.
**Global analysis** uses the order-independent data produced by local analysis. Global analysis uses a data-driven algorithm: only the *existence* of the data matters, how the data is defined is irrelevant.
This distinction makes an incremental design possible. We don't recompute global checks based on diffs to parse trees. That would be an impossible task. Instead, we recompute global checks based on diffs of order-independent data. This is already an important optimization: program changes that leave order-independent data unchanged will not generate new lint checks.
Contexts and symbol tables
A **context** is a module, class or function. The **contents** of a context are all the (top-level) variables, classes and functions of that context. For example, the contents of a module context are all the top-level variables, classes and functions of that module. The top-level classes and functions of a module are also contexts: contexts may contain **sub-contexts**.
**Symbol tables** are the internal representation of a context. Contexts may contain sub-contexts, so symbol tables can contain **inner symbol tables**. In other words, symbol tables are recursive structures. The exact form of symbol tables does not matter except for one essential requirement—it must be possible to compare two symbol tables easily and to compute their diffs: the list of symbols (including inner contexts) that appear in one symbol table but not the other.
Local analysis produces the **module symbol table** for that file. The module symbol table and all its inner tables describe every symbol defined anywhere in the module. Local analysis is run (non-incrementally) every time a file changes, so the module symbol table is recreated “from scratch” every time a file changes.
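A sketch of such a recursive symbol table in Python (the real representation is an open design question; the only hard requirement is that it be easy to diff)::

class SymbolTable:
    '''The symbols of one context, plus inner tables for its sub-contexts.'''
    def __init__(self, name):
        self.name = name
        self.symbols = set()   # names defined directly in this context
        self.inner = {}        # context name -> SymbolTable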
Deductions and the data-driven algorithm
The second output of local analysis is a list of **deductions**, the data that drive the data-driven algorithm done by global analysis. Deductions arise from assignment statements and other statements. You can think of deductions as being the data-flow representation of such statements.
Important: deductions *use* the data in symbol tables, and deductions also *set* the data in symbol tables. The data-driven algorithm is inherently an iterative process.
For example, an assignment a = b implies that the set of types that symbol 'a' can have is a superset of the set of types that symbol 'b' can have. One kind of deduction “completes” the type in the symbol table for 'a' when all types in the right hand side (RHS) of any assignment to 'a' have been deduced. This deduction **fires** only when all the right-hand sides of assignments to 'a' are known. Naturally, 'a' itself may appear in the RHS of an assignment to another variable 'c'. Once the possible types of 'a' are known, it may be possible to deduce the type of 'c'.
Another kind of deduction checks that operands have compatible types. For example, the expression 'x' + 'y' is valid only if some '+' operator may be applied to 'x' and 'y'. This is a non-trivial check: the meaning of '+' may depend on an __add__ function, which in turn depends on the types of 'x' and 'y'. In any case, these kinds of deductions result in various kinds of lint checks.
Global analysis attempts to satisfy deductions using the information in symbol tables. As in new-lint, the data-driven algorithm will start by triggering **base deductions**, deductions that depend on no other deductions. Satisfied deductions may trigger other deductions. When all possible deductions have been made, the remaining unsatisfied deductions generate error messages.
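One way to picture this data-driven algorithm is a worklist loop, sketched below; the (needs, sets, check) shape of a deduction is an assumption made for illustration. Fire any deduction whose prerequisites hold, record the facts it establishes, and repeat until nothing changes; whatever never fires becomes an error message::

def run_deductions(deductions, known):
    '''deductions: list of (needs, sets, check) triples (an assumed shape).
    known: set of facts established so far, e.g. completed types.'''
    pending = list(deductions)
    progress = True
    while progress and pending:
        progress = False
        for d in list(pending):
            needs, sets, check = d
            if needs <= known:    # all prerequisites satisfied: the deduction fires
                check()           # may perform a lint check or emit an error
                known |= sets     # may enable further deductions
                pending.remove(d)
                progress = True
    return pending                # still-unsatisfied deductions become error messages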
Diffs
Large programs will contain many thousands of deductions. We cannot afford to rerun all those deductions every time a change is made to a program. Instead, we must compute the (smallest) set of deductions that must be re-verified.
To compute the new deductions, we need a way of comparing the data contained in the changed source files. Comparing (diffing) parse trees will not work. Instead, inc-lint will compare symbol tables.
Happily, comparing symbol tables is easy. Any two source files that define the same contexts will be equivalent (isomorphic), regardless of how those contexts were defined. The diff algorithm will be recursive, mirroring the recursive structure of symbol tables. We expect the diff algorithm to be simple and fast.
The output of the diff will be a list of created and destroyed symbols for any context. Changing a name (in any particular context) is equivalent to deleting the old name and creating a new name.
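Continuing the SymbolTable sketch above, a recursive diff might look like the following; it yields the created and destroyed symbols for each context, which is exactly what the update phase consumes::

def diff_tables(old, new, path=''):
    '''Yield (context, created, destroyed) triples for two SymbolTables.'''
    here = path + old.name
    created = new.symbols - old.symbols
    destroyed = old.symbols - new.symbols
    if created or destroyed:
        yield here, created, destroyed
    for name in set(old.inner) | set(new.inner):
        if name not in new.inner:
            yield here, set(), set([name])     # whole inner context destroyed
        elif name not in old.inner:
            yield here, set([name]), set()     # whole inner context created
        else:
            for item in diff_tables(old.inner[name], new.inner[name], here + '.'):
                yield item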
Caching and updating
Inc-lint will cache symbol tables and deductions for all files. This allows us to avoid the local analysis phase for all unchanged files. However, changes made in the local analysis of a file may affect deductions in many *unchanged* files.
The update phase requires that we be able to find the “users” (referrers) of all data that might change during local analysis. Thus, we expect symbol tables and deductions to use doubly (or even multiply) linked lists. It should be straightforward (and fast!) to update these links during the update phase.
Preserving pointers
Diffing symbol tables will result in a list of changes. When applying those changes, we want to update the *old* (cached) copy of each symbol table. This will allow references to unchanged items in the symbol table to remain valid. Of course, references to *changed* items will have to be deleted to avoid “dangling pointers”. By taking care to update links we can use typical Python references (pointers) to symbol table entries and deductions. This avoids having to relink pointers to new symbol tables.
Recap
Here are the essential features of the design:
1. Inc-lint performs local analysis for all changed files in a project. This phase does all lint checks that depend on the order of statements in a Python program. The output of local analysis is a new symbol table for each changed file, and a list of deductions that must be proved for the changed file.
2. A diff phase compares the old (cached) and new versions of the symbol table. This diff will be straightforward and fast because symbol tables will be designed to be easily diffed. As an optimization, we can bypass the diff if the old and new parse trees are “isomorphic”. For example, files that differ only in whitespace or comments will have isomorphic ast trees.
3. An update phase inserts and deletes cached (global) deductions. Changes to a symbol table may result in changes to deductions in arbitrarily many files of the project. Thus, all symbol table entries and deductions will be heavily linked.
4. After all symbol table entries and deductions have been updated, a data-driven algorithm will attempt to satisfy all unsatisfied deductions, that is, deductions that must be proven (again) because of changes to one or more symbol tables. These deductions correspond to the type-checking methods in pylint. At the end of this phase, still-unsatisfied deductions will result in error messages.
Conclusions
This design looks like the simplest thing that could possibly work. Indeed, it looks like the *only* reasonable design. For simplicity's sake, local analysis *must* be done afresh for all changed files. In contrast, global analysis depends only on deductions and symbol tables, neither of which depends on program order. Thus, we can easily imagine that deductions that depend on unchanged symbol table entries (symbols) will not need to be rechecked.
This design consists of largely independent parts or phases. Difficulties with one part will not cause difficulties or complexity elsewhere. This separation into independent phases and parts is the primary strength of the design. Like Leo's core modules, this design should remain valid even if various parts change significantly.
This design seeks to minimize the risks to the project. I believe it has accomplished this goal. It should be possible to demonstrate the design with relatively simple prototype code.
All comments are welcome.
Edward K. Ream
July 27, 2010
Friday, July 9, 2010
Federal judge rules against gay marriage ban
Joseph Louis Tauro, the federal judge for the United States District Court for the District of Massachusetts, has just ruled unconstitutional the Defense of Marriage Act (DOMA).
There are two rulings involved.
http://www.glad.org/uploads/docs/cases/2010-07-08-gill-district-court-decision.pdf
http://www.mass.gov/Cago/docs/civilrights/DOMA%20Decision.pdf
Here are some excerpts from the first.
Equal Protection of the Laws.
QQQ
[T]he Constitution ‘neither knows nor tolerates classes among citizens.’” It is with this fundamental principle in mind that equal protection jurisprudence takes on “governmental classifications that ‘affect some groups of citizens differently than others.’” And it is because of this “commitment to the law’s neutrality where the rights of persons are at stake” that legislative provisions which arbitrarily or irrationally create discrete classes cannot withstand constitutional scrutiny.
To say that all citizens are entitled to equal protection of the laws is “essentially a direction [to the government] that all persons similarly situated should be treated alike.” But courts remain cognizant of the fact that “the promise that no person shall be denied the equal protection of the laws must coexist with the practical necessity that most legislation classifies for one purpose or another, with resulting disadvantage to various groups or persons.” And so, in an attempt to reconcile the promise of equal protection with the reality of lawmaking, courts apply strict scrutiny, the most searching of constitutional inquiries, only to those laws that burden a fundamental right or target a suspect class.
...
What remains, therefore, is the possibility that Congress sought to deny recognition to same-sex marriages in order to make heterosexual marriage appear more valuable or desirable. But to the extent that this was the goal, Congress has achieved it “only by punishing same-sex couples who exercise their rights under state law.” And this the Constitution does not permit. “For if the constitutional conception of ‘equal protection of the laws’ means anything, it must at the very least mean” that the Constitution will not abide such “a bare congressional desire to harm a politically unpopular group.”
...
This court simply “cannot say that DOMA is directed to any identifiable legitimate purpose or discrete objective. It is a status-based enactment divorced from any factual context which [this court] could discern a relationship to legitimate [government] interests.” Indeed, Congress undertook this classification for the one purpose that lies entirely outside of legislative bounds, to disadvantage a group of which it disapproves. And such a classification, the Constitution clearly will not permit. In the wake of DOMA, it is only sexual orientation that differentiates a married couple entitled to federal marriage-based benefits from one not so entitled. And this court can conceive of no way in which such a difference might be relevant to the provision of the benefits at issue. By premising eligibility for these benefits on marital status in the first instance, the federal government signals to this court that the relevant distinction to be drawn is between married individuals and unmarried individuals. To further divide the class of married individuals into those with spouses of the same sex and those with spouses of the opposite sex is to create a distinction without meaning. And where, as here, “there is no reason to believe that the disadvantaged class is different, in relevant respects” from a similarly situated class, this court may conclude that it is only irrational prejudice that motivates the challenged classification. As irrational prejudice plainly never constitutes a legitimate government interest, this court must hold that Section 3 of DOMA as applied to Plaintiffs violates the equal protection principles embodied in the Fifth Amendment to the United States Constitution.
QQQ
Friday, June 11, 2010
Merchants of Doubt
The following is an extended excerpt from a review in last week's Science magazine of several books dealing with climate denial. These comments are about Merchants of Doubt:
http://www.amazon.com/Merchants-Doubt-Handful-Scientists-Obscured/dp/1596916109
...two outstanding historians...have reviewed a sequence of controversies around topics of public concern. In their fascinating and important study, Merchants of Doubt, Naomi Oreskes and Erik M. Conway offer convincing evidence for a surprising and disturbing thesis. Opposition to scientifically well-supported claims about the dangers of cigarette smoking, the difficulties of the Strategic Defense Initiative ("Star Wars"), the effects of acid rain, the existence of the ozone hole, the problems caused by secondhand smoke, and—ultimately—the existence of anthropogenic climate change was used in "the service of political goals and commercial interests" to obstruct the transmission to the American public of important information. Amazingly, the same small cadre of obfuscators figured in all these episodes.
Oreskes (University of California, San Diego) and Conway (NASA's Jet Propulsion Laboratory) painstakingly trace the ways in which a few scientists, with strong ties to particular industries and with conservative political connections, have played a disproportionate role in debates about controversial questions, influencing policy-makers and the general public alike. Typically, these scientists have obtained their stature in fields other than those most pertinent to the debated question. Yet they have been able to cast enough doubt on the consensus views arrived at by scientists within the relevant disciplines to delay, often for a substantial period, widespread public acceptance of consequential hypotheses. They have used their stature in whatever areas of science they originally distinguished themselves to pose as experts who express an "alternative view" to the genuinely expert conclusions that seem problematic to the industries that support them or that threaten the ideological directions in which their political allies hope to lead.
The extraordinary story of deliberate obfuscation that Oreskes and Conway document begins with the delight of the tobacco companies in recruiting Fred Seitz and with Seitz's own connections to "scientists in their twilight years who had turned to fields in which they had no training or experience." It moves through the forging of a network of industrial and political alliances, and the creation of a variety of institutes and think-tanks devoted to challenging various forms of expert consensus, to a brilliant chapter in which the authors analyze the reasons why, as of 2009, a significant percentage of Americans (43%) continued to dissent from the minimal claim that there is "solid evidence the Earth is warming." As Oreskes and Conway conclude:
There are many reasons why the United States has failed to act on global warming, but at least one is the confusion raised by Bill Nierenberg, Fred Seitz, and Fred Singer.
This apparently harsh claim is thoroughly justified through a powerful dissection of the ways in which prominent climate scientists, such as Roger Revelle and Ben Santer, were exploited or viciously attacked in the press.
None of this would have been possible without a web of connections among aging scientists, conservative politicians, and executives of companies (particularly those involved in fossil fuels) with a short-term economic interest in denying the impact of the emission of carbon into the atmosphere. But it also could not have produced the broad public skepticism about climate change without help from the media. As Oreskes and Conway point out, "balanced coverage" has become the norm in the dissemination of scientific information. Pitting adversaries against one another for a few minutes has proven an appealing strategy for television news programs to pursue in attracting and retaining viewers. Nor is the idea of "fair and balanced" coverage, in which the viewer (or reader) is allowed to decide, confined to Fox News. Competing "experts" have become common on almost all American radio and television programs, the Internet is awash in adversarial exchanges among those who claim to know, and newspapers, too, "sell" science by framing it as a sport (preferably as much of a contact sport as possible). Oreskes and Conway identify the ways in which the Washington Times and the Wall Street Journal have nourished the public sense that anthropogenic climate change is a matter of dispute, how they have given disproportionately large space to articles and opinion pieces from the "merchants of doubt," and how they have sometimes censored the attempts of serious climate scientists to set the record straight. Even the New York Times, the American newspaper that takes science reporting most seriously, typically "markets" scientific research by imposing a narrative based on competition among dissenting scientists.
Tuesday, June 8, 2010
Should this blog be on planet python?
A recent comment objected that a previous post wasn't on topic for Planet Python. That is a perfectly reasonable point of view. The coming posts will likely strike some as even more off topic. It's fine with me if the Planet Python people want to de-syndicate this blog. That's for them to decide.
Saying goodbye to mother
Mother died last Friday from bone cancer at the age of 86. She had been diagnosed with stage IV cancer only 10 days earlier, so treatment was out of the question. Our extended family came together to keep vigil. It was a good, if not enjoyable, experience.
Mother led a full life, and had many ardent admirers. She was active until the last three weeks of her life, and had all her faculties until the last three days. So there is little to regret in her life, and I feel strangely calm about this whole process. Perhaps true grief will strike later, unexpectedly, but I think not.
At such times, it is natural to take stock of one's life, and I intend to do that here now, perhaps for an extended time. I intend, for the first time, to lay out what I believe to be true, based on overwhelming evidence. It is a daunting prospect: I know from experience and training that writing about complex subjects is no easy task. But now, at this time of my life, it is calling to me.
The thousands of posts about Leo on the leo-editor site (and previously on SourceForge) will serve as a template or model for the writing here. That is, the writing will be calm, with the intention of playing with ideas. The emphasis will be on problem solving.
This writing may upset some. That is not my intention, but it may happen. With that in mind, I will insist on the following ground rules. All responses to this blog must be civil, respectful, and calm. Name calling, ad hominem remarks, and ranting will not be tolerated; those who post abuse will be banned immediately, without further comment.
It is a symptom of the present state of society that these rules need to be stated prominently. Considerable personal experience shows that they are necessary.
Monday, April 12, 2010
World population to 2300
I highly recommend the following UN report on population. It makes fascinating reading.
I found the essays beginning on page 89 particularly interesting, but it's all good.
Unless you are an expert demographer, your world view will certainly change as the result of reading these papers.
Tuesday, February 23, 2010
Leo 4.7 final released
Leo 4.7 final is now available here.
Leo 4.7 final fixes all known bugs in Leo.
Leo is a text editor, data organizer, project manager and much more. See:
http://webpages.charter.net/edreamleo/intro.html
The highlights of Leo 4.7:
--------------------------
- Leo now uses the simplest possible internal data model.
This is the so-called "one-node" world.
- Leo supports Python 3.x.
- Leo requires Python 2.6 or above.
- Several important improvements in file handling.
- Leo converts @file nodes to @thin nodes automatically.
- Leo creates a 'Recovered Nodes' node to hold data that
otherwise might be lost due to clone conflicts.
- @auto-rst now works much more reliably.
- Leo no longer supports @noref trees. Such trees are not
reliable in cooperative environments.
- A new Windows installer.
- Many other features, including new command line options and new plugins.
- Dozens of bug fixes.
Links:
------
Leo: http://webpages.charter.net/edreamleo/front.html
Forum: http://groups.google.com/group/leo-editor
Download: http://sourceforge.net/project/showfiles.php?group_id=3458
Bzr: http://code.launchpad.net/leo-editor/
Quotes: http://webpages.charter.net/edreamleo/testimonials.html
Monday, February 22, 2010
Collaboration between humans and @test
Here is an exchange from another discussion of this topic
http://groups.google.com/group/leo-editor/browse_thread/thread/47dfb2e1767d2cda
> So, looking from this angle, you came to the heart of programming.
> Unit tests is a reference/control info to incrementally build-up
> a program - 'configure' a programmable material.
I agree. We already have these really cool controllers. They are called human beings :-)
Human controllers have all kinds of ideas and desires, but we aren't very good at maintaining attention or handling myriad details. So we need help. Unit tests are that help. They are as flexible as their human controllers make them. The collaboration of humans and @test is superb: each can do what it does best.
EKR
The stupendous Aha: the haiku version
We have been underestimating the potential of unit tests because of their name.
In fact, unit tests are general-purpose helper scripts that can be run at any time by the unittest test runner.
The cruft of having to create subclasses of unittest.TestCase usually obscures this simple fact. Leo's @test nodes do away with the blah-blah-blah of unit testing, so this Aha is much easier to see and exploit in Leo.
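To make the Aha concrete, here is a minimal sketch of my own (not code from Leo): a "test" that is really a general-purpose helper script, re-run automatically by the standard test runner.

    import glob
    import unittest

    class HelperScripts(unittest.TestCase):
        """General-purpose helper scripts dressed up as 'tests'."""

        def test_no_tabs_in_sources(self):
            # Not a correctness check at all: a style sweep that the
            # test runner will happily re-run, automatically, forever.
            offenders = [fn for fn in glob.glob('*.py')
                         if '\t' in open(fn).read()]
            self.assertEqual(offenders, [],
                'files containing tabs: %s' % offenders)

    if __name__ == '__main__':
        unittest.main()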
That's all there is to it.
Edward
Second try: stupendous Aha re unit tests
I have gotten zero responses to my '42' post ("Are unit tests the 42 of programming and design?").
I should have remembered that nobody reads long posts. So here is the Aha in a nutshell:
Unit tests are not just for testing! They are the master tool for programming, design, testing, refactoring, studying code, or anything else.
Think of a unit test as the root of a tree. The tree represents any task (including designs).
The unit test formalizes, automates and protects the task or design.
Expanding the notion of unit tests this way is a stupendous Aha. It has totally altered how I approach my work.
For details, read the long post. Print it out, including the lengthy "reply". Study it. Respond. Please.
Edward
P.S. I expect three possible responses to this post, and the longer post:
1. Yeah, I knew that. Welcome to the club of hugely effective programmers.
2. Wow! I didn't know that. This is going to change my life.
3. Huh? I have no idea what you are talking about. Can you do a better job of explaining your ideas?
EKR
Friday, February 19, 2010
Are unit tests the 42 of programming and design?
This is a repost of an entry on my Leo and Pylint blog.
I'm beginning to believe that the answer to this question is 'yes'. (It's certainly not '42' :-)
And I believe I can prove it.
This morning I attempted a rewrite of the complex code that imports Python files into Leo. As I went, it was natural to create unit tests as the complexity of the problem became more apparent.
About noon I had to stop for a while because several (old!) unit tests were failing. During my break, I considered scrapping the approach I was using. So was the effort a failure? Hardly. Amidst the rubble the unit tests stood unscathed.
This isn't an isolated example. Comically, failures create a situation in which unit tests come first the next time around :-)
Now let's consider my recent complaints about packaging. I implied at the end of a recent post that we lacked tools of sufficient power to help us with packaging. Could unit tests be those tools? Yes, they could be.
We need only a slight change in point of view. The typical way of viewing unit tests is to consider them a (permanent!) check of the correctness of a piece of code. This is so close to the 42.
Let us instead view unit tests as tools. As tools, they have the following characteristics:
1. Unit tests codify and make explicit desired or expected behavior.
2. Unit tests are a way of "maintaining attention" on a problem. Unit tests don't forget, and they are permanent.
3. Unit tests do whatever we want, and they do it automatically.
What, then, is the "effective power" of a unit test? Well, there is no limit to what a unit test can do. A unit test can do whatever we have the wit to require it to do. Unit tests are the master tool for any programmer or designer.
To put it another way, unit tests allow us to focus our attention briefly on a problem, and then, by the nature of a unit test, that attention becomes permanent.
Not convinced? Let's consider the question of packaging.
Suppose I think of unit tests as design tools for packaging issues. What, exactly, could not be codified as a unit test? Certainly APIs can be codified. Certainly use cases could be codified. Certainly interactions with other packages could be codified.
And so on. As soon as a new concern arises, it becomes possible to create a unit test that addresses that concern! Do you see? Unit tests are limited only by our intelligence, commitment, and desire.
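For instance, a packaging concern might be codified along these lines. This is only a sketch: the package name mypkg and the required names are invented for illustration.

    import importlib
    import sys
    import unittest

    class TestPackagingConcerns(unittest.TestCase):
        """Each packaging worry becomes a permanent, automatic check."""

        def test_public_api_exists(self):
            # Codify the API: these names must remain available.
            mypkg = importlib.import_module('mypkg')  # hypothetical package
            for name in ('open_outline', 'save_outline'):  # hypothetical API
                self.assertTrue(hasattr(mypkg, name), 'missing API: %s' % name)

        def test_package_imports_cleanly(self):
            # Codify a use case: a bare import must succeed on its own,
            # with none of our modules pre-loaded.
            for mod in [m for m in sys.modules if m.startswith('mypkg')]:
                del sys.modules[mod]
            importlib.import_module('mypkg')

    if __name__ == '__main__':
        unittest.main()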
It seems so clear now. If we wanted a tool of unlimited power, it would have to be limited only by our limitations. But that's exactly what is so about unit tests!
Edward
P.S. Seen in this light, design problems are failures to create proper unit tests. Or rather, design problems arise from a failure to see the need for one or more unit tests.
EKR
Thursday, February 18, 2010
Leo and Pylint: a new google group
For the last several days I have been studying pylint and 2to3 intensely. Indeed, 2to3 might be converted into a program, call it pep8.py, that would *fix* deviations from Python's style guidelines, not just warn about stylistic problems.
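As a sketch of the starting point: 2to3's refactoring machinery can already be driven programmatically. A real pep8.py would supply its own fixer package here, rewriting style problems instead of Python-2 idioms; the file name below is just an example.

    from lib2to3 import refactor

    # Run 2to3's stock fixers over a source string. A hypothetical
    # pep8.py would pass its own fixer package instead of lib2to3.fixes.
    fixers = refactor.get_fixers_from_package('lib2to3.fixes')
    tool = refactor.RefactoringTool(fixers)
    source = open('example.py').read()  # note: must end with a newline
    print(tool.refactor_string(source, 'example.py'))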
A new Google Group, called Leo and Pylint, is, in effect, an engineering notebook for my studies. Those who are interested in studying complex programs might be interested in this group. In contrast to the tedious reading of source code, this notebook shows how to make study an active, exciting process.
Saturday, February 13, 2010
Tracking down a pylint bug with a new debugger
The latest version of pylint (0.19.0, with astng 0.19.3, common 0.46.0) produces an unbounded recursion on all of Leo's files.
Pylint is complex code, and several hours of tracing with pdb got me no closer to understanding what was going on. At last I had a better idea: create a debugger that would warn me when the stack got too big. It is a subclass of pdb's Pdb class that overrides two methods defined in bdb.py.
The original version of this post listed the code and the traceback, but the listings got garbled. You can now see them both on the bug report page at
https://bugs.launchpad.net/pylint/+bug/456870
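For readers who want only the shape of the idea, here is a rough reconstruction of mine, not the original listing; the threshold is arbitrary.

    import bdb
    import pdb

    class StackWatcher(pdb.Pdb):
        """A pdb subclass that warns when the traced stack grows too deep."""

        max_depth = 300  # arbitrary threshold

        def dispatch_call(self, frame, arg):
            # bdb.Bdb.dispatch_call fires on every traced function call,
            # so it is a natural place to measure stack depth.
            depth, f = 0, frame
            while f is not None:
                depth += 1
                f = f.f_back
            if depth > self.max_depth:
                print('stack depth %d at %s:%d' % (
                    depth, frame.f_code.co_filename, frame.f_lineno))
            return bdb.Bdb.dispatch_call(self, frame, arg)

    # Usage: StackWatcher().run('main()')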
I still don't understand the code, but now I know where the unbounded recursion is.
Edward