Notes on Python

On the subject of C program indentation: In My Egotistical Opinion, most people's C programs should be indented six feet downward and covered with dirt.

— Blair P. Houghton


Around the beginning of April, 2001, I finally decided to do something about the feeling I'd had for some time that I'd like to learn a few new programming languages. I started by looking at Python. These are my notes on the process.

Non-religious comments are welcome. Please don't send me advocacy.

I chose Python as a language to try (over a few other choices like Objective Caml or Common Lisp) mostly because it's less of a departure from the languages that I'm already comfortable with. In particular, it's really quite a bit like Perl. I picked this time to start since I had an idea for an initial program to try writing in Python, a program that I probably would normally write in Perl. I needed a program to help me manage releases of the various software packages that I maintain, something to put a new version on an FTP site, update a series of web pages, generate a change log in a nice form for the web, and a few other similar things.

I started by reading the Python tutorial, pretty much straight through. I did keep an interactive Python process running while I did, but I didn't type in many of the examples; the results were explained quite well in the tutorial, and I generally don't need to do things myself to understand them. The tutorial is exceptionally well-written; after finishing reading it straight through (which took me an evening) and skimming the library reference, I felt I had a pretty good grasp on the language.

Things that immediately jumped out at me that I liked a lot:

There were a few things that I immediately didn't like, after having just read the tutorial:

There were also a couple of things that I immediately missed from other languages:


Over the next few days, I started reading the language manual straight through, as well as poking around more parts of the language reference and writing some code. I started with a function to find the RCS keywords and version string in a file and from that extract the version and the last modified date (things that would need to be modified on the web page for that program). I really had a lot of fun with this.

The Python standard documentation is excellent. I mean truly superb. I can't really compare it to Perl (the other language that has truly excellent standard documentation), since I know Perl so well that I can't evaluate its documentation from the perspective of the beginner, but Python's tutorial eased me into the language beautifully and the language manual is well-written, understandable, and enjoyable to read. The library reference is well-organized and internally consistent, and I never had much trouble finding things. And they're available in info format as well as web pages, which is a major advantage for me; info is easier for me to read straight through, and web pages are easier for me to browse.

The language proved rather fun to write. Regex handling is a bit clunky since it's not a language built-in, but I was expecting that and I don't really mind it. The syntax is fun, and XEmacs python-mode does an excellent job handling highlighting and indentation. I was able to put together that little function and wrap a test around it fairly quickly (in a couple of hours while on the train, taking a lot of breaks to absorb the language reference manual or poke around in the library reference for the best way of doing something).
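As a rough illustration of what regex handling through a library looks like, here is a minimal sketch of pulling a revision and date out of an RCS $Id$ keyword with the re module. It is not the function described above; the sample line, the names SAMPLE, ID_PATTERN, and rcs_info, and the assumption of slash-separated dates are all made up for the example.

```python
import re

# Hypothetical sample line. The layout assumed here is the usual RCS one:
# $Id: file,v revision yyyy/mm/dd hh:mm:ss author state $
SAMPLE = "# $Id: release.py,v 1.12 2001/04/09 05:31:17 user Exp $\n"

# Compiled pattern with named groups for the revision and the date.
ID_PATTERN = re.compile(
    r"\$Id: \S+,v (?P<revision>[\d.]+) (?P<date>\d{4}/\d{2}/\d{2}) "
)


def rcs_info(text):
    """Return (revision, date) from the first $Id$ keyword, or None."""
    match = ID_PATTERN.search(text)
    if match is None:
        return None
    return match.group("revision"), match.group("date")


print(rcs_info(SAMPLE))  # ('1.12', '2001/04/09')
```

The extra steps compared to Perl are the explicit compile, the match object, and the group lookups, which is roughly the clunkiness in question.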

That's where I am at the moment. More as I find time to do more....


I've finished my first Python program, after having gotten distracted by a variety of other things. It wasn't the program I originally started writing, since the problem of releasing a new version of a software package ended up being more complicated than I expected. (In particular, generating the documentation looks like it's going to be tricky.) I did get the code to extract version numbers and dates written, though, and then for another project (automatically generating man pages from scripts with embedded POD when installing them into our site-wide software installation) I needed that same code. So I wrote that program in Python and tested it and it works fine.

The lack of a way to safely execute a program without going through the shell is really bothering me. It was also the source of one of the three bugs in the first pass at my first program; I passed a multiword string to pod2man and forgot to protect it from the shell. What I'm currently doing is still fragile in the presence of single quotes in the string, which is another reason why I much prefer Perl's safe system() function. I feel like I must be missing something; something that fundamental couldn't possibly fail to be present in a scripting language.
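For comparison, later Python versions do offer a direct way to do this: the subprocess module (added in Python 2.4, after this entry was written) takes the command as a list and never involves the shell. The following is only a sketch under that assumption; the child command is a stand-in Python one-liner rather than pod2man, and the title string is hypothetical.

```python
import subprocess
import sys

# A multiword argument containing a single quote, the troublesome case.
title = "User's Guide to Things"

# Stand-in child program (an assumption for the demo): a Python one-liner
# that prints its first argument. pod2man is not needed to run this.
command = [sys.executable, "-c", "import sys; print(sys.argv[1])", title]

# With a list and no shell=True, no shell is involved, so the argument
# reaches the child intact without any quoting.
result = subprocess.run(
    command, stdout=subprocess.PIPE, universal_newlines=True, check=True
)
received = result.stdout.rstrip("\n")
print(received)
```

This is the same idea as the list form of Perl's system(): each list element becomes exactly one argument of the child process.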

A second bug in that program highlights another significant difference from Perl that I'm finding a little strange to deal with, namely the lack of equivalence between numbers and strings. My program had a dictionary of section titles, keyed by the section numbers, and I was using the plain number as the dictionary key. When I tried to look up a title in the dictionary, however, I used as the key a string taken from the end of the output filename, and 1 didn't match "1". It took me a while to track that down. (Admittedly, the problem was really laziness on my part; given the existence of such section numbers as "1m" and "3f", I should have used strings as the dictionary keys in the first place.)
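The mismatch is easy to reproduce. This sketch uses hypothetical section titles and a hypothetical filename; only the behavior of the dictionary lookup is the point.

```python
# Hypothetical section titles, keyed by integer as in the buggy version.
titles_by_int = {1: "User Commands", 3: "Library Functions"}

# The section as it would come off the end of a filename such as "foo.1".
section = "foo.1".rsplit(".", 1)[1]

print(section in titles_by_int)       # False: the string "1" is not the key 1
print(int(section) in titles_by_int)  # True: explicit conversion is required

# Keying by string avoids the conversion and also copes with "1m" or "3f".
titles_by_str = {"1": "User Commands", "1m": "Maintenance Commands"}
print(titles_by_str[section])         # User Commands
```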

The third bug, for the record, was attempting to use a Perl-like construct to read a file (while line = file.readline():). I see that Python 2.1 has the solution I really want in the form of xreadlines, but in the meantime that was easy enough to recode into a test and a break in the middle of the loop.
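The recoded loop looks something like the following sketch. The io.StringIO object is just an in-memory stand-in for an open file so that the example runs on its own.

```python
import io

# In-memory stand-in for an open file (an assumption for the demo).
stream = io.StringIO("first\nsecond\nthird\n")

lines = []
while True:
    line = stream.readline()
    if not line:  # readline() returns "" only at end of file
        break
    lines.append(line.rstrip("\n"))

print(lines)  # ['first', 'second', 'third']
```

A blank line in the input still comes back as "\n", which is true, so the loop only stops at the real end of the file.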

The lack of a standard documentation format like Perl's POD is bothering me and I'm not sure what to do about it. I want to put the documentation (preferably in POD, but I'm willing to learn something else that's reasonably simple) into the same file as the script so that it gets updated when the script does and doesn't get lost in the directory. This apparently is just an unsolved problem, unless I'm missing some great link to an embedded documentation technique (and I quite possibly am). Current best idea is to put a long triple-quoted string at the end of my script containing POD. Ugh.
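A sketch of the triple-quoted string idea, with a hypothetical script name and a made-up variable DOCS: the POD directives sit at the start of their lines inside the string, so POD tools should see an ordinary =head1 to =cut block while Python just sees a string it never uses.

```python
#!/usr/bin/env python
"""Hypothetical script carrying its own POD documentation."""


def main():
    print("doing the real work here")


# POD directives must start in column one, so the string is not indented.
# Python only stores the string; POD tools look at the =head1 ... =cut span.
DOCS = """

=head1 NAME

example - Hypothetical script used to illustrate embedded POD

=head1 SYNOPSIS

B<example> [B<-h>]

=cut

"""

if __name__ == "__main__":
    main()
```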

I took a brief look at the standard getopt library (although I didn't end up using it), and was a little disappointed; one of the features that I really liked about Perl's Getopt::Long was its ability to just stuff either the arguments to options or boolean values into variables directly, without needing something like the long case statement that's a standard feature of main() in many C programs. Looks like Python's getopt is much closer to C's, and requires something quite a bit like that case statement.
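To make the comparison concrete, here is a minimal sketch of the getopt style. The options -v/--verbose and -o/--output and the function name parse are hypothetical, chosen only to show the loop.

```python
import getopt


def parse(argv):
    """Parse hypothetical options -v/--verbose and -o/--output FILE."""
    verbose = False
    output = None
    options, arguments = getopt.getopt(argv, "vo:", ["verbose", "output="])
    # The if/elif chain plays the role of the switch statement in C.
    for option, value in options:
        if option in ("-v", "--verbose"):
            verbose = True
        elif option in ("-o", "--output"):
            output = value
    return verbose, output, arguments


print(parse(["-v", "--output", "out.html", "input.txt"]))
# (True, 'out.html', ['input.txt'])
```

Every new option means another branch in the loop, where Getopt::Long would only need another entry in the option specification.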

Oh, and while the documentation is still excellent, I've started noticing a gap in it when it comes to the core language (not the standard library; the documentation there is great). The language reference manual is an excellent reference manual, complete with clear syntax descriptions, but is a little much if one just wants to figure out how to do something. I wasn't sure of the syntax of the while statement, and the language reference was a little heavier than was helpful. I find myself returning to the tutorial to find things like this, and it has about the right level of explanation, but the problem with that is that the tutorial is laid out as a tutorial and isn't as easy to use as a reference. (For example, the while statement isn't listed in the table of contents, because it was introduced in an earlier section with a more general title.)

I need to get the info pages installed on my desktop machine so that I can look things up in the index easily; right now, I'm still using the documentation on the web.


I've unfortunately not had very much time to work on this, as one can tell from the date.

Aahz pointed out a way to execute a program without going through the shell, namely os.spawnv(). That works, although the documentation is extremely poor. (Even in Python 2.1, it refers me to the Visual C++ Runtime Library documentation for information on what spawnv does, which is of course absurd.) At least the magic constants that it needs are relatively intuitive. Unfortunately, spawnv doesn't search the user's PATH for a command, and there's nothing like spawnvp. Sigh.

There's really no excuse for this being quite this hard. Executing a command without going through the shell is an extremely basic function that should be easily available in any scripting language without jumping through these sorts of hoops.

But this at least gave me a bit of experience in writing some more Python (a function to search the PATH to find a command), and the syntax is still very nice and convenient. I'm bouncing all over the tutorial and library reference to remember how to do things, but usually my first guesses are right.
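A PATH search of that sort, combined with os.spawnv(), might look like the sketch below. This is not the original function; the name find_in_path and its optional path parameter are assumptions, and the demo simply locates and runs the current interpreter so that it does not depend on any particular command being installed.

```python
import os
import sys


def find_in_path(command, path=None):
    """Return the full path to command on a PATH-style string, or None."""
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    for directory in path.split(os.pathsep):
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or os.curdir, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Demo: look for the running interpreter in its own directory, then run it
# without a shell. spawnv wants the full path plus the complete argv list.
directory, name = os.path.split(sys.executable)
program = find_in_path(name, directory)
status = os.spawnv(os.P_WAIT, program, [program, "-c", "pass"])
print(program, status)
```

With os.P_WAIT, spawnv returns the exit status of the child. Current Python versions ship shutil.which() for the search itself, so a hand-written version is mostly of historical interest.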

I see that Debian doesn't have the info pages, only the HTML documentation. That's rather annoying, but workable. I now have the HTML documentation for Python 2.1 on local disk on my laptop.


I've now written a couple of real Python programs (in addition to the simple little thing to generate man pages by running pod2man). You can find them (cvs2xhtml and cl2xhtml) with my web tools. They're not particularly pretty, but they work, and I now have some more experience writing simple procedural Python code. I still haven't done anything interesting with objects. Comments on the code are welcome. Don't expect too much.

There are a few other documentation methods for Python, but they seem primarily aimed at documenting modules and objects rather than documenting scripts. Pydoc in particular looks like it would be nice for API documentation but doesn't really do anything for end-user program documentation. Accordingly, I've given up for the time being on finding a more "native" approach and am just documenting my Python programs the way that I document most things, by writing embedded POD. I've yet to find a better documentation method; everything else seems to either be far too complicated and author-unfriendly to really write directly in (like DocBook) or can't generate Unix man pages, which I consider to be a requirement.

The Python documentation remains excellent, if scattered. I've sometimes spent a lot of time searching through the documentation to find the right module to do something, and questions of basic syntax are fairly hard to resolve (the tutorial is readable but not organized as a reference, and the language reference is too dense to provide a quick answer).


My first major Python application is complete and working (although I'm not yet using it as much as I want to be using it). That's Tasker, a web-based to-do list manager written as a Python CGI script that calls a Python module.

I've now dealt with the Python module building tools, which are quite nice (nicer in some ways than Perl's Makefile.PL system with some more built-in functionality, although less mature in a few ways). Python's handling of the local module library is clearly less mature than Perl's, and Debian's Python packages don't handle locally installed modules nearly as well as they should, but overall it was a rather positive experience. Built-in support for generating RPMs is very interesting, since eventually I'd like to provide .deb and RPM packages for all my software.

I played with some OO design for this application and ended up being fairly happy with how Python handled things. I'm not very happy with my object layout, but that's my problem, not Python's. The object system definitely feels far smoother and more comfortable to me than Perl's, although I can still write OO code faster in Perl because I'm more familiar with it. There's none of the $self hash nonsense for instance variables, though, which is quite nice.

The CGI modules for Python, and in particular the cgitb module for displaying exceptions nicely in the browser while debugging CGI applications, are absolutely excellent. I was highly impressed, and other than some confusion about the best way to retrieve POST data that was resolved after reading the documentation more closely, I found those modules very easy to work with. The cgitb module is a beautiful, beautiful thing and by itself makes me want to use Python for all future CGI programming.

I still get caught all the time by the lack of interchangeability of strings and numbers, and I feel like I'm casting things all the time. I appreciate some of the benefits of stronger typing, but this one seems to get in my way more often than it helps.

I'm also still really annoyed at the lack of good documentation for the parts of the language that aren't considered part of the library. If I want documentation on how print works, I have only the tutorial and the detailed language standard, the former of which is not organized for reference and the latter of which is far too hard to understand. This is a gaping hole in the documentation that I really wish someone would fix. Thankfully, it only affects a small handful of things, like control flow constructs and the print statement, so I don't hit this very often, but whenever I do it's extremely frustrating.

I've given up on documentation for scripts and am just including a large POD section at the end of the script, since this seems to be the only option that will generate good man pages and good web pages. I'm not sure what to do about documentation for the module; there seem to be a variety of different proposals but nothing that I can really just use.

Oh, and one last point on documentation: the distutils documentation needs some work. Thankfully I found some really good additional documentation on the PyPI web site that explained a lot more about how to write a script.


Six years later, I still find Python an interesting language, but I never got sufficiently absorbed by it for it to be part of my standard toolkit.

I've subsequently gotten some additional experience with extending Python through incorporating an extension written by Thomas Kula into the remctl distribution. The C interface is relatively nice and more comfortable than Perl, particularly since it doesn't involve a pseudo-C that is run through a preprocessor. It's a bit more comfortable to read and write.

Python's installation facilities, on the other hand, are poor. The distutils equivalent of Perl's ExtUtils::MakeMaker is considerably worse, despite ExtUtils::MakeMaker being old and crufty and strange. (I haven't compared it with Module::Build.) The interface is vaguely similar, but I had to apply all sorts of hacks to get the Python extension to build properly inside a Debian packaging framework, and integrating it with a larger package requires doing Autoconf substitution on a ton of different files. It was somewhat easier to avoid embedding RPATH into the module, but I'd still much rather work with Perl's facilities.

Similarly, while the test suite code has some interesting features (I'm using the core unittest framework), it's clearly inferior to Perl's Test::More support library and TAP protocol. I'm, of course, a known fan of Perl's TAP testing protocol (I even wrote my own implementation in C), but that's because it's well-designed, full-featured, and very useful. The Python unittest framework, by comparison, is awkward to use, has significantly inferior reporting capabilities, makes it harder to understand what test failed and isolate the failure, and requires a lot of digging around to understand how it works. I do like the use of decorators to handle skipping tests, and there are some interesting OO ideas around test setup and teardown, but the whole thing is more awkward than it should be.
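For readers who haven't used it, the following sketch shows the unittest features mentioned here: setUp and tearDown methods and a skip decorator. The test class, its contents, and the skip reason are hypothetical, and the suite is run programmatically only so that the example is self-contained.

```python
import unittest


class ExampleTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method.
        self.values = [3, 1, 2]

    def tearDown(self):
        # Runs after every test method, pass or fail.
        self.values = None

    def test_sorted(self):
        self.assertEqual(sorted(self.values), [1, 2, 3])

    @unittest.skip("hypothetical reason: feature not built")
    def test_skipped(self):
        self.fail("never runs")


# Run the suite programmatically and collect the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped), len(result.failures), result.wasSuccessful())
```

The result object holds skipped tests and failures as separate lists of (test, message) pairs, which is where any reporting beyond the default runner output has to come from.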

I'm not entirely sure why Python has never caught on with me as a language to use on a regular basis. Certainly, one of the things that always bugs me is the lack of good integrated documentation support like POD (although apparently reStructuredText is slowly becoming that), but that's not the whole story. I suspect a lot is just that I'm very familiar with Perl and with its standard and supporting library, and it takes me longer to do anything in Python. But the language just feels slightly more awkward, and I never have gotten comfortable with the way that it uses exceptions for all error reporting.

I may get lured back into it again at some point, though, since Python 3.0 seems to have some very interesting features and it remains popular with people who know lots of programming languages. I want to give it another serious look with a few more test projects at some point in the future.

Last modified and spun 2014-08-09