Tuesday, December 15, 2009

7-Python Classes


Subsections

  • 9.1 A Word About Terminology
  • 9.2 Python Scopes and Name Spaces
  • 9.3 A First Look at Classes
    • 9.3.1 Class Definition Syntax
    • 9.3.2 Class Objects
    • 9.3.3 Instance Objects
    • 9.3.4 Method Objects
  • 9.4 Random Remarks
  • 9.5 Inheritance
    • 9.5.1 Multiple Inheritance
  • 9.6 Private Variables
  • 9.7 Odds and Ends
  • 9.8 Exceptions Are Classes Too
  • 9.9 Iterators
  • 9.10 Generators
  • 9.11 Generator Expressions


9. Classes

Python's class mechanism adds classes to the language with a minimum of new syntax and semantics. It is a mixture of the class mechanisms found in C++ and Modula-3. As is true for modules, classes in Python do not put an absolute barrier between definition and user, but rather rely on the politeness of the user not to ``break into the definition.'' The most important features of classes are retained with full power, however: the class inheritance mechanism allows multiple base classes, a derived class can override any methods of its base class or classes, and a method can call the method of a base class with the same name. Objects can contain an arbitrary amount of private data.

In C++ terminology, all class members (including the data members) are public, and all member functions are virtual. There are no special constructors or destructors. As in Modula-3, there are no shorthands for referencing the object's members from its methods: the method function is declared with an explicit first argument representing the object, which is provided implicitly by the call. As in Smalltalk, classes themselves are objects, albeit in the wider sense of the word: in Python, all data types are objects. This provides semantics for importing and renaming. Unlike C++ and Modula-3, built-in types can be used as base classes for extension by the user. Also, like in C++ but unlike in Modula-3, most built-in operators with special syntax (arithmetic operators, subscripting etc.) can be redefined for class instances.


9.1 A Word About Terminology

Lacking universally accepted terminology to talk about classes, I will make occasional use of Smalltalk and C++ terms. (I would use Modula-3 terms, since its object-oriented semantics are closer to those of Python than C++, but I expect that few readers have heard of it.)

Objects have individuality, and multiple names (in multiple scopes) can be bound to the same object. This is known as aliasing in other languages. This is usually not appreciated on a first glance at Python, and can be safely ignored when dealing with immutable basic types (numbers, strings, tuples). However, aliasing has an (intended!) effect on the semantics of Python code involving mutable objects such as lists, dictionaries, and most types representing entities outside the program (files, windows, etc.). This is usually used to the benefit of the program, since aliases behave like pointers in some respects. For example, passing an object is cheap since only a pointer is passed by the implementation; and if a function modifies an object passed as an argument, the caller will see the change -- this eliminates the need for two different argument passing mechanisms as in Pascal.
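The effect of aliasing on mutable objects can be seen in a few lines. This is only a minimal sketch: the function name append_marker and the variable names are made up for illustration, and the code avoids print so that it should behave the same on Python 2 and Python 3.

```python
def append_marker(items):
    # Mutates the list object it was handed; no copy is made.
    items.append('marker')

original = [1, 2, 3]
alias = original                  # a second name bound to the same list object
append_marker(alias)

same_object = alias is original   # True: both names refer to one object
# original is now [1, 2, 3, 'marker'], so the caller sees the change
```

Because only a reference is passed, the function needs no special "pass by reference" mechanism to modify the caller's list.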


9.2 Python Scopes and Name Spaces

Before introducing classes, I first have to tell you something about Python's scope rules. Class definitions play some neat tricks with namespaces, and you need to know how scopes and namespaces work to fully understand what's going on. Incidentally, knowledge about this subject is useful for any advanced Python programmer.

Let's begin with some definitions.

A namespace is a mapping from names to objects. Most namespaces are currently implemented as Python dictionaries, but that's normally not noticeable in any way (except for performance), and it may change in the future. Examples of namespaces are: the set of built-in names (functions such as abs(), and built-in exception names); the global names in a module; and the local names in a function invocation. In a sense the set of attributes of an object also forms a namespace. The important thing to know about namespaces is that there is absolutely no relation between names in different namespaces; for instance, two different modules may both define a function ``maximize'' without confusion -- users of the modules must prefix it with the module name.

By the way, I use the word attribute for any name following a dot -- for example, in the expression z.real, real is an attribute of the object z. Strictly speaking, references to names in modules are attribute references: in the expression modname.funcname, modname is a module object and funcname is an attribute of it. In this case there happens to be a straightforward mapping between the module's attributes and the global names defined in the module: they share the same namespace! (See footnote 9.1.)

Attributes may be read-only or writable. In the latter case, assignment to attributes is possible. Module attributes are writable: you can write "modname.the_answer = 42". Writable attributes may also be deleted with the del statement. For example, "del modname.the_answer" will remove the attribute the_answer from the object named by modname.
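As a rough illustration of writable module attributes, the sketch below builds an empty module object with types.ModuleType instead of importing a real module. The name modname is just the placeholder used in the text, not an actual module.

```python
import types

modname = types.ModuleType('modname')    # hypothetical, empty module object
modname.the_answer = 42                  # module attributes are writable
had_answer = hasattr(modname, 'the_answer')      # True

del modname.the_answer                   # writable attributes can be deleted
has_answer_now = hasattr(modname, 'the_answer')  # False
```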

Name spaces are created at different moments and have different lifetimes. The namespace containing the built-in names is created when the Python interpreter starts up, and is never deleted. The global namespace for a module is created when the module definition is read in; normally, module namespaces also last until the interpreter quits. The statements executed by the top-level invocation of the interpreter, either read from a script file or interactively, are considered part of a module called __main__, so they have their own global namespace. (The built-in names actually also live in a module; this is called __builtin__.)

The local namespace for a function is created when the function is called, and deleted when the function returns or raises an exception that is not handled within the function. (Actually, forgetting would be a better way to describe what actually happens.) Of course, recursive invocations each have their own local namespace.

A scope is a textual region of a Python program where a namespace is directly accessible. ``Directly accessible'' here means that an unqualified reference to a name attempts to find the name in the namespace.

Although scopes are determined statically, they are used dynamically. At any time during execution, there are at least three nested scopes whose namespaces are directly accessible: the innermost scope, which is searched first, contains the local names; the namespaces of any enclosing functions, which are searched starting with the nearest enclosing scope; the middle scope, searched next, contains the current module's global names; and the outermost scope (searched last) is the namespace containing built-in names.

If a name is declared global, then all references and assignments go directly to the middle scope containing the module's global names. Otherwise, all variables found outside of the innermost scope are read-only (an attempt to write to such a variable will simply create a new local variable in the innermost scope, leaving the identically named outer variable unchanged).

Usually, the local scope references the local names of the (textually) current function. Outside functions, the local scope references the same namespace as the global scope: the module's namespace. Class definitions place yet another namespace in the local scope.

It is important to realize that scopes are determined textually: the global scope of a function defined in a module is that module's namespace, no matter from where or by what alias the function is called. On the other hand, the actual search for names is done dynamically, at run time -- however, the language definition is evolving towards static name resolution, at ``compile'' time, so don't rely on dynamic name resolution! (In fact, local variables are already determined statically.)

A special quirk of Python is that assignments always go into the innermost scope. Assignments do not copy data -- they just bind names to objects. The same is true for deletions: the statement "del x" removes the binding of x from the namespace referenced by the local scope. In fact, all operations that introduce new names use the local scope: in particular, import statements and function definitions bind the module or function name in the local scope. (The global statement can be used to indicate that particular variables live in the global scope.)
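The difference between a plain assignment and one made under a global statement can be sketched as follows. The names counter, bump_local and bump_global are invented for this example, and the code is meant to be run as a module (or script), so that counter lives in the module's global namespace.

```python
counter = 0          # global name in the module's namespace

def bump_local():
    counter = 100    # assignment creates a new local name; the global is untouched
    return counter

def bump_global():
    global counter   # assignments now go to the module's global namespace
    counter = counter + 1
    return counter

local_result = bump_local()     # 100
after_local = counter           # still 0
global_result = bump_global()   # 1
after_global = counter          # 1
```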


9.3 A First Look at Classes

Classes introduce a little bit of new syntax, three new object types, and some new semantics.


9.3.1 Class Definition Syntax

The simplest form of class definition looks like this:

 
class ClassName:
    <statement-1>
    .
    .
    .
    <statement-N>

Class definitions, like function definitions (def statements), must be executed before they have any effect. (You could conceivably place a class definition in a branch of an if statement, or inside a function.)

In practice, the statements inside a class definition will usually be function definitions, but other statements are allowed, and sometimes useful -- we'll come back to this later. The function definitions inside a class normally have a peculiar form of argument list, dictated by the calling conventions for methods -- again, this is explained later.

When a class definition is entered, a new namespace is created, and used as the local scope -- thus, all assignments to local variables go into this new namespace. In particular, function definitions bind the name of the new function here.

When a class definition is left normally (via the end), a class object is created. This is basically a wrapper around the contents of the namespace created by the class definition; we'll learn more about class objects in the next section. The original local scope (the one in effect just before the class definition was entered) is reinstated, and the class object is bound here to the class name given in the class definition header (ClassName in the example); in other words, the class name now refers to the class object.


9.3.2 Class Objects

Class objects support two kinds of operations: attribute references and instantiation.

Attribute references use the standard syntax used for all attribute references in Python: obj.name. Valid attribute names are all the names that were in the class's namespace when the class object was created. So, if the class definition looked like this:

 
class MyClass:
    "A simple example class"  # docstring
    i = 12345
    def f(self):
        return 'hello world'

then MyClass.i and MyClass.f are valid attribute references, returning an integer and a function object, respectively. Class attributes can also be assigned to, so you can change the value of MyClass.i by assignment. __doc__ is also a valid attribute, returning the docstring belonging to the class: "A simple example class".
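The attribute references described above can be tried out directly. This sketch repeats the MyClass definition so that it is self-contained; the variable names on the left-hand sides are only for illustration.

```python
class MyClass:
    "A simple example class"
    i = 12345
    def f(self):
        return 'hello world'

original_i = MyClass.i             # 12345
MyClass.i = 54321                  # class attributes can be assigned to
doc = MyClass.__doc__              # 'A simple example class'
is_callable = callable(MyClass.f)  # True: f can be called
```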

Class instantiation uses function notation. Just pretend that the class object is a parameterless function that returns a new instance of the class. For example (assuming the above class):

 
x = MyClass()  # class instantiation

creates a new instance of the class and assigns this object to the local variable x.

The instantiation operation (``calling'' a class object) creates an empty object. Many classes like to create objects with instances customized to a specific initial state. Therefore a class may define a special method named __init__(), like this:

 
    def __init__(self):         # called automatically on instantiation (acts like a constructor)
        self.data = []

When a class defines an __init__() method, class instantiation automatically invokes __init__() for the newly-created class instance. So in this example, a new, initialized instance can be obtained by:

 
x = MyClass()  # create an object (instance) of class MyClass

Of course, the __init__() method may have arguments for greater flexibility. In that case, arguments given to the class instantiation operator are passed on to __init__(). For example,

 
>>> class Complex:
...     def __init__(self, realpart, imagpart):  # acts like a constructor
...         self.r = realpart
...         self.i = imagpart
... 
>>> x = Complex(3.0, -4.5)
>>> x.r, x.i
(3.0, -4.5)


9.3.3 Instance Objects (object of a class)

Now what can we do with instance objects (that is, objects of a class)? The only operations understood by instance objects are attribute references (to member data and methods). There are two kinds of valid attribute names, data attributes and methods.

Data attributes correspond to ``instance variables'' in Smalltalk, and to ``data members'' in C++. Data attributes need not be declared; like local variables, they spring into existence when they are first assigned to. For example, if x is the instance of MyClass created above, the following piece of code will print the value 16, without leaving a trace:

x = MyClass()  # create an object (instance) of class MyClass
x.counter = 1
while x.counter < 10:
    x.counter = x.counter * 2
print x.counter
del x.counter

The other kind of instance attribute reference is a method. A method is a function that ``belongs to'' an object. (In Python, the term method is not unique to class instances: other object types can have methods as well. For example, list objects have methods called append, insert, remove, sort, and so on. However, in the following discussion, we'll use the term method exclusively to mean methods of class instance objects, unless explicitly stated otherwise.)

Valid method names of an instance object depend on its class. By definition, all attributes of a class that are function objects define corresponding methods of its instances. So in our example, x.f is a valid method reference, since MyClass.f is a function, but x.i is not, since MyClass.i is not. But x.f is not the same thing as MyClass.f -- it is a method object, not a function object.


9.3.4 Method Objects

Usually, a method is called right after it is bound:

 
x.f()

In the MyClass example, this will return the string 'hello world'. However, it is not necessary to call a method right away: x.f is a method object, and can be stored away and called at a later time. For example:

 
xf = x.f
while True:
    print xf()

will continue to print "hello world" until the end of time.

What exactly happens when a method is called? You may have noticed that x.f() was called without an argument above, even though the function definition for f specified an argument. What happened to the argument? Surely Python raises an exception when a function that requires an argument is called without any -- even if the argument isn't actually used...

Actually, you may have guessed the answer: the special thing about methods is that the object is passed as the first argument of the function. In our example, the call x.f() is exactly equivalent to MyClass.f(x) . In general, calling a method with a list of n arguments is equivalent to calling the corresponding function with an argument list that is created by inserting the method's object before the first argument.

x = MyClass()    # create an object (instance) of class MyClass
x.f()            # call the method f; the instance x is passed implicitly as the first argument
MyClass.f(x)     # call the function f, passing the instance x explicitly as the argument

If you still don't understand how methods work, a look at the implementation can perhaps clarify matters. When an instance attribute is referenced that isn't a data attribute (member data), its class is searched. If the name denotes a valid class attribute that is a function object (a member function), a method object is created by packing (pointers to) the instance object and the function object just found together in an abstract object: this is the method object. When the method object is called with an argument list, it is unpacked again, a new argument list is constructed from the instance object and the original argument list, and the function object is called with this new argument list. <--- Important
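The packing of instance and function into a method object can be inspected from Python itself. The sketch below assumes Python 2.6 or later, where a method object exposes the packed instance as __self__ and the function as __func__; the function is fetched through MyClass.__dict__ so that the comparison behaves the same on Python 2 and Python 3.

```python
class MyClass:
    def f(self):
        return 'hello world'

x = MyClass()
xf = x.f                                     # method object: instance + function

plain_function = MyClass.__dict__['f']       # the function stored in the class
holds_instance = xf.__self__ is x            # True: the packed instance
holds_function = xf.__func__ is plain_function   # True: the packed function
same_result = xf() == plain_function(x)      # True: both return 'hello world'
```

Calling xf() therefore amounts to calling the stored function with the stored instance inserted as the first argument.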


9.4 Random Remarks

Data attributes override method attributes with the same name; to avoid accidental name conflicts, which may cause hard-to-find bugs in large programs, it is wise to use some kind of convention that minimizes the chance of conflicts. Possible conventions include capitalizing method names, prefixing data attribute names with a small unique string (perhaps just an underscore), or using verbs for methods and nouns for data attributes.
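A small sketch of such a name conflict, using a made-up class Report with a method called title: once an instance gets a data attribute of the same name, the method is no longer reachable through that instance.

```python
class Report:
    def title(self):
        return 'method result'

r = Report()
before = r.title()            # 'method result'
r.title = 'data attribute'    # instance data attribute with the same name
after = r.title               # 'data attribute'; calling r.title() now fails
```

Other instances of Report are unaffected, which is part of what makes this kind of bug hard to find.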

Data attributes may be referenced by methods as well as by ordinary users (``clients'') of an object. In other words, classes are not usable to implement pure abstract data types. In fact, nothing in Python makes it possible to enforce data hiding -- it is all based upon convention. (On the other hand, the Python implementation, written in C, can completely hide implementation details and control access to an object if necessary; this can be used by extensions to Python written in C.)

Clients should use data attributes with care -- clients may mess up invariants maintained by the methods by stamping on their data attributes. Note that clients may add data attributes of their own to an instance object without affecting the validity of the methods, as long as name conflicts are avoided -- again, a naming convention can save a lot of headaches here.

There is no shorthand for referencing data attributes (or other methods!) from within methods. I find that this actually increases the readability of methods: there is no chance of confusing local variables and instance variables when glancing through a method.

Often, the first argument of a method is called self. This is nothing more than a convention: the name self has absolutely no special meaning to Python. (Note, however, that by not following the convention your code may be less readable to other Python programmers, and it is also conceivable that a class browser program might be written that relies upon such a convention.)

Any function object that is a class attribute defines a method for instances of that class. It is not necessary that the function definition is textually enclosed in the class definition: assigning a function object to a local variable in the class is also ok. For example:

 
# Function defined outside the class
def f1(self, x, y):
    return min(x, x+y)
 
class C:
    f = f1
    def g(self):
        return 'hello world'
    h = g

Now f, g and h are all attributes of class C that refer to function objects, and consequently they are all methods of instances of class C -- h being exactly equivalent to g. Note that this practice usually only serves to confuse the reader of a program.

Methods may call other methods by using method attributes of the self argument: <--- Important

 
class Bag:
    def __init__(self):
        self.data = []
    def add(self, x):
        self.data.append(x)
    def addtwice(self, x):
        self.add(x)  # addtwice calls the method add through the method attribute self.add
        self.add(x)

Methods may reference global names in the same way as ordinary functions. The global scope associated with a method is the module containing the class definition. (The class itself is never used as a global scope!) While one rarely encounters a good reason for using global data in a method, there are many legitimate uses of the global scope: for one thing, functions and modules imported into the global scope can be used by methods, as well as functions and classes defined in it. Usually, the class containing the method is itself defined in this global scope, and in the next section we'll find some good reasons why a method would want to reference its own class!


9.5 Inheritance

Of course, a language feature would not be worthy of the name ``class'' without supporting inheritance. The syntax for a derived class definition looks like this:

 
class DerivedClassName(BaseClassName):
    <statement-1>
    .
    .
    .
    <statement-N>

The name BaseClassName must be defined in a scope containing the derived class definition. In place of a base class name, other arbitrary expressions are also allowed. This can be useful, for example, when the base class is defined in another module:

 
class DerivedClassName(modname.BaseClassName):

Execution of a derived class definition proceeds the same as for a base class. When the class object is constructed, the base class is remembered. This is used for resolving attribute references: if a requested attribute is not found in the class, the search proceeds to look in the base class. This rule is applied recursively if the base class itself is derived from some other class.

There's nothing special about instantiation of derived classes: DerivedClassName() creates a new instance of the class. Method references are resolved as follows: the corresponding class attribute is searched, descending down the chain of base classes if necessary, and the method reference is valid if this yields a function object.

Derived classes may override methods of their base classes. Because methods have no special privileges when calling other methods of the same object, a method of a base class that calls another method defined in the same base class may end up calling a method of a derived class that overrides it. (For C++ programmers: all methods in Python are effectively virtual.)

An overriding method in a derived class may in fact want to extend rather than simply replace the base class method of the same name. There is a simple way to call the base class method directly: just call "BaseClassName.methodname(self, arguments)". This is occasionally useful to clients as well. (Note that this only works if the base class is defined or imported directly in the global scope.)
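Overriding and extending can both be seen in a short sketch. The class names Base and Derived and their methods are invented for this example.

```python
class Base:
    def greet(self):
        # Calls self.name(), which may be overridden in a derived class.
        return 'hello from ' + self.name()
    def name(self):
        return 'Base'

class Derived(Base):
    def name(self):                    # overrides Base.name
        return 'Derived'
    def greet(self):                   # extends, rather than replaces, Base.greet
        return Base.greet(self) + '!'

base_greeting = Base().greet()         # 'hello from Base'
derived_greeting = Derived().greet()   # 'hello from Derived!'
```

Note that Base.greet ends up calling Derived.name when invoked on a Derived instance, which is the "virtual" behaviour mentioned above.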


9.5.1 Multiple Inheritance

Python supports a limited form of multiple inheritance as well. A class definition with multiple base classes looks like this:

 
class DerivedClassName(Base1, Base2, Base3):
    <statement-1>
    .
    .
    .
    <statement-N>

The only rule necessary to explain the semantics is the resolution rule used for class attribute references. This is depth-first, left-to-right. Thus, if an attribute (method or data) is not found in DerivedClassName, it is searched in Base1, then (recursively) in the base classes of Base1, and only if it is not found there, it is searched in Base2, and so on.

(To some people breadth first -- searching Base2 and Base3 before the base classes of Base1 -- looks more natural. However, this would require you to know whether a particular attribute of Base1 is actually defined in Base1 or in one of its base classes before you can figure out the consequences of a name conflict with an attribute of Base2. The depth-first rule makes no differences between direct and inherited attributes of Base1.)
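The lookup order can be checked with a small hierarchy. The class names below are hypothetical, and the example deliberately avoids a common base class shared by Base1 and Base2: for such "diamond" hierarchies newer Python versions use a different resolution order, while for this simple case the result is the same.

```python
class Root:
    def who(self):
        return 'Root'

class Base1(Root):
    pass                      # defines nothing itself; inherits who() from Root

class Base2:
    def who(self):
        return 'Base2'

class Derived(Base1, Base2):
    pass

# Search order: Derived, Base1, then Base1's base Root, and only then Base2.
found = Derived().who()       # 'Root'
```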

It is clear that indiscriminate use of multiple inheritance is a maintenance nightmare, given the reliance in Python on conventions to avoid accidental name conflicts. A well-known problem with multiple inheritance is a class derived from two classes that happen to have a common base class. While it is easy enough to figure out what happens in this case (the instance will have a single copy of ``instance variables'' or data attributes used by the common base class), it is not clear that these semantics are in any way useful.


9.6 Private Variables

There is limited support for class-private identifiers. Any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) is textually replaced with _classname__spam, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard to the syntactic position of the identifier, so it can be used to define class-private instance and class variables, methods, variables stored in globals, and even to store instance variables private to this class on instances of other classes. Truncation may occur when the mangled name would be longer than 255 characters. Outside classes, or when the class name consists of only underscores, no mangling occurs.

Name mangling is intended to give classes an easy way to define ``private'' instance variables and methods, without having to worry about instance variables defined by derived classes, or mucking with instance variables by code outside the class. Note that the mangling rules are designed mostly to avoid accidents; it still is possible for a determined soul to access or modify a variable that is considered private. This can even be useful in special circumstances, such as in the debugger, and that's one reason why this loophole is not closed. (Buglet: derivation of a class with the same name as the base class makes use of private variables of the base class possible.)
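Here is a minimal sketch of mangling at work, using two made-up classes, Account and Savings, that both use an attribute written as __balance. Each class gets its own mangled name, so the two do not collide, yet the value is still reachable by anyone who spells out the mangled name.

```python
class Account:
    def __init__(self):
        self.__balance = 10       # stored as _Account__balance

class Savings(Account):
    def __init__(self):
        Account.__init__(self)
        self.__balance = 99       # stored as _Savings__balance, no clash

s = Savings()
names = sorted(vars(s))           # ['_Account__balance', '_Savings__balance']
peek = s._Account__balance        # 10: the "private" value is still accessible
```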

Notice that code passed to exec, eval() or execfile() does not consider the classname of the invoking class to be the current class; this is similar to the effect of the global statement, the effect of which is likewise restricted to code that is byte-compiled together. The same restriction applies to getattr(), setattr() and delattr(), as well as when referencing __dict__ directly.


9.7 Odds and Ends

Sometimes it is useful to have a data type similar to the Pascal ``record'' or C ``struct'', bundling together a few named data items. An empty class definition will do nicely:

 
class Employee:        # Important
    pass
 
john = Employee() # Create an empty employee record (creating object of class Employee)
 
# Fill the fields of the record
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000

A piece of Python code that expects a particular abstract data type can often be passed a class that emulates the methods of that data type instead. For instance, if you have a function that formats some data from a file object, you can define a class with methods read() and readline() that get the data from a string buffer instead, and pass it as an argument.
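The idea can be sketched with a small class that serves data from a string. StringSource and first_line_upper are hypothetical names chosen for this example; only the method names read() and readline() come from the text.

```python
class StringSource:
    "Hypothetical file-like class that serves lines from a string."
    def __init__(self, text):
        self.lines = text.splitlines(True)   # keep line endings, like a file
        self.pos = 0
    def readline(self):
        if self.pos >= len(self.lines):
            return ''                        # empty string signals end of data
        line = self.lines[self.pos]
        self.pos = self.pos + 1
        return line
    def read(self):
        rest = ''.join(self.lines[self.pos:])
        self.pos = len(self.lines)
        return rest

def first_line_upper(source):
    # Works with anything that has a readline() method, file or not.
    return source.readline().strip().upper()

result = first_line_upper(StringSource('alpha\nbeta\n'))   # 'ALPHA'
```

The function never checks the type of its argument; it only relies on the methods it calls being present.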

Instance method objects have attributes, too: m.im_self is the instance object with the method m, and m.im_func is the function object corresponding to the method.


9.8 Exceptions Are Classes Too

User-defined exceptions are identified by classes as well. Using this mechanism it is possible to create extensible hierarchies of exceptions.

There are two new valid (semantic) forms for the raise statement:

 
raise Class, instance
 
raise instance

In the first form, instance must be an instance of Class or of a class derived from it. The second form is a shorthand for:

 
raise instance.__class__, instance

A class in an except clause is compatible with an exception if it is the same class or a base class thereof (but not the other way around -- an except clause listing a derived class is not compatible with a base class). For example, the following code will print B, C, D in that order:

 
class B:
    pass
class C(B):
    pass
class D(C):
    pass
 
for c in [B, C, D]:
    try:
        raise c()
    except D:
        print "D"
    except C:
        print "C"
    except B:
        print "B"

Note that if the except clauses were reversed (with "except B" first), it would have printed B, B, B -- the first matching except clause is triggered.

When an error message is printed for an unhandled exception, the exception's class name is printed, then a colon and a space, and finally the instance converted to a string using the built-in function str().


9.9 Iterators

By now you have probably noticed that most container objects can be looped over using a for statement:

 
for element in [1, 2, 3]:  # list
    print element
for element in (1, 2, 3):  # tuple
    print element
for key in {'one':1, 'two':2}:  # dictionary
    print key
for char in "123":  # string
    print char
for line in open("myfile.txt"):  # file object
    print line

This style of access is clear, concise, and convenient. The use of iterators pervades and unifies Python. Behind the scenes, the for statement calls iter() on the container object. The function iter() returns an iterator object that defines the method next() which accesses elements in the container one at a time. When there are no more elements, next() raises a StopIteration exception which tells the for loop to terminate. This example shows how it all works:

 
>>> s = 'abc'
>>> it = iter(s)  # the function iter() returns an iterator object
>>> it
<iterator object at 0x...>
>>> it.next()
'a'
>>> it.next()
'b'
>>> it.next()
'c'
>>> it.next()
 
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
    it.next()
StopIteration
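What the for statement does behind the scenes can be written out by hand. This is only an approximation of the real mechanism; the function name collect is invented, and the sketch uses the built-in next() function (available from Python 2.6) rather than calling the iterator's next() method directly, so that it also runs on Python 3.

```python
def collect(container):
    # Roughly what "for element in container" does behind the scenes.
    it = iter(container)
    seen = []
    while True:
        try:
            element = next(it)      # fetch the next element from the iterator
        except StopIteration:
            break                   # no more elements: leave the loop
        seen.append(element)
    return seen

letters = collect('abc')            # ['a', 'b', 'c']
```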

Having seen the mechanics behind the iterator protocol, it is easy to add iterator behavior to your classes. Define a __iter__() method which returns an object with a next() method. If the class defines next(), then __iter__() can just return self:

 
class Reverse:  # Important
    "Iterator for looping over a sequence backwards"
    def __init__(self, data):
        self.data = data
        self.index = len(data)
    def __iter__(self):
        return self
    def next(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]
 
>>> for char in Reverse('spam'):
...     print char
...
m
a
p
s


9.10 Generators

Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called, the generator resumes where it left off (it remembers all the data values and which statement was last executed). An example shows that generators can be trivially easy to create:

 
def reverse(data):
    for index in range(len(data)-1, -1, -1):
        yield data[index]
        
>>> for char in reverse('golf'):
...     print char
...
f
l
o
g

Anything that can be done with generators can also be done with class based iterators as described in the previous section. What makes generators so compact is that the __iter__() and next() methods are created automatically.

Another key feature is that the local variables and execution state are automatically saved between calls. This makes the function easier to write and much clearer than an approach using instance variables like self.index and self.data.

In addition to automatic method creation and saving program state, when generators terminate, they automatically raise StopIteration. In combination, these features make it easy to create iterators with no more effort than writing a regular function.
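These automatic features can be observed on a generator object directly. The sketch repeats the reverse() generator from above and, as an assumption, uses the built-in next() function (Python 2.6 and later) so that it runs unchanged on Python 3.

```python
def reverse(data):
    for index in range(len(data) - 1, -1, -1):
        yield data[index]

gen = reverse('ab')
is_own_iterator = iter(gen) is gen    # True: __iter__() was created automatically
first = next(gen)                     # 'b'
second = next(gen)                    # 'a': the loop state was saved between calls
try:
    next(gen)
    exhausted = False
except StopIteration:                 # raised automatically when the body ends
    exhausted = True
```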


9.11 Generator Expressions

Some simple generators can be coded succinctly as expressions using a syntax similar to list comprehensions but with parentheses instead of brackets. These expressions are designed for situations where the generator is used right away by an enclosing function. Generator expressions are more compact but less versatile than full generator definitions and tend to be more memory friendly than equivalent list comprehensions.

Examples:

 
>>> sum(i*i for i in range(10))                 # sum of squares
285
 
>>> xvec = [10, 20, 30]
>>> yvec = [7, 5, 3]
>>> sum(x*y for x,y in zip(xvec, yvec))         # dot product
260
 
>>> from math import pi, sin
>>> sine_table = dict((x, sin(x*pi/180)) for x in range(0, 91))
 
>>> unique_words = set(word  for line in page  for word in line.split())
 
>>> valedictorian = max((student.gpa, student.name) for student in graduates)
 
>>> data = 'golf'
>>> list(data[i] for i in range(len(data)-1,-1,-1))
['f', 'l', 'o', 'g']


Footnotes

9.1 (... namespace!)

Except for one thing. Module objects have a secret read-only attribute called __dict__ which returns the dictionary used to implement the module's namespace; the name __dict__ is an attribute but not a global name. Obviously, using this violates the abstraction of namespace implementation, and should be restricted to things like post-mortem debuggers.


6-Python Errors and Exceptions


Subsections

  • 8.1 Syntax Errors
  • 8.2 Exceptions
  • 8.3 Handling Exceptions
  • 8.4 Raising Exceptions
  • 8.5 User-defined Exceptions
  • 8.6 Defining Clean-up Actions
  • 8.7 Predefined Clean-up Actions


8. Errors and Exceptions

Until now error messages haven't been more than mentioned, but if you have tried out the examples you have probably seen some. There are (at least) two distinguishable kinds of errors: syntax errors and exceptions.


8.1 Syntax Errors

Syntax errors, also known as parsing errors, are perhaps the most common kind of complaint you get while you are still learning Python:

 
>>> while True print 'Hello world'
  File "<stdin>", line 1, in ?
    while True print 'Hello world'
                   ^
SyntaxError: invalid syntax

The parser repeats the offending line and displays a little `arrow' pointing at the earliest point in the line where the error was detected. The error is caused by (or at least detected at) the token preceding the arrow: in the example, the error is detected at the keyword print, since a colon (":") is missing before it. File name and line number are printed so you know where to look in case the input came from a script.


8.2 Exceptions

Even if a statement or expression is syntactically correct, it may cause an error when an attempt is made to execute it. Errors detected during execution are called exceptions and are not unconditionally fatal: you will soon learn how to handle them in Python programs. Most exceptions are not handled by programs, however, and result in error messages as shown here:

 
>>> 10 * (1/0)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ZeroDivisionError: integer division or modulo by zero
>>> 4 + spam*3
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'spam' is not defined
>>> '2' + 2
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: cannot concatenate 'str' and 'int' objects

The last line of the error message indicates what happened. Exceptions come in different types, and the type is printed as part of the message: the types in the example are ZeroDivisionError, NameError and TypeError. The string printed as the exception type is the name of the built-in exception that occurred. This is true for all built-in exceptions, but need not be true for user-defined exceptions (although it is a useful convention). Standard exception names are built-in identifiers (not reserved keywords).

The rest of the line provides detail based on the type of exception and what caused it.

The preceding part of the error message shows the context where the exception happened, in the form of a stack traceback. In general it contains a stack traceback listing source lines; however, it will not display lines read from standard input.

The Python Library Reference lists the built-in exceptions and their meanings.


8.3 Handling Exceptions

It is possible to write programs that handle selected exceptions. Look at the following example, which asks the user for input until a valid integer has been entered, but allows the user to interrupt the program (using Control-C or whatever the operating system supports); note that a user-generated interruption is signalled by raising the KeyboardInterrupt exception.

 
>>> while True:
...     try:
...         # Read from the keyboard, convert to an integer and store it in x.
...         x = int(raw_input("Please enter a number: "))
...         break
...     except ValueError:  # ValueError is the exception name
...         print "Oops!  That was no valid number.  Try again..."
...

The try statement works as follows.

  • First, the try clause (the statement(s) between the try and except keywords) is executed.
  • If no exception occurs, the except clause is skipped and execution of the try statement is finished.
  • If an exception occurs during execution of the try clause, the rest of the clause is skipped. Then if its type matches the exception named after the except keyword, the except clause is executed, and then execution continues after the try statement.
  • If an exception occurs which does not match the exception named (ValueError) in the except clause, it is passed on to outer try statements; if no handler is found, it is an unhandled exception and execution stops with a message as shown above.
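
The steps above can be sketched with a small function. parse_int is a hypothetical helper, not part of the tutorial, and the `except ... as` spelling is used so that the sketch should run on Python 2.6+ and Python 3.

```python
def parse_int(text):
    """Return int(text), or None if text is not a valid integer."""
    try:
        value = int(text)       # the try clause; may raise ValueError
    except ValueError:
        return None             # the type matches, so this handler runs
    return value                # no exception: the except clause was skipped


results = [parse_int('42'), parse_int('spam')]

# An exception that does not match the named type is passed on to an
# outer try statement: int(None) raises TypeError, not ValueError.
outer_caught = False
try:
    parse_int(None)
except TypeError:
    outer_caught = True

print(results)
print(outer_caught)
```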

A try statement may have more than one except clause, to specify handlers for different exceptions. At most one handler will be executed. Handlers only handle exceptions that occur in the corresponding try clause, not in other handlers of the same try statement. An except clause may name multiple exceptions as a parenthesized tuple, for example:

 
... except (RuntimeError, TypeError, NameError):  # multiple exceptions, named as a tuple
...     pass

The last except clause may omit the exception name(s), to serve as a wildcard. Use this with extreme caution, since it is easy to mask a real programming error in this way! It can also be used to print an error message and then re-raise the exception (allowing a caller to handle the exception as well):

 
import sys
 
try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError, (errno, strerror):
    print "I/O error(%s): %s" % (errno, strerror)
except ValueError:
    print "Could not convert data to an integer."
except:        # the last except clause may omit the exception name(s)
    print "Unexpected error:", sys.exc_info()[0]
    raise

The try ... except statement has an optional else clause, which, when present, must follow all except clauses. It is useful for code that must be executed if the try clause does not raise an exception. For example:

 
for arg in sys.argv[1:]:
    try:
        f = open(arg, 'r')
    except IOError:  # runs if open() raises IOError
        print 'cannot open', arg
    else:      # runs if the try clause does not raise an exception
        print arg, 'has', len(f.readlines()), 'lines'
        f.close()

The use of the else clause is better than adding additional code to the try clause because it avoids accidentally catching an exception that wasn't raised by the code being protected by the try ... except statement.
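
A minimal sketch of this point, using a hypothetical describe function and a dictionary lookup instead of a file: only the lookup sits in the try clause, so a KeyError raised by the follow-up code would not be caught by mistake. The code is written so that it should run on Python 2.6+ and Python 3.

```python
def describe(mapping, key):
    """Report whether key is present in mapping."""
    try:
        value = mapping[key]            # only this lookup is protected
    except KeyError:
        return 'missing'
    else:
        # Runs only if the lookup succeeded. An exception raised in
        # here would not be handled by the except clause above.
        return 'found %r' % (value,)


found = describe({'a': 1}, 'a')
missing = describe({}, 'a')
print(found)
print(missing)
```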

When an exception occurs, it may have an associated value, also known as the exception's argument. The presence and type of the argument depend on the exception type (for example RuntimeError, TypeError or NameError).

The except clause may specify a variable after the exception name (or tuple). The variable is bound to an exception instance with the arguments stored in instance.args. For convenience, the exception instance defines __getitem__ and __str__ so the arguments can be accessed or printed directly without having to reference .args.

But use of .args is discouraged. Instead, the preferred use is to pass a single argument to an exception (which can be a tuple if multiple arguments are needed) and have it bound to the message attribute. One may also instantiate an exception first before raising it and add any attributes to it as desired.

 
>>> try:
...    raise Exception('spam', 'eggs')  # instantiate Exception with the arguments 'spam' and 'eggs'
... except Exception, inst:             # inst is a variable bound to the exception instance
...    print type(inst)     # the type of the exception instance
...    print inst.args      # arguments stored in .args
...    print inst           # __str__ allows args to be printed directly
...    x, y = inst          # __getitem__ allows args to be unpacked directly
...    print 'x =', x
...    print 'y =', y
...
OUTPUT
------
('spam', 'eggs')
('spam', 'eggs')
x = spam
y = eggs
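
Instantiating an exception first and attaching attributes before raising it might look like the following sketch. The attribute name line_number is purely illustrative, and the `except ... as` spelling is used so that the code should run on Python 2.6+ and Python 3.

```python
try:
    error = ValueError('bad input')   # instantiate the exception first
    error.line_number = 12            # then add an attribute (hypothetical name)
    raise error
except ValueError as caught:
    message = str(caught)             # the single argument passed above
    line_number = caught.line_number  # the attribute travels with the instance

print(message)
print(line_number)
```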

If an exception has an argument, it is printed as the last part (`detail') of the message for unhandled exceptions.

Exception handlers don't just handle exceptions that occur immediately in the try clause, but also those that occur inside functions that are called (even indirectly) in the try clause. For example:

 
>>> def this_fails():  # function definition; the exception occurs inside this function
...     x = 1/0
... 
>>> try:
...     this_fails()   # calling the function this_fails()
... except ZeroDivisionError, detail:  # Important: detail is bound to the exception instance
...     print 'Handling run-time error:', detail
... 
OUTPUT
------
Handling run-time error: integer division or modulo by zero


8.4 Raising Exceptions

The raise statement allows the programmer to force a specified exception to occur. For example:

 
>>> raise NameError, 'HiThere'  # NameError is the exception name; 'HiThere' is the exception's argument
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: HiThere

The first argument to raise names the exception to be raised. The optional second argument specifies the exception's argument. Alternatively, the above could be written as raise NameError('HiThere'). Either form works fine, but there seems to be a growing stylistic preference for the latter.

>>> raise NameError, 'HiThere'  # equivalent to raise NameError('HiThere')

If you need to determine whether an exception was raised but don't intend to handle it, a simpler form of the raise statement allows you to re-raise the exception:

 
>>> try:
...     raise NameError, 'HiThere'
... except NameError:
...     print 'An exception flew by!'
...     raise
...
OUTPUT
------
An exception flew by!
Traceback (most recent call last):
  File "<stdin>", line 2, in ?
NameError: HiThere


8.5 User-defined Exceptions

Programs may name their own exceptions by creating a new exception class. Exceptions should typically be derived from the Exception class, either directly or indirectly. For example:

 
>>> class MyError(Exception):  # new class MyError, derived from the base class Exception
...     def __init__(self, value):    # constructor
...         self.value = value
...     def __str__(self):
...         return repr(self.value)
... 
>>> try:
...     raise MyError(2*2)     # force the exception MyError to occur
... except MyError, e:         # handle MyError; e is bound to the exception instance
...     print 'My exception occurred, value:', e.value
... 
OUTPUT
------
My exception occurred, value: 4
 
 
>>> raise MyError, 'oops!'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
__main__.MyError: 'oops!'

In this example, the default __init__ constructor of the Exception class has been overridden. The new behavior simply creates the value attribute. This replaces the default behavior of creating the args attribute.

Exception classes can be defined which do anything any other class can do, but are usually kept simple, often only offering a number of attributes that allow information about the error to be extracted by handlers for the exception. When creating a module that can raise several distinct errors, a common practice is to create a base class for exceptions defined by that module, and subclass that to create specific exception classes for different error conditions:

 
class Error(Exception):        # new class Error, derived from the base class Exception
    """Base class for exceptions in this module."""
    pass
 
class InputError(Error):  # subclass InputError, derived from the class Error
    """Exception raised for errors in the input.
 
    Attributes:
        expression -- input expression in which the error occurred
        message -- explanation of the error
    """
 
    def __init__(self, expression, message):  # constructor
        self.expression = expression
        self.message = message
 
class TransitionError(Error):  # subclass TransitionError, derived from the class Error
    """Raised when an operation attempts a state transition that's not
    allowed.
 
    Attributes:
        previous -- state at beginning of transition
        next -- attempted new state
        message -- explanation of why the specific transition is not allowed
    """
 
    def __init__(self, previous, next, message):  # constructor
        self.previous = previous
        self.next = next
        self.message = message

Most exceptions are defined with names that end in ``Error,'' similar to the naming of the standard exceptions.

Many standard modules define their own exceptions to report errors that may occur in functions they define. More information on classes is presented in chapter 9, ``Classes.''
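
One practical benefit of a module-level base class is that callers can catch every error of the module with a single except clause. The sketch below repeats a trimmed-down Error and InputError and adds a hypothetical check function to raise one. It uses the `except ... as` spelling so that it should run on Python 2.6+ and Python 3.

```python
class Error(Exception):
    """Base class for exceptions in this module."""


class InputError(Error):
    """Exception raised for errors in the input."""

    def __init__(self, expression, message):
        self.expression = expression
        self.message = message


def check(expression):
    """Hypothetical helper: reject blank expressions."""
    if not expression.strip():
        raise InputError(expression, 'empty expression')
    return expression


try:
    check('   ')
except Error as err:                 # catches InputError through its base class
    caught_type = type(err).__name__
    caught_message = err.message

print(caught_type)
print(caught_message)
```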


8.6 Defining Clean-up Actions

The try statement has another optional clause which is intended to define clean-up actions that must be executed under all circumstances. For example:

 
>>> try:
...     raise KeyboardInterrupt  # force the exception KeyboardInterrupt to occur
... finally:
...     print 'Goodbye, world!'
...
OUTPUT
------ 
Goodbye, world!
Traceback (most recent call last):
  File "<stdin>", line 2, in ?
KeyboardInterrupt

A finally clause is always executed before leaving the try statement, whether an exception has occurred or not. When an exception has occurred in the try clause and has not been handled by an except clause (or it has occurred in a except or else clause), it is re-raised after the finally clause has been executed. The finally clause is also executed ``on the way out'' when any other clause of the try statement is left via a break, continue or return statement. A more complicated example:

 
>>> def divide(x, y):
...     try:
...         result = x / y
...     except ZeroDivisionError:  # runs if the try clause raises ZeroDivisionError
...         print "division by zero!"
...     else:  # runs if the try clause does not raise an exception
...         print "result is", result
...     finally:  # always runs, whether or not an exception has occurred
...         print "executing finally clause"
...
>>> divide(2, 1)
OUTPUT
------ 
result is 2
executing finally clause
 
>>> divide(2, 0)
OUTPUT
------ 
division by zero!
executing finally clause
 
>>> divide("2", "1")
OUTPUT
------ 
executing finally clause
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 3, in divide
TypeError: unsupported operand type(s) for /: 'str' and 'str'
 

As you can see, the finally clause is executed in any event. The TypeError raised by dividing two strings is not handled by the except clause and is therefore re-raised after the finally clause has been executed. (Important: note the re-raising.)

In real world applications, the finally clause is useful for releasing external resources (such as files or network connections), regardless of whether the use of the resource was successful.
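
A minimal sketch of that pattern with a file as the external resource. The temporary file created with tempfile.mkstemp() is only there to make the example self-contained, and the code is written so that it should run on Python 2.6+ and Python 3.

```python
import os
import tempfile

# Create a scratch file to work with (an assumption of this sketch).
fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, 'w')
try:
    f.write('some data\n')
finally:
    f.close()                 # runs whether or not write() raised

closed_after = f.closed
os.remove(path)               # tidy up the scratch file
print(closed_after)
```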


8.7 Predefined Clean-up Actions

Some objects define standard clean-up actions to be undertaken when the object is no longer needed, regardless of whether or not the operation using the object succeeded or failed. Look at the following example, which tries to open a file and print its contents to the screen.

 
for line in open("myfile.txt"):  # open() returns a file object; looping over it reads the lines
    print line

The problem with this code is that it leaves the file open for an indeterminate amount of time after the code has finished executing. This is not an issue in simple scripts, but can be a problem for larger applications. The with statement allows objects like files to be used in a way that ensures they are always cleaned up promptly and correctly.

 
with open("myfile.txt") as f:
    for line in f:
        print line

After the statement is executed, the file f is always closed, even if a problem was encountered while processing the lines. Other objects which provide predefined clean-up actions will indicate this in their documentation.
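
The following sketch checks that claim by raising an error inside the with block and inspecting the file afterwards. The scratch file and the simulated RuntimeError are assumptions made for the sake of a self-contained example; the code should run on Python 2.6+ and Python 3.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

try:
    with open(path, 'w') as f:
        f.write('first line\n')
        raise RuntimeError('simulated failure while processing')
except RuntimeError:
    pass

# The with statement closed the file even though the block failed.
closed_after_error = f.closed
os.remove(path)
print(closed_after_error)
```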


5-Python Input and Output


Subsections

  • 7.1 Fancier Output Formatting
  • 7.2 Reading and Writing Files
    • 7.2.1 Methods of File Objects
    • 7.2.2 The pickle Module


7. Input and Output

There are several ways to present the output of a program; data can be printed in a human-readable form, or written to a file for future use. This chapter will discuss some of the possibilities.


7.1 Fancier Output Formatting

So far we've encountered two ways of writing values: expression statements and the print statement. (A third way is using the write() method of file objects; the standard output file can be referenced as sys.stdout. See the Library Reference for more information on this.)

Often you'll want more control over the formatting of your output than simply printing space-separated values. There are two ways to format your output; the first way is to do all the string handling yourself; using string slicing and concatenation operations you can create any layout you can imagine. The standard module string contains some useful operations for padding strings to a given column width; these will be discussed shortly. The second way is to use the % operator with a string as the left argument. The % operator interprets the left argument much like a sprintf()-style format string to be applied to the right argument, and returns the string resulting from this formatting operation.

One question remains, of course: how do you convert values to strings? Luckily, Python has ways to convert any value to a string: pass it to the repr() or str() functions. Reverse quotes (``) are equivalent to repr(), but they are no longer used in modern Python code and will likely not be in future versions of the language.

The str() function is meant to return representations of values which are fairly human-readable, while repr() is meant to generate representations which can be read by the interpreter (or will force a SyntaxError if there is no equivalent syntax). For objects which don't have a particular representation for human consumption, str() will return the same value as repr(). Many values, such as numbers or structures like lists and dictionaries, have the same representation using either function. Strings and floating point numbers, in particular, have two distinct representations.

Some examples:

 
>>> s = 'Hello, world.'
>>> str(s)    # human-readable; the result is shown in single quotes
'Hello, world.'
>>> repr(s)   # readable by the interpreter; the result is shown in double and single quotes
"'Hello, world.'"
>>> str(0.1)
'0.1'
>>> repr(0.1)
'0.10000000000000001'
>>> x = 10 * 3.25
>>> y = 200 * 200
>>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'
>>> print s
The value of x is 32.5, and y is 40000...
>>> # The repr() of a string adds string quotes and backslashes:
... hello = 'hello, world\n'
>>> hellos = repr(hello)
>>> print hellos
'hello, world\n'
>>> # The argument to repr() may be any Python object:
... repr((x, y, ('spam', 'eggs')))
"(32.5, 40000, ('spam', 'eggs'))"
>>> # reverse quotes are convenient in interactive sessions:
... `x, y, ('spam', 'eggs')`
"(32.5, 40000, ('spam', 'eggs'))"

Here are two ways to write a table of squares and cubes:

 
>>> for x in range(1, 11):
...     print repr(x).rjust(2), repr(x*x).rjust(3),
...     # Note trailing comma on previous line
...     print repr(x*x*x).rjust(4)
...
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512
 9  81  729
10 100 1000
>>> for x in range(1,11):
...     print '%2d %3d %4d' % (x, x*x, x*x*x)  # three %d conversions plus the % operator; %d means decimal integer
... 
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512
 9  81  729
10 100 1000

(Important: one space between each column was added by the way print works; it always adds spaces between its arguments.)

This example demonstrates the rjust() method of string objects, which right-justifies a string in a field of a given width by padding it with spaces on the left. There are similar methods ljust() and center(). These methods do not write anything, they just return a new string. If the input string is too long, they don't truncate it, but return it unchanged; this will mess up your column layout but that's usually better than the alternative, which would be lying about a value. (If you really want truncation you can always add a slice operation, as in "x.ljust(n)[:n]".)
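
A short sketch of ljust(), center() and the slicing trick mentioned above. The sample strings are arbitrary, and the code should run on Python 2.6+ and Python 3.

```python
name = 'Sjoerd'

left = name.ljust(10)          # padded with spaces on the right
centered = name.center(10)     # padded on both sides
right = name.rjust(10)         # padded with spaces on the left

# A string longer than the field is returned unchanged ...
too_long = 'Bartholomew'.ljust(5)
# ... unless a slice is added to force truncation.
truncated = 'Bartholomew'.ljust(5)[:5]

print(repr(left))
print(repr(centered))
print(repr(right))
print(repr(too_long))
print(repr(truncated))
```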

There is another method, zfill(), which pads a numeric string on the left with zeros. It understands about plus and minus signs:

 
>>> '12'.zfill(5)
'00012'
>>> '-3.14'.zfill(7)
'-003.14'
>>> '3.14159265359'.zfill(5)
'3.14159265359'

Using the % operator looks like this:

 
>>> import math
>>> print 'The value of PI is approximately %5.3f.' % math.pi
The value of PI is approximately 3.142.

If there is more than one format in the string, you need to pass a tuple as right operand, as in this example:

 
>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}
>>> for name, phone in table.items():
...     print '%-10s ==> %10d' % (name, phone)  # %s means string, %d means decimal integer
... 
Jack       ==>       4098
Dcab       ==>       7678
Sjoerd     ==>       4127

Most formats work exactly as in C and require that you pass the proper type; however, if you don't you get an exception, not a core dump. The %s format is more relaxed: if the corresponding argument is not a string object, it is converted to string using the str() built-in function. Using * to pass the width or precision in as a separate (integer) argument is supported. The C formats %n and %p are not supported.
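
These rules can be tried out with a few one-liners. The values are arbitrary, and the code is written so that it should run on Python 2.6+ and Python 3.

```python
import math

# * takes the width or precision from a separate integer argument.
padded = '%*d' % (6, 42)              # width 6
rounded = '%.*f' % (3, math.pi)       # precision 3

# %s is relaxed: non-string arguments are converted with str().
relaxed = '%s and %s' % (42, [1, 2])

# Passing the wrong type to a stricter format raises an exception.
wrong_type_raises = False
try:
    '%d' % 'spam'
except TypeError:
    wrong_type_raises = True

print(repr(padded))
print(rounded)
print(relaxed)
print(wrong_type_raises)
```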

If you have a really long format string that you don't want to split up, it would be nice if you could reference the variables to be formatted by name instead of by position. This can be done by using form %(name)format, as shown here:

 
>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678}
>>> print 'Jack: %(Jack)d; Sjoerd: %(Sjoerd)d; Dcab: %(Dcab)d' % table
Jack: 4098; Sjoerd: 4127; Dcab: 8637678

This is particularly useful in combination with the new built-in vars() function, which returns a dictionary containing all local variables.
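
For instance, inside a function vars() returns the function's local variables, so they can be referenced by name in the format string. The function report and its variables are hypothetical; the code should run on Python 2.6+ and Python 3.

```python
def report():
    name = 'Jack'
    phone = 4098
    # vars() with no argument returns the local variables as a dictionary.
    return '%(name)s ==> %(phone)d' % vars()


text = report()
print(text)
```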


7.2 Reading and Writing Files

open() returns a file object, and is most commonly used with two arguments: "open(filename, mode)".

 
>>> f=open('/tmp/workfile', 'w')
>>> print f

The first argument is a string containing the filename. The second argument is another string containing a few characters describing the way in which the file will be used. mode can be 'r' when the file will only be read, 'w' for only writing (an existing file with the same name will be erased), and 'a' opens the file for appending; any data written to the file is automatically added to the end. 'r+' opens the file for both reading and writing. The mode argument is optional; 'r' will be assumed if it's omitted.
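
The effect of the modes can be observed with a scratch file. The path comes from tempfile.mkstemp(), which is an assumption of this sketch rather than something the tutorial uses; the code should run on Python 2.6+ and Python 3.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, 'w')          # write only; existing content is erased
f.write('one\n')
f.close()

f = open(path, 'a')          # append; data is added to the end
f.write('two\n')
f.close()

f = open(path)               # mode omitted, so 'r' is assumed
after_append = f.read()
f.close()

f = open(path, 'w')          # opening with 'w' again erases the file
f.close()

f = open(path, 'r')
after_truncate = f.read()
f.close()

os.remove(path)
print(repr(after_append))
print(repr(after_truncate))
```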

On Windows and the Macintosh, 'b' appended to the mode opens the file in binary mode, so there are also modes like 'rb', 'wb', and 'r+b'. Windows makes a distinction between text and binary files; the end-of-line characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is fine for ASCII text files, but it'll corrupt binary data like that in JPEG or EXE files. Be very careful to use binary mode when reading and writing such files.


7.2.1 Methods of File Objects

The rest of the examples in this section will assume that a file object called f has already been created.

1) To read a file's contents, call f.read(size), which reads some quantity of data and returns it as a string. size is an optional numeric argument. When size is omitted or negative, the entire contents of the file will be read and returned; it's your problem if the file is twice as large as your machine's memory. Otherwise, at most size bytes are read and returned. If the end of the file has been reached, f.read() will return an empty string ("").

 
>>> f.read()
'This is the entire file.\n'
>>> f.read() <--- end of the file
''

2) f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn't end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.

 
>>> f.readline()
'This is the first line of the file.\n'
>>> f.readline()
'Second line of the file\n'
>>> f.readline()  <--- end of the file
''

3) f.readlines() returns a list containing all the lines of data in the file. If given an optional parameter sizehint, it reads that many bytes from the file and enough more to complete a line, and returns the lines from that. This is often used to allow efficient reading of a large file by lines, but without having to load the entire file in memory. Only complete lines will be returned.

 
>>> f.readlines()
['This is the first line of the file.\n', 'Second line of the file\n']

An alternate approach to reading lines is to loop over the file object. This is memory efficient, fast, and leads to simpler code:

 
>>> for line in f:   # Important
...     print line,
...
This is the first line of the file.
Second line of the file

The alternative approach is simpler but does not provide as fine-grained control. Since the two approaches manage line buffering differently, they should not be mixed.

f.write(string) writes the contents of string to the file, returning None.

 
>>> f.write('This is a test\n')

To write something other than a string, it needs to be converted to a string first:

 
>>> value = ('the answer', 42)
>>> s = str(value)
>>> f.write(s)

f.tell() returns an integer giving the file object's current position in the file, measured in bytes from the beginning of the file. To change the file object's position, use "f.seek(offset, from_what)". The position is computed from adding offset to a reference point; the reference point is selected by the from_what argument. A from_what value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. from_what can be omitted and defaults to 0, using the beginning of the file as the reference point.

 
>>> f = open('/tmp/workfile', 'r+')
>>> f.write('0123456789abcdef')      # write to the file /tmp/workfile
>>> f.seek(5)     # Go to the 6th byte in the file
>>> f.read(1)
'5'
>>> f.seek(-3, 2) # Go to the 3rd byte before the end
>>> f.read(1)
'd'
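
The three reference points can be checked with f.tell() after each seek. This sketch opens a scratch file in binary mode ('w+b') and writes a bytes literal, which is an assumption made so that it should behave the same on Python 2.6+ and Python 3; the tutorial's own example uses a text-mode file.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, 'w+b')
f.write(b'0123456789abcdef')   # 16 bytes

f.seek(5)                      # from_what omitted: measured from the start
after_absolute = f.tell()

f.seek(2, 1)                   # 1: relative to the current position
after_relative = f.tell()

f.seek(-3, 2)                  # 2: relative to the end of the file
after_from_end = f.tell()
last_read = f.read(1)

f.close()
os.remove(path)
print([after_absolute, after_relative, after_from_end])
```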

When you're done with a file, call f.close() to close it and free up any system resources taken up by the open file. After calling f.close(), attempts to use the file object will automatically fail.

 
>>> f.close()
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: I/O operation on closed file

File objects have some additional methods, such as isatty() and truncate() which are less frequently used; consult the Library Reference for a complete guide to file objects.


7.2.2 The pickle Module

Strings can easily be written to and read from a file. Numbers take a bit more effort, since the read() method only returns strings, which will have to be passed to a function like int(), which takes a string like '123' and returns its numeric value 123. However, when you want to save more complex data types like lists, dictionaries, or class instances, things get a lot more complicated.

Rather than have users be constantly writing and debugging code to save complicated data types, Python provides a standard module called pickle. This is an amazing module that can take almost any Python object (even some forms of Python code!), and convert it to a string representation; this process is called pickling. Reconstructing the object from the string representation is called unpickling. Between pickling and unpickling, the string representing the object may have been stored in a file or data, or sent over a network connection to some distant machine.

If you have an object ‘x’, and a file object ‘f’ that's been opened for writing, the simplest way to pickle the object takes only one line of code:

 
pickle.dump(x, f)

To unpickle the object again, if ‘f’ is a file object which has been opened for reading:

 
x = pickle.load(f)

(There are other variants of this, used when pickling many objects or when you don't want to write the pickled data to a file; consult the complete documentation for pickle in the Python Library Reference.)
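
A complete round trip might look like the sketch below. The sample data and the scratch file are illustrative, and the files are opened in binary mode ('wb' and 'rb'), an assumption made because Python 3 requires it for pickle; the code should run on Python 2.6+ and Python 3.

```python
import os
import pickle
import tempfile

data = {'name': 'spam', 'values': [1, 2.5, (3, 4)]}

fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, 'wb')
pickle.dump(data, f)         # pickling: object -> bytes in the file
f.close()

f = open(path, 'rb')
restored = pickle.load(f)    # unpickling: bytes in the file -> new object
f.close()

os.remove(path)
print(restored == data)
```

The restored object is equal to the original but is a separate object, which is what makes it usable by another program or a later run.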

pickle is the standard way to make Python objects which can be stored and reused by other programs or by a future invocation of the same program; the technical term for this is a persistent object. Because pickle is so widely used, many authors who write Python extensions take care to ensure that new data types such as matrices can be properly pickled and unpickled.