Friday, April 2, 2010

Sorting in Python

Sorting in Python

To sort a list in Python, use the “sort” method. For example:

li=[1,9,2,3];
li.sort();
print li;

Note that sort is a method, and the list is changed in place.

Suppose you have a matrix, and you want to sort by second column. Example Solution:

li=[[2,6],[1,3],[5,4]]
li.sort(lambda x, y: cmp(x[1],y[1]))
print li; # prints [[1, 3], [5, 4], [2, 6]]

The argument to sort is a function of two arguments, that returns -1, 0, 1. This function is a decision function that tells sort() how to decide the order of any two elements in your list. If the first argument is “less” then second argument, the function should return -1. If equal, then 0. Else, 1.

-------------------------

Here's a more complex example. Suppose you have a list of strings.

'my283.jpg'
'my23i.jpg'
'web7-s.jpg'
'fris88large.jpg'
...

You want to sort them by the number embedded in them. What you have to do, is to provide sort() method a order-deciding function, that takes two strings, and compares the integer inside the string. Here's the solution:

li=[
'my283.jpg',
'my23i.jpg',
'web7-s.jpg',
'fris88large.jpg',
]

def myComp (x,y):
import re
def getNum(str): return float(re.findall(r'\d+',str)[0])
return cmp(getNum(x),getNum(y))

li.sort(myComp)
print li # returns ['web7-s.jpg', 'my23i.jpg', 'fris88large.jpg', 'my283.jpg']

Here, we defined a function myComp to tell sort about the ordering. Normally, one would use the “lambda” construct, but Python's lambda construct cannot be used to define any function that takes more than one Python-line to express.

-------------------------

Python's “sort” method's optional parameters: “key” and “reverse

Most of the time, sorting is done for a list of atomic element such as [3,2,4]. This is simply done by myList.sort() without any argument. Other than simple list, sort is frequently used on matrixes (e.g. [[2,6],[1,3],[5,4]]). For matrixes, almost always a particular column is used for the basis of ordering. For example, if we want to sort by second column, we do: “li.sort(lambda x, y: cmp(x[1],y[1]))”. Since this is frequently used, Python provides a somewhat shorter syntax for it, by specifying the column used as the ordering “key”. For example:

li=[[2,6],[1,3],[5,4]]
li.sort(key=lambda x:x[1] ) # is equivalent to the following
#li.sort(lambda x, y: cmp(x[1],y[1]))
print li; # prints [[1, 3], [5, 4], [2, 6]]

Because the Python compiler is not very refined, this specialized syntax is algorithmically a order of magnitude faster than the general form “lambda x, y: cmp(x[1],y[1])”.

That is to say, the programer must now learn 2 syntaxes for the ordering function, one general and one specialized, and he must choose the right one otherwise he will be penalized by a magnitude of slow down. This also means that, in order to make the right choice of the syntax, the programer must known in advance of the nature of the list to be sorted.

Another idiosyncratic provision is the optional “reverse” argument. This parameter is necessary when using the “key” parameter. In the normal order comparison function lambda(x,y), the ascending/descending nature can be changed by simply switching the parameters x and y. But now a single “key=lambda(x)” can't provide that, thus another optional parameter “reverse” is provided. Example:

The following are equivalent:

li.sort(key=lambda x:x[1], reverse=True )
li.sort(lambda x, y: cmp(x[1],y[1]), reverse=True)
li.sort(lambda x, y: cmp(y[1],x[1]))

Of course, one can reverse a list by the “reverse()” method for lists. But li.sort(...).reverse() is algorithmically another order of magnitude slower since Python compiler is not smart.

-----------------------

No comments:

Post a Comment