Dalke Scientific Software: More science. Less time.

List Comprehensions

In graph11 and graph12 of the plotting lecture I defined this function:

def get_per_residue_values(sequence, table):
    values = []
    for residue in sequence:
        values.append(table[residue])
    return values
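To try the function without the plotting lecture's data, you can give it any dict that maps residue letters to numbers. The table below is a hypothetical toy table with made-up values, not a real hydrophobicity scale; it's only there to show the loop building one output value per input residue.

```python
def get_per_residue_values(sequence, table):
    # Build a new list by looking up each residue in the table
    values = []
    for residue in sequence:
        values.append(table[residue])
    return values

# Hypothetical toy table; these numbers are made up for illustration
toy_table = {"A": 1.0, "G": 0.5, "L": 2.0, "K": -1.5}

print(get_per_residue_values("GALK", toy_table))
```

The output list has one entry per residue, in sequence order.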
That code pattern – use elements from one list to make another list – occurs frequently. For another example of this pattern, here's a way to get the number of HSPs for each alignment in a BLAST record:
hsp_counts = []
for alignment in blast.alignments:
    hsp_counts.append(len(alignment.hsps))
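The snippet only relies on a record with an `alignments` list whose items each have an `hsps` list. To run it without a BLAST file, here is a sketch that assumes hypothetical stand-in objects built with `namedtuple`; `Record` and `Alignment` are illustrative names, not a real parser's classes.

```python
from collections import namedtuple

# Hypothetical stand-ins for a parsed BLAST record; only the
# attribute names "alignments" and "hsps" matter here.
Record = namedtuple("Record", ["alignments"])
Alignment = namedtuple("Alignment", ["hsps"])

blast = Record(alignments=[
    Alignment(hsps=["hsp1"]),
    Alignment(hsps=["hsp1", "hsp2", "hsp3"]),
    Alignment(hsps=["hsp1", "hsp2"]),
])

# The loop from the text: one HSP count per alignment
hsp_counts = []
for alignment in blast.alignments:
    hsp_counts.append(len(alignment.hsps))

print(hsp_counts)
```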

In programming it's often a good idea to turn frequently used patterns into functions. That doesn't work well in this case because each loop uses a different per-element calculation: in the two examples above, one loop uses table[residue] and the other uses len(alignment.hsps). Python doesn't make it easy or natural to pass an arbitrary piece of code into a function.

Instead, the Python developers added what are known as list comprehensions. (The name comes from the Haskell programming language.) This is new syntax for the very common pattern of making a new list from per-element calculations on an old list.

Here are a few examples of it in use:

>>> values = [1, 4, 9, 25]
>>> [v for v in values]
[1, 4, 9, 25]
>>> [v*2 for v in values]
[2, 8, 18, 50]
>>> [v*v for v in values]
[1, 16, 81, 625]
>>> [0 for v in values]
[0, 0, 0, 0]
>>> 
The basic syntax is:

[expression for variable in sequence]

I can rewrite the two examples at the top of this lecture using list comprehensions:

hydrophobicity_values = [table[residue] for residue in sequence]
 ...
hsp_counts = [len(alignment.hsps) for alignment in blast.alignments]
Despite the fancy name, a list comprehension isn't a complicated idea. It's only a compact way of writing a pattern you've used many times. If you see Python code which looks like:
new_list = [expression(i) for i in old_list]
then, if you want, you can think of it in your head as
new_list = []
for i in old_list:
    new_list.append(expression(i))
because the two forms build the same list.
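To convince yourself that the two forms build the same list, you can run them side by side. Here `expression` is a hypothetical stand-in for whatever per-element calculation you need.

```python
def expression(i):
    # Hypothetical per-element calculation
    return i * i + 1

old_list = [1, 4, 9, 25]

# Comprehension form
new_list_1 = [expression(i) for i in old_list]

# Equivalent explicit loop
new_list_2 = []
for i in old_list:
    new_list_2.append(expression(i))

print(new_list_1 == new_list_2)
print(new_list_1)
```

One small difference shows up in Python 3: the comprehension's `i` stays private to the comprehension, while the for-loop leaves `i` set to the last element (25 here) after it finishes.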

Python list comprehensions are a bit more powerful than what I've shown so far. They also allow filtering:

>>> [v for v in values if v < 10]
[1, 4, 9]
>>> [v for v in values if v%2==1]
[1, 9, 25]
>>> 
In general, if you see
new_list = [expression(i) for i in old_list if filter(i)]
then, if you want, you can think of it in your head as
new_list = []
for i in old_list:
    if filter(i):
        new_list.append(expression(i))
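Here is the filtering form run both ways, as a quick check that the `if` clause behaves like an if-statement inside the loop. The filter test is named `keep` (a hypothetical helper) rather than `filter` so that it doesn't hide Python's built-in `filter` function.

```python
values = [1, 4, 9, 25]

def keep(v):
    # Hypothetical filter test: keep odd numbers
    return v % 2 == 1

# Comprehension with a filter clause
odd_squares = [v * v for v in values if keep(v)]

# The same thing written as a loop with an if-statement
odd_squares_loop = []
for v in values:
    if keep(v):
        odd_squares_loop.append(v * v)

print(odd_squares)
```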

I don't use the filtering part of list comprehensions that often, but it does prove useful.

List comprehensions are nice because they provide a compact way to express a common pattern in Python. Compact is usually good because you get to see more of the code at once. But code that is too compact takes more effort to understand. Sometimes when a list comprehension gets too long I'll split it over a few lines:

[alignment for alignment in blast.alignments
               if len(alignment.hsps) > 1]
but if it gets much more complicated than this I'll use the original pattern of appending elements to a list inside a for-loop. Besides, a complicated list comprehension will almost certainly have bugs. I debug with print statements, and it's not possible to put a print statement in a list comprehension (at least not without some trickery).
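A sketch of that trade-off, assuming hypothetical stand-in alignment objects (only `.hsps` matters; `name` is added for readable output): the split comprehension and the loop select the same alignments, but only the loop has room for a debugging print.

```python
from collections import namedtuple

# Hypothetical stand-in for a BLAST alignment
Alignment = namedtuple("Alignment", ["name", "hsps"])

alignments = [
    Alignment("a", ["hsp1"]),
    Alignment("b", ["hsp1", "hsp2"]),
    Alignment("c", ["hsp1", "hsp2", "hsp3"]),
]

# Compact form, split over two lines: alignments with more than one HSP
multi = [alignment for alignment in alignments
                   if len(alignment.hsps) > 1]

# Loop form: same result, but a debugging print fits naturally
multi_loop = []
for alignment in alignments:
    print("%s %d" % (alignment.name, len(alignment.hsps)))
    if len(alignment.hsps) > 1:
        multi_loop.append(alignment)
```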

(As an aside for those with more programming experience: Python allows some functional programming and supports simple lambda functions and has built-in map, filter and reduce functions. Functions are first-class objects so higher-order functions are simple to define. Here are implementations of the above two snippets of code:

hydrophobicity_values = map(lambda residue: table[residue], sequence)
 ...
hsp_counts = map(lambda alignment: len(alignment.hsps), blast.alignments)
However, Python's author believes it was a mistake to encourage this sort of programming, so the community generally discourages code written in this style. Unless you have experience with Lisp you'll likely agree that the above looks pretty strange and is somewhat difficult to understand, especially compared to list comprehensions.)
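For comparison, the sketch below puts the map/filter/reduce style next to the equivalent list comprehensions. One detail differs from the Python 2 era: in Python 3, `map` and `filter` return lazy iterators instead of lists, so you need `list()` to get a list, and `reduce` now lives in the `functools` module. For a plain sum, the built-in `sum` is the usual choice.

```python
from functools import reduce

values = [1, 4, 9, 25]

# Functional style; list() is needed in Python 3, where map/filter are lazy
doubled_fn = list(map(lambda v: v * 2, values))
small_fn = list(filter(lambda v: v < 10, values))
total_fn = reduce(lambda a, b: a + b, values)

# List comprehension style, plus the built-in sum for the reduction
doubled_lc = [v * 2 for v in values]
small_lc = [v for v in values if v < 10]
total = sum(values)

print(doubled_lc, small_lc, total)
```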

Because people asked, here are some examples of more complex list comprehensions. A list comprehension can appear inside another list comprehension in several places. Here I use a list comprehension as the element expression of another list comprehension.

>>> for row in [[i*j for i in range(1, 8)] for j in range(1, 4)]:
...     print(row)
... 
[1, 2, 3, 4, 5, 6, 7]
[2, 4, 6, 8, 10, 12, 14]
[3, 6, 9, 12, 15, 18, 21]
>>> 
Breaking that down, there will be three rows, one for each value of j in (1, 2, 3). The simplified version of the outer list comprehension is:
>>> [j for j in range(1, 4)] 
[1, 2, 3]
>>> 
Instead of using the simple "j" as the element expression, I'll use a one-element list "[j]":
>>> [[j] for j in range(1, 4)]
[[1], [2], [3]]
>>> 
Or a slightly more complex element expression:
>>> [[0]*j for j in range(1, 4)]
[[0], [0, 0], [0, 0, 0]]
>>> 
The notation list*integer makes a new list that repeats the contents of the old list that many times:
>>> ["A"]*0
[]
>>> ["A"]*1
['A']
>>> ["A"]*2
['A', 'A']
>>> ["A"]*3
['A', 'A', 'A']
>>> ["A"]*4
['A', 'A', 'A', 'A']
>>> ["A"]*10
['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']
>>> 
Now I'll use the list comprehension [i*j for i in range(1,8)]. It uses a variable j that it doesn't define itself, so I'll define j first to show how it works:
>>> j = 3
>>> [i*j for i in range(1,8)]
[3, 6, 9, 12, 15, 18, 21]
>>> 
A nested list comprehension can use the variables of the comprehension that contains it, so the element expression of the inner list comprehension uses the j from the outer list comprehension:
>>> for row in [[i*j for i in range(1, 8)] for j in range(1, 4)]:
...     print(row)
... 
[1, 2, 3, 4, 5, 6, 7]
[2, 4, 6, 8, 10, 12, 14]
[3, 6, 9, 12, 15, 18, 21]
>>> 
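Using the same "think of it in your head" rule as before, the nested comprehension unrolls into two loops: the outer comprehension becomes the outer loop over j, and the inner comprehension builds each row over i. This sketch checks that both build the same table.

```python
# Nested comprehension from the text
table = [[i * j for i in range(1, 8)] for j in range(1, 4)]

# The same table built with explicit loops
table_loop = []
for j in range(1, 4):          # outer comprehension: one row per j
    row = []
    for i in range(1, 8):      # inner comprehension: one entry per i
        row.append(i * j)
    table_loop.append(row)

for row in table_loop:
    print(row)
```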

You can also use a list comprehension in the list part of the comprehension (the one used for the source of the elements):

>>> [i*2 for i in [j+1 for j in range(5)]]
[2, 4, 6, 8, 10]
>>>
In most cases this isn't useful because the two list comprehensions can be merged into one:
 
>>> [(i+1)*2 for i in range(5)]
[2, 4, 6, 8, 10]
>>> 
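A quick check that the nested and merged versions agree: the inner comprehension produces the values j+1, and the outer one doubles them, which is the same as doubling (i+1) directly.

```python
# Comprehension whose source list is another comprehension
nested = [i * 2 for i in [j + 1 for j in range(5)]]

# The same calculation merged into a single comprehension
merged = [(i + 1) * 2 for i in range(5)]

print(nested == merged, nested)
```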
About the only time it's useful is when you want several levels of filtering, including filtering on intermediate values:
>>> [i*2 for i in [j+1 for j in range(20) if (j%3)==0] if i*i>19]
[14, 20, 26, 32, 38]
>>> 
I can't think of a time when I've needed that. If I did, I would do it in two steps:
>>> mod3_values = [j+1 for j in range(20) if (j%3)==0]
>>> [i*2 for i in mod3_values if i*i>19]
[14, 20, 26, 32, 38]
>>> 
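To see that the one-expression version, the two-step version, and plain loops all agree, here is a sketch that computes the result three ways. The loop form makes the two filtering levels explicit: one test on j, then one on the intermediate value i.

```python
# One expression with two levels of filtering
one_step = [i * 2 for i in [j + 1 for j in range(20) if (j % 3) == 0] if i * i > 19]

# Two steps, with a named intermediate list
mod3_values = [j + 1 for j in range(20) if (j % 3) == 0]
two_step = [i * 2 for i in mod3_values if i * i > 19]

# The same thing as plain loops
loop_result = []
for j in range(20):
    if j % 3 == 0:           # first filter, on the source values
        i = j + 1
        if i * i > 19:       # second filter, on the intermediate value
            loop_result.append(i * 2)

print(one_step)
```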



Copyright © 2001-2013 Andrew Dalke Scientific AB