Dalke Scientific Software: More science. Less time.

See Assignment #1 for instructions on how to submit this assignment. The short version is to send me a tar or zip archive of a directory named "assignment2" with answers in the README file. Some of the README text will refer to files included in the directory.


Save a copy of blast_parser.py into your assignment2 directory. It is a copy of blast7.py from the lecture. You will modify your copy of the file so it extracts a few new fields.

Place a copy of blastp.txt into your assignment2 directory. You will use it for your testing.

Part 1

The BLAST footer (the end of the BLAST output) contains about 25 fields describing the BLAST search. Modify blast_parser.py so it reads the "Matrix", "Gap Penalties" and "Number of Hits to DB" lines. Store the results as new attributes in the BlastResults class named "matrix", "gap_penalties", and "num_hits". The gap_penalties attribute will be a two-element list. (I want those fields to be the same as Biopython's so they should look like the sample values from the lecture.)
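As a rough illustration of the kind of line matching involved, here is a minimal standalone sketch. It is not the structure of blast7.py; the function name parse_footer_fields and the returned dictionary are hypothetical, and the sample footer lines are made up for the example. The keys follow Biopython's spelling (gap_penalties). Converting the numbers to int is an assumption; check the sample values from the lecture to see which types are expected.

```python
import re


def parse_footer_fields(lines):
    """Pull three footer fields out of an iterable of BLAST output lines.

    Hypothetical helper for illustration only; the real blast_parser.py
    would store these values on a BlastResults instance instead.
    """
    fields = {}
    for line in lines:
        line = line.strip()
        if line.startswith("Matrix:"):
            # e.g. "Matrix: BLOSUM62"
            fields["matrix"] = line.split(":", 1)[1].strip()
        elif line.startswith("Gap Penalties:"):
            # e.g. "Gap Penalties: Existence: 11, Extension: 1"
            m = re.search(r"Existence:\s*(\d+),\s*Extension:\s*(\d+)", line)
            if m:
                # Assumption: integers; use float() if the lecture samples are floats.
                fields["gap_penalties"] = [int(m.group(1)), int(m.group(2))]
        elif line.startswith("Number of Hits to DB:"):
            # Some BLAST versions write the count with thousands separators.
            value = line.split(":", 1)[1].strip()
            fields["num_hits"] = int(value.replace(",", ""))
    return fields


# Made-up footer lines, only to exercise the function.
sample_footer = [
    "Matrix: BLOSUM62",
    "Gap Penalties: Existence: 11, Extension: 1",
    "Number of Hits to DB: 1,234,567",
]
footer = parse_footer_fields(sample_footer)
```

In the real parser these values would be set as attributes while reading the footer, and the test function would compare them against the values you can see by eye in blastp.txt.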

You must modify the test function so it checks that these 3 properties are read correctly. I will inspect your test function.

I will test your file as a library so it must not print anything out when it's imported. I will import your modified version of "blast_parser.py" on a different BLASTP file. (I'll manually change a few of the fields but the main structure will be identical - I'm not going to do anything tricky.)
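One common way to keep a module silent on import is to put the test call and any printing behind a __name__ check. The sketch below shows only that pattern; the body of test() is a placeholder, not the checks the assignment asks for.

```python
def test():
    # Placeholder check; the real function would parse blastp.txt and
    # assert on matrix, gap_penalties and num_hits.
    sample = {"matrix": "BLOSUM62"}  # stand-in for a parsed result
    assert sample["matrix"] == "BLOSUM62"


if __name__ == "__main__":
    # Runs only when the file is executed directly ("python blast_parser.py"),
    # not when another program does "import blast_parser".
    test()
    print("All tests passed.")
```

With this layout, importing the file defines the functions and classes and produces no output.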

Part 2

Write a new program called "count_hsps.py". This program will read the file named "blastp.txt" in the current directory (the same one we've been working with). It will print the number of alignments and the total number of HSPs found in the file.

The output should look something like

number of alignments: 83
number of HSPs: 105

Note: in a BLAST file the number of descriptions may be different than the number of alignments; both are independently set on the command-line. You must count the number of alignments and not the number of descriptions.

Some Socratic questions to help you along. You do not need to answer these for the assignment. Is there a characteristic indication of the start of an alignment? What about for an HSP? Can you manually find an alignment with more than one HSP?
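One possible line-based approach is sketched below. It assumes that each alignment begins with a line whose first character is ">" and that each HSP begins with a line starting with "Score =" after leading whitespace; verify both against blastp.txt before relying on them. The function name and the sample text are made up for the example.

```python
def count_alignments_and_hsps(lines):
    """Return (number of alignments, number of HSPs) for BLAST text output."""
    num_alignments = 0
    num_hsps = 0
    for line in lines:
        if line.startswith(">"):
            # Assumption: only the first title line of an alignment starts
            # in column 0 with ">"; indented continuation lines are skipped.
            num_alignments += 1
        elif line.lstrip().startswith("Score ="):
            # Assumption: every HSP header looks like " Score = ... bits".
            num_hsps += 1
    return num_alignments, num_hsps


# Invented sample: two alignments, the first with two HSPs.
sample = """\
>gi|1|ref|X| first hit
          Length = 100

 Score = 50.1 bits (118), Expect = 1e-05

 Score = 30.0 bits (66), Expect = 0.5

>gi|2|ref|Y| second hit
          Length = 80

 Score = 28.9 bits (63), Expect = 1.2
""".splitlines()

alignments, hsps = count_alignments_and_hsps(sample)
print("number of alignments:", alignments)
print("number of HSPs:", hsps)
```

For the assignment the lines would come from open("blastp.txt") rather than an inline string.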

Part 3

Using the Biopython parser, write a program named "blast_info.py" that opens "blastp.txt" in the current directory and answers the following questions:

Copyright © 2001-2013 Andrew Dalke Scientific AB