Dalke Scientific Software: More science. Less time. Products

See Assignment #1 for the instructions of how to submit this assignment. The short version is to send me a tar or zip archive of a directory named "assignment4" with answers in the README file. Some of the README text will refer to files included in the directory.

In this assignment you will change graph10.py from yesterday's lecture so it reads the sequence information from a GenPept file.

If you are a more advanced programmer, make these changes to the final plotting code ("graph12") from yesterday. That's the code which showed all three hydrophobicity plots.

Part 0

Copy the graph10 example from yesterday into your "assignment4" directory. Name the file "br_plot.py". Also copy the bacteriorhodopsin.fasta data file into that directory.

Run the br_plot.py to double check that it works. It's always nice to start with working code. Use the fasta_reader.py from your previous exercise. There's no need to include fasta_reader.py with the assignment.

Part 1

In this part you will modify the code to read the sequence and graph title information from a GenBank record using Biopython's GenBank parser. Start by copying br.gp into your assignment directory. This contains the GenPept entry for the bacteriorhodopsin entry I was using.

Look at the br.gp file to get and idea of what it contains.

Edit the br_plot.py file to use the RecordParser from Bio.GenBank. Here are the changes you'll likely need to make:

Rerun the code. The hydrophobicity plot should be (almost) unchanged.

Questions to answer in your README

(I asked the last question because there have been many times where I've edited a program and reran it only to find I didn't save what I did or I changed the wrong file. You need to develop skill in recognizing when this happens.)

Part 2

In this part you'll modify the code to get the secondary structure information from the GenBank record instead of using hard-coded one. You will replace the section of code which starts

# Draw the known helix and sheet ranges
axvspan(9, 32, facecolor="yellow", alpha=0.4) # helix
axvspan(36, 61, facecolor="yellow", alpha=0.4)
axvspan(64, 71, facecolor="red", alpha=0.4)  # sheet
with code that uses data from the GenBank record.

Start by testing that you can get the secondary struture information from the record. Add a some just before the above section of code to print the secondary structure information. You might use the print_secstr from the lecture as a guide.

Verify that the coordinates and the secondary structure type information match the values in the existing code and that both match what's in the file.

Once that works, change the new code so that instead of printing the coordinates and secondary structure type to the screen it calls axvspan with the right coordinate and color properties. Remember to remove the old axvspan calls so you know you're seeing the results of the new code.

There are no questions for this section.

Part 3

Change the program so it accepts the input file name as a command-line program. When you do this remember to change the name of the program because it's no longer specific to bacteriorhodopin. Change it to something which makes sense for the new purpose of the program.

Also remember to change any comments which are now no longer appropriate and remove any imports which aren't used for anything.

If your program is named program_name then I should be able to run your program like this.

 % python program_name 1Z9KA.gp
 % python program_name br.gp
I'll test it on another data file of my own.

Test your program using chain A from 1Z9K.

NOTE: for the last one, by "double-check" I mean I would like to know how you would evaluate the quality of the hydrophobicity plot as a way to predict transmembrane helicies. Here you have a prediction and you have information that those are helicies, but are those helicies really helicies? How do you validate that prediction?

Find another GenPept structure with secondary structure information. (Hint: start by finding a PDB file since those are all in GenBank.) Save the GenPept record to a file in the assignment4 directory and use your program to view the hydrophobicity plot(s).

Part 4 (advanced programmers only)

For the more advanced programmers. Accept a window size as an optional second argument. Use 19 as the default. For examples,
 % program_name 1Z9KA.gp
will use the default window size of 19 and
 % program_name 1Z9KA.gp 5
will use a window size of 5.

Your code must handle all three three filter types. (In my lecture I linked to a module for computing the Savitzky-Golay weights given the window size.)

Check the input window size given on the command-line. If it's too small (under 5 residues) or is an even number then use sys.exit(msg) to print an error message and exit Python without doing the plot.

Copyright © 2001-2013 Andrew Dalke Scientific AB