See Assignment #1 for the instructions of how to submit this assignment. The short version is to send me a tar or zip archive of a directory named "assignment4" with answers in the README file. Some of the README text will refer to files included in the directory.
In this assignment you will change graph10.py from yesterday's lecture so it reads the sequence information from a GenPept file.
If you are a more advanced programmer, make these changes to the final plotting code ("graph12") from yesterday. That's the code which showed all three hydrophobicity plots.
In this part you will modify the code to read the sequence and graph title information from a GenBank record using Biopython's GenBank parser. Start by copying br.gp into your assignment directory. This contains the GenPept entry for the bacteriorhodopsin entry I was using.
Look at the br.gp file to get and idea of what it contains.
Edit the br_plot.py file to use the RecordParser from Bio.GenBank. Here are the changes you'll likely need to make:
- at the top of the file import the Bio.Genbank module
- replace the code to read a FASTA record with code to read a GenBank record.
- The plotting code uses record.title to make the title for the graph. Change that to use the string "gi:" + record.gi.
Rerun the code. The hydrophobicity plot should be (almost) unchanged.
Questions to answer in your README
- Why didn't you need to change record.sequence?
- How do you know you ran the modified program instead of the original program?
In this part you'll modify the code to get the secondary structure information from the GenBank record instead of using hard-coded one. You will replace the section of code which starts
# Draw the known helix and sheet ranges axvspan(9, 32, facecolor="yellow", alpha=0.4) # helix axvspan(36, 61, facecolor="yellow", alpha=0.4) axvspan(64, 71, facecolor="red", alpha=0.4) # sheetwith code that uses data from the GenBank record.
Start by testing that you can get the secondary struture information from the record. Add a some just before the above section of code to print the secondary structure information. You might use the print_secstr from the lecture as a guide.
Verify that the coordinates and the secondary structure type information match the values in the existing code and that both match what's in the file.
Once that works, change the new code so that instead of printing the coordinates and secondary structure type to the screen it calls axvspan with the right coordinate and color properties. Remember to remove the old axvspan calls so you know you're seeing the results of the new code.
There are no questions for this section.
Change the program so it accepts the input file name as a command-line program. When you do this remember to change the name of the program because it's no longer specific to bacteriorhodopin. Change it to something which makes sense for the new purpose of the program.
- What's the name of your new program?
If your program is named program_name then I should be able to run your program like this.
% python program_name 1Z9KA.gp % python program_name br.gpI'll test it on another data file of my own.
Test your program using chain A from 1Z9K.
- Save the plot as a PNG image named "1Z9KA"
- How well did it do in predicting transmembrane helicies? (If you implemented all three multiple filters, how did they compare?)
- How did you double-check that the helix structure in the record are actually transmembrane helix structures?
Find another GenPept structure with secondary structure information. (Hint: start by finding a PDB file since those are all in GenBank.) Save the GenPept record to a file in the assignment4 directory and use your program to view the hydrophobicity plot(s).
- What's the name of the GenPept file?
- Did your code work right the first time?
Part 4 (advanced programmers only)For the more advanced programmers. Accept a window size as an optional second argument. Use 19 as the default. For examples,
% program_name 1Z9KA.gpwill use the default window size of 19 and
% program_name 1Z9KA.gp 5will use a window size of 5.
Your code must handle all three three filter types. (In my lecture I linked to a module for computing the Savitzky-Golay weights given the window size.)
Check the input window size given on the command-line. If it's too small (under 5 residues) or is an even number then use sys.exit(msg) to print an error message and exit Python without doing the plot.
Copyright © 2001-2013 Andrew Dalke Scientific AB