Writing HTML using Python

You've written some HTML by hand. Here I'll show you how to write HTML using Python. There are better ways using HTML template languages, which I'll talk about next week, but to understand them I think it's best to know how to do things manually first.

I'm going to write a program which takes a GenBank file and makes an HTML page with a table. Each row of the table will have the feature name, start position and end position. The name will be a hyperlink to a FASTA file containing the sequence data for that feature.

The first step is to get the feature information from the GenBank file. For my input data I'll use AB077698 (originally from the BioJava distribution). You've already written a program to get GenBank feature data so I'll skip the Biopython-specific part and show you the code.
from Bio import GenBank
parser = GenBank.RecordParser()
record = parser.parse(open("AB077698.gb"))
for feature in record.features:
    print "Feature", repr(feature.key), repr(feature.location)
Here's the output
Feature 'source' '1..2701'
Feature 'gene' '1..2701'
Feature "5'UTR" '<1..79'
Feature 'CDS' '80..1144'
Feature 'misc_feature' '137..196'
Feature 'misc_feature' '239..292'
Feature 'misc_feature' '617..676'
Feature 'misc_feature' '725..778'
Feature "3'UTR" '1145..2659'
Feature 'polyA_site' '1606'
Feature 'polyA_site' '2660'

You can see there are three types of feature locations: just a position, a start and end position, and the strange one with the "<". That's called a fuzzy location and it means the start is at or to the left of position 1. (Actually, that one isn't really fuzzy. Consider (1.10)..(60.88).) For details read the NCBI feature table definition.

Biopython can parse the details of the feature table. Often that information isn't needed, and parsing it takes extra effort, so RecordParser leaves each location as a string. There's another GenBank parser, FeatureParser, which does parse that data.
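To make the three kinds of location string concrete, here is a minimal sketch that sorts them by shape. It is not how Biopython parses locations. The classify_location helper and its category names are made up for illustration, it uses Python 3 syntax, and it only recognizes the simple forms shown in the output above.

```python
import re

# Hypothetical helper (not part of Biopython): classify the simple location
# strings seen in the RecordParser output. Anything more complex, such as
# (1.10)..(60.88) or join(...), is reported as "other".
_POSITION = re.compile(r"^\d+$")
_RANGE = re.compile(r"^\d+\.\.\d+$")
_FUZZY_RANGE = re.compile(r"^[<>]?\d+\.\.[<>]?\d+$")

def classify_location(text):
    if _POSITION.match(text):
        return "position"
    if _RANGE.match(text):
        return "range"
    if _FUZZY_RANGE.match(text):
        return "fuzzy range"
    return "other"

for text in ["1606", "80..1144", "<1..79", "(1.10)..(60.88)"]:
    print(text, "->", classify_location(text))
```

The "other" case is the reason to let a real parser do this work: the feature table definition allows many more forms than these three.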
from Bio import GenBank
parser = GenBank.FeatureParser()
record = parser.parse(open("AB077698.gb"))
for feature in record.features:
    print "Feature", repr(feature.type), repr(feature.location)
Note that I changed feature.key into feature.type.
It would have been better if those were the same attribute names.
Here's the output from running the FeatureParser-based code.

Feature 'source' <Bio.SeqFeature.FeatureLocation instance at 0x1284b98>
Feature 'gene' <Bio.SeqFeature.FeatureLocation instance at 0x128d260>
Feature "5'UTR" <Bio.SeqFeature.FeatureLocation instance at 0x10efa58>
Feature 'CDS' <Bio.SeqFeature.FeatureLocation instance at 0x12910a8>
Feature 'misc_feature' <Bio.SeqFeature.FeatureLocation instance at 0x1288800>
Feature 'misc_feature' <Bio.SeqFeature.FeatureLocation instance at 0x1280f08>
Feature 'misc_feature' <Bio.SeqFeature.FeatureLocation instance at 0x1112440>
Feature 'misc_feature' <Bio.SeqFeature.FeatureLocation instance at 0x1114cd8>
Feature "3'UTR" <Bio.SeqFeature.FeatureLocation instance at 0x1118698>
Feature 'polyA_site' <Bio.SeqFeature.FeatureLocation instance at 0x1116058>
Feature 'polyA_site' <Bio.SeqFeature.FeatureLocation instance at 0x1114030>

You can see that the location is now a FeatureLocation instance instead of a string. This object has ways to return the fuzzy and non-fuzzy location information.
from Bio import GenBank
parser = GenBank.FeatureParser()
record = parser.parse(open("AB077698.gb"))
for feature in record.features:
    print "Feature", repr(feature.type)
    loc = feature.location
    print " fuzzy", loc.start, loc.end
    print " nofuzzy", loc.nofuzzy_start, loc.nofuzzy_end
The output
Feature 'source'
 fuzzy 0 2701
 nofuzzy 0 2701
Feature 'gene'
 fuzzy 0 2701
 nofuzzy 0 2701
Feature "5'UTR"
 fuzzy <0 79
 nofuzzy 0 79
Feature 'CDS'
 fuzzy 79 1144
 nofuzzy 79 1144
Feature 'misc_feature'
 fuzzy 136 196
 nofuzzy 136 196
Feature 'misc_feature'
 fuzzy 238 292
 nofuzzy 238 292
Feature 'misc_feature'
 fuzzy 616 676
 nofuzzy 616 676
Feature 'misc_feature'
 fuzzy 724 778
 nofuzzy 724 778
Feature "3'UTR"
 fuzzy 1144 2659
 nofuzzy 1144 2659
Feature 'polyA_site'
 fuzzy 1605 1605
 nofuzzy 1605 1605
Feature 'polyA_site'
 fuzzy 2659 2659
 nofuzzy 2659 2659

I'll use the non-fuzzy information because that's easier to deal with. Now that I have the coordinate information I want to make HTML output for each feature. I'll use a table.
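Notice that the start positions in this output are one less than in the GenBank file (80..1144 became 79 1144). That is the convention Python slices use: count from 0 and stop just before the end. Here is a small sketch of the conversion, using a made-up ten-base sequence and a hypothetical to_slice helper, written in Python 3 syntax.

```python
# Hypothetical helper: convert a GenBank-style range such as 80..1144
# (1-based, both ends included) into Python slice bounds
# (0-based start, end excluded).
def to_slice(first, last):
    return first - 1, last

sequence = "ACGTACGTAC"  # made-up sequence, not AB077698

start, end = to_slice(3, 6)   # bases 3 through 6 in GenBank numbering
print(start, end)             # 2 6
print(sequence[start:end])    # GTAC
print(end - start)            # 4 bases, the same as 6 - 3 + 1
```

In the output above the single-position polyA_site features have equal start and end values, so used directly as slice bounds they would select nothing. Single positions need their own handling.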
from Bio import GenBank
parser = GenBank.FeatureParser()
record = parser.parse(open("AB077698.gb"))
print """<html>
<head>
<title>Feature information</title>
</head>
<body>
<table border="1">"""
print "<tr><th>Feature</th><th>Start</th><th>End</th></tr>"
for feature in record.features:
    loc = feature.location
    print "<tr><td>%s</td><td>%s</td><td>%s</td></tr>" % (
        feature.type, loc.nofuzzy_start, loc.nofuzzy_end)
print """</table>
</body></html>"""
Here's the output
<html>
<head>
<title>Feature information</title>
</head>
<body>
<table border="1">
<tr><th>Feature</th><th>Start</th><th>End</th></tr>
<tr><td>source</td><td>0</td><td>2701</td></tr>
<tr><td>gene</td><td>0</td><td>2701</td></tr>
<tr><td>5'UTR</td><td>0</td><td>79</td></tr>
<tr><td>CDS</td><td>79</td><td>1144</td></tr>
<tr><td>misc_feature</td><td>136</td><td>196</td></tr>
<tr><td>misc_feature</td><td>238</td><td>292</td></tr>
<tr><td>misc_feature</td><td>616</td><td>676</td></tr>
<tr><td>misc_feature</td><td>724</td><td>778</td></tr>
<tr><td>3'UTR</td><td>1144</td><td>2659</td></tr>
<tr><td>polyA_site</td><td>1605</td><td>1605</td></tr>
<tr><td>polyA_site</td><td>2659</td><td>2659</td></tr>
</table>
</body></html>

and here's what the table looks like as HTML
I want to save the output to the file "features.html". I'll use a new bit of Python syntax for that. It's an extension of the print statement which prints to a file instead of to the screen.
outfile = open("hello.txt", "w")
print >>outfile, "Hello!"
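The ">>" form belongs to the Python 2 print statement used throughout these examples. In Python 3, where print is a function, the same redirection is written with the file= keyword argument. A minimal sketch follows; the temporary directory is only there so the example doesn't overwrite an existing hello.txt.

```python
import os
import tempfile

# Write into a scratch directory so nothing in the current directory is touched.
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

# Python 3 spelling of: print >>outfile, "Hello!"
with open(path, "w") as outfile:
    print("Hello!", file=outfile)

# Read the file back to see exactly what was written.
with open(path) as infile:
    written = infile.read()
print(repr(written))
```

As with the print statement, print() adds the trailing newline for you.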
The changes to the Python code are minimal
from Bio import GenBank
parser = GenBank.FeatureParser()
record = parser.parse(open("AB077698.gb"))
outfile = open("features.html", "w")
print >>outfile, """<html>
<head>
<title>Feature information</title>
</head>
<body>
<table border="1">"""
print >>outfile, "<tr><th>Feature</th><th>Start</th><th>End</th></tr>"
for feature in record.features:
    loc = feature.location
    print >>outfile, "<tr><td>%s</td><td>%s</td><td>%s</td></tr>" % (
        feature.type, loc.nofuzzy_start, loc.nofuzzy_end)
print >>outfile, """</table>
</body></html>"""
The output file.
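One thing to watch when pasting values straight into HTML is that characters like "<" and "&" have special meaning there. The feature names and non-fuzzy positions in this file happen to be safe, but a fuzzy start such as "<0" would not be. Here is a sketch of one way to guard against that, using html.escape from the Python 3 standard library. The table_row helper is hypothetical, not something from the program above.

```python
from html import escape  # Python 3 standard library

# Hypothetical helper: build one table row, escaping each cell value.
def table_row(name, start, end):
    # quote=False escapes only &, < and >, which is all that text
    # between tags needs; the apostrophe in 5'UTR is left alone.
    cells = [escape(str(value), quote=False) for value in (name, start, end)]
    return "<tr><td>%s</td><td>%s</td><td>%s</td></tr>" % tuple(cells)

print(table_row("5'UTR", "<0", 79))
```

The "<" comes out as "&lt;", which a browser displays as a literal less-than sign instead of trying to read it as the start of a tag.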
I also want to make FASTA files for the sequence from each feature. I'll name the first FASTA file "seq1.fasta", the second "seq2.fasta", and so on. First, the code to make the FASTA files.
def save_fasta(filename, title, sequence):
    fasta_file = open(filename, "w")
    fasta_file.write(">" + title + "\n")
    for i in range(0, len(sequence), 72):
        fasta_file.write(sequence[i:i+72])
        fasta_file.write("\n")
    fasta_file.close()
You've seen this code before, though in a different form. Here it is in the full program:
from Bio import GenBank
def save_fasta(filename, title, sequence):
    fasta_file = open(filename, "w")
    fasta_file.write(">" + title + "\n")
    for i in range(0, len(sequence), 72):
        fasta_file.write(sequence[i:i+72])
        fasta_file.write("\n")
    fasta_file.close()
parser = GenBank.FeatureParser()
record = parser.parse(open("AB077698.gb"))
outfile = open("features.html", "w")
print >>outfile, """<html>
<head>
<title>Feature information</title>
</head>
<body>
<table border="1">"""
print >>outfile, "<tr><th>Feature</th><th>Start</th><th>End</th></tr>"
counter = 1
for feature in record.features:
    loc = feature.location
    start = loc.nofuzzy_start
    end = loc.nofuzzy_end
    # Make the FASTA file
    filename = "seq%s.fasta" % counter
    title = "Feature %s: %s" % (counter, feature.type)
    save_fasta(filename, title, record.seq.data[start:end+1])
    # Use the non-fuzzy positions in the table so a fuzzy start
    # like "<0" is never written into the HTML
    print >>outfile, "<tr><td>%s</td><td>%s</td><td>%s</td></tr>" % (
        feature.type, start, end)
    counter += 1
print >>outfile, """</table>
</body></html>"""
Here's the 9th record.
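If you want to check the line wrapping without Biopython or a GenBank file, the same loop can write to an in-memory file. This is a sketch in Python 3 syntax. The write_fasta name, the width parameter and the 160-base sequence are all invented for the example.

```python
import io

# Hypothetical variant of save_fasta that writes to any file-like object
# instead of opening a file by name.
def write_fasta(outfile, title, sequence, width=72):
    outfile.write(">" + title + "\n")
    # Write the sequence in chunks of at most `width` characters per line
    for i in range(0, len(sequence), width):
        outfile.write(sequence[i:i+width] + "\n")

buffer = io.StringIO()
write_fasta(buffer, "Feature 0: example", "ACGT" * 40)  # made-up 160 bases

lines = buffer.getvalue().splitlines()
print(lines[0])
print([len(line) for line in lines[1:]])
```

The 160 bases come out as two full lines of 72 and a final line of 16, after the ">" title line.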
Finally, I need to make a hyperlink from the feature's name to the FASTA record. In this case I only need an extra <a href="...">...</a>.
from Bio import GenBank
def save_fasta(filename, title, sequence):
    fasta_file = open(filename, "w")
    fasta_file.write(">" + title + "\n")
    for i in range(0, len(sequence), 72):
        fasta_file.write(sequence[i:i+72])
        fasta_file.write("\n")
    fasta_file.close()
parser = GenBank.FeatureParser()
record = parser.parse(open("AB077698.gb"))
outfile = open("features.html", "w")
print >>outfile, """<html>
<head>
<title>Feature information</title>
</head>
<body>
<table border="1">"""
print >>outfile, "<tr><th>Feature</th><th>Start</th><th>End</th></tr>"
counter = 1
for feature in record.features:
    loc = feature.location
    start = loc.nofuzzy_start
    end = loc.nofuzzy_end
    # Make the FASTA file
    filename = "seq%s.fasta" % counter
    title = "Feature %s: %s" % (counter, feature.type)
    save_fasta(filename, title, record.seq.data[start:end+1])
    # Link the feature name to its FASTA file; use the non-fuzzy
    # positions so a fuzzy start like "<0" is never written into the HTML
    print >>outfile, '''<tr><td><a href="%s">%s</a></td><td>%s</td><td>%s</td></tr>''' % (
        filename, feature.type, start, end)
    counter += 1
print >>outfile, """</table>
</body></html>"""
To see it in action, here is features.html.
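The table-building part can also be pulled out into a function that returns the HTML as a string, which makes it easy to try without any files. The sketch below uses Python 3 syntax and is not the program above. The feature_table name is hypothetical, and a plain list of (type, start, end) tuples stands in for record.features.

```python
from html import escape  # Python 3 standard library

# Hypothetical helper: build the linked feature table as one string.
def feature_table(features):
    rows = ["<tr><th>Feature</th><th>Start</th><th>End</th></tr>"]
    # enumerate(..., 1) replaces the manual counter and numbers from 1
    for counter, (ftype, start, end) in enumerate(features, 1):
        filename = "seq%d.fasta" % counter
        rows.append(
            '<tr><td><a href="%s">%s</a></td><td>%d</td><td>%d</td></tr>'
            % (filename, escape(ftype, quote=False), start, end))
    return '<table border="1">\n%s\n</table>' % "\n".join(rows)

# Two rows taken from the non-fuzzy output shown earlier
print(feature_table([("source", 0, 2701), ("CDS", 79, 1144)]))
```

Returning a string instead of printing leaves it up to the caller whether the table goes to the screen, to a file, or into a larger page.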
Copyright © 2001-2008 Dalke Scientific Software, LLC. All rights reserved.