Writing HTML using Python

You've written some HTML by hand. Here I'll show you how to write HTML using Python. There are better ways, using HTML template languages, which I'll talk about next week. But to understand them I think it's best to know how to do things manually first.

I'm going to write a program which takes a GenBank file and makes an HTML page with a table. Each row of the table will have the feature name, start position and end position. The name will be a hyperlink to a FASTA file containing the sequence data for that feature.

The first step is to get the feature information from the GenBank file. For my input data I'll use AB077698 (originally from the BioJava distribution). You've already written a program to get GenBank feature data so I'll skip the Biopython-specific part and show you the code.

    from Bio import GenBank

    parser = GenBank.RecordParser()
    record = parser.parse(open("AB077698.gb"))

    for feature in record.features:
        print "Feature", repr(feature.key), repr(feature.location)

Here's the output

    Feature 'source' '1..2701'
    Feature 'gene' '1..2701'
    Feature "5'UTR" '<1..79'
    Feature 'CDS' '80..1144'
    Feature 'misc_feature' '137..196'
    Feature 'misc_feature' '239..292'
    Feature 'misc_feature' '617..676'
    Feature 'misc_feature' '725..778'
    Feature "3'UTR" '1145..2659'
    Feature 'polyA_site' '1606'
    Feature 'polyA_site' '2660'

You can see there are three types of feature locations: just a position, a start and end position, and the strange one with the "<". That's called a fuzzy location and it means the start is at or to the left of position 1. (Actually, that one isn't really fuzzy. Consider (1.10)..(60.88).) For details read the NCBI feature table definition.

Biopython can parse the details of the feature table. Many times that information isn't needed and there's extra effort in parsing that data, so RecordParser leaves the locations as strings and there's another GenBank parser, FeatureParser, which is used when you want that data parsed.

    from Bio import GenBank

    parser = GenBank.FeatureParser()
    record = parser.parse(open("AB077698.gb"))

    for feature in record.features:
        print "Feature", repr(feature.type), repr(feature.location)

Note that I changed feature.key into feature.type. It would have been better if those were the same attribute names. Here's the output from running the FeatureParser-based code.

    Feature 'source' <Bio.SeqFeature.FeatureLocation instance at 0x1284b98>
    Feature 'gene' <Bio.SeqFeature.FeatureLocation instance at 0x128d260>
    Feature "5'UTR" <Bio.SeqFeature.FeatureLocation instance at 0x10efa58>
    Feature 'CDS' <Bio.SeqFeature.FeatureLocation instance at 0x12910a8>
    Feature 'misc_feature' <Bio.SeqFeature.FeatureLocation instance at 0x1288800>
    Feature 'misc_feature' <Bio.SeqFeature.FeatureLocation instance at 0x1280f08>
    Feature 'misc_feature' <Bio.SeqFeature.FeatureLocation instance at 0x1112440>
    Feature 'misc_feature' <Bio.SeqFeature.FeatureLocation instance at 0x1114cd8>
    Feature "3'UTR" <Bio.SeqFeature.FeatureLocation instance at 0x1118698>
    Feature 'polyA_site' <Bio.SeqFeature.FeatureLocation instance at 0x1116058>
    Feature 'polyA_site' <Bio.SeqFeature.FeatureLocation instance at 0x1114030>

You can see that the location is now a FeatureLocation instance instead of a string. This object has ways to return the fuzzy and non-fuzzy location information.
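
To make the difference between fuzzy and non-fuzzy positions concrete, here's a small sketch of my own, not Biopython's code, that converts the simple location strings printed by the RecordParser into Python-style numbers. It's written for Python 3, unlike the Python 2 code in the rest of these notes. The name parse_simple_location is hypothetical, it handles only the forms "N", "A..B", "<A..B" and "A..>B", and reporting a single position with start equal to end is an assumption chosen to match the parser output in this document.

```python
import re

# Matches "N", "A..B", "<A..B" and "A..>B"; anything else is rejected.
_SIMPLE_LOCATION = re.compile(r"^(<?)(\d+)(?:\.\.(>?)(\d+))?$")


def parse_simple_location(text):
    """Return (start, end, start_is_fuzzy, end_is_fuzzy) for a simple location.

    Hypothetical helper for illustration only; it is not part of Biopython.
    """
    match = _SIMPLE_LOCATION.match(text)
    if match is None:
        raise ValueError("unsupported location: %r" % text)
    lt, first, gt, last = match.groups()
    start = int(first) - 1  # GenBank counts from 1, Python counts from 0
    if last is None:
        end = start  # single position: assumption, mirrors this document's output
    else:
        end = int(last)  # the 1-based inclusive end equals the 0-based exclusive end
    return start, end, lt == "<", gt == ">"


for text in ["1..2701", "<1..79", "80..1144", "1606"]:
    print(text, parse_simple_location(text))
```

The non-fuzzy values are the first two items of the result; the two flags only record whether a "<" or ">" was present. Anything more complicated, such as (1.10)..(60.88), raises ValueError instead of being guessed at.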

    from Bio import GenBank

    parser = GenBank.FeatureParser()
    record = parser.parse(open("AB077698.gb"))

    for feature in record.features:
        print "Feature", repr(feature.type)
        loc = feature.location
        print " fuzzy", loc.start, loc.end
        print " nofuzzy", loc.nofuzzy_start, loc.nofuzzy_end

The output

    Feature 'source'
     fuzzy 0 2701
     nofuzzy 0 2701
    Feature 'gene'
     fuzzy 0 2701
     nofuzzy 0 2701
    Feature "5'UTR"
     fuzzy <0 79
     nofuzzy 0 79
    Feature 'CDS'
     fuzzy 79 1144
     nofuzzy 79 1144
    Feature 'misc_feature'
     fuzzy 136 196
     nofuzzy 136 196
    Feature 'misc_feature'
     fuzzy 238 292
     nofuzzy 238 292
    Feature 'misc_feature'
     fuzzy 616 676
     nofuzzy 616 676
    Feature 'misc_feature'
     fuzzy 724 778
     nofuzzy 724 778
    Feature "3'UTR"
     fuzzy 1144 2659
     nofuzzy 1144 2659
    Feature 'polyA_site'
     fuzzy 1605 1605
     nofuzzy 1605 1605
    Feature 'polyA_site'
     fuzzy 2659 2659
     nofuzzy 2659 2659

I'll use the non-fuzzy information because that's easier to deal with.

Now that I have the coordinate information I want to make HTML output for each feature. I'll use a table.

    from Bio import GenBank

    parser = GenBank.FeatureParser()
    record = parser.parse(open("AB077698.gb"))

    print """<html>
    <head>
    <title>Feature information</title>
    </head>
    <body>
    <table border="1">"""

    print "<tr><th>Feature</th><th>Start</th><th>End</th></tr>"

    for feature in record.features:
        loc = feature.location
        print "<tr><td>%s</td><td>%s</td><td>%s</td></tr>" % (
            feature.type, loc.nofuzzy_start, loc.nofuzzy_end)

    print """</table>
    </body></html>"""

Here's the output

    <html>
    <head>
    <title>Feature information</title>
    </head>
    <body>
    <table border="1">
    <tr><th>Feature</th><th>Start</th><th>End</th></tr>
    <tr><td>source</td><td>0</td><td>2701</td></tr>
    <tr><td>gene</td><td>0</td><td>2701</td></tr>
    <tr><td>5'UTR</td><td>0</td><td>79</td></tr>
    <tr><td>CDS</td><td>79</td><td>1144</td></tr>
    <tr><td>misc_feature</td><td>136</td><td>196</td></tr>
    <tr><td>misc_feature</td><td>238</td><td>292</td></tr>
    <tr><td>misc_feature</td><td>616</td><td>676</td></tr>
    <tr><td>misc_feature</td><td>724</td><td>778</td></tr>
    <tr><td>3'UTR</td><td>1144</td><td>2659</td></tr>
    <tr><td>polyA_site</td><td>1605</td><td>1605</td></tr>
    <tr><td>polyA_site</td><td>2659</td><td>2659</td></tr>
    </table>
    </body></html>

and here's what the table looks like as HTML

    Feature        Start   End
    source         0       2701
    gene           0       2701
    5'UTR          0       79
    CDS            79      1144
    misc_feature   136     196
    misc_feature   238     292
    misc_feature   616     676
    misc_feature   724     778
    3'UTR          1144    2659
    polyA_site     1605    1605
    polyA_site     2659    2659
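
Plain string formatting works for this table because none of the values put into it contain characters that are special in HTML. A fuzzy start such as "<0" would, since the browser would read the "<" as the start of a tag. As a hedged sketch, in Python 3 rather than the Python 2 used in these notes, the row-building step could be wrapped in a function that escapes each value first. The helper name feature_table is hypothetical, and html.escape comes from the Python 3 standard library; it is not something the original program uses.

```python
import html


def feature_table(rows):
    """Return an HTML table for an iterable of (name, start, end) rows.

    Hypothetical helper: every value is converted to text and escaped,
    so characters like "<" and "&" cannot be mistaken for markup.
    """
    lines = ['<table border="1">',
             "<tr><th>Feature</th><th>Start</th><th>End</th></tr>"]
    for name, start, end in rows:
        cells = "".join("<td>%s</td>" % html.escape(str(value))
                        for value in (name, start, end))
        lines.append("<tr>%s</tr>" % cells)
    lines.append("</table>")
    return "\n".join(lines)


# The first row uses the fuzzy start on purpose to show the escaping.
print(feature_table([("5'UTR", "<0", 79), ("CDS", 79, 1144)]))
```

Returning a string instead of printing keeps the function easy to test and lets the caller decide where the HTML goes.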
I want to save the output to the file "features.html". I'll use a new bit of Python syntax for that. It's an extension of the print statement used to print to a file instead of to the screen.

    outfile = open("hello.txt", "w")
    print >>outfile, "Hello!"

The changes to the Python code are minimal.

    from Bio import GenBank

    parser = GenBank.FeatureParser()
    record = parser.parse(open("AB077698.gb"))

    outfile = open("features.html", "w")

    print >>outfile, """<html>
    <head>
    <title>Feature information</title>
    </head>
    <body>
    <table border="1">"""

    print >>outfile, "<tr><th>Feature</th><th>Start</th><th>End</th></tr>"

    for feature in record.features:
        loc = feature.location
        print >>outfile, "<tr><td>%s</td><td>%s</td><td>%s</td></tr>" % (
            feature.type, loc.nofuzzy_start, loc.nofuzzy_end)

    print >>outfile, """</table>
    </body></html>"""

The output file.

I also want to make FASTA files for the sequence from each feature. I'll name the first FASTA file "seq1.fasta", the second "seq2.fasta", and so on. First, the code to make the FASTA files.

    def save_fasta(filename, title, sequence):
        fasta_file = open(filename, "w")
        fasta_file.write(">" + title + "\n")
        for i in range(0, len(sequence), 72):
            fasta_file.write(sequence[i:i+72])
            fasta_file.write("\n")
        fasta_file.close()

You've seen this code before, though in a different form.
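
If you're following along in Python 3, the `print >>outfile` form is no longer valid; the equivalent is the `file=` argument of the print function. Here's a minimal sketch of the same FASTA writer in that style. The `width` parameter and the `with` block are my additions for illustration; the behavior is otherwise meant to match the function above.

```python
def save_fasta(filename, title, sequence, width=72):
    """Write one FASTA record, wrapping the sequence at `width` characters."""
    # The with block closes the file even if a write fails part-way through.
    with open(filename, "w") as fasta_file:
        print(">" + title, file=fasta_file)
        for i in range(0, len(sequence), width):
            print(sequence[i:i + width], file=fasta_file)
```

Calling save_fasta("seq1.fasta", "Feature 1: source", "ACGT" * 40) would write a title line followed by the 160 bases split over three lines of at most 72 characters.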
Here is save_fasta in the full program:

    from Bio import GenBank

    def save_fasta(filename, title, sequence):
        fasta_file = open(filename, "w")
        fasta_file.write(">" + title + "\n")
        for i in range(0, len(sequence), 72):
            fasta_file.write(sequence[i:i+72])
            fasta_file.write("\n")
        fasta_file.close()

    parser = GenBank.FeatureParser()
    record = parser.parse(open("AB077698.gb"))

    outfile = open("features.html", "w")

    print >>outfile, """<html>
    <head>
    <title>Feature information</title>
    </head>
    <body>
    <table border="1">"""

    print >>outfile, "<tr><th>Feature</th><th>Start</th><th>End</th></tr>"

    counter = 1
    for feature in record.features:
        loc = feature.location
        start = loc.nofuzzy_start
        end = loc.nofuzzy_end

        # Make the FASTA file
        filename = "seq%s.fasta" % counter
        title = "Feature %s: %s" % (counter, feature.type)
        save_fasta(filename, title, record.seq.data[start:end+1])

        # Use the non-fuzzy values in the table, as in the earlier versions
        print >>outfile, "<tr><td>%s</td><td>%s</td><td>%s</td></tr>" % (
            feature.type, start, end)
        counter += 1

    print >>outfile, """</table>
    </body></html>"""

Here's the 9th record.

Finally, I need to make a hyperlink from the feature's name to the FASTA record. In this case I only need an extra <a href="...">...</a>.
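
Before the full program, here's the linking step on its own as a hedged Python 3 sketch. It pairs each feature name with its numbered FASTA file and wraps the name in a link. The function name linked_name_cells is hypothetical, enumerate stands in for the manual counter used in these notes, and the html.escape call is an extra precaution that the original code does not take.

```python
import html


def linked_name_cells(feature_types):
    """Yield (filename, cell) pairs; files are named seq1.fasta, seq2.fasta, ...

    Hypothetical helper for illustration only.
    """
    # enumerate(..., 1) counts from 1, replacing a hand-maintained counter.
    for counter, feature_type in enumerate(feature_types, 1):
        filename = "seq%s.fasta" % counter
        cell = '<td><a href="%s">%s</a></td>' % (
            filename, html.escape(feature_type))
        yield filename, cell


for filename, cell in linked_name_cells(["source", "gene", "CDS"]):
    print(filename, cell)
```

The file names here contain only letters, digits and a dot, so they can go into the href attribute unchanged; names built from arbitrary text would need escaping too.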

    from Bio import GenBank

    def save_fasta(filename, title, sequence):
        fasta_file = open(filename, "w")
        fasta_file.write(">" + title + "\n")
        for i in range(0, len(sequence), 72):
            fasta_file.write(sequence[i:i+72])
            fasta_file.write("\n")
        fasta_file.close()

    parser = GenBank.FeatureParser()
    record = parser.parse(open("AB077698.gb"))

    outfile = open("features.html", "w")

    print >>outfile, """<html>
    <head>
    <title>Feature information</title>
    </head>
    <body>
    <table border="1">"""

    print >>outfile, "<tr><th>Feature</th><th>Start</th><th>End</th></tr>"

    counter = 1
    for feature in record.features:
        loc = feature.location
        start = loc.nofuzzy_start
        end = loc.nofuzzy_end

        # Make the FASTA file
        filename = "seq%s.fasta" % counter
        title = "Feature %s: %s" % (counter, feature.type)
        save_fasta(filename, title, record.seq.data[start:end+1])

        # The feature name links to its FASTA file; the table uses
        # the non-fuzzy values, as in the earlier versions
        print >>outfile, '''<tr><td><a href="%s">%s</a></td><td>%s</td><td>%s</td></tr>''' % (
            filename, feature.type, start, end)
        counter += 1

    print >>outfile, """</table>
    </body></html>"""

To see it in action, here is features.html.

Copyright © 2001-2020 Andrew Dalke Scientific AB. All rights reserved.