A taxonomy data browser
I'm going to show you how I developed a basic taxonomy browser using TurboGears, with the project stored in my SVN repository. I put a copy of the final code on the server in /home/andrew. It's running now at http://192.168.1.1:9080 if you want to see it in action.
Create the project in subversion
I started by making the project skeleton for subversion. Most projects will have this same initial structure. Note that I'll be using the "-m" flag to specify the check-in message on the command line while you'll almost certainly be using some text editor for this.
[~/nbn] % mkdir tax
[~/nbn] % mkdir tax/trunk
[~/nbn] % mkdir tax/branches
[~/nbn] % mkdir tax/tags
[~/nbn] % svn import tax file:///Users/dalke/svnrepos/taxonomy -m "initial import"
Adding tax/trunk
Adding tax/branches
Adding tax/tags

Committed revision 15.
[~/nbn] %
This is not revision 1 for me because I have other code in the repository.
My next step is to remove the skeleton directory and check out a fresh working copy from subversion. I'll check out the "trunk" into a local directory named "taxonomy". Initially it is empty.
[~/nbn] % rm -rf tax/
[~/nbn] % svn co file:///Users/dalke/svnrepos/taxonomy/trunk taxonomy
Checked out revision 15.
[~/nbn] % ls -l taxonomy/
[~/nbn] % cd taxonomy
I'll run TurboGears' quickstart inside the "taxonomy" directory. The name of the project is "TaxonomyServer" and the package name is "taxonomyserver".
[~/nbn] % tg-admin quickstart TaxonomyServer
Enter package name [taxonomyserver]:
Do you need Identity (usernames/passwords) in this project? [no]
...
Next I'll add the new code to subversion. Adding a directory also adds all of its subdirectories. I forgot to save the output from doing that; what I have are the commands I used to add the quickstart code:
svn add TaxonomyServer
svn commit -m "added taxonomy server from TG quickstart"
Setting up the database and model
I'll change the database name and show you the difference via "svn diff":
[~/nbn/taxonomy] % vi TaxonomyServer/dev.cfg
[~/nbn/taxonomy] % svn diff TaxonomyServer/dev.cfg
Index: TaxonomyServer/dev.cfg
===================================================================
--- TaxonomyServer/dev.cfg (revision 16)
+++ TaxonomyServer/dev.cfg (working copy)
@@ -13,7 +13,7 @@
# If you have sqlite, here's a simple default to get you started
# in development
-sqlobject.dburi="sqlite://%(current_dir_uri)s/devdata.sqlite"
+sqlobject.dburi="sqlite:///Users/dalke/nbn/taxonomy/taxdata.sqlite"
# if you are using a database or table type without transactions
I'll create a new model.py file using a text editor. Getting it right took about an hour or so, and I had to delete and rebuild the data several times trying out alternatives. Here's the final version, in TaxonomyServer/taxonomyserver/model.py:
from sqlobject import *
from turbogears.database import PackageHub

hub = PackageHub("taxonomyserver")
__connection__ = hub

class Taxonomy(SQLObject):
    class sqlmeta:
        idName = "tax_id"
    scientific_name = StringCol()
    rank = StringCol()
    parent = ForeignKey("Taxonomy")
    children = MultipleJoin("Taxonomy", joinColumn="parent_id")
    genetic_code = ForeignKey("GeneticCode")
    mitochondrial_genetic_code = ForeignKey("GeneticCode")

class GeneticCode(SQLObject):
    # use the default identifier name of 'id'
    name = StringCol()
I defined two tables. A taxonomy record contains a link to its genetic code. I could have stored the genetic code name in the Taxonomy table, but that would have been "denormalized" because the text would be repeated once for every record. That's not so bad here, but the original NCBI data also has the translation table for each genetic code, which is not something you want to reproduce in every Taxonomy record.
The Taxonomy record links to the GeneticCode through the attribute "genetic_code". This is a ForeignKey, meaning that the underlying SQL column is really named "genetic_code_id" and its value is the id of the corresponding GeneticCode entry. This kind of relationship between tables is central to working with a relational database.
Each Taxonomy record has a link to its parent, except for the root element. This is also a ForeignKey; the only difference is that it links to the primary identifier of the Taxonomy table rather than of the GeneticCode table. Again, while the Python attribute is named "parent", the underlying SQL column is "parent_id". This is the default, and it follows the usual database convention.
The MultipleJoin is a "one-to-many" relationship. It says to make a relationship named "children". If a given Taxonomy record has a primary id of X, then asking for its children is done by finding all records in the "taxonomy" table whose "parent_id" column equals X.
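To make the ForeignKey and MultipleJoin declarations concrete, here is a sketch of the equivalent plain-SQL lookups, written for Python 3's standard sqlite3 module (which is based on pysqlite) rather than SQLObject. It illustrates the queries involved; it is not a claim about how SQLObject builds them internally. The helper names make_db, genetic_code_name, parent_name and children are hypothetical, and the sample rows are taken from query output shown elsewhere in this article, with the root's parent already set to NULL.

```python
import sqlite3

# The schema printed by "tg-admin sql sql" for the model above.
SCHEMA = """
CREATE TABLE genetic_code (
    id INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE taxonomy (
    tax_id INTEGER PRIMARY KEY,
    scientific_name TEXT,
    rank TEXT,
    parent_id INT,
    genetic_code_id INT,
    mitochondrial_genetic_code_id INT
);
"""


def make_db():
    """Build an in-memory copy of the schema with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO genetic_code VALUES (?, ?)",
                     [(1, "Standard"), (12, "Alternative Yeast Nuclear")])
    # Rows copied from query output in this article; root's parent is NULL.
    conn.executemany("INSERT INTO taxonomy VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "root", "no rank", None, 1, 0),
        (10239, "Viruses", "no rank", 1, 1, 0),
        (131567, "cellular organisms", "no rank", 1, 1, 0),
        (5476, "Candida albicans", "species", 5475, 12, 4),
    ])
    return conn


def genetic_code_name(conn, tax_id):
    """ForeignKey: follow taxonomy.genetic_code_id to genetic_code.id."""
    row = conn.execute(
        "SELECT genetic_code.name FROM taxonomy, genetic_code "
        "WHERE taxonomy.genetic_code_id = genetic_code.id "
        "AND taxonomy.tax_id = ?", (tax_id,)).fetchone()
    return row[0] if row else None


def parent_name(conn, tax_id):
    """ForeignKey into the same table: join taxonomy to itself."""
    row = conn.execute(
        "SELECT p.scientific_name FROM taxonomy AS c "
        "JOIN taxonomy AS p ON c.parent_id = p.tax_id "
        "WHERE c.tax_id = ?", (tax_id,)).fetchone()
    return row[0] if row else None


def children(conn, tax_id):
    """MultipleJoin: every row whose parent_id equals this tax_id."""
    return [name for (name,) in conn.execute(
        "SELECT scientific_name FROM taxonomy "
        "WHERE parent_id = ? ORDER BY tax_id", (tax_id,))]
```

For example, children(make_db(), 1) lists the two child rows of the root, and parent_name returns None for the root because its parent_id is NULL.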
Here's the SQL corresponding to the model definition
[~/nbn/taxonomy] % cd TaxonomyServer/
[~/nbn/taxonomy/TaxonomyServer] % tg-admin sql sql
Using database URI sqlite:///Users/dalke/nbn/taxonomy/taxdata.sqlite
CREATE TABLE genetic_code (
id INTEGER PRIMARY KEY,
name TEXT
);
CREATE TABLE taxonomy (
tax_id INTEGER PRIMARY KEY,
scientific_name TEXT,
rank TEXT,
parent_id INT,
genetic_code_id INT,
mitochondrial_genetic_code_id INT
);
I'll go ahead and create the database
[~/nbn/taxonomy/TaxonomyServer] % tg-admin sql create
Using database URI sqlite:///Users/dalke/nbn/taxonomy/taxdata.sqlite
[~/nbn/taxonomy/TaxonomyServer] %
Populating the database
With the schema defined (by the way, it could have been defined directly through sqlite; it doesn't need to come from SQLObject), I need to populate the database. While I could use SQLObject for that, I've found a specialized Python program talking to the underlying DB-API layer to be easier and more understandable. Here's the loader, which I've named "load_taxdata.py" and placed in the top directory of my project tree:
from pysqlite2 import dbapi2 as sqlite

# Reader for the file format used in the taxonomy files
#
# Field terminator is "\t|\t"
# Row terminator is "\t|\n"
def tax_reader(infile, field_names):
    for line in infile:
        assert line.endswith("\t|\n")  # double-check for valid format
        line = line[:-3]
        fields = line.split("\t|\t")
        assert len(fields) == len(field_names)  # another double-check
        # This can also be written as
        #   yield dict(zip(field_names, fields))
        d = {}
        for (name, field) in zip(field_names, fields):
            d[name] = field
        yield d

# Translate the empty string to None/NULL/undefined
def empty2none(s):
    if s == "":
        return None
    return s

def main():
    conn = sqlite.connect("./taxdata.sqlite")
    cursor = conn.cursor()

    ### Load the genetic code data
    # CREATE TABLE genetic_code (
    #   id INTEGER PRIMARY KEY,
    #   name TEXT
    # );
    print "Loading genetic codes"
    for rec in tax_reader(open("tax/gencode.dmp"),
                          ["id", "abbreviation", "name", "cde", "starts"]):
        cursor.execute(
            "INSERT INTO genetic_code VALUES (?, ?)",
            (int(rec["id"]), rec["name"]))

    ### Load the taxonomy table
    # CREATE TABLE taxonomy (
    #   tax_id INTEGER PRIMARY KEY,
    #   scientific_name TEXT,
    #   rank TEXT,
    #   parent_id INT,
    #   genetic_code_id INT,
    #   mitochondrial_genetic_code_id INT
    # );

    # First read the names
    print "Loading names"
    counter = 0
    for rec in tax_reader(open("tax/names.dmp"),
                          ["tax_id", "name_txt", "unique_name", "name_class"]):
        if rec["name_class"] == "scientific name":
            cursor.execute(
                "INSERT INTO taxonomy (tax_id, scientific_name) VALUES (?, ?)",
                (int(rec["tax_id"]), empty2none(rec["name_txt"])) )
        # count every name record, not only the scientific names
        counter += 1
        if counter % 10000 == 0:
            print counter, "..."
    print "Done."

    # Then read the taxonomy tree structure
    print "Loading tree structure"
    counter = 0
    for rec in tax_reader(open("tax/nodes.dmp"),
                          ["tax_id", "parent_tax_id", "rank", "embl_code",
                           "division_id", "div_flag",
                           "genetic_code_id", "inherited_gc_flag",
                           "mito_gc_id", "mito_gc_flag",
                           "hidden_flag", "hidden_subtree_flag", "comments"]):
        cursor.execute(
            "UPDATE taxonomy SET rank=?, parent_id=?, "
            "  genetic_code_id=?, mitochondrial_genetic_code_id=? "
            "WHERE tax_id=?",
            (rec["rank"], int(rec["parent_tax_id"]),
             int(rec["genetic_code_id"]), int(rec["mito_gc_id"]),
             int(rec["tax_id"])) )
        counter += 1
        if counter % 10000 == 0:
            print counter, "..."
    print "Done."

    conn.commit()

if __name__ == "__main__":
    main()
Quite a bit of code, but about usual for this sort of thing. I'll use it to populate the database:
[~/nbn/taxonomy] % ~/cvses/python-svn/python.exe load_taxdata.py
Loading genetic codes
Loading names
10000 ...
20000 ...
30000 ...
40000 ...
50000 ...
60000 ...
70000 ...
80000 ...
90000 ...
100000 ...
110000 ...
120000 ...
130000 ...
140000 ...
150000 ...
160000 ...
170000 ...
180000 ...
190000 ...
200000 ...
210000 ...
220000 ...
230000 ...
240000 ...
250000 ...
260000 ...
270000 ...
280000 ...
290000 ...
300000 ...
310000 ...
320000 ...
330000 ...
340000 ...
350000 ...
360000 ...
370000 ...
380000 ...
390000 ...
400000 ...
410000 ...
Done.
Loading tree structure
10000 ...
20000 ...
30000 ...
40000 ...
50000 ...
60000 ...
70000 ...
80000 ...
90000 ...
100000 ...
110000 ...
120000 ...
130000 ...
140000 ...
150000 ...
160000 ...
170000 ...
180000 ...
190000 ...
200000 ...
210000 ...
220000 ...
230000 ...
240000 ...
250000 ...
260000 ...
270000 ...
280000 ...
290000 ...
300000 ...
310000 ...
Done.
[~/nbn/taxonomy] %
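The tax_reader generator can be tried without downloading the NCBI dump files. This Python 3 sketch feeds it two sample rows in the names.dmp layout through io.StringIO (the rows are illustrative, with the unique_name column left empty). It uses the dict(zip(...)) form mentioned in the loader's comment, and raises ValueError instead of using assert, since asserts are skipped when Python runs with -O.

```python
import io


def tax_reader(infile, field_names):
    """Yield one dict per row of an NCBI taxonomy .dmp file.

    Fields are separated by "\\t|\\t" and each row ends with "\\t|\\n".
    """
    for line in infile:
        if not line.endswith("\t|\n"):
            raise ValueError("bad row terminator: %r" % line)
        fields = line[:-3].split("\t|\t")
        if len(fields) != len(field_names):
            raise ValueError("expected %d fields, got %d: %r"
                             % (len(field_names), len(fields), line))
        yield dict(zip(field_names, fields))


NAME_FIELDS = ["tax_id", "name_txt", "unique_name", "name_class"]

# Two sample rows in the names.dmp layout (illustration only).
SAMPLE = ("1\t|\troot\t|\t\t|\tscientific name\t|\n"
          "2\t|\tBacteria\t|\t\t|\tscientific name\t|\n")

rows = list(tax_reader(io.StringIO(SAMPLE), NAME_FIELDS))
```

Every value comes back as a string, including the empty unique_name, which is why the loader converts ids with int() and empty strings with empty2none().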
SQL Queries
I'll then double check that the data is there, and show you a few SQL queries.
[~/nbn/taxonomy] % sqlite3 taxdata.sqlite
SQLite version 3.2.7
Enter ".help" for instructions
sqlite> select count(*) from taxonomy;
316506
sqlite> select count(*) from genetic_code;
18
sqlite>
sqlite> select * from taxonomy where genetic_code_id == 12;
5476|Candida albicans|species|5475|12|4
5480|Candida parapsilosis|species|5475|12|4
5481|Candida rugosa|species|5475|12|3
5482|Candida tropicalis|species|5475|12|3
5491|Candida melibiosica|species|5475|12|3
5493|Candida zeylanoides|species|5475|12|3
36911|Clavispora lusitaniae|species|36910|12|3
42374|Candida dubliniensis|species|5475|12|3
130814|Candida mycetangii|species|5475|12|3
237561|Candida albicans SC5314|no rank|5476|12|4
273371|Candida orthopsilosis|species|5475|12|4
284146|Gene disruption vector SAT1-flipper|species|45778|12|0
294747|Candida tropicalis MYA-3404|no rank|5482|12|3
294748|Candida albicans WO-1|no rank|5476|12|4
300021|Candida albicans var. stellatoidea|varietas|5476|12|4
306902|Clavispora lusitaniae ATCC 42720|no rank|36911|12|3
308923|Candida cellae|species|5475|12|4
308924|Candida riodocensis|species|5475|12|4
356546|Candida sp. AS 2.3072|species|5475|12|3
356547|Candida sp. AS 2.3073|species|5475|12|3
359171|Candida sp. HA 1671|species|5475|12|4
sqlite> select * from genetic_code where id == 12;
12|Alternative Yeast Nuclear
sqlite> select taxonomy.scientific_name from taxonomy, genetic_code where taxonomy.genetic_code_id == genetic_code.id and genetic_code.name = 'Alternative Yeast Nuclear';
Candida albicans
Candida parapsilosis
Candida rugosa
Candida tropicalis
Candida melibiosica
Candida zeylanoides
Clavispora lusitaniae
Candida dubliniensis
Candida mycetangii
Candida albicans SC5314
Candida orthopsilosis
Gene disruption vector SAT1-flipper
Candida tropicalis MYA-3404
Candida albicans WO-1
Candida albicans var. stellatoidea
Clavispora lusitaniae ATCC 42720
Candida cellae
Candida riodocensis
Candida sp. AS 2.3072
Candida sp. AS 2.3073
Candida sp. HA 1671
sqlite>
If you try that out, you'll find that the last query hesitates before giving a result. That's because there's no index. Searches on indexed fields can be a lot faster than unindexed searches; it's like the difference between looking something up in a Python list vs. a dictionary. There are tradeoffs: loading a database is slower when indices are in place, so the usual approach is to load the data first and only afterwards create the indices:
sqlite> create index genetic_code_id_idx on taxonomy (genetic_code_id);
sqlite> create index name_idx on genetic_code (name);
sqlite>
sqlite> create index parent_id_idx on taxonomy (parent_id);
sqlite> .exit
The first two of these made the most recent SQL query go much faster. I also indexed the Taxonomy's parent_id because I'll be using it later to find the children of a given node.
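One way to confirm that an index is actually used is SQLite's EXPLAIN QUERY PLAN. This sketch uses Python 3's sqlite3 module on an empty in-memory copy of the taxonomy table (not the real taxdata.sqlite) to compare the plan for the genetic_code_id lookup before and after creating the index. The exact plan wording varies between SQLite versions, so it is only meant to be read for a SCAN versus a mention of the index name; plan_details is a hypothetical helper.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxonomy (tax_id INTEGER PRIMARY KEY, "
             "scientific_name TEXT, rank TEXT, parent_id INT, "
             "genetic_code_id INT, mitochondrial_genetic_code_id INT)")

QUERY = "SELECT * FROM taxonomy WHERE genetic_code_id = ?"

# Without an index SQLite has to SCAN every row of the table.
before = plan_details(conn, QUERY, (12,))

# Same index as in the sqlite3 session above.
conn.execute("CREATE INDEX genetic_code_id_idx ON taxonomy (genetic_code_id)")

# Now the plan should SEARCH through genetic_code_id_idx instead.
after = plan_details(conn, QUERY, (12,))
```

Creating the index after the bulk load, as above, means the index is built once in a single pass instead of being updated on every INSERT.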
Now that I have a working loader script, I'll add it to subversion and commit it:
[~/nbn/taxonomy] % svn add load_taxdata.py
A load_taxdata.py
[~/nbn/taxonomy] % svn commit load_taxdata.py -m "load the NCBI taxonomy data"
Adding load_taxdata.py
Transmitting file data .
Committed revision 17.
[~/nbn/taxonomy] %
SQLObject queries
I'll show you a bit about getting records from the database. Here's how to fetch a record given its primary key:
[~/nbn/taxonomy] % cd TaxonomyServer/
[~/nbn/taxonomy/TaxonomyServer] % tg-admin shell
Python 2.4.2 (#6, Apr 15 2006, 11:26:48)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(CustomShell)
>>> import model
>>> model.Taxonomy.get(1)
<Taxonomy 1 scientific_name='root' rank='no rank' parentID=1 genetic_codeID=1 mitochondrial_genetic_codeID=0>
>>> model.Taxonomy.get(2)
<Taxonomy 2 scientific_name='Bacteria' rank='superkingdom' parentID=131567 genetic_codeID=11 mitochondrial_genetic_codeID=0>
>>>
Watch as I get the parent and children for a given node:
>>> model.Taxonomy.get(2).parent
<Taxonomy 131567 scientific_name='cellular organisms' rank='no rank' parentID=1 genetic_codeID=1 mitochondrial_genetic_codeID=0>
>>>
>>> model.Taxonomy.get(1).children
[<Taxonomy 1 scientific_name='root' rank='no rank' parentID=1 genetic_codeID=1 mitochondrial_genetic_codeID=0>, <Taxonomy 10239 scientific_name='Viruses' rank='no rank' parentID=1 genetic_codeID=1 mitochondrial_genetic_codeID=0>, <Taxonomy 12884 scientific_name='Viroids' rank='no rank' parentID=1 genetic_codeID=1 mitochondrial_genetic_codeID=0>, <Taxonomy 12908 scientific_name="'unclassified seq...'" rank='no rank' parentID=1 genetic_codeID=1 mitochondrial_genetic_codeID=2>, <Taxonomy 28384 scientific_name='other sequences' rank='no rank' parentID=1 genetic_codeID=11 mitochondrial_genetic_codeID=0>, <Taxonomy 131567 scientific_name='cellular organisms' rank='no rank' parentID=1 genetic_codeID=1 mitochondrial_genetic_codeID=0>]
>>> len(_)
6
>>> [child.scientific_name for child in model.Taxonomy.get(1).children]
['root', 'Viruses', 'Viroids', 'unclassified sequences', 'other sequences', 'cellular organisms']
>>>
That was a bit strange: "root" shows up as one of its own children, because the root node lists itself as its parent. I checked the original data file, and that's indeed what it says. What I'll do is set the root node's parent to None, which maps to NULL in SQL.
>>> model.Taxonomy.get(1).parent = None
>>> model.hub.hub.commit()
>>> [child.scientific_name for child in model.Taxonomy.get(1).children]
['Viruses', 'Viroids', 'unclassified sequences', 'other sequences', 'cellular organisms']
>>>
Note the commit()!
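Instead of patching the root by hand afterwards, the loader could map the self-reference to NULL while it reads nodes.dmp. Here is a Python 3 sketch of that idea using the standard sqlite3 module; parent_or_none is a hypothetical helper, and in load_taxdata.py it would take the place of int(rec["parent_tax_id"]) in the UPDATE statement.

```python
import sqlite3


def parent_or_none(tax_id, parent_tax_id):
    """Return the parent id as an int, or None when a node is its own parent.

    In nodes.dmp the root (tax_id 1) lists itself as its parent.
    """
    tax_id = int(tax_id)
    parent_tax_id = int(parent_tax_id)
    if parent_tax_id == tax_id:
        return None  # the sqlite3 module stores None as NULL
    return parent_tax_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxonomy (tax_id INTEGER PRIMARY KEY, "
             "scientific_name TEXT, parent_id INT)")
# (tax_id, name, parent_tax_id) as strings, the way tax_reader returns them
nodes = [("1", "root", "1"),
         ("10239", "Viruses", "1"),
         ("131567", "cellular organisms", "1")]
conn.executemany("INSERT INTO taxonomy VALUES (?, ?, ?)",
                 [(int(t), name, parent_or_none(t, p)) for (t, name, p) in nodes])
conn.commit()  # like model.hub.hub.commit() in the shell session

root_children = [name for (name,) in conn.execute(
    "SELECT scientific_name FROM taxonomy WHERE parent_id = ? ORDER BY tax_id",
    (1,))]
```

With the root's parent_id stored as NULL, the children query for tax_id 1 no longer returns the root itself.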
If the record does not exist, SQLObject raises an SQLObjectNotFound exception:
>>>
>>> model.Taxonomy.get(8)
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1457-py2.4.egg/sqlobject/main.py", line 912, in get
    val._init(id, connection, selectResults)
  File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1457-py2.4.egg/sqlobject/main.py", line 957, in _init
    raise SQLObjectNotFound, "The object %s by the ID %s does not exist" % (self.__class__.__name__, self.id)
SQLObjectNotFound: The object Taxonomy by the ID 8 does not exist
>>>
I'm going to do a few queries of the database, but this time using the SQLObject query language. It uses a special class attribute named "q". This is used to build a query expression for the select command (and a couple of other commands).
>>> from model import Taxonomy
>>> Taxonomy.select(Taxonomy.q.id < 10)
<SelectResults at 20a39b0>
>>> for result in Taxonomy.select(Taxonomy.q.id < 10):
...     print result.id, result.scientific_name
...
1 root
2 Bacteria
6 Azorhizobium
7 Azorhizobium caulinodans
9 Buchnera aphidicola
>>>
>>> for result in Taxonomy.select(
...         AND(Taxonomy.q.id >= 10, Taxonomy.q.id <= 20)):
...     print result.id, result.scientific_name
...
10 Cellvibrio
11 Cellvibrio gilvus
13 Dictyoglomus
14 Dictyoglomus thermophilum
16 Methylophilus
17 Methylophilus methylotrophus
18 Pelobacter
19 Pelobacter carbinolicus
20 Phenylobacterium
>>>
>>> for result in Taxonomy.select(
...         AND(Taxonomy.q.genetic_codeID == GeneticCode.q.id,
...             GeneticCode.q.name == "Alternative Yeast Nuclear")):
...     print result.id, result.scientific_name
...
5476 Candida albicans
5480 Candida parapsilosis
5481 Candida rugosa
5482 Candida tropicalis
5491 Candida melibiosica
5493 Candida zeylanoides
36911 Clavispora lusitaniae
42374 Candida dubliniensis
130814 Candida mycetangii
237561 Candida albicans SC5314
273371 Candida orthopsilosis
284146 Gene disruption vector SAT1-flipper
294747 Candida tropicalis MYA-3404
294748 Candida albicans WO-1
300021 Candida albicans var. stellatoidea
306902 Clavispora lusitaniae ATCC 42720
308923 Candida cellae
308924 Candida riodocensis
356546 Candida sp. AS 2.3072
356547 Candida sp. AS 2.3073
359171 Candida sp. HA 1671
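SQLObject's internals aren't covered here, so the following is only a hypothetical sketch of the general technique behind expressions like Taxonomy.q.id < 10: Python's comparison operators are overloaded so they build a SQL fragment instead of returning True or False. The Expr, Col and all_of names are made up for this illustration (all_of plays the role AND plays in the session above), and the output is a WHERE clause with "?" placeholders suitable for Python 3's sqlite3 module.

```python
class Expr(object):
    """A SQL fragment plus the parameters to bind into its "?" slots."""

    def __init__(self, sql, params=()):
        self.sql = sql
        self.params = tuple(params)


class Col(object):
    """Stand-in for an attribute such as Taxonomy.q.id.

    Comparing a Col with a value does not return True/False; it
    returns an Expr describing the comparison.
    """

    def __init__(self, name):
        self.name = name

    def _compare(self, op, other):
        if isinstance(other, Col):  # column-to-column, e.g. a join condition
            return Expr("(%s %s %s)" % (self.name, op, other.name))
        return Expr("(%s %s ?)" % (self.name, op), (other,))

    def __lt__(self, other):
        return self._compare("<", other)

    def __le__(self, other):
        return self._compare("<=", other)

    def __gt__(self, other):
        return self._compare(">", other)

    def __ge__(self, other):
        return self._compare(">=", other)

    def __eq__(self, other):
        return self._compare("=", other)

    __hash__ = None  # overriding == makes hashing meaningless


def all_of(*exprs):
    """Combine expressions with AND, keeping the parameters in order."""
    return Expr("(" + " AND ".join(e.sql for e in exprs) + ")",
                [p for e in exprs for p in e.params])
```

An Expr can then be run directly, for example cursor.execute("SELECT tax_id, scientific_name FROM taxonomy WHERE " + e.sql, e.params). Keeping values as bound parameters rather than pasting them into the SQL text avoids quoting problems.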
All of the databases support a basic text search mode, with support for fields which contain, start with, end with, or exactly match a given piece of text. Here I'll search for taxonomy records for armadillo:
>>> Taxonomy.q.scientific_name.contains("dasypus")
<SQLOp 20c4760>
>>> q=_
>>> list(Taxonomy.select(q))
[<Taxonomy 9360 scientific_name='Dasypus' rank='genus' parentID=9359
genetic_codeID=1 mitochondrial_genetic_codeID=2>, <Taxonomy 9361
scientific_name="'Dasypus novemcin...'" rank='species' parentID=9360
genetic_codeID=1 mitochondrial_genetic_codeID=2>, <Taxonomy 81630
scientific_name='Dasypus kappleri' rank='species' parentID=9360
genetic_codeID=1 mitochondrial_genetic_codeID=2>, <Taxonomy 286599
scientific_name="'Dasypus sp. VJL-...'" rank='species' parentID=9360
genetic_codeID=1 mitochondrial_genetic_codeID=2>, <Taxonomy 317145
scientific_name='Delichon dasypus' rank='species' parentID=88115
genetic_codeID=1 mitochondrial_genetic_codeID=2>]
>>> len(_)
5
>>> import time
>>> if 1:
... t1 = time.time()
... results = list(Taxonomy.select(q))
... t2 = time.time()
... print "Elapsed time", t2-t1
...
Elapsed time 1.18850398064
>>>
As you can see at the end, "contains" searches are slow. I tried indexing the field, but that didn't help. I then checked the sqlite documentation, which says an index does not help "contains" searches; it can only help "startswith" searches. Other database engines may be faster.
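To see why "startswith" is the indexable case, here is a Python 3 sketch using the standard sqlite3 module (not SQLObject, and not necessarily the SQL SQLObject generates). A prefix search can be rewritten as a range, low <= name < high, which SQLite answers from a B-tree index; a LIKE pattern with a leading "%" forces it to examine every row. Note the assumption that the range form is case-sensitive, unlike SQLite's default LIKE, which ignores case for ASCII letters. The helpers prefix_bounds and plan_details are hypothetical, and the rows come from the Dasypus results above.

```python
import sqlite3


def prefix_bounds(prefix):
    """Return (low, high) such that low <= s < high exactly when
    s.startswith(prefix), under SQLite's default BINARY collation."""
    if not prefix:
        raise ValueError("prefix must not be empty")
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


def plan_details(conn, sql, params):
    """The 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


STARTSWITH_SQL = ("SELECT tax_id, scientific_name FROM taxonomy "
                  "WHERE scientific_name >= ? AND scientific_name < ? "
                  "ORDER BY scientific_name")
CONTAINS_SQL = ("SELECT tax_id, scientific_name FROM taxonomy "
                "WHERE scientific_name LIKE ? ORDER BY scientific_name")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxonomy (tax_id INTEGER PRIMARY KEY, "
             "scientific_name TEXT, rank TEXT)")
conn.execute("CREATE INDEX scientific_name_idx ON taxonomy (scientific_name)")
conn.executemany("INSERT INTO taxonomy VALUES (?, ?, ?)", [
    (9360, "Dasypus", "genus"),
    (81630, "Dasypus kappleri", "species"),
    (317145, "Delichon dasypus", "species"),
])

# Prefix search as a range: case-sensitive, and it can SEARCH the index.
starts = conn.execute(STARTSWITH_SQL, prefix_bounds("Dasypus")).fetchall()
# Substring search: the leading "%" means every entry has to be examined.
contains = conn.execute(CONTAINS_SQL, ("%dasypus%",)).fetchall()
```

Comparing plan_details(conn, STARTSWITH_SQL, ...) with plan_details(conn, CONTAINS_SQL, ...) should show a SEARCH for the first and only a SCAN for the second.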
I know the model is pretty good now, so I'll check it into subversion:
[~/nbn/taxonomy/TaxonomyServer] % svn commit taxonomyserver/model.py -m "initial taxonomy data model"
Sending taxonomyserver/model.py
Transmitting file data .
Committed revision 18.
[~/nbn/taxonomy/TaxonomyServer] % svn commit dev.cfg -m "set path to taxonomy database"
Sending dev.cfg
Transmitting file data .
Committed revision 19.
[~/nbn/taxonomy/TaxonomyServer] %
Making the web interface
I have data. I want to see it. I'll start by making a query form.
To make it, I'll create the "search.kid" template with the following content:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:py="http://purl.org/kid/ns#"
py:extends="'master.kid'">
<head>
<meta content="text/html; charset=UTF-8" http-equiv="content-type" py:replace="''"/>
<title>Search for taxonomy records</title>
</head>
<body>
<form method="GET" action="search">
Scientific name
<select name="searchtype">
<option value="substring">contains</option>
<option value="startswith">starts with</option>
<option value="exact">is exactly equal to</option>
</select>
<input name="text" /><input type="submit" value="Search" /><br />
Genetic Code
<select name="genetic_code">
<option value="any">Any</option>
<option py:for="code in codes" value="${code.id}">(${code.id}) ${code.name}</option>
</select>
</form>
</body>
</html>
The template needs to know the available genetic codes, which I'll get from the database and pass in via the controller. Here are the full initial contents of my "controllers.py" file:
import logging

import cherrypy

import turbogears
from turbogears import controllers, expose, validate, redirect

from taxonomyserver import json

log = logging.getLogger("taxonomyserver.controllers")

import model
from model import GeneticCode

class Root(controllers.RootController):
    @expose(template="taxonomyserver.templates.search")
    def index(self):
        return dict(codes=GeneticCode.select(orderBy=GeneticCode.q.id))
Note the "orderBy". It orders the select results so the record with the smallest id comes first. Without it the order of the results isn't guaranteed, which can sometimes cause strange problems.
The search form needs search functionality. Here's the new "search" method, which I added to the Root class in controllers.py:
    @expose(template="taxonomyserver.templates.results")
    def search(self, text, searchtype="substring", genetic_code="any"):
        if searchtype == "startswith":
            query = Taxonomy.q.scientific_name.startswith(text)
        elif searchtype == "exact":
            query = Taxonomy.q.scientific_name == text
        else:
            query = Taxonomy.q.scientific_name.contains(text)

        if genetic_code != "any":
            try:
                genetic_code = int(genetic_code)
            except ValueError:
                pass
            else:
                query = sqlobject.AND(
                    query, Taxonomy.q.genetic_codeID == genetic_code)

        taxons = Taxonomy.select(query)
        return dict(taxons=taxons)
For this to work you'll also need "import sqlobject" and "from model import Taxonomy" near the top of controllers.py.
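To see the controller's branching in isolation, here's a framework-free Python 3 sketch that turns the same three form inputs into a WHERE clause and parameter list for the sqlite3 module. build_search and escape_like are hypothetical names. The escaping of % and _ is an addition of this sketch so that user text is matched literally; I haven't checked whether SQLObject's contains() and startswith() do the same.

```python
def escape_like(text):
    """Escape LIKE wildcards so user input is matched literally.

    Backslash is the escape character (see ESCAPE '\\' below).
    """
    return (text.replace("\\", "\\\\")
                .replace("%", "\\%")
                .replace("_", "\\_"))


def build_search(text, searchtype="substring", genetic_code="any"):
    """Turn the search form's inputs into (where_sql, params) for sqlite3."""
    if searchtype == "startswith":
        clauses = ["scientific_name LIKE ? ESCAPE '\\'"]
        params = [escape_like(text) + "%"]
    elif searchtype == "exact":
        clauses = ["scientific_name = ?"]
        params = [text]
    else:  # "substring", and the fallback for unexpected values
        clauses = ["scientific_name LIKE ? ESCAPE '\\'"]
        params = ["%" + escape_like(text) + "%"]

    if genetic_code != "any":
        try:
            code = int(genetic_code)
        except ValueError:
            pass  # ignore a malformed code, as the controller does
        else:
            clauses.append("genetic_code_id = ?")
            params.append(code)

    return " AND ".join(clauses), params
```

The result would be used as cursor.execute("SELECT tax_id, scientific_name FROM taxonomy WHERE " + sql, params). Only fixed SQL text is concatenated; the user's input always travels as a bound parameter.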
The search results (a select iterator over taxons) are passed to this template, named "results.kid":
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:py="http://purl.org/kid/ns#"
py:extends="'master.kid'">
<head>
<meta content="text/html; charset=UTF-8" http-equiv="content-type" py:replace="''"/>
<title>Search results</title>
</head>
<body>
<P py:if="not taxons">No taxons found.</P>
<P py:if="taxons">
<ul>
<li py:for="taxon in taxons">${taxon.id} - ${taxon.scientific_name}</li>
</ul>
</P>
<P>
<a href="/">Start another search.</a>
</P>
</body>
</html>
With that in place we're ready for searches. Here was the result for "yeast".
- 36547 - yeast plasmid pGKl2
- 36926 - Laodelphax striatellus yeast-like symbiont
- 36927 - Nilaparvata lugens yeast-like symbiont
- 36928 - Sogatella furcifera yeast-like symbiont
- 48273 - Lasioderma serricorne yeast-like symbiont
- 48274 - Stegobium paniceum yeast-like symbiont
- 50958 - basidiosporogenous yeast M2
- 56772 - Hamiltonaphis styraci yeast-like symbiont
- 111293 - Yeast two-hybrid vector pCD.1
- 111294 - Yeast two-hybrid vector pCD.2
- 111295 - Yeast two-hybrid vector pC-ACT.1
- 111296 - Yeast two-hybrid vector pC-ACT.2
- 118976 - Ricania japonica yeast-like symbiont
- 118977 - Geisha distinctissima yeast-like symbiont
- 118978 - Tuberaphis styraci yeast-like symbiont
- 118979 - Tuberaphis taiwana yeast-like symbiont
- 118980 - Tuberaphis takenouchii yeast-like symbiont
- 118981 - Glyphinaphis bambusae yeast-like symbiont
- 118982 - Cerataphis fransseni yeast-like symbiont
- 150230 - yeast-like fungal sp. UWO(PS)95-766.4
- 160692 - Antarctic yeast CBS 8943
- 160693 - Antarctic yeast CBS 8944
- 160694 - Antarctic yeast CBS 8928
- 160695 - Antarctic yeast CBS 8923
- 164423 - Antarctic yeast CBS 8927
- 164424 - Antarctic yeast CBS 8942
- 164425 - Antarctic yeast CBS 8938
- 164426 - Antarctic yeast CBS 8939
- 164427 - Antarctic yeast CBS 8940
- 164428 - Antarctic yeast ML 4515
- 164429 - Antarctic yeast CBS 8932
- 164430 - Antarctic yeast CBS 8929
- 164431 - Antarctic yeast CBS 8941
- 164432 - Antarctic yeast CBS 8913
- 164433 - Antarctic yeast CBS 8931
- 170514 - marine yeast Y5318
- 171191 - hot spring yeast RND13
- 191556 - yeast mitochondrial synthetic construct
- 204938 - yeast isolate H-1008
- 205958 - yeast isolate H-1006
- 205959 - yeast isolate H-1007
- 205960 - yeast isolate H-1009
- 205961 - yeast isolate H-1010
- 205962 - yeast isolate H-1011
- 210760 - basidiomycete yeast sp. HB1047
- 210761 - basidiomycete yeast sp. HA1554
- 211102 - hymenomycete yeast sp. SP-5
- 225995 - yeast sp. YUTI-CL166
- 232236 - Yeast truncation assay backbone vector pLSK870
- 234480 - basidiomycete yeast sp. BG01-7-21-008A-2-1
- 234481 - basidiomycete yeast sp. BG01-7-22-001C-1-1
- 234482 - basidiomycete yeast sp. BG01-7-22-006A-1-1
- 234483 - basidiomycete yeast sp. BG01-7-22-009A-1-1
- 234484 - basidiomycete yeast sp. BG01-7-23-019A-1-1
- 234485 - basidiomycete yeast sp. BG01-8-26-001B-1-1
- 234486 - basidiomycete yeast sp. BG01-8-5-001B-1-1
- 234487 - basidiomycete yeast sp. BG98-12-9-1-2
- 234488 - basidiomycete yeast sp. BG98-9-24-3-1
- 243657 - uncultured basidiomycete yeast
- 271368 - basidiomycete yeast sp. BG02-5-27-3-2-2
- 271369 - basidiomycete yeast sp. BG02-5-23-001-C2
- 271371 - basidiomycete yeast sp. BG02-5-23-003I-7
- 271372 - basidiomycete yeast sp. BG02-5-30-002A-3
- 271373 - basidiomycete yeast sp. BG02-5-30-002A-7
- 271374 - basidiomycete yeast sp. BG02-5-30-004A-1
- 271375 - basidiomycete yeast sp. BG02-5-30-005A-4
- 271376 - basidiomycete yeast sp. BG02-5-30-005A-5
- 271377 - basidiomycete yeast sp. BG02-6-15-006A-1
- 271378 - basidiomycete yeast sp. BG02-6-6-1-1
- 271379 - basidiomycete yeast sp. BG02-6-6-1-5
- 271380 - basidiomycete yeast sp. BG02-6-6-1-9
- 271381 - basidiomycete yeast sp. BG02-6-6-2-5
- 271382 - basidiomycete yeast sp. BG02-6-9-2
- 271383 - basidiomycete yeast sp. BG02-7-13-014-4-1
- 271384 - basidiomycete yeast sp. BG02-7-14-002A-1-1
- 271385 - basidiomycete yeast sp. BG02-7-15-015A-1-1
- 271386 - basidiomycete yeast sp. BG02-7-16-015A-1-1
- 271387 - basidiomycete yeast sp. BG02-7-17-001A-1-3
- 271388 - basidiomycete yeast sp. BG02-7-18-013E-1-1
- 271389 - basidiomycete yeast sp. BG02-7-18-013E-1-3
- 271390 - basidiomycete yeast sp. BG02-7-18-032B-1-1
- 271391 - basidiomycete yeast sp. BG02-7-20-006A-2-1
- 271392 - basidiomycete yeast sp. BG02-7-20-011B-1-2
- 271393 - basidiomycete yeast sp. BG02-7-21-004G-1-4
- 271394 - basidiomycete yeast sp. BG02-7-21-004Q-1-4
- 284151 - Yeast two-hybrid vector pAB8
- 284152 - Yeast two-hybrid vector pDG1
- 284153 - Yeast two-hybrid vector pDG2
- 284154 - Yeast two-hybrid vector pDG3
- 284155 - Yeast two-hybrid vector pDG4
- 284156 - Yeast two-hybrid vector pMK498-TEV
- 284157 - Yeast two-hybrid vector pMK500-TEV
- 284158 - Yeast two-hybrid vector pMK502-TEV
- 293968 - Yeast TAP expression vector pYL435
- 300263 - basidiomycete yeast sp. A11
- 308695 - Yeast expression vector pJG485
- 308696 - Yeast expression vector pJG484
- 308697 - Yeast expression vector pJG516
- 308698 - Yeast expression vector pJG518
- 308699 - Yeast expression vector pJG482
- 308700 - Yeast expression vector pJG483
- 308701 - Yeast expression vector pJG514
- 308702 - Yeast expression vector pJG515
- 329402 - Yeast expression vector p426GALL
- 356895 - Yeast centromeric expression vector p416GPD
- 378826 - basidiomycete yeast sp. DX-2006N
- 378827 - basidiomycete yeast sp. DX-2006O
- 378828 - basidiomycete yeast sp. DX-2006P
- 378829 - basidiomycete yeast sp. DX-2006Q
- 393769 - basidiomycete yeast sp. YNS8.4-120506
Using the 'default' handler
I want each one to be a hypertext link to more information about the record. Specifically, I want "/taxon/300263" to give more details about taxon 300263, etc. I'll change the template to make a link to that URL. That is, I'll change from
<li py:for="taxon in taxons">${taxon.id} - ${taxon.scientific_name}</li>
to
<li py:for="taxon in taxons"><a href="/taxon/${taxon.id}">${taxon.id} - ${taxon.scientific_name}</a></li>
I need a controller for it. This one is a bit different from the previous ones. I don't want a Python function for every taxon id; I want a single Python function which accepts the URL component after "/taxon". For that I need a Python object which implements the "default" method:
class TaxIdLookup(object):
    @expose(template="taxonomyserver.templates.details")
    def default(self, tax_id):
        taxon = Taxonomy.get(tax_id)
        return {"taxon": taxon}

class Root(controllers.RootController):
    # ... the existing index() and search() methods stay here ...
    # then add this at the end of the class:
    taxon = TaxIdLookup()
The new "TaxIdLookup.default" method also needs an @expose. This one points to the "details.kid" file, shown next. Note how I use the "genetic_code", "parent" and "children" links to get Python objects from the database:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:py="http://purl.org/kid/ns#"
py:extends="'master.kid'">
<head>
<meta content="text/html; charset=UTF-8" http-equiv="content-type" py:replace="''"/>
<title>${taxon.scientific_name}</title>
</head>
<body>
<h2>${taxon.scientific_name}</h2>
<P>
Taxon identifier: ${taxon.id}<br />
Genetic Code: ${taxon.genetic_code.name}<br />
</P>
<P>
Parent taxon: <a href="/taxon/${taxon.parent.id}">${taxon.parent.scientific_name}</a><br />
Children:
<i py:if="not taxon.children">None</i>
<ul py:if="taxon.children">
<li py:for="child in taxon.children"><a href="/taxon/${child.id}">${child.scientific_name}</a></li>
</ul>
</P>
</body>
</html>
Here's what the page http://localhost:8080/taxon/5204 looks like, assuming the server is running:
Basidiomycota
Taxon identifier: 5204
Genetic Code: Standard
Parent taxon: Fungi
Children:
- Ustilaginomycetes
- Hymenomycetes
- Urediniomycetes
- unclassified Basidiomycota
- mitosporic Basidiomycota
- Basidiomycota incertae sedis
- environmental samples
- mycorrhizal samples
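The way /taxon/5204 ends up calling TaxIdLookup.default("5204") can be pictured with a toy dispatcher. This Python 3 sketch is a deliberately simplified model of object-traversal dispatch, not CherryPy's actual code: it walks the path one attribute at a time and hands the unmatched remainder to default() as positional arguments. It ignores @expose and query strings, and the Root, TaxIdLookup and dispatch shown here are hypothetical stand-ins.

```python
class TaxIdLookup(object):
    def default(self, tax_id):
        # Path components arrive as strings, so convert before using as a key.
        return "details for taxon %d" % int(tax_id)


class Root(object):
    def index(self):
        return "search form"

    taxon = TaxIdLookup()


def dispatch(root, path):
    """Resolve a URL path by attribute traversal, falling back to default().

    Simplified: unlike CherryPy, it ignores @expose and query strings.
    """
    parts = [p for p in path.split("/") if p]
    node = root
    for i, part in enumerate(parts):
        # Refuse private names such as "__class__".
        child = None if part.startswith("_") else getattr(node, part, None)
        if child is None:
            handler = getattr(node, "default", None)
            if handler is None:
                raise LookupError("not found: " + path)
            # The unmatched remainder of the path becomes positional arguments.
            return handler(*parts[i:])
        node = child
    return node() if callable(node) else node.index()
```

Because "5204" is not an attribute of the TaxIdLookup object, the lookup falls through to its default() method, which is why one method can serve every taxon id.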
Bug fixes and improvements
It turns out there's a bug in the code: it doesn't handle the root node, which has no parent. To make that work I changed
Parent taxon:
<a href="/taxon/${taxon.parent.id}">${taxon.parent.scientific_name}</a><br />
to
<i py:if="taxon.parent is None">no parent</i>
<a py:if="taxon.parent is not None"
href="/taxon/${taxon.parent.id}">${taxon.parent.scientific_name}</a><br />
I tested that it works, making changes where needed, and then added the three templates to version control:
svn add taxonomyserver/templates/search.kid
svn add taxonomyserver/templates/results.kid
svn add taxonomyserver/templates/details.kid
svn commit -m "added basic search and browse capabilities"
I worked for a bit to improve the code. Here are the changes I made:
[~/nbn/taxonomy/TaxonomyServer] % svn diff
Index: taxonomyserver/templates/results.kid
===================================================================
--- taxonomyserver/templates/results.kid (revision 20)
+++ taxonomyserver/templates/results.kid (working copy)
@@ -11,6 +11,7 @@
<P py:if="not taxons">No taxons found.</P>
<P py:if="taxons">
+${taxons.count()} taxons found
<ul>
<li py:for="taxon in taxons"><a href="/taxon/${taxon.id}">${taxon.id} - ${taxon.scientific_name}</a></li>
</ul>
Index: taxonomyserver/templates/details.kid
===================================================================
--- taxonomyserver/templates/details.kid (revision 20)
+++ taxonomyserver/templates/details.kid (working copy)
@@ -8,7 +8,7 @@
</head>
<body>
-
+<a href="/">Start a new search</a>
<h2>${taxon.scientific_name}</h2>
<P>
Taxon identifier: ${taxon.id}<br />
[~/nbn/taxonomy/TaxonomyServer] % svn commit taxonomyserver/templates/results.kid -m "added count of how many records match"
Sending taxonomyserver/templates/results.kid
Transmitting file data .
Committed revision 21.
[~/nbn/taxonomy/TaxonomyServer] % svn commit taxonomyserver/templates/details.kid -m "added link to do a new search"
Sending taxonomyserver/templates/details.kid
Transmitting file data .
Committed revision 22.
[~/nbn/taxonomy/TaxonomyServer] % svn diff
Index: taxonomyserver/controllers.py
===================================================================
--- taxonomyserver/controllers.py (revision 20)
+++ taxonomyserver/controllers.py (working copy)
@@ -42,7 +42,7 @@
query = sqlobject.AND(
query, Taxonomy.q.genetic_codeID == genetic_codeID)
- taxons = Taxonomy.select(query)
+ taxons = list(Taxonomy.select(query))
return dict(taxons=taxons)
taxon = TaxIdLookup()
Index: taxonomyserver/templates/results.kid
===================================================================
--- taxonomyserver/templates/results.kid (revision 21)
+++ taxonomyserver/templates/results.kid (working copy)
@@ -11,7 +11,7 @@
<P py:if="not taxons">No taxons found.</P>
<P py:if="taxons">
-${taxons.count()} taxons found
+${len(taxons)} taxons found
<ul>
<li py:for="taxon in taxons"><a href="/taxon/${taxon.id}">${taxon.id} - ${taxon.scientific_name}</a></li>
</ul>
[~/nbn/taxonomy/TaxonomyServer] % svn diff
Index: taxonomyserver/controllers.py
===================================================================
--- taxonomyserver/controllers.py (revision 23)
+++ taxonomyserver/controllers.py (working copy)
@@ -40,7 +40,7 @@
pass
else:
query = sqlobject.AND(
- query, Taxonomy.q.genetic_codeID == genetic_codeID)
+ query, Taxonomy.q.genetic_codeID == genetic_code)
taxons = list(Taxonomy.select(query))
return dict(taxons=taxons)
[~/nbn/taxonomy/TaxonomyServer] % svn commit -m "fixed typo"
Sending TaxonomyServer/taxonomyserver/controllers.py
Transmitting file data .
Committed revision 24.
[~/nbn/taxonomy/TaxonomyServer] %
As you can see, there were several small changes but nothing major or complex.
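The switch from taxons.count() to list() plus len() probably saves a database query. My reading, not something stated above, is that SQLObject's count() issues its own SELECT COUNT(*) and that iterating the lazy select then runs the SELECT again, so the page would run a slow "contains" search twice. Here is a Python 3 sketch of that effect with the standard sqlite3 module, using Connection.set_trace_callback (Python 3.3+) to record each statement SQLite runs. The function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxonomy (tax_id INTEGER PRIMARY KEY, "
             "scientific_name TEXT)")
conn.executemany("INSERT INTO taxonomy VALUES (?, ?)",
                 [(5476, "Candida albicans"), (5480, "Candida parapsilosis")])
conn.commit()

statements = []                          # every SQL statement SQLite runs
conn.set_trace_callback(statements.append)

WHERE = " FROM taxonomy WHERE scientific_name LIKE ?"
PARAMS = ("%Candida%",)


def count_then_iterate(conn):
    """Like ${taxons.count()} followed by a py:for over a lazy select."""
    n = conn.execute("SELECT COUNT(*)" + WHERE, PARAMS).fetchone()[0]
    names = [name for (name,) in conn.execute(
        "SELECT scientific_name" + WHERE + " ORDER BY tax_id", PARAMS)]
    return n, names


def list_then_len(conn):
    """Like list(Taxonomy.select(query)) followed by len(taxons)."""
    rows = conn.execute(
        "SELECT scientific_name" + WHERE + " ORDER BY tax_id", PARAMS).fetchall()
    return len(rows), [name for (name,) in rows]
```

The tradeoff is memory: list() holds every matching row at once, which is fine for a few hundred results but could matter for a very broad search such as a one-letter "contains" query.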
Copyright © 2001-2008 Dalke Scientific Software, LLC.


