Dalke Scientific Software: More science. Less time.

A taxonomy data browser

I'm going to show you how I developed a basic taxonomy browser using TurboGears, with the project stored in my SVN repository. I put a copy of the final code on the server in /home/andrew. It's running now at http://192.168.1.1:9080 if you want to see it in action.

Create the project in subversion

I started by making the project skeleton for subversion. Most projects will have this same initial structure. Note that I'll be using the "-m" flag to specify the check-in message on the command line, while you'll almost certainly be using some text editor for this.

[~/nbn] % mkdir tax
[~/nbn] % mkdir tax/trunk
[~/nbn] % mkdir tax/branches
[~/nbn] % mkdir tax/tags
[~/nbn] % svn import tax file:///Users/dalke/svnrepos/taxonomy -m "initial import"
Adding         tax/trunk
Adding         tax/branches
Adding         tax/tags

Committed revision 15.
[~/nbn] %
This is not revision 1 for me because I have other code in the repository.

My next step is to remove the skeleton code and restore it from subversion. I'll check out the "trunk" into the local directory named "taxonomy". Initially it is empty.

[~/nbn] % rm -rf tax/

[~/nbn] % svn co file:///Users/dalke/svnrepos/taxonomy/trunk taxonomy
Checked out revision 15.
[~/nbn] % ls -l taxonomy/
[~/nbn] % cd taxonomy

I'll run TurboGears' quickstart inside the "taxonomy" directory. The name of the project is "TaxonomyServer" and the package name is "taxonomyserver".

[~/nbn/taxonomy] % tg-admin quickstart TaxonomyServer
Enter package name [taxonomyserver]: 
Do you need Identity (usernames/passwords) in this project? [no] 
  ...
I'll add the new code into subversion. Adding a directory adds all of its subdirectories. I forgot to copy the output from doing so; what I have are the commands I used to add the quickstart:
svn add TaxonomyServer
svn commit -m "added taxonomy server from TG quickstart"

Setting up the database and model

I'll change the database name and show you the difference via "svn diff":

[~/nbn/taxonomy] % vi TaxonomyServer/dev.cfg
[~/nbn/taxonomy] % svn diff TaxonomyServer/dev.cfg
Index: TaxonomyServer/dev.cfg
===================================================================
--- TaxonomyServer/dev.cfg      (revision 16)
+++ TaxonomyServer/dev.cfg      (working copy)
@@ -13,7 +13,7 @@
 
 # If you have sqlite, here's a simple default to get you started
 # in development
-sqlobject.dburi="sqlite://%(current_dir_uri)s/devdata.sqlite"
+sqlobject.dburi="sqlite:///Users/dalke/nbn/taxonomy/taxdata.sqlite"
 
 
 # if you are using a database or table type without transactions

I'll create a new model.py file using a text editor. Getting it right took about an hour or so, and I had to delete and rebuild the data several times trying out alternatives. Here's the final version, in TaxonomyServer/taxonomyserver/model.py:

from sqlobject import *

from turbogears.database import PackageHub

hub = PackageHub("taxonomyserver")
__connection__ = hub

class Taxonomy(SQLObject):
    class sqlmeta:
        idName = "tax_id"

    scientific_name = StringCol()
    rank = StringCol()

    parent = ForeignKey("Taxonomy")
    children = MultipleJoin("Taxonomy", joinColumn="parent_id")

    genetic_code = ForeignKey("GeneticCode")
    mitochondrial_genetic_code = ForeignKey("GeneticCode")

class GeneticCode(SQLObject):
    # use the default identifier name of 'id'
    name = StringCol()

I defined two tables. A taxonomy record contains a link to its genetic code. I could have stored the genetic code name in the Taxonomy table but that would have been "denormalized" because the text data would be repeated once for every record. That's not so bad here but the original NCBI data also has the translation table for each genetic code, which is not something you want to reproduce for every Taxonomy record.

The Taxonomy record links to the GeneticCode through the attribute "genetic_code". This is a ForeignKey, meaning that the underlying column is really named "genetic_code_id" and its value is the id of the corresponding GeneticCode entry. This is a form of relationship and is the reason why this is a "relational database".

Each Taxonomy record has a link to its parent, excepting the root element. This is also a ForeignKey, the only difference being that it links to the primary identifier for the Taxonomy table and not the GeneticCode table. Again, while the Python variable is named "parent" the underlying SQL column is "parent_id". This is the default, which is the usual database convention.

The MultipleJoin is a "one-to-many" relationship. It says to make a relationship named "children". If a given Taxonomy record has a primary id of X then asking for its children is done by finding all records in the "Taxonomy" table which have a "parent_id" column equal to X.
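
To make the two relationships concrete, here's a minimal sketch in plain SQL using Python's standard-library sqlite3 module (Python 3, in-memory database). It is not how SQLObject implements ForeignKey or MultipleJoin; it only shows the kind of lookups they stand for. The helper names "genetic_code_name" and "children_names" are made up for this example, and the few rows are taken from the Candida records in the NCBI data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genetic_code (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE taxonomy (
    tax_id INTEGER PRIMARY KEY,
    scientific_name TEXT,
    rank TEXT,
    parent_id INT,
    genetic_code_id INT,
    mitochondrial_genetic_code_id INT
);
""")
conn.execute("INSERT INTO genetic_code VALUES (12, 'Alternative Yeast Nuclear')")
conn.executemany(
    "INSERT INTO taxonomy VALUES (?, ?, ?, ?, ?, ?)",
    [(5476, "Candida albicans", "species", 5475, 12, 4),
     (5480, "Candida parapsilosis", "species", 5475, 12, 4),
     (237561, "Candida albicans SC5314", "no rank", 5476, 12, 4),
     (294748, "Candida albicans WO-1", "no rank", 5476, 12, 4)])

def genetic_code_name(tax_id):
    # ForeignKey: follow taxonomy.genetic_code_id to genetic_code.id
    row = conn.execute(
        "SELECT genetic_code.name FROM taxonomy, genetic_code "
        "WHERE taxonomy.tax_id = ? "
        "AND taxonomy.genetic_code_id = genetic_code.id",
        (tax_id,)).fetchone()
    return row[0] if row else None

def children_names(tax_id):
    # MultipleJoin: every record whose parent_id equals this record's id
    rows = conn.execute(
        "SELECT scientific_name FROM taxonomy WHERE parent_id = ? "
        "ORDER BY tax_id", (tax_id,))
    return [name for (name,) in rows]

print(genetic_code_name(5476))
print(children_names(5476))
```

A record with no children, like 5480 in this small sample, simply gives an empty list.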

Here's the SQL corresponding to the model definition:

[~/nbn/taxonomy] % cd TaxonomyServer/
[~/nbn/taxonomy/TaxonomyServer] % tg-admin sql sql
Using database URI sqlite:///Users/dalke/nbn/taxonomy/taxdata.sqlite
CREATE TABLE genetic_code (
    id INTEGER PRIMARY KEY,
    name TEXT
);

CREATE TABLE taxonomy (
    tax_id INTEGER PRIMARY KEY,
    scientific_name TEXT,
    rank TEXT,
    parent_id INT,
    genetic_code_id INT,
    mitochondrial_genetic_code_id INT
);

I'll go ahead and create the database:
[~/nbn/taxonomy/TaxonomyServer] % tg-admin sql create
Using database URI sqlite:///Users/dalke/nbn/taxonomy/taxdata.sqlite
[~/nbn/taxonomy/TaxonomyServer] % 

Populating the database

With the schema defined (btw, it could be defined directly through sqlite and does not need SQLObject) I need to populate the database. While I could use SQLObject for it, I've found a specialized Python program talking to the underlying db-api layer to be easier and more understandable. Here's the loader, which I've named "load_taxdata.py" and placed in the top-level directory of my project tree:

from pysqlite2 import dbapi2 as sqlite

# Reader for the file format used in the taxonomy files
#
# Field terminator is "\t|\t"
# Row terminator is "\t|\n"

def tax_reader(infile, field_names):
    for line in infile:
        assert line.endswith("\t|\n")  # double-check for valid format
        line = line[:-3]
        fields = line.split("\t|\t")
        assert len(fields) == len(field_names)  # another double-check

        # This can also be written as
        #     yield dict(zip(field_names, fields))
        d = {}
        for (name, field) in zip(field_names, fields):
            d[name] = field
        yield d

# Translate the empty string to None/NULL/undefined
def empty2none(s):
    if s == "":
        return None
    return s

def main():
    conn = sqlite.connect("./taxdata.sqlite")
    cursor = conn.cursor()

    ### Load the genetic code data
    # CREATE TABLE genetic_code (
    #     id INTEGER PRIMARY KEY,
    #     name TEXT
    # );
    print "Loading genetic codes"
    for rec in tax_reader(open("tax/gencode.dmp"),
                          ["id", "abbreviation", "name", "cde", "starts"]):
        cursor.execute(
            "INSERT INTO genetic_code VALUES (?, ?)",
            (int(rec["id"]), rec["name"]))

    ### Load the taxonomy table
    # CREATE TABLE taxonomy (
    #     tax_id INTEGER PRIMARY KEY,
    #     scientific_name TEXT,
    #     rank TEXT,
    #     parent_id INT,
    #     genetic_code_id INT,
    #     mitochondrial_genetic_code_id INT
    # );

    # First read the names
    print "Loading names"
    counter = 0
    for rec in tax_reader(open("tax/names.dmp"),
                          ["tax_id", "name_txt", "unique_name", "name_class"]):
        if rec["name_class"] == "scientific name":
            cursor.execute(
                "INSERT INTO taxonomy (tax_id, scientific_name) VALUES (?, ?)",
                (int(rec["tax_id"]), empty2none(rec["name_txt"])) )
        counter += 1
        if counter % 10000 == 0:
            print counter, "..."
    print "Done."
    

    # Then read the taxonomy tree structure
    print "Loading tree structure"
    counter = 0
    for rec in tax_reader(open("tax/nodes.dmp"),
                          ["tax_id", "parent_tax_id", "rank", "embl_code",
                           "division_id", "div_flag",
                           "genetic_code_id", "inherited_gc_flag",
                           "mito_gc_id", "mito_gc_flag",
                           "hidden_flag", "hidden_subtree_flag", "comments"]):
        cursor.execute(
            "UPDATE taxonomy SET rank=?, parent_id=?, "
            "   genetic_code_id=?, mitochondrial_genetic_code_id=? "
            "WHERE tax_id=?",
            (rec["rank"], int(rec["parent_tax_id"]),
             int(rec["genetic_code_id"]), int(rec["mito_gc_id"]),
             int(rec["tax_id"])) )
        counter += 1
        if counter % 10000 == 0:
            print counter, "..."
    print "Done."

    conn.commit()

if __name__ == "__main__":
    main()
That's quite a bit of code, but about usual for this sort of thing. I'll use it to populate the database:
[~/nbn/taxonomy] % ~/cvses/python-svn/python.exe load_taxdata.py
Loading genetic codes
Loading names
10000 ...
20000 ...
30000 ...
40000 ...
50000 ...
60000 ...
70000 ...
80000 ...
90000 ...
100000 ...
110000 ...
120000 ...
130000 ...
140000 ...
150000 ...
160000 ...
170000 ...
180000 ...
190000 ...
200000 ...
210000 ...
220000 ...
230000 ...
240000 ...
250000 ...
260000 ...
270000 ...
280000 ...
290000 ...
300000 ...
310000 ...
320000 ...
330000 ...
340000 ...
350000 ...
360000 ...
370000 ...
380000 ...
390000 ...
400000 ...
410000 ...
Done.
Loading tree structure
10000 ...
20000 ...
30000 ...
40000 ...
50000 ...
60000 ...
70000 ...
80000 ...
90000 ...
100000 ...
110000 ...
120000 ...
130000 ...
140000 ...
150000 ...
160000 ...
170000 ...
180000 ...
190000 ...
200000 ...
210000 ...
220000 ...
230000 ...
240000 ...
250000 ...
260000 ...
270000 ...
280000 ...
290000 ...
300000 ...
310000 ...
Done.
[~/nbn/taxonomy] % 
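
If the file format handled by the loader is unfamiliar, here's a small sketch of parsing it on its own. It's written for Python 3 (the loader above is Python 2) and reads from an in-memory string instead of the real "tax/names.dmp". The two sample lines and the function name "parse_tax_line" are made up for illustration; only the field and row terminators come from the loader.

```python
import io

def parse_tax_line(line, field_names):
    # Row terminator is "\t|\n"; field terminator is "\t|\t"
    if not line.endswith("\t|\n"):
        raise ValueError("unexpected row terminator: %r" % (line,))
    fields = line[:-3].split("\t|\t")
    if len(fields) != len(field_names):
        raise ValueError("expected %d fields, got %d"
                         % (len(field_names), len(fields)))
    return dict(zip(field_names, fields))

# Illustrative sample in the names.dmp layout (not copied from the real file)
sample = io.StringIO(
    "2\t|\tBacteria\t|\tBacteria <prokaryote>\t|\tscientific name\t|\n"
    "2\t|\teubacteria\t|\t\t|\tgenbank common name\t|\n")

names = ["tax_id", "name_txt", "unique_name", "name_class"]
records = [parse_tax_line(line, names) for line in sample]

# Keep only the scientific names, as the loader does
scientific = [r["name_txt"] for r in records
              if r["name_class"] == "scientific name"]
print(scientific)
```

Note that an empty field comes back as the empty string, which is why the loader has "empty2none" to turn it into NULL.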

SQL Queries

I'll then double check that the data is there, and show you a few SQL queries.

[~/nbn/taxonomy] % sqlite3 taxdata.sqlite
SQLite version 3.2.7
Enter ".help" for instructions
sqlite> select count(*) from taxonomy;
316506
sqlite> select count(*) from genetic_code;
18
sqlite> 

sqlite> select * from taxonomy where genetic_code_id == 12;
5476|Candida albicans|species|5475|12|4
5480|Candida parapsilosis|species|5475|12|4
5481|Candida rugosa|species|5475|12|3
5482|Candida tropicalis|species|5475|12|3
5491|Candida melibiosica|species|5475|12|3
5493|Candida zeylanoides|species|5475|12|3
36911|Clavispora lusitaniae|species|36910|12|3
42374|Candida dubliniensis|species|5475|12|3
130814|Candida mycetangii|species|5475|12|3
237561|Candida albicans SC5314|no rank|5476|12|4
273371|Candida orthopsilosis|species|5475|12|4
284146|Gene disruption vector SAT1-flipper|species|45778|12|0
294747|Candida tropicalis MYA-3404|no rank|5482|12|3
294748|Candida albicans WO-1|no rank|5476|12|4
300021|Candida albicans var. stellatoidea|varietas|5476|12|4
306902|Clavispora lusitaniae ATCC 42720|no rank|36911|12|3
308923|Candida cellae|species|5475|12|4
308924|Candida riodocensis|species|5475|12|4
356546|Candida sp. AS 2.3072|species|5475|12|3
356547|Candida sp. AS 2.3073|species|5475|12|3
359171|Candida sp. HA 1671|species|5475|12|4
sqlite> select * from genetic_code where id == 12;
12|Alternative Yeast Nuclear
sqlite> select taxonomy.scientific_name from taxonomy, genetic_code where taxonomy.genetic_code_id == genetic_code.id and genetic_code.name = 'Alternative Yeast Nuclear';
Candida albicans
Candida parapsilosis
Candida rugosa
Candida tropicalis
Candida melibiosica
Candida zeylanoides
Clavispora lusitaniae
Candida dubliniensis
Candida mycetangii
Candida albicans SC5314
Candida orthopsilosis
Gene disruption vector SAT1-flipper
Candida tropicalis MYA-3404
Candida albicans WO-1
Candida albicans var. stellatoidea
Clavispora lusitaniae ATCC 42720
Candida cellae
Candida riodocensis
Candida sp. AS 2.3072
Candida sp. AS 2.3073
Candida sp. HA 1671
sqlite>

If you try that out you'll find that the search seems to hesitate before giving a result. The reason is that it's missing an index. Searches across indexed fields can be a lot faster than unindexed searches. It's like the difference between looking something up in a Python list vs. a dictionary. There are tradeoffs; loading a database when indices are enabled is slow, so what most people do is load the database first and index it only afterwards:

sqlite> create index genetic_code_id_idx on taxonomy (genetic_code_id);
sqlite> create index name_idx on genetic_code (name);
sqlite>
sqlite> create index parent_id_idx on taxonomy (parent_id);
sqlite> .exit
The first two of these made the most recent SQL query go much faster. I also indexed the Taxonomy's parent_id because I'll be using it later to find the parent and children of a given node.
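
One way to check whether an index is actually used is to ask SQLite for its query plan. Here's a sketch using Python 3's standard sqlite3 module on a tiny in-memory table; the helper name "query_plan" is my own, and the exact wording of the plan text varies between SQLite versions, so the example only looks for the index name in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxonomy (tax_id INTEGER PRIMARY KEY, "
    "scientific_name TEXT, genetic_code_id INT)")
conn.executemany(
    "INSERT INTO taxonomy VALUES (?, ?, ?)",
    [(5476, "Candida albicans", 12),
     (5480, "Candida parapsilosis", 12),
     (9360, "Dasypus", 1)])

QUERY = "SELECT scientific_name FROM taxonomy WHERE genetic_code_id = 12"

def query_plan(sql):
    # The human-readable description is the last column of each plan row
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " / ".join(str(row[-1]) for row in rows)

plan_before = query_plan(QUERY)   # a full scan of the table
conn.execute("CREATE INDEX genetic_code_id_idx ON taxonomy (genetic_code_id)")
plan_after = query_plan(QUERY)    # should now mention genetic_code_id_idx

print("before:", plan_before)
print("after: ", plan_after)
```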

Now that I have a working loader script I'll add and commit it to the repository:

[~/nbn/taxonomy] % svn add load_taxdata.py
A         load_taxdata.py
[~/nbn/taxonomy] % svn commit load_taxdata.py -m "load the NCBI taxonomy data"
Adding         load_taxdata.py
Transmitting file data .
Committed revision 17.
[~/nbn/taxonomy] % 

SQLObject queries

I'll show you a bit about getting records from the server. Here's how to fetch a record given its primary key.

[~/nbn/taxonomy] % cd TaxonomyServer/
[~/nbn/taxonomy/TaxonomyServer] % tg-admin shell
Python 2.4.2 (#6, Apr 15 2006, 11:26:48) 
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(CustomShell)
>>> import model
>>> model.Taxonomy.get(1)
<Taxonomy 1 scientific_name='root' rank='no rank' parentID=1 genetic_codeID=1 mitochondrial_genetic_codeID=0>
>>> model.Taxonomy.get(2)
<Taxonomy 2 scientific_name='Bacteria' rank='superkingdom' parentID=131567 genetic_codeID=11 mitochondrial_genetic_codeID=0>
>>> 
Watch as I get the parent and children for a given node:
>>> model.Taxonomy.get(2).parent
<Taxonomy 131567 scientific_name='cellular organisms' rank='no rank' parentID=1 genetic_codeID=1 mitochondrial_genetic_codeID=0>
>>> 
>>> model.Taxonomy.get(1).children
[<Taxonomy 1 scientific_name='root' rank='no rank' parentID=1 genetic_codeID=1
 mitochondrial_genetic_codeID=0>, <Taxonomy 10239 scientific_name='Viruses'
 rank='no rank' parentID=1 genetic_codeID=1 mitochondrial_genetic_codeID=0>,
 <Taxonomy 12884 scientific_name='Viroids' rank='no rank' parentID=1 
 genetic_codeID=1 mitochondrial_genetic_codeID=0>, <Taxonomy 12908
 scientific_name="'unclassified seq...'" rank='no rank' parentID=1
 genetic_codeID=1 mitochondrial_genetic_codeID=2>, <Taxonomy 28384
 scientific_name='other sequences' rank='no rank' parentID=1 genetic_codeID=11
 mitochondrial_genetic_codeID=0>, <Taxonomy 131567 scientific_name='cellular
 organisms' rank='no rank' parentID=1 genetic_codeID=1 mitochondrial_genetic
_codeID=0>]
>>> len(_)
6
>>> [child.scientific_name for child in model.Taxonomy.get(1).children]
['root', 'Viruses', 'Viroids', 'unclassified sequences', 'other sequences',
 'cellular organisms']
>>> 
That was a bit strange. The root node links back to itself. I checked the original data file and indeed that's what it says to do. What I'll do is replace the root node's parent so it's None, which maps to NULL in SQL.
>>> model.Taxonomy.get(1).parent = None
>>> model.hub.hub.commit()
>>> [child.scientific_name for child in model.Taxonomy.get(1).children]
['Viruses', 'Viroids', 'unclassified sequences', 'other sequences',
 'cellular organisms']
>>> 
Note the commit()!
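
For comparison, the same fix can be written directly in SQL. This is a sketch against a small in-memory table using Python 3's sqlite3 module, not the real taxdata.sqlite file; "children_names" is a helper name I made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxonomy (tax_id INTEGER PRIMARY KEY, "
    "scientific_name TEXT, parent_id INT)")
conn.executemany(
    "INSERT INTO taxonomy VALUES (?, ?, ?)",
    [(1, "root", 1),                      # the root links back to itself
     (2, "Bacteria", 131567),
     (10239, "Viruses", 1),
     (131567, "cellular organisms", 1)])

def children_names(tax_id):
    rows = conn.execute(
        "SELECT scientific_name FROM taxonomy WHERE parent_id = ? "
        "ORDER BY tax_id", (tax_id,))
    return [name for (name,) in rows]

before = children_names(1)

# Give the root node no parent, then commit so the change is saved
conn.execute("UPDATE taxonomy SET parent_id = NULL WHERE tax_id = 1")
conn.commit()

after = children_names(1)
print(before)
print(after)
```

The same UPDATE could also go at the end of the loader script so that the fix survives rebuilding the database.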

If the record does not exist, SQLObject raises an exception:

>>> 
>>> model.Taxonomy.get(8)
Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1457-py2.4.egg/sqlobject/main.py", line 912, in get
    val._init(id, connection, selectResults)
  File "/usr/local/lib/python2.4/site-packages/SQLObject-0.7.1dev_r1457-py2.4.egg/sqlobject/main.py", line 957, in _init
    raise SQLObjectNotFound, "The object %s by the ID %s does not exist" % (self.__class__.__name__, self.id)
SQLObjectNotFound: The object Taxonomy by the ID 8 does not exist
>>>

I'm going to do a few queries of the database, but this time using the SQLObject query language. It uses a special class attribute named "q". This is used to build a query expression for the select command (and a couple of other commands).

>>> from model import Taxonomy         
>>> Taxonomy.select(Taxonomy.q.id < 10)
<SelectResults at 20a39b0>
>>> for result in Taxonomy.select(Taxonomy.q.id < 10):
...   print result.id, result.scientific_name
... 
1 root
2 Bacteria
6 Azorhizobium
7 Azorhizobium caulinodans
9 Buchnera aphidicola
>>> 
>>> for result in Taxonomy.select(
...                 AND(Taxonomy.q.id >= 10, Taxonomy.q.id <= 20)):
...   print result.id, result.scientific_name
... 
10 Cellvibrio
11 Cellvibrio gilvus
13 Dictyoglomus
14 Dictyoglomus thermophilum
16 Methylophilus
17 Methylophilus methylotrophus
18 Pelobacter
19 Pelobacter carbinolicus
20 Phenylobacterium
>>> 

>>> for result in Taxonomy.select(
...      AND(Taxonomy.q.genetic_codeID == GeneticCode.q.id,
...          GeneticCode.q.name == "Alternative Yeast Nuclear")):
...     print result.id, result.scientific_name
... 
5476 Candida albicans
5480 Candida parapsilosis
5481 Candida rugosa
5482 Candida tropicalis
5491 Candida melibiosica
5493 Candida zeylanoides
36911 Clavispora lusitaniae
42374 Candida dubliniensis
130814 Candida mycetangii
237561 Candida albicans SC5314
273371 Candida orthopsilosis
284146 Gene disruption vector SAT1-flipper
294747 Candida tropicalis MYA-3404
294748 Candida albicans WO-1
300021 Candida albicans var. stellatoidea
306902 Clavispora lusitaniae ATCC 42720
308923 Candida cellae
308924 Candida riodocensis
356546 Candida sp. AS 2.3072
356547 Candida sp. AS 2.3073
359171 Candida sp. HA 1671
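
If it seems odd that "Taxonomy.q.id < 10" produces a query instead of True or False, the trick is operator overloading. Here's a toy sketch of the idea in Python 3. It is definitely not SQLObject's implementation; the "Column" and "Expr" classes and this "AND" function are invented for the example, and they only build SQL text with "?" placeholders.

```python
class Expr(object):
    """A fragment of SQL text plus the parameter values it needs."""
    def __init__(self, sql, params=()):
        self.sql = sql
        self.params = tuple(params)

def AND(left, right):
    # Combine two expressions, keeping the parameters in order
    return Expr("(%s AND %s)" % (left.sql, right.sql),
                left.params + right.params)

class Column(object):
    """Comparisons on a Column return an Expr rather than a boolean."""
    def __init__(self, name):
        self.name = name

    def _compare(self, op, value):
        return Expr("%s %s ?" % (self.name, op), (value,))

    def __lt__(self, value):
        return self._compare("<", value)

    def __le__(self, value):
        return self._compare("<=", value)

    def __ge__(self, value):
        return self._compare(">=", value)

    def __eq__(self, value):
        return self._compare("=", value)

tax_id = Column("tax_id")
expr = AND(tax_id >= 10, tax_id <= 20)
print(expr.sql, expr.params)
```

The resulting text and parameters could then be handed to a database cursor as the WHERE clause of a SELECT.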

All of the databases support a text search mode, with at least basic support for fields which contain, start with, end with, or exactly match a given piece of text. Here I'll search for taxonomy records for armadillos:

>>> Taxonomy.q.scientific_name.contains("dasypus")
<SQLOp 20c4760>
>>> q=_
>>> list(Taxonomy.select(q))
[<Taxonomy 9360 scientific_name='Dasypus' rank='genus' parentID=9359
 genetic_codeID=1 mitochondrial_genetic_codeID=2>, <Taxonomy 9361
 scientific_name="'Dasypus novemcin...'" rank='species' parentID=9360
 genetic_codeID=1 mitochondrial_genetic_codeID=2>, <Taxonomy 81630
 scientific_name='Dasypus kappleri' rank='species' parentID=9360
 genetic_codeID=1 mitochondrial_genetic_codeID=2>, <Taxonomy 286599
 scientific_name="'Dasypus sp. VJL-...'" rank='species' parentID=9360
 genetic_codeID=1 mitochondrial_genetic_codeID=2>, <Taxonomy 317145
 scientific_name='Delichon dasypus' rank='species' parentID=88115
 genetic_codeID=1 mitochondrial_genetic_codeID=2>]
>>> len(_)
5
>>> import time
>>> if 1:        
...   t1 = time.time()
...   results = list(Taxonomy.select(q))
...   t2 = time.time()
...   print "Elapsed time", t2-t1
... 
Elapsed time 1.18850398064
>>> 
As you see at the end, "contains" searches are slow. I tried indexing the field. That didn't help. I then checked the sqlite documentation, which says indexing does not help "contains" searches; it only helps "startswith" searches. Other database engines may be faster.
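
Here's a sketch of the two kinds of search written as SQL LIKE patterns, using Python 3's sqlite3 module and a few rows in memory. I'm assuming that "contains" and "startswith" boil down to patterns of the form '%text%' and 'text%'; I haven't shown the SQL that SQLObject actually generates. The helper name "names_like" is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxonomy (tax_id INTEGER PRIMARY KEY, scientific_name TEXT)")
conn.executemany(
    "INSERT INTO taxonomy VALUES (?, ?)",
    [(5476, "Candida albicans"),
     (9360, "Dasypus"),
     (81630, "Dasypus kappleri"),
     (317145, "Delichon dasypus")])

def names_like(pattern):
    # "%" in a LIKE pattern matches any run of characters
    rows = conn.execute(
        "SELECT scientific_name FROM taxonomy "
        "WHERE scientific_name LIKE ? ORDER BY tax_id", (pattern,))
    return [name for (name,) in rows]

contains = names_like("%dasypus%")   # text may appear anywhere
startswith = names_like("dasypus%")  # text must be at the start

print(contains)
print(startswith)
```

SQLite's LIKE ignores case for ASCII letters by default, which is why the lowercase "dasypus" matches "Dasypus". An index is sorted, so it can narrow down a known prefix, but it can't help find text that may start anywhere in the field.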

I know the model is pretty good now so I'll check it into subversion:

[~/nbn/taxonomy/TaxonomyServer] % svn commit taxonomyserver/model.py -m "initial taxonomy data model"
Sending        taxonomyserver/model.py
Transmitting file data .
Committed revision 18.
[~/nbn/taxonomy/TaxonomyServer] % svn commit dev.cfg -m "set path to taxonomy database"
Sending        dev.cfg
Transmitting file data .
Committed revision 19.
[~/nbn/taxonomy/TaxonomyServer] % 

Making the web interface

I have data. I want to see it. I'll start by making a query form which looks like:

Scientific name
Genetic Code

To make it I'll create the "search.kid" template with the following content:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:py="http://purl.org/kid/ns#"
    py:extends="'master.kid'">

<head>
    <meta content="text/html; charset=UTF-8" http-equiv="content-type" py:replace="''"/>
    <title>Search for taxonomy records</title>
</head>

<body>
<form method="GET" action="search">
Scientific name
<select name="searchtype">
 <option value="substring">contains</option>
 <option value="startswith">starts with</option>
 <option value="exact">is exactly equal to</option>
</select>
<input name="text" /><input type="submit" value="Search" /><br />
Genetic Code
<select name="genetic_code">
 <option value="any">Any</option>
 <option py:for="code in codes" value="${code.id}">(${code.id}) ${code.name}</option>
</select>

</form>
</body>
</html>
The template needs to know the available genetic codes, which I will get from the database and pass in via the controller. Here are the full initial contents of my "controllers.py" file:
import logging

import cherrypy

import turbogears
from turbogears import controllers, expose, validate, redirect

from taxonomyserver import json

log = logging.getLogger("taxonomyserver.controllers")

import model
from model import GeneticCode

class Root(controllers.RootController):
    @expose(template="taxonomyserver.templates.search")
    def index(self):
        return dict(codes=GeneticCode.select(orderBy=GeneticCode.q.id))
Note the "orderBy". That orders the select command so the record with the smallest id is first. Without it the order of the results isn't guaranteed, which can sometimes cause strange problems.

The search form needs search functionality. Here's an updated "search" method for the controllers.py file:

    @expose(template="taxonomyserver.templates.results")
    def search(self, text, searchtype="substring", genetic_code="any"):
        if searchtype == "startswith":
            query = Taxonomy.q.scientific_name.startswith(text)
        elif searchtype == "exact":
            query = Taxonomy.q.scientific_name == text
        else:
            query = Taxonomy.q.scientific_name.contains(text)

        if genetic_code != "any":
            try:
                genetic_code = int(genetic_code)
            except ValueError:
                pass
            else:
                query = sqlobject.AND(
                    query, Taxonomy.q.genetic_codeID == genetic_code)

        taxons = Taxonomy.select(query)
        return dict(taxons=taxons)
You will likely need an "import sqlobject" somewhere earlier to make this work, and "Taxonomy" needs to be imported from the model alongside "GeneticCode".
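
To show the same decision logic without SQLObject, here's a sketch that builds the equivalent SQL text and parameter tuple by hand (Python 3). The function name "build_search" and the exact SELECT are my own choices for illustration. Note that "%" and "_" in the user's text are not escaped here, so they would act as wildcards.

```python
def build_search(text, searchtype="substring", genetic_code="any"):
    """Return (sql, params) for a taxonomy search. Hypothetical helper."""
    if searchtype == "startswith":
        where, params = "scientific_name LIKE ?", [text + "%"]
    elif searchtype == "exact":
        where, params = "scientific_name = ?", [text]
    else:
        # Anything else falls back to a substring search
        where, params = "scientific_name LIKE ?", ["%" + text + "%"]

    if genetic_code != "any":
        try:
            code_id = int(genetic_code)
        except ValueError:
            pass  # ignore a malformed value, as the controller does
        else:
            where += " AND genetic_code_id = ?"
            params.append(code_id)

    sql = "SELECT tax_id, scientific_name FROM taxonomy WHERE " + where
    return sql, tuple(params)

print(build_search("yeast"))
print(build_search("Candida", "startswith", "12"))
```

Passing the values as parameters instead of pasting them into the SQL string means the database driver takes care of quoting.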

The search results (a select iterator over taxons) are passed to this template, named "results.kid".

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:py="http://purl.org/kid/ns#"
    py:extends="'master.kid'">

<head>
    <meta content="text/html; charset=UTF-8" http-equiv="content-type" py:replace="''"/>
    <title>Search results</title>
</head>

<body>

<P py:if="not taxons">No taxons found.</P>
<P py:if="taxons">
<ul>
 <li py:for="taxon in taxons">${taxon.id} - ${taxon.scientific_name}</li>
</ul>
</P>

<P>
<a href="/">Start another search.</a>
</P>

</body>
</html>

With that in place we're ready for searches. Here was the result for "yeast".

Using the 'default' handler

I want each one to be a hypertext link to more information about the record. Specifically, I want "/taxon/300263" to give more details about taxon 300263, etc. I'll change the template to make a link to that URL. That is, I'll change from

 <li py:for="taxon in taxons">${taxon.id} - ${taxon.scientific_name}</li>
to
 <li py:for="taxon in taxons"><a href="/taxon/${taxon.id}">${taxon.id} - ${taxon.scientific_name}</a></li>

I need a controller for it. This one will be a bit different than the previous ones. I don't want a Python function for every taxon id. I want a single Python function which will accept the URI component after "/taxon". For that I need a Python object which implements the "default" method:

class TaxIdLookup(object):
    @expose(template="taxonomyserver.templates.details")
    def default(self, tax_id):
        taxon = Taxonomy.get(tax_id)
        return {"taxon": taxon}

class Root(controllers.RootController):
    # ... the following goes at the end of the existing class ...
    taxon = TaxIdLookup()
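
If the idea of a catch-all "default" method is new, here's a toy dispatcher in Python 3 that shows roughly what happens to the path "/taxon/300263". This is only an illustration of the concept and not CherryPy's actual lookup code. The "dispatch" function is invented for the example, and it ignores details like @expose, which the real framework uses to decide what may be called.

```python
class TaxIdLookup(object):
    def default(self, tax_id):
        return "details for taxon %s" % tax_id

class Root(object):
    taxon = TaxIdLookup()

    def index(self):
        return "search form"

def dispatch(root, path):
    """Walk attributes named by the path; leftovers go to 'default'."""
    parts = [p for p in path.split("/") if p]
    node = root
    # Follow path components for as long as they name an attribute
    while parts and hasattr(node, parts[0]):
        node = getattr(node, parts.pop(0))
    if callable(node):
        return node(*parts)
    if not parts and hasattr(node, "index"):
        return node.index()
    if hasattr(node, "default"):
        # Unmatched components become arguments to the catch-all handler
        return node.default(*parts)
    raise LookupError(path)

print(dispatch(Root(), "/"))
print(dispatch(Root(), "/taxon/300263"))
```

The point is that "300263" is not looked up as an attribute at all. It arrives as the "tax_id" argument, as a string.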

The new "TaxIdLookup.default" method also needs an @expose. This one points to the "details.kid" file, which I'll show next. Note how I use the "genetic_code" and the "parent" and "children" links to get Python objects from the database.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:py="http://purl.org/kid/ns#"
    py:extends="'master.kid'">

<head>
    <meta content="text/html; charset=UTF-8" http-equiv="content-type" py:replace="''"/>
    <title>${taxon.scientific_name}</title>
</head>

<body>

<h2>${taxon.scientific_name}</h2>
<P>
Taxon identifier: ${taxon.id}<br />
Genetic Code: ${taxon.genetic_code.name}<br />
</P>

<P>
Parent taxon: <a href="/taxon/${taxon.parent.id}">${taxon.parent.scientific_name}</a><br />
Children:
<i py:if="not taxon.children">None</i>
<ul py:if="taxon.children">
 <li py:for="child in taxon.children"><a href="/taxon/${child.id}">${child.scientific_name}</a></li>
</ul>
</P>

</body>
</html>

Here's what the page http://localhost:8080/taxon/5204 looks like, assuming I have a server running.

Basidiomycota

Taxon identifier: 5204
Genetic Code: Standard

Parent taxon: Fungi
Children:

Bug fixes and improvements

It turns out there's a bug in the code. It doesn't handle the root node, which has no parent. To make that work I changed

Parent taxon:
<a href="/taxon/${taxon.parent.id}">${taxon.parent.scientific_name}</a><br />
to
<i py:if="taxon.parent is None">no parent</i>
<a py:if="taxon.parent is not None"
   href="/taxon/${taxon.parent.id}">${taxon.parent.scientific_name}</a><br />

Test that it works, making changes if needed. I'll add the three templates to version control.

svn add taxonomyserver/templates/search.kid
svn add taxonomyserver/templates/results.kid
svn add taxonomyserver/templates/details.kid
svn commit -m "added basic search and browse capabilities"

I worked for a bit to improve the code. Here are the changes I made:

[~/nbn/taxonomy/TaxonomyServer] % svn diff
Index: taxonomyserver/templates/results.kid
===================================================================
--- taxonomyserver/templates/results.kid        (revision 20)
+++ taxonomyserver/templates/results.kid        (working copy)
@@ -11,6 +11,7 @@
 
 <P py:if="not taxons">No taxons found.</P>
 <P py:if="taxons">
+${taxons.count()} taxons found
 <ul>
  <li py:for="taxon in taxons"><a href="/taxon/${taxon.id}">${taxon.id} - ${taxon.scientific_name}</a></li>
 </ul>
Index: taxonomyserver/templates/details.kid
===================================================================
--- taxonomyserver/templates/details.kid        (revision 20)
+++ taxonomyserver/templates/details.kid        (working copy)
@@ -8,7 +8,7 @@
 </head>
 
 <body>
-
+<a href="/">Start a new search</a>
 <h2>${taxon.scientific_name}</h2>
 <P>
 Taxon identifier: ${taxon.id}<br />


[~/nbn/taxonomy/TaxonomyServer] % svn commit taxonomyserver/templates/results.kid -m "added count of how many records match"
Sending        taxonomyserver/templates/results.kid
Transmitting file data .
Committed revision 21.
[~/nbn/taxonomy/TaxonomyServer] % svn commit taxonomyserver/templates/details.kid -m "added link to do a new search"
Sending        taxonomyserver/templates/details.kid
Transmitting file data .
Committed revision 22.

[~/nbn/taxonomy/TaxonomyServer] % svn diff 
Index: taxonomyserver/controllers.py
===================================================================
--- taxonomyserver/controllers.py       (revision 20)
+++ taxonomyserver/controllers.py       (working copy)
@@ -42,7 +42,7 @@
                 query = sqlobject.AND(
                     query, Taxonomy.q.genetic_codeID == genetic_codeID)
 
-        taxons = Taxonomy.select(query)
+        taxons = list(Taxonomy.select(query))
         return dict(taxons=taxons)
 
     taxon = TaxIdLookup()
Index: taxonomyserver/templates/results.kid
===================================================================
--- taxonomyserver/templates/results.kid        (revision 21)
+++ taxonomyserver/templates/results.kid        (working copy)
@@ -11,7 +11,7 @@
 
 <P py:if="not taxons">No taxons found.</P>
 <P py:if="taxons">
-${taxons.count()} taxons found
+${len(taxons)} taxons found
 <ul>
  <li py:for="taxon in taxons"><a href="/taxon/${taxon.id}">${taxon.id} - ${taxon.scientific_name}</a></li>
 </ul>


[~/nbn/taxonomy/TaxonomyServer] % svn diff
Index: taxonomyserver/controllers.py
===================================================================
--- taxonomyserver/controllers.py       (revision 23)
+++ taxonomyserver/controllers.py       (working copy)
@@ -40,7 +40,7 @@
                 pass
             else:
                 query = sqlobject.AND(
-                    query, Taxonomy.q.genetic_codeID == genetic_codeID)
+                    query, Taxonomy.q.genetic_codeID == genetic_code)
 
         taxons = list(Taxonomy.select(query))
         return dict(taxons=taxons)
[~/nbn/taxonomy/TaxonomyServer] % svn commit -m "fixed typo"
Sending        TaxonomyServer/taxonomyserver/controllers.py
Transmitting file data .
Committed revision 24.
[~/nbn/taxonomy/TaxonomyServer] % 
As you can see, there were several small changes but nothing major or complex.



Copyright © 2001-2013 Andrew Dalke Scientific AB