Dalke Scientific Software: More science. Less time. Products

Loading a database

Here are a few ways to load a database. I started with a subset of taxonomy names from NCBI saved into a file named "small_taxonomy.dat".

1 root
2 Bacteria
6 Azorhizobium
7 Azorhizobium caulinodans
9 Buchnera aphidicola
10 Cellvibrio
11 Cellvibrio gilvus
13 Dictyoglomus
14 Dictyoglomus thermophilum
16 Methylophilus
17 Methylophilus methylotrophus
18 Pelobacter
19 Pelobacter carbinolicus
20 Phenylobacterium
21 Phenylobacterium immobile
22 Shewanella
23 Shewanella colwelliana
24 Shewanella putrefaciens
25 Shewanella hanedai
27 halophilic eubacterium NRCC 41227
The format is the taxonomy id followed by a space followed by the name. In this case it's the scientific name but I'll use the common name field to store it.

There are several ways to load the data. You can do it through the Python shell, like this:

>>> import model
>>> infile = open("small_taxonomy.dat")
>>> for line in infile:
...   id, name = line[:-1].split(None, 1)
...   model.Taxonomy(id=int(id), common_name = name)  
...   
<Taxonomy 1 common_name='root'>
<Taxonomy 2 common_name='Bacteria'>
<Taxonomy 6 common_name='Azorhizobium'>
<Taxonomy 7 common_name="'Azorhizobium cau...'">
<Taxonomy 9 common_name="'Buchnera aphidic...'">
<Taxonomy 10 common_name='Cellvibrio'>
<Taxonomy 11 common_name="'Cellvibrio gilvu...'">
<Taxonomy 13 common_name='Dictyoglomus'>
<Taxonomy 14 common_name="'Dictyoglomus the...'">
<Taxonomy 16 common_name='Methylophilus'>
<Taxonomy 17 common_name="'Methylophilus me...'">
<Taxonomy 18 common_name='Pelobacter'>
<Taxonomy 19 common_name="'Pelobacter carbi...'">
<Taxonomy 20 common_name='Phenylobacterium'>
<Taxonomy 21 common_name="'Phenylobacterium...'">
<Taxonomy 22 common_name='Shewanella'>
<Taxonomy 23 common_name="'Shewanella colwe...'">
<Taxonomy 24 common_name="'Shewanella putre...'">
<Taxonomy 25 common_name="'Shewanella haned...'">
<Taxonomy 27 common_name="'halophilic eubac...'">
>>> model.hub.hub.commit()
>>> 

To see that it worked, quit the Python shell and start sqlite3.

[~/tmp/taxonomy] % sqlite3 ./devdata.sqlite
SQLite version 3.2.7
Enter ".help" for instructions
sqlite> select * from Taxonomy where (14 < taxid and taxid < 20);
16|Methylophilus
17|Methylophilus methylotrophus
18|Pelobacter
19|Pelobacter carbinolicus
sqlite> 

I want to show another way to load the database so remove all of the Taxonomy records from the database:

sqlite> delete from Taxonomy;
sqlite> select * from Taxonomy;
sqlite> 
and quite sqlite3.

Another way to load the database is through sqlite3, skipping SQLObject entirely.

from pysqlite2 import dbapi2 as sqlite

conn = sqlite.connect("devdata.sqlite")
cursor = conn.cursor()

infile = open("small_taxonomy.dat")
for line in infile:
    taxid, name = line[:-1].split(None, 1)
    cursor.execute("INSERT INTO Taxonomy VALUES (?, ?)",
                   (taxid, name))
conn.commit()

cursor.execute("SELECT common_name FROM Taxonomy where "
               "  14 < taxid and taxid < 20")
for result in cursor.fetchall():
    print result
and a final way is to generate SQL statements for use by sqlite3. However, I find the pysqlite interface the easiest.



Copyright © 2001-2013 Andrew Dalke Scientific AB