Dalke Scientific Software: More science. Less time.
Products

Martel

Martel lets you work with existing flat-file bioinformatics formats as if they were already in XML. It is the core parsing framework for our products and for Biopython because it simplifies and standardizes how all the data in a file is accessed.

Some of the tasks that can be done with Martel are:

  • Extract the identifier name and sequence from a record
  • Convert a record to HTML (including generating cross-reference hyperlinks)
  • Identify the file format
  • Validate that a record is in the correct format
  • Index a database file for fast record lookup (see our Mindy product)
  • Load a database file into a relational or XML database
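Two of these tasks, identifying the format and validating a record, follow directly from having a precise description of a format: a record is valid if the whole record matches the definition, and its format is whichever definition accepts it. The sketch below illustrates that idea with Python's standard `re` module, not with Martel's own API. The two format definitions and the `identify_format` function are hypothetical and far simpler than real bioinformatics formats.

```python
import re

# Hypothetical, heavily simplified single-record format definitions.
# Real formats such as SWISS-PROT or GenBank need much larger definitions.
FORMATS = {
    "swissprot-like": re.compile(r"ID   \w+\nSQ   [A-Z]+\n//\n"),
    "fasta-like": re.compile(r">\S+[^\n]*\n[A-Za-z*\-\n]+"),
}


def identify_format(record):
    """Return the names of all format definitions that accept `record`.

    An empty list means the record is not valid in any known format;
    a single name both identifies the format and validates the record.
    """
    # fullmatch: the definition must account for every character of the record
    return [name for name, pattern in FORMATS.items() if pattern.fullmatch(record)]


if __name__ == "__main__":
    print(identify_format(">seq1 test\nMKVL\n"))  # ['fasta-like']
```

Requiring a full match, rather than a search, is what turns format identification into validation: stray or truncated lines cause the record to be rejected instead of silently ignored.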

Technical Details

Martel uses a modified form of the Perl regular expression language to describe the format of a file. The definition is used to generate a parser for that format. An input file is converted into a parse tree, which is traversed in prefix order to generate SAX 2.0 events, as used in XML processing. Element names and attributes are specified in the regular expression grammar using the named group extension popularized by Python.
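To illustrate how named groups can be turned into SAX events, here is a minimal sketch using only Python's standard library. It is not Martel's implementation or API: `RECORD`, `emit_events`, `_walk` and the two-line record layout are hypothetical. The sketch matches a record, arranges the matched named groups into a tree by their character spans, and walks that tree in prefix order, calling `startElement`, `characters` and `endElement` on a handler. The handler used here is the standard `xml.sax.saxutils.XMLGenerator`, so the result is the record rewritten as XML.

```python
import io
import re
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical two-line record layout. Each (?P<name>...) group
# becomes an XML element of the same name.
RECORD = re.compile(r"(?P<record>ID   (?P<id>\w+)\nSQ   (?P<sequence>[A-Z]+)\n)")


def _walk(text, spans, lo, hi, handler):
    """Emit events for text[lo:hi], given the sorted spans nested inside it."""
    pos, i = lo, 0
    while i < len(spans):
        start, end, _, name = spans[i]
        if pos < start:
            handler.characters(text[pos:start])  # text between elements
        # Spans that start before this one ends are its descendants.
        j = i + 1
        while j < len(spans) and spans[j][0] < end:
            j += 1
        # Prefix order: start tag, then children, then end tag.
        handler.startElement(name, AttributesImpl({}))
        _walk(text, spans[i + 1:j], start, end, handler)
        handler.endElement(name)
        pos, i = end, j
    if pos < hi:
        handler.characters(text[pos:hi])


def emit_events(pattern, text, handler):
    """Match `text` against `pattern` and send SAX events to `handler`."""
    m = pattern.fullmatch(text)
    if m is None:
        raise ValueError("record does not match the format definition")
    spans = []
    for name, num in pattern.groupindex.items():
        start, end = m.span(num)
        if start != -1:  # skip groups that did not take part in the match
            spans.append((start, end, num, name))
    # Outer groups first: earlier start, then longer span, then lower group number.
    spans.sort(key=lambda s: (s[0], -s[1], s[2]))
    handler.startDocument()
    _walk(text, spans, 0, len(text), handler)
    handler.endDocument()


if __name__ == "__main__":
    out = io.StringIO()
    emit_events(RECORD, "ID   ABC1\nSQ   MKVL\n", XMLGenerator(out, encoding="utf-8"))
    print(out.getvalue())
```

Running it should print the original record text wrapped in `<record>`, `<id>` and `<sequence>` elements. One limitation of this sketch: Python's `re` module keeps only the last match of a repeated group, so it supports only groups that match at most once; a parser for real formats, where lines such as features or cross-references repeat, has to keep every repetition.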

The events can be passed to any SAX handler. Common handlers build a DOM tree or other data structure, load an XML database, identify specific data fields (accession number, sequence, cross-reference), find record start and end positions, or drive an XSL transformation.
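As an example of a handler that identifies specific data fields, the following minimal `ContentHandler` collects the text inside chosen elements. The class `FieldExtractor` and the element names `id` and `sequence` are hypothetical. Because the handler only sees standard SAX events, it is fed here from Python's own `xml.sax.parseString`, but any event source would work the same way.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class FieldExtractor(ContentHandler):
    """Collect the text of selected elements, e.g. an identifier and a sequence."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.fields = {}      # element name -> list of text values found
        self._current = None  # wanted element we are currently inside, if any
        self._buffer = []

    def startElement(self, name, attrs):
        if name in self.wanted:
            self._current = name
            self._buffer = []

    def characters(self, content):
        # SAX may split text across several characters() calls, so buffer it.
        if self._current is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self._current:
            self.fields.setdefault(name, []).append("".join(self._buffer))
            self._current = None


if __name__ == "__main__":
    handler = FieldExtractor(["id", "sequence"])
    xml.sax.parseString(
        b"<record>ID   <id>ABC1</id>\nSQ   <sequence>MKVL</sequence>\n</record>",
        handler,
    )
    print(handler.fields)  # {'id': ['ABC1'], 'sequence': ['MKVL']}
```

Buffering in `characters` matters: a SAX parser is free to deliver one text node in several pieces, so a handler that assumes a single call can silently truncate a long sequence.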

More details can be found in the paper presented at the 9th International Python Conference and in the slides of a talk presented at EBI. (Portions of that talk were used for the ISMB 2001 poster.) The Martel source code is available as part of the Biopython distribution.



Copyright © 2001-2013 Andrew Dalke Scientific AB. All rights reserved.