Table of contents

Slide 1

Bioinformatics

Biopython

Many Formats

Human Genome Project

What does a format look like?

Need to parse a format

lex/yacc are too complicated

Roll one by hand

What's needed? / Use cases

More requirements

Icarus

General form of most formats

Form of a parser

Arrangement of blocks

This is a regular format

Most bioinformatics formats are regular!

Slide 18

Parsing with regular expressions

Can't get all of the data

The regexp makes a parse tree

XML

SAX traversal of the parse tree

Martel - a new regexp engine

Martel (continued)

Why Plex is needed

With Plex

Example use - displaying as XML

Adding HTML markup

Marking up semi-structured formats

Using XML tools

Equivalent Bioperl code

Large Files

RecordReaders

Format definition using RecordReaders

Named Group Repeats

Other features

Timings

Validation

Version Detection? "Or" each format

Version Detection Overhead

XSLT

Iterators

make_iterator

Mindy

Creating and using a Mindy Database

Bugs

Future Work

Naming

Why "Martel"?

Author: Andrew Dalke

E-mail: dalke@dalkescientific.com