Ruminations about DAS2

/home/writings/diary/archive/2003/09/20/DAS2

I visited Gregg Helt at Affymetrix last week. He's in their Emeryville office, which is only a few blocks from where he was when I last met him at Neomorphic. Must have been late August 1999 since it was just after the OiB conference in San Jose. I think that was also the last time I drove out to the Bay Area instead of flying.

We chatted about a few things. The big one was the DAS2 proposal, which got a 125 in the review. I'm told that's good. This is the first time I've been on a grant since grad school, and I wasn't that involved then. I enjoyed the conversation and look forward to working with him in Lincoln in a few months if this thing really does go through. Should hear about that soon.

Gregg mentioned playing around a bit with the content negotiation idea I proposed during some of the DAS2 RFC discussions. His point was that he controls both server and client, so they can collude to return data in a more appropriate form, e.g., in a form which is easier for the client to process or which reduces the amount of data sent over the wire. He's pretty happy with the result.

I'm glad to hear that. I haven't had time to experiment with it myself, so the benefits of conneg were mostly theoretical. I don't think I had considered Gregg's use case -- I thought about switching between existing formats, and not his idea of letting a client/server combination use a new format.
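
Here's a minimal sketch of how a server might pick a response format from the client's HTTP Accept header. It is not from any DAS2 code. The function names and the "application/x-das-compact" media type are made up for illustration, and the matching is simplified (no partial wildcards like "text/*"). Falling back to a default format instead of refusing the request is my assumption about how a "guaranteed minimal base" would behave.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so ties keep the order the client listed them in
    return sorted(prefs, key=lambda item: item[1], reverse=True)


def choose_format(accept_header, supported, default="text/xml"):
    """Return the best media type the server supports, else the default."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return default
    return default


# Hypothetical: a server that speaks baseline XML plus one private format.
SUPPORTED = {"text/xml", "application/x-das-compact"}
```

A client which knows about the private format asks for it first, as in choose_format("application/x-das-compact, text/xml;q=0.5", SUPPORTED). A client which doesn't, say one sending "text/html", still gets "text/xml".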

I think that idea will help encourage tool providers to support DAS on the server. DAS ends up providing two things: a way to get, publish, and ask questions about annotations, and a guaranteed minimal base of how that data is presented, likely in XML. If a company has better ways of exchanging the data they can still support it on top of the existing DAS API. If we do it right, they should even be able to include new types of queries. It also makes DAS more future-proof.

Gregg also suggested that the client and server could collude in making the searches. The DAS/1 API returns all annotation data in a query range, so if you change the range just slightly you end up getting all the overlapping data again. I want to change the result to just return URLs for annotations which overlap the range. This calls for extra trips to the server to get the actual data, but the client can cache previous results, and persistent connections in HTTP/1.1 help quite a bit with the performance of making multiple requests.
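
To make the URL-list idea concrete, here's a rough sketch with an in-memory dictionary standing in for the server. Everything in it (the URLs, the coordinates, the class and function names) is hypothetical and only meant to show why a client-side cache means a slightly shifted range doesn't refetch everything.

```python
# Hypothetical stand-in for a DAS2 server: URL -> (start, end, body)
ANNOTATIONS = {
    "/annotation/1": (100, 200, "gene A"),
    "/annotation/2": (150, 400, "gene B"),
    "/annotation/3": (500, 650, "gene C"),
}


def search(start, end):
    """Return the URLs of annotations overlapping the closed range [start, end]."""
    return sorted(url for url, (s, e, _) in ANNOTATIONS.items()
                  if s <= end and e >= start)


class CachingClient:
    def __init__(self):
        self.cache = {}    # URL -> annotation body
        self.fetches = 0   # how many annotation bodies were actually requested

    def fetch(self, url):
        """Get one annotation, going to the 'server' only on a cache miss."""
        if url not in self.cache:
            self.fetches += 1
            self.cache[url] = ANNOTATIONS[url][2]
        return self.cache[url]

    def annotations_in(self, start, end):
        """Search for URLs in the range, then resolve each one via the cache."""
        return [self.fetch(url) for url in search(start, end)]
```

If the client asks for 100-450 and then for 120-520, the second query returns three URLs but only the one new annotation is fetched. The other two come from the cache.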

Gregg correctly points out the client can remember which ranges it already asked about. When it asks the server for data in a new range, it could include that range information. The server then omits the data it now knows the client already knows about.
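
The bookkeeping for that comes down to interval subtraction. Here's a small sketch, assuming half-open (start, end) ranges. The function name is mine, and this is only one way the client or the server might work out which part of a new query hasn't been covered yet.

```python
def uncovered(query, seen):
    """Return the parts of the half-open range `query` not covered by `seen`.

    `seen` is a list of half-open (start, end) ranges, in any order;
    they are allowed to overlap each other.
    """
    start, end = query
    gaps = []
    for s, e in sorted(seen):
        if e <= start:
            continue          # entirely before the still-uncovered part
        if s >= end:
            break             # entirely after the query
        if s > start:
            gaps.append((start, s))   # a hole before this seen range
        start = max(start, e)
        if start >= end:
            break
    if start < end:
        gaps.append((start, end))     # whatever is left at the tail
    return gaps
```

For example, after asking about (1000, 2000), a new query for (1100, 2100) leaves only (2000, 2100) uncovered. The server would still need to leave out any annotation in that gap which also overlaps a range the client already asked about, since the client has that one too.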

I think there are still advantages to having searches return a list of URLs instead of the raw data. My approach is easier for naive clients, since they don't need to track all the ranges. (But we could provide libraries or example code to help out.) My approach is also more cache friendly, which may help if there are many clients using the server -- stick a Squid cache in front of the web server and let it handle repeated requests for the same annotation instead of going through to the DAS layer, which may require starting a CGI script and making some SQL calls to the back-end database.

I also talked with Gregg about using some of the discussions related to RSS/Atom/Pie/Echo/whatever to help guide us when fleshing out the DAS2 spec. Here's part of a followup email I sent to him on the topic.

which is a proposed successor to RSS.  Quoting from
http://bitworking.org/rfc/draft-gregorio-07.html
> AtomAPI is an application level protocol for publishing, and
> editing web resources.  AtomAPI unifies many disparate
> publishing mechanisms into a single, simple, extensible protocol.
> The protocol at its core is the HTTP transport of an XML payload.

It has several other names, including Echo and Pie, or maybe
Atom is a specification of the Pie API.  It's confusing.
http://webservices.xml.com/pub/a/ws/2003/08/05/salz.html
even says that it might not be Atom, because of trademark concerns.

I think Atom and the concepts discussed on the wiki at
 http://www.intertwingly.net/wiki/pie/RestEchoApiDiscuss
have some bearing on the discussions we had for DAS2.  Many
of the comments looked familiar :)

Salz' article also gives some interesting critique of Atom vs.
WebDAV and REST vs. SOAP.  He's one of the Python SOAP developers
so is somewhat biased in that last comparison.

On a related note, Mark Pilgrim, one of the Atom authors,
rewrote MS's SOAP interface into a REST approach, at
 http://diveintomark.org/archives/2003/09/08/msweb-rest
and has some comments comparing the two approaches.

Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me