Dalke Scientific Software: More science. Less time.
/home/writings/diary/archive/2007/07/26/conneg

Content negotiation

Catching up on the LSID wars I saw a post by Eric Jain at the Swiss Institute of Bioinformatics. He wrote:

http://purl.uniprot.org/uniprot/P12345 is linked to a machine-readable representation [http://beta.uniprot.org/uniprot/P12345.rdf] via two mechanisms: 1. there is a link-rel=alternate in the header of the web page, and 2. you can set an Accept header if you want to skip directly to that.
In other words, the server implements content-negotiation (also called "conneg" for short).
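The first mechanism Eric mentions, the link-rel=alternate element, can be followed by a client without any Accept header at all. Here is a minimal sketch using Python's standard html.parser module. The sample page is hypothetical (the real UniProt markup may well differ), and the names AlternateLinkFinder and find_alternates are made up for this illustration.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel holds a space-separated list of link types; compare in lowercase.
        rels = (attrs.get("rel") or "").lower().split()
        if "alternate" in rels and attrs.get("href"):
            self.alternates.append((attrs.get("type"), attrs["href"]))


def find_alternates(html_text):
    """Return every alternate representation advertised in the page."""
    finder = AlternateLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.alternates


# Hypothetical page head, written for this example only.
SAMPLE_PAGE = """
<html><head>
<title>P12345</title>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rdf+xml"
      href="http://beta.uniprot.org/uniprot/P12345.rdf">
</head><body></body></html>
"""

for media_type, href in find_alternates(SAMPLE_PAGE):
    print(media_type, href)
```

The stylesheet link is skipped because its rel value does not include "alternate", so only the RDF link is reported.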

This is something I advocated in DAS/2. The summary of my experience with conneg is in the DAS/2 mail archive.

I don't closely track what's going on in bioinformatics these days, so Eric's comment is the first real public example I've seen of conneg in use in the life sciences. Affymetrix uses it for some internal work, and while the DAS/2 spec talks about it I don't know of anyone publicly supporting it.

I'm curious to know about people's experiences with conneg. It hasn't had wide uptake in the general software world, so I haven't learned much about the practicalities of using it in real systems. But I don't have a comment system so, hmm, well, email me about it or send me a link to a page describing your experiences.

Getting a specific representation

One of the longer term problems in conneg is that it only dispatches on the requested format, and not the meaning. If you request an "image/png" for a chemical compound do you get the 2D or 3D depiction of that compound? But those are theoretical problems that are best solved only after running into problems in practice, I think.

A more immediate problem I had with conneg was trying to link to a specific format for a resource. For example, if you want to link specifically to the RDF version from HTML you have to do something like

Take a look at the <a href="wherever" type="application/rdf+xml">RDF</a>
and in email you want to say
Why are the oxygens colored yellow in the PNG version at http://example.com/image ?
(I haven't tried that to see if the 'type' attribute actually works like this in modern browsers. I just don't have real world experience in using conneg.)

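Eric solved that by using several redirections, with a different URL for each final representation. That is, the negotiated URL redirects to http://beta.uniprot.org/uniprot/P12345 for the HTML and to http://beta.uniprot.org/uniprot/P12345.rdf for the RDF.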

I'm beginning to think that multiple final URLs (with an intermediate redirect) is the right solution for this. Though I would like some way for user agents to get from each final representation to the other, or to the "main" URL. For example, I can't "Accept: application/rdf+xml" on the HTML page and be redirected to the other. Shouldn't there be an HTTP header for that?
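One candidate for such a header is the HTTP Link header, which can carry the same rel="alternate" relation as the HTML link element. The sketch below is only an illustration of the idea and not something UniProt is known to send. The suffix table mirrors the two final URLs seen in this post, and the names SUFFIX_BY_TYPE, representation_url and alternate_link_header are hypothetical.

```python
# Suffix convention taken from the UniProt URLs shown in this post:
# no suffix for HTML, ".rdf" for RDF. The table itself is an assumption.
SUFFIX_BY_TYPE = {
    "text/html": "",
    "application/rdf+xml": ".rdf",
}


def representation_url(base_url, media_type):
    """Final URL for one representation of the resource at base_url."""
    return base_url + SUFFIX_BY_TYPE[media_type]


def alternate_link_header(base_url, current_type):
    """Build a Link header value pointing at every other representation."""
    parts = []
    for media_type in SUFFIX_BY_TYPE:
        if media_type == current_type:
            continue  # do not advertise the representation being served
        url = representation_url(base_url, media_type)
        parts.append('<%s>; rel="alternate"; type="%s"' % (url, media_type))
    return ", ".join(parts)


base = "http://beta.uniprot.org/uniprot/P12345"
print("Link:", alternate_link_header(base, "text/html"))
print("Link:", alternate_link_header(base, "application/rdf+xml"))
```

With something like this on each final representation, a user agent holding the HTML page could find the RDF URL (and the reverse) without going back through the redirect.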

The conneg spec has a section describing "alternatives", which looks like

     Alternates: {"paper.html.en" 0.9 {type text/html} {language en}},
                 {"paper.html.fr" 0.7 {type text/html} {language fr}},
                 {"paper.ps.en"   1.0 {type application/postscript}
                     {language en}}
It's meant so the server can inform the client (the "user agent") that alternate forms are available, and let the client decide which is the best form. It might be nice for the UniProt server to support this as well, but that's definitely something to hold off doing until there are clients that might actually use that data.
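To give a feel for what a client would have to do with that header, here is a rough sketch that parses the example above with regular expressions. It only handles the simple shape shown in the example (a quoted URI, a quality number, and flat {name value} attributes) and is not a full parser for the spec's grammar. The function name parse_alternates is made up for this illustration.

```python
import re

# One variant: {"uri" quality {name value} {name value} ...}
VARIANT_RE = re.compile(r'\{\s*"([^"]*)"\s+([0-9.]+)((?:\s*\{[^{}]*\})*)\s*\}')
# One attribute inside a variant: {name value}
ATTRIBUTE_RE = re.compile(r'\{\s*(\w+)\s+([^{}]*?)\s*\}')


def parse_alternates(value):
    """Parse a simple Alternates header value into a list of dicts."""
    variants = []
    for uri, quality, attr_text in VARIANT_RE.findall(value):
        variants.append({
            "uri": uri,
            "quality": float(quality),
            "attributes": dict(ATTRIBUTE_RE.findall(attr_text)),
        })
    return variants


# The header value from the example above, joined onto one line.
HEADER = (
    '{"paper.html.en" 0.9 {type text/html} {language en}}, '
    '{"paper.html.fr" 0.7 {type text/html} {language fr}}, '
    '{"paper.ps.en" 1.0 {type application/postscript} {language en}}'
)

variants = parse_alternates(HEADER)
# Picking by the server's number alone; a real client would also
# weigh its own format and language preferences.
best = max(variants, key=lambda variant: variant["quality"])
print(best["uri"], best["attributes"])
```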

This is where the "chicken and egg" meets YAGNI. And I'm on the YAGNI side of the balance. Except abstractly.

Quality

For me, the hardest thing in conneg was supporting the quality factor in the request. A quality of "q=1.0" is best and "q=0.0" means "do not want." If two content types are requested then the one with the highest quality (after applying the scoring algorithm) wins. As long as the best is not 0.
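A simplified sketch of that selection is below. It assumes the server weights all of its own formats equally, ignores media type parameters other than q, and breaks ties in favor of whichever format the server lists first. The function names are hypothetical, and this is not the full scoring algorithm from the spec.

```python
def parse_accept(header):
    """Return a list of (media_range, q) pairs from an Accept header value."""
    entries = []
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # a missing q parameter means q=1
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # assumption: treat an unparsable q as "do not want"
        entries.append((media_range, q))
    return entries


def quality_for(media_type, entries):
    """Quality the client gives media_type; the most specific range wins."""
    main_type = media_type.partition("/")[0]
    best_rank, best_q = -1, 0.0
    for media_range, q in entries:
        if media_range == media_type:
            rank = 2  # exact match
        elif media_range == main_type + "/*":
            rank = 1  # subtype wildcard
        elif media_range == "*/*":
            rank = 0  # full wildcard
        else:
            continue
        if rank > best_rank:
            best_rank, best_q = rank, q
    return best_q


def best_match(available, header):
    """Pick the available type with the highest q, or None if every q is 0."""
    entries = parse_accept(header)
    best_type, best_q = None, 0.0
    for media_type in available:
        q = quality_for(media_type, entries)
        if q > best_q:  # strict comparison: earlier server entries win ties
            best_type, best_q = media_type, q
    return best_type


AVAILABLE = ["text/html", "application/rdf+xml"]
print(best_match(AVAILABLE, "text/html;q=0.1, application/rdf+xml;q=0.2"))
print(best_match(AVAILABLE, "application/rdf+xml; q=0"))
print(best_match(AVAILABLE, "*/*"))
```

The None result for the second request is where a server would have to decide between a "406 Not Acceptable" response and falling back to some default.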

It seems that UniProt ignores the "q" field. I think that's reasonable for now. It's annoying to get right and currently no one uses it.

% curl -H "Accept: application/rdf+xml; q=0" -D - 'http://beta.uniprot.org/?query=purl:uniprot/P12345'
HTTP/1.1 302 Moved Temporarily
Date: Thu, 26 Jul 2007 13:51:18 GMT
Server: Apache-Coyote/1.1
Location: http://beta.uniprot.org/uniprot/P12345.rdf
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 0
The "q=0" means "do not send me RDF" and you can see I'm being pointed to the .rdf file.

Here's a request where the RDF should be returned instead of the HTML:

% curl -H "Accept: text/html;q=0.1, application/rdf+xml;q=0.2" -D - 'http://beta.uniprot.org/?query=purl:uniprot/P12345'
HTTP/1.1 302 Moved Temporarily
Date: Thu, 26 Jul 2007 14:00:51 GMT
Server: Apache-Coyote/1.1
Location: http://beta.uniprot.org/uniprot/P12345
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 0
Most likely the internal code is using the first format given rather than the best format given. So remember boys and girls, always send your preferred format first, even if the spec says there shouldn't be a difference.

Hmm, and it does look like they do a substring test on the Accept string. This should not (probably, debatably) return RDF.

% curl -H "Accept: application/rdf+xml2" -D - 'http://beta.uniprot.org/?query=purl:uniprot/P12345'
HTTP/1.1 302 Moved Temporarily
Date: Thu, 26 Jul 2007 13:50:36 GMT
Server: Apache-Coyote/1.1
Location: http://beta.uniprot.org/uniprot/P12345.rdf
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 0
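
The difference between the two kinds of test is easy to see in a few lines. This is only a guess at the behavior observed above, not UniProt's actual code, and both function names are made up. The exact-name version deliberately ignores q values and wildcards to keep the comparison small.

```python
def accepts_by_substring(accept_header, media_type):
    """Guess at the observed behavior: a plain substring test."""
    return media_type in accept_header


def accepts_by_exact_name(accept_header, media_type):
    """Compare against each listed media range after dropping parameters."""
    for item in accept_header.split(","):
        media_range = item.split(";")[0].strip().lower()
        if media_range == media_type:
            return True
    return False


RDF = "application/rdf+xml"
for header in ("application/rdf+xml2", "text/html, application/rdf+xml;q=0.5"):
    print(header, accepts_by_substring(header, RDF), accepts_by_exact_name(header, RDF))
```

Only the substring test treats "application/rdf+xml2" as a request for RDF.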

All of these problems I just listed? They should be low on the list of things to worry about when doing conneg. It's a bunch of details that aren't needed for the first pass at getting conneg working. They are only needed once there are multiple format variants, and clients which expect to get the different format types, and which can handle multiple formats.

Accept: */*

Eric also wrote:

The main reason why I don't default to the machine-readable representation (no doubt that would be useful for people writing semweb applications) is that the large majority of resources does not have a machine readable representation, and web pages happen to be the greatest common denominator.
Perfectly cromulent justification. Especially since semantic web apps can be expected to send the correct content type, while browsers (Safari; grrr!) do things like send "Accept: */*".


Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me



Copyright © 2001-2013 Andrew Dalke Scientific AB