Content negotiation
Catching up on the LSID wars, I saw a post by Eric Jain at the Swiss Institute of Bioinformatics. He wrote:
http://purl.uniprot.org/uniprot/P12345 is linked to a machine-readable representation [http://beta.uniprot.org/uniprot/P12345.rdf] via two mechanisms: 1. there is a link-rel=alternate in the header of the web page, and 2. you can set an Accept header if you want to skip directly to that.
In other words, the server implements content-negotiation (also called "conneg" for short).
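As a minimal sketch of the second mechanism, here is how a client might ask for the RDF form by setting the Accept header. It uses Python's standard urllib.request and only builds the request; it does not contact the server, since I can't promise the 2007-era URLs still behave as described. The helper name build_rdf_request is my own, not anything from UniProt.

```python
import urllib.request


def build_rdf_request(url):
    """Build a GET request that asks for RDF/XML through content negotiation."""
    # The Accept header names the representation the client would like.
    return urllib.request.Request(url, headers={"Accept": "application/rdf+xml"})


req = build_rdf_request("http://purl.uniprot.org/uniprot/P12345")
print(req.full_url)
print(req.get_header("Accept"))
# Passing req to urllib.request.urlopen() would send it and follow any redirects.
```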
This is something I advocated in DAS/2. The summary of my experience with conneg is in the DAS/2 mail archive.
I don't closely track what's going on in bioinformatics these days so Eric's comment is the first real public example I've seen of conneg in use in the life sciences. Affymetrix uses it for some internal work, and while the DAS/2 spec talks about it I don't know of anyone publicly supporting it.
I'm curious to know about people's experiences with conneg. It hasn't had wide uptake in the general software world, so I haven't learned much about the practicalities of using it in real systems. But I don't have a comment system so, hmm, well, email me about it or send me a link to a page describing your experiences.
Getting a specific representation
One of the longer term problems in conneg is that it only dispatches on the requested format, and not the meaning. If you request an "image/png" for a chemical compound, do you get the 2D or 3D depiction of that compound? But those are theoretical problems that are best solved only after running into problems in practice, I think.
A more immediate problem I had with conneg was trying to link to a specific format for a resource. For example, if you want to link specifically to the RDF version from HTML you have to do something like
Take a look at the <a href="wherever" type="application/rdf+xml">RDF</a>
and in email you want to say
Why are the oxygens colored yellow in the PNG version at http://example.com/image ?
(I haven't tried that to see if the 'type' attribute actually works like this in modern browsers. I just don't have real world experience in using conneg.)
Eric solved that by using several redirections, with a different URL for each final representation. That is:
- The first request to 'http://purl.uniprot.org/uniprot/P12345' does a 303 "See Other" redirect to ...
- 'http://beta.uniprot.org/?query=purl:uniprot/P12345', which
  understands the "Accept" header and does a 302 "Found" redirect
  to either of:
  - 'http://beta.uniprot.org/uniprot/P12345' for HTML, or
  - 'http://beta.uniprot.org/uniprot/P12345.rdf' for RDF
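The redirect chain above can be sketched as a small pure function. This is a toy model of the behavior I observed, not UniProt's code: the function name redirect_chain is made up, and the Accept test is a deliberately simplistic exact comparison.

```python
PURL_PREFIX = "http://purl.uniprot.org/uniprot/"
BETA = "http://beta.uniprot.org"


def redirect_chain(purl, accept):
    """Return the (status, location) hops for a purl.uniprot.org request."""
    if not purl.startswith(PURL_PREFIX):
        raise ValueError("not a UniProt PURL: %r" % (purl,))
    accession = purl[len(PURL_PREFIX):]

    # Hop 1: the PURL does a 303 "See Other" to the query URL.
    hops = [(303, "%s/?query=purl:uniprot/%s" % (BETA, accession))]

    # Hop 2: the query URL looks at Accept and does a 302 to a
    # format-specific URL. (Assumption: exact match only; HTML is the default.)
    suffix = ".rdf" if accept.strip() == "application/rdf+xml" else ""
    hops.append((302, "%s/uniprot/%s%s" % (BETA, accession, suffix)))
    return hops


for status, location in redirect_chain(PURL_PREFIX + "P12345", "application/rdf+xml"):
    print(status, location)
```

The nice property is that the last URL in the chain names exactly one representation, so that's the one to paste into an email or a link.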
The conneg spec has a section describing "alternatives", which looks like
Alternates: {"paper.html.en" 0.9 {type text/html} {language en}},
{"paper.html.fr" 0.7 {type text/html} {language fr}},
{"paper.ps.en" 1.0 {type application/postscript}
{language en}}
It's meant so the server can inform the client (the "user agent") that
alternate forms are available, and let the client decide which is the
best form. It might be nice for the UniProt server to support this as
well, but that's definitely something to hold off doing until there
are clients that might actually use that data.
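To make the "let the client decide" part concrete, here is a rough sketch of a client reading that Alternates value and picking a variant. The regular expressions only handle the simple shape of the example above (a quoted URI, a quality, then {name value} attributes), so treat this as an illustration and not a complete parser for the spec. The names parse_alternates and best_for_language are mine.

```python
import re

# One variant: {"uri" quality {attr value} {attr value} ...}
VARIANT = re.compile(r'\{"([^"]+)"\s+([0-9.]+)((?:\s*\{[^{}]*\})*)\s*\}')
# One attribute inside a variant: {name value}
ATTRIBUTE = re.compile(r'\{(\S+)\s+([^{}]*)\}')

# The header value from the example, without the "Alternates:" field name.
ALTERNATES = '''{"paper.html.en" 0.9 {type text/html} {language en}},
{"paper.html.fr" 0.7 {type text/html} {language fr}},
{"paper.ps.en" 1.0 {type application/postscript}
{language en}}'''


def parse_alternates(value):
    """Return a list of dicts, one per variant in the Alternates value."""
    variants = []
    for uri, quality, attrs in VARIANT.findall(value):
        variant = {"uri": uri, "quality": float(quality)}
        for name, attr_value in ATTRIBUTE.findall(attrs):
            variant[name] = attr_value.strip()
        variants.append(variant)
    return variants


def best_for_language(variants, language):
    """Pick the highest-quality variant in the given language, or None."""
    candidates = [v for v in variants if v.get("language") == language]
    return max(candidates, key=lambda v: v["quality"]) if candidates else None


print(best_for_language(parse_alternates(ALTERNATES), "en")["uri"])
```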
This is where the "chicken and egg" meets YAGNI. And I'm on the YAGNI side of the balance. Except abstractly.
Quality
For me, the hardest thing in conneg was supporting the quality factor in the request. A quality of "q=1.0" is best and "q=0.0" means "do not want." If two content types are requested then the one with the highest quality (after applying the scoring algorithm) wins. As long as the best is not 0.
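Here is a minimal sketch of that rule, to show why it's fiddly even in its simplest form. It assumes exact media type names only: no wildcards, no server-side quality values, and none of the rest of the scoring algorithm. The function names are hypothetical.

```python
def parse_accept(header):
    """Return [(media_type, q), ...] from an Accept header value."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means q=1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)  # a malformed q raises ValueError in this sketch
        entries.append((media_type, q))
    return entries


def choose(header, available):
    """Return the available type with the highest q, or None if every q is 0."""
    wanted = dict(parse_accept(header))
    best, best_q = None, 0.0
    for media_type in available:  # on ties, the server's own order wins
        q = wanted.get(media_type, 0.0)
        if q > best_q:
            best, best_q = media_type, q
    return best


AVAILABLE = ["text/html", "application/rdf+xml"]
print(choose("text/html;q=0.1, application/rdf+xml;q=0.2", AVAILABLE))
print(choose("application/rdf+xml; q=0", AVAILABLE))
```

A None result means nothing the client listed is acceptable; whether the server then falls back to a default or refuses the request is a separate decision.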
It seems that UniProt ignores the "q" field. I think that's reasonable for now. It's annoying to get right and currently no one uses it.
% curl -H "Accept: application/rdf+xml; q=0" -D - 'http://beta.uniprot.org/?query=purl:uniprot/P12345'
HTTP/1.1 302 Moved Temporarily
Date: Thu, 26 Jul 2007 13:51:18 GMT
Server: Apache-Coyote/1.1
Location: http://beta.uniprot.org/uniprot/P12345.rdf
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 0

The "q=0" means "do not send me RDF" and you can see I'm being pointed to the .rdf file.
Here's a request where the RDF should be returned instead of the HTML:
% curl -H "Accept: text/html;q=0.1, application/rdf+xml;q=0.2" -D - 'http://beta.uniprot.org/?query=purl:uniprot/P12345'
HTTP/1.1 302 Moved Temporarily
Date: Thu, 26 Jul 2007 14:00:51 GMT
Server: Apache-Coyote/1.1
Location: http://beta.uniprot.org/uniprot/P12345
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 0

Most likely the internal code is using the first format given rather than the best format given. So remember boys and girls, always send your preferred format first, even if the spec says there shouldn't be a difference.
Hmm, and it does look like they do a substring test on the Accept string. This should not (probably, debatably) return RDF.
% curl -H "Accept: application/rdf+xml2" -D - 'http://beta.uniprot.org/?query=purl:uniprot/P12345'
HTTP/1.1 302 Moved Temporarily
Date: Thu, 26 Jul 2007 13:50:36 GMT
Server: Apache-Coyote/1.1
Location: http://beta.uniprot.org/uniprot/P12345.rdf
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 0
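Here is my guess at the kind of logic that would produce all three responses: ignore "q", do a substring search for each known type, and take whichever shows up first in the Accept string. To be clear, this is speculation that happens to fit the curl output above. I have not seen UniProt's code, and naive_choice is a name I made up.

```python
# Known media types and the representation each one maps to.
KNOWN = {"text/html": "html", "application/rdf+xml": "rdf"}


def naive_choice(accept):
    """Guessed behavior: the earliest substring hit wins and q is ignored."""
    hits = [(accept.find(media_type), label)
            for media_type, label in KNOWN.items()
            if media_type in accept]
    # No known type anywhere in the string: fall back to the HTML page.
    return min(hits)[1] if hits else "html"


for accept in ("application/rdf+xml; q=0",
               "text/html;q=0.1, application/rdf+xml;q=0.2",
               "application/rdf+xml2"):
    print("%-45s -> %s" % (accept, naive_choice(accept)))
```

All three disagree with what a careful reading of the Accept header would suggest: the first says no RDF, the second ranks RDF above HTML, and the third doesn't name a type the server knows.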
All of these problems I just listed? They should be low on the list of things to worry about when doing conneg. They're a bunch of details that aren't needed for the first pass at getting conneg working. They are only needed once there are multiple format variants, and clients which expect to get the different format types, and which can handle multiple formats.
Accept: */*
Eric also wrote:
The main reason why I don't default to the machine-readable representation (no doubt that would be useful for people writing semweb applications) is that the large majority of resources does not have a machine readable representation, and web pages happen to be the greatest common denominator.
Perfectly cromulent justification. Especially since semantic web apps can be expected to send the correct content type, while browsers (Safari; grrr!) do things like send "Accept: */*".
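A small sketch of that design choice: a type the client names explicitly wins, and anything that only matches through a wildcard like "*/*" gets the server's preferred default, which is HTML here. It ignores "q" entirely, and the names SERVER_PREFERENCE, matches and pick are hypothetical.

```python
# The server's own ordering: HTML first, following Eric's reasoning.
SERVER_PREFERENCE = ["text/html", "application/rdf+xml"]


def matches(media_range, media_type):
    """True if a media range such as '*/*' or 'text/*' covers media_type."""
    range_main, _, range_sub = media_range.partition("/")
    type_main, _, type_sub = media_type.partition("/")
    return range_main in ("*", type_main) and range_sub in ("*", type_sub)


def pick(accept):
    """Choose a representation for an Accept value, ignoring q parameters."""
    ranges = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    # A type named explicitly beats anything covered only by a wildcard.
    for media_type in SERVER_PREFERENCE:
        if media_type in ranges:
            return media_type
    # Otherwise take the first server-preferred type that a wildcard covers.
    for media_type in SERVER_PREFERENCE:
        if any(matches(media_range, media_type) for media_range in ranges):
            return media_type
    return None


print(pick("*/*"))
print(pick("application/rdf+xml, */*"))
```

With this ordering a browser sending "Accept: */*" gets the web page, and a semweb client that names application/rdf+xml gets the RDF even if it also tacks on a wildcard.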
Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me
Copyright © 2001-2010 Dalke Scientific Software, LLC.


