Job Management
Some analysis programs take a long time to run. BLAST is the usual example of this but there are many others. I'm going to develop a web interface for the following program, called "slow_div".
#!/bin/sh
sleep 30
echo "The answer is:" `expr $1 / $2`

(Don't forget to 'chmod +x'!)
I'll develop a web interface to slow_div. The start page will ask for the two numbers and an email address. When submitted the server will run the job in the background and show a page saying "your job has been submitted; you will receive an email when the results are available."
Eventually this project will support user logins, so a user can view all of his or her submissions. To make life easier on me, when I do the quickstart I'll make sure that "identity" is enabled. This TurboGears project will be called "SlowDiv". Here's my quickstart for it. If your TurboGears session asks if you want to use SQLObject or SQLAlchemy say "SQLObject".
[~/nbn] % tg-admin quickstart SlowDiv
Enter package name [slowdiv]:
Do you need Identity (usernames/passwords) in this project? [no] yes
Selected and implied templates:
  TurboGears#tgbase      tg base template
  TurboGears#turbogears  web framework

Variables:
  identity:    sqlobject
  package:     slowdiv
  project:     SlowDiv
  sqlalchemy:  False
Creating template tgbase
Creating directory ./SlowDiv
  Recursing into +einame+.egg-info
    Creating ./SlowDiv/SlowDiv.egg-info/
    Copying PKG-INFO to ./SlowDiv/SlowDiv.egg-info/PKG-INFO
    Copying paster_plugins.txt to ./SlowDiv/SlowDiv.egg-info/paster_plugins.txt
    Copying sqlobject.txt_tmpl to ./SlowDiv/SlowDiv.egg-info/sqlobject.txt
  Recursing into +package+
    Creating ./SlowDiv/slowdiv/
    Copying __init__.py_tmpl to ./SlowDiv/slowdiv/__init__.py
    Copying release.py_tmpl to ./SlowDiv/slowdiv/release.py
    Recursing into static
      Creating ./SlowDiv/slowdiv/static/
      Recursing into css
        Creating ./SlowDiv/slowdiv/static/css/
        Copying empty to ./SlowDiv/slowdiv/static/css/empty
      Recursing into images
        Creating ./SlowDiv/slowdiv/static/images/
        Copying favicon.ico to ./SlowDiv/slowdiv/static/images/favicon.ico
        Copying tg_under_the_hood.png to ./SlowDiv/slowdiv/static/images/tg_under_the_hood.png
      Recursing into javascript
        Creating ./SlowDiv/slowdiv/static/javascript/
        Copying empty to ./SlowDiv/slowdiv/static/javascript/empty
    Recursing into templates
      Creating ./SlowDiv/slowdiv/templates/
      Copying __init__.py_tmpl to ./SlowDiv/slowdiv/templates/__init__.py
Creating template turbogears
  Recursing into +package+
    Recursing into config
      Creating ./SlowDiv/slowdiv/config/
/usr/local/lib/python2.4/site-packages/Cheetah-1.0-py2.4-macosx-10.4-ppc.egg/Cheetah/Compiler.py:1112: UserWarning: You supplied an empty string for the source!
  warnings.warn("You supplied an empty string for the source!", )
      Copying __init__.py_tmpl to ./SlowDiv/slowdiv/config/__init__.py
      Copying app.cfg_tmpl to ./SlowDiv/slowdiv/config/app.cfg
      Copying log.cfg_tmpl to ./SlowDiv/slowdiv/config/log.cfg
    Copying controllers.py_tmpl to ./SlowDiv/slowdiv/controllers.py
    Copying json.py_tmpl to ./SlowDiv/slowdiv/json.py
    Copying model.py_tmpl to ./SlowDiv/slowdiv/model.py
    Recursing into sqlobject-history
      Creating ./SlowDiv/slowdiv/sqlobject-history/
      Copying empty to ./SlowDiv/slowdiv/sqlobject-history/empty
    Recursing into templates
      Copying login.kid to ./SlowDiv/slowdiv/templates/login.kid
      Copying master.kid to ./SlowDiv/slowdiv/templates/master.kid
      Copying welcome.kid to ./SlowDiv/slowdiv/templates/welcome.kid
    Recursing into tests
      Creating ./SlowDiv/slowdiv/tests/
      Copying __init__.py_tmpl to ./SlowDiv/slowdiv/tests/__init__.py
      Copying test_controllers.py_tmpl to ./SlowDiv/slowdiv/tests/test_controllers.py
      Copying test_model.py_tmpl to ./SlowDiv/slowdiv/tests/test_model.py
  Copying README.txt_tmpl to ./SlowDiv/README.txt
  Copying dev.cfg_tmpl to ./SlowDiv/dev.cfg
  Copying sample-prod.cfg_tmpl to ./SlowDiv/sample-prod.cfg
  Copying setup.py_tmpl to ./SlowDiv/setup.py
  Copying start-+package+.py_tmpl to ./SlowDiv/start-slowdiv.py
Running /usr/local/bin/python setup.py egg_info
Adding TurboGears to paster_plugins.txt
running egg_info
writing requirements to SlowDiv.egg-info/requires.txt
writing SlowDiv.egg-info/PKG-INFO
writing top-level names to SlowDiv.egg-info/top_level.txt
reading manifest file 'SlowDiv.egg-info/SOURCES.txt'
writing manifest file 'SlowDiv.egg-info/SOURCES.txt'
[~/nbn] %
How it will work
The web server should respond to a request within a few seconds, even if only to say that the job is still being processed. This means the job must be run in the background (or using threads, but I don't think that's the right solution here). Because the program is not in Python it must be run as an external program.
The problem when running any program in the background is knowing when the program has finished. Very few programs provide a clear indication that they have stopped, because normally this isn't needed; most programs are not run in the background.
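To make this concrete, here's a small sketch (not part of the article's code) of how a parent process learns that a child has finished, using Python's subprocess module. poll() peeks at the exit status without blocking, while wait() blocks until the child exits.

```python
import subprocess

# A sketch of the two ways a parent learns that a child has finished:
# poll without blocking, or block until it exits.
proc = subprocess.Popen(["sh", "-c", "exit 3"])

# poll() returns the exit status, or None if the child is still
# running. Right after Popen() it may well still be None.
status = proc.poll()

# wait() blocks until the child exits and returns its exit status.
exit_status = proc.wait()
print(exit_status)   # prints 3
```

The wrapper approach below is essentially this wait() call, moved into a separate program so the web server itself never blocks.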
What I've decided to do is use a wrapper command called "slow_div_wrapper.py". The TurboGears controller will start the wrapper in the background. The wrapper will call the actual "slow_div" program and wait until the program finishes. It can look at the exit status to figure out if there was an error, like dividing by zero. It will then email the result to the user.
I can write the wrapper script outside of TurboGears. It will take the two numbers and the email address as command-line parameters, run the program, look at the output, and send the email. The contents of the email will be different depending on whether the program succeeded or failed.
#!/usr/bin/env python
# Given two numbers and an email address as the input arguments,
# divide the first number by the second and email the results to the
# given email address.
import sys
import subprocess
import smtplib
from email import Message

SLOW_DIV = "/Users/dalke/nbn/slow_div"
# This forwards to my 'dalkescientific.com' SMTP server
SMTP_SERVER = "localhost:1025"
FROM_ADDRESS = "Job Manager <jobs@dalkescientific.com>"

# sys.argv[0] is the name of this program
# sys.argv[1] is the first command-line argument, [2] is the second, ...
# Require three and only three arguments.
# Assume the inputs will always be in the correct format.
arg1, arg2, to_address = sys.argv[1:]

# Run the program and capture its stdout and stderr
p = subprocess.Popen([SLOW_DIV, arg1, arg2],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)

# If there was any error text then there was an error
output = p.stdout.read()
err_output = p.stderr.read()

if err_output:
    body = "Could not compute %s / %s:\n %s" % (arg1, arg2, err_output)
else:
    # On success stdout contains a line like:
    #   The answer is: 4
    # I want the 4th field
    words = output.split()
    body = "It appears that %s / %s = %s" % (arg1, arg2, words[3])

# I have not yet tested this! No network connectivity here ...
server = smtplib.SMTP(SMTP_SERVER)
msg = Message.Message()
msg.add_header("Subject", "slow_div results")
msg.set_payload(body)
server.sendmail(FROM_ADDRESS, to_address, msg.as_string())
server.quit()

(Don't forget to 'chmod +x'!)
With this in place the controller is very simple. I'll have one page to set up the query and another to get the parameters, start the job and say it's running.
import os
import subprocess

SLOW_DIV_WRAPPER = "/Users/dalke/nbn/slow_div_wrapper.py"

class Root(controllers.RootController):
    @expose(template="slowdiv.templates.homepage")
    def index(self):
        return dict()

    @expose(template="slowdiv.templates.start_job")
    def start_job(self, x, y, email_address):
        # Validate the input
        try:
            int(x)
        except ValueError:
            return dict(error="parameter 'x' is not an integer")
        try:
            int(y)
        except ValueError:
            return dict(error="parameter 'y' is not an integer")
        if "@" not in email_address:
            return dict(error="invalid email address")
        # Start the request
        result = os.system("%s %s %s %s &" %
                           (SLOW_DIV_WRAPPER, x, y, email_address))
        if result != 0:
            log.error("Could not start " + SLOW_DIV_WRAPPER)
            return dict(error="Failure: misconfigured server")
        return dict(error="")

This uses two templates. One is "homepage.kid", the core of which is
<form action="start_job" method="POST">
Compute <input type="text" name="x" size="5"></input> /
<input type="text" name="y" size="5"></input><br />
When finished send results to:
<input type="text" name="email_address"></input>
<input type="submit" value="Divide"></input>
</form>

and the other is "start_job.kid", which is
<P py:if="error">Could not start job: <i>${error}</i></P>
<P py:if="not error">Your request has been submitted and you will shortly receive an email with the division results.</P>
<P><a href="/">Start new division job</a></P>

This displays the error message if there was one; otherwise it shows the message saying that the server will send email when done.
This works but I don't like entering my email address every time. I want the server to remember who I am. I don't need the full identity system for this. I'll start by using a cookie, which is a short string. Every time the browser connects to my server it will send the cookie in the HTTP headers. TurboGears, or rather the CherryPy part of TurboGears, processes the cookie and turns it into a Python Cookie object. To get to CherryPy I need to import the cherrypy module.
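As a rough illustration of what the framework does under the hood, here's a sketch using Python's standard cookie module (named Cookie in Python 2 and http.cookies in Python 3); the cookie names and values are made up for the example:

```python
from http.cookies import SimpleCookie  # the "Cookie" module in Python 2

# Parse an incoming "Cookie:" header into a dictionary-like object,
# much as CherryPy does before handing it to the controller.
incoming = SimpleCookie()
incoming.load("theme=plain; visits=3")
print(incoming["visits"].value)   # prints 3

# Going the other way: build the "Set-Cookie:" header for a response.
outgoing = SimpleCookie()
outgoing["email"] = "someone.at.example"   # avoid chars needing quoting
print(outgoing.output())   # prints Set-Cookie: email=someone.at.example
```

Each entry is a "morsel" object, which is why the controller code below reads the value with `.value` rather than using the cookie object directly.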
What I'll do is modify the "start_job" code to set the "email" cookie based on the user's email address. I'll set it every time even if the value hasn't changed. I'll also modify the index page to pass in the value from the email cookie, if it exists. If it doesn't I'll pass in the empty string.
import os
import subprocess
import cherrypy

SLOW_DIV_WRAPPER = "/Users/dalke/nbn/slow_div_wrapper.py"

class Root(controllers.RootController):
    @expose(template="slowdiv.templates.homepage")
    def index(self):
        email_cookie = cherrypy.request.simpleCookie.get("email", None)
        if email_cookie is None:
            email = ""
        else:
            email = email_cookie.value
        return dict(email=email)

    @expose(template="slowdiv.templates.start_job")
    def start_job(self, x, y, email_address):
        # Validate the input
        try:
            int(x)
        except ValueError:
            return dict(error="parameter 'x' is not an integer")
        try:
            int(y)
        except ValueError:
            return dict(error="parameter 'y' is not an integer")
        if "@" not in email_address:
            return dict(error="invalid email address")
        # Use a cookie to remember this email address for the future
        cherrypy.response.simpleCookie['email'] = email_address
        # Start the request
        result = os.system("%s %s %s %s &" %
                           (SLOW_DIV_WRAPPER, x, y, email_address))
        if result != 0:
            log.error("Could not start " + SLOW_DIV_WRAPPER)
            return dict(error="Failure: misconfigured server")
        return dict(error="")

I'll also modify the homepage template to insert the email address:
<form action="start_job" method="POST">
Compute <input type="text" name="x" size="5"></input> /
<input type="text" name="y" size="5"></input><br />
When finished send results to:
<input type="text" name="email_address" value="${email}"></input>
<input type="submit" value="Divide"></input>
</form>
By the way, this is not secure. The browser can send, or be configured to send, any piece of text as the cookie value. For in-house projects this isn't a large concern, but for large publicly accessible sites it might be.
Job directory
Suppose the output is more complicated. For example, it might be BLAST output, and you want to include a link to download the hit table for Excel and another to download the hit sequences in FASTA format. You might not be able to fit everything into an email. Instead, when the job is finished you can email a link to the results page.
That requires persistence, which means storing the results data either in a database or on the file system. I'll use the file system. Each job will have a unique job id and a job directory based on the id. I'll store the results in that directory.
Suppose the server machine crashes. A limitation of the current system is that a reset like that removes all currently running jobs and the user will never get an email. As the administrator I want some ability to restart all unfinished jobs after the machine starts up again. The restart program does not need to be automatic nor does it need to be part of TurboGears.
To support restart I'll need to save the input parameters in the job directory. I'll also need some way to tell if a job is done. I'll create the file named "SUCCESS" if the program finished successfully and the file "ERROR" if there was an error. These files will be zero-length. Their mere existence is enough to know if there was a success or failure.
Here's what I'll do:
- Modify the start_job controller function to:
- create a unique job id
- create a job directory for the job
- save the input parameters and configuration information in the new directory
- pass the job directory name to the wrapper program
- Modify the wrapper program to:
- read the configuration file instead of the command-line
- when finished, create the SUCCESS or ERROR file
- send the email with status information and, if success, a URL
- Add a "results" controller to see the job results
One question is how to make the job id. An obvious approach is a sequential counter, but then the ids are easy to guess, and anyone could look at someone else's results just by trying nearby numbers. If you're developing something for a local group where security isn't a problem then you don't need to worry about such attacks. If you have decent authentication (and you check it correctly!) then again you don't need to worry. But I've found it's best to assume the worst and use a random identifier. Even then, generating unique random identifiers is harder than it looks.
Here's one I wrote just now which is okay. It makes a short random string of letters, then uses the file system to determine if the job identifier is unique: if the mkdir succeeds then no other job has that identifier, and that directory becomes the job directory. If it fails, I try again, up to a limit. I specified that the job id is a string of 8 consonant letters. That's long enough that a random guess has well under a one-in-a-million chance of succeeding, and short enough to read over the phone. I removed the vowels, except 'y', to make it harder to generate swear words. I also removed 'l' so there's no accidental confusion with '1'. Perhaps numbers are better, or perhaps numbers with hyphens in them, to make them easier to read.
The code was large enough that I moved it into its own file, named "Job.py". Here's the module definition:
import os
import random

JOBDIR_BASE = "/Users/dalke/nbn/jobs"
letters = "bcdfghjkmnpqrstvwxyz"

class Job(object):
    def __init__(self, job_id):
        self.job_id = job_id
        self.job_dirname = os.path.join(JOBDIR_BASE, "job-" + job_id)
        self.config_filename = os.path.join(self.job_dirname, "config.ini")

    # convenience function to look up a file inside the job directory
    def fullpath(self, filename):
        return os.path.join(self.job_dirname, filename)

def make_job():
    # try at most 1000 times before giving up
    for i in xrange(1000):
        job_id = "".join([random.choice(letters) for dummy in "12345678"])
        job = Job(job_id)
        try:
            os.mkdir(job.job_dirname)
            return job
        except OSError, err:
            # try again
            pass
    # raise the last OSError received
    raise err

Note that all the job directories will be located under "/Users/dalke/nbn/jobs". I hard-coded the path name into the module. Another option is to use the TurboGears configuration system, which I won't talk about.
Updating the start_job controller
I'll do each of the three changes in parts. Here is the new start_job controller. I skipped a few imports - later on I'll show the complete controllers.py file.
    @expose(template="slowdiv.templates.start_job")
    def start_job(self, x, y, email_address):
        # Validate the input
        try:
            int(x)
        except ValueError:
            return dict(error="parameter 'x' is not an integer")
        try:
            int(y)
        except ValueError:
            return dict(error="parameter 'y' is not an integer")
        if "@" not in email_address:
            return dict(error="invalid email address")

        # Use a cookie to remember this email address for the future
        cherrypy.response.simpleCookie['email'] = email_address

        # Make the job, with unique job id and new job directory
        job = Job.make_job()

        # Save the configuration information in "INI" format
        config = ConfigParser.ConfigParser()
        config.add_section("settings")
        config.set("settings", "job_id", job.job_id)
        config.set("settings", "job_dirname", job.job_dirname)
        config.set("settings", "email", email_address)
        config.set("settings", "results_url",
                   "http://localhost:8080/results?job_id=" + job.job_id)
        config.add_section("inputs")
        config.set("inputs", "x", x)
        config.set("inputs", "y", y)
        config.write(open(job.config_filename, "w"))

        # Start the request
        result = os.system("%s %s &" % (SLOW_DIV_WRAPPER, job.config_filename))
        if result != 0:
            log.error("Could not start " + SLOW_DIV_WRAPPER)
            return dict(error="Failure: misconfigured server", job=None)
        return dict(error="", job=job)
The only unusual thing should be the ConfigParser module. I tried this out as an experiment. There are five common ways these days to store configuration information in a file: XML format, "INI" format, YAML, Python's "pickle" format or a new file format. I haven't used the ConfigParser module before (which parses INI files) and this data was simple so I tried it. Otherwise I would likely use an XML format.
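For readers who haven't used the module, here's a minimal round trip (shown under its modern Python 3 spelling, configparser; the section and key names are just examples, though they match the ones used in this project):

```python
# Write configuration values to a buffer in INI format, then parse
# them back -- the same pattern the controller and wrapper use,
# but through a string instead of a file.
import configparser   # named ConfigParser in Python 2
import io

config = configparser.ConfigParser()
config.add_section("inputs")
config.set("inputs", "x", "8")
config.set("inputs", "y", "0")

buf = io.StringIO()
config.write(buf)

config2 = configparser.ConfigParser()
config2.read_string(buf.getvalue())
print(config2.get("inputs", "x"))   # prints 8
```

Note that everything goes in and comes out as a string; the wrapper has to know that "x" and "y" are numbers.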
Here's an example of the INI output created by the "start_job" controller:
[inputs]
y = 0
x = 8

[settings]
results_url = http://localhost:8080/results?job_id=tcfnctbq
job_dirname = /Users/dalke/nbn/jobs/job-tcfnctbq
job_id = tcfnctbq
email = dalke@dalkescientific.com
I also updated the associated "start_job.kid" template to link to the expected results page. Someone could follow that link before it's ready so I'll have to handle that case later on. Here's the updated template:
<body>
<P py:if="error">Could not start job: <i>${error}</i></P>
<P py:if="not error">Your request has been submitted and you will shortly
receive an email when <a href="results?job_id=${job.job_id}">the results
page</a> is ready.</P>
<hr />
<P><a href="/">Start new division job</a></P>
</body>
Update the wrapper program
The slow_div_wrapper.py program must now take its configuration information from the "config.ini" file, whose name is passed in on the command line.
#!/usr/bin/env python
# Read the job configuration from a config.ini file named on the
# command line, divide the first number by the second, record the
# result in the job directory, and email a link to the results page.
import os, sys
import subprocess
import smtplib
import ConfigParser
from email import Message

SLOW_DIV = "/Users/dalke/nbn/slow_div"
# This forwards to my 'dalkescientific.com' SMTP server
SMTP_SERVER = "localhost:1025"
FROM_ADDRESS = "Job Manager <jobs@dalkescientific.com>"

# Read the configuration file
config_filename = sys.argv[1]
config = ConfigParser.ConfigParser()
config.readfp(open(config_filename))
to_address = config.get("settings", "email")
job_dirname = config.get("settings", "job_dirname")
results_url = config.get("settings", "results_url")
arg1 = config.get("inputs", "x")
arg2 = config.get("inputs", "y")

# Run the program and capture its stdout and stderr
p = subprocess.Popen([SLOW_DIV, arg1, arg2],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)

# If there was any error text then there was an error
output = p.stdout.read()
err_output = p.stderr.read()

# Helper function to create/save a file in the job directory
def save(filename, text):
    outfile = open(os.path.join(job_dirname, filename), "w")
    outfile.write(text)
    outfile.close()

if err_output:
    body = "Could not compute %s / %s:\n %s" % (arg1, arg2, err_output)
    save("err_output.txt", err_output)
    save("ERROR", "")
else:
    # On success stdout contains a line like:
    #   The answer is: 4
    # Report the 4th field
    words = output.split()
    body = ("Your job computing %s / %s has finished\n"
            "For the answer see %s") % (arg1, arg2, results_url)
    save("output.txt", words[3])
    save("SUCCESS", "")

# Commented out for testing - not connected to the network!
##server = smtplib.SMTP(SMTP_SERVER)
msg = Message.Message()
msg.add_header("Subject", "slow_div results")
msg.set_payload(body)
##server.sendmail(FROM_ADDRESS, to_address, msg.as_string())
##server.quit()
print "Sending", to_address
print msg.as_string()
A results page
The results page must handle three conditions. The job could have finished successfully, finished with an error, or still be running. To figure out which, I'll check for the special files "SUCCESS" and "ERROR". I'll pass a "status" item in the template dictionary which can be "success", "fail", or "running" for the three possible conditions. I'll also load the result or error output as appropriate.
    @expose(template="slowdiv.templates.results")
    def results(self, job_id):
        job = Job.Job(job_id)
        if not os.path.exists(job.job_dirname):
            raise KeyError(job_id)  # should have a better error message
        config = ConfigParser.ConfigParser()
        config.readfp(open(job.config_filename))
        results = dict(job=job, config=config)
        # success or failure? Or not yet finished?
        if os.path.exists(job.fullpath("SUCCESS")):
            results["status"] = "success"
            results["value"] = open(job.fullpath("output.txt")).read()
        elif os.path.exists(job.fullpath("ERROR")):
            results["status"] = "fail"
            results["err"] = open(job.fullpath("err_output.txt")).read()
        else:
            results["status"] = "running"
        return results

That gives enough information to the template so it can display the right text for each case. Here's the template:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:py="http://purl.org/kid/ns#"
      py:extends="'master.kid'">
<head>
  <meta content="text/html; charset=UTF-8" http-equiv="content-type" py:replace="''"/>
  <title>SlowDiv results - ${job.job_id}</title>
</head>
<body>
  <P>
  Results for job '${job.job_id}' submitted by ${config.get("settings", "email")}.
  </P>
  <P py:if="status == 'success'">
  <i>${config.get("inputs", "x")}</i> / <i>${config.get("inputs", "y")}</i> ==
  <b>${value}</b>
  </P>
  <P py:if="status == 'fail'">
  Could not compute <i>${config.get("inputs", "x")}</i> /
  <i>${config.get("inputs", "y")}</i>: <b>${err}</b>
  </P>
  <P py:if="status == 'running'">
  Still working on it. Come back later.
  </P>
</body>
</html>
If I wanted to I could add a refresh 'meta' element to the header when the status is 'running'. That would reload the page every 15 seconds, say, until the status is no longer 'running'. Eventually (hopefully) the slow_div program will finish and the page will show the success or failure results.
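Such a refresh element might look like this in the results template (a sketch; it assumes the same "status" variable the template already receives):

```html
<head>
  <!-- reload every 15 seconds while the job is still running -->
  <meta http-equiv="refresh" content="15" py:if="status == 'running'" />
</head>
```

When the status is 'success' or 'fail' the py:if condition removes the element, so the finished page no longer reloads.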
The full controllers.py file
As promised, the entire controllers.py file
import logging

import cherrypy
import turbogears
from turbogears import controllers, expose, validate, redirect
from turbogears import identity
from slowdiv import json

log = logging.getLogger("slowdiv.controllers")

import os
import subprocess
import ConfigParser

import Job

SLOW_DIV_WRAPPER = "/Users/dalke/nbn/slow_div_wrapper.py"

class Root(controllers.RootController):
    @expose(template="slowdiv.templates.homepage")
    def index(self):
        email_cookie = cherrypy.request.simpleCookie.get("email", None)
        if email_cookie is None:
            email = ""
        else:
            email = email_cookie.value
        return dict(email=email)

    @expose(template="slowdiv.templates.start_job")
    def start_job(self, x, y, email_address):
        # Validate the input
        try:
            int(x)
        except ValueError:
            return dict(error="parameter 'x' is not an integer")
        try:
            int(y)
        except ValueError:
            return dict(error="parameter 'y' is not an integer")
        if "@" not in email_address:
            return dict(error="invalid email address")

        # Use a cookie to remember this email address for the future
        cherrypy.response.simpleCookie['email'] = email_address

        # Make the job, with unique job id and new job directory
        job = Job.make_job()

        # Save the configuration information in "INI" format
        config = ConfigParser.ConfigParser()
        config.add_section("settings")
        config.set("settings", "job_id", job.job_id)
        config.set("settings", "job_dirname", job.job_dirname)
        config.set("settings", "email", email_address)
        config.set("settings", "results_url",
                   "http://localhost:8080/results?job_id=" + job.job_id)
        config.add_section("inputs")
        config.set("inputs", "x", x)
        config.set("inputs", "y", y)
        config.write(open(job.config_filename, "w"))

        # Start the request
        result = os.system("%s %s &" % (SLOW_DIV_WRAPPER, job.config_filename))
        if result != 0:
            log.error("Could not start " + SLOW_DIV_WRAPPER)
            return dict(error="Failure: misconfigured server", job=None)
        return dict(error="", job=job)

    @expose(template="slowdiv.templates.results")
    def results(self, job_id):
        job = Job.Job(job_id)
        if not os.path.exists(job.job_dirname):
            raise KeyError(job_id)  # should have a better error message
        config = ConfigParser.ConfigParser()
        config.readfp(open(job.config_filename))
        results = dict(job=job, config=config)
        # success or failure? Or not yet finished?
        if os.path.exists(job.fullpath("SUCCESS")):
            results["status"] = "success"
            results["value"] = open(job.fullpath("output.txt")).read()
        elif os.path.exists(job.fullpath("ERROR")):
            results["status"] = "fail"
            results["err"] = open(job.fullpath("err_output.txt")).read()
        else:
            results["status"] = "running"
        return results
View all results for a user
Job management through directories works but there are limitations. You may want to add a "list all jobs" page for the users. You could go through all of the directories checking each configuration file for jobs coming from a given email address, but that search takes time. A clever solution might be to add "user" directories with symbolic links to the job directories for a given user. For example:
~/nbn/jobs/job-nybbyxft
~/nbn/jobs/job-pnwghhpt
~/nbn/jobs/job-tcfnctbq
~/nbn/jobs/job-vmntvppd
~/nbn/jobs/job-vzcqdtyd
~/nbn/jobs/user-dalke@dalkescientific.com/job-nybbyxft -> ../job-nybbyxft
~/nbn/jobs/user-dalke@dalkescientific.com/job-tcfnctbq -> ../job-tcfnctbq
~/nbn/jobs/user-dalke@dalkescientific.com/job-vzcqdtyd -> ../job-vzcqdtyd
~/nbn/jobs/user-dan@nbn.ac.za/job-pnwghhpt -> ../job-pnwghhpt
~/nbn/jobs/user-dan@nbn.ac.za/job-vzcqdtyd -> ../job-vzcqdtyd

Every time a job is added, also add the correct symbolic link.
That will work, and only requires a few lines of code.
- check that the email address doesn't contain a '/', chr(0) or other characters deemed unacceptable
- make the user directory if it doesn't exist
- make the symbolic link
- add a 'creation_date' timestamp to the config.ini file so the list can be in chronological order
Suppose you want more complicated queries, like "your BLAST searches" and "your most recent literature searches." Indexing through the file system might help but then you've got the disease "the filesystem is a database." While it is a type of hierarchical database, it's not that good for complex searches. (Though some file systems are pretty good at medium-level searches, which might be good enough.)
Instead, use a database with real query support. You can go the other way though and get the disease "the database is a filesystem" and end up storing large files, like images and movies, in the database when the filesystem does a really good job of that.
There is another big advantage to databases: integrity guarantees. These are usually summarized by the "ACID" requirements for database transactions: Atomicity, Consistency, Isolation, and Durability. Suppose the filesystem becomes full, or write permissions get screwed up, or a partial restore of the filesystem clobbers some of the files. It's likely that some of the file-system-based data records will get messed up. A real database, on the other hand, implements certain guarantees, like "the database will never be in an inconsistent state" and "an unfinished transaction rolls back to the previously good state."
One last interesting bit about using a database server. If you use a database you'll have tables like 'Job' (with at least the job id and job status), 'JobInputs' and 'JobResults'. You can also have a table called "JobQueue". After a new job is added to the database, add its job identifier to the job queue. Modify the compute client (the wrapper in my case) so that when it has nothing to do it connects to the database, grabs a job from the queue, changes the job status from "queued" to "running", adds it to the "JobRunning" table, and starts working. When finished it removes the job from the JobRunning table and saves the computed job results into the database. If no jobs are available, wait a few seconds and try again.
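The hand-off from queue to compute client can be sketched with a toy in-memory SQLite database (the article doesn't name a database; the table and column names here are invented):

```python
import sqlite3

# A toy job queue: one table tracking each job's status.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE JobQueue (job_id TEXT, status TEXT)")
db.execute("INSERT INTO JobQueue VALUES ('tcfnctbq', 'queued')")

def claim_next_job(db):
    # Grab one queued job and mark it running, inside a transaction
    # so two clients can't claim the same job.
    with db:
        row = db.execute(
            "SELECT job_id FROM JobQueue WHERE status = 'queued' LIMIT 1"
        ).fetchone()
        if row is None:
            return None   # nothing to do; the caller sleeps and retries
        db.execute("UPDATE JobQueue SET status = 'running'"
                   " WHERE job_id = ?", row)
        return row[0]

print(claim_next_job(db))   # prints tcfnctbq
print(claim_next_job(db))   # prints None
```

A real deployment would also need the "JobRunning" bookkeeping and a way to reclaim jobs whose client died mid-run, but the claim step is the heart of it.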
This is a basic distributed queuing system. You can configure multiple compute clients, possibly on different machines, all pulling jobs from the same queue.
Copyright © 2001-2020 Andrew Dalke Scientific AB