Job Management
Some analysis programs take a long time to run. BLAST is the usual example of this but there are many others. I'm going to develop a web interface for the following program, called "slow_div".
#!/bin/sh
sleep 30
echo "The answer is:" `expr $1 / $2`

(Don't forget to 'chmod +x'!)
I'll develop a web interface to slow_div. The start page will ask for the two numbers and an email address. When submitted the server will run the job in the background and show a page saying "your job has been submitted; you will receive an email when the results are available."
Eventually this project will support user logins, so a user can view all of his or her submissions. To make life easier on me, when I do the quickstart I'll make sure that "identity" is enabled. This TurboGears project will be called "SlowDiv". Here's my quickstart for it. If your TurboGears session asks if you want to use SQLObject or SQLAlchemy say "SQLObject".
[~/nbn] % tg-admin quickstart SlowDiv
Enter package name [slowdiv]:
Do you need Identity (usernames/passwords) in this project? [no] yes
Selected and implied templates:
  TurboGears#tgbase      tg base template
  TurboGears#turbogears  web framework

Variables:
  identity:    sqlobject
  package:     slowdiv
  project:     SlowDiv
  sqlalchemy:  False
Creating template tgbase
Creating directory ./SlowDiv
  Recursing into +einame+.egg-info
    Creating ./SlowDiv/SlowDiv.egg-info/
    Copying PKG-INFO to ./SlowDiv/SlowDiv.egg-info/PKG-INFO
    Copying paster_plugins.txt to ./SlowDiv/SlowDiv.egg-info/paster_plugins.txt
    Copying sqlobject.txt_tmpl to ./SlowDiv/SlowDiv.egg-info/sqlobject.txt
  Recursing into +package+
    Creating ./SlowDiv/slowdiv/
    Copying __init__.py_tmpl to ./SlowDiv/slowdiv/__init__.py
    Copying release.py_tmpl to ./SlowDiv/slowdiv/release.py
    Recursing into static
      Creating ./SlowDiv/slowdiv/static/
      Recursing into css
        Creating ./SlowDiv/slowdiv/static/css/
        Copying empty to ./SlowDiv/slowdiv/static/css/empty
      Recursing into images
        Creating ./SlowDiv/slowdiv/static/images/
        Copying favicon.ico to ./SlowDiv/slowdiv/static/images/favicon.ico
        Copying tg_under_the_hood.png to ./SlowDiv/slowdiv/static/images/tg_under_the_hood.png
      Recursing into javascript
        Creating ./SlowDiv/slowdiv/static/javascript/
        Copying empty to ./SlowDiv/slowdiv/static/javascript/empty
    Recursing into templates
      Creating ./SlowDiv/slowdiv/templates/
      Copying __init__.py_tmpl to ./SlowDiv/slowdiv/templates/__init__.py
Creating template turbogears
  Recursing into +package+
    Recursing into config
      Creating ./SlowDiv/slowdiv/config/
/usr/local/lib/python2.4/site-packages/Cheetah-1.0-py2.4-macosx-10.4-ppc.egg/Cheetah/Compiler.py:1112: UserWarning: You supplied an empty string for the source!
  warnings.warn("You supplied an empty string for the source!", )
      Copying __init__.py_tmpl to ./SlowDiv/slowdiv/config/__init__.py
      Copying app.cfg_tmpl to ./SlowDiv/slowdiv/config/app.cfg
      Copying log.cfg_tmpl to ./SlowDiv/slowdiv/config/log.cfg
    Copying controllers.py_tmpl to ./SlowDiv/slowdiv/controllers.py
    Copying json.py_tmpl to ./SlowDiv/slowdiv/json.py
    Copying model.py_tmpl to ./SlowDiv/slowdiv/model.py
    Recursing into sqlobject-history
      Creating ./SlowDiv/slowdiv/sqlobject-history/
      Copying empty to ./SlowDiv/slowdiv/sqlobject-history/empty
    Recursing into templates
      Copying login.kid to ./SlowDiv/slowdiv/templates/login.kid
      Copying master.kid to ./SlowDiv/slowdiv/templates/master.kid
      Copying welcome.kid to ./SlowDiv/slowdiv/templates/welcome.kid
    Recursing into tests
      Creating ./SlowDiv/slowdiv/tests/
      Copying __init__.py_tmpl to ./SlowDiv/slowdiv/tests/__init__.py
      Copying test_controllers.py_tmpl to ./SlowDiv/slowdiv/tests/test_controllers.py
      Copying test_model.py_tmpl to ./SlowDiv/slowdiv/tests/test_model.py
  Copying README.txt_tmpl to ./SlowDiv/README.txt
  Copying dev.cfg_tmpl to ./SlowDiv/dev.cfg
  Copying sample-prod.cfg_tmpl to ./SlowDiv/sample-prod.cfg
  Copying setup.py_tmpl to ./SlowDiv/setup.py
  Copying start-+package+.py_tmpl to ./SlowDiv/start-slowdiv.py
Running /usr/local/bin/python setup.py egg_info
Adding TurboGears to paster_plugins.txt
running egg_info
writing requirements to SlowDiv.egg-info/requires.txt
writing SlowDiv.egg-info/PKG-INFO
writing top-level names to SlowDiv.egg-info/top_level.txt
reading manifest file 'SlowDiv.egg-info/SOURCES.txt'
writing manifest file 'SlowDiv.egg-info/SOURCES.txt'
[~/nbn] %
How it will work
The web server should respond to a request within a few seconds, even if only to say that the job is still being processed. This means the job must be run in the background (or using threads, but I don't think that's the right solution here). Because the program is not in Python it must be run as an external program.
The problem when running any program in the background is knowing when the program has finished. Very few programs provide a clear indication that they have stopped, because normally this isn't needed; most programs are not run in the background.
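To make this concrete, here's a small sketch (not part of the article's code) of how a parent process learns that a child has finished, using Python's subprocess module. poll() peeks at the exit status without blocking, while wait() blocks until the child exits.

```python
import subprocess

# A sketch of the two ways a parent learns that a child has finished:
# poll without blocking, or block until it exits.
proc = subprocess.Popen(["sh", "-c", "exit 3"])

# poll() returns the exit status, or None if the child is still
# running. Right after Popen() it may well still be None.
status = proc.poll()

# wait() blocks until the child exits and returns its exit status.
exit_status = proc.wait()
print(exit_status)   # prints 3
```

The wrapper approach below is essentially this wait() call, moved into a separate program so the web server itself never blocks.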
What I've decided to do is use a wrapper command called "slow_div_wrapper.py". The TurboGears controller will start the wrapper in the background. The wrapper will call the actual "slow_div" program and wait until the program finishes. It can look at the exit status to figure out if there was an error, like dividing by zero. It will then email the result to the user.
I can write the wrapper script outside of TurboGears. It will take the two numbers and the email address as command-line parameters, run the program, look at the output, and send the email. The contents of the email will be different depending on whether the program succeeded or failed.
#!/usr/bin/env python
# Given two numbers and an email address as the input arguments,
# divide the first number by the second and email the results to the
# given email address.
import sys
import subprocess
import smtplib
from email import Message

SLOW_DIV = "/Users/dalke/nbn/slow_div"
# This forwards to my 'dalkescientific.com' SMTP server
SMTP_SERVER = "localhost:1025"
FROM_ADDRESS = "Job Manager <jobs@dalkescientific.com>"

# sys.argv[0] is the name of this program
# sys.argv[1] is the first command-line argument, [2] is the second, ...
# Require three and only three arguments.
# Assume the inputs will always be in the correct format.
arg1, arg2, to_address = sys.argv[1:]

# Run the program and capture its stdout and stderr
p = subprocess.Popen([SLOW_DIV, arg1, arg2],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)

# If there was any error text then there was an error
output = p.stdout.read()
err_output = p.stderr.read()

if err_output:
    body = "Could not compute %s / %s:\n %s" % (arg1, arg2, err_output)
else:
    # On success stdout contains a line like:
    #   The answer is: 4
    # I want the 4th field
    words = output.split()
    body = "It appears that %s / %s = %s" % (arg1, arg2, words[3])

# I have not yet tested this! No network connectivity here ...
server = smtplib.SMTP(SMTP_SERVER)
msg = Message.Message()
msg.add_header("Subject", "slow_div results")
msg.set_payload(body)
server.sendmail(FROM_ADDRESS, to_address, msg.as_string())
server.quit()

(Don't forget to 'chmod +x'!)
With this in place the controller is very simple. I'll have one page to set up the query and another to get the parameters, start the job and say it's running.
import os
import subprocess

SLOW_DIV_WRAPPER = "/Users/dalke/nbn/slow_div_wrapper.py"

class Root(controllers.RootController):
    @expose(template="slowdiv.templates.homepage")
    def index(self):
        return dict()

    @expose(template="slowdiv.templates.start_job")
    def start_job(self, x, y, email_address):
        # Validate the input
        try:
            int(x)
        except ValueError:
            return dict(error="parameter 'x' is not an integer")
        try:
            int(y)
        except ValueError:
            return dict(error="parameter 'y' is not an integer")
        if "@" not in email_address:
            return dict(error="invalid email address")
        # Start the request
        result = os.system("%s %s %s %s &" %
                           (SLOW_DIV_WRAPPER, x, y, email_address))
        if result != 0:
            log.error("Could not start " + SLOW_DIV_WRAPPER)
            return dict(error="Failure: misconfigured server")
        return dict(error="")

This uses two templates. One is "homepage.kid", the core of which is
<form action="start_job" method="POST">
Compute <input type="text" name="x" size="5"></input> /
<input type="text" name="y" size="5"></input><br />
When finished send results to:
<input type="text" name="email_address"></input>
<input type="submit" value="Divide"></input>
</form>

and the other is "start_job.kid", which is
<P py:if="error">Could not start job: <i>${error}</i></P>
<P py:if="not error">Your request has been submitted and you will shortly receive an email with the division results.</P>
<P><a href="/">Start new division job</a></P>

This displays the error message if there was one; otherwise it shows the message saying that the server will send email when done.
This works but I don't like entering my email address every time. I want the server to remember who I am. I don't need the full identity system for this. I'll start by using a cookie, which is a short string. Every time the browser connects to my server it will send the cookie in the HTTP headers. TurboGears, or rather the CherryPy part of TurboGears, processes the cookie and turns it into a Python Cookie object. To get to CherryPy I need to import the cherrypy module.
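As a rough illustration of what the framework does under the hood, here's a sketch using Python's standard cookie module (named Cookie in Python 2 and http.cookies in Python 3); the cookie names and values are made up for the example:

```python
from http.cookies import SimpleCookie  # the "Cookie" module in Python 2

# Parse an incoming "Cookie:" header into a dictionary-like object,
# much as CherryPy does before handing it to the controller.
incoming = SimpleCookie()
incoming.load("theme=plain; visits=3")
print(incoming["visits"].value)   # prints 3

# Going the other way: build the "Set-Cookie:" header for a response.
outgoing = SimpleCookie()
outgoing["email"] = "someone.at.example"   # avoid chars needing quoting
print(outgoing.output())   # prints Set-Cookie: email=someone.at.example
```

Each entry is a "morsel" object, which is why the controller code below reads the value with `.value` rather than using the cookie object directly.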
What I'll do is modify the "start_job" code to set the "email" cookie based on the user's email address. I'll set it every time even if the value hasn't changed. I'll also modify the index page to pass in the value from the email cookie, if it exists. If it doesn't I'll pass in the empty string.
import os
import subprocess
import cherrypy

SLOW_DIV_WRAPPER = "/Users/dalke/nbn/slow_div_wrapper.py"

class Root(controllers.RootController):
    @expose(template="slowdiv.templates.homepage")
    def index(self):
        email_cookie = cherrypy.request.simpleCookie.get("email", None)
        if email_cookie is None:
            email = ""
        else:
            email = email_cookie.value
        return dict(email=email)

    @expose(template="slowdiv.templates.start_job")
    def start_job(self, x, y, email_address):
        # Validate the input
        try:
            int(x)
        except ValueError:
            return dict(error="parameter 'x' is not an integer")
        try:
            int(y)
        except ValueError:
            return dict(error="parameter 'y' is not an integer")
        if "@" not in email_address:
            return dict(error="invalid email address")
        # Use a cookie to remember this email address for the future
        cherrypy.response.simpleCookie['email'] = email_address
        # Start the request
        result = os.system("%s %s %s %s &" %
                           (SLOW_DIV_WRAPPER, x, y, email_address))
        if result != 0:
            log.error("Could not start " + SLOW_DIV_WRAPPER)
            return dict(error="Failure: misconfigured server")
        return dict(error="")

I'll also modify the homepage template to insert the email address:
<form action="start_job" method="POST">
Compute <input type="text" name="x" size="5"></input> /
<input type="text" name="y" size="5"></input><br />
When finished send results to:
<input type="text" name="email_address" value="${email}"></input>
<input type="submit" value="Divide"></input>
</form>
By the way, this is not secure. The browser can send, or be configured to send, any piece of text as the cookie value. For in-house projects this isn't a large concern, but for large publicly accessible sites it might be.
Job directory
Suppose the output is more complicated. For example, it might be BLAST output, and you want to include a link to download the hit table for Excel and another to download the hit sequences in FASTA format. You might not be able to fit everything into an email. Instead, when the job is finished you can email a link to the results page.
That requires persistence, which means storing the results data either in a database or on the file system. I'll use the file system. Each job will have a unique job id and a job directory based on the id. I'll store the results in that directory.
Suppose the server machine crashes. A limitation of the current system is that a reset like that removes all currently running jobs and the user will never get an email. As the administrator I want some ability to restart all unfinished jobs after the machine starts up again. The restart program does not need to be automatic nor does it need to be part of TurboGears.
To support restart I'll need to save the input parameters in the job directory. I'll also need some way to tell if a job is done. I'll create the file named "SUCCESS" if the program finished successfully and the file "ERROR" if there was an error. These files will be zero-length. Their mere existence is enough to know if there was a success or failure.
Here's what I'll do:
- Modify the start_job controller function to:
- create a unique job id
- create a job directory for the job
- save the input parameters and configuration information in the new directory
- pass the job directory name to the wrapper program
- Modify the wrapper program to:
- read the configuration file instead of the command-line
- when finished, create the SUCCESS or ERROR file
- send the email with status information and, if success, a URL
- Add a "results" controller to see the job results
One question is how to make the job id. An obvious approach is a sequential counter, but then the ids are easy to guess, and anyone could look at someone else's results just by trying nearby numbers. If you're developing something for a local group where security isn't a problem then you don't need to worry about such attacks. If you have decent authentication (and you check it correctly!) then again you don't need to worry. But I've found it's best to assume the worst and use a random identifier. Even then, generating unique random identifiers is harder than it looks.
Here's one I wrote just now which is okay. It makes a short random string of letters, then uses the file system to determine if the job identifier is unique: if the mkdir succeeds then no other job has that identifier, and that directory becomes the job directory. If it fails, I try again, up to a limit. I specified that the job id is a string of 8 consonant letters. That's long enough that a random guess has well under a one-in-a-million chance of succeeding, and short enough to read over the phone. I removed the vowels, except 'y', to make it harder to generate swear words. I also removed 'l' so there's no accidental confusion with '1'. Perhaps numbers are better, or perhaps numbers with hyphens in them, to make them easier to read.
The code was large enough that I moved it into its own file, named "Job.py". Here's the module definition:
import os
import random

JOBDIR_BASE = "/Users/dalke/nbn/jobs"
letters = "bcdfghjkmnpqrstvwxyz"

class Job(object):
    def __init__(self, job_id):
        self.job_id = job_id
        self.job_dirname = os.path.join(JOBDIR_BASE, "job-" + job_id)
        self.config_filename = os.path.join(self.job_dirname, "config.ini")

    # convenience function to look up a file inside the job directory
    def fullpath(self, filename):
        return os.path.join(self.job_dirname, filename)

def make_job():
    # try at most 1000 times before giving up
    for i in xrange(1000):
        job_id = "".join([random.choice(letters) for dummy in "12345678"])
        job = Job(job_id)
        try:
            os.mkdir(job.job_dirname)
            return job
        except OSError, err:
            # try again
            pass
    # raise the last OSError received
    raise err

Note that all the job directories will be located under "/Users/dalke/nbn/jobs". I hard-coded the path name into the module. Another option is to use the TurboGears configuration system, which I won't talk about.
Updating the start_job controller
I'll do each of the three changes in parts. Here is the new start_job controller. I skipped a few imports - later on I'll show the complete controllers.py file.
    @expose(template="slowdiv.templates.start_job")
    def start_job(self, x, y, email_address):
        # Validate the input
        try:
            int(x)
        except ValueError:
            return dict(error="parameter 'x' is not an integer")
        try:
            int(y)
        except ValueError:
            return dict(error="parameter 'y' is not an integer")
        if "@" not in email_address:
            return dict(error="invalid email address")

        # Use a cookie to remember this email address for the future
        cherrypy.response.simpleCookie['email'] = email_address

        # Make the job, with unique job id and new job directory
        job = Job.make_job()

        # Save the configuration information in "INI" format
        config = ConfigParser.ConfigParser()
        config.add_section("settings")
        config.set("settings", "job_id", job.job_id)
        config.set("settings", "job_dirname", job.job_dirname)
        config.set("settings", "email", email_address)
        config.set("settings", "results_url",
                   "http://localhost:8080/results?job_id=" + job.job_id)
        config.add_section("inputs")
        config.set("inputs", "x", x)
        config.set("inputs", "y", y)
        config.write(open(job.config_filename, "w"))

        # Start the request
        result = os.system("%s %s &" % (SLOW_DIV_WRAPPER, job.config_filename))
        if result != 0:
            log.error("Could not start " + SLOW_DIV_WRAPPER)
            return dict(error="Failure: misconfigured server", job=None)
        return dict(error="", job=job)
The only unusual thing should be the ConfigParser module. I tried this out as an experiment. There are five common ways these days to store configuration information in a file: XML format, "INI" format, YAML, Python's "pickle" format or a new file format. I haven't used the ConfigParser module before (which parses INI files) and this data was simple so I tried it. Otherwise I would likely use an XML format.
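For readers who haven't used the module, here's a minimal round trip (shown under its modern Python 3 spelling, configparser; the section and key names are just examples, though they match the ones used in this project):

```python
# Write configuration values to a buffer in INI format, then parse
# them back -- the same pattern the controller and wrapper use,
# but through a string instead of a file.
import configparser   # named ConfigParser in Python 2
import io

config = configparser.ConfigParser()
config.add_section("inputs")
config.set("inputs", "x", "8")
config.set("inputs", "y", "0")

buf = io.StringIO()
config.write(buf)

config2 = configparser.ConfigParser()
config2.read_string(buf.getvalue())
print(config2.get("inputs", "x"))   # prints 8
```

Note that everything goes in and comes out as a string; the wrapper has to know that "x" and "y" are numbers.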
Here's an example of the INI output created by the "start_job" controller:
[inputs]
y = 0
x = 8

[settings]
results_url = http://localhost:8080/results?job_id=tcfnctbq
job_dirname = /Users/dalke/nbn/jobs/job-tcfnctbq
job_id = tcfnctbq
email = dalke@dalkescientific.com
I also updated the associated "start_job.kid" template to link to the expected results page. Someone could follow that link before it's ready so I'll have to handle that case later on. Here's the updated template:
<body>
<P py:if="error">Could not start job: <i>${error}</i></P>
<P py:if="not error">Your request has been submitted and you will shortly
receive an email when <a href="results?job_id=${job.job_id}">the results
page</a> is ready.</P>
<hr />
<P><a href="/">Start new division job</a></P>
</body>
Update the wrapper program
The slow_div_wrapper.py program must now take its configuration information from the "config.ini" file, whose name is passed in on the command line.
#!/usr/bin/env python
# Read the job configuration from a config.ini file named on the
# command line, divide the first number by the second, record the
# result in the job directory, and email a link to the results page.
import os, sys
import subprocess
import smtplib
import ConfigParser
from email import Message

SLOW_DIV = "/Users/dalke/nbn/slow_div"
# This forwards to my 'dalkescientific.com' SMTP server
SMTP_SERVER = "localhost:1025"
FROM_ADDRESS = "Job Manager <jobs@dalkescientific.com>"

# Read the configuration file
config_filename = sys.argv[1]
config = ConfigParser.ConfigParser()
config.readfp(open(config_filename))
to_address = config.get("settings", "email")
job_dirname = config.get("settings", "job_dirname")
results_url = config.get("settings", "results_url")
arg1 = config.get("inputs", "x")
arg2 = config.get("inputs", "y")

# Run the program and capture its stdout and stderr
p = subprocess.Popen([SLOW_DIV, arg1, arg2],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)

# If there was any error text then there was an error
output = p.stdout.read()
err_output = p.stderr.read()

# Helper function to create/save a file in the job directory
def save(filename, text):
    outfile = open(os.path.join(job_dirname, filename), "w")
    outfile.write(text)
    outfile.close()

if err_output:
    body = "Could not compute %s / %s:\n %s" % (arg1, arg2, err_output)
    save("err_output.txt", err_output)
    save("ERROR", "")
else:
    # On success stdout contains a line like:
    #   The answer is: 4
    # Report the 4th field
    words = output.split()
    body = ("Your job computing %s / %s has finished\n"
            "For the answer see %s") % (arg1, arg2, results_url)
    save("output.txt", words[3])
    save("SUCCESS", "")

# Commented out for testing - not connected to the network!
##server = smtplib.SMTP(SMTP_SERVER)
msg = Message.Message()
msg.add_header("Subject", "slow_div results")
msg.set_payload(body)
##server.sendmail(FROM_ADDRESS, to_address, msg.as_string())
##server.quit()
print "Sending", to_address
print msg.as_string()
A results page
The results page must handle three conditions. The job could have finished successfully, finished with an error, or still be running. To figure out which, I'll check for the special files "SUCCESS" and "ERROR". I'll pass a "status" item in the template dictionary which can be "success", "fail", or "running" for the three possible conditions. I'll also load the result or error output as appropriate.
    @expose(template="slowdiv.templates.results")
    def results(self, job_id):
        job = Job.Job(job_id)
        if not os.path.exists(job.job_dirname):
            raise KeyError(job_id)  # should have a better error message
        config = ConfigParser.ConfigParser()
        config.readfp(open(job.config_filename))
        results = dict(job=job, config=config)
        # success or failure? Or not yet finished?
        if os.path.exists(job.fullpath("SUCCESS")):
            results["status"] = "success"
            results["value"] = open(job.fullpath("output.txt")).read()
        elif os.path.exists(job.fullpath("ERROR")):
            results["status"] = "fail"
            results["err"] = open(job.fullpath("err_output.txt")).read()
        else:
            results["status"] = "running"
        return results

That gives enough information to the template so it can display the right text for each case. Here's the template:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:py="http://purl.org/kid/ns#"
      py:extends="'master.kid'">
<head>
  <meta content="text/html; charset=UTF-8" http-equiv="content-type" py:replace="''"/>
  <title>SlowDiv results - ${job.job_id}</title>
</head>
<body>
  <P>
  Results for job '${job.job_id}' submitted by ${config.get("settings", "email")}.
  </P>
  <P py:if="status == 'success'">
  <i>${config.get("inputs", "x")}</i> / <i>${config.get("inputs", "y")}</i> ==
  <b>${value}</b>
  </P>
  <P py:if="status == 'fail'">
  Could not compute <i>${config.get("inputs", "x")}</i> /
  <i>${config.get("inputs", "y")}</i>: <b>${err}</b>
  </P>
  <P py:if="status == 'running'">
  Still working on it. Come back later.
  </P>
</body>
</html>
If I wanted to I could add a refresh 'meta' element to the header when the status is 'running'. That would reload the page every 15 seconds, say, until the status is no longer 'running'. Eventually (hopefully) the slow_div program will finish and the page will show the success or failure results.
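Such a refresh element might look like this in the results template (a sketch; it assumes the same "status" variable the template already receives):

```html
<head>
  <!-- reload every 15 seconds while the job is still running -->
  <meta http-equiv="refresh" content="15" py:if="status == 'running'" />
</head>
```

When the status is 'success' or 'fail' the py:if condition removes the element, so the finished page no longer reloads.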
The full controllers.py file
As promised, the entire controllers.py file
import logging

import cherrypy
import turbogears
from turbogears import controllers, expose, validate, redirect
from turbogears import identity
from slowdiv import json

log = logging.getLogger("slowdiv.controllers")

import os
import subprocess
import ConfigParser

import Job

SLOW_DIV_WRAPPER = "/Users/dalke/nbn/slow_div_wrapper.py"

class Root(controllers.RootController):
    @expose(template="slowdiv.templates.homepage")
    def index(self):
        email_cookie = cherrypy.request.simpleCookie.get("email", None)
        if email_cookie is None:
            email = ""
        else:
            email = email_cookie.value
        return dict(email=email)

    @expose(template="slowdiv.templates.start_job")
    def start_job(self, x, y, email_address):
        # Validate the input
        try:
            int(x)
        except ValueError:
            return dict(error="parameter 'x' is not an integer")
        try:
            int(y)
        except ValueError:
            return dict(error="parameter 'y' is not an integer")
        if "@" not in email_address:
            return dict(error="invalid email address")

        # Use a cookie to remember this email address for the future
        cherrypy.response.simpleCookie['email'] = email_address

        # Make the job, with unique job id and new job directory
        job = Job.make_job()

        # Save the configuration information in "INI" format
        config = ConfigParser.ConfigParser()
        config.add_section("settings")
        config.set("settings", "job_id", job.job_id)
        config.set("settings", "job_dirname", job.job_dirname)
        config.set("settings", "email", email_address)
        config.set("settings", "results_url",
                   "http://localhost:8080/results?job_id=" + job.job_id)
        config.add_section("inputs")
        config.set("inputs", "x", x)
        config.set("inputs", "y", y)
        config.write(open(job.config_filename, "w"))

        # Start the request
        result = os.system("%s %s &" % (SLOW_DIV_WRAPPER, job.config_filename))
        if result != 0:
            log.error("Could not start " + SLOW_DIV_WRAPPER)
            return dict(error="Failure: misconfigured server", job=None)
        return dict(error="", job=job)

    @expose(template="slowdiv.templates.results")
    def results(self, job_id):
        job = Job.Job(job_id)
        if not os.path.exists(job.job_dirname):
            raise KeyError(job_id)  # should have a better error message
        config = ConfigParser.ConfigParser()
        config.readfp(open(job.config_filename))
        results = dict(job=job, config=config)
        # success or failure? Or not yet finished?
        if os.path.exists(job.fullpath("SUCCESS")):
            results["status"] = "success"
            results["value"] = open(job.fullpath("output.txt")).read()
        elif os.path.exists(job.fullpath("ERROR")):
            results["status"] = "fail"
            results["err"] = open(job.fullpath("err_output.txt")).read()
        else:
            results["status"] = "running"
        return results
View all results for a user
Job management through directories works but there are limitations. You may want to add a "list all jobs" page for the users. You could go through all of the directories checking each configuration file for jobs coming from a given email address, but that search takes time. A clever solution might be to add "user" directories with symbolic links to the job directories for a given user. For example:
~/nbn/jobs/job-nybbyxft
~/nbn/jobs/job-pnwghhpt
~/nbn/jobs/job-tcfnctbq
~/nbn/jobs/job-vmntvppd
~/nbn/jobs/job-vzcqdtyd
~/nbn/jobs/user-dalke@dalkescientific.com/job-nybbyxft -> ../job-nybbyxft
~/nbn/jobs/user-dalke@dalkescientific.com/job-tcfnctbq -> ../job-tcfnctbq
~/nbn/jobs/user-dalke@dalkescientific.com/job-vzcqdtyd -> ../job-vzcqdtyd
~/nbn/jobs/user-dan@nbn.ac.za/job-pnwghhpt -> ../job-pnwghhpt
~/nbn/jobs/user-dan@nbn.ac.za/job-vzcqdtyd -> ../job-vzcqdtyd

Every time a job is added, also add the correct symbolic link.
That will work, and only requires a few lines of code.
- check that the email address doesn't contain a '/', chr(0) or other characters deemed unacceptable
- make the user directory if it doesn't exist
- make the symbolic link
- add a 'creation_date' timestamp to the config.ini file so the list can be in chronological order
Suppose you want more complicated queries, like "your BLAST searches" and "your most recent literature searches." Indexing through the file system might help but then you've got the disease "the filesystem is a database." While it is a type of hierarchical database, it's not that good for complex searches. (Though some file systems are pretty good at medium-level searches, which might be good enough.)
Instead, use a database with real query support. You can go the other way though and get the disease "the database is a filesystem" and end up storing large files, like images and movies, in the database when the filesystem does a really good job of that.
There is another big advantage to databases: integrity guarantees. These are usually summarized by the "ACID" requirements for database transactions: Atomicity, Consistency, Isolation, and Durability. Suppose the filesystem becomes full, or write permissions get screwed up, or a partial restore of the filesystem clobbers some of the files. It's likely that some of the file-system-based data records will get messed up. A real database, on the other hand, implements certain guarantees, like "the database will never be in an inconsistent state" and "an unfinished transaction rolls back to the previously good state."
One last interesting bit about using a database server. If you use a database you'll have tables like 'Job' (with at least the job id and job status), 'JobInputs' and 'JobResults'. You can also have a table called "JobQueue". After a new job is added to the database, add its job identifier to the job queue. Modify the compute client (the wrapper in my case) so that when it has nothing to do it connects to the database, grabs a job from the queue, changes the job status from "queued" to "running", adds it to the "JobRunning" table, and starts working. When finished it removes the job from the JobRunning table and saves the computed job results into the database. If no jobs are available, wait a few seconds and try again.
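The hand-off from queue to compute client can be sketched with a toy in-memory SQLite database (the article doesn't name a database; the table and column names here are invented):

```python
import sqlite3

# A toy job queue: one table tracking each job's status.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE JobQueue (job_id TEXT, status TEXT)")
db.execute("INSERT INTO JobQueue VALUES ('tcfnctbq', 'queued')")

def claim_next_job(db):
    # Grab one queued job and mark it running, inside a transaction
    # so two clients can't claim the same job.
    with db:
        row = db.execute(
            "SELECT job_id FROM JobQueue WHERE status = 'queued' LIMIT 1"
        ).fetchone()
        if row is None:
            return None   # nothing to do; the caller sleeps and retries
        db.execute("UPDATE JobQueue SET status = 'running'"
                   " WHERE job_id = ?", row)
        return row[0]

print(claim_next_job(db))   # prints tcfnctbq
print(claim_next_job(db))   # prints None
```

A real deployment would also need the "JobRunning" bookkeeping and a way to reclaim jobs whose client died mid-run, but the claim step is the heart of it.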
This is a basic distributed queuing system. You can configure multiple compute clients, possibly on different machines, all pulling jobs from the same queue.
Copyright © 2001-2020 Andrew Dalke Scientific AB