Dalke Scientific Software: More science. Less time.
/home/writings/diary/archive/2005/04/23/matplotlib_without_gui

matplotlib without a GUI

matplotlib has two primary APIs. The easiest to use is the pylab interface. When that module is imported it checks for a setup file, initializes the GUI, and does a few other things to simplify interactive plotting. pylab is built on top of the second one, the matplotlib API, which is more object-oriented but still not that difficult to use. The two can be used together, as you might have seen in my code from last time.

If you don't want a GUI, or want to specify which GUI to use independent of the user's settings, then you must use the matplotlib API. One example use case is to make an image file from the command-line or, as I'll show in a bit, for a web page.

Using it requires a better understanding of how matplotlib works. All of the graphs and plots are drawn on a Figure, which stores an abstract description of the graphics. When it's time to view the Figure it is rendered into a FigureCanvas. Matplotlib comes with several canvases, called backends. Some are for GUI toolkits like Qt and Gtk, others for formats like PostScript and SVG. The most portable one (meaning it looks the same even under different GUIs and operating systems) uses agg. This is the Anti-Grain Geometry toolkit, a 2D drawing package. The AGG-based backends use the AGG library to make an image which is then drawn to the screen or written to a file.
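The Figure-to-canvas flow described above can be sketched in a few lines. This is a minimal sketch that assumes a current matplotlib release and writes the PNG to an in-memory buffer rather than a file; the three data points are made up for illustration.

```python
import io
import struct

from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg

# Build the abstract description of the plot; no GUI is involved.
fig = Figure(figsize=(4, 4))
ax = fig.add_subplot(111)
ax.scatter([1, 2, 3], [2, 4, 3])  # made-up data

# Attach an Agg canvas and render the figure as a PNG into memory.
canvas = FigureCanvasAgg(fig)
buf = io.BytesIO()
canvas.print_figure(buf, format="png", dpi=80)
png_bytes = buf.getvalue()

# A PNG file stores its width and height as big-endian integers
# at byte offsets 16-23 (inside the IHDR chunk).
width, height = struct.unpack(">II", png_bytes[16:24])
print(width, height)
```

Reading the size back from the PNG header is only a quick way to confirm that something was rendered; it is not part of the matplotlib API.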

To make the image file using the matplotlib API I need to make my own Figure and FigureCanvas. I'll use the FigureCanvasAgg for the latter. The from pylab import * in my earlier code brought in some numeric functions and the Polygon class automatically; without pylab I'll need to import them myself. Python has two numeric libraries: Numeric and numarray. (Why?) To simplify matters, matplotlib includes a middle layer called numerix which provides one way to access whichever library was configured.

from openeye.oechem import *
import stats

from matplotlib.figure import Figure
from matplotlib.patches import Polygon
from matplotlib.backends.backend_agg import FigureCanvasAgg
import matplotlib.numerix as nx


def Ellipse((x,y), (rx, ry), resolution=20, orientation=0, **kwargs):
    theta = 2*nx.pi/resolution*nx.arange(resolution) + orientation
    xs = x + rx * nx.cos(theta)
    ys = y + ry * nx.sin(theta)
    return Polygon(zip(xs, ys), **kwargs)

# Read up to 'limit' records that have XLogP values
# Return the lists of:
#  identifiers, molecular weights, XLogP values
def read_data(ifs, limit = None):
    cids = []
    weights = []
    xlogps = []
    for i, mol in enumerate(ifs.GetOEGraphMols()):
        # Some of the compounds don't have an XLOGP value
        # Skip those molecules
        if not OEHasSDData(mol, "PUBCHEM_CACTVS_XLOGP"):
            continue
        cid = OEGetSDData(mol, "PUBCHEM_COMPOUND_CID")
        weight = OEGetSDData(mol, "PUBCHEM_OPENEYE_MW")
        xlogp = OEGetSDData(mol, "PUBCHEM_CACTVS_XLOGP")
        if (cid == "" or weight == "" or xlogp == ""):
            raise AssertionError( (cid, weight, xlogp) )

        cids.append(cid)
        weights.append(float(weight))
        xlogps.append(float(xlogp))

        if limit is not None and len(cids) >= limit:
            break
        
    return cids, weights, xlogps

def calculate_ellipse_data(xdata, ydata):
    xcenter = stats.lmean(xdata)
    xradius = stats.lstdev(xdata)
    ycenter = stats.lmean(ydata)
    yradius = stats.lstdev(ydata)
    return (xcenter, ycenter), (xradius, yradius)

def main():
    filename = "/Users/dalke/databases/compounds_500001_510000.sdf.gz"
    ifs = oemolistream(filename)

    # The figure will be 4 inches by 4 inches
    # This size is important because the text is defined relative to
    # inches and not pixels.  In a smaller image the text is more
    # cramped and likely to overlap.  In a larger image the text is
    # not big enough.  This works well for my plot.
    fig = Figure(figsize=(4,4))
    
    ax = fig.add_subplot(111)

    cids, weights, xlogps = read_data(ifs, 100)
    ax.scatter(weights, xlogps)
    center, radii = calculate_ellipse_data(weights, xlogps)
    ax.add_patch(Ellipse(center, radii, fill=0, edgecolor="blue"))
    
    cids, weights, xlogps = read_data(ifs, 100)
    ax.scatter(weights, xlogps, marker = "^", color="red")
    center, radii = calculate_ellipse_data(weights, xlogps)
    ax.add_patch(Ellipse(center, radii, fill=0, edgecolor="red"))
    
    ax.set_xlabel("Molecular weight")
    ax.set_ylabel("CACTVS XLogP")

    # Make the PNG
    canvas = FigureCanvasAgg(fig)
    # The size * the dpi gives the final image size
    #   a 4"x4" image * 80 dpi ==> 320x320 pixel image
    canvas.print_figure("mw_v_xlogp_ellipses.png", dpi=80)

if __name__ == "__main__":
    OEThrow.SetLevel(OEErrorLevel_Error)
    main()
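The listing above is written for the Python of its day. Tuple parameters such as (x,y) in a def were removed in Python 3, and matplotlib.numerix is not part of current matplotlib releases. What follows is a sketch, not the original code, of the two geometry helpers using only the standard library. It assumes stats.lstdev is the sample standard deviation (N-1 denominator), which is what statistics.stdev computes. The name ellipse_vertices is hypothetical and the data values are made up.

```python
import math
import statistics


def ellipse_vertices(center, radii, resolution=20, orientation=0.0):
    """Return `resolution` (x, y) points spaced evenly around an ellipse."""
    x, y = center
    rx, ry = radii
    points = []
    for i in range(resolution):
        # As in the original, `orientation` shifts the starting angle;
        # it does not rotate the ellipse's axes.
        theta = 2 * math.pi * i / resolution + orientation
        points.append((x + rx * math.cos(theta), y + ry * math.sin(theta)))
    return points


def calculate_ellipse_data(xdata, ydata):
    """Center on the means; use the sample standard deviations as radii."""
    center = (statistics.mean(xdata), statistics.mean(ydata))
    radii = (statistics.stdev(xdata), statistics.stdev(ydata))
    return center, radii


# Made-up values standing in for molecular weights and XLogP values.
weights = [100.0, 200.0, 300.0]
xlogps = [1.0, 2.0, 3.0]
center, radii = calculate_ellipse_data(weights, xlogps)
vertices = ellipse_vertices(center, radii)
print(center, radii, len(vertices))
```

The vertex list plays the same role as zip(xs, ys) in the original Ellipse function and should be usable as the argument to Polygon.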

As you may have read in the code, one tricky bit is the relationship between the figure size and the final image size. matplotlib does everything in continuous space with no concept of pixels. The size of the image is in inches; the width of lines and the font sizes are in points. If you specify the figure size to be 1 inch by 1 inch and the font size to be 72 (that's 72 points) then a single letter will fill the figure because there are 72 points to the inch.
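The arithmetic behind that example is short enough to write out. This is a sketch of the unit conversions only; the helper names are hypothetical and nothing here calls matplotlib.

```python
POINTS_PER_INCH = 72.0


def inches_to_pixels(inches, dpi):
    """Length in inches -> length in pixels at the given resolution."""
    return inches * dpi


def points_to_pixels(points, dpi):
    """Length in points -> length in pixels (72 points to the inch)."""
    return points / POINTS_PER_INCH * dpi


dpi = 80
figure_side = inches_to_pixels(1, dpi)   # side of a 1-inch figure
letter_size = points_to_pixels(72, dpi)  # nominal size of a 72-point font
print(figure_side, letter_size)
```

Both values come out the same at any dpi, which is why the 72-point letter fills the 1-inch figure regardless of the output resolution.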

Most image formats are in pixels, and the number of pixels in an inch depends on the screen size, resolution, and even the distance to the screen. The conversion to pixels is done by the print_figure() call, which is why it takes the dpi parameter. Looking at the library code I see that the Figure() constructor also takes a dpi parameter. I don't know when it's used.

(Update on May 11: John Hunter said "It is used when building the GUI window (eg dpi should reflect your screen dpi). It is not used for hardcopy, because the print_figure dpi setting overrides it.")
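The override that John Hunter describes can be checked directly. The sketch below assumes a current matplotlib release, where an explicit dpi passed to print_figure() still takes precedence over the Figure's own dpi; png_size is a hypothetical helper that reads the dimensions back from the PNG header.

```python
import io
import struct

from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg


def png_size(fig, **print_kwargs):
    """Render `fig` to an in-memory PNG and return (width, height) in pixels."""
    buf = io.BytesIO()
    FigureCanvasAgg(fig).print_figure(buf, format="png", **print_kwargs)
    # Width and height sit at byte offsets 16-23 of a PNG file.
    return struct.unpack(">II", buf.getvalue()[16:24])


# The constructor's dpi (200) loses to the dpi given to print_figure (80).
fig = Figure(figsize=(4, 4), dpi=200)
size = png_size(fig, dpi=80)
print(size)
```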

The final size is useful for making web pages. The HTML tag for inserting an image looks like the following:

  <IMG SRC="filename.png" HEIGHT="200" WIDTH="320">

The IMG tag has attributes for the image height and width. The browser uses these to compute the page layout without needing to fetch the image first. This reduces the time needed to get an initial display of the web page.

I can't find any place in matplotlib that gives the final image size in pixels. I think the right approach would be for print_figure() to return a data structure with information about what it just printed (image size, perhaps the number of bytes). But it doesn't, so I'll compute it myself.

dpi = 80
height_in_pixels = fig.get_figheight() * dpi
width_in_pixels = fig.get_figwidth() * dpi
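Putting the computed size to use, here is a small sketch that builds the IMG tag. The helper img_tag is hypothetical, the sizes are passed as plain numbers in place of fig.get_figwidth() and fig.get_figheight(), and the filename is not HTML-escaped.

```python
def img_tag(filename, width_in_inches, height_in_inches, dpi):
    """Return an IMG tag whose size matches the rendered image."""
    # HTML attributes need whole pixel counts, so round the products.
    width = int(round(width_in_inches * dpi))
    height = int(round(height_in_inches * dpi))
    return '<IMG SRC="%s" HEIGHT="%d" WIDTH="%d">' % (filename, height, width)


# In the script above these sizes would come from fig.get_figwidth()
# and fig.get_figheight(); here they are literal values.
tag = img_tag("mw_v_xlogp_ellipses.png", 4, 4, 80)
print(tag)
```

Rounding matters because the get_figwidth() and get_figheight() values multiplied by the dpi are floats, not integers.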


Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology.



Copyright © 2001-2013 Andrew Dalke Scientific AB