Dalke Scientific Software: More science. Less time. Products
[ previous | newer ]     /home/writings/diary/archive/2003/10/07/exceptions

Exceptions to the graph model

Yesterday I said that many compounds could be represented as a graph. The computer scientist in you might ask what can't be represented as graphs. Salts (like good old NaCl) are one. These have ionic bonds caused by the interactions between strong charges, and not covalent ones caused by the overlap of orbitals. Salt dissolves so easily because a charge doesn't care what it interacts with so long as it's the opposite charge. Water has its own partial charges which love to join in on the fun, and the result is entropy heaven.

Crystals are infinite in extent. What's the graph of diamond or graphite?

A graph based on covalent bonds cannot handle metals, which have delocalized electrons and so no real covalent bonds. There's even problems when there's only one metal atom, like in the heme of hemoglobin or in photosynthesis. An atom binds to the delocalized ring of the heme and not to any one atom. There are ways to fake it, but I recall that Cherwell's tookit supported "ring" as a new graph element, in addition to atoms and bonds.

And then there's hydrogen gas absorbed in platinum. Not only is Pt a metal but the hydrogens reversibly bind to the Pt atoms so move about.

Polymers cause more problems. These can have a huge number of different lengths and branchings ("1020 and more" for polysaccharides!). While each one can be represented as a discrete graph, you'll need better than 64 bits to enumerate them all. Even if you do that, it isn't obvious what you can do with that data.

The nice thing about these exceptions is that they rarely make good drugs. Good drugs are small, have well-defined shapes, and only come in one of at most a couple chiral forms. The exceptions are usually protein-based drugs, like insulin, where non-graph models are more appropriate.

The usefulness of the molecular graph decreases with the size of the molecule. Molecules have a three-dimensional structure (called a conformation) as well as the two-dimensional structure (the graph). Small molecules usually have one or only a few common conformations and these can be determined reasonably well directly from the graph using Corina, Omicron, Concord, or similar programs. Even when the prediction isn't exactly the minimum energy state, thermal motion and other flucuations make it likely that the molecule will be close to a predicted conformation at some point.

The larger the molecule the less likely it will be predicted correctly, and the less likely it will be ergodic. (That's fancy for "explore all phase space." That's fancy for "take on all available conformations.") Huge molecules like proteins are extremely hard to predict ab initio and are not ergodic. That's the whole protein folding problem, and the Levinthal paradox is that folding does happen despite the combinitorial problems.

When a molecule gets that large, its conformation plays a much more important role. For example, the serine protease works because of the arrangement of the catalytic triad. You can change a lot of the rest of the molecule, making a very different graph, and still have a serine protease so long as the fold is correct.

A better model for this stage is the bioinformatics one, where proteins and DNA are represented by one-dimensional sequences (a high-level abstraction of the graph). This allows similarity searches, which are extremely hard to do for graphs, and supports concepts like phylogenies to describe the evolutionary relationship between sequences.


Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me



Copyright © 2001-2013 Andrew Dalke Scientific AB