Dalke Scientific Software: More science. Less time.

A molfile precursor?

I think I found a precursor to the MDL molfile in a 1974 publication by Gund, Wipke, and Langridge, from the proceedings of a 1973 conference.

Background: MDL and MACCS

The SDFile format is one of a suite of related formats which came out of MDL (Molecular Design Limited) starting in the late 1970s. (An SDFile is a series of records, each of which is a molfile followed by a data block followed by the SDFile delimiter, a line containing "$$$$".) Quoting Wikipedia:

MDL was the first company to provide interactive graphical registry and full and substructural retrieval. The company's initial products were first-of-their-kind systems for storing and retrieving molecules as graphical structures and for managing databases of chemical reactions and related data. These systems revolutionized the way scientists accessed and managed chemical information in the 1980s.

From its initial pioneering of computer handling of graphical chemical structures with MACCS (Molecular ACCess System) in 1979, MDL continued at the forefront of the field now known as cheminformatics

(Yes, that's the same MACCS as the 166-bit MACCS keys, still widely used for fingerprint similarity. They were developed in the MACCS system as a substructure screen. So many people had a MACCS system, with the screens already computed, that it was a natural data source for the early similarity search implementations.)
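To make the SDFile layout mentioned above concrete, here is a minimal Python sketch that splits SDFile text into its records and separates each molfile from its data block. It assumes that every molfile ends with the usual "M  END" line and that every data item header has the form "> <NAME>". The function names `split_sdfile` and `_split_record` and the sample record are my own, for illustration only.

```python
def split_sdfile(text):
    """Split SDFile text into (molfile, data) pairs, one per record."""
    records = []
    current = []
    for line in text.splitlines():
        if line.rstrip() == "$$$$":  # the SDFile record delimiter
            records.append(_split_record(current))
            current = []
        else:
            current.append(line)
    return records


def _split_record(lines):
    # Assumption: the molfile part ends with an "M  END" line.
    end = next(i for i, line in enumerate(lines) if line.rstrip() == "M  END")
    molfile = "\n".join(lines[:end + 1])

    data = {}
    tag = None
    for line in lines[end + 1:]:
        if line.startswith(">") and "<" in line:
            # Data header line, assumed to look like "> <NAME>".
            start = line.index("<") + 1
            tag = line[start:line.index(">", start)]
            data[tag] = []
        elif line.strip() == "":
            tag = None  # a blank line ends the current data item
        elif tag is not None:
            data[tag].append(line)
    return molfile, {name: "\n".join(values) for name, values in data.items()}


# A made-up one-record SDFile, built line by line to keep the columns explicit.
example = "\n".join([
    "water",
    "",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <NAME>",
    "water",
    "",
    "$$$$",
])

records = split_sdfile(example)
```

Real SD files have more variation in the data header lines than this handles, so treat it as a sketch of the record structure rather than a complete reader.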

Gund, Wipke, and Langridge (1974)

MDL was co-founded in 1978 by W. Todd Wipke, who had been working on computer-assisted chemical synthesis design for over a decade.

I came across a 1974 publication, co-authored by Wipke, in proceedings from a 1973 conference in Ljubljana/Zagreb, which appears to contain an early version of what was to become a molfile. Here's the relevant text from the second page of the paper (the page number on the bottom of the page is 5/34):

DATA STRUCTURE. For this preliminary study, an extremely simple format was adopted for the molecular data files, as shown in figure 1. The pattern data files are exactly the same, except the bond order may be zero. An advantage of having the same format is that patterns may be displayed and manipulated as if they were molecules.

and here's Fig. 1, Format for molecular structure files, at the top of the next page (5/35):

You can see the basic structure is similar to a molfile, though a molfile has two additional header lines for miscellaneous information, and its counts line, atom lines, and bond lines all have a few more fields.
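For comparison, here is a minimal Python sketch of reading the matching parts of a modern V2000 molfile: three header lines, a counts line, then the atom and bond lines. It only reads the fields the two formats appear to share (atom symbol, coordinates, bonded atom pair, and bond order) and ignores the rest. The fixed-column positions are my understanding of the V2000 layout, and the function name `parse_molfile` and the sample record are made up for illustration. It does not attempt to parse the 1974 format, whose exact columns are only given in the figure.

```python
def parse_molfile(text):
    """Return (title, atoms, bonds) from V2000 molfile text."""
    lines = text.splitlines()

    # Header block: title, program/timestamp line, comment.
    title = lines[0]

    # Counts line: atom count in columns 1-3, bond count in columns 4-6.
    counts = lines[3]
    num_atoms = int(counts[0:3])
    num_bonds = int(counts[3:6])

    atoms = []
    for line in lines[4:4 + num_atoms]:
        # Three 10-character coordinate fields, a space, then the symbol.
        x = float(line[0:10])
        y = float(line[10:20])
        z = float(line[20:30])
        symbol = line[31:34].strip()
        atoms.append((symbol, x, y, z))

    bonds = []
    first_bond = 4 + num_atoms
    for line in lines[first_bond:first_bond + num_bonds]:
        # Three 3-character fields: first atom, second atom, bond order.
        atom1 = int(line[0:3])
        atom2 = int(line[3:6])
        order = int(line[6:9])
        bonds.append((atom1, atom2, order))

    return title, atoms, bonds


# A made-up record (C=O with implicit hydrogens), written line by line.
example = "\n".join([
    "formaldehyde",
    "",
    "hypothetical example record",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.2000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  2  0  0  0  0",
    "M  END",
])

title, atoms, bonds = parse_molfile(example)
```

The atom indices in the bond lines are 1-based, which is why the bond comes back as (1, 2, 2): atom 1 bonded to atom 2 with bond order 2.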

The full citation is: Gund, P., Wipke, W. T., and Langridge, R., Computer Searching of a Molecular Structure File for Pharmacophoric Patterns, Computers in Chemical Research and Education, vol 3, pp. 33-38 (1974).

The publication contains papers from a conference held in 1973, but Google Scholar says it was published in 1974, so I'm going with that. Also, the above quote is from the second page of the paper and the figure is at the top of the third page. The page numbers on the bottom of the pages are "5/34" and "5/35", respectively.

The format does not appear to be based on a connection table format that was already in wide use. The 1971 textbook Computer handling of chemical structure information by Lynch, Harrison, Town, and Ash does not show any format with coordinate information, though it does mention that some exist.

Do you know of an earlier molfile-like format? Let me know!

Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me

Copyright © 2001-2020 Andrew Dalke Scientific AB