/home/writings/diary/archive/2014/07/06/connection_table_origin

The origin of the connection table

Instead of doing real work over the last month, I ended up trying to understand more about the origin of the connection table, and the origin of the phrase "connection table."

(Regarding real work, chemfp-1.2 is in beta testing, and will be released this month. If you're interested in high-performance Tanimoto fingerprint search, take a look at it. Version 1.1 is available right now, at no cost, for you to download, evaluate, and use, before deciding to buy a copy of version 1.2.)

Importance of Calvin Mooers' connection table

In my previous essay, I described some of the work of Calvin Mooers in the area of chemical documentation now called cheminformatics. While I don't think he implemented any of his ideas, they were quite influential. My research has led me to believe that he developed the first practical connection table.

Here are examples from the literature which support my belief:

All of these refer to Mooers' paper "Ciphering Structural Formulas – the Zatopleg System", Zator Technical Bulletin No. 59 (1951). Unfortunately, I haven't yet been able to get ahold of that paper, but a patent reference in US3476311 and the Meyer and Wenke (1962) citation above give enough of a description to be certain that it's an entirely reasonable and usable connection table format.

George Wheland as the creator of the connection table

My research led me to believe that Mooers invented the connection table. If you look around, you'll see that some people say it was George Willard Wheland who came up with the connection table, in 1949. These include:

I thought this was odd, since I haven't come across any historic references to Wheland's connection table. It's not unusual, though, for several people to come up with an idea independently, with the earlier publication going unnoticed until after the later one becomes widely known. One of the best known examples is the Cooley-Tukey fast Fourier transform, which was published in 1965. Only later was it recognized that Carl Friedrich Gauss had developed and used the same algorithm in 1805.

What is Wheland's connection table?

The Wheland connection table reference is from the textbook "Advanced Organic Chemistry", p. 87. Thanks to the University of Michigan and the HathiTrust for scanning that book and making it available.

The connection table is expressed as the upper-right triangle connection matrix, where position (i, j) contains the covalent bond order, or is 0 if there is no bond. Here's an image of the two example tables:

I don't like this. While I can see how it might be a connection table, it's not a good one. For one, it takes N*(N-1)/2 entries, so a 1,000 atom molecule needs 499,500 of them, or about 0.5 MB at one byte per entry, almost all of which are zeros. This is possible on modern machines, but there's no way a 1950s/1960s era programmer would choose this approach.
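To make the size argument concrete, here is a minimal Python sketch of an upper-right triangle bond-order matrix for acetaldehyde. The atom numbering and the helper names `upper_triangle` and `triangle_entries` are assumptions made for illustration; they are not taken from Wheland's Table 4-1.

```python
# Acetaldehyde, CH3-CHO, with an arbitrary (assumed) atom numbering.
ATOMS = ["C", "C", "O", "H", "H", "H", "H"]

# (i, j, bond order) with i < j; every other pair is unbonded.
BONDS = [
    (0, 1, 1),                         # C-C
    (1, 2, 2),                         # C=O
    (0, 3, 1), (0, 4, 1), (0, 5, 1),   # methyl hydrogens
    (1, 6, 1),                         # aldehyde hydrogen
]


def upper_triangle(num_atoms, bonds):
    """Return {(i, j): order} for every pair i < j, with 0 meaning 'no bond'."""
    matrix = {(i, j): 0
              for i in range(num_atoms)
              for j in range(i + 1, num_atoms)}
    for i, j, order in bonds:
        matrix[(i, j)] = order
    return matrix


def triangle_entries(num_atoms):
    """Number of entries in the strict upper triangle: N*(N-1)/2."""
    return num_atoms * (num_atoms - 1) // 2


matrix = upper_triangle(len(ATOMS), BONDS)
zeros = sum(1 for order in matrix.values() if order == 0)
print(len(matrix), zeros)        # total entries, and how many of them are zero
print(triangle_entries(1000))    # entries needed for a 1,000 atom molecule
```

Even for this seven-atom molecule, 15 of the 21 entries are zero. The number of bonds grows roughly linearly with the number of atoms while the triangle grows quadratically, so the fraction of zeros only gets worse for larger structures.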

(You can see that concern about space in Lynch's essay, cited above, where they evaluated the Meyer and Wenke (1962) paper, also cited above:

Ernst Meyer at BASF had also introduced a form of connection table, which to our eyes at the time seemed highly redundant and space consuming (Meyer & Wenke, 1962).
Meyer's connection table, based on the Mooers connection table, allocates space for 4 bonds for each atom. This sets an upper limit to the atom valence, but also means a lot of 0s for halogens, oxygens, etc. In any case, it's still much more compact than the connection matrix.)
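As a rough comparison of the two layouts, the sketch below counts storage slots for a triangle matrix against a table with a fixed four bond slots per atom. The row layout here (a neighbor number and bond order per slot, with 0 marking an empty slot) is a hypothetical one based only on the "space for 4 bonds" description; it is not the actual Meyer and Wenke format.

```python
SLOTS_PER_ATOM = 4  # assumed from "space for 4 bonds for each atom"

# Acetaldehyde with an arbitrary (assumed) atom numbering.
ATOMS = ["C", "C", "O", "H", "H", "H", "H"]
BONDS = [(0, 1, 1), (1, 2, 2), (0, 3, 1), (0, 4, 1), (0, 5, 1), (1, 6, 1)]


def matrix_slots(num_atoms):
    """Slots in the upper-right triangle matrix: N*(N-1)/2."""
    return num_atoms * (num_atoms - 1) // 2


def fixed_table_slots(num_atoms, slots_per_atom=SLOTS_PER_ATOM):
    """Slots in a table that reserves a fixed number of bonds per atom."""
    return num_atoms * slots_per_atom


def fixed_table(atoms, bonds, slots_per_atom=SLOTS_PER_ATOM):
    """Build one row per atom: (symbol, [(neighbor, order), ...]).

    Neighbors are numbered from 1 so that (0, 0) can mark an unused slot.
    """
    rows = [[] for _ in atoms]
    for i, j, order in bonds:
        rows[i].append((j + 1, order))
        rows[j].append((i + 1, order))
    table = []
    for symbol, neighbors in zip(atoms, rows):
        if len(neighbors) > slots_per_atom:
            raise ValueError(f"{symbol} has more than {slots_per_atom} neighbors")
        padding = [(0, 0)] * (slots_per_atom - len(neighbors))
        table.append((symbol, neighbors + padding))
    return table


table = fixed_table(ATOMS, BONDS)
for row in table:
    print(row)

for n in (7, 1000):
    print(n, matrix_slots(n), fixed_table_slots(n))
```

For the seven-atom example the matrix is actually the smaller of the two (21 slots against 28), and the oxygen and hydrogen rows are mostly padding. The table grows linearly, though, so at 1,000 atoms it needs 4,000 slots against 499,500 for the matrix.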

What's interesting is that Wheland also didn't think this matrix was useful, writing:

The irrelevance of geometrical considerations in the definition of a structure can be shown most conclusively by a discussion of some of the remaining, less convenient and less familiar, ways in which structures can be specified. One of these ways consists in giving a purely verbal description ...
    A further nongeometrical way of describing the structures of acetaldehyde and of ethylene oxide is slightly more illuminating than the verbal one; when this method is adopted, the two structures are expressed as in Tables 4-1 and 4-2, respectively. The numbers in the bodies of these tables represent the number of covalent bonds between the corresponding atoms at the left of the rows and at the tops of the columns. The two tables are easily seen to be different from one another, but to be completely equivalent to the respective conventional diagrams and verbal descriptions. ...
The phrase "some of the remaining" suggests that Wheland did not come up with these non-geometrical descriptions, but the really interesting part is on page 88:
... The foregoing alternative ways of describing structures have not been given here with the idea that they would be of practical use, but rather with the hope that they would serve to emphasize the fact that structures, as such, need have no geometrical implications. Since exactly the same information which is contained in a conventional diagram can be given equally well (even though incomparably less conveniently) by an obviously nongeometrical verbal description or table, then the diagram, in spite of its appearance, must also be actually nongeometrical. In other words, all geometrical features of the diagrams which are not contained in either the verbal description or the table must be disregarded as of no significance. (Later, when the discussion is of configuration rather than of structure, this extreme statement will require some modification. See Chapters 6-9.)
That sounds like Wheland didn't think it would be useful for practical matters.

That makes Lynch's quote, "G. W. Wheland had been the first to show how this could be done (1949)", all the more odd. Wheland didn't show how this could be done, in any practical sense, nor did Wheland claim that it was practical. And as I mentioned, I also get the suggestion that Wheland might not have created that representation.

Wheland (1949) should likely be Wheland (1946)

All four of the Wheland quotes I listed reference Wheland (1949). The connection matrix is on p. 87 in chapter 4. The copyright page says "Chapters 1-10 copyrighted as Syllabus for Advanced Organic Chemistry 321 by The University of Chicago, 1946." Thus, if the earlier version also has the connection matrix, then the correct citation should likely be Wheland (1946).

Only Jarosław Tomczak also referenced Wheland (1946). Tomczak's paper also describes the structure in Table 4-1, so either Tomczak looked at Wheland's textbook, or used a very good secondary reference. In either case, I commend the good scholarship.

The question I have is, does the earlier book include the connection matrix? According to WorldCat, copies of that book are available from the University of Chicago, Wayne State University, and Mississippi State University libraries. Perchance a reader could get ahold of the book and verify it for me?

Who was Wheland?

It's easiest if I just copy the abstract from the chapter "George W. Wheland: Forgotten Pioneer of Resonance Theory":

George W. Wheland, although little remembered by the general chemistry public today, is forever linked to resonance theory through three seminal papers written with Linus Pauling and through two substantial monographs (1944 and 1955) on resonance. At the University of Chicago he carried out research on organic acids and bases, while continuing to publish papers on quantum chemistry. He also wrote three editions of a highly regarded text on "Advanced Organic Chemistry." Sadly, his scientific career ended long before his death when he contracted multiple sclerosis. This chapter gives an overview of his career, writings, and research in quantum chemistry.

One of those papers with Pauling is "The nature of the chemical bond. V. The quantum-mechanical calculation of the resonance energy of benzene and naphthalene and the hydrocarbon free radicals." March 21, 1933. J. Chem. Phys. 1 (June 1933): 362-374. You should take a look at the first few pages to remind yourself that only 90 years ago we still weren't really sure about the structure of benzene.

Apparently here we have a case where the classical ideas of structural organic chemistry are inadequate to account for the observed properties of a considerable group of compounds. With the development of the quantum mechanics and its applications to problems of valence and molecular structure, it became evident to workers [as, for example, Slater in 1931] in this field that the resonance of benzene between the two equivalent Kekulé structures was an essential feature of the structure of this molecule, accounting for the hexagonal symmetry of the ring and for its remarkable stability; and it seemed probable that the quantum mechanical treatment of aromatic molecules would lead to a completely satisfactory explanation of their existence and characteristic properties.

In the paper they described a simplification of Hückel's work which made it practical to extend valence bond theory to larger structures like naphthalene. I think someone who worked with the secular equation of Hückel could easily come up with the connection matrix, so it's quite possible that Wheland was the first to come up with the idea.

BTW, I don't know much about quantum chemistry, so "A Chemist's Guide to Valence Bond Theory" by Sason S. Shaik and Philippe C. Hiberty, was quite insightful to this outsider. The authors go into some of the history and impact of that paper on the field. I like the quote of Wheland which "explains the resonance hybrid with the biological analogy of mule = donkey + horse."

I really like how the authors describe the historical context of the debate between valence bond theory and molecular orbital theory. My chemistry knowledge is not very deep, and while I know some basic molecular orbital theory, my intuition is more aligned with the classical Lewis dot model of valence bond theory. Shaik and Hiberty describe how the valence bond model was popular until the 1950s precisely because it can be seen as a quantum mechanical interpretation of the Lewis model, which would appeal better to most chemists of that era. One of the many things which helped MO gain ground was "the construction of intuitive MO theories"; I never got to that point in my chemical studies.

I also found the discussion about the "religious war-like rivalry" between valence bond theory and molecular orbital theory quite fascinating. Chemists are, after all, human.

Why is Wheland recognized as the creator of the connection table?

I think Wheland's connection matrix is a precursor to a connection table, but it's not really a usable connection table for chemical informatics. I don't know of any cheminformatics toolkit based on that data structure, either now or back when the field was still called "chemical documentation." Nor did Wheland suggest that it would be practical. Why then do people reference Wheland?

(To be clear: my definition of "connection table" may be more restrictive than the one others use. I mean something which is reasonable to use in a chemical information system. It's also possible that early systems did use a connection matrix form for substructure search.)

I know Mireille, so I started by sending her email. She couldn't recall the details, nor had a record of it in her notes, but she believes her knowledge came from Lynch's essay or from an essay from Eugene Garfield.

I sent email to what I think is Jarosław Tomczak's address, but haven't received a reply.

I sent an email to Bob Williams, who doesn't remember that detail after 16 years. Bob got help from Val Metanomski (now deceased), Eugene Garfield, and Mary Ellen Bowden, and suggested I contact them.

I emailed both of the latter. Gene replied, but wasn't able to resolve these details despite searching for a couple of hours. I haven't heard back from Bowden.

Working hypothesis: people are referencing an intermediate publication

My working hypothesis is that a book or essay was published between about 1980 and 1995 with the history of chemical representations, including line notations and the connection table. Its author, through good scholarship, came across Wheland's "highly regarded text", and described it as a connection table, using a broader definition of "connection table" than I have. Many people read that book, and the ideas in it became part of the collective knowledge.

One such book might have been "Chemical Graph Theory: Introduction and Fundamentals", edited by Danail Bonchev and Dennis H. Rouvray. Quoting from chapter 3, "Nomenclature of Chemical Compounds" page 99, by Alan L. Goodson:

A connection table has been defined [6] as a uniquely ordered list of the node symbols of the structure (or graph) in which the value (atomic symbol) of each node and its attachment (bonding) to the other nodes of the total structure are described ...

Connection tables, notations, and nomenclatures are of value in different ways and an in-depth discussion of their use has been published recently [8]. Connection tables, being atom-by-atom computer records of chemical structures, are useful for machine registration of chemical structure and for substructure searching ...
Under that definition, Wheland's matrix counts as a connection table. I'm surprised though that no one before Wheland considered a matrix representation of a valence bond model.
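To see why a bond-order matrix fits that definition, note that it can be rewritten mechanically as an atom-by-atom list and back again with no loss of information. A minimal sketch, assuming an arbitrary atom numbering for ethylene oxide and using the hypothetical helpers `to_connection_table` and `to_triangle`:

```python
# Ethylene oxide (a C-C-O ring), with an arbitrary (assumed) atom numbering.
ATOMS = ["C", "C", "O", "H", "H", "H", "H"]

# Strict upper triangle of bond orders; row i holds columns j = i+1 .. N-1.
TRIANGLE = [
    [1, 1, 1, 1, 0, 0],  # C0 bonded to C1, O2, H3, H4
    [1, 0, 0, 1, 1],     # C1 bonded to O2, H5, H6
    [0, 0, 0, 0],        # O2
    [0, 0, 0],           # H3
    [0, 0],              # H4
    [0],                 # H5
]


def to_connection_table(atoms, triangle):
    """Return one record per atom: (index, symbol, [(neighbor, order), ...])."""
    neighbors = [[] for _ in atoms]
    for i, row in enumerate(triangle):
        for offset, order in enumerate(row):
            if order:
                j = i + 1 + offset
                neighbors[i].append((j, order))
                neighbors[j].append((i, order))
    return [(i, symbol, sorted(nbrs))
            for i, (symbol, nbrs) in enumerate(zip(atoms, neighbors))]


def to_triangle(table):
    """Rebuild the strict upper triangle from the atom-by-atom records."""
    n = len(table)
    triangle = [[0] * (n - 1 - i) for i in range(n - 1)]
    for i, _symbol, nbrs in table:
        for j, order in nbrs:
            if i < j:
                triangle[i][j - i - 1] = order
    return triangle


table = to_connection_table(ATOMS, TRIANGLE)
for record in table:
    print(record)

# The round trip recovers the original matrix exactly.
assert to_triangle(table) == TRIANGLE
```

The sketch makes no attempt at the unique ordering that the quoted definition calls for; it only shows that the matrix and the list carry the same atoms and attachments.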

Google Books' text search failed to find "Wheland" in Bonchev and Rouvray's book, so that's not the intermediate publication.

Reference 8 is R. Lees and A. F. Smith (Eds.) "Chemical Nomenclature Usage", Ellis Horwood, Chichester (1983). A limited (search only) version is available from the HathiTrust, which is enough to determine that "Wheland" could not be found in the book.

So that's a dead end. Citation [6] is from Morgan, which I describe below. The chapter has an extensive list of references, which may be useful future leads, though they are more closely associated with nomenclature than a connection table.

Perhaps someone here has an idea of how Wheland became known as the creator of the connection table?

Who coined the phrase "connection table"?

I also wondered who was the first to use the term "connection table". Lynch says "With Dyson we looked at the random matrix, that is, connection tables", and Meyer and Wenke refer to Mooers's connection table as "topological coding", so it doesn't seem like people used "connection table" during the 1950s and early 1960s.

The earliest reference I've found so far is from the Cossum et al. quote and citation I mentioned earlier, which was received March 23, 1964.

With the help of Google Scholar, I found a slightly later publication which uses "connection table" in the title:

Cossum, W. E., M. E. Hardenbrook, and R. N. Wolfe, "Computer generation of atom-bond connection tables from hand-drawn chemical structures", Proceedings of the American Documentation Institute, 27th meeting, Philadelphia, Pennsylvania October 5-8, 1964; volume I, pp 269-275.

The term spread quickly. H. L. Morgan's "The Generation of a Unique Machine Description for Chemical Structures – A Technique Developed at Chemical Abstracts Service", J. Chem. Doc. (1965), received January 15, 1965, states on p. 108:

The structure description employed in the CAS registration process is a uniquely ordered list of the node symbols of the structure (or graph) in which the value (atomic symbol) of each node and its attachment (bonding) to the other nodes of the total structure are described. Such a list and description is called a "connection table." Since this paper is not concerned with structure input, the connection table which is described is that stored and manipulated by the computer. The form of the table which is used within the computer is not the most convenient form for input to the system; thus the input form is translated by the computer into the "compact connection table" developed by D. J. Gluck of du Pont [2].
where reference 2 is D. J. Gluck, "A Chemical Structure, Storage and Search System. Development at Du Pont" J. Chem. Doc. 5, pp. 43-51 (1965). Unfortunately, I don't seem to have a copy of that paper, but as it's a 1965 citation, it doesn't antecede 1964.

On the other hand, take a look at this citation (which I'm not paying $31.50 to read):

G. M. Dyson, W. E. Cossum, M. F. Lynch, H. L. Morgan, "Mechanical manipulation of chemical structure: Molform computation and substructure searching of organic structures by the use of cipherdirected, extended and random matrices", Information Storage and Retrieval, v1, issues 2-3, July 1963, pp 49-99. DOI: 10.1016/0020-0271(63)90011-1.
The abstract is:
General methods have been devised for generating mechanically (a) from the IUPAC cipher, and (b) from random-numbered structures, three types of matrix from which the molecular formula (molform) of a structure can be computed, and machine searches made for any conceivable substructure or combination of substructures. The machine/matrix language is independent of the generating cipher; such cipher, and other ciphers depending on exact structural delineation can be regenerated from the machine/matrix. This enables the latter to be used as a feasible common language between notations.
Isn't that interesting! All of the authors who used "connection table" in 1964/1965 are in that 1963 paper, all of them were at Chemical Abstracts, and they use the terms "cipherdirected, extended and random matrices", and not "connection table."

This is a pretty clear indication that the term was coined in 1963, and likely by someone at CAS.

Possible antecedents

Google Scholar suggests a couple of antecedents to a 1963 date for "connection table", the most relevant being Marvin Minsky's "Steps Toward Artificial Intelligence" in Proceedings of the IRE v49 no. 1 (1961): 8-30, reprinted in Computers and Thought, Ed. E.A. Feigenbaum and J. Feldman, pp. 453-524, McGraw-Hill, 1963.

The quote concerns template-based pattern recognition:

And to recognize the topological equivalence of pairs such as those below is likely beyond any practical kind of iterative local-improvement or hill-climbing matching procedure. (Such recognitions can be mechanized, though, by methods which follow lines, detect vertices, and build up a description in the form, say, of a vertex-connection table.)
I like the idea that one of the CAS group read Minsky's paper in 1963 (AI being a hot topic) and the name stuck. I have no way to verify this, but it's a fun conjecture. [Edited 21 March 2017: As another connection, Minsky learned to program on the SEAC computer. "... Kirsch had sat me down at a desk and said I couldn't get up until I wrote a program for the SEAC." Yes, the same Kirsch. Kirsch was very interested in using computers to understand patterns and images, which I think helped him view chemical search as a topological problem. Minsky's first program recognized a few letters based on their topology, and as far as I can tell this was before 1956, which is when I think the USPTO project started.]

More prosaically, Google Scholar also identified this quote from P.P. Gupta and M.W. Humphrey Davies, Proceedings of the IEE - Part A: Power Engineering, Volume 108, Issue 41, October 1961, pp. 383-398 DOI: 10.1049/pi-a.1961.0077:

The floating busbar is always numbered zero, and all the other junctions are then numbered consecutively. The following tables of data are then prepared and punched on a tape: (a) Connection table. ... The connection table defines the configuration of the network. ...
Again, I haven't paid the $23.94 to read this paper and verify the quote.

Minsky doesn't talk about colored edges (that is, "bonds") in the connection table, nor, seemingly, do Gupta and Davies, so these aren't connection tables in the chemistry sense. Nonetheless, it's also reasonable to conjecture that the term in chemistry was repurposed from electrical engineering.

Want to leave a comment?

If you know of an earlier use of "connection table", know more about the early history of connection tables, molecular graph representations, Zatopleg, or have anything else to say, please leave a comment, or send email to me at dalke@dalkescientific.com.

Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me

Copyright © 2001-2013 Andrew Dalke Scientific AB