Similarity principle variations
Maggiora, Vogt, Stumpfe, and Bajorath, in their 2014 J. Med. Chem. miniperspective "Molecular Similarity in Medicinal Chemistry", write:
In the context of a seminal book publication8 that appeared in the early 1990s when molecular similarity analysis first became popular, the similarity property principle (SPP) emerged, which stated that similar compounds should have similar properties, the most frequently studied property being biological activity.
That citation "8" is "Concepts and applications of molecular similarity", edited by Mark A. Johnson and Gerald M. Maggiora (1990), published by Wiley. This is an often-cited reference in the cheminformatics literature. Google Scholar knows about 1411 citations to it.
There's a subtle nuance to the Maggiora et al. miniperspective quote which I think has been overlooked by many of the people who cite it: the 1990 book doesn't actually define a principle! That's why the miniperspective uses hedged phrasing such as "In the context of".
As it turns out, there isn't even a widely accepted name for this principle. (Defined as having more than 50% of Google Scholar searches.)
I want to be clear - in almost all cases and for most people it's still the correct reference to use. But I think many people aren't aware of the context. At least, I wasn't a few years ago when I first looked at the book.
Some citations from 2020 to Johnson and Maggiora (eds.) (1990)
I used Google Scholar to find papers published since 2020 which cite the book. Here are some of the relevant quotes:
- bioRxiv, Mestres, doi:10.1101/2020.03.30.016485v1:
Under the similarity-property principle, the likely targets of any given molecule should be in consonance with the targets of its chemical neighborhood.
- Front. Pharmacol. Bonanno et al., doi:10.3389/fphar.2019.01675:
Ligand-Based Virtual Screening (LBVS) is underpinned by the concept of similarity as defined in the Similarity Property Principle, which simply states that similar molecules tend to exhibit similar properties (Johnson and Maggiora, 1990).
- Nature, Scientific Reports, Kaushik et al., doi:10.1038/s41598-020-63842-7:
The first one is the ligand-based approach, which is based on the concept that molecules with similar properties usually share their properties and binds with the same kind of proteins31
- ChemRxiv, Menke et al., doi:10.26434/chemrxiv.12894800.v1:
A central dogma in drug design is that similar molecules have similar properties and should bind to the same drug target.5
- J. Med. Chem., Rohall et al., doi:10.1021/acs.jmedchem.9b02130:
A shortcut to indicate whether or not a set of compounds is likely to show effects on a specific target is the similarity principle, by which similar compounds have similar properties.17
- WIREs Comput Mol Sci., Hemmerich et al., doi:10.1002/wcms.1475:
A consequence of this is the molecular similarity principle,9 which states that similar structures will exhibit similar biological activities.
- J. Cheminformatics, Cortés-Ciriano et al., doi:10.1186/s13321-020-00444-5:
The underlying idea when these descriptors are used to generate QSAR models is the Molecular Similarity Principle, which states that the bioactivities of structurally similar compounds tend to be correlated more often than those of dissimilar ones [7, 8].
- These studies are based on the similar property principle of Johnson and Maggiora, which states: similar compounds have similar properties.
"similarity-property principle" (in various spellings),
"Molecular Similarity Principle"
(You can also see a couple of citations omit the important qualifier
Wikipedia consider these terms the same. Quoting the Maggiora et al. miniperspective:
Chemical or Molecular Similarity? Although the terms chemical and molecular similarity are often used synonymously, this may not be entirely accurate. Chemical similarity is based primarily on the physicochemical characteristics of compounds (e.g., solubility, boiling point, log P, molecular weight, electron densities, dipole moments, etc.) while molecular similarity focuses primarily on the structural features (e.g., shared substructures, ring systems, topologies, etc.) of compounds and their representation.
I am unable to judge if there is widespread agreement with this interpretation. A Google Scholar search for "chemical similarity" Maggiora finds about 1,070 matches while "molecular similarity" Maggiora finds about 2,520 matches. I tried reading a few, but quickly gave up on trying to figure out the nuance of each one and how it applies.
Similarity principles in Johnson and Maggiora
Given the diversity of names, what does the original book use?
The book is out of print. Used copies go for over US$150. Happily for me, the Chalmers library has an excellent chemistry collection, including that book. I was able to scan and OCR it to help me search for phrases related to "similar compounds have similar properties". The closest I found, citing the author(s) of the relevant contributed chapter, are:
- Rouvray, pp18-19:
Scale modeling rests on the so-called Principle of Similitude, which states that systems constructed similarly on different scales will possess similar properties.
- Rouvray, p27: Table 4 lists many "Major Applications of Similarity Concepts in Twentieth-Century Chemistry" including: "Concept of isotopy", "Concept of isosterism", "Molecular topology descriptors", "Molecular sequence comparison in evolution studies", "Principle of least nuclear motion", "Principle of minimum chemical distance" and "Molecular charge similarity measure". Rouvray then writes:
All of these are in fact interrelated in the sense that they rest on the same similarity principle, namely, that when molecules undergo transitions or reactions they always do so in a way that minimizes changes in the positions of the nuclei.
- Randić, p94:
Here we use the label "similar" loosely, merely to indicate that "adjacent" isomers will tend to have similar properties.
- Randić p99:
From a derived ordering, we conclude that the compounds closely positioned in this ordering have close (i.e., similar) properties.
- Hopfinger and Burke, p174:
Even if the key physiochemical properties cannot be identified, it follows that "similar" molecules should have similar biological activities. This is the corollary to the structure-macroscopic-property concept. It is also the reason for the high interest in molecular similarity. Molecules that have similar molecular structures should have similar macroscopic property profiles.
- Ugi, Wochner, Fontain, Bauer, Gruber, Karl, p240:
The advent of the structure theory led to the hypothesis that chemical compounds are similar in their observable properties and behavior if their molecules are structurally similar, and vice versa, a hypothesis that has been, cum grano salis, supported by experimental evidence in many areas. The historical development of molecular similarity as a concept has steadily been paralleled by an increasing ability to correlate molecular similarity with a corresponding similarity of properties and behavior.
- Mezey, p323:
… molecules of similar physical properties are expected to have similar bond skeletons.
But the only use of the term "similarity principle" is Rouvray's "molecules undergo transitions or reactions they always do so in a way that minimizes changes in the positions of the nuclei", there is no use of "similarity property principle", and the closest
"similar compounds should have similar properties"
Hardly the standard modern formulation!
Earlier references to the "similarity property principle"
The general idea that "similar compounds have similar properties" is very old, and definitely not new with the book. In the preface,
Johnson and Maggiora are very clear they are not trying to claim any
Applications that make use, either explicitly or implicitly, of the concept of molecular similarity in chemistry are numerous, and indeed lie at the heart of a significant body of chemical research. Recently, attempts have been made to place molecular similarity on a more rigorous mathematical and conceptual footing. The fact remains, however, that the principal results lie scattered and isolated in unrelated journals and proceedings from diverse symposia. Moreover, the unifying concept of molecular similarity remains unstated and largely unrecognized. Currently, there is no single source from which one might obtain a reasonable introduction to the broad notion of molecular similarity or to an overview of current developments in the field. Thus, the time appears right for an edited volume of definitive overviews of the topics related to the definition, computation, and application of molecular similarity that emphasizes current research trends and highlights molecular similarity as the unifying concept.
Which means people clearly don't cite Johnson and Maggiora (1990) because it is the first to state the similarity principle, nor because it's the first to describe the underlying concept. Let's look for some earlier uses.
My go-to tool to find earlier citations is Google Scholar (because it
doesn't cost me anything.) I searched for
"similar properties" (413 results) and
"similar properties" (98 results). The large majority are of the form "we made compound X and measured property Y. We also tested compounds similar to X and found they had similar properties." But I'm looking for broader characterizations which deserve the term "principle".
First off, there are publications concerning patent law. I covered these in two previous essays, but in short the influential decision In re Hoch (Application of Paul E. Hoch, 428 F.2d 1341 (C.C.P.A. 1970)) states:
Such actual differences in properties are required to overcome a prima facie case of obviousness because the prima facie case, at least to a major extent, is based on the expectation that compounds which are very similar in structure will have similar properties.
Second, there are previous publications with Johnson and/or Maggiora as (co-)authors:
- Mathl Comput. Modelling (1988), Johnson et al., doi:10.1016/0895-7177(88)90569-9:
Most attempts at quantifying the notion of molecular similarity have been directed at the problem of predicting chemical properties based on what we shall term the structure-property similarity principle. This principle is embodied in the widely recognized statement that similar structures generally have similar properties (Wilkins and Randic, 1979).
- Mathl Comput. Modelling (1988), Maggiora et al., doi:10.1016/0895-7177(88)90568-7:
Property response surfaces provide a useful means of depicting the behavior of various chemical and biological properties of molecules as a function of the descriptors used to define their location within an appropriate chemical-description space. A key assumption which must hold, at least approximately, for response surface methods to be viable is that of surface continuity. Surface continuity requires that property values for similar molecules be similar.
- J. Math. Chem (1989), Johnson, doi:10.1007/BF01166045:
As an intuitive concept, molecular similarity has played a fundamental role in chemistry. It is implicit in Hammond's postulate, in the principle of minimum structure change, and in the assumption that similar structures tend to have similar properties.
Design of Molecules with Desired Properties.
Papers (co-)authored by Randić make up a third set of earlier uses of the similarity principle:
- JCICS (1979), Randić et al., doi:10.1021/ci60017a009:
Since many molecular properties, and especially chemical or therapeutic activity, bear some relationship to chemical structure, studies of the similarity of structures, rather then properties, should be the first priority.
- JCICS (1984), Randić, doi:10.1021/ci00043a009:
Both are hardly relevant to such applications, because similar molecules frequently have similar properties; hence, having the same numerical characterization may even be desirable.
- Randić (1992), doi:10.1007/BF01164840, cites Randić, "Graph Theoretical Approach to Structure-Activity Studies: Search for Optimal Antitumor Compounds", in Molecular Basis of Cancer (Part A: Macromolecular Structure, Carcinogens, and Oncogens); Alan R. Liss: New York, 1985, as saying:
Structures or systems that differ little in the mathematical properties will differ little also in their physical, chemical, and biological properties.
- JCICS (1988), Randić, doi:10.1021/ci00059a004:
Because the above invariants are primarily used in structure-property correlations, the occurrence of structures having duplicate descriptors is not necessarily troublesome. Similar molecules may show similar properties! In fact, the expectation that similar compounds have similar properties follows from empirical observations that can be traced to the pioneering work on structure activity as reflected in Emil Fischer's "lock and key" model12 for interaction of drugs and enzymes (receptors). Recently this experience has been formulated as one of the fundamental postulates in structure-activity relationships (SAR).13
- International Journal of Quantum Chemistry (1986), Trinajstić et al., doi:10.1002/qua.560300762:
Statements as: similar compounds have similar properties, often heard, need to be scrutinized and quantified. (p732)
… whilst new renormalized "laws" can emerge. In order to illustrate our positions consider the following. A. Principle of Similarity [201,202]. Structures that have similar mathematical properties will show considerable similarity in their physical, chemical and biological properties.
- J. Comp. Chem (1987), Herndon et al., doi:
Quantification of the concept of molecular similarity has obvious practical use in any study of how molecular properties are related to structure. A basic premise is that molecules with high degrees of structural homology will exhibit similar properties in both chemical and biological systems. This premise is explicit in pharmaceutical research1-5 where the principles of bioisosterism6 are used to design new compounds which simulate the pharmacological behavior of older, known compounds.
(This paper, by the way, defines similarity using the edit distance between two linearizations of the molecular structure.)
- AI Applications in Chemistry (1987), Bertz et al., doi:10.1021/bk-1986-0306.ch015:
The concept of the similarity of molecules has important ramifications for physical, chemical, and biological systems. Grunwald (7) has recently pointed out the constraints of molecular similarity on linear free energy relations and observed that "Their accuracy depends upon the quality of the molecular similarity."
The use of quantitative structure-activity relationships (2-6) is based on the assumption that similar molecules have similar properties. Herein we present a general and rigorous definition of molecular structural similarity. Previous research in this field has usually been concerned with sequence comparisons of macromolecules, primarily proteins and nucleic acids (7-9). In addition, there have appeared a number of ad hoc definitions of molecular similarity (10-15), many of which are subsumed in the present work.
(This paper extends Randić's work with linear paths to use all possible substructures.)
- JCICS (1985), Carhart et al., doi:10.1021/ci00046a002:
It is perhaps not surprising to find this degree of clustering of psychotropic activity, even among non-benzodiazepines, around diazepam; that is consistent with the expectation that similar structures will frequently show similar properties.
- Tetrahedron Computer Methodology (1988), Moock et al., doi:10.1016/0898-5529(88)90016-4:
The application of computers to structure-activity studies has generated a great deal of interest in methods for establishing the degree of similarity between chemical structures. The rationale for the use of such techniques is that structurally similar compounds often display similar properties.
- J. Molecular Structure (1988), Brinn, doi:10.1016/0022-2860(88)80243-6:
A good deal of chemistry "works" because similar molecules have similar properties. This is due to the fact that to a certain extent molecular fragments (atoms, functional groups) retain their characteristics within different molecules. In fact, one simplification that is not far from the truth is that chemical intuition is simply "knowing" which compound to use as a basis of comparison when one wants to estimate an undetermined property of a given compound. This idea forms the basis of all quantitative calculational methods, from the presently out of style parachor numbers to the various semiempirical and ab initio calculational methods for electronic and vibrational energies and wave functions.
- Chemical Senses (1989), Miyashita et al., doi:10.1093/chemse/14.6.781:
It is expected that similar molecules should show similar properties or activities. The basic idea of the SIMCA method is based on this concept. For structure-taste problems a chemical structure is represented by several physico-chemical variables which may be related to the taste response.
Randić? Or Johnson and Maggiora? … Or Hoch?
As you see, the observation "similar compounds should have similar properties" is not original to Johnson and Maggiora, nor do they claim it is. If you really need the first use of that sort of phrase, see In re Hoch (1970) or Randić (1979 or 1984).
And yes, some people prefer to cite the first use of a concept, rather than the use which popularized it. Which is fine!
Otherwise, in cheminformatics we don't cite Hoch because it's a patent case with no applicability to an underlying point of the 1990 book, which is that we can automate definitions of molecular similarity and apply it to property prediction and optimization.
Randić's work used automated definitions of similarity, with a focus on correlating graph invariants with molecular properties. This is much more aligned with the 1990 book, and indeed Randić wrote one of the chapters of the book. But his earlier work, which includes the phrase "Principle of Similarity", doesn't tie the concept together with other approaches to similarity, which is likely why most people don't cite Randić, even though in cheminformatics he appears to be the first to use what is essentially the modern phrase.
Carhart et al., Herndon et al., and Bertz et al. are other possible candidates as precursors to Johnson and Maggiora (1990), but their treatment of the topic is, like Randić's, more focused on specific approaches than on the larger context, and came after Randić.
From second- and third-hand accounts, what I've heard is that in the 1980s Johnson and Maggiora were key figures in a movement to consider similarity more rigorously, and make it more prominent.
And that's why I think their names, as editors of the book, are so often cited, even though others earlier made the same observation.
What is the correct name of the principle?
All that, and oddly, I still don't know what to call it. Different people use different phrases. If I had to pick a name, I would follow the lead of Maggiora et al.'s miniperspective and call it the "similarity property principle". But that's a clear minority
I used Google Scholar to give me citation counts for different 5-year
periods, all of the form
maggiora "$PHRASE", resulting in the
We need to take those numbers with a big grain of salt (cum grano salis, quoting Ugi et al.) because I didn't inspect each one. Some are low-quality papers, and might have copied from Wikipedia. Some use "similarity principle" but in a context where it's clear that they mean something else, like the following two:
- Theor Chim Acta (1991), Harary et al., doi:
The approach proposed is also suitable for analyzing shape similarity, illustrating the similarity principle suggested in [9]:
where reference 9 is: Mezey PG (1990) Three-dimensional Topological Aspects of Molecular Similarity. In Maggiora GM, Johnson MA (eds) Concepts and Applications of Molecular Similarity. Wiley, New York.
So this is specifically identifying Mezey's principle, which I believe is the one from p323, suggesting that "molecules of similar physical properties are expected to have similar bond skeletons."
- J. Math. Chem. (1992), Randić, doi:10.1007/BF01164840:
The importance of this concept for chemical application is clear if one recollects the universally accepted paradigm that similar molecules have similar properties. A recent book on "Concepts and Applications of Molecular Similarity" offers a fair introduction to the topic.
It makes sense that Randić doesn't attribute that quote to that book! Randić also suggests an alternative formulation for the similarity principle:
Structures or systems that differ little in the mathematical invariant properties will differ little also in their physical, chemical, and biological properties
Here's another alternative formulation:
- JCICS (1998), Willett et al., doi:10.1021/ci9800211:
Following earlier work by Adamson and Bush,36 Willett and Winterman37 compared the performance of a range of similarity and distance coefficients by the extent to which they obeyed the similar property principle of Johnson and Maggiora;23 specifically, they assessed the effectiveness of a coefficient by the extent to which it was able to predict correctly a compound's measured property or activity value as the value of the most similar compound in the same dataset.
("similar property principle" is more often used by people who went to Sheffield, but I haven't looked at the distribution of authors.)
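The test described in the Willett et al. quote, predicting a compound's value as the value of its most similar neighbor in the same dataset, can be sketched in a few lines of Python. This is only my illustration of the idea, not the procedure from any of the cited papers: the fingerprints are plain Python sets of made-up feature ids, the property values are invented, the Tanimoto coefficient is just one possible choice of similarity, and the function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of feature ids."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def nearest_neighbor_predictions(fingerprints, values, similarity=tanimoto):
    """Leave-one-out: predict each value as that of the most similar other compound."""
    predictions = []
    for i, query in enumerate(fingerprints):
        others = (j for j in range(len(fingerprints)) if j != i)
        # On ties, max() keeps the first (lowest-index) candidate.
        best = max(others, key=lambda j: similarity(query, fingerprints[j]))
        predictions.append(values[best])
    return predictions


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical toy data: two pairs of structurally related "compounds".
fingerprints = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}, {7, 8, 10}]
values = [5.0, 5.5, 2.0, 2.5]

predicted = nearest_neighbor_predictions(fingerprints, values)
print(predicted)                               # [5.5, 5.0, 2.5, 2.0]
print(mean_absolute_error(values, predicted))  # 0.5
```

A similarity measure which better obeys the principle should give a smaller error on real data. Swapping in a different `similarity` function and comparing the errors is the kind of comparison the quote describes.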
And still others cite Johnson and Maggiora in general for the book's impact (citation 1 in the below quote), and Randić (citation 15 in the below quote) for the earlier scientific publication with the specific phrasing - which I think is perfectly reasonable:
- Croatica Chemica Acta (1998), Podlipnik et al., Hrčak ID: 132379:
Molecular similarity studies have become the focus of intense scientific interest in recent years.1,2 … In drug design, similarity/dissimilarity based methods have been very useful in rational selection of candidates from large databases.11-14 The use of molecular similarity methods is based on the structure-property similarity principle.15.
Take home message?
I don't have one.
I started this essay 10 days ago to point out the oddity that Johnson and Maggiora's 1990 book didn't quite contain the succinct name and phrasing now associated with it. It took a long time for me to get there because the basic similarity principle has been around since the 1800s. I had to show that while the principle appears in earlier contexts (especially patent law), they weren't really the same, as those earlier contexts depend on human judgment for the whole process, while the similarity movement in the 1980s was based on using automated methods to help with property prediction and optimization.
Even then, it seems Randić deserves some credit that is overshadowed by the ease of saying "Johnson and Maggiora (eds.) (1990)" and the comprehensiveness of that book. But enough to get people to start citing him instead? I don't know. Probably not. And probably they shouldn't?
The Journal of Cheminformatics recently adopted the Citation Typing Ontology. I'm not even sure what to use as the ontology for a citation to Johnson and Maggiora. Perhaps "cites as authority"? "The citing entity cites the cited entity as one that provides an authoritative description or definition of the subject under discussion." But it's not authoritative about the name of the
"cites as recommended reading"? Or "credits", which is "The citing entity acknowledges contributions made by the cited entity"?
I'm going to go back to talking about chemfp for a while. ;)
Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology.
Copyright © 2001-2020 Andrew Dalke Scientific AB