/home/writings/diary/archive/2020/09/08/patents_and_molecular_similarity

Patents and molecular similarity

I work in what I'll call algorithmic molecular similarity, where people use an algorithm to decide whether, or how much, two molecules are similar. There are many such algorithms: 2D and 3D fingerprints, maximum common substructure, edit distance, LINGO, and shape similarity are the first ones that come to mind.
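To make "algorithmic" concrete, here's a minimal sketch of the fingerprint case, assuming each molecule has already been reduced to a set of "on" bit positions and is scored with the Tanimoto (Jaccard) coefficient. The bit sets and variable names below are made up for illustration and are not the output of any real fingerprint generator.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    if not union:
        # Both fingerprints are empty; returning 0.0 is a convention chosen here.
        return 0.0
    return len(fp_a & fp_b) / len(union)


# Hypothetical on-bit sets, invented for this example.
ethyl_analog = frozenset({1, 4, 7, 9, 12})
methyl_analog = frozenset({1, 4, 7, 9})
unrelated = frozenset({2, 3, 12})

# 4 shared bits out of 5 distinct bits gives 0.8.
print(tanimoto(ethyl_analog, methyl_analog))
print(tanimoto(ethyl_analog, unrelated))
```

The score only says how many features two molecules share. Whether a given score counts as "similar" is a separate choice of threshold.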

There is almost no overlap between those methods and legal molecular similarity, which includes patent law and drug control law. I know little about the topic, so don't trust what I write here in a court of law! In this essay I'll mostly copy and paste some quotes regarding patent law. The next essay does the same for drug control laws.

Patent law

The US allows patents on molecules with new and useful applications. But what does "new" mean? Chemists have been aware since at least the 1800s that similar molecules often have similar behaviors, and that certain changes are obvious to a "person having ordinary skill in the art", as the legal phrase goes. Few chemists will be surprised that changing an ethyl- to a methyl- probably results in a molecule that behaves much like the old one.

I'll quote from In Re Rita S. Jones, Michael T. Chirchirillo and Johnny L. Burns, 958 F.2d 347 (Fed. Cir. 1992):

The question of "structural similarity" in chemical patent cases has generated a body of patent law unto itself.[1] Particular types or categories of structural similarity without more have, in past cases, given rise to prima facie obviousness; see, e.g., In re Dillon, 919 F.2d 688, 692-94, 16 USPQ2d 1897, 1900-02 (Fed. Cir. 1990) (tri-orthoesters and tetra-orthoesters), cert. denied, --- U.S. ----, 111 S. Ct. 1682, 114 L. Ed. 2d 77 (1991); In re May, 574 F.2d 1082, 197 USPQ 601 (CCPA 1978) (stereoisomers); In re Wilder, 563 F.2d 457, 195 USPQ 426 (CCPA 1977) (adjacent homologs and structural isomers); In re Hoch, 428 F.2d 1341, 166 USPQ 406 (CCPA 1970) (acid and ethyl ester).
(That [1] is: Helmuth A. Wegner, "Prima Facie Obviousness of Chemical Compounds," 6 Am.Pat.L.Assoc.Q.J. 271 (1978).)

Patent law also recognizes that structural similarity does not always imply functional similarity. Here's a quote from section 2143 of the US Patent Office's Manual of Patent Examining Procedure, "Examples of Basic Requirements of a Prima Facie Case of Obviousness", concerning the structural similarity between rabeprazole and lansoprazole:

Despite the significant similarity between the structures, the Federal Circuit did not find any reason to modify the lead compound. According to the Federal Circuit:
Obviousness based on structural similarity thus can be proved by identification of some motivation that would have led one of ordinary skill in the art to select and then modify a known compound (i.e. a lead compound) in a particular way to achieve the claimed compound. . . . In keeping with the flexible nature of the obviousness inquiry, the requisite motivation can come from any number of sources and need not necessarily be explicit in the art. Rather "it is sufficient to show that the claimed and prior art compounds possess a sufficiently close relationship . . . to create an expectation, in light of the totality of the prior art, that the new compound will have similar properties to the old." Id. at 1357, 87 USPQ2d at 1455. (citations omitted)
The prior art taught that introducing a fluorinated substituent was known to increase lipophilicity, so a skilled artisan would have expected that replacing the trifluoroethoxy substituent with a methoxypropoxy substituent would have reduced the lipophilicity of the compound. Thus, the prior art created the expectation that rabeprazole would be less useful than lansoprazole as a drug for treating stomach ulcers and related disorders because the proposed modification would have destroyed an advantageous property of the prior art compound. The compound was not obvious as argued by Teva because, upon consideration of all of the facts of the case, a person of ordinary skill in the art at the time of the invention would not have had a reason to modify lansoprazole so as to form rabeprazole.

I see no way to encode this sort of similarity into an algorithm outside of the dreams of strong AI.

Markush structures

I started this section with "The US allows patents on molecules with new and useful applications."

Actually, it's a bit broader than that - a chemical patent may include a Markush structure, where parts of the structure are defined not as a molecular structure but with a more generic description of a class of molecules. You can think of it as a core structure or scaffold with one or more attachment points, generally called R-groups. A Markush structure defines a pattern for each of those attachment points. This process may be recursive.
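Here's a minimal sketch of that idea, assuming a toy representation that I made up for illustration: the scaffold is a SMILES-like template string with `{R1}`-style placeholders, and each R-group is a finite list of alternatives which may themselves contain placeholders (the recursive part). Real Markush claims also use generic terms like "alkyl" and variable attachment positions, which this does not handle.

```python
from itertools import product

# Hypothetical toy Markush definition; the names and strings are illustrative only.
SCAFFOLD = "c1ccc({R1})cc1{R2}"
R_GROUPS = {
    "R1": ["F", "Cl", "Br"],
    "R2": ["C", "CC", "O{R3}"],  # the last alternative refers to another R-group
    "R3": ["C", "CC"],
}


def expand(template, groups):
    """Yield every string obtained by filling in the placeholders, recursively."""
    names = [name for name in groups if "{" + name + "}" in template]
    if not names:
        # Nothing left to substitute: this is one concrete structure.
        yield template
        return
    for choice in product(*(groups[name] for name in names)):
        filled = template
        for name, value in zip(names, choice):
            filled = filled.replace("{" + name + "}", value)
        # A substituted value may have introduced new placeholders.
        yield from expand(filled, groups)


structures = list(expand(SCAFFOLD, R_GROUPS))
print(len(structures))
for smiles in structures:
    print(smiles)
```

Even this small definition expands to a dozen structures, and the count multiplies with every additional attachment point.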

This sort of claim is allowed because it's often clear that a wide variety of related compounds may have roughly equal use, and it's not useful to have one molecule = one patent claim.

Markush was not the first to use this approach. As Helen Cooke explains in "A historical study of structures for communication of organic chemistry information prior to 1950":

Structures of this type were permitted after a ruling by the United States Patent Office, following the publication of a patent by Eugene Markush, which claimed a process for the manufacture of dyes that comprised coupling with a halogen-substituted pyrazolone, a diazotized unsulfonated material selected from the group consisting of aniline, homologues of aniline, and halogen substitution products of aniline. However, Markush's name was not associated with the Markush structure because his patent was the first to include structures of this type. There had already been patents with claims concerned with compounds defined from a selection of variable groups, but a patent examiner stubbornly rejected Markush's claims. Markush appealed, and the commissioner issued a published decision overruling the examiner and approving the claim format. Future applicants argued for patentability on the basis of this precedent, and Markush's name became thereafter attached to the format.

A Markush structure could describe an infinite number of structures. Any new patent must show how it's related to prior art, so patent searches of Markush structures are important. But how do you tell if there's any overlap between two Markush claims?
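In the very simplest case the question has a mechanical answer. Here's a sketch, assuming two hypothetical claims that share exactly the same scaffold and attachment-point names, with every R-group given as a finite set of substituent strings compared by exact match. Under those assumptions some molecule is covered by both claims only if the claims share at least one option at every attachment point. Real claims rarely satisfy these assumptions (different scaffolds, generic terms, nested definitions), which is what makes the general problem hard.

```python
def overlapping_choices(claim_a, claim_b):
    """Return {position: shared substituents}, or None if no structure is in both claims.

    Toy model only: both claims must use the same scaffold string and the
    same R-group names, and substituents are compared as exact strings.
    """
    if claim_a["scaffold"] != claim_b["scaffold"]:
        raise ValueError("this sketch only handles identical scaffolds")
    if claim_a["r_groups"].keys() != claim_b["r_groups"].keys():
        raise ValueError("this sketch only handles identical attachment points")

    shared = {}
    for position, options_a in claim_a["r_groups"].items():
        common = set(options_a) & set(claim_b["r_groups"][position])
        if not common:
            # No shared option here, so no single structure fits both claims.
            return None
        shared[position] = common
    return shared


# Hypothetical claims, invented for this example.
claim_a = {"scaffold": "c1ccc({R1})cc1{R2}",
           "r_groups": {"R1": {"F", "Cl"}, "R2": {"C", "CC"}}}
claim_b = {"scaffold": "c1ccc({R1})cc1{R2}",
           "r_groups": {"R1": {"Cl", "Br"}, "R2": {"CC", "OC"}}}
claim_c = {"scaffold": "c1ccc({R1})cc1{R2}",
           "r_groups": {"R1": {"Br", "I"}, "R2": {"C"}}}

print(overlapping_choices(claim_a, claim_b))  # shared options at R1 and R2
print(overlapping_choices(claim_a, claim_c))  # None: nothing shared at R1
```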

Originally it was up to trained patent examiners. Ray and Kirsch's publication of the first computer-based atomic-level substructure search was motivated by talking with the Patent Office. Their work then led to HAYSTAQ, designed for use by the Patent Office and with some Markush support.

Most search systems before the 1980s used fragmentation codes for the basic Markush search, followed by manual inspection. Michael Lynch's group at Sheffield did a much more comprehensive study of the problem, leading to the Markush DARC and MARPAT systems in the 1980s. See Berks' "Current state of the art of Markush topological search systems" in volume 2 of the 2003 book "Handbook of Chemoinformatics" for more historical details. Berks points out that there are some "nasties" ("a term applied by Derwent to abusive Markush claims that were nearly impossible to index, and also very difficult to interpret") but suggests that the number of the more "abusive" claims has gone down.

Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology.

Copyright © 2001-2020 Andrew Dalke Scientific AB