Dalke Scientific Software: More science. Less time.
/home/writings/diary/archive/2010/12/18/cheminformatics_ngrams

What's the name of my field?

I say that I work in "cheminformatics." Others use "chemoinformatics". What's going on?

When I entered this field in the late 1990s I said "chemical informatics" and the main journal was the "Journal of Chemical Information and Computer Sciences." Before 1975 it was the "Journal of Chemical Documentation" and in 2005 it became "Journal of Chemical Information and Modeling". You also see the older term in older company names. Daylight's full corporate name, for example, is "Daylight Chemical Information Systems."

Cheminformatics? Chemoinformatics?

"Chemical informatics" is no longer so common. Which of the two newer names do people use? There's a "Journal of Cheminformatics" but no "Journal of Chemoinformatics." In 2006 some 100 scientists from 20 countries in Europe and North America wrote the "Obernai Declaration" (no longer online?) to define and promote the field of "chemoinformatics."

Let's take it to the masses. Molinspiration reports the result counts from Google:

Cheminformatics is now (December 2009) used about 2.5-times more frequently than chemoinformatics. In 2006 this ratio was 1.6, in 2007 1.5 and in 2008 1.9. So it looks like that the term cheminformatics is winning the race!
While this weighs in favor of cheminformatics, it's still somewhat suspect. Google's estimated counts are shaky and not meant to be accurate. I've done queries which were estimated to have thousands of hits, only to find there were only about 50. It's also possible that if PubChem and ChemSpider were to put the word "cheminformatics" on every page, it would seriously skew the total number of pages Google finds.

More specifically, if you search for "cheminformatics" you'll see "About 6,960,000 results". Try to go to item 900 and you'll get the message:

In order to show you the most relevant results, we have omitted some entries very similar to the 654 already displayed.
"Chemoinformatics" returns "About 96,500 results". Nearly two orders of magnitude less! But try going to the end of those and you'll see:
In order to show you the most relevant results, we have omitted some entries very similar to the 671 already displayed.
Shaky indeed!
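Putting the quoted numbers side by side makes the point more concretely. This is nothing more than a bit of arithmetic in Python on the figures above; the dictionary and function names are made up for illustration.

```python
# Google's estimated hit counts and the number of distinct results it
# actually displayed, as quoted above.
estimated = {"cheminformatics": 6_960_000, "chemoinformatics": 96_500}
displayed = {"cheminformatics": 654, "chemoinformatics": 671}

def ratio(counts):
    """Return the cheminformatics / chemoinformatics ratio for a dict of counts."""
    return counts["cheminformatics"] / counts["chemoinformatics"]

print(f"estimated ratio: {ratio(estimated):.1f}")
print(f"displayed ratio: {ratio(displayed):.2f}")
```

This prints 72.1 for the estimated counts and 0.97 for the displayed results. By the estimates, "cheminformatics" is about 72 times more common, yet Google was willing to show slightly fewer distinct results for it than for "chemoinformatics."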

The other day, Google Labs released the Books Ngram Viewer, which lets you view and compare how often words and phrases appear in published books over time.

This gives a different way to compare those three rather distinct terms. Here are the results:

You can clearly see that "chemical informatics" was in the lead when I started, followed by a period when "chemoinformatics" dominated, and that "cheminformatics" is now pulling ahead. If only those graphs could be extended to the present!

These numbers likely have their own biases. I'll assume Google made no mistakes and that the years and counts are correct. Book authors are more likely to have been in the field for a long time, so perhaps they collectively lag or lead the trends, or perhaps a few prolific authors simply prefer the minority terminology. There are also likely to be few books involved, which means large uncertainties, and I'm using Google's results as a proxy for the number of different books.

In other words, this pretty plot isn't confirmation, only data. But it's data which somewhat reinforces the belief that "cheminformatics" is the dominant term, and it's data which corroborates my understanding of the history.

For more discussion or to provide your own input, see the Blue Obelisk/Shapado thread: Chemoinformatics or cheminformatics? Or just do a Google search for "cheminformatics chemoinformatics", where it responds:

Did you mean: cheminformatics cheminformatics

Some other interesting terms:

Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me

Copyright © 2001-2013 Andrew Dalke Scientific AB