See Assignment #1 for instructions on how to submit this assignment. In short, send me a tar or zip archive of a directory named "assignment5" with your answers in the README file. The archive must also include at least one other file with your answers.
You will use the following SMILES data set (listing various drugs) to answer some of the questions:

    N12CCC36C1CC(C(C2)=CCOC4CC5=O)C4C3N5c7ccccc76 Strychnine
    c1ccccc1C(=O)OC2CC(N3C)CCC3C2C(=O)OC cocaine
    COc1cc2c(ccnc2cc1)C(O)C4CC(CC3)C(C=C)CN34 quinine
    OC(=O)C1CN(C)C2CC3=CCNc(ccc4)c3c4C2=C1 lysergic acid
    CCN(CC)C(=O)C1CN(C)C2CC3=CNc(ccc4)c3c4C2=C1 LSD
    C123C5C(O)C=CC2C(N(C)CC1)Cc(ccc4O)c3c4O5 morphine
    C123C5C(OC(=O)C)C=CC2C(N(C)CC1)Cc(ccc4OC(=O)C)c3c4O5 heroin
    c1ncccc1C1CCCN1C nicotine
    CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12 caffeine
    C1C(C)=C(C=CC(C)=CC=CC(C)=CCO)C(C)(C)C1 vitamin a

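If you want to work with the data set in Python, a minimal sketch like the following splits it into (SMILES, name) pairs. It assumes the data is saved one record per line: the SMILES, then whitespace, then a name that may itself contain spaces (such as "vitamin a"). The helper name `parse_smiles_dataset` is hypothetical, not part of any library.

```python
def parse_smiles_dataset(text):
    """Return a list of (smiles, name) pairs, one per non-blank line.

    Assumes each record is a SMILES string, whitespace, then a name
    that may contain spaces.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only on the first run of whitespace so multi-word names stay intact.
        fields = line.split(None, 1)
        smiles = fields[0]
        name = fields[1] if len(fields) > 1 else ""
        records.append((smiles, name))
    return records


example = """\
c1ncccc1C1CCCN1C nicotine
CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12 caffeine
C1C(C)=C(C=CC(C)=CC=CC(C)=CCO)C(C)(C)C1 vitamin a
"""
for smiles, name in parse_smiles_dataset(example):
    print(name, smiles)
```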
Here is a list of fingerprint rules. The given bit is set (meaning it's True/1) if the structure:
- contains two or more oxygens
- has a ring of size 5
- contains elements besides C, N, O, S or H
- has only 1 ring
- contains a linear subgraph of 10 or more non-hydrogen atoms
Some people have asked about the 5th bit. I'm looking for 10 atoms that are bonded in a row, without branching and without returning to an atom already counted. That is, can you start at one atom and count out 10 atoms in a row without making a loop? The atoms may be part of a ring; it's the subgraph itself that cannot contain a cycle.
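One way to make the 5th bit concrete, independent of SMARTS, is a depth-first search for a simple path: a walk through bonded atoms that never revisits an atom. The sketch below is my reading of the description above, not a reference solution. The function name `has_linear_path` and the adjacency-dict input format are hypothetical. The search is exponential in the worst case, which should be acceptable for small molecules like the ones in this data set.

```python
def has_linear_path(neighbors, length):
    """Return True if the graph has a simple path visiting `length` atoms.

    `neighbors` maps each atom index to a list of its bonded
    (non-hydrogen) neighbors. A simple path never revisits an atom,
    so the path has no branches and no cycle of its own, even when
    the atoms it passes through belong to a ring.
    """
    def extend(path, visited):
        if len(path) == length:
            return True
        for nbr in neighbors[path[-1]]:
            if nbr not in visited:
                visited.add(nbr)
                path.append(nbr)
                if extend(path, visited):
                    return True
                # Backtrack and try the next neighbor.
                path.pop()
                visited.remove(nbr)
        return False

    return any(extend([start], {start}) for start in neighbors)


# Hypothetical example: a six-membered ring (atoms 0-5) with a
# two-atom side chain (atoms 6 and 7) attached to atom 0.
ring_with_chain = {
    0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4],
    4: [3, 5], 5: [4, 0], 6: [0, 7], 7: [6],
}
print(has_linear_path(ring_with_chain, 8))  # True: 7-6-0-1-2-3-4-5
print(has_linear_path(ring_with_chain, 9))  # False: only 8 atoms exist
```

In the example, the path runs through every ring atom, but the path's own bonds never close a loop, which is the distinction the 5th bit depends on.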
- What is the SMARTS pattern for each rule?
What is the fingerprint for ...
- ... nicotine?
- ... caffeine?
- ... vitamin A?
Use OpenEye's depict matcher or Daylight's depict matcher to help answer these questions.
Write a Python function named fp_count that takes two bitstrings, each represented as a string containing the characters "0" and "1", and returns the bit counts a, b, c, and d using the Daylight definitions:
- a is the count of bits on in object A but not in object B.
- b is the count of bits on in object B but not in object A.
- c is the count of bits on in both object A and object B.
- d is the count of bits off in both object A and object B.

The return value is the 4-tuple (a, b, c, d). Assuming I didn't make a mistake in my code, your code must pass this test:

    for (s1, s2, a, b, c, d) in (
            ("0", "0", 0, 0, 0, 1),
            ("1", "0", 1, 0, 0, 0),
            ("0", "1", 0, 1, 0, 0),
            ("1", "1", 0, 0, 1, 0),
            ("01", "00", 1, 0, 0, 1),
            ("11", "00", 2, 0, 0, 0),
            ("00", "11", 0, 2, 0, 0),
            ("01", "11", 0, 1, 1, 0),
            ("1011001010101", "0101010011011", 4, 4, 3, 2),
            ):
        x = fp_count(s1, s2)
        if x != (a, b, c, d):
            raise AssertionError((x, (a, b, c, d)))

Put the function definition in the file "fp_search.py" and include the above code as a test function that is called when the file is run from the command line. (Use the if __name__ == "__main__": technique.)
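A minimal sketch of fp_count and the test-function layout follows. It assumes both bitstrings have the same length; the assignment does not say what to do otherwise, so raising ValueError is my own choice. Any character other than "1" is treated as an off bit. The test function is abbreviated to two cases; the full table from the test above would go in its place.

```python
def fp_count(s1, s2):
    """Return the Daylight counts (a, b, c, d) for two bitstrings.

    a: bits on in s1 only    b: bits on in s2 only
    c: bits on in both       d: bits off in both
    """
    if len(s1) != len(s2):
        # Assumption: mismatched lengths are an error.
        raise ValueError("bitstrings must have the same length")
    a = b = c = d = 0
    for bit1, bit2 in zip(s1, s2):
        if bit1 == "1" and bit2 == "1":
            c += 1
        elif bit1 == "1":
            a += 1
        elif bit2 == "1":
            b += 1
        else:
            d += 1
    return (a, b, c, d)


def test_fp_count():
    # Abbreviated; the full table of cases belongs here.
    for (s1, s2, expected) in (
        ("0", "0", (0, 0, 0, 1)),
        ("1011001010101", "0101010011011", (4, 4, 3, 2)),
    ):
        result = fp_count(s1, s2)
        if result != expected:
            raise AssertionError((result, expected))


if __name__ == "__main__":
    test_fp_count()
    print("fp_count tests passed")
```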
Define two new functions in fp_search.py: "tanimoto", which computes the Tanimoto measure, and "yule", which computes the Yule measure. Both functions take the values a, b, c, and d as input. Your functions should be defined like this:

    def tanimoto(a, b, c, d):
        ...

    def yule(a, b, c, d):
        ...

Be sure to add test code for these functions.
NOTE: In Python 2, dividing an integer by an integer gives an integer, so 1/2 == 0. (Python 3 uses true division, where 1/2 == 0.5.) To make Python 2 do what you expect, either convert at least one of the integers to a float (e.g., float(1)/2) or place the following at the top of your file to make all divisions in the file work as you expect:
from __future__ import division
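A sketch of the two measures, assuming the Daylight definitions Tanimoto = c/(a+b+c) and Yule = (cd - ab)/(cd + ab). The assignment does not specify what to return when a denominator is zero; returning 0.0 is an assumption. The float() calls make the division behave the same under Python 2 and Python 3.

```python
def tanimoto(a, b, c, d):
    """Tanimoto similarity: c / (a + b + c). The d count is not used."""
    denominator = a + b + c
    if denominator == 0:
        # Both fingerprints are all zeros; returning 0.0 is an assumption.
        return 0.0
    # float() forces true division even under Python 2.
    return c / float(denominator)


def yule(a, b, c, d):
    """Yule measure: (c*d - a*b) / (c*d + a*b), ranging from -1 to 1."""
    denominator = c * d + a * b
    if denominator == 0:
        # Only happens when c*d and a*b are both 0; returning 0.0 is an assumption.
        return 0.0
    return (c * d - a * b) / float(denominator)


if __name__ == "__main__":
    # "1100" vs "1010" gives (a, b, c, d) = (1, 1, 1, 1).
    counts = (1, 1, 1, 1)
    print(tanimoto(*counts))  # 1/3
    print(yule(*counts))      # 0.0, since c*d == a*b
```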
Given your bitstrings from part 1, what is the Tanimoto similarity between:
- nicotine and caffeine?
- nicotine and vitamin A?
- caffeine and vitamin A?
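As a worked check of the arithmetic, using the last test pair from the fp_count table rather than the drug fingerprints, the counts (a, b, c, d) = (4, 4, 3, 2) give the following Tanimoto value (d does not enter the Tanimoto measure):

```latex
T = \frac{c}{a + b + c} = \frac{3}{4 + 4 + 3} = \frac{3}{11} \approx 0.273
```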
OpenEye has a molecular 2D similarity demo page based on the Mesa implementation of the MACCS keys. Use it to do similarity searches of the above SMILES in order to answer the following two questions:
- What are the names and similarity scores for the 3 drugs most similar to CC1=C(C(CCC1)(C)C)C=CC(=CC=CC(=CCO)C)C?
- What are the names and similarity scores for the 3 drugs most similar to CN1CCC23C4C1CC5=C2C(=C(C=C5)OC)OC3C(C=C4)O?
- Use PubChem to find compounds that are more similar to each of the two query structures. That is, do a PubChem similarity search and see whether one of the top hits, when compared using the OpenEye/Mesa demo, is more similar than the drug from the above database.
- Give me the URL of a page that defines each of the bits in the MACCS keys, either as code or in English.
Copyright © 2001-2013 Andrew Dalke Scientific AB