Floats and doubles

In this essay I describe how using floating-point, and especially mixing float and double values, can cause subtle problems when doing a threshold search.

Suppose you have a function which takes two objects (call them "query" and "target") and returns a similarity score as a float or double.

```
score = compute_similarity(query, target)
```

In my case, these are cheminformatics fingerprints. Given a set of targets you can ask how many targets are at least "threshold" similar to the given query:

```
num_targets = 0
for target in targets:
    if compute_similarity(query, target) >= threshold:
        num_targets += 1

# (This can be written as:
#   num_targets = sum(1 for target in targets
#                     if compute_similarity(query, target) >= threshold)
# but unless you are used to that style it's hard to understand.)
```

Simple, right?

## float or double?

I have code which does this. Up until a few days ago it returned a double. However, the score is the fraction N/M where 0<=N<=M<=2**20. There's no need for 8 bytes of precision when a float is clearly enough. For some of the problems I'm dealing with, that would save a few hundred MB of space.

In theory I could even switch to fixed point, but it's easier to work with a 32-bit float than a 32-bit fixed point integer (much less a 20 bit fixed point integer!).

I converted all of my doubles to float, but my unit tests failed. After I fixed the places where I messed up on the size of allocated space, I still had errors.

One failure came down to a difference in converting "0.9" to double, and "0.9f" to float. Take a look at this simple program:

```
#include <stdio.h>

int main() {
    printf("0.9: float %.20f double %.20f\n", 0.9f, 0.9);
    printf("0.3: float %.20f double %.20f\n", 0.3f, 0.3);
    return 0;
}
```

which outputs (with many more decimals than needed):

```
0.9: float 0.89999997615814208984 double 0.90000000000000002220
0.3: float 0.30000001192092895508 double 0.29999999999999998890
```

The test calls out to C for the score calculations, but does the threshold test in Python. Python's floating-point type is based on a C double. When the C code returns ((float)9)/10, the Python/C interface converts that into the Python value 0.89999... . The threshold test compares it against 0.900000... and rejects it for being too low. It's only low by a smidgeon, but still, too low.
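The same mismatch is easy to reproduce in pure Python. Here's a small sketch; the `as_float32` helper is my own, built on the standard `struct` module, which rounds a Python double to the nearest single-precision value and widens it back:

```python
import struct

def as_float32(x):
    # Round-trip through a 32-bit IEEE 754 float; the result is
    # the nearest single-precision value, widened back to a double.
    return struct.unpack("f", struct.pack("f", x))[0]

print(as_float32(0.9) < 0.9)   # True: the nearest float to 0.9 is slightly low
print(as_float32(0.3) > 0.3)   # True: the nearest float to 0.3 is slightly high
print(as_float32(0.9) >= 0.9)  # False: so this threshold test rejects 9/10
```

Note that the error can go in either direction: 0.9 rounds down in single precision while 0.3 rounds up.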

## A "smidgeon"

A quick check of all the rational values in my range shows that about half the time the float value is less than the double, and about half the time it's the other way. Only about 0.5% of the time are the two values identical.
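A sketch of that check, again using a hand-rolled `as_float32` round-trip helper. I use a small maximum denominator here so it runs quickly; the article's actual range goes up to 2**20, where the "identical" fraction is much smaller:

```python
import struct

def as_float32(x):
    # Nearest single-precision value, widened back to a double.
    return struct.unpack("f", struct.pack("f", x))[0]

# Tally, for every score N/M with 0 <= N <= M, whether the nearest
# float is below, above, or equal to the nearest double.
below = above = equal = 0
for M in range(1, 300):
    for N in range(M + 1):
        d = N / M          # nearest double to N/M
        f = as_float32(d)  # nearest float to that double
        if f < d:
            below += 1
        elif f > d:
            above += 1
        else:
            equal += 1

print(below, above, equal)
```

With this toy range the below/above counts come out roughly balanced, and the exact matches (mostly fractions with a power-of-two denominator) are a small minority.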

The fundamental problem is that I'm mixing float and double values. If I use only floats or only doubles then this wouldn't be a problem.

One solution I came up with is to lower the input threshold value by a little bit. That is, if the user specifies "0.9", then treat it as the value "0.899999". I know my denominator is at most 2**20 so I could do:

```
def adjust_input(threshold):
    if threshold >= 0.0:
        return threshold - 0.5/2**20
    return threshold
```

That "0.5/2**20" is somewhat ad hoc. Even better, I could use the C99 function "nextafter" ("_nextafter" in Visual C) and lower the floating point value by one ulp ("unit in the last place"), the smallest possible decrease. (Python doesn't implement nextafter in the core libraries but there are other solutions.)
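Here's a sketch of what that one-ulp nudge looks like at the single-precision level, done by hand in Python. The helper name is my own, and it assumes a positive, finite, nonzero input. (Newer Pythons, 3.9 and later, also provide `math.nextafter` for doubles.)

```python
import struct

def float32_down_one_ulp(x):
    # Sketch of C's nextafterf(x, -INFINITY) for a positive, finite,
    # nonzero x: adjacent IEEE 754 floats have adjacent bit patterns,
    # so stepping the integer representation down by 1 steps the
    # value down by one ulp.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits - 1))[0]

threshold = 0.9
nudged = float32_down_one_ulp(threshold)

# The single-precision score for 9/10:
score = struct.unpack("f", struct.pack("f", 9 / 10))[0]
print(score >= threshold)  # False: the original threshold rejects 9/10
print(score >= nudged)     # True: the nudged threshold accepts it
```

The bit-pattern trick works because IEEE 754 floats of the same sign compare in the same order as their bit patterns.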

## Possible side effects

This solution works for the "0.9" case. What are the side effects?

For one, about 50% of the time I will lower a threshold value which is already low enough. This effect will only be visible to those who specify an abnormally high number of decimals in the first place.

For another, my code is available as a library, and the parameter takes a double. Do I assume that the input has already been nudged down, or do I apply the nextafter to all of the inputs? I think I need to apply it to all of the inputs.

What about the return values? I expect that users of the library will expect the following to never raise an exception:

```
threshold = 0.9
for hit in threshold_search(query, targets, threshold=threshold):
    if hit.score < threshold:
        raise AssertionError
```

This will fail half the time because the float value I return from C is too small compared to the double. I could make that invariant hold by using nextafter to increase the returned values by one ulp. I don't think that will cause any problems.

That's a lot of "I think"s. I don't yet have any library users to get feedback. The performance doesn't seem to be affected by using float or double, and I'm not running into memory limitations. I'm therefore going to revert all of my changes and keep things as doubles.

## String to float conversion

The mishap occurs because the decimal value 0.9 cannot be written exactly as a float or double, which both use base 2. The conversion function (strtod or equivalent) instead returns the binary value which is closest to the input decimal number. Similarly, the numbers shown above are the 20-digit decimal numbers closest to that binary number. The converted number can be slightly smaller than, equal to, or slightly larger than the original input.

You can see this in action by using an "abnormally high number of decimals." In the following, the score between the "nine" case and the "ten" case is exactly 9/10. A threshold of 0.900000005 still allows scores of 0.9. This is because 0.900000005 and 0.9 are converted to the same internal number, and because this is my experimental branch which uses floats instead of doubles. If I increase the threshold to 0.900000006 then the two are no longer the same, and the threshold test fails.

```
% cat x.fps
FF01 nine
FF03 ten
% simsearch --queries x.fps x.fps --threshold 0.900000005
#Simsearch/1
#num_bits=16
#type=Tanimoto k=all threshold=0.900000005
#software=chemfp/1.1a1
#queries=x.fps
#targets=x.fps
2 nine nine 1.000 ten 0.900
2 ten nine 0.900 ten 1.000
% simsearch --queries x.fps x.fps --threshold 0.900000006
#Simsearch/1
#num_bits=16
#type=Tanimoto k=all threshold=0.900000006
#software=chemfp/1.1a1
#queries=x.fps
#targets=x.fps
1 nine nine 1.000
1 ten ten 1.000
```
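The same cutover can be checked in plain Python, with a hand-rolled helper (my own, using the standard `struct` module) that rounds a double to the nearest single-precision value:

```python
import struct

def as_float32(x):
    # Nearest single-precision value, widened back to a double.
    return struct.unpack("f", struct.pack("f", x))[0]

# 0.900000005 is below the midpoint between the two candidate floats
# around 0.9, so it converts to the same single-precision number as 0.9...
print(as_float32(0.900000005) == as_float32(0.9))  # True
# ...while 0.900000006 is above the midpoint and rounds to the next float up.
print(as_float32(0.900000006) == as_float32(0.9))  # False
```

The midpoint between the two candidate floats is about 0.9000000060, which is why the behavior flips between those two threshold strings.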

Let me say that this is not a serious problem. Other than people like me, who use values like this to probe the internal workings of software, I'm hard-pressed to think of who might be affected by this.

Nonetheless, it's still an intellectual sticking point. If for some reason you specify a threshold of 0.900000004 then I really don't want my software to return scores which I know are 9/10. How can I solve this problem in the abstract?

I've come up with two solutions. In the first, adjust the input (using decimal math) so it's more than 1 ulp away from any rational value which could exist as a score. Easy to say, but I'm not sure how to do that.

In the second, read the input into a rational (like Python's fractions module). Pass the numerator and denominator into the search functions, instead of passing in a floating point value.
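As a sketch of that second solution, Python's fractions module can hold the threshold exactly; `Fraction` parses a decimal string with no binary rounding at all:

```python
from fractions import Fraction

# Parse the user's threshold string exactly; no binary rounding occurs.
threshold = Fraction("0.900000004")

# A score of exactly 9/10, kept as a numerator/denominator pair.
score_num, score_den = 9, 10

# Exact cross-multiplied comparison: is 9/10 >= 0.900000004?
accept = score_num * threshold.denominator >= threshold.numerator * score_den
print(accept)  # False: 9/10 really is below 0.900000004

# With a threshold of exactly 0.9 the same score is accepted.
threshold = Fraction("0.9")
print(score_num * threshold.denominator >= threshold.numerator * score_den)  # True
```

The cross-multiplication keeps everything in integer arithmetic, so there is no rounding anywhere in the comparison.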

My search code computes the score as the ratio of two integers, so instead of doing:

```
# old version
def count_targets(query, targets, threshold):
    num_targets = 0
    for target in targets:
        if compute_similarity(query, target) >= threshold:
            num_targets += 1
    return num_targets
```

I can do:

```
# new version
def count_targets(query, targets, threshold_numerator, threshold_denominator):
    num_targets = 0
    for target in targets:
        score_numerator, score_denominator = compute_similarity(query, target)
        if (score_numerator * threshold_denominator >=
                threshold_numerator * score_denominator):
            num_targets += 1
    return num_targets
```

This will work, but who wants to use an API where you pass in the threshold as a numerator and denominator, rather than a single threshold value?

So yes, I'm going to stay with doubles throughout my entire system. A consistent floating point representation just makes life easier.

Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me

Copyright © 2001-2013 Andrew Dalke Scientific AB