Here is a small programming brain teaser for the weekend:
uf
is an unsigned integer with 64 bits that holds the IEEE-754 representation for a binary floating point number of that size.
The questions are:
uf
represents an integer number?
2. How to serialize the absolute value of such an integer number in the minimum number of bytes possible, using big-endian ordering and the 8th bit as a continuation flag? For example, float64(1<<70 + 3<<21)
serializes as:
"\x81\x80\x80\x80\x80\x80\x80\x83\x80\x80\x00"
The background for this problem is that the current draft of the strepr specification mentions that serialization. Some languages, such as Python and Ruby, implement transparent arbitrary precision integers, and that makes implementing the specification easier.
For example, here is a simple Python interactive session that arrives at the result provided above exploring the native integer representation.
>>> f = float((1<<70) + (3<<21)) >>> v = int(f) >>> l = [v&0x7f] >>> v >>= 7 >>> while v > 0: ... l.append(0x80 | (v&0x7f)) ... v >>= 7 ... >>> l.reverse() >>> "".join("%02x" % i for i in l) '8180808080808083808000'
Python makes the procedure simpler because it is internally converting the float into an integer of appropriate precision via standard C functions, and then offering bit operations on the resulting value.
The suggested brain teaser can be efficiently solved using just the IEEE-754 representation, though, and it’s relatively easy because the problem is being constrained to the integer space.
A link to an implementation will be provided next week.
UPDATE: The logic is now available as part of the reference implementation of strepr.
If the double does represent an integer, will it be non-negative? If not, should the variable-length encoding use the absolute value?
> If not, should the variable-length encoding use the absolute value?
Right. Note that point 2 does mention it’s the absolute value of the number.
According to here: https://en.wikipedia.org/wiki/Double_precision_floating-point_format
You are guaranteed that if there are less than 15 significant digits in the decimal, you can convert it exactly. So a few things come to mind:
1) Check for special numbers, 0, NaN, +-infinity
2) Check if the exponent is such that the value is fractional. (eg if the exponent is 2^-500 you know that the value is not going to fit in an int.) I’m not sure on the exact cutoff here.
3) Check the mantissa for the last bit set, and compare it to the exponent. eg 2 is encoded as: 4000 0000 0000 0000, there are no bits set in the mantissa and the exponent is 1.
17 is encoded as: 4031 0000 0000 0000
Which gives 0x403-1023 = 4 as the exponent, but the only bit set is in the first 4-bytes of the mantissa as the lowest bit. Which is bit 4 overall. 4-4 = 0, so the value is still a simple digit.
Vs 17.5 is: 4031 8000 0000 0000
You can see that the highest bit of the second 4-bits is set, which is at offset 5. And the exponent is only 4. 5 > 4 so it is a fractional number.
If I was writing code for this it would look something like:
[code moved to gist due to size and formatting]
https://gist.github.com/niemeyer/15c09e5d3a50a8c810ce
If you wanted to get extra tricky, you could use ‘if lastBitSet <= 15' and then just insert 0s until the exponent is low enough that the rest of the bits matter. (In the case of base128 encoding, you insert 0x80 into the stream).
That lets you encode larger-than-2**64 integers directly as long as they don't have too many significant digits. I don't know if that saves you anything, though, since it would be more bytes-on-the-wire than the IEE754 encoding.
So you might just want a 'exp < 52' sort of check.
@John: There are a few details regarding the mantissa and exponent, but that’s certainly the direction.
Post updated with link to implementation.
I looked at the solution and tried to break the answer down so I could understand it. Maybe others will benefit.
http://www.goinggo.net/2013/08/gustavos-ieee-754-brain-teaser.html