Home All Groups Group Topic Archive Search About

Binary representation of a double?

Author
31 Mar 2006 2:21 PM
vecozo@online.nospam
Hello,

We are writing an application in C# that finds roots of a function up to
machine precision. Therefore I have to compare the number of significant bits
in two double variables (the upper- and lowerbound of the interval containing
the root). Currently I am struggling with the binary representation of a
double variable.

I have written some test code and it seems that the binary representation of
a double is given by:

MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE

Where M denotes the mantissa, E the exponent and S the sign.

The location of the least significant bit (*) is seems to be given by:

MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE

And the most significant bit:

MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE

This would mean that the mantissa part is written as:

qrstuvwxy ijklmnop abcdefgh

Where y is the least significant bit and a the most significant bit. This is
kind of unfortunate, because we cannot simply use a bitwise shift operator to
determine if all but the least significant bit are equal. Could you confirm
if we have this bitwise structure correct?

Regards,
Martijn Kaag

We use test code similar to the following to obtain the bitwise
representation:
Show quote
> public string GetBitString (double d) {
>
> byte[] bytes = ToByte(d);
> string rval = “”;
> for (int b=0; b<8;b++) {
> rval += GetBitString (bytes[b]) + “ ”;
> }
> return rval;
> }
> public unsafe Byte[] ToByte ( double d)
> {
>
> *Byte[] byte;
> &byte = (*Byte[])&d;
> return byt;
>
> }
>
> public string GetBitString (byte b)
> {
> string rval = "";
>
>
> for (int i=0; i <8;i++)
> {
> if (b>=126){
> // alleen >=126 als meest significante bit in b = 1
> rval += "1";
> }
> else {
> rval += "0";
> }


______________________________
www.VECOZO.nl

Author
31 Mar 2006 2:27 PM
Egbert Nierop (MVP for IIS)
<vecozo@online.nospam> wrote in message
Show quote
news:89E20A1D-1E20-402E-AAFC-539BC90C53AE@microsoft.com...
> Hello,
>
> We are writing an application in C# that finds roots of a function up to
> machine precision. Therefore I have to compare the number of significant
> bits
> in two double variables (the upper- and lowerbound of the interval
> containing
> the root). Currently I am struggling with the binary representation of a
> double variable.
>
> I have written some test code and it seems that the binary representation
> of
> a double is given by:

Have you been looking at existing code? This seems something that somebody
else might have solved.

Regards

Author
31 Mar 2006 2:59 PM
vecozo@online.nospam
Dear Egbert,

Thanks for your response.

Yes I have been looking [a lot] around for answers before posting. The
question is not about code, it is about standards and the internal
representation of floating point numbers.

I sure some else has already solved this and probably everyone with an
academic degree in computing is able to answer this question. Unfortunatley,
I cannot.

Regards,
Martijn

Show quote
"Egbert Nierop (MVP for IIS)" wrote:

>
> <vecozo@online.nospam> wrote in message
> news:89E20A1D-1E20-402E-AAFC-539BC90C53AE@microsoft.com...
> > Hello,
> >
> > We are writing an application in C# that finds roots of a function up to
> > machine precision. Therefore I have to compare the number of significant
> > bits
> > in two double variables (the upper- and lowerbound of the interval
> > containing
> > the root). Currently I am struggling with the binary representation of a
> > double variable.
> >
> > I have written some test code and it seems that the binary representation
> > of
> > a double is given by:
>
> Have you been looking at existing code? This seems something that somebody
> else might have solved.
>
> Regards
>
> --
>
> http://www.nieropwebconsult.nl
>
>
Author
31 Mar 2006 4:08 PM
Michael Bray
=?Utf-8?B?dmVjb3pvQG9ubGluZS5ub3NwYW0=?= <vecozo@online.nospam> wrote in
Show quote
news:89E20A1D-1E20-402E-AAFC-539BC90C53AE@microsoft.com:

> The location of the least significant bit (*) is seems to be given by:
>
> MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
>
> And the most significant bit:
>
> MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE
>
> This would mean that the mantissa part is written as:
>
> qrstuvwxy ijklmnop abcdefgh
>
> Where y is the least significant bit and a the most significant bit.
> This is kind of unfortunate, because we cannot simply use a bitwise
> shift operator to determine if all but the least significant bit are
> equal. Could you confirm if we have this bitwise structure correct?
>

This is typical of little-endian systems.  I can't speak specifically for
the double byte format - I see no reason to doubt your analysis of what is
where.  But it should be simple to and/or/bitshift this into a number that
you can use more easily.

-mdb
Author
31 Mar 2006 6:21 PM
vecozo@online.nospam
Hi Michael,

Thanks for your response.

It would be simple if the binary representation would be

yxwvutsrq ponmlkji hgfedcba

instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could the
*double to a *long and use shift operators to determine number of different
digits.:

long a
long b

int numDiff = 0;

for (int i = 0; i < 53; i++) {

a = a << 1
b = b << 1

if ( a != b ) {
numDiff++
}
else {
return numDiff;
}

}

Unfortunatly this does not seem to be the case.

Regards
Martijn

______________________________
www.VECOZO.nl



Show quote
"Michael Bray" wrote:

> =?Utf-8?B?dmVjb3pvQG9ubGluZS5ub3NwYW0=?= <vecozo@online.nospam> wrote in
> news:89E20A1D-1E20-402E-AAFC-539BC90C53AE@microsoft.com:
>
> > The location of the least significant bit (*) is seems to be given by:
> >
> > MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
> >
> > And the most significant bit:
> >
> > MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE
> >
> > This would mean that the mantissa part is written as:
> >
> > qrstuvwxy ijklmnop abcdefgh
> >
> > Where y is the least significant bit and a the most significant bit.
> > This is kind of unfortunate, because we cannot simply use a bitwise
> > shift operator to determine if all but the least significant bit are
> > equal. Could you confirm if we have this bitwise structure correct?
> >
>
> This is typical of little-endian systems.  I can't speak specifically for
> the double byte format - I see no reason to doubt your analysis of what is
> where.  But it should be simple to and/or/bitshift this into a number that
> you can use more easily.
>
> -mdb
>
Author
31 Mar 2006 7:02 PM
Michael Bray
=?Utf-8?B?dmVjb3pvQG9ubGluZS5ub3NwYW0=?= <vecozo@online.nospam> wrote in
Show quote
news:BD02E603-7F1A-4BCC-B32C-878CFF0195D8@microsoft.com:

> Hi Michael,
>
> Thanks for your response.
>
> It would be simple if the binary representation would be
>
> yxwvutsrq ponmlkji hgfedcba
>
> instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could
> the *double to a *long and use shift operators to determine number of
> different digits.:
>
> long a
> long b
>
> int numDiff = 0;
>
> for (int i = 0; i < 53; i++) {
>
> a = a << 1
> b = b << 1
>
> if ( a != b ) {
> numDiff++
> }
> else {
> return numDiff;
> }
>
> }

If all you are trying to do is count the number of bits that are the
same or different between the two binary representations, you can use
BitConverter and BitArray classes to do this quite easily:

double pi = Math.PI;
double e = Math.E;

byte[] pib = BitConverter.GetBytes(pi);
byte[] eb = BitConverter.GetBytes(e);

BitArray bapi = new BitArray(pib);
BitArray baeb = new BitArray(eb);

for (int i = 0; i < bapi.Length; i++)
{
    Console.WriteLine(
                string.Format(
                        "PI.Bit({0}) = {1}; E.Bit({0}) = {2}; {3}",
                        i,
                        bapi[i] ? 0 : 1,
                        baeb[i] ? 0 : 1,
                        (bapi[i] == baeb[i]) ? "same" : "different"
                )
        );
}

If you don't want to compare all of the bits, adjust accordingly.

-mdb
Author
2 Apr 2006 6:19 PM
Martijn Kaag
Thanks Michael,

That is actually a more convenient way of handling those bits around. And..
It will help me to find the actual binary representation. To be posted here..
in a few days.

Martijn Kaag


______________________________
Martijn Kaag
VECOZO BV
www.vecozo.nl


Show quote
"Michael Bray" wrote:

> =?Utf-8?B?dmVjb3pvQG9ubGluZS5ub3NwYW0=?= <vecozo@online.nospam> wrote in
> news:BD02E603-7F1A-4BCC-B32C-878CFF0195D8@microsoft.com:
>
> > Hi Michael,
> >
> > Thanks for your response.
> >
> > It would be simple if the binary representation would be
> >
> > yxwvutsrq ponmlkji hgfedcba
> >
> > instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could
> > the *double to a *long and use shift operators to determine number of
> > different digits.:
> >
> > long a
> > long b
> >
> > int numDiff = 0;
> >
> > for (int i = 0; i < 53; i++) {
> >
> > a = a << 1
> > b = b << 1
> >
> > if ( a != b ) {
> > numDiff++
> > }
> > else {
> > return numDiff;
> > }
> >
> > }
>
> If all you are trying to do is count the number of bits that are the
> same or different between the two binary representations, you can use
> BitConverter and BitArray classes to do this quite easily:
>
> double pi = Math.PI;
> double e = Math.E;
>
> byte[] pib = BitConverter.GetBytes(pi);
> byte[] eb = BitConverter.GetBytes(e);
>
> BitArray bapi = new BitArray(pib);
> BitArray baeb = new BitArray(eb);
>
> for (int i = 0; i < bapi.Length; i++)
> {
>     Console.WriteLine(
>                 string.Format(
>                         "PI.Bit({0}) = {1}; E.Bit({0}) = {2}; {3}",
>                         i,
>                         bapi[i] ? 0 : 1,
>                         baeb[i] ? 0 : 1,
>                         (bapi[i] == baeb[i]) ? "same" : "different"
>                 )
>         );
> }
>
> If you don't want to compare all of the bits, adjust accordingly.
>
> -mdb
>
Author
5 Apr 2006 10:36 PM
Thomee Wright
"vecozo@online.nospam" wrote:

> It would be simple if the binary representation would be
>
> yxwvutsrq ponmlkji hgfedcba
>
> instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could the
> *double to a *long and use shift operators to determine number of different
> digits.:
> ...
> Unfortunatly this does not seem to be the case.

Actually, this isn't a problem, and casting from a *double to a *long will
do exactly what you want.  The bytes appear to be "backwards" when you look
at the double byte by byte because of Intel endian conventions, but you have
to keep in mind that the representation of a long is similarly "backwards". 
So if you cast to a long, the lowest order bit of the double's mantissa will
be in the lowest order bit of the long, and the highest order bit of the
exponent will be in the highest order bit of the long.

-Thomee-
Author
31 Mar 2006 9:14 PM
AMercer
> We are writing an application in C# that finds roots of a function up to
> machine precision. Therefore I have to compare the number of significant bits
> in two double variables (the upper- and lowerbound of the interval containing
> the root). Currently I am struggling with the binary representation of a
> double variable.

Double is documented to have 15-16 decimal digits of precision.  The number
of mantissa bits is 52 which yields 53 under the hidden bit storage format. 
With hidden bit, the mantissa is shifted left until there is a 1 bit in the
high order postion, and then that bit is discarded because there is no need
to store the bit if it is always a one bit.  2^52 is 10^15.65, and 2^53 is
10^15.95, hence 15-16 decimal digits of precision.

..Net defines Double.Epsilon, the smallest nonzero positive double value.  As
the relative error between two doubles approaches this Epsilon value,
computational differences between them vanish.  The relative error between x
and y is (x-y)/y or nearly equivalently (y-x)/x.  The purpose of the division
is to take the double's exponent out of the picture.

My recommendation is to test for relative error in the neighborhood of 10 ^
-15.
Author
2 Apr 2006 6:27 PM
Martijn Kaag
Thanx Amercer,

In our current implementation we test if the difference between the two
double is greater than:

        public static double GetLeastSignificantBit ( double d )
        {
            d = Math.Abs (d);
            int power2 = (int)Math.Floor(Math.Log (d)/Constants.Ln2);
            double bit = Math.Pow (2, power2-52);

            System.Diagnostics.Debug.Assert ((d + bit) != d);
            System.Diagnostics.Debug.Assert ((d + bit/4) == d);

            return bit;
        }

This comes very close to your recommendation. Thanks!

--
______________________________
Martijn Kaag
VECOZO BV
www.vecozo.nl


Show quote
"AMercer" wrote:

> > We are writing an application in C# that finds roots of a function up to
> > machine precision. Therefore I have to compare the number of significant bits
> > in two double variables (the upper- and lowerbound of the interval containing
> > the root). Currently I am struggling with the binary representation of a
> > double variable.
>
> Double is documented to have 15-16 decimal digits of precision.  The number
> of mantissa bits is 52 which yields 53 under the hidden bit storage format. 
> With hidden bit, the mantissa is shifted left until there is a 1 bit in the
> high order postion, and then that bit is discarded because there is no need
> to store the bit if it is always a one bit.  2^52 is 10^15.65, and 2^53 is
> 10^15.95, hence 15-16 decimal digits of precision.
>
> .Net defines Double.Epsilon, the smallest nonzero positive double value.  As
> the relative error between two doubles approaches this Epsilon value,
> computational differences between them vanish.  The relative error between x
> and y is (x-y)/y or nearly equivalently (y-x)/x.  The purpose of the division
> is to take the double's exponent out of the picture.
>
> My recommendation is to test for relative error in the neighborhood of 10 ^
> -15.
Author
1 Apr 2006 11:27 AM
José_Manuel_Agüero
Hello vecozo,

The binary representation of a double is, in fact (with the bytes ordered):

SEEEEEEE EEEEMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM

The first bit of the mantissa is omitted and is always 1 for normalized numbers.

The precision P of the representation of a double D is: D * 2 ^ -53  <=  P  <  D * 2 ^ -52.
If you take the exponent E, the precision is exactly 2 ^ (E - 53)

For more information, visit, for example:
IEEE Standard 754 Floating Point Numbers
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
What Every Computer Scientist Should Know About Floating-Point Arithmetic
http://docs.sun.com/source/806-3568/ncg_goldberg.html

Regards.


Show quote
<vecozo@online.nospam> escribió en el mensaje news:89E20A1D-1E20-402E-AAFC-539BC90C53AE@microsoft.com...
| Hello,
|
| We are writing an application in C# that finds roots of a function up to
| machine precision. Therefore I have to compare the number of significant bits
| in two double variables (the upper- and lowerbound of the interval containing
| the root). Currently I am struggling with the binary representation of a
| double variable.
|
| I have written some test code and it seems that the binary representation of
| a double is given by:
|
| MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
|
| Where M denotes the mantissa, E the exponent and S the sign.
|
| The location of the least significant bit (*) is seems to be given by:
|
| MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
|
| And the most significant bit:
|
| MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE
|
| This would mean that the mantissa part is written as:
|
| qrstuvwxy ijklmnop abcdefgh
|
| Where y is the least significant bit and a the most significant bit. This is
| kind of unfortunate, because we cannot simply use a bitwise shift operator to
| determine if all but the least significant bit are equal. Could you confirm
| if we have this bitwise structure correct?
|
| Regards,
| Martijn Kaag
|
| We use test code similar to the following to obtain the bitwise
| representation:
| > public string GetBitString (double d) {
| >
| > byte[] bytes = ToByte(d);
| > string rval = “”;
| > for (int b=0; b<8;b++) {
| > rval += GetBitString (bytes[b]) + “ ”;
| > }
| > return rval;
| > }
| > public unsafe Byte[] ToByte ( double d)
| > {
| >
| > *Byte[] byte;
| > &byte = (*Byte[])&d;
| > return byt;
| >
| > }
| >
| > public string GetBitString (byte b)
| > {
| > string rval = "";
| >
| >
| > for (int i=0; i <8;i++)
| > {
| > if (b>=126){
| > // alleen >=126 als meest significante bit in b = 1
| > rval += "1";
| > }
| > else {
| > rval += "0";
| > }
|
|
| ______________________________
| www.VECOZO.nl
Author
2 Apr 2006 6:32 PM
Martijn Kaag
Thanks José!

That is what my book on numerical mathematics told me as well. Unfortunatly
the real binary representation (on "little endian") systems differs from this
specification.

Hopefully I post the answer here in a few days.

Regards,
Martijn Kaag

Show quote
"José Manuel Agüero" wrote:

> Hello vecozo,
>
> The binary representation of a double is, in fact (with the bytes ordered):
>
> SEEEEEEE EEEEMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM
>
> The first bit of the mantissa is omitted and is always 1 for normalized numbers.
>
> The precision P of the representation of a double D is: D * 2 ^ -53  <=  P  <  D * 2 ^ -52.
> If you take the exponent E, the precision is exactly 2 ^ (E - 53)
>
> For more information, visit, for example:
> IEEE Standard 754 Floating Point Numbers
> http://steve.hollasch.net/cgindex/coding/ieeefloat.html
> What Every Computer Scientist Should Know About Floating-Point Arithmetic
> http://docs.sun.com/source/806-3568/ncg_goldberg.html
>
> Regards.
>
>
> <vecozo@online.nospam> escribió en el mensaje news:89E20A1D-1E20-402E-AAFC-539BC90C53AE@microsoft.com...
> | Hello,
> |
> | We are writing an application in C# that finds roots of a function up to
> | machine precision. Therefore I have to compare the number of significant bits
> | in two double variables (the upper- and lowerbound of the interval containing
> | the root). Currently I am struggling with the binary representation of a
> | double variable.
> |
> | I have written some test code and it seems that the binary representation of
> | a double is given by:
> |
> | MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
> |
> | Where M denotes the mantissa, E the exponent and S the sign.
> |
> | The location of the least significant bit (*) is seems to be given by:
> |
> | MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
> |
> | And the most significant bit:
> |
> | MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE
> |
> | This would mean that the mantissa part is written as:
> |
> | qrstuvwxy ijklmnop abcdefgh
> |
> | Where y is the least significant bit and a the most significant bit. This is
> | kind of unfortunate, because we cannot simply use a bitwise shift operator to
> | determine if all but the least significant bit are equal. Could you confirm
> | if we have this bitwise structure correct?
> |
> | Regards,
> | Martijn Kaag
> |
> | We use test code similar to the following to obtain the bitwise
> | representation:
> | > public string GetBitString (double d) {
> | >
> | > byte[] bytes = ToByte(d);
> | > string rval = “”;
> | > for (int b=0; b<8;b++) {
> | > rval += GetBitString (bytes[b]) + “ ”;
> | > }
> | > return rval;
> | > }
> | > public unsafe Byte[] ToByte ( double d)
> | > {
> | >
> | > *Byte[] byte;
> | > &byte = (*Byte[])&d;
> | > return byt;
> | >
> | > }
> | >
> | > public string GetBitString (byte b)
> | > {
> | > string rval = "";
> | >
> | >
> | > for (int i=0; i <8;i++)
> | > {
> | > if (b>=126){
> | > // alleen >=126 als meest significante bit in b = 1
> | > rval += "1";
> | > }
> | > else {
> | > rval += "0";
> | > }
> |
> |
> | ______________________________
> | www.VECOZO.nl
>
>

AddThis Social Bookmark Button