|
dev
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Binary representation of a double?We are writing an application in C# that finds roots of a function up to machine precision. Therefore I have to compare the number of significant bits in two double variables (the upper- and lowerbound of the interval containing the root). Currently I am struggling with the binary representation of a double variable. I have written some test code and it seems that the binary representation of a double is given by: MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE Where M denotes the mantissa, E the exponent and S the sign. The location of the least significant bit (*) is seems to be given by: MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE And the most significant bit: MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE This would mean that the mantissa part is written as: qrstuvwxy ijklmnop abcdefgh Where y is the least significant bit and a the most significant bit. This is kind of unfortunate, because we cannot simply use a bitwise shift operator to determine if all but the least significant bit are equal. Could you confirm if we have this bitwise structure correct? Regards, Martijn Kaag We use test code similar to the following to obtain the bitwise representation: Show quote > public string GetBitString (double d) { ______________________________> > byte[] bytes = ToByte(d); > string rval = “â€; > for (int b=0; b<8;b++) { > rval += GetBitString (bytes[b]) + “ â€; > } > return rval; > } > public unsafe Byte[] ToByte ( double d) > { > > *Byte[] byte; > &byte = (*Byte[])&d; > return byt; > > } > > public string GetBitString (byte b) > { > string rval = ""; > > > for (int i=0; i <8;i++) > { > if (b>=126){ > // alleen >=126 als meest significante bit in b = 1 > rval += "1"; > } > else { > rval += "0"; > } www.VECOZO.nl <vecozo@online.nospam> wrote in message
Show quote news:89E20A1D-1E20-402E-AAFC-539BC90C53AE@microsoft.com... Have you been looking at existing code? This seems something that somebody > Hello, > > We are writing an application in C# that finds roots of a function up to > machine precision. Therefore I have to compare the number of significant > bits > in two double variables (the upper- and lowerbound of the interval > containing > the root). Currently I am struggling with the binary representation of a > double variable. > > I have written some test code and it seems that the binary representation > of > a double is given by: else might have solved. Regards Dear Egbert,
Thanks for your response. Yes I have been looking [a lot] around for answers before posting. The question is not about code, it is about standards and the internal representation of floating point numbers. I sure some else has already solved this and probably everyone with an academic degree in computing is able to answer this question. Unfortunatley, I cannot. Regards, Martijn Show quote "Egbert Nierop (MVP for IIS)" wrote: > > <vecozo@online.nospam> wrote in message > news:89E20A1D-1E20-402E-AAFC-539BC90C53AE@microsoft.com... > > Hello, > > > > We are writing an application in C# that finds roots of a function up to > > machine precision. Therefore I have to compare the number of significant > > bits > > in two double variables (the upper- and lowerbound of the interval > > containing > > the root). Currently I am struggling with the binary representation of a > > double variable. > > > > I have written some test code and it seems that the binary representation > > of > > a double is given by: > > Have you been looking at existing code? This seems something that somebody > else might have solved. > > Regards > > -- > > http://www.nieropwebconsult.nl > > =?Utf-8?B?dmVjb3pvQG9ubGluZS5ub3NwYW0=?= <vecozo@online.nospam> wrote in
Show quote news:89E20A1D-1E20-402E-AAFC-539BC90C53AE@microsoft.com: This is typical of little-endian systems. I can't speak specifically for > The location of the least significant bit (*) is seems to be given by: > > MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE > > And the most significant bit: > > MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE > > This would mean that the mantissa part is written as: > > qrstuvwxy ijklmnop abcdefgh > > Where y is the least significant bit and a the most significant bit. > This is kind of unfortunate, because we cannot simply use a bitwise > shift operator to determine if all but the least significant bit are > equal. Could you confirm if we have this bitwise structure correct? > the double byte format - I see no reason to doubt your analysis of what is where. But it should be simple to and/or/bitshift this into a number that you can use more easily. -mdb Hi Michael,
Thanks for your response. It would be simple if the binary representation would be yxwvutsrq ponmlkji hgfedcba instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could the *double to a *long and use shift operators to determine number of different digits.: long a long b int numDiff = 0; for (int i = 0; i < 53; i++) { a = a << 1 b = b << 1 if ( a != b ) { numDiff++ } else { return numDiff; } } Unfortunatly this does not seem to be the case. Regards Martijn ______________________________ www.VECOZO.nl Show quote "Michael Bray" wrote: > =?Utf-8?B?dmVjb3pvQG9ubGluZS5ub3NwYW0=?= <vecozo@online.nospam> wrote in > news:89E20A1D-1E20-402E-AAFC-539BC90C53AE@microsoft.com: > > > The location of the least significant bit (*) is seems to be given by: > > > > MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE > > > > And the most significant bit: > > > > MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE > > > > This would mean that the mantissa part is written as: > > > > qrstuvwxy ijklmnop abcdefgh > > > > Where y is the least significant bit and a the most significant bit. > > This is kind of unfortunate, because we cannot simply use a bitwise > > shift operator to determine if all but the least significant bit are > > equal. Could you confirm if we have this bitwise structure correct? > > > > This is typical of little-endian systems. I can't speak specifically for > the double byte format - I see no reason to doubt your analysis of what is > where. But it should be simple to and/or/bitshift this into a number that > you can use more easily. > > -mdb > =?Utf-8?B?dmVjb3pvQG9ubGluZS5ub3NwYW0=?= <vecozo@online.nospam> wrote in
Show quote news:BD02E603-7F1A-4BCC-B32C-878CFF0195D8@microsoft.com: If all you are trying to do is count the number of bits that are the > Hi Michael, > > Thanks for your response. > > It would be simple if the binary representation would be > > yxwvutsrq ponmlkji hgfedcba > > instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could > the *double to a *long and use shift operators to determine number of > different digits.: > > long a > long b > > int numDiff = 0; > > for (int i = 0; i < 53; i++) { > > a = a << 1 > b = b << 1 > > if ( a != b ) { > numDiff++ > } > else { > return numDiff; > } > > } same or different between the two binary representations, you can use BitConverter and BitArray classes to do this quite easily: double pi = Math.PI; double e = Math.E; byte[] pib = BitConverter.GetBytes(pi); byte[] eb = BitConverter.GetBytes(e); BitArray bapi = new BitArray(pib); BitArray baeb = new BitArray(eb); for (int i = 0; i < bapi.Length; i++) { Console.WriteLine( string.Format( "PI.Bit({0}) = {1}; E.Bit({0}) = {2}; {3}", i, bapi[i] ? 0 : 1, baeb[i] ? 0 : 1, (bapi[i] == baeb[i]) ? "same" : "different" ) ); } If you don't want to compare all of the bits, adjust accordingly. -mdb Thanks Michael,
That is actually a more convenient way of handling those bits around. And.. It will help me to find the actual binary representation. To be posted here.. in a few days. Martijn Kaag ______________________________ Martijn Kaag VECOZO BV www.vecozo.nl Show quote "Michael Bray" wrote: > =?Utf-8?B?dmVjb3pvQG9ubGluZS5ub3NwYW0=?= <vecozo@online.nospam> wrote in > news:BD02E603-7F1A-4BCC-B32C-878CFF0195D8@microsoft.com: > > > Hi Michael, > > > > Thanks for your response. > > > > It would be simple if the binary representation would be > > > > yxwvutsrq ponmlkji hgfedcba > > > > instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could > > the *double to a *long and use shift operators to determine number of > > different digits.: > > > > long a > > long b > > > > int numDiff = 0; > > > > for (int i = 0; i < 53; i++) { > > > > a = a << 1 > > b = b << 1 > > > > if ( a != b ) { > > numDiff++ > > } > > else { > > return numDiff; > > } > > > > } > > If all you are trying to do is count the number of bits that are the > same or different between the two binary representations, you can use > BitConverter and BitArray classes to do this quite easily: > > double pi = Math.PI; > double e = Math.E; > > byte[] pib = BitConverter.GetBytes(pi); > byte[] eb = BitConverter.GetBytes(e); > > BitArray bapi = new BitArray(pib); > BitArray baeb = new BitArray(eb); > > for (int i = 0; i < bapi.Length; i++) > { > Console.WriteLine( > string.Format( > "PI.Bit({0}) = {1}; E.Bit({0}) = {2}; {3}", > i, > bapi[i] ? 0 : 1, > baeb[i] ? 0 : 1, > (bapi[i] == baeb[i]) ? "same" : "different" > ) > ); > } > > If you don't want to compare all of the bits, adjust accordingly. > > -mdb > "vecozo@online.nospam" wrote: Actually, this isn't a problem, and casting from a *double to a *long will > It would be simple if the binary representation would be > > yxwvutsrq ponmlkji hgfedcba > > instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could the > *double to a *long and use shift operators to determine number of different > digits.: > ... > Unfortunatly this does not seem to be the case. do exactly what you want. The bytes appear to be "backwards" when you look at the double byte by byte because of Intel endian conventions, but you have to keep in mind that the representation of a long is similarly "backwards". So if you cast to a long, the lowest order bit of the double's mantissa will be in the lowest order bit of the long, and the highest order bit of the exponent will be in the highest order bit of the long. -Thomee- > We are writing an application in C# that finds roots of a function up to Double is documented to have 15-16 decimal digits of precision. The number > machine precision. Therefore I have to compare the number of significant bits > in two double variables (the upper- and lowerbound of the interval containing > the root). Currently I am struggling with the binary representation of a > double variable. of mantissa bits is 52 which yields 53 under the hidden bit storage format. With hidden bit, the mantissa is shifted left until there is a 1 bit in the high order postion, and then that bit is discarded because there is no need to store the bit if it is always a one bit. 2^52 is 10^15.65, and 2^53 is 10^15.95, hence 15-16 decimal digits of precision. ..Net defines Double.Epsilon, the smallest nonzero positive double value. As the relative error between two doubles approaches this Epsilon value, computational differences between them vanish. The relative error between x and y is (x-y)/y or nearly equivalently (y-x)/x. The purpose of the division is to take the double's exponent out of the picture. My recommendation is to test for relative error in the neighborhood of 10 ^ -15. Thanx Amercer,
In our current implementation we test if the difference between the two double is greater than: public static double GetLeastSignificantBit ( double d ) { d = Math.Abs (d); int power2 = (int)Math.Floor(Math.Log (d)/Constants.Ln2); double bit = Math.Pow (2, power2-52); System.Diagnostics.Debug.Assert ((d + bit) != d); System.Diagnostics.Debug.Assert ((d + bit/4) == d); return bit; } This comes very close to your recommendation. Thanks! Show quote "AMercer" wrote: > > We are writing an application in C# that finds roots of a function up to > > machine precision. Therefore I have to compare the number of significant bits > > in two double variables (the upper- and lowerbound of the interval containing > > the root). Currently I am struggling with the binary representation of a > > double variable. > > Double is documented to have 15-16 decimal digits of precision. The number > of mantissa bits is 52 which yields 53 under the hidden bit storage format. > With hidden bit, the mantissa is shifted left until there is a 1 bit in the > high order postion, and then that bit is discarded because there is no need > to store the bit if it is always a one bit. 2^52 is 10^15.65, and 2^53 is > 10^15.95, hence 15-16 decimal digits of precision. > > .Net defines Double.Epsilon, the smallest nonzero positive double value. As > the relative error between two doubles approaches this Epsilon value, > computational differences between them vanish. The relative error between x > and y is (x-y)/y or nearly equivalently (y-x)/x. The purpose of the division > is to take the double's exponent out of the picture. > > My recommendation is to test for relative error in the neighborhood of 10 ^ > -15. Hello vecozo,
The binary representation of a double is, in fact (with the bytes ordered): SEEEEEEE EEEEMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM The first bit of the mantissa is omitted and is always 1 for normalized numbers. The precision P of the representation of a double D is: D * 2 ^ -53 <= P < D * 2 ^ -52. If you take the exponent E, the precision is exactly 2 ^ (E - 53) For more information, visit, for example: IEEE Standard 754 Floating Point Numbers http://steve.hollasch.net/cgindex/coding/ieeefloat.html What Every Computer Scientist Should Know About Floating-Point Arithmetic http://docs.sun.com/source/806-3568/ncg_goldberg.html Regards. Show quote <vecozo@online.nospam> escribió en el mensaje news:89E20A1D-1E20-402E-AAFC-539BC90C53AE@microsoft.com... | Hello, | | We are writing an application in C# that finds roots of a function up to | machine precision. Therefore I have to compare the number of significant bits | in two double variables (the upper- and lowerbound of the interval containing | the root). Currently I am struggling with the binary representation of a | double variable. | | I have written some test code and it seems that the binary representation of | a double is given by: | | MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE | | Where M denotes the mantissa, E the exponent and S the sign. | | The location of the least significant bit (*) is seems to be given by: | | MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE | | And the most significant bit: | | MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE | | This would mean that the mantissa part is written as: | | qrstuvwxy ijklmnop abcdefgh | | Where y is the least significant bit and a the most significant bit. This is | kind of unfortunate, because we cannot simply use a bitwise shift operator to | determine if all but the least significant bit are equal. Could you confirm | if we have this bitwise structure correct? | | Regards, | Martijn Kaag | | We use test code similar to the following to obtain the bitwise | representation: | > public string GetBitString (double d) { | > | > byte[] bytes = ToByte(d); | > string rval = “â€; | > for (int b=0; b<8;b++) { | > rval += GetBitString (bytes[b]) + “ â€; | > } | > return rval; | > } | > public unsafe Byte[] ToByte ( double d) | > { | > | > *Byte[] byte; | > &byte = (*Byte[])&d; | > return byt; | > | > } | > | > public string GetBitString (byte b) | > { | > string rval = ""; | > | > | > for (int i=0; i <8;i++) | > { | > if (b>=126){ | > // alleen >=126 als meest significante bit in b = 1 | > rval += "1"; | > } | > else { | > rval += "0"; | > } | | | ______________________________ | www.VECOZO.nl Thanks José!
That is what my book on numerical mathematics told me as well. Unfortunatly the real binary representation (on "little endian") systems differs from this specification. Hopefully I post the answer here in a few days. Regards, Martijn Kaag Show quote "José Manuel Agüero" wrote: > Hello vecozo, > > The binary representation of a double is, in fact (with the bytes ordered): > > SEEEEEEE EEEEMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM > > The first bit of the mantissa is omitted and is always 1 for normalized numbers. > > The precision P of the representation of a double D is: D * 2 ^ -53 <= P < D * 2 ^ -52. > If you take the exponent E, the precision is exactly 2 ^ (E - 53) > > For more information, visit, for example: > IEEE Standard 754 Floating Point Numbers > http://steve.hollasch.net/cgindex/coding/ieeefloat.html > What Every Computer Scientist Should Know About Floating-Point Arithmetic > http://docs.sun.com/source/806-3568/ncg_goldberg.html > > Regards. > > > <vecozo@online.nospam> escribió en el mensaje news:89E20A1D-1E20-402E-AAFC-539BC90C53AE@microsoft.com... > | Hello, > | > | We are writing an application in C# that finds roots of a function up to > | machine precision. Therefore I have to compare the number of significant bits > | in two double variables (the upper- and lowerbound of the interval containing > | the root). Currently I am struggling with the binary representation of a > | double variable. > | > | I have written some test code and it seems that the binary representation of > | a double is given by: > | > | MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE > | > | Where M denotes the mantissa, E the exponent and S the sign. > | > | The location of the least significant bit (*) is seems to be given by: > | > | MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE > | > | And the most significant bit: > | > | MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE > | > | This would mean that the mantissa part is written as: > | > | qrstuvwxy ijklmnop abcdefgh > | > | Where y is the least significant bit and a the most significant bit. This is > | kind of unfortunate, because we cannot simply use a bitwise shift operator to > | determine if all but the least significant bit are equal. Could you confirm > | if we have this bitwise structure correct? > | > | Regards, > | Martijn Kaag > | > | We use test code similar to the following to obtain the bitwise > | representation: > | > public string GetBitString (double d) { > | > > | > byte[] bytes = ToByte(d); > | > string rval = “â€; > | > for (int b=0; b<8;b++) { > | > rval += GetBitString (bytes[b]) + “ â€; > | > } > | > return rval; > | > } > | > public unsafe Byte[] ToByte ( double d) > | > { > | > > | > *Byte[] byte; > | > &byte = (*Byte[])&d; > | > return byt; > | > > | > } > | > > | > public string GetBitString (byte b) > | > { > | > string rval = ""; > | > > | > > | > for (int i=0; i <8;i++) > | > { > | > if (b>=126){ > | > // alleen >=126 als meest significante bit in b = 1 > | > rval += "1"; > | > } > | > else { > | > rval += "0"; > | > } > | > | > | ______________________________ > | www.VECOZO.nl > > |
|||||||||||||||||||||||