Byte Array Comparison Not Accurate - MD5CryptoServiceProvider

Kaz wrote:

I picked up the following code posted by an MVP at some newsgroup. I am using the code to compare Excel files. It works great for considerable changes, but when the difference in the Excel files is quite minor (for instance, if I change only one cell, or fewer than 10 cells, in one file), the comparison fails to pick up the differences. Any thoughts? (Code below)

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim abyt1() As Byte = {12, 55, 88, 32}
        Dim abyt2() As Byte = {12, 55, 88, 32}
        Dim fs As IO.FileStream

        fs = New IO.FileStream("File1.xls", IO.FileMode.Open)
        ReDim abyt1(fs.Length)

        fs = New IO.FileStream("File2.xls", IO.FileMode.Open)
        ReDim abyt2(fs.Length)

        Dim IsDifferent As String = CType(ArrayDif(abyt1, abyt2), String)
        System.Windows.Forms.MessageBox.Show(IsDifferent)
    End Sub

    Public Function ArrayDif(ByVal array1() As Byte, ByVal array2() As Byte) As Boolean
        Dim Hash1() As Byte = New MD5CryptoServiceProvider().ComputeHash(array1)
        Dim Hash2() As Byte = New MD5CryptoServiceProvider().ComputeHash(array2)
        For i As Int64 = 0 To Math.Min(Hash1.Length, Hash2.Length) - 1
            If Hash1(i) <> Hash2(i) Then
                Return False
                Exit Function
            End If
        Next
        Return True
    End Function

Patrice wrote:

Have you also checked the length of those arrays? (AFAIK they should be of
fixed size). If they can be of different sizes, your code treats the arrays as identical whenever the elements of the smaller one match the corresponding leading elements of the bigger one.

Otherwise, it looks like you found a collision. A hash value is not the real data, so distinct inputs can produce the same hash value.

Have you tried the direct approach? Do you have a problem with it? (Also keep in mind that you can begin by checking the sizes, and that you only have to perform a byte comparison if they match in length.)

I would avoid this kind of hack, which tends to introduce subtle problems...

Knowing what you are trying to do may also help (for example, xcopy and robocopy also use the file timestamp).

--
Patrice

Patrice wrote:

Also, I suppose you omitted the code that reads the Excel file into the
byte array? I have since found http://www.fastsum.com/. You could give it a try to see whether you actually have two files with the same hash value...

--
Patrice

Göran Andersson wrote:

If you change just one bit in the file, you should get a hash code that
is completely different, so it sounds strange that you could change several bytes without getting a difference.

Show the code that you are actually using instead. The code that you showed doesn't even read the files.

Kaz wrote:

Thanks for all your responses. Odd, I actually DID include the
code to read in the files. Anyway, I managed to fix the problem... apparently I wasn't reading the files into the arrays properly. For the benefit of anyone with the same problem, here's the code that works.

    Imports System.IO
    Imports System.Security.Cryptography

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        ' Read each file fully into a byte array.
        ' ReadBytes sizes the array to the file; ReDim abyt1(fs.Length) would allocate one extra element.
        Dim fs As New IO.FileStream("File1.xls", IO.FileMode.Open, IO.FileAccess.Read)
        Dim reader1 As New BinaryReader(fs)
        Dim abyt1() As Byte = reader1.ReadBytes(CInt(fs.Length))
        reader1.Close() ' also closes the underlying stream

        Dim fs1 As New IO.FileStream("File2.xls", IO.FileMode.Open, IO.FileAccess.Read)
        Dim reader2 As New BinaryReader(fs1)
        Dim abyt2() As Byte = reader2.ReadBytes(CInt(fs1.Length))
        reader2.Close()

        ' Note: despite its name, ArrayDif returns True when the hashes match.
        Dim IsDifferent As String = CType(ArrayDif(abyt1, abyt2), String)
        System.Windows.Forms.MessageBox.Show(IsDifferent)
    End Sub

    Public Function ArrayDif(ByVal array1() As Byte, ByVal array2() As Byte) As Boolean
        Dim Hash1() As Byte = New MD5CryptoServiceProvider().ComputeHash(array1)
        Dim Hash2() As Byte = New MD5CryptoServiceProvider().ComputeHash(array2)
        If Hash1.Length <> Hash2.Length Then
            Return False
        End If
        ' Compare the two hashes byte by byte.
        For i As Integer = 0 To Hash1.Length - 1
            If Hash1(i) <> Hash2(i) Then
                Return False
            End If
        Next
        Return True
    End Function
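
For anyone who wants the direct approach suggested earlier in the thread (check the sizes first, then compare byte by byte only if they match), here is a minimal sketch. It assumes .NET 2.0 or later for `File.ReadAllBytes`, and the names `FileCompareSketch` and `FilesAreIdentical` are made up for illustration; the file names are the ones from the original post. It is one possible way to do it, not code that anyone in the thread posted.

```vb
Imports System
Imports System.IO

Module FileCompareSketch
    ' Hypothetical helper: returns True when both files have the same length and content.
    Public Function FilesAreIdentical(ByVal filePath1 As String, ByVal filePath2 As String) As Boolean
        ' Cheap check first: files of different length cannot be identical.
        If New FileInfo(filePath1).Length <> New FileInfo(filePath2).Length Then
            Return False
        End If

        ' File.ReadAllBytes reads the whole file and closes it afterwards.
        Dim bytes1() As Byte = File.ReadAllBytes(filePath1)
        Dim bytes2() As Byte = File.ReadAllBytes(filePath2)

        ' Guard again in case a file changed between the two steps.
        If bytes1.Length <> bytes2.Length Then
            Return False
        End If

        ' Direct byte-by-byte comparison; stop at the first difference.
        For i As Integer = 0 To bytes1.Length - 1
            If bytes1(i) <> bytes2(i) Then
                Return False
            End If
        Next
        Return True
    End Function

    Sub Main()
        ' File names taken from the original post.
        Console.WriteLine(FilesAreIdentical("File1.xls", "File2.xls"))
    End Sub
End Module
```

Because this compares the actual bytes, a hash collision cannot make two different files look the same. The cost is that both files are held in memory at once, which may matter for very large files.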
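
On the point that changing a single bit should give a completely different hash: the sketch below hashes two small arrays that differ in one bit, so the two hex strings can be compared by eye. It also shows hashing a file straight from its stream with `ComputeHash(Stream)`, which avoids sizing a buffer by hand. The names `HashSketch`, `Md5Hex` and `Md5OfFile` are hypothetical, and .NET 2.0 or later is assumed.

```vb
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module HashSketch
    ' Hypothetical helper: hex string of the MD5 hash of an in-memory byte array.
    Public Function Md5Hex(ByVal data() As Byte) As String
        Using hasher As MD5 = MD5.Create()
            Return BitConverter.ToString(hasher.ComputeHash(data))
        End Using
    End Function

    ' Hypothetical helper: hex string of the MD5 hash of a file, read through its stream.
    Public Function Md5OfFile(ByVal filePath As String) As String
        Using hasher As MD5 = MD5.Create()
            Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
                Return BitConverter.ToString(hasher.ComputeHash(fs))
            End Using
        End Using
    End Function

    Sub Main()
        Dim a() As Byte = {12, 55, 88, 32}
        Dim b() As Byte = {12, 55, 88, 33} ' differs from a only in the lowest bit of the last byte

        ' Print both hashes so they can be compared side by side.
        Console.WriteLine(Md5Hex(a))
        Console.WriteLine(Md5Hex(b))
    End Sub
End Module
```

If two files that are known to differ still produce the same string from `Md5OfFile`, the more likely explanation is that the data being hashed is not the file content (as happened in this thread), not a genuine collision.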