Byte Array Comparison Not Accurate - MD5CryptoServiceProvider

Author
18 May 2006 1:21 PM
Kaz
I picked up the following code posted by an MVP in some newsgroup. I am using
the code to compare Excel files. It works great for considerable changes, but
when the difference between the Excel files is quite minor (for instance, if I
change only one cell or fewer than 10 cells in one file), the comparison fails
to pick up the differences. Any thoughts? (Code below)

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim abyt1() As Byte = {12, 55, 88, 32}
        Dim abyt2() As Byte = {12, 55, 88, 32}
        Dim fs As IO.FileStream

        fs = New IO.FileStream("File1.xls", IO.FileMode.Open)
        ReDim abyt1(fs.Length)

        fs = New IO.FileStream("File2.xls", IO.FileMode.Open)
        ReDim abyt2(fs.Length)

        Dim IsDifferent As String = CType(ArrayDif(abyt1, abyt2), String)
        System.Windows.Forms.MessageBox.Show(IsDifferent)

    End Sub

    Public Function ArrayDif(ByVal array1() As Byte, ByVal array2() As Byte) As Boolean
        Dim Hash1() As Byte = New MD5CryptoServiceProvider().ComputeHash(array1)
        Dim Hash2() As Byte = New MD5CryptoServiceProvider().ComputeHash(array2)
        For i As Int64 = 0 To Math.Min(Hash1.Length, Hash2.Length) - 1
            If Hash1(i) <> Hash2(i) Then
                Return False
                Exit Function
            End If
        Next
        Return True
    End Function

Author
18 May 2006 2:00 PM
Patrice
Have you also checked the length of those arrays? (AFAIK they should be of
fixed size.) If they can be of different sizes, your loop will consider the
arrays the same whenever the elements of the smaller one match the first
elements of the bigger one, up to the length of the smaller one.
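To make the length issue concrete: the loop in the question only walks the common prefix of the two arrays. A minimal sketch of a comparison that checks the lengths first; `BytesEqual` is a hypothetical helper name, not part of any library.

```vbnet
Module ByteArrayCompare
    ' Hypothetical helper: True only if both arrays have the same length
    ' and identical contents.
    Public Function BytesEqual(ByVal a() As Byte, ByVal b() As Byte) As Boolean
        ' Two Nothing references are equal; one Nothing and one array are not.
        If a Is Nothing OrElse b Is Nothing Then
            Return a Is b
        End If
        ' Arrays of different lengths can never be equal, so check before looping.
        If a.Length <> b.Length Then
            Return False
        End If
        For i As Integer = 0 To a.Length - 1
            If a(i) <> b(i) Then
                Return False
            End If
        Next
        Return True
    End Function
End Module
```

With this version, `{1, 2}` and `{1, 2, 3}` compare as different, whereas a prefix-only loop would report them as equal.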

Otherwise, if the lengths do match, it looks like you found a collision. A hash
value is not the data itself, so distinct inputs can produce the same hash value.

Have you tried the direct approach? Do you have a problem with it? (Also
keep in mind that you can begin by checking their sizes, and that you only
need to perform a byte-by-byte comparison if the lengths match.)
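A minimal sketch of that direct approach: compare the file sizes first, then compare the contents byte by byte, with no hashing involved. `FilesAreEqual` is a hypothetical name; `FileStream` buffers internally, so reading one byte at a time keeps the logic simple without being unreasonably slow for typical Excel files.

```vbnet
Imports System.IO

Module DirectFileCompare
    ' Hypothetical helper: True if both files have identical contents.
    Public Function FilesAreEqual(ByVal path1 As String, ByVal path2 As String) As Boolean
        ' Cheap check first: files of different sizes cannot be identical.
        If New FileInfo(path1).Length <> New FileInfo(path2).Length Then
            Return False
        End If

        Using fs1 As New FileStream(path1, FileMode.Open, FileAccess.Read)
            Using fs2 As New FileStream(path2, FileMode.Open, FileAccess.Read)
                Dim b1 As Integer
                Dim b2 As Integer
                Do
                    ' ReadByte returns -1 at the end of the stream.
                    b1 = fs1.ReadByte()
                    b2 = fs2.ReadByte()
                    If b1 <> b2 Then
                        Return False
                    End If
                Loop While b1 <> -1
            End Using
        End Using
        ' Both streams reached the end together with no differing byte.
        Return True
    End Function
End Module
```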

I would avoid this kind of hack, which tends to introduce subtle problems...

Knowing what you are trying to do may also help (for example, xcopy and
robocopy also use the file timestamp).
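As a sketch of that idea (not a description of how xcopy or robocopy are implemented), a cheap pre-check can compare file size and last-write time before reading any contents. `LooksUnchanged` is a hypothetical name. Matching metadata does not prove the contents are identical, and differing timestamps do not prove they differ, so a True result means "probably unchanged" and a False result means "worth a full comparison".

```vbnet
Imports System.IO

Module MetadataCheck
    ' Hypothetical helper: True when both files have the same size and the
    ' same last-write time (UTC). A heuristic only, not a content comparison.
    Public Function LooksUnchanged(ByVal path1 As String, ByVal path2 As String) As Boolean
        Dim f1 As New FileInfo(path1)
        Dim f2 As New FileInfo(path2)
        Return f1.Length = f2.Length AndAlso _
               f1.LastWriteTimeUtc = f2.LastWriteTimeUtc
    End Function
End Module
```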

--
Patrice
Author
18 May 2006 2:08 PM
Patrice
Also, I suppose you omitted the code that reads the Excel file into the
byte array?
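For reference, a sketch of what that read step needs to do: size the array to the file and keep calling `Read` until it is full, because `Stream.Read` may return fewer bytes than requested. `ReadWholeFile` is a hypothetical name; on .NET 2.0 and later, `File.ReadAllBytes(path)` does the same job in one call.

```vbnet
Imports System.IO

Module FileLoading
    ' Hypothetical helper: reads an entire file into a byte array.
    Public Function ReadWholeFile(ByVal path As String) As Byte()
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
            ' VB array bounds are upper bounds, so Length - 1 gives exactly Length elements.
            Dim data(CInt(fs.Length) - 1) As Byte
            Dim offset As Integer = 0
            ' Read may return fewer bytes than requested, so loop until the array is full.
            While offset < data.Length
                Dim n As Integer = fs.Read(data, offset, data.Length - offset)
                If n = 0 Then
                    Throw New EndOfStreamException("File ended early: " & path)
                End If
                offset += n
            End While
            Return data
        End Using
    End Function
End Module
```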

I have since found http://www.fastsum.com/. You could give it a try to see
whether you actually have two files with the same hash value...
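To cross-check against an external checksum tool, a sketch that computes a file's MD5 digest and formats it as 32 lowercase hex digits, the form such tools usually display. `Md5Hex` is a hypothetical name; it uses `HashAlgorithm.ComputeHash(Stream)`, which reads the stream to the end.

```vbnet
Imports System.IO
Imports System.Security.Cryptography

Module FileHashing
    ' Hypothetical helper: MD5 digest of a file as 32 lowercase hex digits.
    Public Function Md5Hex(ByVal path As String) As String
        Using hasher As New MD5CryptoServiceProvider()
            Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
                Dim digest() As Byte = hasher.ComputeHash(fs)
                ' BitConverter produces "D4-1D-8C-..."; drop the dashes and lowercase it.
                Return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant()
            End Using
        End Using
    End Function
End Module
```

If two files that you know differ produce the same 32 digits here, that would be a genuine collision; if the digits differ, the problem is elsewhere in the comparison code.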

--
Patrice
Author
19 May 2006 4:44 AM
Göran Andersson
If you change just one bit in the file, you should get a completely
different hash code, so it sounds strange that you could change several
bytes without getting a difference.
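That single-bit behavior is easy to check directly. A small sketch (`ChangedDigestBytes` is a hypothetical name) that flips one bit of a buffer, hashes both versions, and counts how many of the 16 MD5 digest bytes changed. For a working hash you would typically expect nearly all of them to differ, though the exact count depends on the input.

```vbnet
Imports System.Security.Cryptography

Module AvalancheDemo
    ' Hypothetical helper: flips bit number bitIndex of a copy of data and
    ' returns how many of the 16 MD5 digest bytes differ from the original's.
    Public Function ChangedDigestBytes(ByVal data() As Byte, ByVal bitIndex As Integer) As Integer
        Dim modified() As Byte = CType(data.Clone(), Byte())
        ' Byte bitIndex \ 8, bit bitIndex Mod 8.
        modified(bitIndex \ 8) = modified(bitIndex \ 8) Xor CByte(1 << (bitIndex Mod 8))

        Dim h1() As Byte
        Dim h2() As Byte
        Using hasher As New MD5CryptoServiceProvider()
            h1 = hasher.ComputeHash(data)
            h2 = hasher.ComputeHash(modified)
        End Using

        Dim changed As Integer = 0
        For i As Integer = 0 To h1.Length - 1
            If h1(i) <> h2(i) Then changed += 1
        Next
        Return changed
    End Function
End Module
```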

Show the code that you are actually using instead. The code that you
showed doesn't even read the files.

Author
19 May 2006 3:56 PM
Kaz
Thanks for all your responses. Odd, since I DID include the code to read in
the files. Anyway, I managed to fix the problem... apparently I wasn't reading
the files into the arrays properly. For the benefit of anyone with the same
problem, here's the code that works.

Imports System.IO
Imports System.Security.Cryptography

Public Class Form1

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim abyt1() As Byte
        Dim abyt2() As Byte

        ' Read each file completely into its byte array. ReadBytes returns an
        ' array sized to the bytes actually read, and the Using blocks close
        ' the reader and the stream when done.
        Using fs As New FileStream("File1.xls", FileMode.Open, FileAccess.Read)
            Using reader1 As New BinaryReader(fs)
                abyt1 = reader1.ReadBytes(CInt(fs.Length))
            End Using
        End Using

        Using fs1 As New FileStream("File2.xls", FileMode.Open, FileAccess.Read)
            Using reader2 As New BinaryReader(fs1)
                abyt2 = reader2.ReadBytes(CInt(fs1.Length))
            End Using
        End Using

        ' ArrayDif returns True when the hashes match, i.e. the files are identical.
        Dim areSame As Boolean = ArrayDif(abyt1, abyt2)
        System.Windows.Forms.MessageBox.Show(areSame.ToString())
    End Sub

    ' Returns True if the two byte arrays have identical MD5 hashes.
    Public Function ArrayDif(ByVal array1() As Byte, ByVal array2() As Byte) As Boolean
        Dim Hash1() As Byte
        Dim Hash2() As Byte
        Using hasher As New MD5CryptoServiceProvider()
            Hash1 = hasher.ComputeHash(array1)
            Hash2 = hasher.ComputeHash(array2)
        End Using

        ' MD5 digests are always 16 bytes; the length check still guards the
        ' loop if a different hash algorithm is swapped in later.
        If Hash1.Length <> Hash2.Length Then
            Return False
        End If
        For i As Integer = 0 To Hash1.Length - 1
            If Hash1(i) <> Hash2(i) Then
                Return False
            End If
        Next
        Return True
    End Function

End Class
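One VB detail worth knowing when sizing buffers, as in the question's `ReDim abyt1(fs.Length)`: the number given to `ReDim` is the upper bound (last valid index), not the element count, so the array gets one extra element. In a hash comparison that extra trailing zero byte is added to both arrays, so it does not change the equal/different result, but the array is then not exactly the file's contents. A minimal demonstration; both function names are hypothetical.

```vbnet
Module ReDimBounds
    ' Returns the element count after ReDim with the given upper bound.
    Public Function LengthAfterReDim(ByVal upperBound As Integer) As Integer
        Dim buffer() As Byte
        ReDim buffer(upperBound)      ' upperBound is the last valid index
        Return buffer.Length          ' so Length = upperBound + 1
    End Function

    ' Allocates a buffer holding exactly byteCount elements.
    Public Function ExactBuffer(ByVal byteCount As Integer) As Byte()
        Dim buffer() As Byte
        ReDim buffer(byteCount - 1)   ' last index is byteCount - 1 (-1 gives an empty array)
        Return buffer
    End Function
End Module
```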

