Android Code Snippet: Check equality of two files

Skip to Entry #3 for a more efficient way (thank you Erel)

____

Edit: One slow way to compare two files:

B4X:
Sub FilesAreIdentical(strFile1 As String, strFile2 As String) As Boolean
    If  File.Size("", strFile1) <> File.Size("", strFile2) Then Return False
    Dim in1 As InputStream
    Dim in2 As InputStream
    in1 = File.OpenInput("", strFile1)
    in2 = File.OpenInput("", strFile2)
    Dim buffer1(File.Size("", strFile1)) As Byte
    Dim buffer2(File.Size("", strFile2)) As Byte
    in1.ReadBytes(buffer1, 0, buffer1.length)
    in2.ReadBytes(buffer2, 0, buffer2.length)
    Dim data1(buffer1.Length) As Byte
    Dim data2(buffer2.Length) As Byte
    Dim md As MessageDigest
    data1 = md.GetMessageDigest(buffer1, "MD5")
    data2 = md.GetMessageDigest(buffer2, "MD5")
    Dim Bconv As ByteConverter
    Return ( Bconv.HexFromBytes(data1) = Bconv.HexFromBytes(data2) )
End Sub

 

Erel

B4X founder
Staff member
Licensed User
Longtime User
I'm sorry to say, but this is very inefficient code. I wouldn't use it unless the files are small.

The best solution is to avoid reading the whole files. However, here is a simple solution that still reads the whole files and is much more efficient:
B4X:
Sub FilesAreIdentical(strFile1 As String, strFile2 As String) As Boolean
    If File.Size("", strFile1) <> File.Size("", strFile2) Then Return False
    Dim b1() As Byte = File.ReadBytes("", strFile1)
    Dim b2() As Byte = File.ReadBytes("", strFile2)
    For i = 0 To b1.Length - 1
        If b1(i) <> b2(i) Then Return False
    Next
    Return True
End Sub
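
The idea of not reading the whole files into memory could be sketched roughly as follows. This is only an illustration and not code from the thread: the sub names FilesAreIdenticalChunked and ReadFully and the 8 KB buffer size are assumptions. It compares the files block by block and stops at the first difference.

```b4x
'Hypothetical sketch: compares two files block by block instead of loading them completely.
Sub FilesAreIdenticalChunked(strFile1 As String, strFile2 As String) As Boolean
    If File.Size("", strFile1) <> File.Size("", strFile2) Then Return False
    Dim in1 As InputStream = File.OpenInput("", strFile1)
    Dim in2 As InputStream = File.OpenInput("", strFile2)
    Dim buf1(8192) As Byte 'buffer size is an arbitrary choice
    Dim buf2(8192) As Byte
    Dim same As Boolean = True
    Do While same
        Dim n1 As Int = ReadFully(in1, buf1)
        Dim n2 As Int = ReadFully(in2, buf2)
        If n1 <> n2 Then
            same = False
        Else If n1 = 0 Then
            Exit 'both streams are exhausted
        Else
            For i = 0 To n1 - 1
                If buf1(i) <> buf2(i) Then
                    same = False
                    Exit 'leaves the For loop; the Do While then ends
                End If
            Next
        End If
    Loop
    in1.Close
    in2.Close
    Return same
End Sub

'Fills Buffer as far as possible. Returns the number of bytes read (0 at the end of the stream).
Sub ReadFully(Stream As InputStream, Buffer() As Byte) As Int
    Dim total As Int = 0
    Do While total < Buffer.Length
        Dim n As Int = Stream.ReadBytes(Buffer, total, Buffer.Length - total)
        If n <= 0 Then Exit
        total = total + n
    Loop
    Return total
End Sub
```

The helper loops because InputStream.ReadBytes may return fewer bytes than requested, so a single call is not guaranteed to fill the buffer.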
 

LucaMs

Expert
Licensed User
Longtime User
File.ReadBytes?

(screenshot attached)
 

Erel

If you are using an older version of B4A then you can replace File.ReadBytes with:
B4X:
Bit.InputStreamToBytes(File.OpenInput(...))

@fredo I will explain the three issues that I see in your code as they are quite common mistakes:

1.
B4X:
Dim data1(buffer1.Length) As Byte
Dim data2(buffer2.Length) As Byte
Dim md As MessageDigest
data1 = md.GetMessageDigest(buffer1, "MD5")
data2 = md.GetMessageDigest(buffer2, "MD5")
The first two lines above allocate two large arrays and then assign new arrays to the same variables. The previously allocated arrays are discarded. It should have been written like this:
B4X:
Dim data1() As Byte = md.GetMessageDigest(buffer1, "MD5")
...

2. GetMessageDigest is not very helpful in this case. Internally it must go over all bytes to calculate the hash, so it will be slower than the code I posted above. Worse, if the test must be 100% accurate, a hash can only tell us that the files are different: matching hashes do not strictly prove that the files are identical.
It can be useful in some cases if you cache the result and later compare it to other files, especially if you are fine with 99%+ accuracy.

At this point I see that I went over your code too fast and missed the fact that you are testing the hash and not the file content itself. Sorry!!!
Your code is fine. Though the code I posted is better :)


3. Point is not relevant...
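
The caching idea from point 2 might look something like the sketch below. It is not code from the thread: the sub names FileHashHex and CachedHash and the use of a Map as the cache are assumptions, and it relies on the same MessageDigest and ByteConverter types as the first post.

```b4x
'Hypothetical helper: returns the MD5 digest of a file as a hex string.
Sub FileHashHex(strFile As String) As String
    Dim md As MessageDigest
    Dim Bconv As ByteConverter
    Dim digest() As Byte = md.GetMessageDigest(File.ReadBytes("", strFile), "MD5")
    Return Bconv.HexFromBytes(digest)
End Sub

'Hypothetical cache lookup: cache maps a file path to its hex digest.
'The caller must initialize the Map before the first call.
Sub CachedHash(cache As Map, strFile As String) As String
    If cache.ContainsKey(strFile) = False Then
        cache.Put(strFile, FileHashHex(strFile))
    End If
    Return cache.Get(strFile)
End Sub
```

Two files would then be treated as probably identical when CachedHash returns the same string for both. A cached entry goes stale if the file changes afterwards, so it would need to be removed or recomputed in that case.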
 

fredo

Well-Known Member
Licensed User
Longtime User
...I will explain...

Thanks, Erel, for taking the time to teach us about the efficiency of the code every so often.

This almost tempts me to post all my "Quick and dirty" functions of the last 3 years here...
Just kidding. It is certainly better to take the time to build more efficient code in the first place.
 