Check for duplicates

moster67

Expert
Licensed User
Longtime User
I need to verify that a text file (a word list which might contain thousands of items) contains no duplicates.

To check for duplicates, I am using the Hashtable object from Agraham's Collection library as follows:

B4X:
FileOpen(c2,"MyTextFile.txt",cRead)
s=FileRead(c2)
Do Until s=EOF
     If hash.ContainsKey(s) Then
          'duplicate: take note of the key (word) and do something
     Else
          'first occurrence: store the word as a key
          hash.Add(s, strAt(s,0))
     End If
     'read the next line exactly once per pass, so no word is skipped
     s=FileRead(c2)
Loop
FileClose(c2)

The above works because keys in a hashtable must be unique: ContainsKey tells me whether a word has already been seen, and if I tried to add a duplicate key to the hashtable I would get an error.
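
For readers more at home outside Basic4ppc, the same idea can be sketched in Python. This is only an illustration under stated assumptions: a set stands in for the Hashtable, the file is assumed to hold one word per line, and `find_duplicates` and `find_duplicates_in_file` are hypothetical helper names, not part of any library mentioned in this thread.

```python
def find_duplicates(words):
    """Return each word that occurs more than once, in order of first repeat."""
    seen = set()       # plays the role of the Hashtable keys
    reported = set()   # duplicates already noted, so each is listed once
    duplicates = []
    for word in words:
        if word in seen:
            if word not in reported:
                duplicates.append(word)
                reported.add(word)
        else:
            seen.add(word)
    return duplicates


def find_duplicates_in_file(path):
    """Read one word per line from a text file and report repeated words."""
    with open(path, encoding="utf-8") as handle:
        return find_duplicates(line.rstrip("\n") for line in handle)
```

Each word costs one membership test and at most one insertion, so the work grows roughly in proportion to the number of words.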

Do you have any other suggestions that would be faster for checking duplicates? I thought about loading the text file into two separate arrays and then checking one array's words against the other's, but I think that would be slower.

As mentioned above, I am talking about a lot of words; there could be 80000-90000 items.
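
To put rough numbers on the array idea: even if every pair of words is compared only once, n words need n*(n-1)/2 comparisons, whereas the hashtable approach needs about one lookup per word. The Python sketch below just does that arithmetic; the function names are illustrative, and actual run times will depend on the device and the library.

```python
def pairwise_comparisons(n):
    """Comparisons needed to check every word against every later word."""
    return n * (n - 1) // 2


def hash_lookups(n):
    """Lookups needed when each word is checked once against a hashtable."""
    return n


for n in (80_000, 90_000):
    print(n, pairwise_comparisons(n), hash_lookups(n))
# prints:
# 80000 3199960000 80000
# 90000 4049955000 90000
```

At 90000 words that is roughly 4 billion comparisons against 90000 lookups, which supports the suspicion that comparing arrays would be slower.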

Any advice would be appreciated.

rgds,
moster67
 

Cableguy

Expert
Licensed User
Longtime User
Although I'm not familiar with it, I think regex would be faster....
 

moster67

Expert
Licensed User
Longtime User
Thank you.

I have heard a lot about RegEx but, like you, I am not familiar with it (to be honest, it looks terribly complicated).

So for the time being, I will probably stick to the Hashtable but I will keep your suggestion in mind.

rgds,
moster67

