The keys are strings and there is a sub that calculates their hash code.
Which of the following alternative hashes will work properly and which won't:
A.
B4X:
Private Sub CalcHash (key As String) As Int
Return key.Length
End Sub
B.
B4X:
Private Sub CalcHash (key As String) As Int
Return -5
End Sub
C.
B4X:
Private Sub CalcHash (key As String) As Int
Dim h As Int = Rnd(0, 1000)
For i = 0 To key.Length - 1
h = Bit.ShiftLeft(h, 5) + h + Asc(key.CharAt(i))
Next
Return h
End Sub
D.
B4X:
Private Sub CalcHash (key As String) As Int
key = key.ToLowerCase
Dim h As Int = 5338
For i = 0 To key.Length - 1
h = Bit.ShiftLeft(h, 5) + h + Asc(key.CharAt(i))
Next
Return h
End Sub
Other points I'd like a confirmation about:
an int is 32 bit
the Asc function returns a 32 bit value since it expects a unicode char (so no limits on how the string is formed)
a null string is automatically corrected to the string "null" (with or without quotation marks, I can't recall exactly)
the algorythm which will use the hash function accounts for collisions (I didn't look at the code yet)
Quiz: what will happen if we change CalcHash to always return 5 ? Will it work properly?
If CalcHash returns a non-unique "key", when you use the Set method you will replace an already existing key/value. However, this does not mean that the 4 routines are wrong, unless it is assumed that CalcHash must always return a unique value, never repeated.
Luca this is a yes or no question. For each one of the four methods you should say answer whether the custom Hash Table or Map will work properly or not.
A No
B No
C No
D Yes *
* Yes, just note that keys are case insensitive
* Yes as a pure guess, since I'm assuming that the calculation used is good enough to keep collisions rare as to make this usable. I don't have the math skills to proof that one way or the other, so I'm just guessing.
D:
H is a fixed number, therefore words of equal length, the final value of only H, not taking into account Asc (), will have equal values.
I mean that for Talk and Walk you would have a fixed value of H (always not considering Asc ()).
Asc() does not take into account the position of the letters, so for two words that contain the same letters (anagram) the hashes will be identical
(Team and Mate will have the same hash code).
The only fixed H value is the initial value. It constantly changes then in the loop. And at that point, each ASC value affects it and the position is important due to the multiplication (the shifting).