Other [B4X] Quiz #7 - hash hash

Erel

B4X founder
Staff member
Licensed User
Longtime User
I've implemented a hash table / map as part of the "teachers examples": https://www.b4x.com/android/forum/threads/👩‍🏫-👨‍🏫-examples-for-teachers-algorithms.115894/post-724991

The keys are strings and there is a sub that calculates their hash code.

Which of the following alternative hashes will work properly and which won't:


A.
B4X:
Private Sub CalcHash (key As String) As Int
  Return key.Length
End Sub
B.
B4X:
Private Sub CalcHash (key As String) As Int
  Return -5
End Sub
C.
B4X:
Private Sub CalcHash (key As String) As Int
    Dim h As Int = Rnd(0, 1000)
    For i = 0 To key.Length - 1
        h = Bit.ShiftLeft(h, 5) + h + Asc(key.CharAt(i))
    Next
    Return h
End Sub
D.
B4X:
Private Sub CalcHash (key As String) As Int
key = key.ToLowerCase
  Dim h As Int = 5338
    For i = 0 To key.Length - 1
        h = Bit.ShiftLeft(h, 5) + h + Asc(key.CharAt(i))
    Next
    Return h
End Sub
 
Last edited:

sorex

Expert
Licensed User
Longtime User
that's what I wrote above, it depends what you want to do with the hash.
if you only care about single randomness it will do fine.
 
Upvote 0

udg

Expert
Licensed User
Longtime User
Cheating (looking at Erel's code for Hash table) I see it is based on the djb2 hash algorythm which uses 5381 (not 5388) as a starting point
Searching for djb2 leads to a few results, one being:
And this one shed some light about the magic numbers 33 e 5381
 
Upvote 0

sorex

Expert
Licensed User
Longtime User
as the initial post mentions it's just a map all will work since you don't need the hash to return the value of a key ;)
 
Upvote 0

Erel

B4X founder
Staff member
Licensed User
Longtime User
C is the only wrong one. It has no consistent output.
That's the correct answer.
10 points to @OliverA!
0.625 points to @Star-Dust!


I'll try to explain and I recommend you all to download the HashTable example. The code is simple and you will understand how Map works. Map is the number #1 collection :)

The Map collection is made of buckets. The pairs of keys and values are stored in these buckets.
Lets say that there are 10 buckets and the code is:
B4X:
Map.Put("Red", 0xffff0000)
Map.Put("Green", 0xff00ff00)
Map.Put("Blue", 0xff0000ff)
Log(Map.Get("Green"))
We want to get the value that is tied to the key "Green".
We need to find the bucket that stores this key.
First step is to calculate the hash number of "Green". Maybe 234908908.
The bucket number is found with:
B4X:
Dim BucketIndex As Int = Hash Mod NumberOfBuckets '234908908 Mod 10 = 8
In this case it will be 8.
So we go to bucket #8 and now we need to go over all the items in this bucket and compare the given key with their key:
B4X:
Dim Bucket8Items As List = Bucket8.Items
For Each kv As KeyAndValue In Bucket8Items
  If kv.Key = "Green" Then Return kv.Value
Next
Return "Not found"
Previously when we added "Green" to the Map the hash code was the same so it was stored in bucket #8.

Any consistent hash will work. For example if the hash function will always return 0 then all items will be stored in bucket #0. The downside of such hash function is that we will need to compare the key with all other keys each time. It will be slow and act like a List (O(n) complexity).
A good hash function is supposed to be consistent and to more or less have a uniform distribution. This way the number of items in each bucket will be minimal and putting or getting items from the buckets will be quick.
 
Upvote 0

LucaMs

Expert
Licensed User
Longtime User
The Map collection is made of buckets.
1 - what buckets are? 😮
2 - if I had that kind of map:
B4X:
Map.Put("Red", 0xffff0000)
Map.Put("Green", 0xff00ff00)
Map.Put("Blue", 0xff0000ff)
and...
We want to get the value that is tied to the key "Green".
...I wanted to... I simply use:
B4X:
Dim ColorValue As Int = MapGet("Green") ' <--- bad name Map  ^_^
as you know better than me, of course.

So I don't understand what are you talk about (but I will read all again, slower).
 
Upvote 0

udg

Expert
Licensed User
Longtime User
It's explained here:
Any consistent hash will work. For example if the hash function will always return 0 then all items will be stored in bucket #0.

Returning key length makes a lot of keys refer to the same bucket, but that's ok (not efficient, but ok)
Returning a costant (-5 in the example) leads to the same explanation. BTW in HastTable code it goes under the Abs function
C is wrong due to the Random function. That means that even if you try the hash twice with the same key you risk to have back two different hash values and that's no acceptable (think about what could happen if for insert you receive 5 and for a search you receive 7)
 
Upvote 0

LucaMs

Expert
Licensed User
Longtime User
It's explained here:
Any consistent hash will work. For example if the hash function will always return 0 then all items will be stored in bucket #0.
WHAT?

You would have a single value saved, the last added.
 
Upvote 0

LucaMs

Expert
Licensed User
Longtime User
Most likely, almost certainly, you are not talking about the B4A & B4J Maps.

Or you all are crazy (more than me) 😅
 
Last edited:
Upvote 0

CaptKronos

Active Member
Licensed User
Hi Lucas, what, I think, you are missing is that each bucket can contain more than one value. A value added to a bucket does not overwrite anything already in the bucket but is just added to what the bucket is already holding..
 
Upvote 0

agraham

Expert
Licensed User
Longtime User
Slightly tangential comment but it is interesting that both Java and .NET compute a hash code for each object whenever an object is instantiated and saves it to be returned by a method of that object. In Java it is AnObject.hashCode() and in .NET it is AnObject.GetHashCode(). Having this pre-computed hash code saves computation time whenever a hash code for an object is needed, like in a Map.
 
Upvote 0

sorex

Expert
Licensed User
Longtime User
That's the correct answer.

I still disagree here with my "depends on the purpose" :)

if you know it's a new item then you check if the hash already exists and you generate another one if it did.
so the random value is no issue at all.
 
Upvote 0

Erel

B4X founder
Staff member
Licensed User
Longtime User
The correct answer is:
The hash table will work correctly with A, B and D.
It will not work at all with C (the random hash).

if you know it's a new item then you check if the hash already exists and you generate another one if it did.
so the random value is no issue at all.
Not correct. The hash table will not work. You will not be able to get anything from the collection.
 
Upvote 0

sorex

Expert
Licensed User
Longtime User
it would in a normal map situation and you didn't mention you went for this bucket method.

you need to check and download that other post/file to figure this out.

is there a real life use for this as you said it's slower in the end?
 
Upvote 0

Erel

B4X founder
Staff member
Licensed User
Longtime User
The question was a theoretical question about hash tables (Map = hash table).
it would in a normal map situation and you didn't mention you went for this bucket method.
All hash tables use such buckets. Including Android or Java Map.

is there a real life use for this as you said it's slower in the end?
There is no "this" here. I think that developers should understand how hash tables work. It will help them write better code.
 
Upvote 0

sorex

Expert
Licensed User
Longtime User
I re-read your post #25.

Is it the speed gain by searching on INT instead of STRINGs that this method is being used?

Also the mod trick splits up lookup loops in large tables/maps which could also be a speed gain if it is all split up in several buckets.

On some algos you read/see the usage of hashing but I didn't know it was working like this.
 
Upvote 0

sorex

Expert
Licensed User
Longtime User
I was refering to how (hash)maps work under the hood tho and their possible speed gain logic.
 
Upvote 0

advansis

Active Member
Licensed User
Longtime User
A,B,D Will work, buy are not efficient.
C Will not work.
A uses few buckets, B only one (thus each bucket will become very long).
C will not work because each time the calchash function starts with a random Number.
D is inefficient because the base number 5338 is not prime.
 
Upvote 0
Top