Android Question [B4X] create a tiny hash/url

JohnC

Well-Known Member
Licensed User
Are you looking for a way to create a shorter checksum to verify the integrity of a larger string?

If so, you could do something like XOR the characters of the MD5 together one at a time to generate a single character, then add the original MD5 size to it to create a new shorter checksum:

dd4f2b2f700cfe4ba70ae84ba26a0e3d -> (each digit xored one by one)
"d" (the first character in the MD5) is XORed with "d" (the second character) to give some other character, which is then XORed with "4" (the third character), and so on. Let's say that after XORing all the characters in the MD5, the resulting character is a "G".

"G", then add 32 (the length of the original Md5 string) to it to produce "G32".

So you have reduced the MD5 down to three characters, and since the result is reproducible, you can use it as a quick-and-dirty checksum that would be fairly reliable (but nowhere near as reliable as the MD5 checksum itself) for verifying that two MD5 strings are the same: both should produce "G32".
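A minimal Python sketch of the XOR folding described above, for illustration only. The function name `xor_checksum` is made up, and the MD5 is assumed to be a lowercase hex string. Because the folded value may be a non-printable character, the sketch prints it as two hex digits rather than as a literal character such as "G".

```python
def xor_checksum(md5_hex: str) -> str:
    """XOR the character codes of the hex string together, then append its length."""
    folded = 0
    for ch in md5_hex:
        folded ^= ord(ch)
    # Show the folded character code as two hex digits, followed by the length.
    return f"{folded:02x}{len(md5_hex)}"


print(xor_checksum("dd4f2b2f700cfe4ba70ae84ba26a0e3d"))
```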
 

emexes

Well-Known Member
Licensed User
"G", then add 32 (the length of the original Md5 string) to it to produce "G32".
Given that MD5 is 128 bits = 32 hexadecimal digits/characters, adding the length seems somewhat redundant.

Also XORing ASCII hexadecimal digits/characters will only result in 32 distinct possible checkvalues, rather than the ca. 128 you might be expecting.
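The count of 32 can be checked by brute force. The Python sketch below (an illustration, with the made-up helper name `reachable_xor_values`) assumes lowercase hex digits and tracks every value that XORing exactly 32 such characters can produce.

```python
HEX_DIGITS = "0123456789abcdef"  # assumes the MD5 is written in lowercase hex


def reachable_xor_values(length: int = 32) -> set[int]:
    """Return every value obtainable by XORing `length` hex-digit character codes."""
    reachable = {0}
    for _ in range(length):
        reachable = {value ^ ord(ch) for value in reachable for ch in HEX_DIGITS}
    return reachable


values = reachable_xor_values()
print(len(values))                      # 32
print(sorted(hex(v) for v in values))   # 0x0..0xf and 0x50..0x5f
```

Half of those values (0x00 to 0x0F) are control characters, so the result would also need re-encoding before it could be written down.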

But the general concept of creating a shorter hash from the MD5 hash is good; my first thought was to rehash the data using a shorter hash size, which would usually take thousands or even millions of times longer, so... I feel like a dunce but now I've learned from a master. 🍻
 

Erel

Administrator
Staff member
Licensed User
A hash is not really built for this use case. By definition, multiple strings can return the same hash, and the shorter the hash, the higher the chance of collisions.

The way such services work is by assigning a random id to the URL and keeping the short id and the original URL in a database.
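A rough Python sketch of that approach, assuming an in-memory dictionary in place of the real database. The class name `ShortUrlStore`, the 7-character length and the lowercase-plus-digits alphabet are illustrative choices, not anything prescribed in this thread.

```python
import secrets
import string
from typing import Optional

ALPHABET = string.ascii_lowercase + string.digits  # assumed character set


class ShortUrlStore:
    def __init__(self, id_length: int = 7):
        self.id_length = id_length
        self.urls: dict[str, str] = {}  # stands in for the database table

    def shorten(self, long_url: str) -> str:
        """Assign a fresh random id to the URL and remember the pair."""
        while True:
            short_id = "".join(secrets.choice(ALPHABET) for _ in range(self.id_length))
            if short_id not in self.urls:  # retry on the rare clash
                self.urls[short_id] = long_url
                return short_id

    def resolve(self, short_id: str) -> Optional[str]:
        """Return the original URL, or None for an unknown id."""
        return self.urls.get(short_id)
```

Because the id is random, it has no relation to the URL; the lookup table is the only link between the two.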
 

emexes

Well-Known Member
Licensed User
i am looking for a way to make this hash shorter so that it can be written down quickly by hand. My goal is to get a hash that is 6-8 characters long and then serve as url like: www.myurl.de/10gherk
If you want 7 characters and you use Base64, then that'd be room for 42 bits, so just swipe the first or last 10 hex characters of the MD5 (40 bits), convert them to 5 bytes, and then Base64-encode them, dropping the trailing "=" padding character.

But you need to watch out for collisions: 40 bits will give you about a million million (2^40, roughly 1.1 x 10^12) different hashes, so the chances of two different files ending up with the same 7-character Base64 URL are slim but not zero.
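The steps above can be sketched in Python as follows. The function name `short_hash` is made up, and the URL-safe Base64 alphabet ("-" and "_" instead of "+" and "/") is an assumption added here so the result can be used directly in a URL path.

```python
import base64
import hashlib


def short_hash(data: bytes) -> str:
    """Turn the first 40 bits of the MD5 into a 7-character Base64 code."""
    md5_hex = hashlib.md5(data).hexdigest()       # 32 hex characters
    first_bytes = bytes.fromhex(md5_hex[:10])     # 10 hex characters -> 5 bytes
    encoded = base64.urlsafe_b64encode(first_bytes).decode("ascii")
    return encoded.rstrip("=")                    # drop the single "=" pad


code = short_hash(b"http://myurl.de?file=8787fshifncinfdvh.PDF")
print(len(code), code)
```

Note that Base64 is case-sensitive, which matters if the code is meant to be copied down by hand.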
 

emexes

Well-Known Member
Licensed User
Base64 is usually encoded in groups of 3 x 8-bit input bytes = 4 x 6-bit encoded output characters, so perhaps define your URL as being 8 characters rather than 7.
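A quick Python check of the grouping, using the standard `base64` module (the helper name `base64_shape` is made up): 5 bytes encode to 7 characters plus one "=" pad, while 6 bytes fill two complete groups and need no padding.

```python
import base64


def base64_shape(n_bytes: int) -> tuple[int, int]:
    """Return (encoded length, number of '=' padding characters) for n_bytes of input."""
    encoded = base64.b64encode(bytes(n_bytes)).decode("ascii")
    return len(encoded), encoded.count("=")


for n_bytes in (5, 6):
    print(n_bytes, base64_shape(n_bytes))   # 5 (8, 1) then 6 (8, 0)
```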
 

JohnC

Well-Known Member
Licensed User
i am looking for a way to make this hash shorter so that it can be written down quickly by hand. My goal is to get a hash that is 6-8 characters long and then serve as url like: www.myurl.de/10gherk
OK, now I know what you want to do:

If the redirection (from the short URL to the longer URL) is happening on your website, then that means you will need to create a web routine to do that redirection in the first place.

And while you are already writing code for your website (to do the redirection), you might as well complete the circle and add a routine that generates the short URLs yourself.

For example, create a webservice maybe like this:

GenShortUrl(LongUrl as string) as String

This routine will create a new record in a database, store the long URL in it, and then simply return the database record number (maybe encoded in base 64 or a similar scheme to keep the character count low if you plan on storing 10k+ URLs).

So for example, when you first call the routine with http://myurl.de?file=8787fshifncinfdvh.PDF as the long URL, the routine will return "1" because that was the first record to be created in the database. So the short URL would then be www.myurl.de/1

And when that URL is later presented to your website, your "DecodeShortURL" routine will simply look up the full URL stored in database record #1 and redirect to it.
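A minimal Python sketch of the two routines, assuming a plain list in place of the database table and base 36 (digits plus lowercase letters) as the compact encoding. The names `gen_short_url`, `decode_short_url` and `encode_id` are illustrative stand-ins for the GenShortUrl and DecodeShortURL routines mentioned above.

```python
from typing import Optional

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"  # base 36, an assumed choice

records: list[str] = []  # stands in for the database table


def encode_id(number: int) -> str:
    """Write a non-negative record number in base 36."""
    if number == 0:
        return DIGITS[0]
    chars = []
    while number > 0:
        number, remainder = divmod(number, len(DIGITS))
        chars.append(DIGITS[remainder])
    return "".join(reversed(chars))


def gen_short_url(long_url: str) -> str:
    """Store the long URL and return its record number as a short code."""
    records.append(long_url)
    return encode_id(len(records))  # record numbers start at 1


def decode_short_url(short_code: str) -> Optional[str]:
    """Look up the long URL for a short code, or return None if it is unknown."""
    try:
        number = int(short_code, 36)
    except ValueError:
        return None
    if 1 <= number <= len(records):
        return records[number - 1]
    return None
```

With base 36, four characters already cover more than 1.6 million records.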

This is obviously a very simple example - and will easily allow any user to simply substitute the number 1 with a different number to gain access to whatever full url is in that record number. So if security is an issue, then you will need to add a little encryption to the generated short URL to prevent "browsing" by rejecting invalid shortcodes.
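One way to reject guessed shortcodes, sketched in Python under stated assumptions: instead of encrypting the record number, this appends a short keyed tag (an HMAC, which is a different technique from encryption and does not hide the number itself). The key, the tag length of 6 hex characters and the function names are all hypothetical.

```python
import hashlib
import hmac
from typing import Optional

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key, kept on the server
TAG_LENGTH = 6                              # hex characters of the tag to keep (assumed)


def sign_code(record_code: str) -> str:
    """Append a short keyed tag to the record code, e.g. '123-xxxxxx'."""
    tag = hmac.new(SECRET_KEY, record_code.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{record_code}-{tag[:TAG_LENGTH]}"


def verify_code(short_code: str) -> Optional[str]:
    """Return the record code if the tag is valid, otherwise None."""
    record_code, _, _tag = short_code.rpartition("-")
    if not record_code:
        return None
    expected = sign_code(record_code)
    if hmac.compare_digest(expected.encode("utf-8"), short_code.encode("utf-8")):
        return record_code
    return None
```

A longer tag makes guessing harder but also makes the URL longer to write down, so the length is a trade-off.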
 

emexes

Well-Known Member
Licensed User
if security is an issue, then you will need to add a little encryption
This is effectively what the hashing does, by hiding the valid shortcode needles randomly about a 42-bit haystack.

And given that a haystack is about 6 billion times bigger than a needle, that would mean looking for needles in 2^42 / 6 billion = 733 haystacks.
 

JohnC

Well-Known Member
Licensed User
This is effectively what the hashing does, by hiding the valid shortcode needles randomly about a 42-bit haystack.
But with hashing, you can't reverse the hash back to the shortcode (record) number.

Encryption will allow you to convert the shortcode/record number 123 into "jd3-djf", then still be able to decrypt it back to 123.

Unless I am not understanding how you would apply hashing to my example in post #11.
 

emexes

Well-Known Member
Licensed User
But with hashing, you can't reverse the hash back to the record number.
If we have a Map with original URL and shortened URL (hash) then hashing works fine.

And hashes can be reconstructed from the original URL or data; record numbers can't. 🍻
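A small Python sketch of the Map idea, assuming a dictionary keyed by a truncated MD5 (first 40 bits, URL-safe Base64, both assumptions carried over for illustration). Shortening the same URL twice yields the same code, which is the "reconstructed from the original URL" property; a differing URL landing on the same code is reported rather than silently overwritten.

```python
import base64
import hashlib

url_map: dict[str, str] = {}  # short code -> original URL


def short_hash(url: str) -> str:
    """Derive a 7-character code from the first 5 bytes of the URL's MD5."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest[:5]).decode("ascii").rstrip("=")


def shorten(url: str) -> str:
    """Return the code for a URL, storing the mapping and detecting collisions."""
    code = short_hash(url)
    existing = url_map.get(code)
    if existing is not None and existing != url:
        raise ValueError(f"collision: {code} already maps to a different URL")
    url_map[code] = url
    return code
```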
 

emexes

Well-Known Member
Licensed User
You can just generate a random 5 letters string. It doesn't need to have any relation to the original URL.
And hashes can be reconstructed from the original URL or data; record numbers and other random choices can't. 🍻

But there is the risk of collision, of two different URLs or data hashing to the same point. Happily, with a space of 2^42, that will be an infrequent occurrence.
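How infrequent depends on how many URLs are stored. The Python sketch below uses the standard birthday approximation p ≈ 1 - exp(-n(n-1) / (2N)) for n items in a space of N = 2^42; the function name is made up, and the hash is assumed to behave like a uniform random choice.

```python
import math


def collision_probability(n_items: int, bits: int = 42) -> float:
    """Approximate chance that at least two of n_items share a hash of `bits` bits."""
    space = 2 ** bits
    exponent = n_items * (n_items - 1) / (2 * space)
    return -math.expm1(-exponent)  # 1 - exp(-exponent), accurate for small values


for n in (10_000, 1_000_000, 2_500_000):
    print(n, collision_probability(n))
```

Under this approximation the risk stays tiny for tens of thousands of URLs but reaches roughly 50% at around 2.5 million, so a collision check on insert is still worth having.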
 

JohnC

Well-Known Member
Licensed User
If we have a Map with original URL and shortened URL (hash) then hashing works fine.
Now I gotcha - and just limit the hash method so it doesn't generate a hash of more than 6-8 characters.
 

Jeffrey Cameron

Well-Known Member
Licensed User
I made a routine to do this for my personal use ages ago (it was for an RSS feed utility). I just stored the URL in a database along with a sequentially incremented counter encoded in base 32 or so (I forget exactly what base I used). The shortened URL was something along the lines of "websiteurl/2xe14f".

Obviously, with this method the first URL would be "../1" and then "../2" and so on. But if you're not worried about people randomly scraping your URL database, this method would work fine.
 