Accidental hashing

How often do you use raw data structures in your everyday coding routine? No, I don’t mean when you populate a dictionary (hash) or fill an array (list). I mean constructing hash yourself, or making a binary search tree or something? Well, you might be luckier than me and do that regularly – but me, I seldom do.

There was a care recently though, and I was puzzled – how do I get a simple hash function that would handle arbitrary key, would depend on characters position (so “foo” and “oof” wouldn’t produce same result) and produce result of limited range to reduce collisions – because collisions are as likely as a ratio of a key dictionary size to a has size (i.e. size of a list of all possible keys to a number of “slots” in hash).

And I accidentally ran into this:

key_hash = 1.0

for character in key:

key_hash = chr(character) / key_hash

and it works like a charm: it produces square-like results (thus sticking within quite limited dictionary size), but more varied. And if you need to spread it over bigger hash – you just multiply it by 10^n. Number sequence for n(1..100) looks like this (stroked line inside is sqrt(n):

Screenshot 2013-11-21 11.38.54

and results for arbitrary keys are:

“foobar” : 1.129169193450576

“this is supposed to be a long line, long enough to represent some arbitrary dictionary entry” : 98.989898474541334

so you multiply it by 10^n, get the remainder of division by hash size – and you’re done!

Of course it ain’t perfect – for one, “o” and “ooo” would yield the same result, but it’s perfect for the simple case I had (and can be improved if required).