Tag Archives: algorithms

Hash functions

This is “JFTR” – saving few known simple hash functions for further reference (nothing special, really – it’s a “stash” post).

unsigned int HashRs(const char * str)

    static const unsigned int b = 378551;
    unsigned int a = 63689;
    unsigned int hash = 0;

    for(; *str; str++)
        hash = hash * a + (unsigned char)(*str);
        a *= b;

    return hash;

unsigned int HashLy(const char * str)

    unsigned int hash = 0;

    for(; *str; str++)
        hash = (hash * 1664525) + (unsigned char)(*str) + 1013904223;

    return hash;

unsigned int HashRot13(const char * str)

    unsigned int hash = 0;

    for(; *str; str++)
        hash += (unsigned char)(*str);
        hash -= (hash << 13) | (hash >> 19);

    return hash;




# of bits in a number N is floor(log N) + 1

Things like this make me tear off patches of imaginary hair from my bald skull in desperate frustration of “what the hell did I waste my time on in university?” rethorics. For whatever I learned there has nothing to do with CS. Nothing to do with anything, in fact – even my drinking habits are long different from those days.

Heap algorithm in Perl

Another day, another study. Again, nothing fancy – merely a FTR, heap (or minheap, with root containing the minimum) implementation in Perl. extract_top name is to keep main interface similar to existing CPAN modules (like Heap::Simple)

package Heap::Hydra;

sub new {
   return bless {_heap => [undef,], _heap_size => 0}, shift;

sub getParent {
   my $self = shift;
   my $child_pos = shift;

   my $parent_pos = int($child_pos * 0.5) || 1;
   return $self->{_heap}->[$parent_pos], $parent_pos;

sub getMinKid {
   my $self = shift;
   my $parent_pos = shift;

   my $child_pos = $parent_pos * 2;
   return undef if $child_pos >= scalar @{$self->{_heap}};

   my $min_child = $self->{_heap}->[$child_pos];
   if (defined $self->{_heap}->[$child_pos + 1] && $self->{_heap}->[$child_pos + 1] < $min_child) {      $child_pos += 1;      $min_child = $self->{_heap}->[$child_pos];
   return $min_child, $child_pos;

sub extract_top {
   my $self = shift;
   my $top_pos = shift || 1;

   my ($new_top, $new_top_pos) = $self->getMinKid($top_pos);
   if (!defined $new_top) {
     return splice @{$self->{_heap}}, $top_pos, 1;

   my $prev_top = $self->{_heap}->[$top_pos];
   $self->{_heap}->[$top_pos] = $self->extract_top($new_top_pos);

   return $prev_top;

sub insert {
   my $self = shift;
   my $child = shift;

   $self->{_heap}->[$self->{_heap_size}] = $child;

   my $child_pos = $self->{_heap_size};
   my ($parent, $parent_pos) = $self->getParent($child_pos);
   while ($parent > $child) {
     $self->{_heap}->[$parent_pos] = $child;
     $self->{_heap}->[$child_pos] = $parent;
     $child_pos = $parent_pos;

     ($parent, $parent_pos) = $self->getParent($child_pos);

   return $child_pos;

Greatest prime divisor

Just for the record – custom algorithm for searching greatest prime divisor of a number, O(n) worst-case (O(sqrt(n)) taken by prime detection):

def prime(num, get_number=False):
  for divisor in range(2, int(math.sqrt(num)) + 1):
    if num % divisor == 0:
      return False
  return num if get_number else True

def hydra_gpd(num, get_number=None):
  sequence = range(2, int(math.sqrt(num)) + 1)
  gpd = 1
  for divisor in sequence:
    divided = int(num / divisor)
    if num % divisor == 0:
      if prime(divisor) and divisor > gpd:
        gpd = divisor
      if prime(divided) and divided > gpd:
        gpd = divided
      recursive_gpd = hydra_gpd(divided)
      if recursive_gpd > gpd:
        gpd = recursive_gpd
      return gpd
  return gpd

Nothing too special, just to keep it around.

Accidental hashing

How often do you use raw data structures in your everyday coding routine? No, I don’t mean when you populate a dictionary (hash) or fill an array (list). I mean constructing hash yourself, or making a binary search tree or something? Well, you might be luckier than me and do that regularly – but me, I seldom do.

There was a care recently though, and I was puzzled – how do I get a simple hash function that would handle arbitrary key, would depend on characters position (so “foo” and “oof” wouldn’t produce same result) and produce result of limited range to reduce collisions – because collisions are as likely as a ratio of a key dictionary size to a has size (i.e. size of a list of all possible keys to a number of “slots” in hash).

And I accidentally ran into this:

key_hash = 1.0

for character in key:

key_hash = chr(character) / key_hash

and it works like a charm: it produces square-like results (thus sticking within quite limited dictionary size), but more varied. And if you need to spread it over bigger hash – you just multiply it by 10^n. Number sequence for n(1..100) looks like this (stroked line inside is sqrt(n):

Screenshot 2013-11-21 11.38.54

and results for arbitrary keys are:

“foobar” : 1.129169193450576

“this is supposed to be a long line, long enough to represent some arbitrary dictionary entry” : 98.989898474541334

so you multiply it by 10^n, get the remainder of division by hash size – and you’re done!

Of course it ain’t perfect – for one, “o” and “ooo” would yield the same result, but it’s perfect for the simple case I had (and can be improved if required).

Perl sort

You only realize the difference when you compare things. I sometimes thought “how fast built-in Perl sort really is?”, but never got to any comparison (finally got to some sort algorithms research). Well, no surprises, of course (and my implementations are quite sub-optimal), but still (sorta FTR – this is a sort of 10K-element array filled with int representations of rand(10000) values):

  • perl: 0.00242
  • quicksort: 0.097454
  • mergesort: 0.115439
  • insert sort: 8.223053

Another FTR – Perl (since 5.8) has both quicksort and mergesort, although as I understand quicksort is still used by default – just shuffling (big?) arrays  to save from worst-case scenario in pre-sorted case. You can change the sort algorithm though with this (from http://perldoc.perl.org/sort.html):

  1. use sort ‘stable’; # guarantee stability
  2. use sort ‘_quicksort’; # use a quicksort algorithm
  3. use sort ‘_mergesort’; # use a mergesort algorithm
  4. use sort ‘defaults’; # revert to default behavior
  5. no sort ‘stable’; # stability not important

Like, if you need a “stable” sort (which mergesort is) – when same elements preserve their initial order.