Category Archives: Algorithms

Math is beautiful – it’s sheer magic to me, but now and then some good people publish approachable articles that let me get a tiny dash of understanding of how things work.

There’s one such article today – https://innovation.vivint.com/introduction-to-reed-solomon-bc264d0794f8 (and the complementary https://medium.com/@jtolds/joseph-louis-lagrange-and-the-polynomials-499cf0742b39) – that I can highly recommend. It’s a simplified overview that gives just the basic idea of a particular error-correction technique, but it’s simple yet comprehensive.

In fact, it was so fascinating I couldn’t stop myself from giving it a (very short and simple) try. I won’t repeat the articles in any way – I’ll just post some code with comments.

Let’s say we have this code in Python (minor comments inline):

xs = [ 1.0, # float format to make sure calculation precision is not impacted - it fails badly otherwise
       2.0,
       3.0,
       4.0,
]
 
ys = [ 10, # the values here are random - like in "I made them up". But this is about "any numbers", right?
       42,
       74,
       111,
]
 
# it's a direct representation of what's described in the https://medium.com/@jtolds/joseph-louis-lagrange-and-the-polynomials-499cf0742b39 article
def l0(x):
    return ( (x-xs[1]) * (x-xs[2]) * (x-xs[3]) ) / ( (xs[0] - xs[1]) * (xs[0] - xs[2]) * (xs[0] - xs[3]) )
def l1(x):
    return ( (x-xs[0]) * (x-xs[2]) * (x-xs[3]) ) / ( (xs[1] - xs[0]) * (xs[1] - xs[2]) * (xs[1] - xs[3]) )
def l2(x):
    return ( (x-xs[0]) * (x-xs[1]) * (x-xs[3]) ) / ( (xs[2] - xs[0]) * (xs[2] - xs[1]) * (xs[2] - xs[3]) )
def l3(x):
    return ( (x-xs[0]) * (x-xs[1]) * (x-xs[2]) ) / ( (xs[3] - xs[0]) * (xs[3] - xs[1]) * (xs[3] - xs[2]) )
 
# as well as this
def f(num):
    return ys[0] * l0(num) + ys[1] * l1(num) + ys[2] * l2(num) + ys[3] * l3(num)
 
for x in range(0, 10):
    fx = f(x)
    print "{}: {}".format(x, fx)

And when we run it, we get this:

0: -27.0
1: 10.0
2: 42.0
3: 74.0
4: 111.0
5: 158.0
6: 220.0
7: 302.0
8: 409.0
9: 546.0

(you can add more points; you could also use matplotlib to plot them)

So as you can see, it reproduced the four pre-defined points (10, 42, 74 and 111), but also calculated other points on the curve. Now let’s say we sent 6 points, but the client received only points 1, 2, 5 and 6 (10, 42, 158, 220).

If we adjust the input values to look like this:

xs = [ 1.0,
       2.0,
       5.0,
       6.0,
]
 
ys = [ 10,
       42,
       158,
       220,
]

and run it again, we’d still get all the same values, because 4 points are enough to define a cubic curve, and these four were taken from that very curve:

0: -27.0
1: 10.0
2: 42.0
3: 74.0
4: 111.0
5: 158.0
6: 220.0
7: 302.0
8: 409.0
9: 546.0

Magic, right?
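For the record, the four hard-coded l0..l3 basis functions generalize to any number of points. Here’s a quick sketch of my own (not from the articles):

def lagrange(xs, ys, x):
    # sum of ys[i] * l_i(x), where l_i is the i-th Lagrange basis polynomial
    total = 0.0
    for i in range(len(xs)):
        term = float(ys[i])
        for j in range(len(xs)):
            if i != j:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

print(lagrange([1.0, 2.0, 5.0, 6.0], [10, 42, 158, 220], 3.0)) # 74.0, as expected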

Some extra calculations around it are also available at http://mathonline.wikidot.com/linear-lagrange-interpolating-polynomials

Hash functions

This is “JFTR” – saving a few known simple hash functions for future reference (nothing special, really – it’s a “stash” post).

/* "RS" hash, commonly attributed to Robert Sedgewick's "Algorithms in C" */
unsigned int HashRs(const char * str)
{
    static const unsigned int b = 378551;
    unsigned int a = 63689;
    unsigned int hash = 0;

    for(; *str; str++)
    {
        hash = hash * a + (unsigned char)(*str);
        a *= b;
    }

    return hash;
}

/* LCG-style hash: 1664525 / 1013904223 are the "Numerical Recipes" LCG constants */
unsigned int HashLy(const char * str)
{
    unsigned int hash = 0;

    for(; *str; str++)
        hash = (hash * 1664525) + (unsigned char)(*str) + 1013904223;

    return hash;
}

/* "rot13" hash: accumulate characters, mixing with a 13-bit rotate
   (assumes 32-bit unsigned int) */
unsigned int HashRot13(const char * str)
{
    unsigned int hash = 0;

    for(; *str; str++)
    {
        hash += (unsigned char)(*str);
        hash -= (hash << 13) | (hash >> 19);
    }

    return hash;
}
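For playing with these outside of C, here’s my Python transcription of HashLy (my own sketch, not part of the original stash; the & 0xFFFFFFFF mask emulates C’s unsigned int overflow):

def hash_ly(string):
    hash_value = 0
    for byte in string.encode(): # iterate over bytes, like the C version does
        hash_value = (hash_value * 1664525 + byte + 1013904223) & 0xFFFFFFFF
    return hash_value

print(hash_ly("foobar")) # some value that fits in 32 bits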

 

Revelations

# of bits in a number N is floor(log2 N) + 1

Things like this make me tear off patches of imaginary hair from my bald skull in desperate “what the hell did I waste my time on in university?” frustration. For whatever I learned there has nothing to do with CS. Nothing to do with anything, in fact – even my drinking habits are long different from those days.
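A quick way to convince yourself, using Python’s own bit_length as the reference (note the log base is 2; math.log2 can misbehave on astronomically large numbers due to float rounding):

import math

for n in (1, 2, 7, 8, 255, 256, 12345):
    assert math.floor(math.log2(n)) + 1 == n.bit_length()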

Heap algorithm in Perl

Another day, another study. Again, nothing fancy – merely an FTR: a heap (or minheap, with the root containing the minimum) implementation in Perl. The extract_top name is there to keep the main interface similar to existing CPAN modules (like Heap::Simple).

package Heap::Hydra;

sub new {
   return bless {_heap => [undef,], _heap_size => 0}, shift;
}

sub getParent {
   my $self = shift;
   my $child_pos = shift;

   my $parent_pos = int($child_pos * 0.5) || 1;
   return $self->{_heap}->[$parent_pos], $parent_pos;
}

sub getMinKid {
   my $self = shift;
   my $parent_pos = shift;

   my $child_pos = $parent_pos * 2;
   return undef if $child_pos >= scalar @{$self->{_heap}};

   my $min_child = $self->{_heap}->[$child_pos];
   if (defined $self->{_heap}->[$child_pos + 1] && $self->{_heap}->[$child_pos + 1] < $min_child) {
      $child_pos += 1;
      $min_child = $self->{_heap}->[$child_pos];
   }
   return $min_child, $child_pos;
}

# standard extraction: save the top, move the last element into its place,
# and sift it down until the minheap property holds again
sub extract_top {
   my $self = shift;
   return undef unless $self->{_heap_size};

   my $top = $self->{_heap}->[1];
   my $last = pop @{$self->{_heap}};
   $self->{_heap_size}--;

   if ($self->{_heap_size}) {
      $self->{_heap}->[1] = $last;
      my $pos = 1;
      my ($min_child, $child_pos) = $self->getMinKid($pos);
      while (defined $min_child && $min_child < $self->{_heap}->[$pos]) {
         @{$self->{_heap}}[$pos, $child_pos] = @{$self->{_heap}}[$child_pos, $pos];
         $pos = $child_pos;
         ($min_child, $child_pos) = $self->getMinKid($pos);
      }
   }

   return $top;
}

sub insert {
   my $self = shift;
   my $child = shift;

   $self->{_heap_size}++;
   $self->{_heap}->[$self->{_heap_size}] = $child;

   my $child_pos = $self->{_heap_size};
   my ($parent, $parent_pos) = $self->getParent($child_pos);
   while ($parent > $child) {
     $self->{_heap}->[$parent_pos] = $child;
     $self->{_heap}->[$child_pos] = $parent;
     $child_pos = $parent_pos;

     ($parent, $parent_pos) = $self->getParent($child_pos);
   }

   return $child_pos;
}
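(For comparison – my aside, not part of the Perl exercise – Python’s built-in heapq module provides the same minheap behavior out of the box:)

import heapq

heap = []
for value in (5, 3, 8, 1):
    heapq.heappush(heap, value)

print(heapq.heappop(heap)) # 1 - the minimum, same as extract_top above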

Greatest prime divisor

Just for the record – a custom algorithm for finding the greatest prime divisor of a number, O(n) worst-case (with O(sqrt(n)) taken by the primality check):

import math

def prime(num, get_number=False):
  # trial division up to sqrt(num)
  for divisor in range(2, int(math.sqrt(num)) + 1):
    if num % divisor == 0:
      return False
  return num if get_number else True

def hydra_gpd(num):
  # returns 1 when num itself is prime - callers check prime(divided) themselves
  gpd = 1
  for divisor in range(2, int(math.sqrt(num)) + 1):
    if num % divisor == 0:
      divided = num // divisor
      if prime(divisor) and divisor > gpd:
        gpd = divisor
      if prime(divided) and divided > gpd:
        gpd = divided
      # the co-divisor may itself be composite - recurse into it
      recursive_gpd = hydra_gpd(divided)
      if recursive_gpd > gpd:
        gpd = recursive_gpd
      return gpd
  return gpd
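A quick check:

print(hydra_gpd(13195)) # 13195 = 5 * 7 * 13 * 29, prints 29
print(hydra_gpd(100))   # 100 = 2^2 * 5^2, prints 5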

Nothing too special, just to keep it around.

Accidental hashing

How often do you use raw data structures in your everyday coding routine? No, I don’t mean populating a dictionary (hash) or filling an array (list) – I mean constructing a hash yourself, or building a binary search tree or something. Well, you might be luckier than me and do that regularly – but me, I seldom do.

There was a case recently, though, and I was puzzled – how do I get a simple hash function that would handle an arbitrary key, would depend on character positions (so “foo” and “oof” wouldn’t produce the same result), and would produce results of a limited range to reduce collisions – because collisions are as likely as the ratio of the key dictionary size to the hash size (i.e. the size of the list of all possible keys to the number of “slots” in the hash).

And I accidentally ran into this:

key_hash = 1.0
for character in key:
    key_hash = ord(character) / key_hash

and it works like a charm: it produces square-root-like results (thus staying within a quite limited range), but more varied. And if you need to spread it over a bigger hash – you just multiply it by 10^n. The number sequence for n = 1..100 looks like this (the stroked line inside is sqrt(n)):

[plot: the hash value sequence for n = 1..100, with sqrt(n) overlaid]

and results for arbitrary keys are:

“foobar” : 1.129169193450576

“this is supposed to be a long line, long enough to represent some arbitrary dictionary entry” : 98.989898474541334

so you multiply it by 10^n, take the remainder of division by the hash size – and you’re done!
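Putting it all together – a sketch (the accidental_hash name and the 10^6 scale factor are just my choices for illustration):

def accidental_hash(key, buckets, scale=10**6):
    key_hash = 1.0
    for character in key:
        key_hash = ord(character) / key_hash
    # scale up and take the remainder to land in a bucket
    return int(key_hash * scale) % buckets

print(accidental_hash("foobar", 100)) # a bucket index in 0..99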

Of course it ain’t perfect – for one, “o” and “ooo” would yield the same result, but it’s perfect for the simple case I had (and can be improved if required).

Perl sort

You only realize the difference when you compare things. I sometimes wondered “how fast is Perl’s built-in sort, really?”, but never got around to an actual comparison (until I finally did some sort-algorithm research). Well, no surprises, of course (and my implementations are quite sub-optimal), but still, sorta FTR – this is a sort of a 10K-element array filled with int representations of rand(10000) values:

  • perl: 0.00242
  • quicksort: 0.097454
  • mergesort: 0.115439
  • insert sort: 8.223053

Another FTR – Perl (since 5.8) has both quicksort and mergesort, with mergesort being the default since 5.8; the quicksort implementation defends against the quadratic worst case on pre-sorted input by shuffling large arrays first. You can change the sort algorithm with this (from http://perldoc.perl.org/sort.html):

  1. use sort 'stable';      # guarantee stability
  2. use sort '_quicksort';  # use a quicksort algorithm
  3. use sort '_mergesort';  # use a mergesort algorithm
  4. use sort 'defaults';    # revert to default behavior
  5. no sort 'stable';       # stability not important

Like, if you need a “stable” sort (which mergesort is) – one where equal elements preserve their initial order.