Monthly Archives: November 2013

Revelations

# of bits in a number N is floor(log2 N) + 1
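A quick Python sanity check of that (bit_length is the built-in shortcut, the formula is the manual way – log base 2, of course):

```python
import math

def bits_manual(n):
    # number of bits needed for a positive integer n: floor(log2(n)) + 1
    return math.floor(math.log2(n)) + 1

# matches Python's built-in for reasonable sizes
# (float log2 can betray you on huge numbers, bit_length never does)
for n in (1, 2, 255, 256, 1023):
    assert bits_manual(n) == n.bit_length()
```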

Things like this make me tear off patches of imaginary hair from my bald skull in desperate “what the hell did I waste my time on in university?” frustration. For whatever I learned there has nothing to do with CS. Nothing to do with anything, in fact – even my drinking habits are long different from those days.

Perl “inheritance”

This is just a random batch of records to keep around.

use base: loads the modules and adds them to @ISA (which is the list of “predecessors”, i.e. parent classes)

one can also push to @ISA directly, but then the modules should be pre-used; there’s an ->isa() method to check.

NEXT calls the next method in the chain. So if D’s @ISA is (A, C) and A’s @ISA is (B), and each class has sub n { shift->NEXT::n }, n would be called in each class along the chain D->A->B->C (no matter that A and B are unaware of C). FTR, SUPER only resolves the method through the current class’s @ISA and stops there – it won’t walk the whole chain like NEXT does.

use parent does less than use base, but is essentially the same (at the user level). Big difference: it tries to load the packages from files, so if you have them in differently-named files, or declared in the same file (which is “named differently” as well), you need to do use parent -norequire, modules

use is compile-time, require is run-time.

Classes have autoload (when method is unavailable):

  1. our $AUTOLOAD;
  2. sub AUTOLOAD {…}

it conflicts with DESTROY – AUTOLOAD ends up handling DESTROY if the latter is not defined. Also, errors thrown in DESTROY would not (in most cases) stop the program.
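FTR, Python’s closest analog is __getattr__, which likewise only fires when normal lookup fails (the names below are made up for illustration):

```python
class Autoloaded:
    def __getattr__(self, name):
        # only called when normal attribute lookup fails --
        # roughly what AUTOLOAD does for missing Perl methods
        def method(*args):
            return "autoloaded %s with %r" % (name, args)
        return method

obj = Autoloaded()
assert obj.missing_method(1, 2) == "autoloaded missing_method with (1, 2)"
```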

You can bless any kind of data structure (not only hashes)

On method resolution in inheritance: Perl uses DFS by default, so lookup goes down one package line and switches to the next class only when done with the previous one. The C3 MRO (Method Resolution Order) is more breadth-first-like; it’s enabled like use mro 'c3'; (use mro 'dfs'; gives the classic Perl model). Can be redefined for each class.
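Incidentally, Python uses the same C3 linearization for its classes, so it makes a handy playground for the D->A->B->C chain from the NEXT example above:

```python
class B: pass
class A(B): pass
class C: pass
class D(A, C): pass

# C3 puts C after B, even though A and B know nothing about C
assert [k.__name__ for k in D.__mro__] == ["D", "A", "B", "C", "object"]
```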


SQL joins

Now this is something that keeps falling out of my head. It took me a good deal of time to remember even the simplest thing of the left/right joins idea, and it’s just moments ago that I hard-nailed the fact that “inner” and “outer” are just helper words, a decor (here’s the equivalence table, thx this StackOverflow article):

A LEFT JOIN B            A LEFT OUTER JOIN B
A RIGHT JOIN B           A RIGHT OUTER JOIN B
A FULL JOIN B            A FULL OUTER JOIN B
A INNER JOIN B           A JOIN B

And now to fix the rest (thx another StackOverflow article). Note: if you’re using MySQL, mind that it doesn’t have FULL JOIN – although you can emulate it as a UNION of left and right joins.

(image: SQL join types diagram)
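A quick way to poke at all this without a server is Python’s sqlite3 (the tables a and b below are made up for the example; the right join is written as a left join from b’s side, since older SQLite lacks RIGHT JOIN too):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, val TEXT);
    INSERT INTO a VALUES (1, 'one'), (2, 'two');
    INSERT INTO b VALUES (2, 'deux'), (3, 'trois');
""")

# "outer" is pure decor: both queries return identical rows
left = con.execute("SELECT * FROM a LEFT JOIN b ON a.id = b.id").fetchall()
left_outer = con.execute("SELECT * FROM a LEFT OUTER JOIN b ON a.id = b.id").fetchall()
assert left == left_outer

# full join emulated as a UNION of the two one-sided joins
full = con.execute("""
    SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id
    UNION
    SELECT a.id, b.id FROM b LEFT JOIN a ON a.id = b.id
""").fetchall()
assert set(full) == {(1, None), (2, 2), (None, 3)}
```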

Also, while we’re at it – some normal forms memo:

1NF: each table field should contain only “atomic” values, i.e. single (not combined) values from some finite domain – like names. The domain might grow (new names appear), but each value (each name) should be an independent atom. Also, each record – each occurrence of that field – should contain a single value (of that domain) only.

2NF: 1NF + no field should depend on just a part of the primary key. For instance, a table with product, manufacturer, manufacturer_address is wrong – the address would be repeated many times for no reason, as it’s tied to the manufacturer only (while the key is product+manufacturer). It should be two tables – (manufacturer, address) and (product, manufacturer).

3NF: 2NF + all non-key attributes (fields) should depend on the key directly, and only on the key. For instance, if in the above (2NF) example the table with product and manufacturer fields also had product_factory and product_factory_address fields – that would be wrong, because the factory address is not related to the key (product+manufacturer) and should live separately as a (product_factory, product_factory_address) table.
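The same splits, sketched as SQLite DDL via Python (table and column names are made up from the example above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- 2NF: the address is tied to the manufacturer, so it lives there,
    -- not next to every product row
    CREATE TABLE manufacturer (name TEXT PRIMARY KEY, address TEXT);
    CREATE TABLE product (
        name TEXT,
        manufacturer TEXT REFERENCES manufacturer(name),
        PRIMARY KEY (name, manufacturer)
    );
    -- 3NF: the factory address depends on the factory, not on the
    -- product+manufacturer key, so it gets a table of its own
    CREATE TABLE factory (name TEXT PRIMARY KEY, address TEXT);
""")

tables = {row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
assert {"manufacturer", "product", "factory"} <= tables
```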

And I still fail to construct a proper phrase for what a “relational database” is. It’s a bloody mess, all those descriptions out there – I get lost on the third word, no matter how many times I try. From what I can tell, the “relation” is the table itself: rows of related attribute values, with keys tying tables to each other – unlike NoSQL stores that have quite a loose sense of what is hidden under a key (like, a user session under a hash key). Not sure it’s correct though =)

Heap algorithm in Perl

Another day, another study. Again, nothing fancy – merely FTR, a heap (or rather a minheap, with the root containing the minimum) implementation in Perl. The extract_top name is to keep the main interface similar to existing CPAN modules (like Heap::Simple).

package Heap::Hydra;

use strict;
use warnings;

sub new {
   # slot 0 is a placeholder so parent/child index math starts at 1
   return bless {_heap => [undef], _heap_size => 0}, shift;
}

sub getParent {
   my $self = shift;
   my $child_pos = shift;

   my $parent_pos = int($child_pos * 0.5) || 1;
   return $self->{_heap}->[$parent_pos], $parent_pos;
}

sub getMinKid {
   my $self = shift;
   my $parent_pos = shift;

   my $child_pos = $parent_pos * 2;
   return undef if $child_pos >= scalar @{$self->{_heap}};

   my $min_child = $self->{_heap}->[$child_pos];
   if (defined $self->{_heap}->[$child_pos + 1]
       && $self->{_heap}->[$child_pos + 1] < $min_child) {
      $child_pos += 1;
      $min_child = $self->{_heap}->[$child_pos];
   }
   return $min_child, $child_pos;
}

sub extract_top {
   my $self = shift;

   return undef if !$self->{_heap_size};

   my $top = $self->{_heap}->[1];

   # move the last element to the root and sift it down; splicing the
   # emptied leaf out instead would shift positions and corrupt the heap
   my $last = pop @{$self->{_heap}};
   $self->{_heap_size}--;
   return $top if !$self->{_heap_size};

   $self->{_heap}->[1] = $last;
   my $pos = 1;
   my ($min_kid, $min_kid_pos) = $self->getMinKid($pos);
   while (defined $min_kid && $min_kid < $last) {
      @{$self->{_heap}}[$pos, $min_kid_pos] = ($min_kid, $last);
      $pos = $min_kid_pos;
      ($min_kid, $min_kid_pos) = $self->getMinKid($pos);
   }

   return $top;
}

sub insert {
   my $self = shift;
   my $child = shift;

   $self->{_heap_size}++;
   $self->{_heap}->[$self->{_heap_size}] = $child;

   my $child_pos = $self->{_heap_size};
   my ($parent, $parent_pos) = $self->getParent($child_pos);
   while ($parent > $child) {
     $self->{_heap}->[$parent_pos] = $child;
     $self->{_heap}->[$child_pos] = $parent;
     $child_pos = $parent_pos;

     ($parent, $parent_pos) = $self->getParent($child_pos);
   }

   return $child_pos;
}
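For a quick cross-check of the contract (not the module above – Python’s built-in heapq, which implements the same minheap):

```python
import heapq

heap = []
for value in (5, 3, 8, 1, 9, 2):
    heapq.heappush(heap, value)       # same job as insert()

# popping repeatedly yields the values in ascending order,
# which is exactly what extract_top() should do
drained = [heapq.heappop(heap) for _ in range(len(heap))]
assert drained == [1, 2, 3, 5, 8, 9]
```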

Greatest prime divisor

Just for the record – a custom algorithm for finding the greatest prime divisor of a number, O(n) worst-case (with O(sqrt(n)) taken by the prime detection):

import math

def prime(num, get_number=False):
  for divisor in range(2, int(math.sqrt(num)) + 1):
    if num % divisor == 0:
      return False
  return num if get_number else True

def hydra_gpd(num):
  gpd = 1
  for divisor in range(2, int(math.sqrt(num)) + 1):
    if num % divisor == 0:
      divided = num // divisor
      if prime(divisor) and divisor > gpd:
        gpd = divisor
      if prime(divided) and divided > gpd:
        gpd = divided
      recursive_gpd = hydra_gpd(divided)
      if recursive_gpd > gpd:
        gpd = recursive_gpd
      return gpd
  return gpd

Nothing too special, just to keep it around.

Accidental hashing

How often do you use raw data structures in your everyday coding routine? No, I don’t mean when you populate a dictionary (hash) or fill an array (list). I mean constructing hash yourself, or making a binary search tree or something? Well, you might be luckier than me and do that regularly – but me, I seldom do.

There was a case recently though, and I was puzzled – how do I get a simple hash function that would handle an arbitrary key, would depend on character positions (so “foo” and “oof” wouldn’t produce the same result) and produce a result of limited range to reduce collisions – because collisions are about as likely as the ratio of the key dictionary size to the hash size (i.e. the size of the list of all possible keys to the number of “slots” in the hash).

And I accidentally ran into this:

key_hash = 1.0
for character in key:
  key_hash = ord(character) / key_hash

and it works like a charm: it produces square-root-like results (thus sticking within quite a limited range for a given dictionary size), but more varied. And if you need to spread it over a bigger hash – you just multiply it by 10^n. The number sequence for n in 1..100 looks like this (the stroked line inside is sqrt(n)):

(chart: hash values for n in 1..100, with sqrt(n) overlaid)

and results for arbitrary keys are:

“foobar” : 1.129169193450576

“this is supposed to be a long line, long enough to represent some arbitrary dictionary entry” : 98.989898474541334

so you multiply it by 10^n, get the remainder of division by hash size – and you’re done!

Of course it ain’t perfect – for one, “o” and “ooo” would yield the same result, but it’s perfect for the simple case I had (and can be improved if required).
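FTR, the whole thing as a tiny sketch (hash_size and the 10^n multiplier are arbitrary picks here, just for illustration):

```python
def key_hash(key):
    # position-dependent: each character code is divided by the running value
    value = 1.0
    for character in key:
        value = ord(character) / value
    return value

def bucket(key, hash_size, n=6):
    # spread the float over a bigger range, then take the remainder
    return int(key_hash(key) * 10 ** n) % hash_size

assert abs(key_hash("foobar") - 1.129169193450576) < 1e-9
assert key_hash("ab") != key_hash("ba")   # position matters
assert 0 <= bucket("foobar", 1024) < 1024
```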

Developers on support

37signals recently posted a great article on developers doing support: http://37signals.com/svn/posts/3676-everyone-on-support. I think it’s an extraordinarily good description of why, how and what for (i.e. what good it does) developers should sometimes be involved in customer support. That’s what I sometimes do now (occasionally, unfortunately – mostly when questions are too deep into the code for support) and did a lot back in RightMedia/Y! days, and I can say it’s all true. To a certain limit, of course (which I faced, too) – but if you do balanced shifts, it plays nicely.

Bluetooth sound quality issues in OSX

I love wireless technologies, I truly adore them. One thing that bothers me about Bluetooth though is sound glitches and jumps – it works fine for some time, and then all of a sudden it starts twitching, with small fractions of the sound missing. Minor, but very annoying.

This is actually a broader issue than OSX – for instance, no smartphone works with a Bluetooth headset perfectly if I keep the phone in my jeans pocket. It all goes well and nice till it gets jumpy out of the blue – and then back to normal again.

Now with the Mac it’s a bit different – at some point the sound starts twitching, and it goes on until I re-initiate the Bluetooth connection. And it was kinda alright till it got me. One speculation I found over the Internet was that the connection bitrate is too low and needs to be raised with this command line:

defaults write com.apple.BluetoothAudioAgent "Apple Bitpool Min (editable)" 60

because the default is 40. So I tried that, it didn’t help much, so I assumed it wasn’t the right solution and forgot about it. And after a while I did what I should have done right away – I restarted Bluetooth on the Mac. And… the speaker stopped working at all – it connected, but failed with something like “compressed stream is unacceptable”. Since I forgot I had mangled the Bluetooth settings, my first thought was that the speaker broke. And it was kinda alright (it has a wired input, it’s quite old and I don’t use it every day) till it got me.

So I googled up the error message, and the results struck me with remembrance. I changed the value in the aforementioned command to 53 (that was the highest suggested working rate) and… it’s all good now! Can’t say about the twitches – they were sporadic and never occurred right away – but at least I got it back working.

One note: that command is for user space, don’t run it with sudo (no harm there, it just wouldn’t work).

Perl sort

You only realize the difference when you compare things. I sometimes wondered “how fast is the built-in Perl sort, really?”, but never got to any comparison (until I finally did some sort-algorithm research). Well, no surprises, of course (and my implementations are quite sub-optimal), but still (sorta FTR – this is a sort of a 10K-element array filled with int representations of rand(10000) values):

  • perl: 0.00242
  • quicksort: 0.097454
  • mergesort: 0.115439
  • insert sort: 8.223053

Another FTR – Perl has both quicksort and mergesort, and as far as the docs say, mergesort is actually the default since 5.8 (the quicksort variant shuffles large arrays to save itself from the worst-case scenario on pre-sorted input). You can change the sort algorithm though with this (from http://perldoc.perl.org/sort.html):

  1. use sort 'stable'; # guarantee stability
  2. use sort '_quicksort'; # use a quicksort algorithm
  3. use sort '_mergesort'; # use a mergesort algorithm
  4. use sort 'defaults'; # revert to default behavior
  5. no sort 'stable'; # stability not important

Like, if you need a “stable” sort (which mergesort is) – one where equal elements preserve their initial order.
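Stability is easy to see in Python, whose built-in sort (Timsort, a mergesort descendant) is guaranteed stable:

```python
pairs = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]
# sort by the letter only; tied elements keep their original order
ordered = sorted(pairs, key=lambda pair: pair[0])
assert ordered == [("a", 2), ("a", 4), ("b", 1), ("b", 3)]
```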