Category Archives: development

SECR-2013 notes at the end of 2013

Haven’t updated this place for a while – need to get back on this new habit, really. Well, at least
I hope I do so in upcoming year (should be a big one anyways). So a quick update just before I tear the old calendar down from the wall (and I pity that, it’s the Futurama genuine 2013 calendar) – some records that date back as far as late October, some points I squeezed from that SECR conference I attended back then. These are quick and without any verbose explanation – rather “FTR” stuff to keep.

OPs/delivery:

  • monitor user delivery/connection time, detect slow (comparing to others or expected), direct them to closer/… node/instance by default. Meaning, you have logs, so you have the time of requests processing. So if you have some requests of one kind taking (under circumstances – be it another location or another browser or something) – taking longer that requests of the same king in other set of circumstances, you might need to adjust delivery rules – redirect traffic, test under different environment etc.
  • config/(code?) deployment through the Dropbox – as simple as it sounds, “simple parts” (I’m thinking configs, but might be updates as well) distribution via Dropbox.

Initial project stage:

  • “visual brief” – ask the basic associations, like “fast”, “pretty”, … – give some appropriate images to pick from, iterate. Like, ask customer what impression his product is aiming to project. Say the answer is “paramount”. Then suggest few images – mountain, the Sun, the Earth and the Moon, big teacher and small student – and ask customer to pick one (or few). And then iterate with styles and colors and other ideas.
  • moodboards (MoodShare…) – pick lots of stuff on the screen, let customer select, SCAMPER (http://www.mindtools.com/pages/article/newCT_02.htm)

CI

  • try test using DB/ssh connect (separate config) to run local changes against test instance remotely
  • togglers: big-feature switches, allow to disable/enable particular features separately, config- (or environment-) controlled way (triggers in cookies, ENV, config, URL etc.
  • same tests on CI and local, local ran for feature-related validations (branches), CI for default
  • think performance tests

KPI (http://management.about.com/cs/generalmanagement/a/keyperfindic.htm):

  • measure what you want to adjust (meaning parameters selected fro KPI estimation)

Dev points:

  • make user feedback easy (put feedback interface at a clearly visible place, make it simple)
  • get a cool team (people that fit together (this is important), motivated (not just money, also interested in what they’re doing), productive)
  • make internal communication easy (chat, video, whatever – to let teams or team members connect without any hassle)

QA:

  • ask testers to write test cases descriptions right after ticket is created (confirmed, put to backlog) and discuss/adjust along the way. I think this one is really beneficial if used the right way, although no personal experience here yet.

Well, that’s it for now – and probably for this year, too. See me (for I think I’m the only reader – or at least a writer – of this blog) in 2014!

Revelations

# of bits in a number N is floor(log N) + 1

Things like this make me tear off patches of imaginary hair from my bald skull in desperate frustration of “what the hell did I waste my time on in university?” rethorics. For whatever I learned there has nothing to do with CS. Nothing to do with anything, in fact – even my drinking habits are long different from those days.

Perl “inheritance”

This is just a random batch of records to keep around.

use base: load modules, add to @ISA (which is a list of “predecessors”)

one can add to @ISA, modules should be pre-used; there’s ->isa() method to check.

NEXT calls next in chain. So if you have D->ISA(A, C) and A->ISA(B), and then each class has sub n { shift->NEXT::n }, n would be called in each class in this chain: D->A->B->C (no matter A and B are unaware of C). FTR SUPER would call first class in @ISA, period.

use parent modules does less than use base, but essentially the same (at user level). Big difference: it tries to load packages from files, so if you have them named differently or (which is “named differently” as well) specified in same class, you need to do use parent -norequire, modules

use is compile-time, require is run-time.

Classes have autoload (when method is unavailable):

  1. our $AUTOLOAD;
  2. sub AUTOLOAD {…}

it conflicts with DESTROY – autoload handles destroy if former is defined. Also errors, thrown in destroy, would not (in most cases) stop the program.

You can bless any kind of data structures (not only hash)

On methods resolution in inheritance: Perl uses DFS, so lookup goes down the one package line, and switches to the next class only when done with previous. MRO (Method Resolution Order) C3 does more like breadth-first, used like use mro; or use mro ‘c3’; (use mro ‘dfs’; for Perl model). Can be redefined for each class.

 

SQL joins

Now this is something that keeps falling out of my head. Took me a good deal of time to remember the simplest thing or left/right joins idea, and it’s just moments ago that I hard-nailed the fact that “inner” and “outer” are just help-words, a decor (here’s the equivalence table, thx this StackOverflow article):

A LEFT JOIN B            A LEFT OUTER JOIN B
A RIGHT JOIN B           A RIGHT OUTER JOIN B
A FULL JOIN B            A FULL OUTER JOIN B
A INNER JOIN B           A JOIN B

 and now to fix the rest (thx another StackOverflow article). Note: if you're using MySQL, mind that it doesn't have full joins - although you can organize that as a UNION of left and right joins.

SQL join types

Also, while we’re at it – some normal forms memo:

1NF: each table field should contain only “atomic” values, i.e. single (not combined) values from  some finite domain – like, names. Domain might grow (new names appear), but each value (each name) should be an independent atom. Also each record, each occurrence or that field should contain a single value (of that domain) only

2NF: 1NF + table should not have duplicated records for a whole primary key. For instance, table with product, manufacturer, manufacturer_address is wrong – address would be repeated many times for no reason as it’s tied to manufacturer only (and the key is product+manufacturer). Should be two tables – manufacturer, address and product, manufacturer.

3NF: 2NF + all non-key attributes (fields) should be directly related to the key, and only to the key. For instance, if in the above (2NF) example table with product and manufacturer fields would also have product_factory and product_factory_address field – that would be wrong, because factory address is not related to the key (product+manufacturer) and should be located separately as product_factory, product_factory address table.

And I still fail to construct a proper phrase for what is “relational database”. This is bloody mess, all those descriptions that are there. I  get lost on the third word, no matter how many times I try. From what I can tell, it’s a database with a key->values relation, where the columns bear a “related” knowledge – unlike NoSQLs that have quite loose sense of what is hidden under the key (like, a user session under a hash key). Not sure it’s correct though =)

Heap algorithm in Perl

Another day, another study. Again, nothing fancy – merely a FTR, heap (or minheap, with root containing the minimum) implementation in Perl. extract_top name is to keep main interface similar to existing CPAN modules (like Heap::Simple)

package Heap::Hydra;

sub new {
   return bless {_heap => [undef,], _heap_size => 0}, shift;
}

sub getParent {
   my $self = shift;
   my $child_pos = shift;

   my $parent_pos = int($child_pos * 0.5) || 1;
   return $self->{_heap}->[$parent_pos], $parent_pos;
}

sub getMinKid {
   my $self = shift;
   my $parent_pos = shift;

   my $child_pos = $parent_pos * 2;
   return undef if $child_pos >= scalar @{$self->{_heap}};

   my $min_child = $self->{_heap}->[$child_pos];
   if (defined $self->{_heap}->[$child_pos + 1] && $self->{_heap}->[$child_pos + 1] < $min_child) {      $child_pos += 1;      $min_child = $self->{_heap}->[$child_pos];
   }
   return $min_child, $child_pos;
}

sub extract_top {
   my $self = shift;
   my $top_pos = shift || 1;

   my ($new_top, $new_top_pos) = $self->getMinKid($top_pos);
   if (!defined $new_top) {
     return splice @{$self->{_heap}}, $top_pos, 1;
   }

   my $prev_top = $self->{_heap}->[$top_pos];
   $self->{_heap}->[$top_pos] = $self->extract_top($new_top_pos);
   $self->{_heap_size}--;

   return $prev_top;
}

sub insert {
   my $self = shift;
   my $child = shift;

   $self->{_heap_size}++;
   $self->{_heap}->[$self->{_heap_size}] = $child;

   my $child_pos = $self->{_heap_size};
   my ($parent, $parent_pos) = $self->getParent($child_pos);
   while ($parent > $child) {
     $self->{_heap}->[$parent_pos] = $child;
     $self->{_heap}->[$child_pos] = $parent;
     $child_pos = $parent_pos;

     ($parent, $parent_pos) = $self->getParent($child_pos);
   }

   return $child_pos;
}

Greatest prime divisor

Just for the record – custom algorithm for searching greatest prime divisor of a number, O(n) worst-case (O(sqrt(n)) taken by prime detection):

def prime(num, get_number=False):
  for divisor in range(2, int(math.sqrt(num)) + 1):
    if num % divisor == 0:
      return False
  return num if get_number else True

def hydra_gpd(num, get_number=None):
  sequence = range(2, int(math.sqrt(num)) + 1)
  gpd = 1
  for divisor in sequence:
    divided = int(num / divisor)
    if num % divisor == 0:
      if prime(divisor) and divisor > gpd:
        gpd = divisor
      if prime(divided) and divided > gpd:
        gpd = divided
      recursive_gpd = hydra_gpd(divided)
      if recursive_gpd > gpd:
        gpd = recursive_gpd
      return gpd
  return gpd

Nothing too special, just to keep it around.

Accidental hashing

How often do you use raw data structures in your everyday coding routine? No, I don’t mean when you populate a dictionary (hash) or fill an array (list). I mean constructing hash yourself, or making a binary search tree or something? Well, you might be luckier than me and do that regularly – but me, I seldom do.

There was a care recently though, and I was puzzled – how do I get a simple hash function that would handle arbitrary key, would depend on characters position (so “foo” and “oof” wouldn’t produce same result) and produce result of limited range to reduce collisions – because collisions are as likely as a ratio of a key dictionary size to a has size (i.e. size of a list of all possible keys to a number of “slots” in hash).

And I accidentally ran into this:

key_hash = 1.0

for character in key:

key_hash = chr(character) / key_hash

and it works like a charm: it produces square-like results (thus sticking within quite limited dictionary size), but more varied. And if you need to spread it over bigger hash – you just multiply it by 10^n. Number sequence for n(1..100) looks like this (stroked line inside is sqrt(n):

Screenshot 2013-11-21 11.38.54

and results for arbitrary keys are:

“foobar” : 1.129169193450576

“this is supposed to be a long line, long enough to represent some arbitrary dictionary entry” : 98.989898474541334

so you multiply it by 10^n, get the remainder of division by hash size – and you’re done!

Of course it ain’t perfect – for one, “o” and “ooo” would yield the same result, but it’s perfect for the simple case I had (and can be improved if required).

Developers on support

37signals recently posted a great article on developers doing support: http://37signals.com/svn/posts/3676-everyone-on-support. I think it’s extraordinary good description of why, how and what for (i.e. what good does it do) developers should sometimes be involved in customer support. That’s what I sometimes do now (it’s occasional, unfortunately – mostly when questions are too deep into code for support) and did that a lot beck in RightMedia/Y! days, and I can say it’s all true. To a certain limit, of course (that I faced, too) – but if you do a balanced shifts, it plays nicely.

Perl sort

You only realize the difference when you compare things. I sometimes thought “how fast built-in Perl sort really is?”, but never got to any comparison (finally got to some sort algorithms research). Well, no surprises, of course (and my implementations are quite sub-optimal), but still (sorta FTR – this is a sort of 10K-element array filled with int representations of rand(10000) values):

  • perl: 0.00242
  • quicksort: 0.097454
  • mergesort: 0.115439
  • insert sort: 8.223053

Another FTR – Perl (since 5.8) has both quicksort and mergesort, although as I understand quicksort is still used by default – just shuffling (big?) arrays  to save from worst-case scenario in pre-sorted case. You can change the sort algorithm though with this (from http://perldoc.perl.org/sort.html):

  1. use sort ‘stable’; # guarantee stability
  2. use sort ‘_quicksort’; # use a quicksort algorithm
  3. use sort ‘_mergesort’; # use a mergesort algorithm
  4. use sort ‘defaults’; # revert to default behavior
  5. no sort ‘stable’; # stability not important

Like, if you need a “stable” sort (which mergesort is) – when same elements preserve their initial order.

Project features switch trick (aka Feature Toggle)

OK, so you probably know about feature branches. And DVCS. And CI (see that feature branches article again, there’s also PI idea I wasn’t aware of before). So do I (more or less). What I wasn’t aware of (till recent visit to CEE SEC(R) conference – which was OK, even good at some points) is Feature Toggle idea (sometimes referred as “switches”, “flags” etc.), which is quite useful in stand-alone software model, or if your service is big and world-wide.

The description Martin Fowler provides is pretty complete and verbose (who could have doubted), but still to squeeze the idea: for each feature you can consider a separate chunk of changes – such as a new part in UI that connects to the rest of the interface through a single link, new API method, a switch in data processing algorithm, that kind of stuff – you can do a configuration-level flag that, when given a true value, would enable that feature – or disable otherwise.

I see it useful in many cases – when you can try out new functionality on a limited group of customers but don’t want to build them a separate version, when you want your brand new user profile disguise appear only when customer system is ready for that without blocking other system updates (I know Facebook is using that model to deploy their changes world-wide). This also works quite well with CI idea – you can easily do CI on your main branch, and disable failing features without a need to rollback or quick-fix.

However this Toggling technique (like, well, any other) doesn’t suit all cases. It won’t do for bugfixes; It doesn’t fit small changes, or a patch that alters all occurrences of some method; it’s an overkill for (relatively) small SaS model where releases are easy to fix or rollback. So… Be advised, use wisely. And if you do, don’t forget to remove togglers when the appropriate code reaches stable-prod-level state – otherwise you eventually get yourself another ugly mess of no-one-remembers-why rules.

FTR there’s also this “branch by abstraction” idea – but I think it’s somewhat overcomplicated. I mean , to build an abstraction layer (sort of facade/adapter fusion) and switch internal methods used is alright, but why not just doing those method changes in branches? Although the perspective I see it from is a SaS, and rather tiny one – so there might be valid cases for this one, too.