Git hooks

Git hooks are lovely. Consider automated syntax check before committing changes:

  • git config –global init.templatedir ‘~/.git-templates’
  • mkdir -p ~/.git-templates/hooks
  • vi ~/git-pre-commit-hook.sh and put, for instance, following:


#!/bin/bash

  retcode=0
  for f in $( git diff --cached --name-status|awk '$1 != "R" { print $2 }' ); do
    echo "Veryfying $f..."
    filename=$(basename "$f")
    extension="${filename##*.}"
    if [ "$extension" = "pl" ] || [ "$extension" = "pm" ]; then
      perl -c $f
      lineretcode=$?
      if [ $lineretcode != 0 ]; then
        retcode=$lineretcode
      fi
    fi
  done

  if [ $retcode != 0 ]; then
    echo
    echo "Pre-commit validation failed. Please fix issues or run 'git commit --no-verify' if you're certain in skipping validation step."
    exit 1
  fi

  exit 0

  • chmod a+x ~/git-pre-commit-hook.sh
  • ln -sn ~/git-pre-commit-hook.sh ~/.git-templates/hooks/pre-commit (point of this to have it centralised – on next step global template is copied to repository, so if it’s a symlink – it’s easier to adjust for all later)
  • cd to your repository and run git init there – it’d re-instantiate the repository with copying global hook templates there

That’s it – now you have yourself neat little safeguard. There are better techniques of course – flymake, for one, or some IDE build/check configuration. That works, too – but gets too tricky when you have to patch the code on remote instances. In my case, I have to edit files remotely – and while Emacs has no problem with that (Sublime has plugin for that, too, and you can always add some fuse SSH mount), my flymake settings suck, and I’m not keen to dive into a bloodthirsty piragna pool of its remote execution configuration.

So… git hooks it is.

Basic DNS records list

This is a “you learn better when you write it down” sort of post. Never actually got into DNS record types – as a lot of things I’ve missed, there was just no need and I wasn’t curious enough. Although curiosity without regular application of that knowledge is rather pointless – “you soon will forget the tune that you play”, if you play it just once or twice.

That said, I’m gonna be needing this knowledge soon (I presume), so I thought I better do me a hint page (a “crib sheet”, as the dictionary suggests).

  •  A record – “Address”, a connection of a name to an IP address like, for instance, “example.com. IN A 69.9.64.10” – where IN is for the Internet, i.e. “Internet Address…” Wildcards could be used for “all subdomains”
  • AAAA – “four times the size”, A-address for IPV6 addresses (see a note on IPV6 below)
  • CNAME – Canonical Name, specifies an alias for existing A record, like “subdomain.example.com CNAME example.com“. Useful to make sure you only have one IP address in A record, and others rely on A name – so if IP changes, it’s one place you have to change it at. Note: do not use CNAME aliases in MX records.
  • MX – Mail eXchange, specifies which server serves zone’s mail exchange purposes – like, for instance, “mydomain.com IN MX 0 mydomain.com.“; final dot is important, 0 is for priority: ther could be multiple MX records for the zone, and they processed in priority order (the lower the number the higher the priority). Same-priority records are processes in random order. Right-side name should be an A record.
  • PTR – specify pointer for a reverse DNS lookup, required to validate hostname identity in some cases – “16.3.0.122.in-addr.arpa. IN PTR name.net” (note that IP of name.net is 122.0.3.16)
  • NS – Name Server, specifies a (list of) authoritative DNS server for the domain, for instance: “example.com. IN NS ns1.live.secure.com“. This should be specified at authoritative server as well.
  • SOA – State Of Authority, an important record with zone’s name server details – “authoritative information about an Internet domain, the email of the domain administrator, the domain serial number, and several timers relating to refreshing the zone“. Example:  mydomain.com. 14400 IN SOA ns.mynameserver.com. root.ns.mynameserver.com. (
    2004123001 ; Serial number
    86000 ; Refresh rate in seconds
    7200 ; Update Retry in seconds
    3600000 ; Expiry in seconds
    600 ; minimum in seconds )
  • SRV – an option to specify a server for a Service, like “_http._tcp.example.com. IN SRV 0 5 80 www.example.com.” – here’s the service name (_http), priority (0), weight (5) for services with the same priority, and port (80) for the service.
  • NAPTR – recent and complex regexp-based name resolution I’m not keen to into.
  • There’s MUCH MORE of this crap, hope I won’t need to ever dig that deep
  • There’s also a number of decentralized DNS initiatives

Oh, and on IPV6:

  • it’s 128-bit (IPV4 is 32)
  • it’s recorded in hex numbers, 8 quads
  • it has following structure:
2001:0db8:3c4d:0015:0000:0000:abcd:ef12
______________|____|___________________
global prefix subnet  Interface ID
  • local address is 0000:0000:0000:0000:0000:0000:0000:0001
  • and IPV4 record in that case would look like 0000:0000:0000:0000:0000:0000:192.168.1.25
  • zeroes could be omitted: ::1 or ::192.168.1.25
  • to make sure address is shortened correctly, use ipv6calc util: ipv6calc –in ipv6addr –out ipv6addr –printuncompressed ::1

Retrieving multi-line sequences from text files

Had some free time and had a need to parse out a number if similar (but not the same) blocks from a log file. There are tools for that – it could be done with a mixture of grep, sed, bash and some arcane magic – but I’m afraid to find the right toolset, learn required keys for each and experiment with their values and combination would take me longer than to just write me a tool. And it was another opportunity to write a few lines in Python, which I don’t do often enough. And I do love text processing.

So I made me a neat little tool that essentially does one simple thing – starts printing the input stream when some (start) trigger is found there, and stops when another (end) occurs. There are some additions – like, print some lines before/after the block, print couns, unique blocks only etc. – but those are glitter, mostly.

Available in a BitBucket repository.

WordPress plugins etc.

I’ve been (quite subconsciously) using WordPress for quite some time now, mostly for my alcoholic beverages blog (it’s in Russian, sorry). Subconsciously because it was the first option GoDaddy offered me a “automated install” blogging platform – and also because I’ve heard the name a number of moons back, so it should’ve been well documented and supported at that point. It’s on PHP, but who cares. I’ve spent years writing PHP code.

So I had this problem: my articles all have a rating (I use Author Post Ratings plugin by Philip Newcomer), but it’s not possible to see all the high-rated articles, nor it is possible to order articles by rating within a category – and this feature made a lot of sense, because when you go to a site with a bunch of reviews, you usually look for the best stuff within some category.

So I gave it a thought and just went and added required functionality – now it’s there on bitbucket, https://bitbucket.org/hydralien/author-post-ratings/src

Turned out writing WordPress plugins is a no-brainer if you need something simple (I started with a post-by-rating list) – you just add directory, create a PHP file with a proper header, and you’re done. Well, after you add your functionality, that is. WordPress has some lovely documentation on that.

It gets trickier if you need to change “internal behavior” – such as category sort order – but documentation helps there as well, there are filter hooks for that.

I guess this is worth a slogan – something like “Better drinking with no hassle” or “Drinking better just got easier”. Or whatever.

SECR-2013 notes at the end of 2013

Haven’t updated this place for a while – need to get back on this new habit, really. Well, at least
I hope I do so in upcoming year (should be a big one anyways). So a quick update just before I tear the old calendar down from the wall (and I pity that, it’s the Futurama genuine 2013 calendar) – some records that date back as far as late October, some points I squeezed from that SECR conference I attended back then. These are quick and without any verbose explanation – rather “FTR” stuff to keep.

OPs/delivery:

  • monitor user delivery/connection time, detect slow (comparing to others or expected), direct them to closer/… node/instance by default. Meaning, you have logs, so you have the time of requests processing. So if you have some requests of one kind taking (under circumstances – be it another location or another browser or something) – taking longer that requests of the same king in other set of circumstances, you might need to adjust delivery rules – redirect traffic, test under different environment etc.
  • config/(code?) deployment through the Dropbox – as simple as it sounds, “simple parts” (I’m thinking configs, but might be updates as well) distribution via Dropbox.

Initial project stage:

  • “visual brief” – ask the basic associations, like “fast”, “pretty”, … – give some appropriate images to pick from, iterate. Like, ask customer what impression his product is aiming to project. Say the answer is “paramount”. Then suggest few images – mountain, the Sun, the Earth and the Moon, big teacher and small student – and ask customer to pick one (or few). And then iterate with styles and colors and other ideas.
  • moodboards (MoodShare…) – pick lots of stuff on the screen, let customer select, SCAMPER (http://www.mindtools.com/pages/article/newCT_02.htm)

CI

  • try test using DB/ssh connect (separate config) to run local changes against test instance remotely
  • togglers: big-feature switches, allow to disable/enable particular features separately, config- (or environment-) controlled way (triggers in cookies, ENV, config, URL etc.
  • same tests on CI and local, local ran for feature-related validations (branches), CI for default
  • think performance tests

KPI (http://management.about.com/cs/generalmanagement/a/keyperfindic.htm):

  • measure what you want to adjust (meaning parameters selected fro KPI estimation)

Dev points:

  • make user feedback easy (put feedback interface at a clearly visible place, make it simple)
  • get a cool team (people that fit together (this is important), motivated (not just money, also interested in what they’re doing), productive)
  • make internal communication easy (chat, video, whatever – to let teams or team members connect without any hassle)

QA:

  • ask testers to write test cases descriptions right after ticket is created (confirmed, put to backlog) and discuss/adjust along the way. I think this one is really beneficial if used the right way, although no personal experience here yet.

Well, that’s it for now – and probably for this year, too. See me (for I think I’m the only reader – or at least a writer – of this blog) in 2014!

Revelations

# of bits in a number N is floor(log N) + 1

Things like this make me tear off patches of imaginary hair from my bald skull in desperate frustration of “what the hell did I waste my time on in university?” rethorics. For whatever I learned there has nothing to do with CS. Nothing to do with anything, in fact – even my drinking habits are long different from those days.

Perl “inheritance”

This is just a random batch of records to keep around.

use base: load modules, add to @ISA (which is a list of “predecessors”)

one can add to @ISA, modules should be pre-used; there’s ->isa() method to check.

NEXT calls next in chain. So if you have D->ISA(A, C) and A->ISA(B), and then each class has sub n { shift->NEXT::n }, n would be called in each class in this chain: D->A->B->C (no matter A and B are unaware of C). FTR SUPER would call first class in @ISA, period.

use parent modules does less than use base, but essentially the same (at user level). Big difference: it tries to load packages from files, so if you have them named differently or (which is “named differently” as well) specified in same class, you need to do use parent -norequire, modules

use is compile-time, require is run-time.

Classes have autoload (when method is unavailable):

  1. our $AUTOLOAD;
  2. sub AUTOLOAD {…}

it conflicts with DESTROY – autoload handles destroy if former is defined. Also errors, thrown in destroy, would not (in most cases) stop the program.

You can bless any kind of data structures (not only hash)

On methods resolution in inheritance: Perl uses DFS, so lookup goes down the one package line, and switches to the next class only when done with previous. MRO (Method Resolution Order) C3 does more like breadth-first, used like use mro; or use mro ‘c3’; (use mro ‘dfs’; for Perl model). Can be redefined for each class.

 

SQL joins

Now this is something that keeps falling out of my head. Took me a good deal of time to remember the simplest thing or left/right joins idea, and it’s just moments ago that I hard-nailed the fact that “inner” and “outer” are just help-words, a decor (here’s the equivalence table, thx this StackOverflow article):

A LEFT JOIN B            A LEFT OUTER JOIN B
A RIGHT JOIN B           A RIGHT OUTER JOIN B
A FULL JOIN B            A FULL OUTER JOIN B
A INNER JOIN B           A JOIN B

 and now to fix the rest (thx another StackOverflow article). Note: if you're using MySQL, mind that it doesn't have full joins - although you can organize that as a UNION of left and right joins.

SQL join types

Also, while we’re at it – some normal forms memo:

1NF: each table field should contain only “atomic” values, i.e. single (not combined) values from  some finite domain – like, names. Domain might grow (new names appear), but each value (each name) should be an independent atom. Also each record, each occurrence or that field should contain a single value (of that domain) only

2NF: 1NF + table should not have duplicated records for a whole primary key. For instance, table with product, manufacturer, manufacturer_address is wrong – address would be repeated many times for no reason as it’s tied to manufacturer only (and the key is product+manufacturer). Should be two tables – manufacturer, address and product, manufacturer.

3NF: 2NF + all non-key attributes (fields) should be directly related to the key, and only to the key. For instance, if in the above (2NF) example table with product and manufacturer fields would also have product_factory and product_factory_address field – that would be wrong, because factory address is not related to the key (product+manufacturer) and should be located separately as product_factory, product_factory address table.

And I still fail to construct a proper phrase for what is “relational database”. This is bloody mess, all those descriptions that are there. I  get lost on the third word, no matter how many times I try. From what I can tell, it’s a database with a key->values relation, where the columns bear a “related” knowledge – unlike NoSQLs that have quite loose sense of what is hidden under the key (like, a user session under a hash key). Not sure it’s correct though =)

Heap algorithm in Perl

Another day, another study. Again, nothing fancy – merely a FTR, heap (or minheap, with root containing the minimum) implementation in Perl. extract_top name is to keep main interface similar to existing CPAN modules (like Heap::Simple)

package Heap::Hydra;

sub new {
   return bless {_heap => [undef,], _heap_size => 0}, shift;
}

sub getParent {
   my $self = shift;
   my $child_pos = shift;

   my $parent_pos = int($child_pos * 0.5) || 1;
   return $self->{_heap}->[$parent_pos], $parent_pos;
}

sub getMinKid {
   my $self = shift;
   my $parent_pos = shift;

   my $child_pos = $parent_pos * 2;
   return undef if $child_pos >= scalar @{$self->{_heap}};

   my $min_child = $self->{_heap}->[$child_pos];
   if (defined $self->{_heap}->[$child_pos + 1] && $self->{_heap}->[$child_pos + 1] < $min_child) {      $child_pos += 1;      $min_child = $self->{_heap}->[$child_pos];
   }
   return $min_child, $child_pos;
}

sub extract_top {
   my $self = shift;
   my $top_pos = shift || 1;

   my ($new_top, $new_top_pos) = $self->getMinKid($top_pos);
   if (!defined $new_top) {
     return splice @{$self->{_heap}}, $top_pos, 1;
   }

   my $prev_top = $self->{_heap}->[$top_pos];
   $self->{_heap}->[$top_pos] = $self->extract_top($new_top_pos);
   $self->{_heap_size}--;

   return $prev_top;
}

sub insert {
   my $self = shift;
   my $child = shift;

   $self->{_heap_size}++;
   $self->{_heap}->[$self->{_heap_size}] = $child;

   my $child_pos = $self->{_heap_size};
   my ($parent, $parent_pos) = $self->getParent($child_pos);
   while ($parent > $child) {
     $self->{_heap}->[$parent_pos] = $child;
     $self->{_heap}->[$child_pos] = $parent;
     $child_pos = $parent_pos;

     ($parent, $parent_pos) = $self->getParent($child_pos);
   }

   return $child_pos;
}

Greatest prime divisor

Just for the record – custom algorithm for searching greatest prime divisor of a number, O(n) worst-case (O(sqrt(n)) taken by prime detection):

def prime(num, get_number=False):
  for divisor in range(2, int(math.sqrt(num)) + 1):
    if num % divisor == 0:
      return False
  return num if get_number else True

def hydra_gpd(num, get_number=None):
  sequence = range(2, int(math.sqrt(num)) + 1)
  gpd = 1
  for divisor in sequence:
    divided = int(num / divisor)
    if num % divisor == 0:
      if prime(divisor) and divisor > gpd:
        gpd = divisor
      if prime(divided) and divided > gpd:
        gpd = divided
      recursive_gpd = hydra_gpd(divided)
      if recursive_gpd > gpd:
        gpd = recursive_gpd
      return gpd
  return gpd

Nothing too special, just to keep it around.