January 27, 2012

Okay, awk is slow...

(The performance data were updated in a more recent post.)

Here is what I got from counting occurrences of a substring in a text file, running each counter ten times:

bash-3.2$ time -p for i in {1..10}; do\
                     gawk -F 的 '{s=s+NF-1}END{print FS"     "s}'\
                          /Volumes/ramdisk/lotsoftext.txt;\
                  done > /dev/null
real 6.02
user 5.60
sys 0.36
bash-3.2$ time -p for i in {1..10}; do\
                     ./test.pl 的 < /Volumes/ramdisk/lotsoftext.txt;\
                  done > /dev/null
real 3.63
user 3.42
sys 0.18
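
For anyone puzzling over the gawk one-liner: it uses the search string as the field separator, so a line with n occurrences splits into n+1 fields and NF-1 is the count for that line. Here is a minimal sketch of that idea on a tiny input, not the benchmark itself. The function name count_by_fields and the sample text are made up for illustration. One thing worth flagging: awk reports NF as 0 for an empty line, so NF-1 would subtract one there. The sketch skips such lines with an NF > 0 guard, which the timed command above does not have.

```shell
#!/bin/sh
# Hypothetical helper (not from the timed run above): count occurrences of a
# string by using it as the awk field separator.
count_by_fields() {
    # $1 = string to count; the text to search arrives on stdin.
    # NF > 0 skips empty lines, where NF is 0 and NF - 1 would be -1.
    # "s + 0" forces a numeric 0 when nothing was counted.
    awk -F "$1" 'NF > 0 { s += NF - 1 } END { print s + 0 }'
}

# Made-up sample: 2 commas, an empty line, 0 commas, then 2 commas.
printf 'a,b,c\n\nno commas here\n,,\n' | count_by_fields ','
# prints 4
```

A single-character separator other than a space is taken literally, while a longer one is treated as a regular expression, so this only counts literal occurrences when the search string has no regex metacharacters.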

And… here is the test.pl:

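#!/usr/bin/env perl
use strict;
use warnings;   # replaces -w, which env does not pass through on every system

# Substring to count, taken from the first command-line argument.
my $query = $ARGV[0];
my $size  = 0;
while (<STDIN>) {
    # Count every match on the line; \Q...\E keeps the query a literal string.
    $size++ while /\Q$query\E/g;
}
print "$query       $size\n";
exit;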

I still think the gawk one-liner is a thing of beauty, but oh well.

Posted by mjhsieh at January 27, 2012 07:21 AM