Datareign

Awk comes into its own when you need to do something a little more sophisticated than standard text editing but have neither the time nor the inclination to use Perl or another programming language.

Finding the Length of a Line

A colleague needed to find the length of a particular line in a file. He discovered that using 'wc' gave the wrong result (as in “head -2 filename | tail -1 | wc -c”), presumably because wc -c counts bytes and includes the trailing newline. Here's what he came up with instead. Note the parentheses…

 awk '{ if ( NR == 2 ) { print length($0); exit } }' filename
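The difference between the two approaches is easy to check on a throwaway file. This is only a sketch: the sample contents and the variable names (sample, wc_len, awk_len) are made up for illustration, and it assumes the discrepancy comes from wc -c counting the newline at the end of the line.

```shell
# Build a two-line sample file (hypothetical contents).
sample=$(mktemp)
printf 'first line\nsecond\n' > "$sample"

# wc -c counts bytes, including the trailing newline of line 2.
wc_len=$(head -2 "$sample" | tail -1 | wc -c | tr -d ' ')

# length($0) counts the characters in the record, without the newline.
awk_len=$(awk 'NR == 2 { print length($0); exit }' "$sample")

echo "wc -c: $wc_len, awk length: $awk_len"
rm -f "$sample"
```

The tr -d ' ' is there because some versions of wc pad their output with spaces.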

Sizing a Directory

This uses Awk's ability to do arithmetic across multiple input lines to produce a count, total and average file size for a directory or a supplied pattern. It's a useful tool for quick 'n' dirty system admin…

 echo "Harvey's file counter and sizer"
 echo "-------------------------------"
 if [[ -z $1 ]]
 then
    echo "Sizing entire directory"
 else
    echo "Sizing files for pattern [$1]"
 fi
 
 ls -l >/tmp/fsz.$$_1
 
 # -------------------------------
 # Remove any directory entries...
 # -------------------------------
 grep -v ^total /tmp/fsz.$$_1 | grep -v ^d >/tmp/fsz.$$
 rm /tmp/fsz.$$_1
 # ------------------------
 # Set up the search job...
 # ------------------------
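 if [[ -z $1 ]]
 then
    # NR is checked first so that an empty listing does not cause a division by zero
    awk '{s += $5}; END {if (NR > 0) printf "\nThere are %d files matching pattern\nAverage size is %f\nTotal size is %f\n", NR, s/NR, s; else print "\nNo files found"}' /tmp/fsz.$$
 else
    # The pattern is quoted so that spaces and wildcards in it reach grep intact
    grep -- "$1" /tmp/fsz.$$ | awk '{s += $5}; END {if (NR > 0) printf "\nThere are %d files matching pattern\nAverage size is %f\nTotal size is %f\n", NR, s/NR, s; else print "\nNo files found"}'
 fi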
 rm /tmp/fsz.$$
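The same arithmetic can probably be done in one pass without temporary files. The sketch below is not the script above but a variation on it: size_files is a made-up function name, and it assumes the usual ls -l layout with the size in the fifth column. Like grep in the script, it treats the pattern as a regular expression.

```shell
# Sketch: count, total and average the sizes of non-directory entries in one awk pass.
# size_files is a hypothetical helper: it reads ls -l style lines on stdin and
# takes an optional pattern as its first argument.
size_files() {
    awk -v pat="${1:-}" '
        /^total/ || /^d/ { next }            # skip the summary line and directories
        (pat == "") || ($0 ~ pat) {          # no pattern means every line counts
            n++; s += $5
        }
        END {
            if (n > 0)
                printf "There are %d files\nAverage size is %f\nTotal size is %f\n", n, s / n, s
            else
                print "No matching files"
        }'
}

ls -l | size_files "${1:-}"
```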

Don't use awk - Use nawk!

I couldn't work out why this failed when I ran it using awk on one machine, as it worked fine on another. It turned out that it would perform admirably if I ran it using nawk instead. It's worth trying this out on your own machine and seeing what happens…

 nawk '{ if(substr($0,405,2)=="LS") print $0 }' sourcefile.dat | head -10000 > targetfile.dat

Flushing Buffers and Disappearing Output

Using Awk at the end of a pipe can lead to unexpected behaviour because Awk, like most programs built on the C standard I/O library, buffers its output in blocks when writing to a file or pipe and flushes line by line only when writing to a terminal. It is therefore possible to create a script that appears to work normally when output is sent to the screen but seems to freeze when the output is sent to a file. Thus code like this…

cat $PatternList | while read Pattern
do
  Result=$(cat AgentList.txt | grep -c "${Pattern}")
  let TotalRobots+=$Result
  echo "\"${Pattern}\",${Result},${TotalRobots}"
done | awk 'BEGIN { FS = "," }; {if($2 != 0) print $1 "," $2 "," $3 }'

…appears faulty if the output is sent to a file and the file is examined while the script is running, such as with tail -f. If you then test the script by letting it write directly to the calling terminal, the buffers are flushed as you would expect, so the output rolls out. Considerable time can be wasted investigating the 'defect'.

It turns out, however, that the solution is simple: add a null call to the system() function at the beginning of the Awk loop, which forces Awk to flush its buffered output before it runs the (empty) command…

awk 'BEGIN { FS = "," }; {system(""); if($2 != 0) print $1 "," $2 "," $3 }'

…and Awk will output to the file as each pass through the loop completes.
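An alternative worth trying is fflush(), which gawk, mawk and the newer versions of nawk provide. This is a sketch that assumes your awk has it (older ones may not, in which case the system("") trick still applies). The function name filter_nonzero and the sample input are made up for illustration. Calling fflush() straight after print pushes each line out as soon as it is written.

```shell
# Sketch: flush after each print so a reader such as tail -f sees lines promptly.
# Assumes the awk in use supports fflush(); filter_nonzero is a hypothetical name.
filter_nonzero() {
    awk 'BEGIN { FS = "," }
         $2 != 0 { print $1 "," $2 "," $3; fflush() }'
}

# Sample input in the same "pattern",count,total shape as the loop above.
printf '"a",0,0\n"b",3,3\n"c",2,5\n' | filter_nonzero
```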

Subtotalling on a field

It's a common requirement to take a file containing a non-unique key field and a numeric field, then total the numbers for each group of keys. Assuming that the file contains lines of the sort…

keyfield,number

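…then the following script will output lines of the form keyfield,sub-total, provided the input is sorted (or at least grouped) by the key field. The END block prints the last group, which would otherwise be lost

awk -F, '{ if (keyfield != "" && keyfield != $1) { print keyfield "," subtotal; subtotal = 0 } subtotal += $2; keyfield = $1 } END { if (NR > 0) print keyfield "," subtotal }'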
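The one-liner above relies on lines with the same key being next to each other. If the input isn't sorted, an associative array gets round that. This is a sketch only: subtotal_by_key is a made-up name, and the trailing sort is there because awk's for (k in array) loop returns keys in no particular order.

```shell
# Sketch: subtotal by key without requiring sorted input.
# subtotal_by_key is a hypothetical name; it reads keyfield,number lines on stdin.
subtotal_by_key() {
    awk -F, '{ total[$1] += $2 }
             END { for (k in total) print k "," total[k] }' | sort
}

# Unsorted sample input: keys a and b are interleaved.
printf 'b,2\na,1\nb,3\na,4\n' | subtotal_by_key
```

The trade-off is memory: the array holds one entry per distinct key, whereas the one-liner only ever holds the current group.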

Merging a variable number of fields

This is a very useful one-liner. It joins as many fields as required, leaving only two fields in the output.

awk 'BEGIN {FS = ","; OFS = ","} {for( i = 2; i < NF; i++) $1 = $1 "%C2" $i; print $1, $NF }'

The initial value of i could be varied to leave fields at the beginning of the line unmerged, which would require that the concatenation ($1 = $1 "%C2" $i;) and the print statement be changed.
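As an illustration of that variation, here is a sketch that leaves the first field alone and merges everything between it and the last field into field 2. The function name merge_middle and the sample line are made up, and it assumes every line has at least three fields.

```shell
# Sketch: keep field 1 as-is, merge fields 2..NF-1 into field 2, keep the last field.
# merge_middle is a hypothetical name; "%C2" is the joiner used in the one-liner above.
merge_middle() {
    awk 'BEGIN { FS = ","; OFS = "," }
         { for (i = 3; i < NF; i++) $2 = $2 "%C2" $i; print $1, $2, $NF }'
}

printf 'id,a,b,c,last\n' | merge_middle
```

The loop now starts at 3 and appends to $2 rather than $1, and the print statement names all three output fields.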

Last modified: 2009/12/07 09:52