Awk comes into its own when you need something a little more sophisticated than standard text editing but have neither the time nor the inclination to use Perl or another programming language.
A colleague needed to find the length of a particular line in a file. He discovered that using 'wc' gave the wrong result (as in “head -2 filename | tail -1 | wc -c”), because wc -c also counts the newline at the end of the line. Here's what he came up with instead. Note the parentheses…
cat filename | awk '{ if ( NR == 2 ) {print length($0); exit; } } '
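The discrepancy is easy to reproduce. The following is only a sketch: line_length is a hypothetical helper name and the sample file is made up, but it shows the two counts side by side for the same line.

```shell
# line_length FILE N: print the length of line N, excluding the newline.
# (line_length is a hypothetical helper name used for illustration.)
line_length() {
    awk -v n="$2" 'NR == n { print length($0); exit }' "$1"
}

sample=$(mktemp)
printf 'first\nsecond line\nthird\n' > "$sample"

# wc -c counts the trailing newline as well as the visible characters.
head -2 "$sample" | tail -1 | wc -c

# length($0) does not include the newline.
line_length "$sample" 2

rm -f "$sample"
```

Passing the file name to Awk directly also avoids the extra cat process.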
This script uses Awk's ability to do arithmetic across multiple input lines to produce a count, total and average file size for a directory or a supplied pattern. It's a useful tool for quick 'n' dirty system admin…
echo "Harvey's file counter and sizer"
echo "-------------------------------"
if [[ -z $1 ]]
then
echo "Sizing entire directory"
else
echo "Sizing files for pattern [$1]"
fi
ls -l >/tmp/fsz.$$_1
# -------------------------------
# Remove any directory entries...
# -------------------------------
grep -v ^total /tmp/fsz.$$_1 | grep -v ^d >/tmp/fsz.$$
rm /tmp/fsz.$$_1
# ------------------------
# Set up the search job...
# ------------------------
if [[ -z $1 ]]
then
awk '{s += $5}; END {if (NR > 0) printf "\nThere are %d files matching pattern\nAverage size is %f\nTotal size is %f\n", NR, s/NR, s; else print "\nNo files found"}' /tmp/fsz.$$
else
grep -- "$1" /tmp/fsz.$$ | awk '{s += $5}; END {if (NR > 0) printf "\nThere are %d files matching pattern\nAverage size is %f\nTotal size is %f\n", NR, s/NR, s; else print "\nNo files found"}'
fi
rm /tmp/fsz.$$
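The same running-total idea can be sketched without temporary files by letting Awk skip the unwanted lines itself. This is an alternative arrangement, not the script above: size_summary is a hypothetical name, the sample listing is made up, and it assumes the size is in field 5 as in typical ls -l output.

```shell
# size_summary: read "ls -l" style lines on stdin and report the count,
# average and total of field 5 (the size column).
# (size_summary is a hypothetical name; the sample lines are made up.)
size_summary() {
    awk '
        /^total/ || /^d/ { next }        # skip the header line and directories
        { n++; s += $5 }                 # accumulate across input lines
        END {
            if (n > 0)
                printf "files=%d average=%.1f total=%d\n", n, s / n, s
            else
                print "files=0"          # avoid dividing by zero
        }'
}

printf '%s\n' \
    'total 12' \
    'drwxr-xr-x 2 harvey staff 4096 Jan  1 12:00 docs' \
    '-rw-r--r-- 1 harvey staff 100 Jan  1 12:00 a.txt' \
    '-rw-r--r-- 1 harvey staff 300 Jan  1 12:00 b.txt' | size_summary
```

In real use the input would come from something like ls -l | size_summary.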
I couldn't work out why the following command failed when I ran it with awk, since it worked fine on another machine. It turned out to perform admirably when I ran it with nawk instead. It's worth trying this out on your own machine and seeing what happens…
nawk '{ if(substr($0,405,2)=="LS") print $0 }' sourcefile.dat | head -10000 > targetfile.dat
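For readers unfamiliar with substr(), here is a small sketch of the same kind of fixed-column test. The function name filter_by_columns and the sample records are made up, and the offset is 7 rather than 405 purely to keep the sample lines short.

```shell
# filter_by_columns: print lines whose characters 7-8 are "LS".
# The original command tests columns 405-406; a shorter offset is used
# here so the made-up sample records stay readable.
filter_by_columns() {
    awk 'substr($0, 7, 2) == "LS" { print }'
}

printf '%s\n' 'REC001LS alpha' 'REC002XX beta' 'REC003LS gamma' | filter_by_columns
```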
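Using Awk at the end of a pipe can lead to unexpected behaviour. Like most programs built on the C standard I/O library, Awk flushes its output line by line only when writing to a terminal; when writing to a file or a pipe it holds the output in a buffer until the buffer fills or the program exits. It is therefore possible to create a script that appears to work normally when output is sent to the screen but seems to freeze when the output is sent to a file. Thus code like this…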
cat $PatternList | while read Pattern
do
Result=$(cat AgentList.txt | grep -c "${Pattern}")
let TotalRobots+=$Result
echo "\"${Pattern}\",${Result},${TotalRobots}"
done | awk 'BEGIN { FS = "," }; {if($2 != 0) print $1 "," $2 "," $3 }'
…appears faulty if the output is sent to a file and the file examined while running, such as with tail -f. If you then test the script by letting it write directly to the calling terminal, the buffers are flushed as you would expect, so the output rolls out. Considerable time can be wasted in investigation of the 'defect'.
It turns out, however, that the solution is simple: add a null call to the system() function, at the beginning of the Awk loop, which forces a flush of all buffers in the current process…
awk 'BEGIN { FS = "," }; {system(""); if($2 != 0) print $1 "," $2 "," $3 }'
…and Awk will output to the file as each pass through the loop completes.
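Many Awk implementations also provide an explicit fflush() function, which some readers may find clearer than the system("") idiom. The sketch below assumes an Awk that supports fflush() with no arguments (gawk, mawk and the BWK awk do); nonzero_rows is a hypothetical name for the same filter.

```shell
# nonzero_rows: pass through CSV rows whose second field is not zero,
# flushing after each row so that a reader such as tail -f sees it at once.
# Assumes an Awk that provides fflush(); nonzero_rows is a made-up name.
nonzero_rows() {
    awk 'BEGIN { FS = "," }
         $2 != 0 { print $1 "," $2 "," $3; fflush() }'
}

printf '%s\n' '"Googlebot",0,0' '"Slurp",2,2' | nonzero_rows
```

If fflush() is not available on a given system, the system("") call shown above remains the fallback.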
It's a common requirement to take a file containing a non-unique key field and a numeric field, then total the numbers for each group of keys. Assuming that the file is sorted on the key field, so that equal keys are adjacent, and contains lines of the sort…
keyfield,number
…then the following script will output lines of the form keyfield, sub-total…
awk -F, '{ if (keyfield != "" && keyfield != $1) { print keyfield "," subtotal; subtotal = 0 } subtotal += $2; keyfield = $1 } END { if (NR > 0) print keyfield "," subtotal }'
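When the input is not grouped by key, a different technique can be used: an associative array indexed by the key. This is a sketch of that alternative, not the control-break approach above; subtotal_by_key is a hypothetical name, and the output is piped through sort because Awk does not define the order in which array elements are visited.

```shell
# subtotal_by_key: total field 2 for each distinct value of field 1.
# Works on unsorted input because every key has its own array element.
# (subtotal_by_key is a hypothetical name; the sample data is made up.)
subtotal_by_key() {
    awk -F, '{ sum[$1] += $2 }
             END { for (k in sum) print k "," sum[k] }' | sort
}

printf '%s\n' 'pear,3' 'apple,1' 'pear,4' 'apple,2' | subtotal_by_key
```

The trade-off is memory: the array holds one entry per distinct key, whereas the control-break version only ever remembers the current key.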
This is a very useful one-liner. It joins every field except the last into the first field, separated by "%C2", leaving only two fields in the output.
awk 'BEGIN {FS = ","; OFS = ","} {for( i = 2; i < NF; i++) $1 = $1 "%C2" $i; print $1, $NF }'
The initial value of i could be varied to leave fields at the beginning of the line unmerged, which would require that the concatenation ($1 = $1 "%C2" $i;) and the print statement be changed.
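One possible shape for that variation is sketched here. The function name merge_middle and its keep argument are hypothetical, and the sketch assumes every line has at least keep + 2 fields.

```shell
# merge_middle KEEP: leave the first KEEP fields alone, join fields
# KEEP+1 .. NF-1 with "%C2", and keep the last field separate.
# (merge_middle and keep are made-up names; lines are assumed to have
# at least KEEP + 2 fields.)
merge_middle() {
    awk -v keep="$1" 'BEGIN { FS = ","; OFS = "," }
    {
        out = ""
        for (i = 1; i <= keep; i++) out = out $i OFS       # untouched leading fields
        merged = $(keep + 1)                               # first field to merge
        for (i = keep + 2; i < NF; i++) merged = merged "%C2" $i
        print out merged, $NF
    }'
}

printf '%s\n' 'id,a,b,c,9' | merge_middle 1
```

Calling merge_middle 0 should give the same result as the original one-liner, since no leading fields are held back.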