Thursday, September 8, 2011

Awesome 'awk' : variables and conditions

I was generating some graphs for my research, and one of the data sets I needed came from 'tcpdump' output. I had captured all the TCP traffic between two simulated VMs using 'tcpdump': one VM ran a LAMP server and the other generated requests to it. The output of the dump looked like this:
18:04:02.898609 IP > 
18:04:02.898636 IP > 
18:04:03.121414 IP > 
18:04:03.121439 IP > 
From that I needed to collect the traffic (number of TCP packets per minute) between server and client. Every line uses the same format, with a timestamp at the start. The challenge was to count the number of TCP packets received or sent by the server in each minute. Of course it can be done in Python by reading the file, iterating over each line, splitting the line to get the current time, and incrementing the counter for that minute. But I thought Python was overkill for such a simple task, so I decided to use the awesome 'awk'.

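awk '
# Initialize state; min = -1 means no minute has been seen yet.
BEGIN { print "Time, Requests"; hr = 0; min = -1; count = 0 }
{
    split($1, a, ":")        # a[1] = hour, a[2] = minute, a[3] = seconds
    if (a[2] != min) {       # the minute changed
        if (min >= 0) { print hr ":" min ", " count }
        min = a[2]; count = 0; hr = a[1]
    }
    count++
}
# Print the final minute, which the main block never reaches.
END { if (min >= 0) { print hr ":" min ", " count } }
' "$1"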

Within a couple of minutes I came up with the bash script above. It uses awk variables to carry state across lines and a conditional to print the count each time the minute changes. The code shows how easy it is to declare and initialize variables in the 'BEGIN' block, and how a simple 'if' statement lets you control the output of your script. The next time you run into a similar task that needs some basic calculation over text files, let 'awk' be your Swiss Army knife.
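To see the per-minute logic on a small input, here is a sketch that pipes three made-up timestamp lines through the same kind of counter. The sample lines are not real capture data, and the END block that prints the last minute is an assumption added for this illustration.

```shell
#!/bin/bash
# Made-up sample lines; only the leading timestamp matters here.
sample='18:04:02.898609 IP
18:04:03.121414 IP
18:05:10.000001 IP'

report=$(printf '%s\n' "$sample" | awk '
    BEGIN { print "Time, Requests"; min = -1 }
    {
        split($1, a, ":")                 # a[1] = hour, a[2] = minute
        if (a[2] != min) {                # minute changed: print the old count
            if (min >= 0) print hr ":" min ", " count
            hr = a[1]; min = a[2]; count = 0
        }
        count++
    }
    END { if (min >= 0) print hr ":" min ", " count }   # print the last minute
')
echo "$report"
```

With this sample the report has a header line followed by "18:04, 2" and "18:05, 1".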

Bonus: If you are using 'awk' in a bash script and want to pass a shell variable to 'awk', use the -v command-line option (as in -v name=value) to declare and initialize a variable, then use it within your 'awk' script.
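A minimal sketch of the -v option, assuming a hypothetical shell variable target_hr and made-up sample lines: it counts only the lines whose hour field matches the value passed in from bash.

```shell
#!/bin/bash
# Hypothetical value, for illustration only.
target_hr="18"

# Made-up sample lines standing in for tcpdump output.
sample='18:04:02.898609 IP
18:04:03.121414 IP
19:00:00.000001 IP'

# -v assigns the shell value to the awk variable "hr" before BEGIN runs.
result=$(printf '%s\n' "$sample" | awk -v hr="$target_hr" '
    { split($1, a, ":"); if (a[1] == hr) n++ }   # count lines in that hour
    END { print hr ": " n + 0 }                  # n + 0 prints 0 if nothing matched
')
echo "$result"
```

Here the script prints "18: 2". Passing the value with -v keeps the awk program inside single quotes, so there is no need to splice shell variables into the awk source.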