Tricks and Tweaks of Open-Source World: 2011

Tuesday, December 20, 2011

Git patches for subtree

Git is arguably the most useful and popular tool for source control. I have been using git for more than three years now, and the feature I like most is 'subtree'. It is very useful when you work on a large project that includes other projects maintained by different people: you can merge the changes from those remote projects into one repository.

I have used subtree to pull changes from QEMU releases into MARSS. Sometimes we want to change QEMU and send the patches upstream, but 'git format-patch' doesn't work out of the box: the file paths in the patch are relative to the top directory of the MARSS repository, so they all start with 'qemu/'. You can see this in the 'git diff' output below:

diff --git a/qemu/target-i386/cpu.h b/qemu/target-i386/cpu.h
index 7f2103f..7047115 100644
--- a/qemu/target-i386/cpu.h
+++ b/qemu/target-i386/cpu.h
@@ -636,6 +636,7 @@ typedef struct CPUX86State {
 #ifdef MARSS_QEMU
     target_ulong cr[8]; /* NOTE: cr1 is unused */
     uint8_t handle_interrupt; /* Simulater managed int enable flag */
+    uint64_t simpoint_decr;
 #else
     target_ulong cr[5]; /* NOTE: cr1 is unused */
 #endif

As the '---' and '+++' lines (lines 3 and 4 of the diff) show, every path starts with the 'qemu' folder. Such a patch won't apply to QEMU mainline, where the paths must not start with 'qemu/'. To solve this, git diff provides the command-line flag --relative[=<path>]. With it, git shows only the changes under <path> and prints the file names relative to that directory. For example,

$ git diff --relative=qemu/

will show the diff as below:

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 7f2103f..7047115 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -636,6 +636,7 @@ typedef struct CPUX86State {
 #ifdef MARSS_QEMU
     target_ulong cr[8]; /* NOTE: cr1 is unused */
     uint8_t handle_interrupt; /* Simulater managed int enable flag */
+    uint64_t simpoint_decr;
 #else
     target_ulong cr[5]; /* NOTE: cr1 is unused */
 #endif

So use '--relative' when you generate patches to submit upstream. Bonus tip: 'git format-patch' accepts '--relative' as well.
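A minimal sketch of the bonus tip, assuming a throwaway repository whose hypothetical layout mimics MARSS (a vendored project under qemu/). It commits a change inside qemu/ and writes that commit out with 'git format-patch --relative=qemu/'; the file contents, user name and commit messages are placeholders.

```shell
#!/bin/sh
# Sketch: 'git format-patch --relative' in a throwaway repository.
# The qemu/ layout below is a hypothetical stand-in for the MARSS tree.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Initial commit: a file inside the vendored qemu/ directory.
mkdir -p qemu/target-i386
echo "int x;" > qemu/target-i386/cpu.h
git add qemu
git commit -q -m "Import qemu"

# A change we want to send upstream.
echo "int y;" >> qemu/target-i386/cpu.h
git commit -q -am "Add y"

# Write the latest commit as a patch; --relative=qemu/ strips the
# qemu/ prefix so the paths match the upstream QEMU tree.
git format-patch -q --relative=qemu/ -1 HEAD -o "$repo/patches"

# Show the new-file header of the generated patch.
grep '^+++ ' "$repo"/patches/*.patch
```

The last command should print '+++ b/target-i386/cpu.h', without the 'qemu/' prefix. In a real tree you would replace '-1 HEAD' with the range of commits you want to send upstream.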

Thursday, September 8, 2011

Awesome 'awk': variables and conditions

I was working on generating some graphs for my research, and some of the data I needed came from 'tcpdump' output. I had captured all the TCP traffic between two simulated VMs with 'tcpdump': one VM was running a LAMP server and the other was sending requests to it. The dump looked like this:

18:04:02.898609 IP foxhound.cs.binghamton.edu.40080 > foxhound.cs.binghamton.edu.50847: 
18:04:02.898636 IP foxhound.cs.binghamton.edu.50847 > foxhound.cs.binghamton.edu.40080: 
18:04:03.121414 IP foxhound.cs.binghamton.edu.40080 > foxhound.cs.binghamton.edu.50847: 
18:04:03.121439 IP foxhound.cs.binghamton.edu.50847 > foxhound.cs.binghamton.edu.40080:

From that I needed to measure the traffic (number of TCP packets per minute) between the server and the client. Every line has the same format, with a timestamp at the start. The challenge was to count the TCP packets received/sent by the server in each minute. Of course this can be done in Python: read the file, iterate over each line, split out the timestamp, and increment a counter for that minute. But Python felt like overkill for such a simple task, so I decided to use awesome 'awk'.

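#!/bin/bash
# Count tcpdump packets per minute; the dump file is the first argument.
# Each line starts with an HH:MM:SS.micro timestamp in field $1.
awk '
BEGIN { print "Time, Requests"; hr = ""; min = ""; count = 0 }
{
    split($1, a, ":")                      # a[1] = hour, a[2] = minute
    if (a[1] != hr || a[2] != min) {       # a new minute has started
        if (count > 0) print hr ":" min ", " count
        hr = a[1]; min = a[2]; count = 0
    }
    count++
}
END { if (count > 0) print hr ":" min ", " count }   # flush the last minute
' "$1"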

Within a couple of minutes I came up with the bash script above. It uses awk variables to carry data across lines and, with conditional statements, prints one line per minute. The 'BEGIN' block shows how easy it is to declare and initialize variables, and an 'if' statement gives you easy control over the output. Next time you run into a similar task that needs some basic calculation over text files, let 'awk' be your Swiss Army knife.

Bonus: if you use 'awk' in a bash script and want to pass a shell variable to it, use the -v command-line option to declare and initialize an awk variable, then use it inside your 'awk' program.
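A small sketch of the -v tip, run on the sample tcpdump lines from above. The filter is an assumption for illustration: it treats 40080 as the server's port (hypothetical, taken from the sample) and counts only the packets whose source address (field $3) ends in that port, i.e. the packets sent by the server.

```shell
#!/bin/bash
# Sketch: pass a shell variable into awk with -v.
# Assumption: the server listens on port 40080 (hypothetical value from the sample dump).
server_port=40080

sent=$(printf '%s\n' \
  '18:04:02.898609 IP foxhound.cs.binghamton.edu.40080 > foxhound.cs.binghamton.edu.50847:' \
  '18:04:02.898636 IP foxhound.cs.binghamton.edu.50847 > foxhound.cs.binghamton.edu.40080:' \
  '18:04:03.121414 IP foxhound.cs.binghamton.edu.40080 > foxhound.cs.binghamton.edu.50847:' \
  '18:04:03.121439 IP foxhound.cs.binghamton.edu.50847 > foxhound.cs.binghamton.edu.40080:' |
  awk -v port="$server_port" '
    # $3 looks like host.port: split on "." and compare the last piece.
    { n = split($3, parts, "."); if (parts[n] == port) count++ }
    END { print count + 0 }
  ')

echo "packets sent from port $server_port: $sent"
```

On these four lines it prints 'packets sent from port 40080: 2'. Passing the port with -v keeps the awk program in single quotes, so you don't have to splice shell variables into the quoted awk source.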

Wednesday, February 9, 2011

chmod to copy user permission to group

The issue I had was simple:
- I wanted to give my group read/write/execute permission
- but I did not want to end up with 'executable' .cpp and .h files

If I run 'chmod -R g+rwx dir', all users in my group can access these files, but every .cpp and .h file also gets the group execute bit, which looked ugly to me. What I wanted was to copy the user (owner) permissions to the group, so all users in my group have the same permissions as me.

The solution was really simple (now I wonder why I didn't check the man page earlier):

chmod -R g+u dir

That's it, done.
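A quick sketch that shows what 'g+u' does, assuming a Linux system with GNU coreutils (for 'stat -c %a'). The directory and file names are hypothetical: a 755 source directory and a 644 .cpp file stand in for a real project.

```shell
#!/bin/sh
# Sketch: "g+u" adds the owner's permission bits to the group.
# Assumes GNU coreutils for `stat -c %a` (octal permissions).
set -e
dir=$(mktemp -d)
mkdir "$dir/src"
touch "$dir/src/main.cpp"
chmod 755 "$dir/src"           # directory: owner rwx, group r-x
chmod 644 "$dir/src/main.cpp"  # source file: owner rw-, group r--

chmod -R g+u "$dir/src"

stat -c '%a %n' "$dir/src" "$dir/src/main.cpp"
```

The directory ends up as 775 and main.cpp as 664, so the group can write everything while the .cpp file stays non-executable. If the group should get exactly the owner's bits (dropping any extra group bits), 'chmod -R g=u dir' sets them instead of adding to them.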

Wednesday, January 12, 2011

Setting up github ssh access behind proxy servers

At work, to access and update the public MARSS repository, I set up a rather complicated proxy configuration for Git, and it finally worked :).

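I followed the steps from this tutorial and it worked without a glitch. The only change I made was to use the standard 'nc' (netcat) for basic proxy support instead of the provided 'connect.c'. The -X and -x options used below come from the OpenBSD version of netcat; other netcat variants may not support them. My 'socks-gw' file looks like this: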

#!/bin/sh
# File ~/bin/socks-gw
# Connect through a SOCKS 5 proxy using 'nc'
# (quote "$@" so the host and port arguments are passed through intact)
nc -X 5 -x proxy.server:1080 "$@"
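The tutorial's exact steps aren't reproduced here. As a sketch, assuming OpenSSH and that the script above is saved as ~/bin/socks-gw and made executable, one common way to send GitHub SSH traffic through it is a ProxyCommand entry in ~/.ssh/config:

```shell
#!/bin/sh
# Sketch (assumed setup): route SSH connections to github.com through the
# socks-gw wrapper. ssh replaces %h and %p with the target host and port,
# which the wrapper passes on to nc.
mkdir -p ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host github.com
    ProxyCommand ~/bin/socks-gw %h %p
EOF
```

With this entry, connecting to github.com makes the wrapper run 'nc -X 5 -x proxy.server:1080 github.com 22'. Running 'ssh -T git@github.com' is a quick way to check the connection before using git.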

Monday, January 3, 2011

Good Introductory Book on Issues of Parallel Programming

I found a link to this book on Reddit Programming: 'Is Parallel Programming Hard, And, If So, What Can You Do About It?'.

I skimmed through the book, and trust me, if you are into systems programming you'll get hooked. I liked the way Paul (the author) explains the key hardware obstacles to parallel programming. It doesn't go into great detail, but he explains the basic issues in very simple language.

Here is a link to the author's official announcement of the book.