Saturday, December 5, 2015

TCP 102: Sniffing TCP Traffic

In part 1, we went over the TCP handshake and some basic structure on how a client and a server will communicate with each other over a network. In this section I will be reviewing the structure of an actual TCP packet, and going over how we sniff this traffic on our local interface.

The Structure of a TCP/IP Packet

You may have heard TCP referred to as TCP/IP. This is because TCP is only one part (one layer) of what allows you to talk to other systems over a network. Without IP (Internet Protocol), TCP would be an incomplete connection: networks would have no way of knowing where to route it. What makes up a TCP packet is actually two layers of encapsulation, so there are two headers: the IP header and the TCP header. To get technical for a second with something not many people care about, TCP's Protocol Data Unit (PDU), the segment, is carried as the payload of an IP packet. The same is true of a UDP packet: it also rides inside IP.

Imagine that TCP is layered inside of IP, like this:

The IP Header

It's easiest to imagine TCP/IP headers like a grid. You have 32 bits across as columns (or 4 bytes), and 5 rows of these (making up 20 bytes of information total). It's much easier to imagine in a visual:
Don't get overwhelmed with all the bells and whistles in there. Stick to what is familiar to help you understand it: The source and destination address (at offset 12 and 16 respectively).
A quick refresher on Bits and Bytes: 
You may or may not know that IPv4 addresses can only go up to 255 per octet. For example, if we changed one octet of an address to a value above 255, we would get an error and no connection could be established, because that type of addressing doesn't exist. The value would overflow an 8-bit unsigned integer, and your interfacing module (like socket) would toss out an exception.
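To make the 255 limit concrete, here is a small sketch using only the Python 3 standard library. The helper name is_valid_ipv4 is made up for this illustration, and the invalid address is simply the capture's source address with its last octet pushed past the limit.

```python
import ipaddress
import struct

# One octet is one unsigned byte ("B" in struct), so 255 is the largest value.
largest = struct.pack("B", 255)

try:
    struct.pack("B", 256)  # one more than a byte can hold
    overflowed = False
except struct.error:
    overflowed = True


def is_valid_ipv4(text):
    """Hypothetical helper: True if text parses as a dotted IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ipaddress.AddressValueError:
        return False


print(largest.hex(), overflowed)
print(is_valid_ipv4("10.0.2.15"), is_valid_ipv4("10.0.2.256"))
```

The exact exception you see in your own code depends on which module does the parsing; the point is only that a value above 255 has nowhere to go in a single byte.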

If we were a programmer from 1979, how would we get the source and destination addresses out of an IPv4 header?

By starting at offset 12 in our array and grabbing 1 byte at a time for 4 bytes to get the source, and the same at offset 16 for our destination! I will provide an example below.
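As a rough sketch of that byte counting in Python 3: the header below is hand-built for illustration. Its two addresses are the ones that show up in the capture discussed in this post, while the remaining fields are plausible filler and the checksum is left as zero. The helper dotted is a made-up name.

```python
# A 20-byte IPv4 header, written out four bytes (one "row" of the grid) at a time.
header = bytes.fromhex(
    "45000054"   # version/IHL, type of service, total length
    "00004000"   # identification, flags, fragment offset
    "40010000"   # TTL, protocol, header checksum (zeroed placeholder)
    "0a00020f"   # source address, offset 12
    "08080808"   # destination address, offset 16
)


def dotted(raw):
    """Join raw address bytes into dotted-decimal notation."""
    return ".".join(str(byte) for byte in raw)


source = dotted(header[12:16])        # 4 bytes starting at offset 12
destination = dotted(header[16:20])   # 4 bytes starting at offset 16
print(source, destination)
```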

Ping carries no TCP or UDP header. It uses ICMP, which sits directly inside the IP packet. I fired up Wireshark again and did a ping to Google's DNS server.

Wireshark has the source address highlighted. This would be at offset 12 in our grid. We see the information: 0a 00 02 0f as our source address. So how is this information populated? It doesn't look like an IP address at all.

First, let's take our source address. You can see its IP address highlighted in the main window (at No. 3).

Let's convert each octet of it to binary first:
10:   0000 1010
0:     0000 0000
2:     0000 0010
15:   0000 1111
Next convert these binary values into Hex (a base 16 number):
0000 1010:   0A (or 0x0a)
0000 0000:   00 (or 0x00)
0000 0010:   02 (or 0x02)
0000 1111:   0F (or 0x0f)
If it's not making sense yet, I'll explain. Decimal (base 10) has a total of 10 digits, ranging from 0 to 9.

Hexadecimal, or hex (base 16), is alphanumeric. Its digits range from 0 to F:
A represents 10 in decimal...
B represents 11 in decimal...
F represents 15 in decimal
Each hex digit stands for four bits, which is why a byte is written as two hex characters. The 0000 value is the first half of the byte. If 0000 1111 is 0F and represents 15, then 1111 1111 is FF and represents 255, our maximum value.
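If you'd rather let Python do the conversions, the steps above can be sketched with the built-in format function (Python 3 assumed). The octets are the four values converted by hand above.

```python
octets = [10, 0, 2, 15]

for value in octets:
    bits = format(value, "08b")        # 8 binary digits, zero padded
    hex_digits = format(value, "02x")  # 2 hex digits, zero padded
    print(f"{value:>3}: {bits[:4]} {bits[4:]} -> 0x{hex_digits}")

# All four bytes written the way a packet dump would show them.
as_hex = bytes(octets).hex()
print(as_hex)

# And back the other way: FF is the largest value one byte can hold.
maximum = int("ff", 16)
print(maximum)
```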

Now that we have our source address, we can count the next 4 bytes in our packet capture. In the capture above you can see these are 08 08 08 08, which marks our destination address.

In Wireshark, try clicking on the start of the Internet Protocol Version 4 information and manually counting and mapping out the bytes. It will be displayed in hex, so each pair of characters separated by spaces represents 1 byte. See if you can line up all the bytes with the IPv4 chart up top to help you get familiar with it.


The TCP Header

Hopefully now you have an idea on how the headers are constructed. Below is a TCP header:

You can see there is no addressing information contained in a TCP header. Just port information. Again if none of this information is making sense stick to what is familiar: The Source Port, Destination Port and Flags. If you remember from part 1 (and did the homework), the flags are what determine what type of TCP packet is being sent or received (like SYN or ACK).

You'll see our source and destination ports are 2 bytes each, or 16-bit unsigned values, which gives a range of numbers from 0 to 65,535. If you hadn't noticed, we can only have a maximum port value of 65535. :) Now you know why. If we went over that amount, the value would no longer fit in 16 bits: it would either be rejected with an exception or wrap back around to a small number, and no longer make sense. A list of well-known ports can be found here.
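Here is a quick way to see the 16-bit limit for yourself with struct (Python 3 assumed). The helper pack_port is a made-up name for this sketch; "!H" means one unsigned short in network byte order.

```python
import struct


def pack_port(port):
    """Hypothetical helper: pack a port number into 2 network-order bytes."""
    return struct.pack("!H", port)


print(pack_port(80).hex())
print(pack_port(65535).hex())

try:
    pack_port(65536)  # one past the largest 16-bit value
    rejected = False
except struct.error as exc:
    rejected = True
    print("rejected:", exc)

# If the extra bit were silently dropped instead, the value would wrap around.
wrapped = 65536 & 0xFFFF
print(wrapped)
```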

Coding a Sniffer

Let's put all this new information to use and build a sniffer for our local interface. The code to do this is below:
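What follows is a minimal sketch of what such a sniffer could look like, not the exact listing the walkthrough refers to, so the line numbers mentioned further down will not line up with it. The names pkt, ipheader, ip_hdr and tcp_hdr and both format strings are taken from the text; the function names, the constants and the use of Python 3 are assumptions made for this sketch. It also assumes plain 20-byte IP and TCP headers with no options.

```python
import socket
import struct

ETH_HEADER_LEN = 14   # destination MAC + source MAC + EtherType
IP_HEADER_LEN = 20    # assumes no IP options
TCP_HEADER_LEN = 20   # assumes no TCP options


def parse_ip_header(frame):
    """Unpack the 20 bytes that follow the Ethernet header."""
    ipheader = frame[ETH_HEADER_LEN:ETH_HEADER_LEN + IP_HEADER_LEN]
    ip_hdr = struct.unpack("!8sB3s4s4s", ipheader)
    return {
        "ttl": ip_hdr[1],
        "source": socket.inet_ntoa(ip_hdr[3]),
        "destination": socket.inet_ntoa(ip_hdr[4]),
    }


def parse_tcp_header(frame):
    """Unpack the 20 bytes that follow the IP header."""
    start = ETH_HEADER_LEN + IP_HEADER_LEN
    tcpheader = frame[start:start + TCP_HEADER_LEN]
    tcp_hdr = struct.unpack("!HH9ss6s", tcpheader)
    return {
        "source_port": tcp_hdr[0],
        "destination_port": tcp_hdr[1],
        "flags": tcp_hdr[3].hex(),
    }


def sniff():
    """Print header fields for each IPv4 frame seen. Linux only, needs root."""
    s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, socket.ntohs(0x0800))
    while True:
        pkt = s.recvfrom(2048)   # tuple: (raw frame bytes, address info)
        frame = pkt[0]
        if len(frame) < ETH_HEADER_LEN + IP_HEADER_LEN + TCP_HEADER_LEN:
            continue             # too short to hold both headers
        print(parse_ip_header(frame))
        print(parse_tcp_header(frame))
```

To try it, call sniff() from a root shell on Linux and generate some traffic. Keeping the parsing in separate functions means you can also feed them bytes copied out of Wireshark without opening a socket at all.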

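Before you run it, you may want to set your interface to promiscuous mode, so that it also picks up traffic that isn't addressed to your own machine:
ifconfig eth0 promisc
or ifconfig interface promisc (substituting the name of your interface)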

This code may look daunting at first. But let's break it down one by one.

On line 6, you'll notice that we have a new type of family and socket used for our socket creation. PF_PACKET and SOCK_RAW, as well as an additional argument ntohs(0x0800).

SOCK_RAW:  This is self-explanatory. Capture everything in a raw format.
PF_PACKET:  Oddly enough, there is no Python documentation on this. PF stands for Protocol Family, in the same way that AF_INET stands for Address Family, Internet Protocol Version 4. By specifying PF_PACKET we are telling Python we want to manipulate the packets at the lowest level we can. Note that this only works on Linux. It was introduced in Python 2.0.
ntohs(0x0800):  We are telling Python to convert a 16-bit number (like port info) from network byte order into host byte order. Since we are working at a low enough level, we need to worry about big endian and little endian, or, to put it plainly, the order the bytes stream into the interface. Our argument 0x0800 tells socket we are only interested in ETH_P_IP (Internet Protocol packets). To see what the heck I'm talking about, run: nano /usr/include/linux/if_ether.h and look for the "ETH_P_IP" or 0x0800 entry.
Note the entry for ARP. Might be important later :D
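To see what ntohs actually does to 0x0800, here is a small sketch (Python 3 assumed). The constant name ETH_P_IP mirrors the header file entry; the result depends on the byte order of the machine you run it on, which is why the code checks sys.byteorder rather than hard-coding an answer.

```python
import socket
import sys

ETH_P_IP = 0x0800  # the value listed for ETH_P_IP in linux/if_ether.h

# The same number laid out in memory in each byte order.
big = ETH_P_IP.to_bytes(2, "big").hex()        # network order
little = ETH_P_IP.to_bytes(2, "little").hex()  # typical x86 host order
print(big, little)

# On a little-endian host the two bytes swap places; on big-endian nothing changes.
converted = socket.ntohs(ETH_P_IP)
print(sys.byteorder, hex(converted))

# Converting back always returns the original value.
round_trip = socket.htons(converted)
print(hex(round_trip))
```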

Our next section sets a variable pkt, which receives a maximum of 2048 bytes from the socket.
Before we move beyond line 10 in our code uncomment line 11 (prints pkt in raw format), and comment out every line below line 11. Run the program and ping

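This will print out our packet data in raw format. As you can see, pkt is a tuple: the first element holds the raw bytes of the packet, and the second holds information about where it came from.
Interacting with a tuple works much like an array: we index into an element, and can then slice specific bytes out of it.

In line 13 we do exactly that. We go to the first element of our tuple, where all of our packed C-type data is, and slice bytes 14 through 33 (the slice [14:34]) out into a variable: ipheader. The first 14 bytes are skipped because they hold the Ethernet header, which sits in front of the IP header.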
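The slicing can be tried without a live socket. In this sketch (Python 3 assumed) pkt is a hand-built stand-in shaped like what recvfrom returns on a packet socket: the frame bytes first, then an address tuple. Every value in it is illustrative, including the zeroed Ethernet header.

```python
ethernet_header = bytes(14)  # 14 placeholder bytes: two MAC addresses + EtherType
ip_bytes = bytes.fromhex("45000054000040004001000" "0" "0a00020f" "08080808")
rest = b"whatever follows the IP header"

# Stand-in for s.recvfrom(2048): (frame bytes, address info).
pkt = (ethernet_header + ip_bytes + rest, ("eth0", 2048, 0, 1, b""))

frame = pkt[0]            # first element of the tuple: the raw bytes
ipheader = frame[14:34]   # skip the Ethernet header, keep the next 20 bytes
print(len(ipheader), ipheader.hex())
```

Note that a Python slice stops one short of its end index, so [14:34] covers bytes 14 to 33, which is exactly 20 bytes.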

Now we need to unpack this data on line 14. This is where our knowledge of the byte-level structure of the IP header becomes super important.

We need to break our argument, !8sB3s4s4s down piece by piece for it to make sense.

First up is the "!" character:
This specifies the endianness. In this case we set it to network (big-endian). You can see a list of endianness we can set with this chart here. The chart that maps C types to Python data types is here.
Next is 8s:
This specifies we want to read in the next 8 bytes as a string. If you skip up to the IPv4 header chart, this would be the first 2 lines - or first 8 bytes. In our unpacked result we assign this to address 0 in our ip_hdr array.
Next is B:
This specifies we want to read in the next byte as an unsigned char, or an integer in Python. If you need a road map of where we are now in our IPv4 header, we are at TTL. We assign this to address 1 in our array.
Next is 3s:
This specifies we want the next 3 chars to be assigned to a string. In our IPv4 header this covers the protocol and header checksum, which end up in a single string at address 2 in our array.
Next is 4s:
You may have guessed by now that this is the Source Address. We take the next 4 bytes (chars in C) and assign them to address 3 in our ip_hdr array.
And the final 4s:
If you look at the last 4 of the 20 bytes (20-4 = 16, or offset 16) of our IPv4 header - that's right. It's the Destination Address! We set these next 4 bytes (chars) to address 4 in our ip_hdr array.
Then we print some stuff and now we are down to line 16. Remember from the previous section we set our TTL to address 1? Well, we can print it out here with ip_hdr[1].
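Here is the same format string applied to a hand-built header, as a sketch you can run without capturing anything (Python 3 assumed, where the "string" fields come back as bytes). The header values are illustrative apart from the two addresses, and the checksum is zeroed.

```python
import struct

IP_FORMAT = "!8sB3s4s4s"

ipheader = bytes.fromhex(
    "4500005400004000"  # 8s: first two rows of the header chart
    "40"                # B:  TTL (0x40 is 64)
    "010000"            # 3s: protocol + header checksum (zeroed placeholder)
    "0a00020f"          # 4s: source address
    "08080808"          # 4s: destination address
)

# The format must describe exactly as many bytes as we hand to unpack.
size = struct.calcsize(IP_FORMAT)

ip_hdr = struct.unpack(IP_FORMAT, ipheader)
for index, field in enumerate(ip_hdr):
    print(index, field)

ttl = ip_hdr[1]
print("TTL:", ttl)
```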

You might be dreading this next line because we have to deal with more C types, but that's not the case. On lines 17 and 18 we do the same thing: convert some information into an IP address.

Remember we set the ! flag in our struct format? That sets the byte order to network. You could read ntoa (as in socket.inet_ntoa) as Network to Address. We are taking a packed IP and converting it to the standard dot notation for an IPv4 address that we're familiar with. In this case ip_hdr[3] is our Source Address and ip_hdr[4] is our Destination Address. If we pass them in as arguments, we get each one back as a string we can print.
And Bob's yer Uncle. Now we've unpacked information about the IPv4 header. Let's do the TCP header next.
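For reference, the address conversion on its own looks something like this (Python 3 assumed). The two packed values are the source and destination bytes from the capture; inet_aton is the standard library's opposite direction, shown only as a sanity check.

```python
import socket

packed_source = bytes.fromhex("0a00020f")       # what ip_hdr[3] would hold
packed_destination = bytes.fromhex("08080808")  # what ip_hdr[4] would hold

source = socket.inet_ntoa(packed_source)
destination = socket.inet_ntoa(packed_destination)
print(source, destination)

# inet_aton goes the other way: dotted notation back to 4 packed bytes.
repacked = socket.inet_aton(source)
print(repacked.hex())
```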


Unpacking the TCP Header:

Our interface is pretty much sending a stream of data to us. We know right after the IPv4 header, the TCP header hits our interface. We can unpack it in the same way we did the IPv4 header.

I'll walk through how we got the Source and Destination Ports first out of "!HH9ss6s":
!:  Network flag, you know, since we're working with network stuff.
H:  In a TCP header the Source Port comes first. This takes the next two bytes (an unsigned short) and turns them into a Python integer. We set this to address 0 in our tcp_hdr array.
H:  The next 2 bytes in the first row of our TCP header are the Destination Port. This is also an unsigned short, so it becomes an integer value in Python. We set this to address 1 in our tcp_hdr array.
9s:  We take the next 9 bytes and set them to a string type in Python. This would be the sequence and acknowledgement numbers (4 bytes each), as well as the byte holding the "offset" and "reserved" bits. This gets set to address 2 in our tcp_hdr array.
s:  This one takes the next byte and assigns it to a string type in Python. In this case we are taking our TCP flags as a string and storing them at address 3 in our tcp_hdr array.
6s:  This sets the rest of the TCP header (window, checksum and urgent pointer) to address 4 in our tcp_hdr array. We don't care about them for now.
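The breakdown above can be tried on a hand-built TCP header (Python 3 assumed). Every value here is made up for illustration: a SYN from port 50000 to port 80, with a zeroed checksum.

```python
import struct

TCP_FORMAT = "!HH9ss6s"

tcpheader = bytes.fromhex(
    "c350"        # H:  source port (0xc350 is 50000)
    "0050"        # H:  destination port (0x0050 is 80)
    "00000001"    # 9s: sequence number ...
    "00000000"    #     ... acknowledgement number ...
    "50"          #     ... and the offset/reserved byte
    "02"          # s:  flags (0x02 is SYN)
    "721000000000"  # 6s: window, checksum (zeroed), urgent pointer
)

size = struct.calcsize(TCP_FORMAT)
tcp_hdr = struct.unpack(TCP_FORMAT, tcpheader)

source_port = tcp_hdr[0]
destination_port = tcp_hdr[1]
print(size, source_port, destination_port)
```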

Now we just print out the source and destination ports on lines 24 and 25.

On line 26, we convert the flags information (which got unpacked as a raw byte) into a hexadecimal value, so it makes sense to us.
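One way to do that conversion is binascii.hexlify; whether the original listing used exactly that call is an assumption on my part. The sketch below (Python 3 assumed) also goes one step further and names the individual flag bits. The bit values are the standard TCP ones; FLAG_BITS and flag_names are made-up names for this example.

```python
import binascii

flags_byte = b"\x12"  # illustrative value, shaped like tcp_hdr[3]

# Hex digits for the raw byte.
as_hex = binascii.hexlify(flags_byte).decode("ascii")
print(as_hex)

# Standard TCP flag bits within the flags byte.
FLAG_BITS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def flag_names(raw):
    """Return the names of the flags set in a 1-byte flags field."""
    value = raw[0]
    return [name for name, bit in FLAG_BITS.items() if value & bit]


print(flag_names(flags_byte))
```

A value of 0x12 has both the SYN and ACK bits set, which is the second step of the handshake from part 1.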

Doing some Homework:

Phew. That was a lot of data to digest. Don't give up here though. I recommend practicing with struct: See if you can unpack the entire IPv4 header and TCP header by each of their respective items as outlined in their respective charts and print them.

As always, if you have any questions - please ask! You can PM me @ /u/Brave_Little_Roaster on Reddit, or comment here.


  1. This is very helpful! I do have a question though about the sniffer script. Since the ICMP packets don't have source or destination ports in them, how does the sniffer correctly extract them? It looks like the code should only be used for TCP headers, and I believe the ICMP packets don't have TCP headers.

    1. Ping never carries a TCP header, even if you ping 60000 bytes at something. Because there is no TCP header, when you scrape the data it's going to fill it with junk. Essentially, after the IPv4 header the script will unpack the ICMP echo request as if it were a TCP header and display that information, so you'll see meaningless port numbers. Obviously there is no port information with ICMP ;)

      If you capture with wireshark you'll see the raw hex. Try copying that data over and extracting it with struct. You should see the exact same numbers displayed on the sniffer as the unpacked data from the wireshark capture.