What’s a better way to spend a Sunday afternoon than digging into a spec from over forty years ago? Digging into that spec and verifying that your TCP packets conform to it. Spoiler alert: of course they do, but in the name of education… let’s check.
The actual layout of a TCP packet is described in the header format section of the RFC. Here’s a copy of it for the sake of visibility.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |       |C|E|U|A|P|R|S|F|                               |
   | Offset| Rsrvd |W|C|R|C|S|S|Y|I|            Window             |
   |       |       |R|E|G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           [Options]                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               :
   :                             Data                              :
   :                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Note that one tick mark represents one bit position.
To understand this layout, we are going to be looking at bits, as marked in the horizontal row, second from the top. Note that the full width of the diagram amounts to 32 bits, so every row is 32 bits (4 bytes) wide.
For example, if we are looking for the source port, we need to look at the bits starting from the very first one in the header, bit 0, through bit 15. The source port is 16 bits (two bytes) long.
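To make the bit positions concrete, here is a minimal Python sketch that pulls a field out of a header by its bit range in the diagram. The helper name `extract_bits` and the sample header values are made up for illustration; the header is built with `struct.pack` rather than read from a real capture.

```python
import struct


def extract_bits(data: bytes, start_bit: int, bit_count: int) -> int:
    """Return the unsigned integer held in `bit_count` bits starting at `start_bit`.

    Bit 0 is the most significant bit of the first byte, matching the
    numbering used in the RFC diagram.
    """
    total_bits = len(data) * 8
    if start_bit < 0 or bit_count <= 0 or start_bit + bit_count > total_bits:
        raise ValueError("bit range falls outside the data")
    as_int = int.from_bytes(data, "big")
    shift = total_bits - (start_bit + bit_count)
    return (as_int >> shift) & ((1 << bit_count) - 1)


# A hypothetical 20-byte header: ports 60356 -> 443, data offset 5, ACK flag set.
sample_header = struct.pack(
    "!HHIIHHHH", 60356, 443, 1, 55, (5 << 12) | 0x010, 2048, 0, 0
)

source_port = extract_bits(sample_header, 0, 16)        # bits 0..15
destination_port = extract_bits(sample_header, 16, 16)  # bits 16..31
data_offset = extract_bits(sample_header, 96, 4)        # first 4 bits of the fourth row
ack_flag = extract_bits(sample_header, 107, 1)          # the ACK bit among the flags

print(source_port, destination_port, data_offset, ack_flag)
```

Treating the whole header as one big integer keeps the helper short. For byte-aligned fields such as the ports, plain slicing works just as well.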
TAKE NOTE: Network protocols are one of very few areas of standards where endianness is (kind of) always guaranteed to be big. That’s the assumption we’re making while parsing this packet!
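As a quick illustration of why the byte order matters, here is a small Python sketch that decodes the same two bytes both ways. The bytes `01 bb` are just an example value I picked (443, the usual HTTPS port), not something read from a capture.

```python
import struct

raw = bytes([0x01, 0xBB])  # two bytes as they would appear on the wire

big = int.from_bytes(raw, "big")        # network byte order
little = int.from_bytes(raw, "little")  # what a little-endian read would give

# struct's "!" prefix means network (big-endian) byte order
(unpacked,) = struct.unpack("!H", raw)

print(big, little, unpacked)
```

Only the big-endian reading gives a sensible port number, which is why every decoding step in this post assumes it.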
acquiring the packet
In order to get the packet, I’m gonna need to capture it. I am using Wireshark to capture a random packet, and using its export functionality to save it to my disk. I’m going to save it in a raw binary format, simply to have more work (fun) decoding it. In order to do that, I have highlighted the entire frame in the frame window, and used the File->Export Packet Bytes functionality to save the raw bytes onto my disk.
ASIDE: If you really want to have a frustrating time, I suggest you try right-clicking the packet, copying it ...as raw binary, and then trying to extract it from your clipboard. Hint: dip into the clipboard libraries in the language of your choice.
To verify my parsing attempts, I am also copying the textual representation, as Wireshark sees it, to check myself. Here it is:
Frame 22: 66 bytes on wire (528 bits), 66 bytes captured (528 bits) on interface en0, id 0
Ethernet II, Src: Apple_be:5b:17 (38:f9:d3:be:5b:17), Dst: AlphaNet_75:36:93 (88:6a:e3:75:36:93)
Internet Protocol Version 4, Src: 192.168.0.6, Dst: 220.127.116.11
Transmission Control Protocol, Src Port: 60356, Dst Port: 443, Seq: 1, Ack: 55, Len: 0
Let’s see if I can find some of those fields in my packet!
NOTE: Please notice how the captured packet description as per Wireshark only lists the TCP packet as the 4th line in the textual summary. It will become apparent why later.
looking at the packet on disk
The packet sits clutch on my disk, and is a binary file. The first thing I want to check is whether we exported it in its entirety. There are a few ways one can accomplish that. The output from Wireshark above claims we captured 66 bytes. Let’s verify.
First, let’s cat the output.
$ cat raw_packet_data.bin
àj„u6ì8˘”æ[ E 4 @ @M¿® h∑ÎƒªıJ4Sï[wÁÄ MÙ Ùp?h˜È≈
Doesn’t look good. Of course it doesn’t. We’re looking at binary data, and naturally, the UTF-8 representation is messed up. Wrong avenue. Let’s ask ls for the size instead.
$ ls -l
... (output truncated to only include the packet) ...
-rw-r--r-- 1 daniel staff 66 Sep 12 21:40 raw_packet_data.bin
As we can see, the size is indeed 66 bytes.
For an alternative way to check that, I’m going to use xxd. If you pass the -p flag to xxd, it will output the binary file as a stream of hexadecimal digits. That way we can count characters (remember, 2 hex digits = 1 byte).
$ xxd -p raw_packet_data.bin | wc -c
135
Great, we’ve got the count! But something is wrong. We’re counting 135 chars, that divided by two is 67.5. Output of ls promised 66…
xxd -p ends its output with a newline! We can pass the result to echo -n, which will echo it back out without a trailing newline.
$ echo -n `xxd -p raw_packet_data.bin` | wc -c
134
We’re getting closer! We definitely lost the extra character, but we’re still off: 134 / 2 is 67, not 66. What gives? Let’s inspect the output of xxd -p:
$ xxd -p raw_packet_data.bin
886ae375369338f9d3be5b170800450000340000400040060b4dc0a80006
681206b7ebc401bbf54a3453955b77e7801008004df400000101080a09f4
703f68f7e9c5
There’s our culprit. xxd is printing the octets on three separate lines. echo -n was helpful with removing the last newline, but it doesn’t help us with the two line breaks at the end of the first and second lines (the unquoted backticks turn them into spaces, which still get counted). Let’s remove them using tr:
$ xxd -p raw_packet_data.bin | tr -d '\n'
886ae375369338f9d3be5b170800450000340000400040060b4dc0a80006681206b7ebc401bbf54a3453955b77e7801008004df400000101080a09f4703f68f7e9c5
Voila! Now let’s wc -c it. Remember, I am expecting to see 132, since that is the character count, and each character is half a byte.
$ xxd -p raw_packet_data.bin | tr -d '\n' | wc -c
132
Perfect! I like things aligned, and wc may pad the count with leading spaces, so I will use a little trick with xargs. Just by piping the result into it, I get the whitespace stripped.
$ xxd -p raw_packet_data.bin | tr -d '\n' | wc -c | xargs
132
That’s what we wanted! I feel like since I’m on this detour talking about various bashisms, why not go all the way? Why do the division in my head if I can make bash do it?
$ echo $((`xxd -p raw_packet_data.bin | tr -d '\n' | wc -c | xargs` / 2))
66
All is good in the world.
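For comparison, here is a rough Python sketch of the same counting exercise. It rebuilds the packet from the hex dump printed above, and `plain_hexdump` is a made-up helper that only imitates the shape of the `xxd -p` output (30 bytes per line, each line ending in a newline); it is not a reimplementation of xxd.

```python
PACKET_HEX = (
    "886ae375369338f9d3be5b170800450000340000400040060b4dc0a80006"
    "681206b7ebc401bbf54a3453955b77e7801008004df400000101080a09f4"
    "703f68f7e9c5"
)
packet = bytes.fromhex(PACKET_HEX)


def plain_hexdump(data: bytes, bytes_per_line: int = 30) -> str:
    """Imitate the shape of `xxd -p` output: hex digits, one newline per line."""
    lines = [
        data[i : i + bytes_per_line].hex()
        for i in range(0, len(data), bytes_per_line)
    ]
    return "".join(line + "\n" for line in lines)


dump = plain_hexdump(packet)
with_all_newlines = len(dump)                   # like `xxd -p | wc -c`
without_last_newline = len(dump.rstrip("\n"))   # only the trailing newline removed
digits_only = len(dump.replace("\n", ""))       # like piping through `tr -d '\n'`

print(with_all_newlines, without_last_newline, digits_only, digits_only // 2)
```

The three counts line up with the three shell attempts: every newline counted, the last one dropped, and finally hex digits only.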
actually parsing the tcp packet
I’ve taken quite a foray into verifying what was already known - the length of 66 bytes. Let’s resume our verification of things that are already known in pursuit of understanding how they work! By actually parsing the packet itself.
NOTE: Remember the note about the TCP packet only showing up as the 4th line in Wireshark’s summary? The reason is that the TCP header is preceded by headers from the lower layers of the network stack. In my case the Ethernet and IP headers come first (14 bytes of Ethernet plus 20 bytes of IPv4), so the TCP header starts at byte 34. How do I know that? I sourced this information from Wireshark.
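If you would rather derive the offset than take Wireshark’s word for it, the following sketch walks the two outer headers. It assumes an untagged Ethernet II frame carrying IPv4 (no VLAN tag), which is what this capture appears to be; the packet bytes are rebuilt from the hex dump shown earlier, and `format_mac` is a helper name I made up.

```python
PACKET_HEX = (
    "886ae375369338f9d3be5b170800450000340000400040060b4dc0a80006"
    "681206b7ebc401bbf54a3453955b77e7801008004df400000101080a09f4"
    "703f68f7e9c5"
)
packet = bytes.fromhex(PACKET_HEX)

ETHERNET_HEADER_LENGTH = 14  # 6 bytes dst MAC + 6 bytes src MAC + 2 bytes EtherType


def format_mac(raw: bytes) -> str:
    """Render six raw bytes in the usual colon-separated notation."""
    return ":".join(f"{byte:02x}" for byte in raw)


destination_mac = format_mac(packet[0:6])
source_mac = format_mac(packet[6:12])
ether_type = int.from_bytes(packet[12:14], "big")  # 0x0800 means IPv4

# The low 4 bits of the first IP byte hold the header length in 32-bit words.
ip_start = ETHERNET_HEADER_LENGTH
ip_header_length = (packet[ip_start] & 0x0F) * 4
ip_protocol = packet[ip_start + 9]  # 6 means TCP

tcp_offset = ip_start + ip_header_length
print(source_mac, destination_mac, hex(ether_type), ip_protocol, tcp_offset)
```

The MAC addresses can be checked against the Ethernet II line of the Wireshark summary, and the computed offset against the 34 used in the rest of this post.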
Knowing this, let’s start with a little Python code that will parse out the first section of the TCP header - the source port.
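with open('./raw_packet_data.bin', 'rb') as binary_file:
    data = bytearray(binary_file.read())

# Offsets below index into the hex string, so each byte offset is doubled.
packet_offset = 34 * 2  # the TCP header starts at byte 34
data_length = 36 * 2    # the source port ends just before byte 36
port = data.hex()[packet_offset:data_length]
print(int(port, 16))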
Notice the multiplication of the offsets. Since I converted the binary data into a hex string, every byte takes up two characters, so I need to double the byte offsets to make sure that I slice on byte boundaries rather than counting one hex char at a time.
$ python3 bz.py
60356
That’s the answer we’ve been looking for. As a reminder, here’s the textual output I got from Wireshark for comparison (truncated to only include the TCP header line):
Transmission Control Protocol, Src Port: 60356, Dst Port: 443, Seq: 1, Ack: 55, Len: 0
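The same idea extends to the rest of the fixed header. Below is a sketch that unpacks all 20 fixed bytes in one go with the `struct` module instead of slicing a hex string. The packet is rebuilt from the hex dump shown earlier, the offset of 34 is taken as given, and the variable names are my own.

```python
import struct

PACKET_HEX = (
    "886ae375369338f9d3be5b170800450000340000400040060b4dc0a80006"
    "681206b7ebc401bbf54a3453955b77e7801008004df400000101080a09f4"
    "703f68f7e9c5"
)
packet = bytes.fromhex(PACKET_HEX)

TCP_OFFSET = 34  # taken from Wireshark, as in the text above
FIXED_HEADER = struct.Struct("!HHIIHHHH")  # 20 bytes, network byte order

(
    source_port,
    destination_port,
    sequence_number,
    acknowledgment_number,
    offset_and_flags,
    window,
    checksum,
    urgent_pointer,
) = FIXED_HEADER.unpack_from(packet, TCP_OFFSET)

data_offset_words = offset_and_flags >> 12  # top 4 bits of the 16-bit word
header_length = data_offset_words * 4       # in bytes, options included
flag_bits = offset_and_flags & 0xFF         # CWR..FIN live in the low 8 bits
FLAG_NAMES = ["CWR", "ECE", "URG", "ACK", "PSH", "RST", "SYN", "FIN"]
flags = [name for i, name in enumerate(FLAG_NAMES) if flag_bits & (0x80 >> i)]

payload_length = len(packet) - TCP_OFFSET - header_length

print(source_port, destination_port, flags, header_length, payload_length)
```

The ports and the payload length can be compared with the Wireshark line above. The raw sequence and acknowledgment numbers will not match Seq: 1 and Ack: 55, since Wireshark displays those relative to the start of the connection by default.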
Let’s try an alternative way of extracting this data. There must be a better way than writing your own scripts, right? For that, I’m going to use our friend xxd.
It’s a pretty simple invocation. All we need is to specify the -s flag for the start offset, and -l for the length. Let’s start at byte 34 and read two bytes.
$ xxd -p -s 34 -l 2 raw_packet_data.bin
ebc4
It just so happens that converting 0xEBC4 into an integer gives us the right answer. To verify that, I’ll use another common shell util, bc.
$ echo "ibase=16; EBC4" | bc
60356
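The start-and-length approach translates to Python fairly directly. In this sketch, `read_uint` is a hypothetical helper that seeks to an offset, reads a few bytes, and decodes them as a big-endian integer. To keep it runnable on its own, it writes the bytes from the hex dump to a temporary file instead of assuming raw_packet_data.bin exists.

```python
import os
import tempfile


def read_uint(path: str, offset: int, length: int) -> int:
    """Read `length` bytes at `offset` and decode them as a big-endian integer."""
    with open(path, "rb") as binary_file:
        binary_file.seek(offset)          # plays the role of `xxd -s offset`
        chunk = binary_file.read(length)  # plays the role of `xxd -l length`
    if len(chunk) != length:
        raise ValueError("file is shorter than offset + length")
    return int.from_bytes(chunk, "big")


# Stand-in for raw_packet_data.bin so the sketch runs on its own.
PACKET_HEX = (
    "886ae375369338f9d3be5b170800450000340000400040060b4dc0a80006"
    "681206b7ebc401bbf54a3453955b77e7801008004df400000101080a09f4"
    "703f68f7e9c5"
)
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as handle:
    handle.write(bytes.fromhex(PACKET_HEX))
    demo_path = handle.name

source_port = read_uint(demo_path, 34, 2)
destination_port = read_uint(demo_path, 36, 2)
os.remove(demo_path)

print(source_port, destination_port)
```

With a real capture on disk, you would pass its path to `read_uint` and skip the temporary file entirely.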
Using this method, one can grab every part of the packet for analysis. I hope this provided you with some useful information. I suggest trying to parse out some more parts of the packet, playing with capturing more varied data, or using Wireshark to capture completely different packets from different parts of the stack. I’ve learned a ton doing so.