Troubleshooting Common Networking Problems with Wireshark, Pt. 1: TCP Checksum Incorrect ‘Error’


Author’s Note: This is the first part in a six-part series about finding and solving many networking anomalies using the Wireshark network protocol analyzer. I originally wrote these for the Microsoft Enterprise Networking team back when I worked there to help new engineers understand and resolve common cases, so they are written for people with fairly advanced networking knowledge and a decent basic understanding of protocol analyzers. If you are new to Wireshark, here’s an intro to it I wrote for Ars Technica.

At times, you may hear a client or another engineer point to a ‘TCP Checksum Incorrect’ error in a trace as the source of a problem. You may have been told that you can ignore these ‘errors’, as they are due to checksum offloading by the NIC. However, you may have cases in which you need to empirically prove that checksum offloading is the cause of the ‘error’ in the trace, and explain this to the customer.

First, let’s examine what a ‘TCP Checksum Incorrect’ error looks like:

Source                Destination           Protocol Info
192.168.1.2           192.168.1.135         TCP      [TCP CHECKSUM INCORRECT] 
Frame 3949 (1230 bytes on wire, 1230 bytes captured)
Ethernet II, Src: file-srv1.corp.local (00:11:43:58:1c:86), Dst: Intel_97:b0:52 (00:0c:f1:97:b0:52)
Internet Protocol, Src: 192.168.1.2 (192.168.1.2), Dst: 192.168.1.135 (192.168.1.135)
Transmission Control Protocol, Src Port: ms-sql-s (1433), Dst Port: 2080 (2080), Seq: 4995, Ack: 3576, Len: 1176
    Source port: ms-sql-s (1433)
    Destination port: 2080 (2080)
    Sequence number: 4995    (relative sequence number)
    Next sequence number: 6171    (relative sequence number)
    Acknowledgement number: 3576    (relative ack number)
    Header length: 20 bytes
    Flags: 0x0018 (PSH, ACK)
    Window size: 65503
    Checksum: 0x888c [incorrect, should be 0x4eb4]
Tabular Data Stream 

In this trace, you can see the problem. Wireshark computes the TCP checksum for each TCP segment and is notifying you that the checksum in the captured segment does not match the value it computed. If this checksum is truly incorrect, the receiving system will discard the segment at the Transport layer, as required by Internet Standard 7 (STD 7, the TCP specification).
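To make the comparison concrete, here is a minimal Python sketch of the standard Internet checksum (RFC 1071) applied to a TCP segment over IPv4. The function names are illustrative and are not part of Wireshark; the sketch assumes IPv4 and a raw TCP segment (header plus payload) supplied as bytes.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, tcp_length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero byte, protocol 6, TCP length."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                       socket.inet_aton(dst_ip), 0, 6, tcp_length)


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Value that belongs in the checksum field (bytes 16-17 of the TCP header)."""
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    return internet_checksum(pseudo_header(src_ip, dst_ip, len(segment)) + zeroed)


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """A segment carrying a correct checksum sums to zero after complementing."""
    return internet_checksum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0
```

A receiver performs the equivalent of `tcp_checksum_ok` on every segment. An analyzer that reports "should be" values is doing the equivalent of `tcp_checksum` and comparing the result with the captured field.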

However, in the vast majority of cases, these ‘errors’ are not errors at all, but simply a side effect of checksum offloading on the network card. With checksum offloading, the network card calculates the TCP checksum in hardware, freeing the OS from that work. As a result, the checksum field in the segment that the OS hands to the NIC driver does not yet hold the final value, because the NIC fills it in just before transmission.

Since protocol analyzers hook in above the network card driver, they are only capable of capturing data sent to the driver (from the source system) and received from the driver (on the destination system). For this reason, if checksum offloading is implemented on the source system, all TCP segments sent from that system will show up with this ‘TCP Checksum Incorrect’ error on traces taken from that system.

When looking through traces where the ‘TCP Checksum Incorrect’ error is present, therefore, note which packets do and do not contain the error. If you only see the error in packets sourced from the system where the traces were taken, checksum offloading is the most likely culprit.

In Wireshark, you can easily determine if this is the case by using the following filter:

tcp.checksum_bad == 1

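This filter hides all frames with a correct TCP checksum, leaving only those that Wireshark flagged as bad. (Newer Wireshark releases replaced this field with tcp.checksum.status, so the exact filter depends on your version.) If the remaining frames all have source addresses that belong to the system the traces were taken on, checksum offloading is the most likely cause.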
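The same check can be expressed as a short Python sketch. It assumes you have already exported the frames into simple records. The `Frame` type and the `offload_likely` function are hypothetical names for illustration, not a Wireshark API.

```python
from typing import Iterable, NamedTuple, Set


class Frame(NamedTuple):
    src: str           # source IP address of the frame
    dst: str           # destination IP address of the frame
    checksum_ok: bool  # False if the analyzer flagged the TCP checksum as bad


def offload_likely(frames: Iterable[Frame], capture_host_addrs: Set[str]) -> bool:
    """True if bad checksums exist and every one was sent by the capture host."""
    bad = [f for f in frames if not f.checksum_ok]
    return bool(bad) and all(f.src in capture_host_addrs for f in bad)


# Trace taken on 192.168.1.2: only its own outbound frames are flagged.
trace = [
    Frame("192.168.1.2", "192.168.1.135", False),
    Frame("192.168.1.135", "192.168.1.2", True),
    Frame("192.168.1.2", "192.168.1.195", False),
]
print(offload_likely(trace, {"192.168.1.2"}))
```

If even one flagged frame comes from a different host, the function returns False, which suggests the checksum may really have been damaged in transit.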

However, you may again be asked to prove empirically that this is the case. One method is to disable offloading and take a new trace, but this may reduce system performance somewhat and is a bit intrusive. Another method is to take simultaneous traces from the problem system and from another system while the two are communicating over TCP. In the simultaneous traces, you should see the following from the source system:

Source                Destination           Protocol Info
192.168.1.2           192.168.1.195         TCP      [TCP CHECKSUM INCORRECT] 

Frame 17383 (54 bytes on wire, 54 bytes captured)
Ethernet II, Src: file-srv1.corp.local (00:11:43:58:1c:86), Dst: Intel_8b:9b:2e (00:03:47:8b:9b:2e)
Internet Protocol, Src: 192.168.1.2 (192.168.1.2), Dst: 192.168.1.195 (192.168.1.195)
Transmission Control Protocol, Src Port: ms-sql-s (1433), Dst Port: 1262 (1262), Seq: 4335, Ack: 2702, Len: 0
    Source port: ms-sql-s (1433)
    Destination port: 1262 (1262)
    Sequence number: 4335    (relative sequence number)
    Acknowledgement number: 2702    (relative ack number)
    Header length: 20 bytes
    Flags: 0x0010 (ACK)
    Window size: 64373
    Checksum: 0x8430 [incorrect, should be 0x0ff0]
    SEQ/ACK analysis 

However, on the remote side, you will see:

Source                Destination           Protocol Info
192.168.1.2           192.168.1.195         TCP      [TCP Keep-Alive ACK]

Frame 458 (60 bytes on wire, 60 bytes captured)
Ethernet II, Src: 192.168.1.2 (00:11:43:58:1c:86), Dst: 192.168.1.195 (00:03:47:8b:9b:2e)
Internet Protocol, Src: 192.168.1.2 (192.168.1.2), Dst: 192.168.1.195 (192.168.1.195)
Transmission Control Protocol, Src Port: ms-sql-s (1433), Dst Port: 1262 (1262), Seq: 4335, Ack: 2702, Len: 0
    Source port: ms-sql-s (1433)
    Destination port: 1262 (1262)
    Sequence number: 4335    (relative sequence number)
    Acknowledgement number: 2702    (relative ack number)
    Header length: 20 bytes
    Flags: 0x0010 (ACK)
    Window size: 64373
    Checksum: 0x0ff0 [correct]
    SEQ/ACK analysis 


You will notice that the sequence and acknowledgement numbers identify these as the same frame. (The remote trace shows 60 bytes rather than 54 because the frame was padded to the Ethernet minimum size before it went on the wire.) You will also note that the remote side received a correct checksum, and that it is the exact same value (0x0ff0) that Wireshark listed as the correct checksum in the source trace. This proves that the NIC sent the correct checksum, regardless of what the traces from the source side show.
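The comparison between the two traces can be sketched in Python as follows. The `TcpFrame` record and both function names are hypothetical, and the values are copied from the two frames shown above. Note one assumption: relative sequence numbers only line up when both traces captured the start of the connection, so absolute sequence numbers are the safer key in practice.

```python
from typing import NamedTuple


class TcpFrame(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    seq: int
    ack: int
    checksum_seen: int      # value found in the captured TCP header
    checksum_expected: int  # value the analyzer computed for this frame


def same_segment(a: TcpFrame, b: TcpFrame) -> bool:
    """Match on addresses, ports, sequence and acknowledgement numbers."""
    return a[:6] == b[:6]


def offload_confirmed(sent: TcpFrame, received: TcpFrame) -> bool:
    """Bad at the sender, good at the receiver, and equal to the sender's expected value."""
    return (
        same_segment(sent, received)
        and sent.checksum_seen != sent.checksum_expected
        and received.checksum_seen == received.checksum_expected
        and received.checksum_seen == sent.checksum_expected
    )


sent = TcpFrame("192.168.1.2", "192.168.1.195", 1433, 1262, 4335, 2702, 0x8430, 0x0FF0)
received = TcpFrame("192.168.1.2", "192.168.1.195", 1433, 1262, 4335, 2702, 0x0FF0, 0x0FF0)
print(offload_confirmed(sent, received))  # prints True
```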
