e1000e eth0: Detected Hardware Unit Hang

Recently, my home server and VM host randomly started losing network connectivity. From the outside it appeared to be still running, but I could not reach it remotely in any way. According to the switch, the Ethernet link was still up, so the issue had to be somewhere in software.

It wouldn’t be the first time a driver turned out to be the cause of a hanging network connection. In the past I’ve been burned by buggy WiFi drivers on both Linux and Windows computers.

After digging a bit into the system logs, I stumbled on the following:

vmhost kernel: e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
vmhost kernel:   TDH                  <0>
vmhost kernel:   TDT                  <1>
vmhost kernel:   next_to_use          <1>
vmhost kernel:   next_to_clean        <0>
vmhost kernel: buffer_info[next_to_clean]:
vmhost kernel:   time_stamp           <10fbc2f81>
vmhost kernel:   next_to_watch        <0>
vmhost kernel:   jiffies              <10fbc3871>
vmhost kernel:   next_to_watch.status <0>
vmhost kernel: MAC Status             <40080083>
vmhost kernel: PHY Status             <796d>
vmhost kernel: PHY 1000BASE-T Status  <7800>
vmhost kernel: PHY Extended Status    <3000>
vmhost kernel: PCI Status             <10>
vmhost kernel: e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
vmhost kernel: vmbr0: port 1(eth0) entered disabled state

The driver would then reset the network adapter, and after a while the hang would happen again, until the machine went completely offline.
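
To get a feel for how often the hang occurs, the reports can be counted straight from the kernel log. This is a minimal sketch, assuming a systemd machine where journalctl -k prints kernel messages; count_e1000e_hangs is just an illustrative helper name, not part of any tool.

```shell
# Count e1000e hang reports in kernel log text read from stdin.
# grep -c prints the number of matching lines; "|| true" keeps the
# exit status at 0 when there are no matches (grep returns 1 then).
count_e1000e_hangs() {
    grep -c 'e1000e.*Detected Hardware Unit Hang' || true
}

# Typical use (kernel messages from the current boot):
#   journalctl -k -b | count_e1000e_hangs
# On machines without journald, dmesg output can be piped in instead:
#   dmesg | count_e1000e_hangs
```

A count that keeps climbing during a single boot would match the reset-and-hang loop described above.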

Looking for answers online, I stumbled upon this Server Fault thread.

Cause

Going through various sources and bug reports, it seems the most reliable way to reproduce the issue is to put the device under high-bandwidth load, i.e. streaming large amounts of data that saturate the interface.

In my case it usually happened while streaming media from the local Plex server to a device. Due to the way the network is set up, the Windows VM that runs the Plex instance has to fetch the media file from an NFS share on a separate device, transcode it in the VM and then send it to the playback device.

This adds up to a lot of network traffic, usually ~40-100 Mbit/s, depending on the playback device and the quality of the source media file.
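
To check what the interface is actually carrying during playback, the kernel's byte counters can be sampled twice and turned into a rate. This is a rough sketch, assuming the standard Linux counters under /sys/class/net. The helper names mbit_per_s and rx_rate are made up for this example, and the result is rounded down to whole Mbit/s.

```shell
# Convert a byte-counter delta over an interval into whole Mbit/s.
#   $1 = counter at start (bytes)
#   $2 = counter at end (bytes)
#   $3 = interval in seconds
mbit_per_s() {
    echo $(( ($2 - $1) * 8 / $3 / 1000000 ))
}

# Sample the receive counter of an interface twice and print the rate.
#   $1 = interface name, $2 = sampling interval in seconds (default 5)
rx_rate() {
    iface=$1
    secs=${2:-5}
    start=$(cat "/sys/class/net/$iface/statistics/rx_bytes")
    sleep "$secs"
    end=$(cat "/sys/class/net/$iface/statistics/rx_bytes")
    mbit_per_s "$start" "$end" "$secs"
}

# Typical use while a stream is playing:
#   rx_rate eth0 10
```

Reading tx_bytes from the same directory gives the outgoing direction, which matters here because the host both fetches the file and sends the transcoded stream.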

The same issue seemed to manifest itself when streaming games via Steam Link to our Apple TV. The connection is wired, yet it was not uncommon for the stream to drop. I think it’s caused by the same issue, and I will keep an eye on it to see whether it still happens after the fix.

Possible fix

A possible fix seems to be disabling GSO (Generic Segmentation Offload), TSO (TCP Segmentation Offload) and GRO (Generic Receive Offload) on the network interface*:

ethtool -K eth0 gso off gro off tso off

I have applied this to my setup and I’m waiting to see if this will actually solve the issue in the long run.
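
Whether the change took effect can be checked with the lowercase -k flag of ethtool, which lists the current feature settings. The following small sketch filters that listing down to the three features in question. The function name show_offloads is hypothetical, and the long feature names are the ones ethtool normally prints for these options.

```shell
# Keep only the TSO, GSO and GRO lines from "ethtool -k" output on stdin.
# The pattern is anchored at the line start, so indented sub-features
# such as tx-tcp-segmentation are skipped.
show_offloads() {
    grep -E '^(tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload):'
}

# Typical use:
#   ethtool -k eth0 | show_offloads
# All three lines should read "off" after the fix has been applied.
```

Keep in mind that settings changed with ethtool -K do not survive a reboot, so the command has to be re-applied at boot, for example from a post-up hook on systems that configure networking through ifupdown.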

Footnotes

* These options are related to offloading packet segmentation to the network interface controller to reduce CPU usage on the machine. More details can be found in this Wikipedia article.

By Biser Perchinkov
