Thursday, 12 February 2015

Troubleshooting intermittent pings

Troubleshooting intermittent pings can be a pain in the rear. Here are some steps I have used to isolate the issue.
I am working with a customer who has a Hyper-V cluster. After getting the network adapters sorted out, I noticed that RDP connectivity to the server was poor. A ping to the default gateway revealed high latency and a good handful of request time-outs.
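
To put numbers on "high latency and a good handful of time-outs", the ping output can be summarised with a small script. This is a minimal sketch, assuming the English-language output format of the Windows ping command; the function name summarize_ping, the address and the sample lines are made up for illustration and were not captured from the server in question.

```python
import re
import statistics

# Matches the "time=12ms" or "time<1ms" part of a Windows ping reply line.
REPLY_RE = re.compile(r"time([=<])(\d+)ms")


def summarize_ping(lines):
    """Summarize Windows-style ping output: loss percentage and latency stats."""
    latencies = []
    timeouts = 0
    for line in lines:
        match = REPLY_RE.search(line)
        if match:
            # "time<1ms" is counted as 1 ms, the upper bound ping reports.
            latencies.append(int(match.group(2)))
        elif "Request timed out" in line:
            timeouts += 1
    sent = len(latencies) + timeouts
    return {
        "sent": sent,
        "lost": timeouts,
        "loss_pct": round(100.0 * timeouts / sent, 1) if sent else 0.0,
        "avg_ms": round(statistics.mean(latencies), 1) if latencies else None,
        "max_ms": max(latencies) if latencies else None,
    }


# Illustrative sample only (hypothetical address and values).
sample = [
    "Reply from 192.0.2.1: bytes=32 time<1ms TTL=255",
    "Reply from 192.0.2.1: bytes=32 time=187ms TTL=255",
    "Request timed out.",
    "Reply from 192.0.2.1: bytes=32 time=2ms TTL=255",
]
print(summarize_ping(sample))
```

Running the same summary after every change makes it easier to compare one step with the next than eyeballing the console output.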

  • Updated the drivers - No change.
  • Changed the CAT6 cable - No change.
  • Disabled VMQ - No change.
  • Plugged my laptop into the same cable - Same issue, bingo! This turned out to be a red herring: updating the laptop's drivers and plugging in its power resolved the laptop ping issue.
  • Tried another laptop on the same cable - It was fine, 1ms.
  • Tried another 1GbE NIC to eliminate the LBFO driver - still the same.
  • Tried a 10GbE - Seems to be stable, odd.
  • Tried another switch - Seems to be stable, odd.
  • Flattened the server, installed Windows Server 2012 R2 again and tested the ping - It was fine, odd!
  • Repeated all the configuration steps one at a time each time testing the ping, still OK.
  • Joined the domain, BAM! Pings all over the shop.
  • Tried safe mode - Seems to be stable, 1ms!
  • Removed the server from the domain, seems to be stable 1ms.
  • Added the server back to the domain, pings latent and intermittent again.
  • Stopped the Base Filtering Engine service (Stop-Service BFE -Force) - Seems to be stable, 1ms.
  • Uninstalled the Broadcom driver and used the OOB (out-of-box) driver from Microsoft - Seems to be stable, 1ms.
Update: After creating some VMs, the problem moved inside some of the VMs. The solution was to replace the Broadcom 1GbE NICs with Intel I350-T NICs.
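
The approach above (change one thing, test the ping, repeat) is easier to follow if each step's result is written down and the first bad one is flagged automatically. A rough sketch; the Step type, the thresholds and the sample figures are all hypothetical placeholders, not measurements from this server.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    name: str        # what was changed before this measurement
    loss_pct: float  # percentage of pings that timed out
    avg_ms: float    # average round-trip time in milliseconds


def first_regression(steps: List[Step], max_loss_pct: float = 1.0,
                     max_avg_ms: float = 5.0) -> Optional[Step]:
    """Return the first step whose ping results exceed either threshold."""
    for step in steps:
        if step.loss_pct > max_loss_pct or step.avg_ms > max_avg_ms:
            return step
    return None


# Hypothetical log of a rebuild, one measurement per configuration step.
log = [
    Step("fresh install", 0.0, 1.0),
    Step("NIC teaming configured", 0.0, 1.0),
    Step("joined domain", 12.0, 85.0),
]
culprit = first_regression(log)
print(culprit.name if culprit else "no regression found")
```

The thresholds are deliberately loose guesses for a LAN ping to the default gateway; tighten or relax them to suit the network being tested.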

Hardware: Dell PowerEdge R720 with 1GbE and 10GbE Broadcom NICs. Switch: HP 5400.