Network Troubleshooting: A Complex Process Made Simple

Gideon T. Rasmussen - CISSP, CISM, CFSO, SCSA

The most efficient way to troubleshoot a network issue is to approach it systematically. Start by gathering background information; then troubleshoot following the Open Systems Interconnection (OSI) networking model.

GATHER BACKGROUND INFORMATION

It is critical to obtain a complete picture of the issue. Carefully consider how the problem manifests itself. For example, does it apply to inbound traffic, outbound traffic or both?

Try to determine when the issue started and consider how often the issue occurs. Is this a constant or intermittent problem? Is this issue reproducible? If so, how?

The cause may be an unforeseen side effect of maintenance. Has anyone made any changes to the firewall or the networking equipment that it connects to?

Perhaps this is a symptom of a larger issue. Has anything else strange happened recently?

If this is a new initiative involving a series of complex configurations, there may be a better solution. In that case, consider what the ultimate goal is and work from there.

TROUBLESHOOT UP THROUGH THE OSI MODEL

Now that you have a firm understanding of the issue, track down its source. Conduct network troubleshooting following the OSI model. Start with the physical layer and work up to the application layer. Network problems are usually associated with the first three layers. This section of the article provides troubleshooting tips for firewalls, networking gear and the systems that connect to them. Unless otherwise noted, commands apply to both Windows and UNIX/Linux.

Physical Layer

The physical layer is one of the easiest to troubleshoot. It is also frequently overlooked. If there is a network connectivity problem, consider the following:

1. Ensure the equipment at the distant end is powered on. Don't laugh. It happens.

2. Examine the cabling. Check for defects or damage. If a cable has been cut or stretched, it may not pass traffic. Check the connectors too. The cable may not be properly inserted into the connector. If the connector is not crimped properly, the wiring may not be making contact with it.

3. Keep in mind that the maximum length of a twisted-pair Ethernet segment is 100 meters. If a cable is too long, there may be intermittent connectivity problems, or it may not work at all.

4. Ensure that each cable connector clicks as it is inserted into the network port.

5. Check the network port indicator lights on each system. If a link light is out, there is an issue with either the network card or cabling.

6. Ensure that the proper type of cabling is in use:

a. Cabling between computer systems and network devices uses a "straight through" cable. To examine the wiring, look closely at the clear connectors at each end of the cable. A straight through cable has an identical wiring layout on both sides.

b. Direct cabling between computer systems requires a crossover cable, for example when you connect a laptop directly to a server. A crossover cable has the transmit and receive wire pairs swapped at one end, so the wiring layout differs between the two connectors.

7. Try swapping out the network cable with one that is known to be good or test it with different equipment.

8. If you suspect hardware issues with the network card, use a hardware diagnostic command to test it.

# getmib -l
dec3 DOWN 10 HD
dec2 DOWN 10 HD
dec1 DOWN 10 HD
dec0 UP 100 FD

In this example, we can see that the dec0 interface is up from a hardware perspective. The remaining interfaces are down. NOTE: The command used in this example is specific to CyberGuard firewalls.

9. Finally, check the operating system logs (e.g. syslog, osmlog, event viewer).

Data Link Layer

At the data link layer, local communications occur by network port hardware addresses, also referred to as Media Access Control (MAC) addresses. Failures at this layer are usually caused by an improperly configured network port or a physical problem.

1.a. If there are network connectivity issues, check the Address Resolution Protocol (ARP) table.

# arp -a
hostname (192.168.1.100) at 0:31:f8:3:b7:de

1.b. The IP address of at least one system should be listed. If there are no systems listed, there is a problem at the physical layer (above).

1.c. From the arp command results above, determine if the MAC address matches the distant network port hosting that IP address. If the MAC address is incorrect, delete the offending ARP entry.

# arp -d <IP address>

The ARP entry will be added automatically when network traffic arrives for that IP address. In most cases this occurs almost immediately. If the incorrect ARP entry appears again, there is a duplicate IP address on the network.

1.d. In some instances, two systems are linked in a high availability (HA) configuration. To ensure consistent service, if one system fails the other takes over automatically. This is accomplished by the standby system sending a gratuitous ARP broadcast across the local network (i.e., my MAC address answers for this IP address). If HA failovers are not taking place, the local router may have ARP caching enabled. To restore HA functionality, disable ARP caching on the router.

1.e. MAC addresses can also be used to determine the vendor of systems attached to the network. This can be useful in tracking down an offending system. To determine the vendor of a network port, visit the IEEE site at http://standards.ieee.org/regauth/oui/index.shtml. The search expects the first three octets of the MAC address, separated by dashes (e.g. 08-00-20).
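The ARP checks in steps 1.a through 1.e can be partly scripted. The following is a minimal Python sketch, not a definitive tool. It assumes the UNIX-style "arp -a" line format shown above (Windows prints a different layout), and the function names and the idea of a known-good inventory are hypothetical, introduced only for illustration.

```python
import re

# Matches the UNIX-style line shown above: "hostname (IP) at MAC".
ARP_LINE = re.compile(
    r"\((?P<ip>\d{1,3}(?:\.\d{1,3}){3})\) at "
    r"(?P<mac>[0-9A-Fa-f]{1,2}(?:[:-][0-9A-Fa-f]{1,2}){5})"
)


def normalize_mac(mac):
    """Return a MAC as six zero-padded, lowercase hex octets joined by colons."""
    octets = re.split(r"[:-]", mac.strip())
    if len(octets) != 6:
        raise ValueError("expected 6 octets in %r" % mac)
    values = [int(octet, 16) for octet in octets]
    if any(value > 0xFF for value in values):
        raise ValueError("octet out of range in %r" % mac)
    return ":".join("%02x" % value for value in values)


def oui_prefix(mac):
    """Return the first three octets in dash-separated uppercase form."""
    return "-".join(normalize_mac(mac).split(":")[:3]).upper()


def parse_arp(output):
    """Map each IP address found in the output to its normalized MAC."""
    table = {}
    for line in output.splitlines():
        match = ARP_LINE.search(line)
        if match:
            table[match.group("ip")] = normalize_mac(match.group("mac"))
    return table


def mismatched_entries(arp_table, expected):
    """Return ARP entries whose MAC differs from a known-good inventory."""
    return {
        ip: mac
        for ip, mac in arp_table.items()
        if ip in expected and normalize_mac(expected[ip]) != mac
    }


if __name__ == "__main__":
    table = parse_arp("hostname (192.168.1.100) at 0:31:f8:3:b7:de")
    print(table)
    print(oui_prefix("0:31:f8:3:b7:de"))
```

Normalizing first matters because "arp -a" may drop leading zeros (0:31:f8:3:b7:de), which makes a naive string comparison against a documented MAC address fail even when the entry is correct.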

2. Systems must be configured to auto negotiate or use the same speed and duplex settings. Otherwise there may be network performance issues or intermittent loss of connectivity. If these are the symptoms of your issue, confirm that network ports at each end of the wire are configured in the same manner (e.g. auto negotiate or 100 Mbps full duplex).

3. If there are intermittent or constant connectivity problems, use the netstat command to check the status of the network interfaces:

# netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Collis
lo0 2048 127 127.0.0.1 333929 0 333929 0 0
eeE0 1500 10.0.4 10.0.4.1 0 0 448800 0 0
dec3 1500 10.0.2 10.0.2.1 1798 0 3 0 0
dec2 1500 192.168.11 192.168.11.11 0 0 0 2 0
dec1 1500 10.0.5 10.0.5.1 1108 0 224 0 0
dec0 1500 64.94.50 64.94.50.88 738768 0 101501 0 10519

Errors in the Ierrs and Oerrs columns are usually caused by defective network hardware. Entries in the Collis column indicate that the network is very busy or there is an issue with the network hardware.
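When there are many interfaces, the error columns are easy to misread. As a rough Python sketch, the function below scans "netstat -in" output for non-zero Ierrs, Oerrs and Collis counters. It assumes the column layout shown above, with a single header row; other platforms print different columns, and the function name is hypothetical.

```python
COUNTER_COLUMNS = ("Ierrs", "Oerrs", "Collis")

SAMPLE = """\
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Collis
lo0 2048 127 127.0.0.1 333929 0 333929 0 0
eeE0 1500 10.0.4 10.0.4.1 0 0 448800 0 0
dec3 1500 10.0.2 10.0.2.1 1798 0 3 0 0
dec2 1500 192.168.11 192.168.11.11 0 0 0 2 0
dec1 1500 10.0.5 10.0.5.1 1108 0 224 0 0
dec0 1500 64.94.50 64.94.50.88 738768 0 101501 0 10519
"""


def interfaces_with_problems(netstat_output):
    """Return {interface: {column: count}} for every non-zero error counter."""
    rows = [line.split() for line in netstat_output.splitlines() if line.strip()]
    if not rows:
        return {}
    header = rows[0]
    problems = {}
    for row in rows[1:]:
        if len(row) != len(header):
            continue  # skip rows that do not line up with the header
        record = dict(zip(header, row))
        nonzero = {}
        for column in COUNTER_COLUMNS:
            value = record.get(column, "0")
            if value.isdigit() and int(value) > 0:
                nonzero[column] = int(value)
        if nonzero:
            problems[record[header[0]]] = nonzero
    return problems


if __name__ == "__main__":
    print(interfaces_with_problems(SAMPLE))
    # prints {'dec2': {'Oerrs': 2}, 'dec0': {'Collis': 10519}}
```

For the output above, only dec2 (output errors) and dec0 (collisions) are flagged, which narrows down where to look next.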

4. It is possible that the hardware is fine and the interface is down within the operating system. Use the ifconfig command to determine the status of the interface:

# ifconfig -a
dec0: flags=4023<UP,BROADCAST,NOTRAILERS,EXTERNAL> mtu 1500
inet 64.94.50.88 netmask ffffff00 broadcast 64.94.50.255
dec1: flags=2023<BROADCAST,NOTRAILERS,INTERNAL> mtu 1500
inet 10.0.5.1 netmask ffffff00 broadcast 10.0.5.255

In this example, the dec0 interface is "UP" and operational. The dec1 interface is down because ifconfig does not list it as "UP".

If "UP" is missing from the ifconfig status, use ifconfig to bring it on-line:

# ifconfig dec1 up
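The "UP" check can also be automated. The Python sketch below reads "ifconfig -a" output and reports whether each interface carries the UP flag. It assumes the "name: flags=NNNN<...>" header line shown above; older Linux ifconfig versions print a different layout and would need a different pattern. The function name is hypothetical.

```python
import re

# Matches interface header lines such as "dec0: flags=4023<UP,BROADCAST,...>".
IFACE_LINE = re.compile(r"^(?P<name>\S+): flags=\w+<(?P<flags>[^>]*)>")

SAMPLE = """\
dec0: flags=4023<UP,BROADCAST,NOTRAILERS,EXTERNAL> mtu 1500
inet 64.94.50.88 netmask ffffff00 broadcast 64.94.50.255
dec1: flags=2023<BROADCAST,NOTRAILERS,INTERNAL> mtu 1500
inet 10.0.5.1 netmask ffffff00 broadcast 10.0.5.255
"""


def interface_states(ifconfig_output):
    """Return {interface name: True if the UP flag is present, else False}."""
    states = {}
    for line in ifconfig_output.splitlines():
        match = IFACE_LINE.match(line.strip())
        if match:
            flags = match.group("flags").split(",")
            states[match.group("name")] = "UP" in flags
    return states


if __name__ == "__main__":
    for name, is_up in interface_states(SAMPLE).items():
        print(name, "UP" if is_up else "DOWN")
```

The flags are split on commas and compared as whole words, so a flag that merely contains the letters "UP" would not be mistaken for the UP flag.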

5. The default Maximum Transmission Unit (MTU) setting is 1500 (see "ifconfig -a" output above).

a. If the MTU is set to something other than 1500, the network may run slowly. To reset the MTU to the default of 1500:

# ifconfig dec0 mtu 1500

b. In the event that there are issues with VPN connectivity over a Cable or DSL connection, try setting the VPN client workstation to an MTU of 1400. The DrTCP utility can be used for this purpose (http://www.dslreports.com/drtcp).
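A little arithmetic shows why lowering the MTU can help a VPN client. The Python sketch below assumes IPv4 and TCP headers of 20 bytes each (no options) and a PPPoE link with an MTU of 1492, which is common on DSL. The 60-byte tunnel overhead is an illustrative figure only; the real overhead depends on the VPN protocol and its settings.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss(mtu):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def fits_without_fragmenting(inner_mtu, tunnel_overhead, link_mtu):
    """True if a full-size inner packet plus tunnel headers fits the link."""
    return inner_mtu + tunnel_overhead <= link_mtu


if __name__ == "__main__":
    PPPOE_MTU = 1492       # 1500 minus 8 bytes of PPPoE framing
    EXAMPLE_OVERHEAD = 60  # hypothetical VPN encapsulation cost in bytes

    print(tcp_mss(1500))  # 1460
    print(tcp_mss(1400))  # 1360
    print(fits_without_fragmenting(1500, EXAMPLE_OVERHEAD, PPPOE_MTU))  # False
    print(fits_without_fragmenting(1400, EXAMPLE_OVERHEAD, PPPOE_MTU))  # True
```

Under these assumptions, a client left at 1500 produces tunneled packets that are too large for the link, while a client set to 1400 leaves room for the encapsulation headers.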

6. During the boot process, Windows workstations typically use the Dynamic Host Configuration Protocol (DHCP) to obtain basic network configurations. DHCP servers usually serve up an IP address, DNS server settings, netmask and default gateway. To view active configurations, use the ipconfig command.

c:\> ipconfig /all

If this process fails, the workstation will not have a proper network configuration. This issue typically occurs in home environments when the workstation boots before the system providing DHCP services (usually a router or modem). The fix is to release and renew the network configuration using DHCP.

c:\> ipconfig /release

c:\> ipconfig /renew
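One quick way to spot a failed DHCP request is to look at the address itself. When no DHCP server answers, Windows typically assigns itself an address in the 169.254.0.0/16 link-local range, and an interface that has been released shows 0.0.0.0. The Python sketch below checks for both cases using the standard ipaddress module. The function name is hypothetical, and the check is only a hint, not proof of a DHCP failure.

```python
import ipaddress


def dhcp_probably_failed(ip):
    """True if the address is link-local (169.254.x.x) or unset (0.0.0.0)."""
    address = ipaddress.ip_address(ip)
    return address.is_link_local or address.is_unspecified


if __name__ == "__main__":
    for candidate in ("169.254.10.20", "0.0.0.0", "192.168.1.100"):
        print(candidate, dhcp_probably_failed(candidate))
```

If the address reported by "ipconfig /all" falls into either case, release and renew as described above.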

Network

In order to communicate across a network, each system needs an IP address, a default gateway and a network mask.

1. Confirm that each node on the network has a unique IP address. If a system boots and advertises an IP address that is already in use, the system previously using that address will respond, and the new system will shut down its own networking. Use the ipconfig and ifconfig commands to determine the IP address assigned to each interface (Windows and UNIX/Linux, respectively).

2. Each system sends traffic bound for other networks to its default gateway. If the default gateway is incorrect or missing, that traffic will not flow. The only exceptions are destinations covered by manually configured static route entries. Determine if the default gateway is correct in the output of "netstat -rn".

3. The network mask tells the system which devices are on its local network. All other traffic will have to go through a router. The most common network mask is 255.255.255.0. Current mask configurations can be displayed with the ipconfig and ifconfig commands. The topic of subnetting is too complex to discuss here. If you are uncertain about a network mask, contact your network administrator.
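Without going into subnetting, a short check can confirm whether two addresses fall on the same local network for a given mask. The Python sketch below uses the standard ipaddress module. The function names are hypothetical, and the host addresses in the usage lines (other than 10.0.5.1) are made up for illustration.

```python
import ipaddress


def hex_mask_to_dotted(hex_mask):
    """Convert an ifconfig-style hex mask such as 'ffffff00' to dotted form."""
    return str(ipaddress.ip_address(int(hex_mask, 16)))


def same_network(host_ip, other_ip, netmask):
    """True if other_ip is on the local network defined by host_ip and netmask."""
    network = ipaddress.ip_network("%s/%s" % (host_ip, netmask), strict=False)
    return ipaddress.ip_address(other_ip) in network


if __name__ == "__main__":
    mask = hex_mask_to_dotted("ffffff00")
    print(mask)                                         # 255.255.255.0
    print(same_network("10.0.5.1", "10.0.5.77", mask))  # True: reached directly
    print(same_network("10.0.5.1", "10.0.4.1", mask))   # False: needs a router
```

The same function is a useful sanity check for step 2: a default gateway that is not on the system's own local network cannot be reached, so traffic will not flow through it.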

4. Try using the ping command between devices (when interacting with a CyberGuard firewall, this requires echo/ICMP rules with enable replies selected).

# ping 192.168.1.100

From a client, can you ping the internal interface of the firewall?
From the firewall, can you ping the client?
From the firewall, can you ping the firewall's default gateway?
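These ping checks can be wrapped in a small script so they run the same way each time. The Python sketch below is a rough illustration that assumes a Linux or BSD style ping (count given with -c) or a Windows ping (count given with -n). The function names are hypothetical, and the exit code is only a hint: some ping implementations return success even when the reply is "destination unreachable".

```python
import platform
import subprocess


def build_ping_command(target, count=3, system=None):
    """Build the ping argument list for the current (or given) platform."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), target]


def can_ping(target, count=3):
    """Run ping quietly and report whether it exited successfully."""
    result = subprocess.run(
        build_ping_command(target, count),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # 192.168.1.100 is the example address used above; substitute your own.
    print(build_ping_command("192.168.1.100"))
    print(can_ping("192.168.1.100"))
```

Run it from the client against the firewall's internal interface, then from the firewall against the client and the default gateway, to follow the sequence of questions above.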

5. If there are still issues with external connectivity, contact your ISP and ask them to test the line.

6. Denial of Service (DoS) attacks can degrade the performance of a system until it stops accepting network traffic. To determine what systems are connected, use the netstat command:

# netstat -an (output abbreviated)

tcp	64.94.50.88	64.94.50.84.1112	ESTABLISHED
tcp	*.21	.	LISTEN
tcp	64.94.50.88	64.94.50.84.80	SYN_RECEIVED

This syntax is also a good way to detect a SYN flood attack (many connections in the SYN_RECEIVED state).
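On a busy system the "netstat -an" listing is long, so counting connection states is easier than reading it. The Python sketch below tallies the last column of each TCP line. It assumes the state is the final field, as in the output above, and it accepts the different spellings used for the half-open state (SYN_RECEIVED on Windows, SYN_RECV on Linux, SYN_RCVD on BSD). The function names are hypothetical, and what count is "too many" depends on the system.

```python
from collections import Counter

HALF_OPEN_STATES = ("SYN_RECEIVED", "SYN_RECV", "SYN_RCVD")

SAMPLE = """\
tcp 64.94.50.88 64.94.50.84.1112 ESTABLISHED
tcp *.21 . LISTEN
tcp 64.94.50.88 64.94.50.84.80 SYN_RECEIVED
"""


def tcp_state_counts(netstat_output):
    """Count TCP connections by state (the last field of each tcp line)."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].lower().startswith("tcp"):
            counts[fields[-1]] += 1
    return counts


def half_open(counts):
    """Total number of connections waiting to complete the TCP handshake."""
    return sum(counts[state] for state in HALF_OPEN_STATES)


if __name__ == "__main__":
    counts = tcp_state_counts(SAMPLE)
    print(dict(counts))
    print("half-open connections:", half_open(counts))
```

A handful of half-open connections is normal. A large and growing number, especially from many source addresses, is the pattern to look for.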

7. If you suspect ICMP based virus traffic:

a. On CyberGuard firewalls, use the netguard command:

# netguard -nS all

Under Sessions, take a look at the ICMP column.

Press Ctrl-C to exit the netguard session

b. On Windows and Linux/UNIX systems, use the netstat command to view the ICMP protocol statistics:

# netstat -s -p icmp

8. If there are issues with routing, outbound traffic will not flow properly.

a. Check the routing table for erroneous entries (route print is the Windows form; on UNIX/Linux, use netstat -rn):

# route print

b. Use the lookup feature of the route command to determine how it will route traffic based on an IP address.

# route -n lookup <IP address>

9. If there are external connectivity issues, determine if Dynamic Network Address Translation (DNAT) is enabled on the external interface of the firewall. If DNAT is not enabled, traffic cannot find a route back to your system unless a static NAT is in place.

Transport

On CyberGuard firewalls, select enable replies in UDP and ICMP packet-filter rules. These protocols support ping, the Domain Name System (DNS) and syslog.

Session & Presentation

Issues rarely occur at the session and presentation layers.

Application

The application layer is where the client-server issues fall. This includes SMTP, POP3, HTTP, FTP, etc.

1. DNS supports many commonly used programs and services, including web pages and e-mail. DNS translates host names into IP addresses (e.g. www.cyberguard.com to 64.94.50.88). To confirm DNS functionality, use the nslookup command:

# nslookup www.cyberguard.com

It should respond with an IP address. If it does not, check your DNS server settings. Additional DNS troubleshooting is beyond the scope of this article.
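The same check can be made from a script. The Python sketch below is a minimal example using the standard socket module; the function name is hypothetical. Note one difference from nslookup: this call goes through the operating system's resolver, so it also honors the local hosts file, whereas nslookup queries the DNS server directly.

```python
import socket


def resolve(hostname):
    """Return the sorted IPv4 addresses for hostname, or [] if lookup fails."""
    try:
        results = socket.getaddrinfo(
            hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, (address, port)).
    return sorted({result[4][0] for result in results})


if __name__ == "__main__":
    addresses = resolve("www.cyberguard.com")
    if addresses:
        print("resolved to", ", ".join(addresses))
    else:
        print("lookup failed; check your DNS server settings")
```

If the script fails for a name while nslookup succeeds, or the reverse, compare the DNS server settings with the contents of the local hosts file.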

2. If the traffic flows through a CyberGuard firewall, use the netguard and grep (search) commands:

# netguard -An | grep 192.168.1.100

P = Traffic was permitted
X = Traffic was proxied
D = Traffic was denied

If the traffic you are expecting is not listed in the netguard output, then it never reached the firewall.

3. If all else fails, use the tcpdump command to troubleshoot:

# tcpdump -vvpni dec1 -s1514 -w /archive2/dec1.dmp host 10.0.1.13

Tcpdump functions from the application layer down to the data link layer. If you are troubleshooting a proxy, you will need to run it on both sides. Tcpdump output is not for the faint of heart. I recommend viewing it with Ethereal (http://www.ethereal.com). The Windows version of tcpdump is WinDump (http://windump.polito.it).

As you can see, network troubleshooting can be quite involved. In practice, the symptoms of the issue will determine how you approach it. For example, if you can ping the remote system, you will troubleshoot up the OSI model from there. Above all, keep your cool and troubleshoot systematically.
