Network Troubleshooting: A Complex Process Made Simple
Gideon T. Rasmussen - CISSP, CISM, CFSO, SCSA
The most efficient way to troubleshoot a network issue is to approach it systematically. Start by gathering background information; then troubleshoot by following the Open Systems Interconnection (OSI) networking model.
GATHER BACKGROUND INFORMATION
It is critical to obtain a complete picture of the issue. Carefully consider how the problem manifests itself. For example, does it apply to inbound traffic, outbound traffic or both?
Try to determine when the issue started and consider how often it occurs. Is this a constant or intermittent problem? Is the issue reproducible? If so, how?
The cause may be an unforeseen side effect of maintenance. Has anyone made any changes to the firewall or the networking equipment that it connects to?
Perhaps this is a symptom of a larger issue. Has anything else strange happened recently?
If this is a new initiative involving a series of complex configurations, there may be a better solution. In that case, consider what the ultimate goal is and work from there.
TROUBLESHOOT UP THROUGH THE OSI MODEL
Now that you have a firm understanding of the issue, track down its source. Conduct network troubleshooting following the OSI model: start with the physical layer and work up to the application layer. Network problems are usually associated with the first three layers (physical, data link and network). This section of the article provides troubleshooting tips for firewalls, networking gear and the systems that connect to them. Unless otherwise noted, commands apply to both Windows and UNIX/Linux.
Physical Layer
The physical layer is one of the easiest to troubleshoot. It is also frequently overlooked. If there is a network connectivity problem, consider the following:
1. Ensure the equipment at the distant end is powered on. Don't laugh. It happens.
2. Examine the cabling for defects or damage. If a cable has been cut or stretched, it may not pass traffic. Check the connectors too: the cable may not be fully inserted into the connector, and if the connector is not crimped properly, the wires may not be making contact with it.
3. Keep in mind that the maximum length of a twisted-pair Ethernet segment (10BASE-T or 100BASE-TX) is 100 meters. If a cable is too long, there may be intermittent connectivity problems, or it may not work at all.
4. Ensure that each cable connector clicks as it is inserted into the network port.
5. Check the network port indicator lights on each system. If a link light is out, there is an issue with either the network card or the cabling.
6. Ensure that the proper type of cabling is in use:
a. Cabling between computer systems and network devices (such as hubs and switches) uses a "straight-through" cable. To examine the wiring, look closely at the clear connectors at each end of the cable. A straight-through cable has an identical wiring layout at both ends.
b. Direct cabling between two computer systems, for example when connecting a laptop directly to a server, requires a crossover cable. In a crossover cable two wire pairs are swapped (the pair on pins 1 and 2 with the pair on pins 3 and 6), so the wiring layout at each end is slightly different.
7. Try replacing the network cable with one that is known to be good, or test the cable with different equipment.
8. If you suspect a hardware issue with the network card, use a hardware diagnostic command to test it:
# getmib -l
dec3 DOWN 10 HD
dec2 DOWN 10 HD
dec1 DOWN 10 HD
dec0 UP 100 FD
In this example, the dec0 interface is up from a hardware perspective, running at 100 Mbps full duplex (100 FD). The remaining interfaces are down.
NOTE: The command used in this example is specific to CyberGuard firewalls.
9. Finally, check the operating system logs (e.g. syslog, osmlog, Event Viewer).
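The straight-through versus crossover rule in step 6 can be written down as a pin map check. The sketch below is a minimal illustration, assuming 10/100 Mbps Ethernet, where a crossover cable swaps the pair on pins 1 and 2 with the pair on pins 3 and 6; gigabit crossover cables also swap the remaining two pairs and are not covered. The function name classify_cable and the pin map input (for each pin at one end, the pin it reaches at the other end, as traced from the wire colors) are hypothetical.

```python
# Hypothetical helper: classify a 10/100 Mbps RJ-45 cable from its pin map.
# pin_map[n] is the pin at the far-end connector that near-end pin n is wired to.

STRAIGHT_THROUGH = {pin: pin for pin in range(1, 9)}

# 10/100 crossover: the pair on pins 1/2 is swapped with the pair on pins 3/6.
# Pins 4, 5, 7 and 8 are not used by 10BASE-T/100BASE-TX and run straight through.
CROSSOVER = {1: 3, 2: 6, 3: 1, 4: 4, 5: 5, 6: 2, 7: 7, 8: 8}


def classify_cable(pin_map):
    """Return 'straight-through', 'crossover' or 'miswired'."""
    if pin_map == STRAIGHT_THROUGH:
        return "straight-through"
    if pin_map == CROSSOVER:
        return "crossover"
    return "miswired"


# Example: a cable with pins 1 and 2 reversed at one end is neither type.
reversed_pair = {**STRAIGHT_THROUGH, 1: 2, 2: 1}
print(classify_cable(reversed_pair))  # miswired
```

A cable that comes back "miswired" is a good candidate for the known-good swap in step 7.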
Data Link Layer
At the data link layer, local communication takes place using network port hardware addresses, also referred to as Media Access Control (MAC) addresses. Failures at this layer are usually caused by an improperly configured network port or a physical problem.
1.a. If there are network connectivity issues, check the Address Resolution Protocol (ARP) table:
# arp -a
hostname (192.168.1.100) at 0:31:f8:3:b7:de
1.b. The IP address of at least one system should be listed. If no systems are listed, there is a problem at the physical layer (see above).
1.c. From the arp command results above, determine whether the MAC address matches the network port of the distant system that hosts that IP address. If the MAC address is incorrect, delete the offending ARP entry:
# arp -d <IP address>
The ARP entry will be added automatically when network traffic arrives for that IP address. In most cases this occurs almost immediately. If the incorrect ARP entry appears again, there is a duplicate IP address on the network.
1.d. In some instances, two systems are linked in a high availability (HA) configuration. To ensure consistent service, if one system fails, the other takes over automatically. This is accomplished by the standby system sending a gratuitous ARP broadcast across the local network (in effect announcing "my MAC address now answers for this IP address"). If HA failovers are not taking place, the local router may have ARP caching enabled. To restore HA functionality, disable ARP caching on the router.
1.e. MAC addresses can also be used to determine the vendor of systems attached to the network, which can be useful in tracking down an offending system. The first three octets of a MAC address form the vendor's Organizationally Unique Identifier (OUI). To determine the vendor of a network port, search for the OUI on the IEEE site at http://standards.ieee.org/regauth/oui/index.shtml. The search format uses dashes as separators (e.g. 08-00-20).
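Steps 1.a through 1.e can be partly automated by parsing the arp output. The sketch below is a hypothetical helper, assuming the UNIX-style line format shown in 1.a (Windows prints a different, tabular format that it does not handle). It normalizes MAC addresses, since arp drops leading zeros (0:31:f8:3:b7:de is 00:31:f8:03:b7:de), flags entries that differ from the MAC address you expect, and builds the dashed OUI search key for the IEEE lookup. The expected MAC address in the example is made up for illustration.

```python
import re

# Hypothetical helpers for UNIX-style "arp -a" output such as
#   hostname (192.168.1.100) at 0:31:f8:3:b7:de
# Entries such as "at (incomplete)" do not match and are skipped.
ARP_LINE = re.compile(
    r"\((?P<ip>[0-9.]+)\) at (?P<mac>[0-9A-Fa-f]{1,2}(?::[0-9A-Fa-f]{1,2}){5})")


def normalize_mac(mac):
    """Pad every octet to two lowercase hex digits: '0:31:f8:3:b7:de' -> '00:31:f8:03:b7:de'."""
    return ":".join(octet.zfill(2) for octet in mac.lower().split(":"))


def parse_arp(output):
    """Map each IP address in 'arp -a' output to its normalized MAC address."""
    return {m["ip"]: normalize_mac(m["mac"]) for m in ARP_LINE.finditer(output)}


def mismatched_entries(arp_table, expected):
    """IP addresses whose cached MAC differs from the expected one (candidates for 'arp -d')."""
    return [ip for ip, mac in expected.items()
            if ip in arp_table and arp_table[ip] != normalize_mac(mac)]


def oui_search_key(mac):
    """First three octets (the vendor OUI) in the dashed search format, e.g. '00-31-F8'."""
    return "-".join(normalize_mac(mac).split(":")[:3]).upper()


table = parse_arp("hostname (192.168.1.100) at 0:31:f8:3:b7:de\n")
print(table)                                   # {'192.168.1.100': '00:31:f8:03:b7:de'}
print(oui_search_key(table["192.168.1.100"]))  # 00-31-F8
# Hypothetical expected MAC, e.g. read from the distant system's own ifconfig/ipconfig output.
print(mismatched_entries(table, {"192.168.1.100": "00:31:f8:03:b7:df"}))  # ['192.168.1.100']
```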
2. Both ends of a link must either auto-negotiate or be manually set to the same speed and duplex settings. Otherwise there may be network performance issues or intermittent loss of connectivity. If these are the symptoms of your issue, confirm that the network ports at each end of the wire are configured in the same manner (e.g. auto-negotiate or 100 Mbps full duplex).
3. If there are intermittent or constant connectivity problems, use the netstat command to check the status of the network interfaces:
# netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Collis
lo0 2048 127 127.0.0.1 333929 0 333929 0 0
eeE0 1500 10.0.4 10.0.4.1 0 0 448800 0 0
dec3 1500 10.0.2 10.0.2.1 1798 0 3 0 0
dec2 1500 192.168.11 192.168.11.11 0 0 0 2 0
dec1 1500 10.0.5 10.0.5.1 1108 0 224 0 0
dec0 1500 64.94.50 64.94.50.88 738768 0 101501 0 10519
Errors in the Ierrs (input errors) and Oerrs (output errors) columns are usually caused by defective network hardware. Entries in the Collis (collisions) column indicate that the network is very busy or that there is an issue with the network hardware.
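When a system has many interfaces, it can help to filter the netstat output for non-zero error and collision counters. This is a minimal sketch, assuming the BSD-style column layout shown above (Linux net-tools prints different column names, such as RX-ERR and TX-ERR, and would need its own column mapping). The function names are hypothetical; the sample input is the output from step 3.

```python
# Hypothetical helpers for the BSD-style "netstat -in" layout shown above.
COUNTERS = ("Mtu", "Ipkts", "Ierrs", "Opkts", "Oerrs", "Collis")


def parse_netstat_in(output):
    """Return one dict per interface row, keyed by the header names."""
    lines = [line.split() for line in output.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    interfaces = []
    for fields in rows:
        if len(fields) != len(header):
            continue  # skip rows that do not line up with the header
        row = dict(zip(header, fields))
        for column in COUNTERS:
            row[column] = int(row[column])
        interfaces.append(row)
    return interfaces


def interfaces_with_problems(interfaces):
    """Names of interfaces reporting input errors, output errors or collisions."""
    return [row["Name"] for row in interfaces
            if row["Ierrs"] or row["Oerrs"] or row["Collis"]]


SAMPLE = """\
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Collis
lo0 2048 127 127.0.0.1 333929 0 333929 0 0
eeE0 1500 10.0.4 10.0.4.1 0 0 448800 0 0
dec3 1500 10.0.2 10.0.2.1 1798 0 3 0 0
dec2 1500 192.168.11 192.168.11.11 0 0 0 2 0
dec1 1500 10.0.5 10.0.5.1 1108 0 224 0 0
dec0 1500 64.94.50 64.94.50.88 738768 0 101501 0 10519
"""

print(interfaces_with_problems(parse_netstat_in(SAMPLE)))  # ['dec2', 'dec0']
```

In the sample, dec2 shows output errors and dec0 shows a large number of collisions, so those are the interfaces to look at first.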
4. It is possible that the hardware is fine and the interface is down within the operating system. Use the ifconfig command to determine the status of the interface:
# ifconfig -a
dec0: flags=4023<UP,BROADCAST,NOTRAILERS,EXTERNAL> mtu 1500
        inet 64.94.50.88 netmask ffffff00 broadcast 64.94.50.255
dec1: flags=2023<BROADCAST,NOTRAILERS,INTERNAL> mtu 1500
        inet 10.0.5.1 netmask ffffff00 broadcast 10.0.5.255
In this example, the dec0 interface is "UP" and operational. The dec1 interface is down because ifconfig does not list it as "UP".
If "UP" is missing from the ifconfig status, use ifconfig to bring the interface online:
# ifconfig dec1 up
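Checking for the UP flag can also be done programmatically. The sketch below is a hypothetical helper, assuming the "name: flags=NNNN<FLAG,...>" header format shown in step 4 (also used by FreeBSD and newer Linux ifconfig, but not by the older Linux "Link encap" format). It reads the flag names rather than the numeric value and reports interfaces whose flag list lacks UP.

```python
import re

# Hypothetical helper. Matches interface header lines such as
#   dec0: flags=4023<UP,BROADCAST,NOTRAILERS,EXTERNAL> mtu 1500
HEADER = re.compile(r"^(?P<name>\S+): flags=\w+<(?P<flags>[^>]*)>", re.MULTILINE)


def down_interfaces(output):
    """Return the names of interfaces whose flag list does not include UP."""
    down = []
    for match in HEADER.finditer(output):
        flags = match["flags"].split(",")
        if "UP" not in flags:
            down.append(match["name"])
    return down


SAMPLE = """\
dec0: flags=4023<UP,BROADCAST,NOTRAILERS,EXTERNAL> mtu 1500
\tinet 64.94.50.88 netmask ffffff00 broadcast 64.94.50.255
dec1: flags=2023<BROADCAST,NOTRAILERS,INTERNAL> mtu 1500
\tinet 10.0.5.1 netmask ffffff00 broadcast 10.0.5.255
"""

for name in down_interfaces(SAMPLE):
    print(f"{name} is down; try: ifconfig {name} up")  # dec1 is down; try: ifconfig dec1 up
```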
5. The default Maximum Transmission Unit (MTU) for Ethernet is 1500 bytes (see the "ifconfig -a" output above).
a. If the MTU is set to something other than 1500, the network may run slowly. To reset the MTU to the default of 1500:
# ifconfig dec0 mtu 1500
b. If there are issues with VPN connectivity over a cable or DSL connection, try setting the MTU on the VPN client workstation to 1400. The DrTCP utility can be used for this purpose (http://www.dslreports.com/drtcp).
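Why a smaller client MTU can help is a matter of simple arithmetic. This sketch assumes IPv4 and TCP headers without options (20 bytes each), and the function names are hypothetical. The encapsulation overheads are illustrative assumptions: 8 bytes is the PPPoE header used on many DSL links, while the 60 bytes for the VPN tunnel is a made-up placeholder, since real VPN overhead depends on the protocol and cipher. A packet that no longer fits after encapsulation has to be fragmented or dropped.

```python
# Hypothetical helpers illustrating MTU arithmetic.
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss(mtu):
    """Largest TCP payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def fits_link(client_mtu, overheads, link_mtu=1500):
    """True if a full-size client packet plus all encapsulation overhead fits the link MTU."""
    return client_mtu + sum(overheads) <= link_mtu


PPPOE = 8       # PPPoE header on many DSL links
VPN_GUESS = 60  # hypothetical VPN encapsulation overhead, for illustration only

for mtu in (1500, 1400):
    print(mtu, tcp_mss(mtu), fits_link(mtu, [PPPOE, VPN_GUESS]))
# 1500 1460 False
# 1400 1360 True
```

With these assumed overheads, a full 1500-byte packet overflows the link, while a 1400-byte packet leaves room to spare.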
6. During the boot process, Windows workstations typically use the Dynamic Host Configuration Protocol (DHCP) to obtain their basic network configuration. DHCP servers usually provide an IP address, netmask, default gateway and DNS server settings. To view the active configuration, use the ipconfig command:
c:\> ipconfig /all
If this process fails, the workstation will not have a proper network configuration. This typically occurs in home environments when the workstation boots before the device providing DHCP services (usually a router or modem). The fix is to release and renew the network configuration using DHCP:
c:\> ipconfig /release
c:\> ipconfig /renew
Network
In order to communicate across a network, each system needs an IP address, a default gateway and a network mask.
1. Confirm that each node on the network has a unique IP address. If a system boots and advertises an IP address that is already in use, the system already using that address will respond, and the new system may disable networking on that interface. Use the ipconfig (Windows) and ifconfig (UNIX/Linux) commands to determine the IP address assigned to each interface.
2. Each system sends traffic destined for other networks to its default gateway. If the default gateway is incorrect or missing, that traffic will not flow. The only exceptions are destinations covered by manually configured static routes. Check that the default gateway is correct in the output of "netstat -rn".
3. The network mask tells the system which devices are on its local network; all other traffic has to go through a router. The most common network mask is 255.255.255.0. Current mask configurations can be displayed with the ipconfig and ifconfig commands. The topic of subnetting is too complex to discuss here. If you are uncertain about a network mask, contact your network administrator.
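The decision the network mask drives, deliver directly or hand off to the router, can be sketched with Python's standard ipaddress module. The function name next_hop and the addresses below, including the default gateway 192.168.1.1, are hypothetical examples; this illustrates the rule rather than how any particular operating system implements it.

```python
import ipaddress


def next_hop(host_ip, netmask, destination, default_gateway):
    """Hypothetical helper: return the address a host sends a packet to first,
    the destination itself if it is on the local network, otherwise the default gateway."""
    local_network = ipaddress.ip_interface(f"{host_ip}/{netmask}").network
    if ipaddress.ip_address(destination) in local_network:
        return destination      # same network: delivered directly (ARP for the destination)
    return default_gateway      # different network: must go through the router


# Hypothetical workstation 192.168.1.10 with mask 255.255.255.0 (a /24 network).
print(ipaddress.ip_interface("192.168.1.10/255.255.255.0").network)               # 192.168.1.0/24
print(next_hop("192.168.1.10", "255.255.255.0", "192.168.1.100", "192.168.1.1"))  # 192.168.1.100
print(next_hop("192.168.1.10", "255.255.255.0", "64.94.50.88", "192.168.1.1"))    # 192.168.1.1
```

A wrong mask shows up here directly: with a mask that is too narrow, a local host is wrongly sent to the router, and with one that is too wide, a remote host is wrongly ARPed for on the local wire.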
4. Try using the ping command between devices (when interacting with a CyberGuard firewall, this requires echo/ICMP rules with "enable replies" selected):
# ping 192.168.1.100
- From a client, can you ping the internal interface of the firewall?
- From the firewall, can you ping the client?
- From the firewall, can you ping the firewall's default gateway?
5. If there are still issues with external connectivity, contact your ISP and ask them to test the line.
6. Denial of Service (DoS) attacks can degrade the performance of a system until it stops accepting network traffic. To determine which systems are connected, use the netstat command:
# netstat -an (output abbreviated)
tcp   0   0   64.94.50.88   64.94.50.84.1112   ESTABLISHED
tcp   0   0   *.21          *.*                LISTEN
tcp   0   0   64.94.50.88   64.94.50.84.80     SYN_RECEIVED
This command is also a good way to detect a SYN flood attack: a large number of connections in the SYN_RECEIVED state is a warning sign.
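Counting connection states makes a SYN flood stand out. The sketch below is a hypothetical helper. It assumes the TCP connection state is the last field on each tcp line (as in the BSD, Linux and Windows netstat -an formats), accepts both SYN_RECEIVED and the Linux spelling SYN_RECV, and uses an arbitrary threshold of 100 half-open connections that you would tune for your own environment.

```python
from collections import Counter

# Hypothetical helpers for "netstat -an" output.
HALF_OPEN_STATES = ("SYN_RECEIVED", "SYN_RECV")  # BSD/Windows spelling, Linux spelling


def tcp_state_counts(output):
    """Count TCP connections per state in 'netstat -an' output."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0].lower().startswith("tcp"):
            counts[fields[-1]] += 1  # the state is the last field on tcp lines
    return counts


def looks_like_syn_flood(counts, threshold=100):
    """Arbitrary rule of thumb: many half-open connections suggest a SYN flood."""
    return sum(counts[state] for state in HALF_OPEN_STATES) >= threshold


SAMPLE = """\
tcp 0 0 64.94.50.88 64.94.50.84.1112 ESTABLISHED
tcp 0 0 *.21 *.* LISTEN
tcp 0 0 64.94.50.88 64.94.50.84.80 SYN_RECEIVED
"""

counts = tcp_state_counts(SAMPLE)
print(counts["SYN_RECEIVED"], looks_like_syn_flood(counts))  # 1 False
```

A single half-open connection, as in the sample, is normal; a flood shows up as hundreds of them, often from many different foreign addresses.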
7. If you suspect ICMP-based virus traffic:
a. On CyberGuard firewalls, use the netguard command:
# netguard -nS all
Under Sessions, look at the ICMP column. Press Ctrl-C to exit the netguard session.
b. On Windows and UNIX/Linux systems, use the netstat command to view the ICMP protocol statistics:
# netstat -s -p icmp
8. If there are issues with routing, outbound traffic will not flow properly.
a. Check the routing table for erroneous entries ("route print" on Windows; "netstat -rn" shows the same information on UNIX/Linux):
# route print
b. Use the lookup feature of the route command to determine how traffic to a given IP address will be routed:
# route -n lookup <IP address>
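A route lookup answers this question by choosing the most specific matching entry in the routing table (the longest prefix match), with the default route 0.0.0.0/0 matching anything not covered elsewhere. The sketch below illustrates that rule using Python's ipaddress module. The function name, routing table and gateway addresses are hypothetical, and ties between routes of the same prefix length (which real systems break with metrics) are ignored.

```python
import ipaddress

# Hypothetical routing table: (destination network, gateway).
ROUTES = [
    ("0.0.0.0/0", "64.94.50.1"),    # default route
    ("10.0.0.0/8", "10.0.5.254"),
    ("10.0.4.0/24", "10.0.4.254"),
]


def lookup(routes, destination):
    """Hypothetical helper: return the gateway of the longest-prefix route
    containing destination, or None if no route matches."""
    address = ipaddress.ip_address(destination)
    best_network, best_gateway = None, None
    for network_text, gateway in routes:
        network = ipaddress.ip_network(network_text)
        if address in network and (best_network is None
                                   or network.prefixlen > best_network.prefixlen):
            best_network, best_gateway = network, gateway
    return best_gateway


for destination in ("10.0.4.20", "10.9.9.9", "192.0.2.7"):
    print(destination, "->", lookup(ROUTES, destination))
# 10.0.4.20 -> 10.0.4.254
# 10.9.9.9 -> 10.0.5.254
# 192.0.2.7 -> 64.94.50.1
```

An erroneous, overly specific entry will therefore win over a correct, broader one, which is why step 8a asks you to look for such entries.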
9. If there are external connectivity issues, determine whether Dynamic Network Address Translation (DNAT) is enabled on the external interface of the firewall. If DNAT is not enabled, traffic cannot find a route back to your system unless a static NAT is in place.
Transport
On CyberGuard firewalls, select "enable replies" in UDP and ICMP packet-filter rules. These protocols support ping (ICMP), the Domain Name System (DNS) and syslog (UDP).
Session & Presentation
Issues rarely occur at the session and presentation layers.
Application
The application layer is where client-server issues occur. This includes protocols such as SMTP, POP3, HTTP and FTP.
1. DNS supports many commonly used programs and services, including web pages and e-mail. DNS translates host names into IP addresses (e.g. www.cyberguard.com to 64.94.50.88). To confirm DNS functionality, use the nslookup command:
# nslookup www.cyberguard.com
It should respond with an IP address. If it does not, check your DNS server settings. Additional DNS troubleshooting is beyond the scope of this article.
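A similar check can be scripted. This minimal sketch uses Python's socket.getaddrinfo, which goes through the operating system's resolver (including the local hosts file), whereas nslookup queries the DNS server directly; if the two disagree, a hosts file entry is one possible explanation. The function name resolve_ipv4 is hypothetical.

```python
import socket


def resolve_ipv4(hostname):
    """Hypothetical helper: return the sorted IPv4 addresses the system resolver
    gives for hostname, or an empty list if the lookup fails."""
    try:
        results = socket.getaddrinfo(hostname, None, socket.AF_INET)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, (address, port)).
    return sorted({result[4][0] for result in results})
```

For example, resolve_ipv4("www.cyberguard.com") should return the same address that nslookup reports; an empty list points back to the DNS server settings.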
2. If the traffic flows through a CyberGuard firewall, use the netguard and grep (search) commands:
# netguard -An | grep 192.168.1.100
P = Traffic was permitted
X = Traffic was proxied
D = Traffic was denied
If the traffic you are expecting is not listed in the netguard output, it never reached the firewall.
3. If all else fails, use the tcpdump command to troubleshoot:
# tcpdump -vvpni dec1 -s1514 -w /archive2/dec1.dmp host 10.0.1.13
tcpdump captures packets at the data link layer, so a capture shows everything from the data link layer up to the application layer. If you are troubleshooting a proxy, you will need to run it on both sides. tcpdump output is not for the faint of heart; I recommend viewing it with Ethereal (http://www.ethereal.com). The Windows version of tcpdump is WinDump (http://windump.polito.it).
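Before handing a capture file to Ethereal, it can be worth confirming that tcpdump wrote a usable file with the intended snap length. The sketch below decodes the 24-byte global header of a classic libpcap file (magic number, version, time zone offset, timestamp accuracy, snap length and link type). It assumes the classic pcap format rather than the newer pcapng format, and the function names are hypothetical. Link type 1 denotes Ethernet.

```python
import struct

# Hypothetical helpers. Classic pcap magic numbers (microsecond and nanosecond variants).
MAGIC = {0xA1B2C3D4: "microsecond", 0xA1B23C4D: "nanosecond"}
HEADER_FORMAT = "IHHiIII"  # magic, major, minor, thiszone, sigfigs, snaplen, linktype


def parse_pcap_header(data):
    """Decode the 24-byte global header at the start of a classic pcap file."""
    if len(data) < 24:
        raise ValueError("too short to be a pcap file")
    for byte_order in ("<", ">"):  # the magic number reveals the writer's byte order
        magic = struct.unpack(byte_order + "I", data[:4])[0]
        if magic in MAGIC:
            _, major, minor, _, _, snaplen, linktype = struct.unpack(
                byte_order + HEADER_FORMAT, data[:24])
            return {"timestamps": MAGIC[magic], "version": f"{major}.{minor}",
                    "snaplen": snaplen, "linktype": linktype}
    raise ValueError("not a classic pcap file (pcapng is not handled)")


def read_pcap_header(path):
    """Read and decode the global header of the capture file at path."""
    with open(path, "rb") as capture:
        return parse_pcap_header(capture.read(24))
```

For the capture in step 3, read_pcap_header("/archive2/dec1.dmp") would be expected to report a snaplen of 1514 and, if dec1 is an Ethernet interface, link type 1.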
As you can see, network troubleshooting can be quite involved. In practice, the symptoms of the issue will shape how you approach it. For example, if you can ping the remote system, you can continue troubleshooting up the OSI model from there. Above all, keep your cool and troubleshoot systematically.
Copyright © 2005 CyberGuard Corporation All Rights Reserved.
Reprinted with Permission