Figure 1. Link and activity lights can tell you a lot about the connection between two network devices.
All network architectures rely upon a physical (or RF) connection between devices. Troubleshooting almost always starts with a check of the physical layer of the network. Are the cables connected? Are the wall jacks and patch panels properly terminated?
A whole industry has developed around providing test equipment to verify link performance. This is not surprising, since most network problems involve a physical device or connection.
Physical and electrical connections
One of the most useful troubleshooting tools is the computer itself. Almost all network devices come with diagnostic lights. (See Figure 1.) These lights give you basic information about the device's network connection.
The link light indicates that a link has been established with the network device at the other end of the cable. This light should stay on continuously. If yours is not lit, the most likely cause is a problem with the physical connection, although a missing link can also be caused by incorrect network interface card (NIC) drivers.
The activity light indicates network activity. If the link light is not lit, the activity light will also be dark, because you must have a good link before you can see any activity on the network. If the link light is lit and the activity light on your computer is dark while it is blinking on other computers, you have a valid physical and electrical connection, but for some reason the NIC in your computer is not seeing any network traffic. This could be caused by a failed port on the network switch or by a wiring error. If the NIC lights look normal but you are still having problems, your computer has several programs that can help identify networking problems.
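The rules of thumb for reading the link and activity lights can be written down as a small decision function. This is a minimal sketch: the function name and the wording of the results are hypothetical, chosen for illustration, and it only encodes the cases described above.

```python
# Hypothetical result labels for the light-reading rules described in the text.
NO_LINK = "no link: check cable, wall jack, patch panel, then NIC drivers"
NO_TRAFFIC = "link but no traffic: suspect a failed switch port or a wiring error"
LIGHTS_OK = "physical link looks OK: move on to software tools such as ping"


def diagnose_nic_lights(link_on: bool, activity_on: bool,
                        activity_on_other_pcs: bool) -> str:
    """Map the NIC's link/activity lights to the likely problem area."""
    if not link_on:
        # Without a link the activity light stays dark too, so there is
        # nothing more the lights can tell us: look at the physical layer.
        return NO_LINK
    if not activity_on and activity_on_other_pcs:
        # Good link, but this NIC sees no traffic while other machines do.
        return NO_TRAFFIC
    # The link is up and the lights give no further clue.
    return LIGHTS_OK
```

For example, `diagnose_nic_lights(True, False, True)` returns the switch-port/wiring hint, matching the case where your activity light is dark but other computers' lights are blinking.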
Troubleshooting with ping
First, find the IP address of another computer on the network (the target) that is working normally. Next, go to the computer that is having difficulty and open a command-line window. Type “ping [IP address]” (without the quotation marks), where [IP address] is the address of the target computer. If you see a display similar to Figure 2, your system is able to communicate with the target computer.
Figure 2. A ping request lists the IP address of the system you are pinging along with the time the ping messages took to transit the network.
C:\>Ping 10.10.10.2
Pinging 10.10.10.2 with 32 bytes of data:
Reply from 10.10.10.2: bytes=32 time<10ms TTL=128
Reply from 10.10.10.2: bytes=32 time<10ms TTL=128
Reply from 10.10.10.2: bytes=32 time<10ms TTL=128
Reply from 10.10.10.2: bytes=32 time<10ms TTL=128
C:\>
If you see the message “Request timed out,” your computer did not receive a reply from the target computer. This could be caused by a problem in your computer or on the network. It could also mean that the computer you are trying to ping is configured to ignore ping requests. Try pinging the target computer from another system to confirm that it is working properly.
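If you check many machines, it can help to read ping's output with a script rather than by eye. The sketch below assumes the Windows output format shown in Figure 2; other operating systems word their ping output differently, so the regular expression would need adjusting. The `summarize_ping` and `PingSummary` names are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Reply line in the Windows format of Figure 2, for example
# "Reply from 10.10.10.2: bytes=32 time<10ms TTL=128" (or "time=4ms").
REPLY_RE = re.compile(r"Reply from ([\d.]+): bytes=\d+ time[<=](\d+)ms TTL=\d+")


@dataclass
class PingSummary:
    replies: int = 0
    timeouts: int = 0
    times_ms: list[int] = field(default_factory=list)  # "<10ms" is recorded as 10

    @property
    def reachable(self) -> bool:
        # One reply is enough to show the two machines can communicate.
        return self.replies > 0


def summarize_ping(output: str) -> PingSummary:
    """Count replies and timeouts in captured ping output."""
    summary = PingSummary()
    for line in output.splitlines():
        line = line.strip()
        m = REPLY_RE.search(line)
        if m:
            summary.replies += 1
            summary.times_ms.append(int(m.group(2)))
        elif line.startswith("Request timed out"):
            summary.timeouts += 1
        # Other lines (headers, statistics, "Destination host unreachable")
        # are ignored.
    return summary
```

As the text points out, `reachable == False` only tells you that no reply came back; the target may simply be configured to ignore ping requests.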
Trace route command
Another useful utility is the trace route command (traceroute on Unix-like systems, tracert on Windows). Trace route shows not only the path packets take from one computer to another but also how long packets take to reach each hop along the way.
Figure 3 shows how trace route can be used to find areas of network congestion. In this example, I use trace route to find the route between my computer and xyz.com (this is just an example, not the real XYZ.com). You can see that traffic leaves my local network and then travels on bellsouth.net. Transit time across bellsouth is pretty good, generally less than 6ms. But then, on hop 6 at 126.96.36.199, the response time jumps to 24ms. A couple of hops later, the carrier changes to broadwing.net. We can only guess that the jump in response times happens at a meeting point between bellsouth and broadwing. Next, on hop 11, the response time jumps to 63ms. From the host names, it looks as if this jump occurs somewhere within the broadwing network.
Figure 3. The trace route command can reveal the source of network bottlenecks.
C:\>tracert xyz.com
Tracing route to xyz.com [188.8.131.52]
over a maximum of 30 hops:
1 <1ms <1ms <1ms 192.168.1.1
2 4ms 4ms 4ms adsl-33-166-1.asm.bellsouth.net [184.108.40.206]
3 5ms 5ms 4ms 220.127.116.11
4 7ms 4ms 4ms 18.104.22.168
5 5ms 5ms 6ms axr00asm-1-3-1.bellsouth.net [22.214.171.124]
6 24ms 24ms 24ms 126.96.36.199
7 30ms 28ms 25ms 188.8.131.52
8 25ms 26ms 25ms ge-2-1-0.a1.chcg.broadwing.net [184.108.40.206]
9 34ms 30ms 29ms p5-0.gnwd.broadwing.net [220.127.116.11]
10 46ms 46ms 46ms p4-0.c0.ftwo.broadwing.net [18.104.22.168]
11 62ms 63ms 63ms s7-3-0.c1.atln.broadwing.net [22.214.171.124]
12 63ms 62ms 63ms p3-0-0.a1.atln.broadwing.net [126.96.36.199]
13 65ms 65ms 69ms 188.8.131.52
14 65ms 65ms 64ms www.xyz.com [184.108.40.206]
Trace complete.
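The reading of Figure 3 above boils down to comparing each hop's response time with the previous hop's and looking for the biggest increases. The sketch below does that automatically. It assumes output laid out like Figure 3 (hop number, three probe times, host), takes the median of the three probes to damp one-off spikes (a design choice, not something tracert does), and treats "<1ms" as 1 ms. The function names are hypothetical. A large jump only suggests congestion; as noted in the text, distance can also add delay.

```python
import re
from statistics import median

# A hop line: hop number, probe times ("<1ms", "24 ms" or "*"), then the host.
HOP_RE = re.compile(r"^\s*(\d+)((?:\s+(?:<?\d+\s*ms|\*))+)\s+(\S.*?)\s*$")
TIME_RE = re.compile(r"<?(\d+)\s*ms")


def parse_tracert(output: str) -> list[tuple[int, float, str]]:
    """Return (hop number, median probe time in ms, host) for each answered hop."""
    hops = []
    for line in output.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # header, footer, or anything that is not a hop line
        times = [int(t) for t in TIME_RE.findall(m.group(2))]  # "<1ms" -> 1
        if times:  # skip hops where every probe timed out ("* * *")
            hops.append((int(m.group(1)), median(times), m.group(3)))
    return hops


def largest_jumps(hops, top: int = 2) -> list[tuple[float, int, str]]:
    """Return the `top` largest latency increases as (increase in ms, hop, host)."""
    jumps = [
        (ms - prev_ms, hop_no, host)
        for (_, prev_ms, _), (hop_no, ms, host) in zip(hops, hops[1:])
    ]
    jumps.sort(key=lambda j: j[0], reverse=True)
    return jumps[:top]
```

Fed the Figure 3 output, `largest_jumps` picks out hop 6 (an increase of 19ms) and hop 11 (17ms), the same two jumps identified in the text. The same approach works on a trace across a local network.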
In this example, we are observing congestion (or perhaps delays induced by distance) across the Internet. But trace route can be used just as well to identify choke points on a local network.
You might not think security belongs in an article on troubleshooting, but I can tell you from firsthand experience that many security issues manifest themselves initially as problems on the network. If you are the person in charge of maintaining your networks, it is important that you consider security when performance problems arise.
The effects of compromised security on a corporate network can be substantial. An e-mail worm can slow performance as an infected computer sends out hundreds or thousands of e-mails in an attempt to infect other systems.
Viruses can turn computers into zombies: computers that attackers can control remotely. At a prearranged time, the zombies may all try to communicate with a single target in a distributed denial-of-service attack. This coordinated attack can slow network performance as a large number of computers all attempt to communicate with the same host at the same time. Be aware that performance issues can point to a security breach in your organization.
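Both symptoms described above can show up in traffic records: one inside machine opening an unusual number of e-mail (SMTP, TCP port 25) connections, or many machines all contacting the same host. The sketch below assumes you can export flow records as (source, destination, destination port) tuples from a switch or firewall; that record format and both thresholds are hypothetical and would need tuning for a real network.

```python
from collections import Counter, defaultdict

SMTP_PORT = 25  # standard port for outgoing e-mail (SMTP)


def suspicious_hosts(flows, mail_threshold: int = 100, fan_in_threshold: int = 50):
    """Flag possible mass mailers and possible denial-of-service targets.

    flows: iterable of (src_ip, dst_ip, dst_port) tuples (hypothetical format).
    Returns (sources with many SMTP connections,
             destinations contacted by many distinct sources).
    """
    smtp_counts = Counter()
    sources_per_dst = defaultdict(set)
    for src, dst, port in flows:
        if port == SMTP_PORT:
            smtp_counts[src] += 1
        sources_per_dst[dst].add(src)
    mass_mailers = sorted(h for h, n in smtp_counts.items() if n >= mail_threshold)
    ddos_targets = sorted(
        d for d, srcs in sources_per_dst.items() if len(srcs) >= fan_in_threshold
    )
    return mass_mailers, ddos_targets
```

A legitimate mail server also sends a great deal of SMTP traffic, so in practice you would exclude it before treating a flagged host as infected.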
If most people on your network use a central server, the problem could actually be an overloaded server. Almost all servers include diagnostic tools that show how heavily loaded they are.
The cause of server performance problems depends on how clients use the server. Database applications are computationally intensive and can require large amounts of processor and memory resources. Certain operating systems (Windows Server 2003, for example) require large amounts of available memory. If there is not enough memory available in the server, performance suffers dramatically.
As you might expect, streaming servers require a lot of bandwidth, both at the NIC and on the server's internal bus. So, when users complain of slow network performance, be sure to consider whether the users all connect to a common server, and check the server to see if it is resource-starved.
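Checking whether a server is resource-starved amounts to comparing each resource's usage with its capacity. The sketch below covers the processor, memory and NIC bandwidth mentioned above (bus bandwidth is not modeled). How you collect the numbers depends on the server's own diagnostic tools; the `ServerSample` structure and the 90 percent default limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServerSample:
    cpu_percent: float       # processor utilization, 0-100
    mem_used_bytes: int
    mem_total_bytes: int
    nic_bits_per_sec: float  # measured NIC throughput
    nic_capacity_bps: float  # e.g. 1e9 for a gigabit NIC


def starved_resources(s: ServerSample, limit: float = 0.9) -> list[str]:
    """Return the resources running at or above `limit` of their capacity."""
    usage = {
        "processor": s.cpu_percent / 100.0,
        "memory": s.mem_used_bytes / s.mem_total_bytes,
        "network": s.nic_bits_per_sec / s.nic_capacity_bps,
    }
    return [name for name, fraction in usage.items() if fraction >= limit]
```

A single sample can be misleading; watching these figures while users are actually complaining gives a better picture of where the server runs short.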
There is one last place to look when you are troubleshooting network problems: log files. Most modern operating systems create them, and they can be very helpful. If you are a server administrator, you should look at log files every day. Not only can they help you identify network issues, but they can also provide early warning of security issues.
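A daily log review is easier if warnings and errors are pulled out first. The sketch below assumes a plain-text log with a hypothetical line format ("date time LEVEL source: message"); real logs vary by system, and some (such as the Windows event logs) are not plain text, so the pattern is only a placeholder to adapt.

```python
import re
from collections import Counter

# Hypothetical line format: "<date> <time> LEVEL source: message"
LINE_RE = re.compile(
    r"^\S+ \S+ (?P<level>INFO|WARNING|ERROR) (?P<source>[^:]+): (?P<msg>.*)$"
)


def daily_log_review(lines):
    """Count entries per level and collect every warning and error."""
    levels = Counter()
    flagged = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # line does not follow the assumed format
        levels[m["level"]] += 1
        if m["level"] != "INFO":
            flagged.append((m["level"], m["source"], m["msg"]))
    return levels, flagged
```

Scanning only the flagged entries each morning keeps the review short enough to do every day.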
When you have problems with your network, you should start by checking the physical and electrical connections to the computer. You can then ping other computers to see if your computer is able to communicate across the network. If you experience performance problems on the network, you can use trace route to see where the problems come from. Remember, problems that appear to be network-related may actually be caused by a lack of available resources on a central server. Finally, if your network suddenly develops problems, remember that the problem may be caused by a security breach.
Brad Gilmer is president of Gilmer & Associates, executive director of the AAF Association and executive director of the Video Services Forum.