HOWTO - Basic Network Troubleshooting / Understanding
Networking is often considered complex, and hard to debug and manage. However, Linux (and thus Ubuntu) provides numerous tools for figuring out exactly what's going wrong on your network, and how to fix it. The real problem is that most people don't understand networking as well as they should. Hopefully, this HOWTO will start you down the long, and sometimes hairy, road of figuring out exactly what's going wrong. This guide isn't intended to be all-inclusive. It's the first of three networking guides I plan on posting, and the most basic of them.
The Basic Formula
This is a list of basic steps you should take in troubleshooting your network. Explanations will follow.
Is the interface configured correctly? (lspci, lsmod, dmesg, ifconfig, /etc/network/interfaces)
Are DNS and hostnames configured correctly? (BIND, /etc/hosts, /etc/resolv.conf)
Are the ARP tables correct? (arp -a)
Can you ping localhost? (ping localhost and ping 127.0.0.1; try both)
Can you ping other hosts on the local network by IP address? How about by hostname? (ping)
Can you ping hosts on another network, such as the internet? (ping)
Do applications like ssh, Firefox, sftp, etc. work? (the application itself)
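For the last step, one quick way to test an application's port without the application itself is to try opening a TCP connection to it. This is a minimal sketch in Python; tcp_port_open is a hypothetical helper name, and port 22 for ssh is only the default. A successful connection shows that something is listening, not that the application behind it is healthy.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    A rough stand-in for "does the application work?": if nothing answers on
    the application's port (e.g. 22 for ssh), the application won't work either.
    """
    try:
        # create_connection resolves the hostname and tries each address it gets
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False
```

For example, tcp_port_open("192.168.0.1", 22) asks whether an ssh server is listening on the router; try it with both the hostname and the IP address, for the same reason you try both with ping.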
This seems rather verbose. However, all you're really doing is moving up (or down) the layers of the network model. Huh, you say? The layers are just different levels of the network, each one building on the one below it. It's easier to see in a layout:
Application Layer (ssh, telnet, firefox)
Transport Layer (flow control, etc.)
Network Layer (addressing, routing)
Link Layer (hardware / device drivers)
Physical Layer (the actual cable or other physical media)
If you look at my troubleshooting steps above, you'll see that we move up the layers. This is always a good idea when troubleshooting networks: move up or down the layers, and don't skip any steps.
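The "don't skip any steps" rule boils down to an ordered loop that stops at the first failure, since a failure low in the stack explains the failures above it. A minimal sketch in Python; the step names and check functions below are hypothetical placeholders, and in practice each check would wrap a real command such as ping.

```python
from typing import Callable, List, Optional, Tuple

# Each step is (description, check); a check returns True when that layer looks healthy.
Step = Tuple[str, Callable[[], bool]]


def first_failing_step(steps: List[Step]) -> Optional[str]:
    """Run the checks in order, lowest layer first, and stop at the first failure.

    Returns the description of the failing step, or None if every step passed.
    Nothing above a failed step is run: there's no point debugging ssh when
    ping to the gateway already fails.
    """
    for description, check in steps:
        if not check():
            return description
    return None


# Hypothetical results, lowest layer first
steps: List[Step] = [
    ("interface up", lambda: True),
    ("ping localhost", lambda: True),
    ("ping gateway", lambda: False),
    ("ssh works", lambda: True),
]
print(first_failing_step(steps))  # prints: ping gateway
```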
OK, so you're reading this and your mind feels like it's frying. What the hell is dataw0lf talking about??? I'M A NEWBIE, FOR GOD'S SAKE! Well, I'm going to explain some simple commands that'll make your life (and hopefully mine) a bit easier. Before that, however, we'll go over routing tables and DNS very, very quickly. I'll cover more complex protocols in the intermediate guide.
Want to see your routing table? Open a prompt and type either netstat -nr or route. Either one will list your routing table. Well, what does that mean? Let's take a look.
A brief explanation of the -nr options: -n means numeric output (i.e., IP addresses instead of hostnames), and -r means print the routing table.
dataw0lf@darktower:~ $ netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
0.0.0.0         192.168.0.1     0.0.0.0         UG        0 0          0 eth0
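If you'd rather check for a default route from a script, this output is easy to parse. A minimal sketch in Python, assuming the column layout shown above; SAMPLE simply copies that output, and parse_netstat_nr and default_gateway are hypothetical helper names. Because the columns are read from the header line, the same parser should also cope with route -n, which prints Metric/Ref/Use where netstat prints MSS/Window/irtt.

```python
from typing import Dict, List, Optional

# Copied from the netstat -nr output above
SAMPLE = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.0.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
0.0.0.0         192.168.0.1     0.0.0.0         UG        0 0          0 eth0
"""


def parse_netstat_nr(text: str) -> List[Dict[str, str]]:
    """Turn routing-table output into one dict per route, keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    # The header row names the columns; every row after it is a route
    header_idx = next(i for i, line in enumerate(lines) if line.startswith("Destination"))
    columns = lines[header_idx].split()
    routes = []
    for line in lines[header_idx + 1:]:
        fields = line.split()
        if len(fields) == len(columns):
            routes.append(dict(zip(columns, fields)))
    return routes


def default_gateway(routes: List[Dict[str, str]]) -> Optional[str]:
    """Return the gateway of the default route (destination 0.0.0.0, G flag), if any."""
    for route in routes:
        if route["Destination"] == "0.0.0.0" and "G" in route["Flags"]:
            return route["Gateway"]
    return None
```

With SAMPLE, default_gateway(parse_netstat_nr(SAMPLE)) gives "192.168.0.1"; a result of None is your cue to add a default route.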
To understand IP routing, you just have to understand one thing: it's 'next hop' routing. At any given point in the network, a host only has to figure out the NEXT host or router to send a packet to so that it gets closer to its destination.
This is where 'default' routes come in. The 0.0.0.0 destination in my routing table is the 'default route'. When I send a packet to yahoo.com (once DNS has turned the name into an IP address), my routing table is searched. If that address doesn't fall within a more specific route (and it obviously isn't on my 192.168.0.0 network), the packet is sent via the default route to its gateway, 192.168.0.1 (the G flag in Flags marks a route that goes through a gateway). That IP happens to be my router/firewall, which in turn routes the packet out into the wilderness of the internet. Always check whether the box you're having trouble with has a default route. Adding one is easy as pie (it needs root, hence sudo):
sudo route add default gw <gateway-ip-address>
The gateway is usually going to be your DSL modem / router / firewall.
If you can wrap your mind around that, you understand routing.
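Next-hop selection can be sketched in a few lines. This is a simplified model, not the kernel's actual code: it picks the most specific matching route (longest prefix) and ignores metrics, policy routing and everything else. The two routes come from the netstat -nr output above; next_hop is a hypothetical helper name.

```python
import ipaddress
from typing import List, Tuple

# (destination, genmask, gateway) from the routing table above.
# A gateway of 0.0.0.0 means "directly connected, no router needed".
ROUTES: List[Tuple[str, str, str]] = [
    ("192.168.0.0", "255.255.255.0", "0.0.0.0"),
    ("0.0.0.0", "0.0.0.0", "192.168.0.1"),
]


def next_hop(destination_ip: str, routes: List[Tuple[str, str, str]] = ROUTES) -> str:
    """Return where a packet for destination_ip is sent next.

    The most specific matching route wins. The 0.0.0.0/0.0.0.0 default route
    matches every address, so it is used only when nothing more specific matches.
    """
    addr = ipaddress.IPv4Address(destination_ip)
    best = None
    for dest, mask, gateway in routes:
        network = ipaddress.IPv4Network(f"{dest}/{mask}")  # e.g. 192.168.0.0/24
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, gateway)
    if best is None:
        raise LookupError(f"no route to {destination_ip}")
    gateway = best[1]
    # Directly connected: deliver straight to the host itself
    return destination_ip if gateway == "0.0.0.0" else gateway
```

next_hop("192.168.0.42") returns the host's own address, because it's on the local network, while any internet address ends up at 192.168.0.1. Remove the default route from ROUTES and the same lookup raises LookupError, which is the "no route to host" situation you get on a box without a default route.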
DNS can be very complex to set up (e.g., with BIND). However, here's a very simple explanation: DNS maps names (dataw0lf.org) to IP addresses (18.104.22.168). This is why, when you check hosts on your network, you should try both the hostname AND the IP address.
A brief anecdote on why this is important:
At work, I installed a fiber network card in an employee's workstation. It was running Windows 2000 Pro, and I figured I'd just stick it in real quick before getting to some more pressing issues. So I only made sure it was detected and operating (and removed the Ethernet NIC). What a mistake.
Three days later, the same employee couldn't attach our backup software's GUI to an SSH X session. I went through complex ssh configs and tried everything. After three hours of futile troubleshooting, I remembered the fiber card, and that I hadn't set its IP to match the old NIC's. He was connecting by hostname, which still pointed at the old IP, and that produced ssh authentication errors. Three hours, when attaching the display by IP address as well as by hostname would have shown me the problem right away.
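The moral of that story is a name that still resolves to an old address. A minimal sketch of checking for it, assuming a hosts-file style table; HOSTS_TEXT, the hostnames, and the "new" IP used below are hypothetical. In a real script you'd more likely call socket.gethostbyname(), which consults /etc/hosts and DNS according to /etc/nsswitch.conf.

```python
from typing import Dict

# Hypothetical hosts-file contents: 'workstation' still points at the old NIC's IP.
HOSTS_TEXT = """\
127.0.0.1       localhost
192.168.0.6     darktower
192.168.0.20    workstation    # old Ethernet NIC
"""


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each hostname (and alias) in hosts-file text to its IP address."""
    table: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # this sketch keeps the first entry for a name
    return table


def name_matches_ip(name: str, expected_ip: str, table: Dict[str, str]) -> bool:
    """True if name maps (via the table) to the IP you expect that host to have."""
    return table.get(name) == expected_ip
```

Checking 'workstation' against the card's new address (say, a hypothetical 192.168.0.21) returns False, which is exactly the clue that would have saved those three hours.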
Now, on to the tools.
ping is THE tool. Just because it's simple doesn't mean it shouldn't be used. ping returns all sorts of goodies. Plus, it's a great baseline: if you can't ping something, you probably can't connect to it through a higher level application (ssh and the like). Here's an example of ping at work:
First off: the -c option sets how many packets ping sends. As you can see, I decided to send 3 (just to keep it brief). Without -c, ping keeps going until you stop it with Ctrl-C, but that's no fun.
dataw0lf@darktower:~ $ ping -c 3 yahoo.com
PING yahoo.com (22.214.171.124) 56(84) bytes of data.
64 bytes from w2.rc.vip.scd.yahoo.com (126.96.36.199): icmp_seq=1 ttl=55 time=65.5 ms
64 bytes from w2.rc.vip.scd.yahoo.com (188.8.131.52): icmp_seq=2 ttl=56 time=65.1 ms
64 bytes from w2.rc.vip.scd.yahoo.com (184.108.40.206): icmp_seq=3 ttl=56 time=65.4 ms
--- yahoo.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 65.150/65.378/65.568/0.342 ms
As you can see, I'm pinging yahoo.com. ping gives me the IP address (definitely useful, and it means DNS is working, since yahoo.com got resolved to an IP address), plus some other rather cryptic stuff. We'll go over this briefly.
Now, look at icmp_seq. As you might guess, icmp_seq is the ICMP sequence number, and it should go up by one with each reply. If the numbers skip (i.e., you get a 1, then a 4, then a 6), the replies in between were lost. A healthy network rarely drops these. TCP/IP isn't foolproof, but if you're seeing major gaps in your icmp_seq, you're losing packets somewhere, and you'll experience lag that you normally shouldn't. I'm not going to explain how to track that down (with traceroute) here; that will be for my 'intermediate course'.
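Both things are easy to check from a script: gaps in icmp_seq and the statistics at the bottom of the output. A minimal sketch in Python, assuming the Linux ping output format shown above; the helper names and the keys of the returned dict are my own, not anything ping defines.

```python
import re
from typing import Dict, List


def missing_icmp_seqs(ping_output: str) -> List[int]:
    """List icmp_seq numbers missing between the lowest and highest reply seen.

    Replies lost after the last one received don't show up as a gap; compare
    the summary's transmitted/received counts to catch those.
    """
    seen = sorted({int(n) for n in re.findall(r"icmp_seq=(\d+)", ping_output)})
    if not seen:
        return []
    present = set(seen)
    return [n for n in range(seen[0], seen[-1] + 1) if n not in present]


def parse_ping_summary(ping_output: str) -> Dict[str, float]:
    """Pull packet counts, loss and round-trip times out of ping's summary lines."""
    counts = re.search(r"(\d+) packets transmitted, (\d+) received", ping_output)
    loss = re.search(r"([\d.]+)% packet loss", ping_output)
    if counts is None or loss is None:
        raise ValueError("no ping statistics found")
    summary: Dict[str, float] = {
        "transmitted": int(counts.group(1)),
        "received": int(counts.group(2)),
        "loss_percent": float(loss.group(1)),
    }
    # The rtt line is missing entirely when every packet was lost.
    rtt = re.search(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms", ping_output)
    if rtt:
        keys = ("rtt_min_ms", "rtt_avg_ms", "rtt_max_ms", "rtt_mdev_ms")
        for key, value in zip(keys, rtt.groups()):
            summary[key] = float(value)
    return summary
```

Fed the yahoo.com run above, missing_icmp_seqs finds no gaps (1, 2, 3) and parse_ping_summary reports 3 transmitted, 3 received, 0% loss and an average of 65.378 ms.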
Ah, ye ifconfig. It can tell you most of what you need to know about an interface.
This gives me my IP, my MAC address (HWaddr), RX/TX packet counts, the interrupt the card is on, etc. RX means received, TX means transmitted. This is another easy way to check whether your interface actually came up correctly. I'll explain broadcasting and netmasks in the next guide, when I explain TCP/IP in detail.
dataw0lf@darktower:~ $ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0A:E6:C6:07:85
          inet addr:192.168.0.6  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20a:e6ff:fec6:785/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:18458 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8982 errors:0 dropped:0 overruns:0 carrier:0
          RX bytes:4015093 (3.8 MiB)  TX bytes:1449812 (1.3 MiB)
          Interrupt:10 Base address:0xd400
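The fields worth checking first (is the interface UP, does it have an address, are the error and drop counters climbing?) can be pulled out with a few regular expressions. A minimal sketch in Python, assuming the older net-tools ifconfig layout shown above; newer ifconfig versions print "inet 192.168.0.6 netmask ..." instead, which these patterns won't match. parse_ifconfig is a hypothetical helper, and IFCONFIG_SAMPLE copies the output above.

```python
import re
from typing import Dict, Optional

# Copied from the ifconfig output above
IFCONFIG_SAMPLE = """\
eth0      Link encap:Ethernet  HWaddr 00:0A:E6:C6:07:85
          inet addr:192.168.0.6  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20a:e6ff:fec6:785/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:18458 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8982 errors:0 dropped:0 overruns:0 carrier:0
          RX bytes:4015093 (3.8 MiB)  TX bytes:1449812 (1.3 MiB)
          Interrupt:10 Base address:0xd400
"""


def parse_ifconfig(text: str) -> Dict[str, object]:
    """Pull the interesting fields out of one interface's old-style ifconfig output."""
    def grab(pattern: str) -> Optional[str]:
        m = re.search(pattern, text, re.MULTILINE)
        return m.group(1) if m else None

    info: Dict[str, object] = {
        "interface": grab(r"^(\S+)\s+Link encap"),
        "hwaddr": grab(r"HWaddr (\S+)"),
        "inet": grab(r"inet addr:(\S+)"),  # does not match the inet6 line
        "mask": grab(r"Mask:(\S+)"),
        # The flags line starts with UP only when the interface is up
        "up": re.search(r"^\s+UP\b", text, re.MULTILINE) is not None,
    }
    for direction in ("RX", "TX"):
        m = re.search(rf"{direction} packets:(\d+) errors:(\d+) dropped:(\d+)", text)
        if m:
            prefix = direction.lower()
            info[f"{prefix}_packets"] = int(m.group(1))
            info[f"{prefix}_errors"] = int(m.group(2))
            info[f"{prefix}_dropped"] = int(m.group(3))
    return info
```

On the sample, this reports eth0 as up with address 192.168.0.6 and zero RX/TX errors and drops. Nonzero error counts that keep growing between runs usually point down the stack, at the cable, card or driver.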
These are the very basics of network debugging. In the next guide, we'll cover TCP/IP, tcpdump, and traceroute.