Beginners programming challenge #28
Seeing as challenge #27 is now over 4 months old with no #28 in sight, and since I have (what I think is) an interesting idea for a new challenge, I have taken the liberty of creating it myself. As you will see, it will help you develop a somewhat different (but no less important) set of skills than most previous challenges. So without further ado...
Welcome to the 28th Beginners programming challenge.
Most previous challenges asked you to work with text or numeric data. By contrast, this one will ask you to work with binary data. I also wanted to make it relevant, so you will work with real-world data. Namely, you will implement an ARP packet analyser. Don't worry, it's less scary than it sounds. ;) First, some background.
Background: The ARP Protocol
(To simplify the discussion, we only consider two machines that are in the same LAN segment, meaning they can communicate directly, without any router between them.)
Most people know that machines on a network are identified by an IP address. A bit less known is that machines are also identified by a hardware (or MAC) address. You can see it for example with the command ifconfig:
Code:
firas@aoba ~ % ifconfig eth1
eth1 Link encap:Ethernet HWaddr 00:25:00:48:09:8c
inet addr:192.168.1.19 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::225:ff:fe48:98c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:93066 errors:0 dropped:0 overruns:0 frame:38438
TX packets:50786 errors:19 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:130634360 (130.6 MB) TX bytes:4352298 (4.3 MB)
Interrupt:23
Here the IP address of my machine is 192.168.1.19, and its hardware address is 00:25:00:48:09:8c. Generally, when a user wants to use the network, they specify only the IP address of the machine they want to communicate with. I am not going to delve into the reason for having two addresses (IP and hardware) in the first place, but the fact is that in order to communicate with another machine on the network using the IP protocol, a machine needs to know both the IP address and the hardware address of that other machine. The IP address is specified by the user, but the hardware address is not, so how does a machine obtain the hardware address of the machine it wants to communicate with? This is where ARP (the Address Resolution Protocol) kicks in.
Remember that my machine has IP address 192.168.1.19 and hardware address 00:25:00:48:09:8c; let's call it machine A. Suppose I want to send an IP packet to the machine with IP address 192.168.1.1; let's call it machine B. First, machine A needs to acquire the hardware address of machine B. In order to do that, it simply sends a packet to every other machine on the network, saying in effect: "Hi, I am 00:25:00:48:09:8c, I have IP address 192.168.1.19, and I would like to know who here has IP address 192.168.1.1." Assuming there actually is a machine on the network that has IP address 192.168.1.1, it will reply with a packet stating its hardware address, saying in effect: "Hi, I am 38:46:08:d1:83:97, and it is I who has IP address 192.168.1.1." Machine A then has all the information it needs in order to send an IP packet to machine B. It will also store the hardware address of machine B in its ARP table, so that it does not have to perform an ARP lookup every time. You can see your machine's ARP table with the aptly named arp command:
Code:
firas@aoba ~ % arp -n
Address HWtype HWaddress Flags Mask Iface
192.168.1.1 ether 38:46:08:d1:83:97 C eth1
The format of an ARP packet is defined in several RFCs, but the Wikipedia article (especially section 2, Packet structure) will be sufficient for this task. ARP packets are encapsulated in Ethernet frames, so you will also need the Ethernet frame Wikipedia article.
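For orientation, the layout described in those two articles can be written down as byte offsets into the captured frame. This is only a sketch, assuming an untagged Ethernet II header (14 bytes) carrying ARP for IPv4 over Ethernet, which is the case shown in the sample run further down the thread. The constant names and arp_operation() are made up for illustration and do not come from any library.

```c
/* Byte offsets from the start of the frame. Assumption: untagged
   Ethernet II header followed by an ARP packet for IPv4 over Ethernet. */
enum {
    ETH_DST   = 0,   /* destination hardware address, 6 bytes */
    ETH_SRC   = 6,   /* source hardware address, 6 bytes */
    ETH_TYPE  = 12,  /* EtherType, 2 bytes (0x0806 means ARP) */
    ARP_HTYPE = 14,  /* hardware type, 2 bytes (1 means Ethernet) */
    ARP_PTYPE = 16,  /* protocol type, 2 bytes (0x0800 means IPv4) */
    ARP_HLEN  = 18,  /* hardware address length, 1 byte */
    ARP_PLEN  = 19,  /* protocol address length, 1 byte */
    ARP_OPER  = 20,  /* operation, 2 bytes (1 request, 2 reply) */
    ARP_SHA   = 22,  /* sender hardware address, 6 bytes */
    ARP_SPA   = 28,  /* sender protocol address, 4 bytes */
    ARP_THA   = 32,  /* target hardware address, 6 bytes */
    ARP_TPA   = 38,  /* target protocol address, 4 bytes */
    FRAME_MIN_LEN = 42  /* 14 bytes of Ethernet header + 28 bytes of ARP */
};

/* Returns the ARP operation code: 1 for a request, 2 for a reply.
   Multi-byte fields are stored most significant byte first. */
unsigned int arp_operation(const unsigned char *frame)
{
    return ((unsigned int)frame[ARP_OPER] << 8) | frame[ARP_OPER + 1];
}
```

Every other field can be read the same way, starting from its offset; printing the addresses is left to you.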
Task
Your task is simply to write a program that reads a copy of an ARP packet and prints the information it contains, such as whether the packet is a request or a reply, and the addresses of the two machines involved. Sample request and reply packets are available in the attached archive. The files request.bin and reply.bin are the raw request and reply packets, which your program will take as input. The files suffixed .hexdump are hexadecimal dumps of the corresponding packets in text format, for easier visualisation. (If you would like to capture packets yourself, see post #4 below.)
Before you start coding, you should get familiar with the structure of an ARP packet (and of an Ethernet frame that contains one). To that end you can simply look at the hexdumps in your favourite text editor, or, even better, open them in Wireshark. Wireshark is available in the Ubuntu repositories (package wireshark). Simply run it, click File > Import, and open the hexdump file of your choice, keeping all the other options at their default values. Examining the packets in Wireshark will let you see exactly where in the packet each piece of information is stored, for example:
http://imageshack.us/a/img842/6445/wireshark.th.png
You can assume that all input packets are correctly formatted. Also, I have included the encapsulating Ethernet frame only to make opening the packets in Wireshark easier; you can just skip over the Ethernet data in your program.
Cookie points
Cookie points will be awarded for the following extras:
- Drop the assumption that all packets are correctly formatted, and handle incorrect packets gracefully.
- Also print the information contained in the Ethernet frames.
- Make your program support both input formats (raw and hexdump in the same format as in the provided archive).
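For the first cookie point, one possible approach is to validate the fixed fields before printing anything. The sketch below assumes the buffer holds an untagged Ethernet II frame and that only ARP for IPv4 over Ethernet needs to be accepted; check_arp_frame() and the arp_check values are hypothetical names chosen for this example.

```c
#include <stddef.h>

/* Possible outcomes of the check (illustrative names). */
enum arp_check {
    ARP_OK = 0,
    ARP_TOO_SHORT,      /* fewer than 14 + 28 bytes available */
    ARP_NOT_ARP,        /* EtherType is not 0x0806 */
    ARP_UNSUPPORTED,    /* not Ethernet/IPv4, or unexpected address lengths */
    ARP_BAD_OPERATION   /* operation is neither request (1) nor reply (2) */
};

/* Reads a 16-bit field stored most significant byte first. */
static unsigned int be16(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) | p[1];
}

/* Checks the fields whose values are fixed for ARP over Ethernet/IPv4.
   The length is tested first so that no later read goes out of bounds. */
enum arp_check check_arp_frame(const unsigned char *frame, size_t len)
{
    if (len < 42)
        return ARP_TOO_SHORT;
    if (be16(frame + 12) != 0x0806)
        return ARP_NOT_ARP;
    if (be16(frame + 14) != 1 || be16(frame + 16) != 0x0800 ||
        frame[18] != 6 || frame[19] != 4)
        return ARP_UNSUPPORTED;

    unsigned int oper = be16(frame + 20);
    if (oper != 1 && oper != 2)
        return ARP_BAD_OPERATION;

    return ARP_OK;
}
```

A caller would print a message matching the returned value and exit with a non-zero status instead of printing garbage. Frames longer than 42 bytes are accepted on purpose, since captured frames may carry padding.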
Disqualified Entries
Any overly obfuscated code will be immediately disqualified, regardless of the programmer's skill. Please remember that these challenges are for beginners and therefore the code should be easily readable and well commented.
Any non-beginner entries will not be judged. Please use common sense when posting code examples. Please do not give beginners a copy paste solution before they have had a chance to try this for themselves.
Assistance
If you require any help with this challenge please do not hesitate to come and chat to the development focus group. We have a channel on irc.freenode.net #ubuntu-beginners-dev
Or you can pm me
Have fun,
Happy coding
Bachstelze
Re: Beginners programming challenge #28
http://goo.gl/0UDto (the code)
A sample run with the example packets:
Code:
nfm@FX6840:~/dev/arp$ gcc -Wall -o arp main.c
nfm@FX6840:~/dev/arp$ ./arp
ETHERNET HEADER
--------------------
Destination MAC Address: ff:ff:ff:ff:ff:ff
Source MAC Address: ca:01:33:65:00:08
Ethertype: 0x0806
ARP PACKET
------------------
Hardware Type: 1
Protocol Type: 0x0800
Packet Type: REQUEST
Sender MAC Address: ca:01:33:65:00:08
Sender Protocol Address: 192.168.1.1
Target MAC Address: 00:00:00:00:00:00
Target Protocol Address: 192.168.1.2
nfm@FX6840:~/dev/arp$ rm packet.bin
nfm@FX6840:~/dev/arp$ cp reply.bin packet.bin
nfm@FX6840:~/dev/arp$ ./arp
ETHERNET HEADER
--------------------
Destination MAC Address: ca:01:33:65:00:08
Source MAC Address: cc:03:33:76:00:00
Ethertype: 0x0806
ARP PACKET
------------------
Hardware Type: 1
Protocol Type: 0x0800
Packet Type: REPLY
Sender MAC Address: cc:03:33:76:00:00
Sender Protocol Address: 192.168.1.2
Target MAC Address: ca:01:33:65:00:08
Target Protocol Address: 192.168.1.1
I think it could be improved, but I'm getting lazy. Also I'm trying to learn Dvorak typing at the same time so typing the code is frustrating. What do you think? I can handle criticism :p
Re: Beginners programming challenge #28
On lines 45 and 46 you have a semicolon that should not be there, I assume it's a typo.
Compiling with all warnings enabled gives:
Code:
firas@ichiyoh arp % gcc -std=c99 -pedantic -Wall -Wextra -o arptool arptool.c
arptool.c:31: warning: unused parameter 'argc'
arptool.c:31: warning: unused parameter 'argv'
arptool.c: In function 'getData':
arptool.c:81: warning: comparison is always false due to limited range of data type
arptool.c:91: warning: comparison is always false due to limited range of data type
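This is a common mistake: fgetc() returns an int so that EOF (a negative value) can be told apart from every valid byte value. You need to store the result in an int, check that it is not EOF, and only then store it in a variable of type unsigned char. An unsigned char can never compare equal to EOF, which is what the warnings are telling you.
Also, hardcoding the input file path is not really good practice; you should take it as a command-line argument. And you need to check that the input file was opened successfully, otherwise your program will most likely segfault when the file is missing.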
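The two points above (keeping the result of fgetc() in an int, and checking that fopen() succeeded) could look roughly like this. It is a minimal sketch, not a fix for the posted program; read_packet() is a hypothetical helper name.

```c
#include <stdio.h>
#include <stddef.h>

/* Reads at most cap bytes from the file at path into buf.
   Returns the number of bytes read, or -1 if the file cannot be opened. */
long read_packet(const char *path, unsigned char *buf, size_t cap)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;  /* let the caller report the error instead of crashing */

    size_t n = 0;
    int c;  /* int, not unsigned char: EOF must stay distinct from byte 0xff */
    while (n < cap && (c = fgetc(fp)) != EOF)
        buf[n++] = (unsigned char)c;

    fclose(fp);
    return (long)n;
}
```

In main(), you would first check that argc is at least 2, then call read_packet(argv[1], buf, sizeof buf), and print an error message (for example with perror()) if it returns -1.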
Re: Beginners programming challenge #28
For those interested, here's how to capture ARP (and other) packets for yourself with Wireshark:
1. Install Wireshark if you haven't installed it already. By default, only root can capture traffic, which is inconvenient, so it is better to allow yourself to do it (I assume Debian/Ubuntu, see here for other OSes). Run
Code:
sudo dpkg-reconfigure wireshark-common
and choose "Yes". Add yourself to the wireshark group, then log out and log back in.
2. Find a machine on your LAN that is not in your ARP table. If there is none, delete an entry from it with
Code:
sudo arp -d <IP_ADDRESS>
3. Run Wireshark. In the "Interface List" panel, click the interface you want to capture on.
http://imageshack.us/a/img688/8418/wireshark1.th.png
Then try to send an IP packet to the other machine, with e.g. ping:
http://imageshack.us/a/img820/4953/wireshark2.th.png
4. To save a packet in binary form (to use in your program), select a packet and click the "Frame" header so that all the bytes of the packet are selected:
http://imageshack.us/a/img31/5363/wireshark3.th.png
Then click File > Export > Selected packet bytes.
5. To create a hexdump of your packet in a format similar to the ones I provided, open a terminal and do:
Code:
od -Ax -tx1 -v packet.bin > packet.bin.hexdump
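If you want to read such a hexdump back into bytes (the third cookie point), each line produced by the od command above consists of a hexadecimal offset followed by up to 16 two-digit hexadecimal byte values, and the last line holds only an offset. Here is a sketch of parsing one such line, assuming exactly that layout; parse_hexdump_line() is a made-up name.

```c
#include <stdlib.h>
#include <stddef.h>

/* Parses one line of "od -Ax -tx1 -v" style output into out.
   Returns the number of bytes stored (0 for an offset-only or empty line). */
size_t parse_hexdump_line(const char *line, unsigned char *out, size_t cap)
{
    char *end;

    /* The first field is the offset; parse it and throw it away. */
    (void)strtoul(line, &end, 16);
    if (end == line)
        return 0;  /* no offset at all: not a dump line */

    size_t n = 0;
    const char *p = end;
    while (n < cap) {
        unsigned long v = strtoul(p, &end, 16);
        if (end == p)
            break;  /* no more hexadecimal fields on this line */
        if (v > 0xff)
            break;  /* not a single byte: stop rather than store garbage */
        out[n++] = (unsigned char)v;
        p = end;
    }
    return n;
}
```

Calling this on each line obtained with fgets() and appending the results gives the same byte array as reading the raw .bin file.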
Re: Beginners programming challenge #28
Quote:
Originally Posted by Bachstelze
On lines 45 and 46 you have a semicolon that should not be there, I assume it's a typo.
Hmm, it's not there in my original file, must have been a copypasta glitch. :confused:
Thanks for the feedback! I would never have thought about the EOF thing.
Re: Beginners programming challenge #28
356 views and only one entry, I thought it would be more popular than that...
Re: Beginners programming challenge #28
Quote:
Originally Posted by Bachstelze
356 views and only one entry, I thought it would be more popular than that...
I don't have a lot of free time, but the challenge is really interesting and seems pretty doable. I'll surely try it if I can; I'm trying to learn Caml these days, so I might give it a shot with that.
Even if I don't I've learned something new about ARP today, so, well, thank you for the input anyway.
Re: Beginners programming challenge #28
I have a question about the Ethernet frame. This is the first time I have looked at this kind of subject, so it might be a dumb question.
As far as I can understand, endianness is the way a binary number is represented. A big endian representation has the most significant digit coming first (or left), and a little endian one has it last (or right). The usual notation for a binary number written by hand is big endian.
The ethernet frame is, as far as I discovered, big endian on the bytes, and little endian on the bits; does that mean that the decimal number 270 would be represented as 0x80 0x70, instead of 0x01 0x0e?
If so, is the payload represented in the same way? I mean, if in the payload all I wanted to send was 0x01 0x0e, would it be represented as 0x80 0x70?
Re: Beginners programming challenge #28
Bit-endianness does not matter here because we read entire bytes, not bits, so you can treat the Ethernet frame as just an array of bytes that appear in the same order and have the same values as what you see when you do a hexdump. Ethernet frames are transmitted bit-little-endian only "on the wire"; at each end the bytes are reconstructed and bit-endianness ceases to matter.
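To make this concrete with the number from the question: 270 is 0x010e, and in a 16-bit field of the frame it appears as the byte 0x01 followed by the byte 0x0e, exactly as in the hexdump. A small sketch of rebuilding the value from the byte array (read_be16() is just an illustrative name):

```c
#include <stdint.h>

/* Combines two consecutive bytes, most significant byte first,
   into a single 16-bit value. Works the same on any host CPU
   because it never reinterprets memory, it only shifts values. */
uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned int)p[0] << 8) | p[1]);
}
```

With the bytes 0x01 0x0e this yields 270, and with 0x08 0x06 it yields 0x0806, the EtherType shown in the sample run earlier in the thread.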
Re: Beginners programming challenge #28
Quote:
Originally Posted by Bachstelze
Bit-endianness does not matter here because we read entire bytes, not bits, so you can treat the Ethernet frame as just an array of bytes that appear in the same order and have the same values as what you see when you do a hexdump. Ethernet frames are transmitted bit-little-endian only "on the wire"; at each end the bytes are reconstructed and bit-endianness ceases to matter.
Perfect. Thanks!