CS 3700 - Project 2

CS 3700 - Networks and Distributed Systems

Project 2: Simple Bridge

This project is due at 11:59pm on October 7, 2016. The milestone is due at 11:59pm on September 28, 2016

Description

You will implement a simple bridge that is able to establish a spanning tree and forward packets between its various ports. Since running your bridge on Northeastern's network would likely impact real traffic, we will provide you with a simulation environment that emulates LANs and end hosts. Your bridges must construct a spanning tree (and disable any ports that are not part of the tree), must forward packets between these ports, learn the locations (ports) of hosts, and handle both bridge and port failures (e.g., by automatically reconfiguring the spanning tree).

Your bridges will be tested for both correctness and performance. Part of your grade will come from the overhead your network has (i.e., lower overhead earns a higher score) and the fraction of packets that are successfully delivered between end hosts. Your results will be compared to your classmates' via a leaderboard.

Language

You can write your code in whatever language you choose, as long as your code compiles and runs on unmodified Khoury College Linux machines on the command line. Do not use libraries that are not installed by default on the Khoury College Linux machines. You may use IDEs (e.g. Eclipse) during development, but do not turn in your IDE project without a Makefile. Make sure you code has no dependencies on your IDE.

Your Program

For this project, you will submit one program named 3700bridge that implements a bridge. You may use any language of your choice, and we will give you basic starter code in Perl and Python. You may not use any bridging libraries in your project; you must implement all of the bridge logic yourself.

In reality, a bridge is a hardware box that has a bunch of Ethernet jacks (or ports) on it. A network administrator would plug cables into these ports, and Ethernet frames would begin arriving on each port. The primary function of the bridge is to forward each frame from the source port on which arrived to the correct destination port, based on the MAC addresses in the Ethernet frame header. This is known as routing or forwarding. However, as we have discussed in class, a bridge may be plugged in to a network that has loops, which can cause the network to fail. Thus, the secondary function of the bridge is to implement the spanning tree protocol, and coordinate with the other bridges in the network to form a spanning tree. Each bridge in the network helps to form the tree by turning off frame forwarding on specific ports, such that all of the remaining open ports in the network across all bridges form a tree.

Conceptually, your program is going to act like a bridge. When your program is executed, it will open several sockets, each of which corresponds to one "port" on your bridge. Thus, your program will have multiple open sockets. This contrasts with Project 1, where your client had only a single open socket. Your bridge will receive packets on these sockets, and you will have to forward each packet to the correct destination socket. Furthermore, since real networks often include many bridges, we will execute multiple copies of your program concurrently to create a network with many bridges. Each running instance of your bridge can communicate with neighboring instances by sending them messages over the network (i.e. using its ports). In other words, you will be building a distributed system in this project.

If you use C or any other compiled language, your executable should be named 3700bridge. If you use an interpreted language, your script should be called 3700bridge and be marked as executable. If you use a virtual machine-based language (like Java or C#), you must write a brief Bash shell script, named 3700bridge, that conforms to the input syntax below and then launches your program using whatever incantations are necessary. For example, if you write your solution in Java, your Bash script might resemble

#!/usr/bin/perl -w $args = join(' ', @ARGV); print 'java -jar 3700bridge.jar $args';

Requirements

To simplify the project, instead of using real packet formats, we will be sending our data across the wire in JSON (many languages have utilities to encode and decode JSON, and you are welcome to use these libraries). Your bridge program must meet the following requirements:

Coordinate with the other bridges to form a spanning tree, in order to prevent packet loops
Handle the failure of bridges and the introduction of new bridges and LANs over time
Learn the locations of end hosts
Deliver end host packets to the destination
Handle the mobility of end hosts between LANs
Senders and receivers must print out specified debugging messages to STDOUT
Your program must be called 3700bridge

You should implement a simplified version of the standard bridge spanning tree protocol that we discussed in class. Note that more sophisticated and properly tuned algorithms (i.e., those which perform better) will be given higher credit. For example, some desired properties include (but are not limited to):

Fast convergence: Require little time to form a spanning tree
Low overhead: Reduce packet flooding when possible

Regardless, correctness matters most; performance is a secondary concern. We will test your code and measure these two performance metrics; better performance will result in higher credit. We will test your code by introducing a variety of different errors and failures; you should handle these errors gracefully, recover, and never crash.

Simulator and Starter Code

Rather than testing your bridge on a real network, we will test it in a simulator. The simulator takes care of setting up one or more virtual LANs, placing hosts on each of these LANs, and executing copies of your bridge in order to connect the LANs to each other. The simulator is implemented in a script named run that is available in /course/cs3700f16/code/project2. To get started, you should copy this directory into your own local directory (i.e., cp -r /course/cs3700f16/code/project2 ~/), since you will need the run and test scripts, as well as the example configuration files.

At no point during this project should you modify the run script. Although you are certainly welcome to read script to understand how it works, you do not need to modify it in order to complete this assignment. Furthermore, when we grade your code, we will use the original versions of run and test, not your (possibly modified) versions.

Very basic starter code for the assignment in Perl and Python is also available in /course/cs3700f16/code/project2. The starter code provides a bare-bones implementation of a bridge that simply connects to the LANs and broadcasts a "Hello world!" message twice every second. You may use this code as a basis for your project if you wish, but it is strongly recommended that you do not do so unless you are very comfortable with Perl or Python.

Program Specification

The command line syntax for your bridge is given below. The bridge program takes command line arguments representing (1) the ID of the bridge, and (2) the LAN or LANs that it should connect to. The bridge must be connected to at least one LAN. The syntax for launching your bridge is therefore:

./3700bridge <id> <LAN> [LAN [LAN ...]]

id (Required) The id of the bridge. For simplicity, all bridge ids are four-digit hexadecimal numbers (e.g., 0aa1 or f29a).
LAN (Required) The unique name of the LAN(s) the bridge should connect to. The LANs are named using unique ASCII strings; the names themselves are not meaningful.

Connecting to the LAN

We will be using UNIX domain sockets to emulate the LANs, with one domain socket per LAN that your bridge is connected to. You do not need to be intimately familiar with how these work, but they essentially give you a socket-like device that you can read and write from. Whenever you write to it, all other end hosts and bridges on that LAN will receive your message. You should constantly be reading from it to make sure you receive all messages (they will be buffered if you don't read immediately).

One thing to note is that we will be using abstract domain sockets, which means that you should put a \0 byte before the LAN name when you are connecting to the socket, as well as pad the LAN name with \0 if it less than 108 bytes long. So, if you were trying to connect to the LAN named LAN#cbw#1, the name that you would actually connect to is \0LAN#cbw#1\0... <114 \0>...\0. We will be using the SOCK_SEQPACKET socket type, which basically provides a reliable message-oriented stream.

Exactly how to connect to a UNIX domain socket depends on your programming language. For example, if you were using perl to complete the project, your code for connecting would look like:

use IO::Socket::UNIX;
my $lan = IO::Socket::UNIX->new( Type => SOCK_SEQPACKET, Peer => "\0<lan>\0...\0" );

where <lan> is the name of the LAN that is passed in on the command line. You can then read and write from the $lan variable. In python, your code would look like

from socket import socket, SOCK_SEQPACKET, AF_UNIX
s = socket (AF_UNIX, SOCK_SEQPACKET)
s.connect('\0<lan>\0...\0')

with similar results.

We encourage you to write your code in an event-driven style using select() or poll() on all of the LANs that your bridge is connected to. This will keep your code single-threaded and will make debugging your code significantly easier. Alternatively, you can implement your bridge in a threaded model (with one thread attached to each LAN), but expect it to be significantly more difficult to debug.

Packet Format

In order to simplify the development and debugging of this project, we use JSON (JavaScript Object Notation) to format all messages sent on the wire. Many common programming languages have built-in support for encode and decoding JSON messages, and you should use these when sending and receiving messages (i.e., you do not have to create or parse JSON messages yourself). The format of all messages is

{"source":"<source>", "dest":"<destination>", "type":"<type>", "message":{<message>}}

where <source> and <destination> are either bridge or end host addresses. Recall that all addresses are four-byte hexadecimal numbers (e.g., 98a2), and a special broadcast address ffff indicates the packet should be received by all hosts and bridges. Additionally, <message> should be the JSON-encoded message itself, and <type> is either 'bpdu' for BPDUs or 'data' for end-host data packets. For example, a BPDU that you send from bridge 02a1 might look like

{"source":"02a1", "dest":"ffff", "type": "bpdu", "message":{"id":"92b4", "root":"02a1", "cost":3}}

All data packets will include a unique id field that you should use refer to that message. For example, a data message from host 28aa to 97bf might look like

{"source":"28aa", "dest":"97bf", "type": "data", "message":{"id": 17}}

BPDU Messages

You should configure your bridges to periodically broadcast BPDUs on all ports. You should broadcast BPDUs no more frequently than once every 500 ms. Using those BPDUs, you should constantly be listening for new roots, new bridges, etc, and should make decisions about which ports are active (i.e. root and designated ports) and inactive upon receiving each BPDU. Additionally, you should "timeout" BPDUs after 750 ms. To aid in grading and debugging, your bridge program should print out messages about the spanning tree calculation to STDOUT. When starting up, your bridge should print out

Bridge <id> starting up

where <id> is the ID of the bridge. When your bridge selects a new root, it should print out

New root: <id>/<root>

where <id> is the ID of the local bridge and <root> is the ID of the new root. When your bridge changes its root port, it should print out

Root port: <id>/<port_id>

where <port_id> is the port number (0-indexed). When your bridge decides that a port is the designated port for a LAN, it should print out:

Designated port: <id>/<port_id>

Finally, when your bridge decides that a port should be disabled, it should print out:

Disabled port: <id>/<port_id>

Data Messages

Additionally, your bridge should build up a forwarding table as discussed in class. You should "timeout" forwarding table entries 5 seconds after receiving the last message from that address. When any of your bridge's ports changes state (designated, root, etc), you should flush your forwarding table. When forwarding data packets, your bridge program should print out the following messages to STDOUT. When your bridge receives a message on an active port (i.e., not disabled), it should print out

Received message <id> on port <port_id> from <source> to <dest>

where <id> is the unique identifier of the data message, <port_id> is the port number on the bridge that the message was received on, and <source> and <dest> are the source and destination of the message. (Note that your bridge should silently ignore all messages [other than BPDUs] on disabled ports) Once your bridge makes a forwarding decision about the message, it should print out one of three messages:

Forwarding message <id> to port <port_id>

Broadcasting message <id> to all ports

Not forwarding message <id>

Thus, every non-BPDU message your bridge receives on an active port should have one of the above three lines printed out. This will help you to debug why your bridges are misrouting messages (if this should ever occur).

Testing Your Code

In order for you to test your code, we have included a Perl script that will create the emulated LANs, run your bridge program and connect it to these LANs, start and stop your bridges in a configurable way, and create and record end host traffic. This script is included in the starter code, and you can run it by executing

./run <config-file>

where <config-file> is the configuration file that describes the network you would like to implement. Note that you do not need to parse the config files yourself; the run script does this. Instead, you can create custom configs to test your code under different scenarios.

Config File Format

The configuration file that you specify describes the LANs, the bridges, and the connections between these. It also contains information about when bridge come up and down, and the end host traffic that should be generated. The file is formatted in JSON and has the following elements

lifetime (Required) The number of seconds the simulation should run for.
bridges (Required) An array of bridge elements (described below). At least one bridge must be specified. Each bridge element is an associative array that has the following properties:
- id (Required) The ID of the bridge, a string.
- lans (Required) An array of the LAN IDs that the bridge is connected to. All LANs are identified by a non-negative number.
- start (Optional) The start time (in seconds) when the bridge should be turned on. If not specified, the bridge is started at the beginning of the simulation.
- stop (Optional) The stop time (in seconds) when the bridge should be turned off. If not specified, the bridge is stopped at the end of the simulation.
hosts (Required) The number of hosts to generate (these are randomly attached to LANs).
traffic (Required) The number of end host packets to randomly generate (these are sent with randomly selected sources and destinations).
wait (Optional) The number of seconds to wait before sending any data traffic (default of 2 seconds).
seed (Optional) The random seed to choose. If not specified, a random value is chosen. Setting this value will allow for a reproducible set of hosts and traffic.

For example, a simple network with two LANs connected by a single bridge would be:

{
    "lifetime": 30,
    "bridges": [{"id": "A", "lans": [1, 2]}],
    "hosts": 10,
    "traffic": 1000
}

In this case, one copy of your bridge will be executed, and it will open two sockets. A more complex network may be:

{
    "lifetime": 30,
    "bridges": [
        {"id": "A", "lans": [1, 3]},
        {"id": "B", "lans": [2, 3], "stop": 7},
        {"id": "C", "lans": [1, 2], "start": 5, "stop": 9},
        {"id": "D", "lans": [2, 4]},
        {"id": "E", "lans": [2, 4, 5, 6]}
    ],
    "hosts": 100,
    "traffic": 10000
}

In this case, five copies of your bridge will be executed in parallel. Instance one will receive ID "A" on the command line, and it will open two sockets. Instance five will receive ID "E" and it will open four sockets. Note that in this simulation, bridges will be stopped and started in the middle of the test, i.e. one bridge will not be present at the beginning of the simulation, and two bridges will "fail" during the simulation.

./run Output

The output of the ./run script includes timestamps and all logging information from your bridges and the emulated end hosts. Note that all data traffic will be delayed for 2 seconds at the beginning of the simulation to allow your bridges to form an initial spanning tree. At the end, the output also includes some statistics about the your bridges' performance:

bash$ ./run config.json
...
[ 14.9990 Host ed10] Sent message 776 to 41c1
[ 15.0001 Bridge 92ba] Received message 776 on port 0 from ed10 to 41c1
Simulation finished.
Total packets sent: 6730
Total data packets sent: 2000
Total data packets received: 1984
Total data packets dropped: 16 (message ids 52, 70, 181, ...)
Total data packets duplicated: 17 (message ids 311, 433, 541, ...)
Data packet delivery ratio: 0.992000

Each of the fields is self-explanatory. Ideally, you would like all messages to be delivered (a delivery ratio of 1.0) and the number of packets dropped and duplicated to be 0 (a message can cause two packets to be delivered if the network is being re-configured when it is sent). Additionally, you want the number of total packets sent to be low as well (this includes BPDUs, which are overhead).

Testing Script

Additionally, we have included a basic test script that runs your code under a variety of different network configurations and also checks your code's compatibility with the grading script. If your code fails in the test script we provide, you can be assured that it will fare poorly when run under the grading script. To run the test script, simply type

bash$ ./test
Basic (no failures, no new bridges) tests (PDR = 1.0)
    One bridge, one LAN [PASS]
    One bridge, two LANs [PASS]
    One bridge, three LANs [PASS]
    Two bridges, one LAN [PASS]
    Two bridges, two LANs [PASS]

This will run your code on a number of configurations, and will output whether your program performs sufficiently. If you wish to run one of the tests manually, you can do so with

bash$ ./run basic-4.conf

Implementing Your Bridge

Although implementing your bridge may seem like a daunting task, it is manageable if you carefully plan the order in which you implement features. We recommend implementing your bridge using the following steps:

Before you write any code, look at some of the test cases and understand what kind of network topologies they create. Draw the topologies on a piece of paper, and manually go through the process of determining which bridge is the root, which ports are root ports, and which ports are designated. If you cannot complete this exercise manually, you have no hope of implementing a program that honors the spanning tree protocol.
If you're not using the starter code, start by writing code to open the Unix domain sockets, and use select() or poll() to read from them. Try sending simple "Hello World" messages on the ports.
Start implementing the spanning tree protocol by creating and sending BPDUs. At this point, focus on getting each bridges to agree on who is the root (i.e. the bridge with the lowest ID) and which port is the root port (i.e. the port with the shortest path to the root). On startup, each bridge believes it is the root, and broadcasts a BPDU on all ports. As BPDUs arrive from neighbors, read the neighbor's root ID and determine if they should be the root. Note that multiple ports may have paths to the root, in which case you will need to break ties by looking at the cost of the path in hops, and the neighbor's ID. Whenever a bridge learns about a new root, or a better path to the root, it needs to broadcast updated BPDUs to its neighbors. At this point, do not worry about timeouts or forwarding tables. Packets from hosts can simply be dropped.
Complete the basic spanning tree protocol by implementing the selection of designated ports. A bridge Bwill designate a port P if B offers the shortest path to the root (or the only path to the root) for hosts on port P. Designated ports are "active", i.e. the bridge should forward packets from hosts that are received on the port. If a port is not designated, then it is "inactive", i.e. packets from hosts received on the port should be dropped. Note that this does not mean you should close() sockets that are inactive: your bridge always needs to be able to receive and broadcast BPDUs on all ports, and non-designated ports may become designated in the future if the network topology changes.
At this point, your bridges should be able to form a spanning tree, i.e. by globally agreeing on which bridge is the root, and setting the root port and all designated ports to "active". Next, implement packet forwarding by constructing a forwarding table. As packets from hosts arrive, add the source information to your forwarding table, and attempt to lookup the correct destination port for the packet. If the destination is not in the forwarding table, broadcast the packet on all designated and root ports.
Start implementing timeouts on BPDUs and forwarding table entries. Recall that your bridge should broadcast BPDUs on all ports at least once every 500 milliseconds. You will need to record the last time you received a BPDU entry from each neighbor, and time them out if 750 milliseconds pass without a fresh BPDU. If a BPDU times out, this indicates that a bridge has failed somewhere in the network, which should trigger reconstruction of the spanning tree. Additionally, forwarding table entries should time out if they are not refreshed every 5 seconds.
At this point, your implementation should be quite solid, and passing most, if not all, test cases. Now focus on tuning the performance of your system. You may adjust timeouts to improve convergence speed, or buffer packets during times of spanning tree construction, etc.

We recommend starting off with test cases simple-[1, ..., 6] and intermediate-[1, 2]. Use these cases to test your spanning tree protocol implementation. Once you are confident in your spanning tree, move on to simple-[7, 8, 9] and intermediate-[3, 4] to stress test your spanning tree, and begin working on packet forwarding. Move on to the advanced-* cases when you are ready to begin implementing timeouts and error recovery.

Corner-cases, Gotchas, and Subtleties

One of the most challenging things about this assignment is that there are many corner-cases and subtleties to consider in your implementation. Here are some things to think about as your design and implement your bridge:

How should the bridge treat a port on which no BPDUs are received?
How should the bridge treat a port on which messages are received from itself (i.e. bridge 01AC send a packet on port 1 and receives that exact same packet on port 2)?
When the spanning tree topology changes (i.e. a new bridge is added, or a bridges fails) what happens to the bridge's existing root and designated ports?
What should you do to your forwarding table after the spanning tree topology changes?
Much of this assignment depends on precise timing. However, select() and poll() are blocking calls (read the documentation). How can you ensure that your bridge meets its obligations (i.e. sending BPDUs every 500 ms) while still using these APIs?

A Note on Cheating

It is possible to write a very simple program that will pass all test cases without implementing the spanning tree protocol. To do this, your program simply needs to record all observed packets, and then drop duplicate packets that it observes in the future.

However, implementing a bridge that contains this logic is not possible in the real world, and totally violates the spirit of this assignment. Anyone who turns in a bridge that implements this logic (or similar logic) will receive a zero on the project. I consider this solution to be cheating, and will treat groups that use this approach as cheaters. If you have any questions about a particular approach to this assignment, feel free to ask for clarification on Piazza.

Performance Testing

As mentioned in class, 10% of your grade on this project will come from performance. Your project will be graded against the submissions of your peers. To help you know how you're doing, the testing script will also run a series of performance tests at the end; for each test that you successfully complete, it will report your time elapsed and bytes sent to a common database. For example, you might see

Performance tests
    Network 1 [PASS]
        99.1% packets delivered, 3.0% overhead

This indicates that you successfully delivered 99.1% of all end-host packets and had an overhead of 3%. This score will be reported to the common database. Note that, by default, only reasonable scores are sent to the database; if your scores aren't showing up, that means you need to improve your performance.

In order to see how your project ranks, you can run

bash$ /course/cs3700f16/bin/project2/printstats
----- TEST: Eight bridges, eight LANs -----
Least overhead:
1: cbw 200 packets
2: foo 220 packets
Highest delivery ratio:
1: foo 1.00000
2: cbw 0.950000

which will print out the rank of each group for each performance test, divided into the number of packets sent and the delivery ratio. In this particular example, cbw's project has lower overhead but delivers fewer of the packets. Obviously, you would ideally have like to have more packets delivered and fewer packets sent.

Submitting Your Project

If you have not done so already, register yourself for our grading system using the following command:

$ /course/cs3700f16/bin/register-student [NUID]

NUID is your Northeastern ID number, including any leading zeroes.

Before turning in your project, you and your partner(s) must register your group. To register yourself in a group, execute the following command:

$ /course/cs3700f16/bin/register project2 [team name]

This command registers you for the milestone submission and the final submission. This command will either report back success or will give you an error message. If you have trouble registering, please contact the course staff. You and your partner(s) must all run this script with the same [team name]. This is how we know you are part of the same group.

To turn-in your project, you should submit your (thoroughly documented) code along with two other files:

A Makefile that compiles your code. Your Makefile may be blank, but it must exist.
A plain-text (no Word or PDF) README file. In this file, you should briefly describe your high-level approach, any challenges you faced, and an overview of how you tested your code.

Your README, Makefile, source code, etc. should all be placed in a directory. You submit your project by running the turn-in script as follows:

$ /course/cs3700f16/bin/turnin <project2-milestone|project2> [project directory]

The first parameter determines if you are turning in the milestone or the final submission. [project directory] is the name of the directory with your submission. The script will print out every file that you are submitting, so make sure that it prints out all of the files you wish to submit! The turn-in script will not accept submissions that are missing a README or a Makefile. Only one group member needs to submit your project. Your group may submit as many times as you wish; only the last submission will be graded, and the time of the last submission will determine whether your assignment is late.

Grading and Milestones

This project is worth 14% of your final grade in total. 1% comes from the milestone and 13% comes from the rest of the project.

By the first milestone, you must submit bridge code that successfully passes the first six test cases (simple-1.conf through simple-6.conf). Fractional points will be awarded if you pass less than all six cases, i.e. passing 4 of 6 will earn you 4/6 of the points.

The final grading in this project will consist of

75% Program correctness (based on passing test cases)
10% Performance
15% Style and documentation

At a minimum, your code must pass the test suite without errors or crashes, and it must obey the requirements specified above. All student code will be scanned by plagarism detection software to ensure that students are not copying code from the Internet or each other.