CY 2550 - Project 4

CY 2550 - Foundations of Cybersecurity

Project 4: Fuzzing

This project is due at 11:59pm on Friday, March 29.

Description and Deliverables

Hunting for bugs is a common task for all programmers, but especially those concerned about cybersecurity. While bugs my be a nuisance to typical users (i.e., people get annoyed when programs crash or don't behave as expected), bugs are a vital resource for malicious actors. Finding bugs in a program is the first step towards isolating exploitable bugs that can be leveraged for attacks against that program.

In this project you will implement a fuzzer, which is a program that tries to find bugs in other programs. Your fuzzer will take two inputs: (1) a program specification in XML format, and (2) the path to the target program. Your fuzzer will then repeatedly execute the target program with different command line arguments in an attempt to make the target program crash. Your fuzzer will report which command line inputs caused the target program to crash. In the real world, the developer of the target program would then use this information to find and patch the bugs in their program.

To receive full credit for this project, you will turn in the following three things:

A command line program, named fuzzer, that can fuzz test other command line programs.
A Makefile that compiles your fuzzer.
A README.md file that documents your fuzzer and your approach to this project.

Each of these deliverables is described in greater detail below.

Background

There are many tools and frameworks that help programmers find bugs. Some, like unit tests, are developed by the programmer, included in source code, and are tightly integrated with a given program. You gained experience writing tests in Fundies 1, since it is an integral part of the design recipe. You will have a chance to draw on those test development skills in this project.

Other bug finding tools are more general, and are designed to help find bugs in any program. Fuzz testing, or simply fuzzing, is a black-box technique for attempting to find bugs in programs. This technique is often implemented in a program called a fuzzer. The goal of a fuzzer is repeatedly execute a target program and attempt to crash it by supplying it with a variety of strange inputs. In other words, the goal of the fuzzer is to find bugs in the target, where the programmer has failed to deal with corner-cases in input handling code. Fuzzing is a black-box technique because it does not depend on the source code of the target. Any program that accepts external input (and almost all do) can be fuzzed.

Motivating Example

Let's imagine that someone wrote a very simple, command line calculator app. The calc program works like this:

$ ./calc
Usage: $ ./calc [number] [operation (+, -, *, /)] [number]
$ ./calc 1 + 1
2
$ ./calc 10 / 5
2

calc takes exactly three command line arguments: a number, a string that signifies a simple mathematical operation, and another number.

The examples above demonstrate that calc works as expected... when the command line inputs conform to the program specification. However, take a moment and speculate about all the ways that calc might crash if it was supplied with unexpected, or carefully crafted, command line inputs. This is akin to designing test cases in Fundies 1: think about all the corner cases that the developer of calc may have forgotten to handle in their code. For example:

What if the user supplies less than three arguments? What if they supply more than three arguments?
What if the user doesn't input a number for argument one or three?
What if the user doesn't input "+", "-", "*", or "/" for argument two?
What if the user inputs a negative number?
What if the user inputs a really, really large number?
What if the user inputs zero for the third argument when argument two is set to division?

The idea behind a fuzzer is that it will automatically run a target program, like calc, over and over again with different combinations of command line inputs, to try and trigger bugs and expose them, so they can be fixed.

Fuzzer Specification and Design

Your goal is to write a command line program named fuzzer that performs model-based checking of other command line programs. You may implement your program in any language that is available on the Khoury College Linux machines. We will supply starter code in Python and Java (available in /course/cs2550sp19/code/project4/ on the Khoury College Linux machines) that students may use in their implementations.

Your fuzzer must obey the following command line syntax:

$ ./fuzzer [path to XML configuration file] [path to the target program]

The fuzzer program takes exactly two command line arguments, each of which is a file path. The first path will point to an XML-formatted configuration file that describes the model for fuzz testing the target program. The second path points to the target program itself. Your fuzzer will use the given model to construct a variety of command line arguments for the target program, and will execute the target program repeatedly to see if it crashes. Your fuzzer should output all command lines that caused the target program to crash. Returning to our calc example, your fuzzer might generate the following output:

$ ./fuzzer calc_model.xml calc
./calc A B C
./calc 1 A B
./calc 1 / 0

Each of these sets of command line arguments causes calc to crash. For example, ./calc A B C and ./calc 1 A B cause it to crash because one or more arguments that are supposed to be numbers are not.

$ ./calc A B C
Traceback (most recent call last):
    File "./calc", line 9, in <module>
    n1 = int(sys.argv[1])
ValueError: invalid literal for int() with base 10: 'A'

Note that fuzzer tried to run calc with many other sets of command line arguments that did not result in crashes, thus they were not printed out. Some other example command lines that it might have tried include:

./calc A, doesn't crash because the program makes sure exactly three command line arguments are supplied
./calc A B C D, doesn't crash for the same reason as above
./calc 1 A 1, doesn't crash, or print anything, because argument two isn't a valid mathematical operator

Model Specification

Ideally, your fuzzer program should be able to test a wide variety of command line programs. However, every command line program has a different sets of optional and required arguments that it expects as input. How can we design fuzzer to be general, i.e. able to test many different kinds of command line programs?

This is where the model comes into play. Your fuzzer program will read a model, specified in XML, that describes the command line arguments of the target program. The general XML format of models is as follows:

<spec>
    <options>
        <option>
            <name>Name of the optional argument</name>
            <type>Type of the optional argument</type>
        </option>
        ...other options...
    </options>
    <positional>
        <arg>
            <type>Type of the positional argument</type>
        </arg>
        ...other positional arguments...
    </positional>
</spec>

The model specification is divided into two sections: optional arguments and positional arguments. Optional arguments are command line arguments that are typically prefaced with - or --, and as their name implies, they are optional, i.e. you may or may not pass them to a given target program. The <name> of an optional argument is literally its name on the command line. Positional arguments are the command line arguments that must be passed into the target program, in a specific order, for it to function. Both optional and positional arguments have a <type>, which is the data type of that argument. For the purposes of this assignment, the only types you will need to deal with are integers, strings, and null, which is a special case for optional arguments.

Returning to our calc example, this program takes exactly three positional command line arguments. The model specification for calc would be:

<spec>
    <positional>
        <arg>
            <type>integer</type>
        </arg>
        <arg>
            <type>string</type>
        </arg>
        <arg>
            <type>integer</type>
        </arg>
    </positional>
</spec>

This model captures our understanding of how calc should work: it takes three arguments, the first and third are integers, and the second is a string.

Here is a slightly more complex example. Consider the following program that orders food online:

$ ./buy-food
usage: buy-food [-h] [--organic] [--quantity QUANTITY] [--store STORE] food area
buy-food: error: the following arguments are required: food, area
$ ./buy-food -h
usage: buy-food [-h] [--organic] [--quantity QUANTITY] [--store STORE] food area

Buy food online by typing what you want and the city to deliver it to.
Currently only supports Boston area deliveries.

positional arguments:
  food                  The name of the food you want
  area                  The name of the area to deliver to

optional arguments:
  -h, --help            show this help message and exit
  -o, --organic         Make sure the food is organic. Default=False
  -q QUANTITY, --quantity QUANTITY
                        How many pieces of food do you want? Default=1
  -s STORE, --store STORE
                        Deliver from the given store. Default=Trader Joes
$ ./buy-food taco roslindale
Great, we're delivering your taco to roslindale!
$ ./buy-food --organic "clam juice" "harvard square"
Great, we're delivering your organic clam juice to harvard square!

This program has three optional command line arguments (four if you count --help, but we don't care about that), and two required positional arguments. The XML specification for buy-food is:

<spec>
    <options>
        <option>
            <name>--organic</name>
            <type>null</type>
        </option>
        <option>
            <name>--quantity</name>
            <type>integer</type>
        </option>
        <option>
            <name>--store</name>
            <type>string</type>
        </option>
    </options>
    <positional>
        <arg>
            <type>string</type>
        </arg>
        <arg>
            <type>string</type>
        </arg>
    </positional>
</spec>

The data types of --quantity and --store make intuitive sense. --organic has a data type of "null" because it does not accept any additional data on the command line. This type of command line argument is often referred to as a flag, and it typically turns some program functionality on or off (in this case, the desire for organic food).

Note that for the purposes of this assignment, we will make some simplifying assumptions about the behavior of command line arguments.

You can assume that the order of optional arguments does not matter. For example, if a target program takes two optional arguments, --X and --Y, you may assume that passing --X --Y is equivalent to passing --Y --X.
You may assume that optional command line arguments may only be passed to the target program once. For example, if the target program takes argument --X, there is no need to pass --X --X or --X --X --X, etc.
You may assume that all positional arguments are required by the target program, i.e. none are optional

Implementing Your Fuzzer

As part of this assignment, we will provide (1) buggy target programs for you to fuzz test, and (2) corresponding XML specifications for each target program. You do not need to develop these things yourself. Furthermore, the starter code that we provide (in Python and Java) already handles (1) parsing the command line input to the fuzzer program and (2) loading a given XML specification into a useful data structure. All of this is available in the /course/cs2550sp19/code/project4/ folder on the Khoury Linux machines.

You essentially have three tasks you must complete in order to implement the fuzzer program.

Given the XML specification data for a target program, generate all possible combinations of command line arguments to that program. For example, suppose the target program only takes a single, optional argument. That means there are two total possible command lines: one without the optional argument, and one with it. However, suppose the target program takes three optional arguments. Now there are eight total possible command lines: no arguments, three different single arguments, three combinations of two arguments, and all three arguments. Hint: all combinations of items in a set is called a powerset. The logic is slightly different for positional arguments, since they must be passed in the specified order.
Given a command line for a target program (generated during step 1), determine what value to pass for each argument. Obviously, there won't be any value to pass for arguments with a data type of null. But all other arguments will require integer or string arguments, and you must decide what value to pass for each one. Note that you will definitely want to try multiple values for each argument. For example, for an integer argument you might want to try passing a large number, a small number, zero, negative numbers, or a string.
Once you've written your logic for generating all command lines, and filling in the values for each argument, you will finally be able to run the target program and pass in these fully realized command lines. Your fuzzer will need to read the output of the target program to determine whether it crashed or not.

Detecting Crashes

To simplify things for this project, you may assume that all of the target programs will be written in Python. Furthermore, you may assume that when they crash, they generate unhandled exceptions that ultimately cause a traceback to be printed to the terminal. Thus, your fuzzer can detect crashes by examining the output of the target program and searching for this string:

Traceback (most recent call last):

If that string is in the target program's output, you may assume that it crashed, i.e. the currently configured command line triggered a bug in the target program.

Packaging Your Source Code

Because you are allowed to program in whatever language you wish, we require that all students submit a Makefile. If you choose to use a compiled language, you must turn in your source code, and the Makefile must compile your program. For example, if you write your program in C/C++, the final product of the Makefile should be a program called fuzzer.

If you choose to program in a compiled language that does not produce executable binaries (e.g. the Java compiler produces .class files), then you must include a shell script with your submission named fuzzer that can (1) invoke your program and (2) forward any given command line arguments to your program. You must also include a Makefile that transforms your source code into compiled files (e.g. .java files into .class files).

If you choose to use a language that does not need compilation (e.g. Python, Perl), you may leave your Makefile blank. We encourage students that choose to program in scripting languages to adopt shebang syntax and submit an executable script named fuzzer.

Submitting Your Project

Before turning in the project, you must register yourself for our grading system using the following command:

$ /course/cs2550sp19/bin/register-student [NUID]

NUID is your Northeastern ID number, including any leading zeroes. This command is available on all of the Khoury College lab machines.

The exact files that you submit for this assignment will vary depending on the programming language you choose to use to implement your fuzzer. At a minimum, you will probably submit:

The source code for your fuzzer program
A Makefile, which may be empty
A README.md text file that documents your fuzzer and explains your approach

You submit your project by running the turn-in script as follows:

$ /course/cs2550sp19/bin/turnin project4 <project directory>

where <project directory> is the name of the directory with your submission. The script will print out every file that you are submitting, so make sure that it prints out all of the files you wish to submit! The turn-in script will not accept submissions that are missing a Makefile or README.md. You may submit as many times as you wish; only the last submission will be graded, and the time of the last submission will determine whether your assignment is late.

At any time, you can run the following command to see all of your current grades for projects, essays, quizzes, and tests.

$ /course/cs2550sp19/bin/gradesheet

Grading

This project is worth 10% of your final grade. We will evaluate your fuzzer implementation against several held-out target programs (i.e. not provided to students ahead of time). You fuzzer will earn points for finding bugs in these programs. If your fuzzer fully explores the command line input space of each target program (i.e. all combinations of optional and positional arguments, and many common "corner case" values for each argument), it will find all of the bugs and earn full points. Note that points can be lost for turning in files in incorrect formats (e.g. not ASCII), failing to follow specified formatting or naming conventions, failing to compile, failing to follow specified command line syntax, etc.

Tips

The reference solution is roughly 90 lines of Python code, excluding comments and import statements. Roughly 30 of these lines are not in the Python starter code. The reference solution makes heavy use of the itertools module to keep things simple and tractable.
Steps one and three of the implementation plan above are the most straightforward. Step two is the hardest and requires the most creativity. Think back to how you designed test cases in Fundies 1 for inspiration about what values to pass for each argument.
Your fuzzer may end up having a lot of nested loops, because you'll be iterating over different combinations of optional and positional arguments, and for each one of those you'll be iterating over potential values for each argument. If you attempt to pack all this functionality into one, large, heavily nested function, you will quickly lose track of what's going on. Try to produce specific lists that you need, and then transform those lists iteratively until you arrive at valid command lines.
A well known problem with fuzz testing is that it can easily get bogged down in a combinatoric explosion of arguments and possible values. You don't want to have more than 5-7 possible values for any given data type.
If you're not sure you've implemented the command line of your program correctly, have one of your friends test it out. Alternatively, post some example command lines to Piazza; we'll be happy to tell you if the formatting or behavior is incorrect.
If you're using a compiled language, triple check that your code compiles and that your Makefile is free of errors before you submit. Make sure to test your compile on a Khoury College machine.