IS 4700 / CS 5750 - Social Computing

General Information

Professors: Christo Wilson and Alan Mislove
Room: West Village H 108
Time: Mondays and Wednesdays, 2:50-4:30pm
Office Hours: TBA
Teaching Assistant: Arash Molavi Kakhki
TA Email: arash@ccs.neu.edu
Lab Hours: TBA
Class Forum: On Piazza
Paper List: Here
Syllabus: Here
Course Project Description: Here

Course Description

Recently, online social networking sites have exploded in popularity. Numerous sites are dedicated to finding and maintaining contacts and to locating and sharing different types of content. Online social networks represent a new kind of information network that differs significantly from existing networks like the Web. For example, in the Web, hyperlinks between content form a graph that is used to organize, navigate, and rank information. The properties of the Web graph have been studied extensively, and have led to useful algorithms such as PageRank. In contrast, few links exist between content in online social networks; instead, the links exist between content and users, and between users themselves.

The resulting graph is used to connect and to communicate. Unlike previous networks, graphs in online social networks intermingle people and content, allowing system designers to relate the reputation of content to the reputation of users, and vice versa. This opens the door for new types of systems, new ways of solving longstanding problems, and new security attacks and vulnerabilities.

This course provides a detailed look at popular social information systems, including online social networks (Facebook, MySpace, Orkut), blogging and microblogging platforms (LiveJournal, Blogger, Twitter), social recommendation engines (Digg, Reddit, last.fm), collaborative organization (Wikipedia), and content sharing sites (Flickr, YouTube). Coursework includes studying models (both formal and sociological) of social information systems, and applying them both in theory and by analyzing real data from social network interactions.

The graduate version of this course places greater emphasis on the computing infrastructure that underlies these emerging systems. It focuses on building scalable systems for managing and manipulating large amounts of data, on ensuring privacy for users, on designing and using interfaces for third-party applications, and on leveraging the mobile nature of the access mechanisms that many users rely on. A course project of the students' choosing will be expected.

Logistics

The class will meet twice per week for 90-minute sessions. As this is a discussion-based class, you are required to attend and participate.

Prerequisites

The CS5750 version of this course is intended for Computer Science Master's and Ph.D. students; the IS4700 version is intended for Computer Science or Information Science B.S. students. We expect you to understand the basics of computer systems, and to have experience implementing non-trivial systems-and-networking-type projects. You should also be able to read unix manual pages, and be able to familiarize yourself with unix utilities.

This course will be partially project-centric, and all students will complete projects in groups of two (or possibly three, if necessary). Thus, to succeed in this course, you must be able to work in a group. We will allow you to form your own groups, and the course staff will serve as a matching service if necessary. As you are free to choose your partner(s), we will not be sympathetic to complaints at the end of the semester about how your group-mates did not do any work.

Finally, to succeed in this course, you should have some experience programming with unix development tools, and you should be willing to learn how to use online APIs (e.g., Facebook, Twitter). It is also highly recommended that you become familiar with using a debugger, as this will greatly aid you in completing the projects. At a high level, you should be motivated, eager to learn, willing to work hard, and prepared to make up, on your own, any prerequisite deficiencies you may have.

Class Forum

The class forum is on Piazza. Why Piazza? Because they have a nice web interface, as well as iPhone and Android apps. Piazza is the best place to ask questions about projects, programming, debugging issues, exams, etc. In order to keep things organized, please tag all posts with the appropriate hashtags, e.g. #lecture1, #project3, etc. We will also use Piazza to broadcast announcements to the class. Bottom line: unless you have a private problem, post to Piazza before emailing the professors and TAs.

Schedule, Lecture Slides, and Assigned Readings

| # | Date | Topic | Readings | Summary # | Comments |
| 1 | Sept. 4 | Intro, Basic Graph Theory | [9] - Ch. 2 |  | Join Piazza |
| 2 | Sept. 9 | Graph Types and Models | [3], [39] | summary02 |  |
| 3 | Sept. 11 | Sociology Background | [9] - Ch. 3-3.2, [11], [22] | summary03 |  |
| 4 | Sept. 16 | Graph Sampling and Representation | [23], [19] | summary04 |  |
| 5 | Sept. 18 | Interactions Over Networks | [40], [15], [20] | summary05 |  |
| 6 | Sept. 23 | Information Dissemination and Influence | [6], [7] | summary06 |  |
| 7 | Sept. 25 | Ranking Nodes | [32], [16], [10] | summary07 | Hw. 1 Due |
| 8 | Sept. 30 | Graphs Over Time | [43], [41] | summary08 | Instructor Meeting Deadline |
| 9 | Oct. 2 | Link Prediction | [18], [2] | summary09 |  |
| 10 | Oct. 7 | Community Detection | [27], [4], [1] | summary10 |  |
| 11 | Oct. 9 | Privacy | [17], [24] | summary11 | Project Proposal Due |
|  | Oct. 14 | No Class - Columbus Day |  |  |  |
| 12 | Oct. 16 | Anonymization and Deanonymization | [26], [34], [8] | summary12 | Hw. 2 Due |
| 13 | Oct. 21 | Spam | [12], [35] | summary13 |  |
| 14 | Oct. 23 | Sybil Attacks | [37], [36] | summary14 |  |
| 15 | Oct. 28 | Geo-social Networks | [29], [28] | summary15 |  |
| 16 | Oct. 30 | Content Rating Systems | [25], [31] | summary16 |  |
| 17 | Nov. 4 | Network Economics | [5], [33] | summary17 |  |
| 18 | Nov. 6 | Human Mobility | [30], [38] | summary18 | Interim Report Due |
|  | Nov. 11 | No Class - Veterans Day |  |  |  |
| 19 | Nov. 13 | New Approaches to Services | [42], [13] | summary19 | Hw. 3 Due |
| 20 | Nov. 18 | Personalization and Price Discrimination | [14], [21] | summary20 |  |
| 21 | Nov. 20 | No Class - Work on Your Projects! |  |  |  |
| 22 | Nov. 25 | No Class - Work on Your Projects! |  |  |  |
|  | Nov. 27 | No Class - Thanksgiving |  |  |  |
| 23 | Dec. 2 | Student Presentations | Sarah & Vatsa; Xiaofeng, Zhuoli & Qizhen; Amitash & Rushabh |  |  |
| 24 | Dec. 4 | Student Presentations | Nick & Moon; Prachi, James & Ryan |  |  |
|  | TBA |  |  |  | Final Report Due |

Textbook

The recommended (but not required) textbook for the course is:

This book is available freely online. Other useful texts and resources include: Note that none of the textbooks are required; they are only there for your reference. Any recent edition of these books should suffice.

Presentations and Summaries

This course will focus on reading and discussing research papers. As such, you will be required to give an approximately 20-minute presentation on one or more research papers and lead the class discussion on the paper. We expect high-quality presentations; you should expect to spend 8 to 10 hours preparing your presentation. In addition, you should come to class with a list of questions that will spark an interesting discussion.

You should also read all of the assigned papers before the class to ensure that you arrive in class prepared to take part in the discussion. To ensure that you have read the papers, you are required to submit a summary of all of the assigned papers (approximately 500 words in aggregate) before the class. This submission must be in ASCII text format (no Word, PDF, etc.). In aggregate, these summaries will count for 10% of your grade.

Summaries are due at 11:59:59pm on the day before the lecture. Summaries will not be accepted late, and slip days cannot be used on summaries.
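As a rough pre-submission check, the two stated requirements (ASCII text, approximately 500 words in aggregate) can be tested locally before running turnin. The sketch below is only an illustration: the function name `check_summary` and the 20% tolerance around 500 words are assumptions, not part of the course tooling, and the course staff may judge length differently.

```python
def check_summary(data: bytes, target_words: int = 500, tolerance: float = 0.2):
    """Return a list of problems found in a summary file's raw bytes.

    Hypothetical helper: the tolerance band is an assumption, since the
    course only says "approximately 500 words".
    """
    problems = []
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError as err:
        # Report where the first non-ASCII byte sits, then keep going so
        # the word count can still be checked.
        problems.append(f"non-ASCII byte at offset {err.start}")
        text = data.decode("ascii", errors="replace")

    words = len(text.split())
    low = round(target_words * (1 - tolerance))
    high = round(target_words * (1 + tolerance))
    if not low <= words <= high:
        problems.append(f"{words} words; expected roughly {low}-{high}")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_summary.py summary02.txt
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as handle:
            for problem in check_summary(handle.read()):
                print(problem)
```

An empty result means the file decodes as ASCII and falls inside the assumed word range; it says nothing about the quality of the summary.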

Homeworks

The goal of this course is to teach both the fundamentals of social information systems and how to write programs which leverage social media. As such, there will be multiple homework programming projects throughout the semester (in addition to a course-long project, described below). You will form groups of two people to do the homeworks. (If necessary, one group of three will be allowed.) To collaborate effectively, you should both be involved in all of the major design decisions. You should also determine a partitioning of responsibilities so that you can both work effectively in parallel. For example, one might be responsible for generating all the test code while the other is responsible for the main code. You may switch groups between homeworks.

As the graduate version of this course will contain extra requirements, it is strongly recommended (but not required) that teams consist of either all undergraduates or all graduate students. If any team member is a graduate student, the project will be graded as a graduate student project and will not receive any additional credit.

Homeworks are due at 11:59:59pm on the specified date. You do not need to inform the instructor about the use of slip days; they will be automatically deducted.

| Assignment | Description | Due Date | Piazza Tag |
| Homework 1 | Basic Graph Analysis | September 25 | #hw1 |
| Homework 2 | Link Prediction | October 16 | #hw2 |
| Homework 3 | Twitter Spam | November 13 | #hw3 |

Course Project

You will also decide on a project of your choosing as a course-long project. The goal of the project is to conduct a miniature version of a "real" research project. You will first pick a topic, and argue in a written research proposal that this is a topic worth exploring, and that you are capable and prepared to do so. You will design and implement a solution to the problem you have chosen, and quantitatively evaluate your solution. You will then write up the results of your project in a draft final report, which, after review, you will turn into a final report. Finally, you will give a 25-minute presentation of the results of your project in class.

The syllabus for the course project is available here.

I emphasize that this is a "research project", and not a "programming project". Although the implementation of your solution is an essential component, it is only one aspect of the project, next to other equally important components, such as the evaluation and the presentation of your results.

You are free to choose any project topic related to this course (with instructor approval). For example, we have collected a large amount of data on online social networks (e.g., a large collection of tweets from Twitter). This suggests an interesting avenue to explore, as the data has not been looked at by researchers before. Finally, if you are doing large-scale data analysis, we will try to get computing resources that are appropriate for the amount of data analysis you aim to do.

The course project is due at 11:59:59pm on December 10, 2013. No slip days may be used on the course project. The course project also has intermediate deadlines that will be announced as the course progresses.

Teamwork

You will form groups of two people to do the homeworks and programming projects (if necessary, one group of three will be allowed). To collaborate effectively, you should both be involved in all of the major design decisions. You may switch groups between programming projects.

Important: You alone are responsible for finding a partner. The class forum located on Piazza is a particularly good resource for this - there will be a thread there that serves exactly this purpose. Right before or right after lecture, as well as during TA lab hours, are also good times to look for partners.

We often receive complaints that somebody cannot find their partner, or that their partner continues to promise things that are never delivered. To address this concern, the policy is: you flake, you fail. Simply put, if you disappear, or are generally not pulling your own weight at any time during the semester, you get an F in the course right then. End of story. If you don't completely flake, but are under-responsive, we reserve the right to design an appropriate (but still fair) way of redistributing points. If your partner is flaking on you, do not wait until the end of the semester to let the course staff know; let us know immediately.

Of course, disasters happen that may pull you away from campus. You are responsible for notifying your partner and the course staff if a major time conflict arises in your life. In the real world, you don't just disappear from your job for a week. You tell people you have to go. The same thing applies here. Likewise, if you feel you're going to need to drop this class, then do it between projects, not in the middle of one. Dropping the course in the middle of a project may be allowed by the university, but it's extremely rude to your partner(s). Be polite.

One useful bit of advice: work together with your partner. We don't mandate pair programming, but it really works. You'll be more than twice as effective as you would be if you split the work up and did it separately. Also, you'll avoid the sort of rude surprises that often arise when partners have different expectations.

Submitting Summaries, Homeworks, and Projects

We will use Khoury College Linux-based scripts for submitting homeworks and programming projects; instructions for using these will be included with each project. Note that submitting projects via email is not acceptable: no credit will be given for such submissions, and no extensions will be granted. You can submit your projects multiple times, and we will grade the latest submission. We suggest submitting your project a few times well before the deadline to ensure there are no configuration errors.

Before you can submit any assignments, you must register for the system with your student ID. To do so, ssh into any Khoury College Linux machine and execute

bash$ /course/cs5750f13/bin/register-student ID
About to register user 'USER' with student ID 'ID'.  Is this correct? [yn]

where ID is your student ID, with any leading 0s removed. For example, if your NEU student ID is 003044, you would enter

bash$ /course/cs5750f13/bin/register-student 3044

Double-check that the userid and student ID are correct; if so, type y and Enter. If not, type n and Enter. If it was successful, you will see the message

bash$ /course/cs5750f13/bin/register-student ID
About to register user 'USER' with student ID 'ID'.  Is this correct? [yn] y
User 'USER' is now successfully registered for CS5750 with student ID 'ID'.
bash$ 

If you see any other message (in particular, a message with "Error" in it), it has not succeeded. Email the instructor with the error message if you are not able to diagnose the problem.

To submit summaries, you will use the CS5750 turnin system described above. In particular, you should run

/course/cs5750f13/bin/turnin summary02 /path/to/file

if you wanted to submit the summary for lecture 02 (see the schedule for the list of assigned lecture numbers and submission keywords). When run, you should see output that looks like

bash$ /course/cs5750f13/bin/turnin summary02 ~/cs5750/summary02.txt
  Added file summary02.txt (28392 bytes)
Successfully submitted summary02 for user amislove (confirmation ZiwKE5).
Submitted a total of 1 files (28392 bytes) in 0 directories.

The script will print out every file that you are submitting; make sure that it prints out all of the files you wish to submit. Finally, make sure you see the "Successfully submitted" line at the end. You should also receive an email confirmation of your submission. If you receive any error messages, email the instructor with the error message if you are not able to diagnose the problem.

Exams

There will be no exams.

Grading

The breakdown of the grades in this course is

Course Project: 40%
Homeworks: 20% (5% each)
Paper Presentation(s): 20%
Paper Summaries: 10%
Participation: 10%

Each project and homework will include a breakdown and description of how it will be graded.

Final grades will be calculated by summing up the points obtained by each student (the points will sum up to some number x out of 100) and then applying the following scale to determine the letter grade: [0-60] F, [60-62] D-, [63-66] D, [67-69] D+, [70-72] C-, [73-76] C, [77-79] C+, [80-82] B-, [83-86] B, [87-89] B+, [90-92] A-, [93-100] A. Final grades will not be curved in any way.
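To make the arithmetic concrete, here is a minimal sketch of how a point total and letter grade could be computed from the weights and scale above. It is an illustration, not the official grading script. Two choices in it are assumptions, because the published scale does not settle them: a score of exactly 60 is treated as D- (60 appears in both the F and D- ranges), and fractional totals that fall between bands (for example 92.5) are assigned to the band whose lower bound they meet.

```python
# Component weights in points out of 100, as listed in the breakdown.
WEIGHTS = {
    "project": 40,
    "homeworks": 20,
    "presentations": 20,
    "summaries": 10,
    "participation": 10,
}

# (lower bound, letter), highest band first.
# Assumption: each band starts at its published lower bound, and 60 is D-.
SCALE = [
    (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def total_points(fractions):
    """fractions maps each component name to the fraction earned, in [0, 1]."""
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


def letter_grade(points):
    """Map a point total out of 100 to a letter grade, with no curve."""
    for lower, letter in SCALE:
        if points >= lower:
            return letter
    return "F"


if __name__ == "__main__":
    example = {
        "project": 0.9,
        "homeworks": 0.85,
        "presentations": 1.0,
        "summaries": 0.8,
        "participation": 1.0,
    }
    points = total_points(example)
    print(points, letter_grade(points))
```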

Any requests for grade changes or regrading must be made within 7 days of when the work was returned. To ask for a regrade, attach to your work a page that specifies (a) the problem or problems you want to be regraded, and (b) for each of these problems, why you think the problem was misgraded.

Requests for Regrading

In this class, we will use the Coaches Challenge to handle requests for regrading. Each student is allotted two (2) challenges each semester. If you want a project or a test to be regraded, you must come to the professors' office hours and make a formal challenge specifying (a) the problem or problems you want to be regraded, and (b) for each of these problems, why you think the problem was misgraded. If it turns out that there has been an error in grading, the grade will be corrected, and you get to keep your challenge. However, if the original grade was correct, then you permanently lose your challenge. Once your two challenges are exhausted, you will not be able to request regrades. You may not challenge the use of slip days, or any points lost due to lateness.

Note that, in the case of projects, all group members must have an available challenge in order to contest a grade. If the challenge is successful, then all group members get to keep their challenge. However, if the challenge is unsuccessful, then all group members permanently lose one challenge.

Late Policy

For programming projects, we will use flexible slip days. Each student is given ten (10) slip days for the semester. You may use the slip days on any project or homework during the semester in increments of one day. For example, you can hand in one project ten days late, or one project two days late and two projects four days late. You do not need to ask permission before using slip days; simply turn in your assignment late and the grading scripts will automatically tabulate any slip days you have used.

Slip days will be deducted from each group member's remaining slip days. Keep this stipulation in mind: if one member of a group has zero slip days remaining, then that means the whole group has zero slip days remaining.
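The group rule can be read as "the group has as many slip days as its least-stocked member". The sketch below illustrates that reading; it is not the actual grading script. It assumes that lateness is rounded up to whole days (slip days are used in increments of one day), that slip days are applied automatically up to the group's limit, and that any remaining late days are left to the late penalty. The function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 86400


def days_late(seconds_late):
    """Whole days of lateness, rounding any partial day up (assumption)."""
    return max(0, math.ceil(seconds_late / SECONDS_PER_DAY))


def group_slip_days(remaining):
    """remaining maps member -> slip days left; the group gets the minimum."""
    return min(remaining.values())


def charge_slip_days(remaining, seconds_late):
    """Deduct slip days from every member of the group.

    Returns (updated balances, late days not covered by slip days).
    """
    late = days_late(seconds_late)
    used = min(late, group_slip_days(remaining))
    updated = {member: days - used for member, days in remaining.items()}
    return updated, late - used


if __name__ == "__main__":
    # One member has 3 days left, so the pair can cover at most 3 late days.
    balances = {"alice": 10, "bob": 3}
    print(charge_slip_days(balances, 2 * SECONDS_PER_DAY))
```

Note how a member with zero slip days leaves the whole group with none, exactly as the stipulation above warns.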

After you have used up your slip days, any project handed in late will be marked off using the following formula:

Original_Grade * (1 - ceiling(Seconds_Late / 86400) * 0.2) = Late_Grade

In other words, every day late is 20% off your grade. Being 1 second late is exactly equivalent to being 23 hours and 59 minutes late. Since you will be turning in your code on the Khoury College machines, their clocks are the benchmark time (so beware of clock skew between your desktop and Khoury College if you're thinking about turning in work seconds before the deadline). My late policy is extremely generous, and therefore we will not be sympathetic to excuses for lateness.
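The late-penalty formula translates directly into a few lines of code. The sketch below follows the formula as written, with one added assumption: the multiplier is clamped at zero, since the formula as stated would go negative after five days and the policy does not say what happens then.

```python
import math


def late_grade(original_grade, seconds_late):
    """Apply 20% off per started day late, per the formula above."""
    days = math.ceil(max(0, seconds_late) / 86400)
    # Assumption: the grade never drops below zero.
    factor = max(0.0, 1 - days * 0.2)
    return original_grade * factor


if __name__ == "__main__":
    # 1 second late and 23 hours 59 minutes late cost the same 20%.
    print(late_grade(90, 1))
    print(late_grade(90, 23 * 3600 + 59 * 60))
    # One second into the second day costs 40%.
    print(late_grade(90, 86401))
```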

Cheating Policy

It's ok to ask your peers about the concepts, algorithms, or approaches needed to do the assignments. We encourage you to do so; both giving and taking advice will help you to learn. However, what you turn in must be your own, or for projects, your group's own work. Looking at or copying code or homework solutions from other people or the Web is strictly prohibited. In particular, looking at other solutions (e.g., from other groups or students who previously took the course) is a direct violation. Projects must be entirely the work of the students turning them in, i.e. you and your group members. If you have any questions about using a particular resource, ask the course staff or post a question to the class forum.

All students are subject to Northeastern University's Academic Integrity Policy. Per Khoury College policy, all cases of suspected plagiarism or other academic dishonesty must be referred to the Office of Student Conduct and Conflict Resolution (OSCCR). This may result in deferred suspension, suspension, or expulsion from the university.

Accommodations for Students with Disabilities

If you have a disability-related need for reasonable academic accommodations in this course and have not yet met with a Disability Specialist, please visit www.northeastern.edu/drc and follow the outlined procedure to request services. If the Disability Resource Center has formally approved you for an academic accommodation in this class, please present the instructor with your "Professor Notification Letter" at your earliest convenience, so that we can address your specific needs as early as possible.

Title IX

Title IX makes it clear that violence and harassment based on sex and gender are Civil Rights offenses subject to the same kinds of accountability and the same kinds of support applied to offenses against other protected categories such as race, national origin, etc. If you or someone you know has been harassed or assaulted, you can find the appropriate resources here.