This dataset contains five anonymized Facebook university networks (Caltech, Oklahoma, Princeton, UNC Chapel Hill, and Georgetown), recorded
in 2005. For each network, the adjacency matrix is provided, along with information about each user, such as student/faculty status, gender, major, and dorm.
For more information about the data, please consult the technical report.
Data Format:
Within the zip file is a Matlab .mat file for each university. Each .mat file contains two matrices:
A: a sparse matrix storing the N x N adjacency matrix of the social network (where N is the number of users). A[i,j] = 1 indicates that users i
and j are friends.
local_info: an N x 8 matrix of user attributes (each row is a different user). Missing data is coded as 0. These are the attributes:
User ID
Student/faculty status (multiple levels)
Gender (1=female, 2=male)
Major (encoded as integer)
Second major/minor (encoded as integer)
Dorm/house (encoded as integer)
Class year
High school (encoded as integer)
Note that the mappings to the original textual representation of the attributes (e.g. the name of the high school) are not available.
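As a rough illustration of the format described above, the following Python sketch reads one university's file and turns each row of local_info into a record. It assumes that SciPy is installed, that the file can be read by scipy.io.loadmat, and that the columns of local_info appear in the order listed above. The function names, the field names, and the choice to map the missing-data code 0 to None are illustrative, not part of the dataset.

```python
# Field names are hypothetical labels; the column order is assumed to match the list above.
ATTRIBUTE_COLUMNS = [
    "user_id", "status", "gender", "major",
    "second_major", "dorm", "class_year", "high_school",
]


def row_to_record(row):
    """Convert one row of local_info into a dict, mapping the missing code 0 to None."""
    if len(row) != len(ATTRIBUTE_COLUMNS):
        raise ValueError("expected %d columns, got %d" % (len(ATTRIBUTE_COLUMNS), len(row)))
    record = {"user_id": int(row[0])}  # the ID itself is kept unchanged
    for name, value in zip(ATTRIBUTE_COLUMNS[1:], row[1:]):
        value = int(value)
        record[name] = None if value == 0 else value
    return record


def load_network(path):
    """Load A and local_info from one .mat file (requires SciPy)."""
    # Imported lazily so that row_to_record can be used without SciPy installed.
    from scipy.io import loadmat
    from scipy.sparse import csr_matrix

    data = loadmat(path)
    A = csr_matrix(data["A"])  # N x N adjacency matrix, kept sparse
    records = [row_to_record(row) for row in data["local_info"]]
    return A, records
```

For example, A, records = load_network("Caltech.mat") would return the adjacency matrix and one record per user; the file name here is a placeholder for whatever the .mat file in the zip archive is actually called.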
Joint Modeling of Links and Attributes:
The task is to learn a model over both the friendship links (in the adjacency matrix) and the user attributes. Given a user's
links, can that user's attributes be predicted? For example, can we predict the gender or major of an individual from that
individual's Facebook friends? Conversely, given the attributes of a user, can that user's friendship links be predicted? To evaluate the
model accurately, one can perform cross-validation, holding out a subset of links or attribute values and measuring how well the model recovers them.
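One very simple baseline for the attribute-prediction direction is a majority vote over a user's friends, scored on held-out users. The sketch below is only a starting point, not a joint model. It assumes the network has been converted to a dict from user to list of friends and the attribute to a dict from user to integer code (None for missing); both structures and the function names are illustrative.

```python
from collections import Counter


def predict_from_friends(friends, labels, user):
    """Return the most common known label among a user's friends, or None."""
    votes = Counter(
        labels.get(f) for f in friends.get(user, ()) if labels.get(f) is not None
    )
    if not votes:
        return None
    # Break ties deterministically: highest count first, then smallest label.
    return min(votes, key=lambda label: (-votes[label], label))


def holdout_accuracy(friends, labels, held_out):
    """Hide the labels of held_out users, predict them, and return the accuracy."""
    visible = {u: (None if u in held_out else v) for u, v in labels.items()}
    scored = [u for u in held_out if labels.get(u) is not None]
    if not scored:
        return None
    correct = sum(predict_from_friends(friends, visible, u) == labels[u] for u in scored)
    return correct / len(scored)
```

Cross-validation then amounts to partitioning the users into folds, calling holdout_accuracy once per fold, and averaging the results.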
Transfer Learning:
Since the data consists of five independent Facebook networks, a natural question is whether these networks share similar social structures. If so,
can we use these commonalities to improve predictions over links/attributes? The task is similar to the joint modeling task above: learn a model over the
links/attributes that captures what the five networks have in common. Given complete information about four of the networks, the goal is to perform
link/attribute prediction on the fifth network and determine whether knowledge of the other four networks improves prediction quality.
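Before building a full transfer model, it may help to check whether a simple statistic carries over from four networks to the fifth. The sketch below is one such check, not a transfer learning method in itself: for each held-out network, it compares the fraction of friendships whose endpoints share an attribute value, pooled over the other networks, with the same fraction in the held-out network. The input layout (a dict from network name to a pair of edge list and label dict) and the function names are assumptions made for illustration.

```python
def same_value_counts(edges, labels):
    """Count edges with both labels known, and how many of those labels match."""
    same = total = 0
    for u, v in edges:
        label_u, label_v = labels.get(u), labels.get(v)
        if label_u is None or label_v is None:
            continue  # skip edges with a missing attribute value
        total += 1
        same += int(label_u == label_v)
    return same, total


def leave_one_network_out(networks):
    """Yield (target, pooled source rate, target rate) for each held-out network."""
    counts = {name: same_value_counts(e, l) for name, (e, l) in networks.items()}
    for target in networks:
        src_same = sum(s for name, (s, t) in counts.items() if name != target)
        src_total = sum(t for name, (s, t) in counts.items() if name != target)
        tgt_same, tgt_total = counts[target]
        source_rate = src_same / src_total if src_total else None
        target_rate = tgt_same / tgt_total if tgt_total else None
        yield target, source_rate, target_rate
```

If the pooled rate from the source networks is close to the rate in the held-out network, that suggests the statistic is shared across networks and might be useful as a prior when predicting on the fifth.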
Feature Selection:
An important task is determining which features are most informative for link prediction. For instance, perhaps the probability of a
friendship link is 99% if two people are in the same dorm and in the same major. The most informative features may be complicated functions of the
attributes. The task is to find these features. One can perform exploratory data analysis, using Matlab to quickly create plots that reveal
correlations within the data. One can also try to find these features in an automated way.
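A first exploratory step could be to measure how often pairs of users are linked, grouped by the value of a candidate pairwise feature such as "same dorm and same major". The Python sketch below does this by enumerating all pairs, so it is only practical for small networks or samples. It assumes each user's attributes are held in a dict keyed by hypothetical field names such as "dorm" and "major", with None for missing values; these names and the helper functions are illustrative.

```python
from itertools import combinations


def same_on(records, *names):
    """Build a pair feature: True if two users match on all named attributes,
    False if they differ on any, None if any value is missing."""
    def feature(u, v):
        values = [(records[u].get(n), records[v].get(n)) for n in names]
        if any(a is None or b is None for a, b in values):
            return None
        return all(a == b for a, b in values)
    return feature


def link_rate_by_feature(nodes, edges, feature):
    """Return {feature value: fraction of user pairs with that value that are linked}."""
    edge_set = {frozenset(edge) for edge in edges}  # undirected friendships
    linked, total = {}, {}
    for u, v in combinations(nodes, 2):
        key = feature(u, v)
        total[key] = total.get(key, 0) + 1
        if frozenset((u, v)) in edge_set:
            linked[key] = linked.get(key, 0) + 1
    return {key: linked.get(key, 0) / total[key] for key in total}
```

Comparing the link rate for pairs where the feature is True against the rate where it is False gives a crude measure of how informative that feature is. An automated search could score many candidate attribute combinations this way and rank them.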