CS 277: Data Mining
Winter 2010
Instructor: Padhraic Smyth
Lectures: 2 to 3:20pm, Tuesdays and Thursdays, ICS 225
Office Hours: after class on Thursdays, until 4:30pm, DBH 4212
Software: information about MATLAB
Text: Principles of Data Mining, D. J. Hand, H. Mannila, P. Smyth, MIT Press, 2001 (optional, but recommended)
Background Reading: additional papers to supplement the text
Project Guidelines, including due dates, instructions, and links to data sets
Syllabus
Introduction to Data Mining
- Topics
- basic concepts in data mining
- data measurement
- exploratory data analysis
- data visualization
- Reading
- Links
- Slides
- Introduction to Data Mining [PPT] [PDF]
- Measurement and Data [PPT] [PDF]
- Exploratory Data Analysis and Visualization [PPT] [PDF]
- Homeworks
Basic Principles of Data Mining
- Topics
- predictive modeling: classification and regression
- model fitting as optimization
- evaluation of predictive performance
- overfitting, regularization
- other data mining tasks: clustering and pattern detection
- Reading
- Links
- Slides
Text Mining
- Topics
- information retrieval and search
- text classification
- unsupervised learning
- Reading
- Links
- Slides
- text classification [PPT] [PDF]
- text mining and topic models [PPT] [PDF]
- notes on graphical models [PPT] [PDF]
Recommender Systems
- Topics
- recommender data, Netflix prize data
- nearest neighbor algorithms
- matrix decomposition algorithms
- efficient algorithms for large data sets
- modeling systematic effects
- Reading
- Links
- Slides
- Recommender systems [PPT] [PDF]
- Netflix case study [PPT] [PDF]
Web Data Analysis
- Topics
- Web data: collection and interpretation
- analyzing user browsing behavior
- learning from clickthrough data
- predictive modeling and online advertising
- link analysis and the PageRank algorithm
- Reading
- Links
- Slides
Social Network Analysis
- Topics
- descriptive analysis of social networks
- network embedding and latent space models
- network data over time: dynamics and event-based networks
- link prediction
- Reading
- overview of network analysis methods (e.g., Newman et al.)
- overview of SNA concepts
- review paper on community detection in networks
- specific papers by Watts, Leskovec, Barabasi, etc.
- Links
- Slides
Time Series Analysis and Anomaly Detection
- Topics
- basic concepts in time-series analysis
- principles of Markov and hidden Markov models
- event data and Poisson models
- techniques for detecting anomalies, events, changes, motifs, etc.
- case study: analysis of large-scale traffic data
- Reading
- Links
- Slides
Grading
Class grades will be based on a class project (80%) and homeworks (20%). The projects will require submission of progress reports during the quarter, presentations in class during finals week, and a final report.
Academic Honesty
It is the responsibility of each student to be familiar with the UCI Senate Academic Honesty Policies. For homework assignments and projects, you are allowed to discuss ideas and concepts verbally with other class members, but you are not allowed to look at or copy anyone else's written solutions or code relating to homework assignments or projects. All material submitted must be material you have personally written during this quarter. Failure to adhere to this policy can result in a failing grade in the class.