CS 277: Data Mining

Fall 2007

Who, Where, When

Instructor: David Newman (newman@uci.edu)
 
Lectures: Tuesday and Thursday, 5:00-6:20, DBH 1425

Office Hours: Friday, 9:30-10:30, DBH 4064
 

Links:

·        Project information: pointers to data sets and general guidelines (may be accessible only from ICS machines).
·        Instructions for your Project Proposal (due Tues Oct 16).

·        Instructions for your Progress Report 1 (due Tues Oct 30).

·        Please email me your 2-page ppt/pdf Progress Report 2 by 2pm, Tuesday Nov 13.

·        You will be expected to program in MATLAB in this class.  Here is a link for general information about MATLAB.

·        Tutorial background reading and suggested additional reading.

Overview

This class provides a broad overview of techniques, algorithms, and applications in data mining, with an emphasis on text mining. The first 3 or 4 weeks of the course will cover basic principles of data mining tasks, data measurement, exploratory data analysis and visualization, model structures, scoring and evaluation, as well as brief reviews of techniques for classification, regression, and clustering. The remaining 6 weeks or so will cover specific application areas in depth: topics that are likely to be included (roughly 1 per week) are text mining and information extraction, Web data mining, credit scoring and list scoring, transaction data such as market basket data, and bioinformatics data.

Prerequisite: you must already have taken a class in Machine Learning, such as CS 273 or CS 274, or a course that covers similar material.

Due Dates for Homeworks and Project

Here are the due dates for the three homeworks and the project.

Week | Tuesday                                                   | Thursday
0    |                                                           | Sep 27:
1    | Oct 2: (Homework 1 available here)                        | Oct 4:
2    | Oct 9: Homework 1 due                                     | Oct 11:
3    | Oct 16: Project proposal due (Homework 2 available here)  | Oct 18:
4    | Oct 23:                                                   | Oct 25:
5    | Oct 30: Project progress report 1 due (instructions here) | Nov 1:
6    | Nov 6: Homework 2 due (Homework 3 available here)         | Nov 8:
7    | Nov 13: Project progress report 2 due                     | Nov 15:
8    | Nov 20: Homework 3 due; guest lecture: Dr. Ashish Bhan    | Nov 22: NO CLASS (Thanksgiving)
9    | Nov 27:                                                   | Nov 29:
10   | Dec 4: NO CLASS                                           | Dec 6: Project presentations in class (1st group)
11   | Dec 11: NO CLASS (Finals week)                            | Dec 13: Final (4-6pm): Project presentations in class (2nd group); Final Project Report due

Syllabus and Lectures

This is an estimate of the lecture schedule. We will adapt/adjust the schedule as necessary during the quarter.

Week 0:

Week 1:

Week 2:

Week 3:

Week 4:

Week 5:

Week 6:

Week 7:

Week 8:

Week 9:

Week 10:

Week 11:

 

Textbook

We will use Mining the Web: Discovering Knowledge from Hypertext Data by Soumen Chakrabarti, 2003. This text is a good general introduction to data mining, with applications to the internet and an emphasis on text mining.

Homeworks

There will be 3 homework assignments.
 

Student Projects

An important part of this class for each student will be the class project. Details will be announced shortly on the project information Web page, and projects will also be discussed in class.

Grading Policy

The grading for this class will be based on:

- 30% for the homeworks

- 70% for the class project

Academic Honesty

Academic honesty is taken seriously. For homework problems or programming assignments, you are allowed to discuss the problems or assignments verbally with other class members, but under no circumstances may you look at or copy anyone else's written solutions or code relating to homework problems or programming assignments. All problem solutions submitted must be material you have personally written during this quarter. Failure to adhere to this policy can result in a student receiving a failing grade in the class. It is the responsibility of each student to be familiar with UCI's current academic honesty policies. Please take the time to read the current UCI Senate Academic Honesty Policies. You may also want to look at the ICS Department's policies on academic honesty.

UCI Catalog Description

277 Data Mining (4).  (Course code: 35363).  Introduction to the general principles of inferring useful knowledge from large data sets (commonly known as data mining or knowledge discovery). Relevant concepts from statistics, databases and data structures, optimization, artificial intelligence, and visualization are discussed in an integrated manner. Prerequisite: CS 273 or 274 or consent of instructor.

Acknowledgement

This syllabus and the class lectures, homeworks, and project are closely based on the same class offered by Professor Padhraic Smyth. This material is used with the permission of Professor Smyth.