ICS 278: Data Mining

Spring 2006

Who, Where, When

Instructor: Padhraic Smyth  
Lectures: Tuesday and Thursday, 11:00-12:20, CS 243

Office Hours: Fridays, 9:30 to 11:00
 

Links:

·        Project information: pointers to data sets and general guidelines (may be accessible only from ICS machines).
  Information on how to submit project proposals is now available; proposals are due by 8am on Monday, April 24th.
  Information on software and data sets that may be useful for projects

·        General information about MATLAB

·        Tutorial background reading and suggested additional reading

Overview

This class provides a broad overview of techniques, algorithms, and applications in data mining. The first 3 or 4 weeks of the course will cover basic principles of data mining tasks, data measurement, exploratory data analysis and visualization, model structures, scoring and evaluation, as well as brief reviews of techniques for classification, regression, and clustering. The remaining 6 weeks or so will cover specific application areas in depth: topics that are likely to be included (roughly 1 per week) are text mining and information extraction, Web data mining, credit scoring and list scoring, transaction data such as market basket data, and bioinformatics data.

Note that the prerequisite for this class is a prior course in machine learning, such as ICS 273 or ICS 274, or a course that covers similar material.

Syllabus and Lectures

This is the current best estimate of the lecture schedule. We may adjust the schedule as necessary during the quarter.

·         Week 1:

·         Week 2:

·         Week 3:

·         Week 4:

·         Week 5:

·         Week 6: 

·         Week 7:

·         Week 8:

·         Week 9:

·         Week 10:

 

Textbook

We will use Principles of Data Mining by Hand, Mannila, and Smyth, MIT Press, 2001. This text is fairly widely used in data mining classes, so you should be able to find second-hand copies at the bookstore or via the Web.

 

Homeworks

There will be 3 or 4 homework assignments.

 

Student Projects

The class project is an important part of this class for each student. Details will be announced shortly on the project information Web page, and projects will also be discussed in class.

 

Grading Policy

The grading for this class will be based on:

30% for the homeworks

70% for the class project

 

Academic Honesty

Academic honesty is taken seriously. For homework problems or programming assignments, you are allowed to discuss the problems or assignments verbally with other class members, but under no circumstances may you look at or copy anyone else's written solutions or code relating to homework problems or programming assignments. All problem solutions submitted must be material you have personally written during this quarter. Failure to adhere to this policy can result in a student receiving a failing grade in the class. It is the responsibility of each student to be familiar with UCI's current academic honesty policies. Please take the time to read the current UCI Senate Academic Honesty Policies (in the Spring Schedule of Classes, a few pages from the end). You may also want to look at the ICS Department's policies on academic honesty.

UCI Catalog Description

278 Data Mining (4). Introduction to the general principles of inferring useful knowledge from large data sets (commonly known as data mining or knowledge discovery). Relevant concepts from statistics, databases and data structures, optimization, artificial intelligence, and visualization are discussed in an integrated manner. Prerequisite: ICS 273 or 274 or consent of instructor.