Instructor: David Newman (newman@uci.edu)
Lectures: Tuesday and Thursday,
Office Hours: Friday, 9:30-10:30, DBH 4064
Links:
· Project information: pointers to data sets and general guidelines (may be accessible only from ICS machines). Instructions for your Project Proposal (due Tues Oct 16). Instructions for your Progress Report 1 (due Tues Oct 30). Please email me your 2-page ppt/pdf Progress Report 2 by 2pm, Tuesday Nov 13.
· You will be expected to program in MATLAB in this class. Here is a link for general information about MATLAB.
· Tutorial background reading and suggested additional reading.
This class provides a broad overview of techniques, algorithms, and applications in data mining, with an emphasis on text mining. The first 3 or 4 weeks of the course will cover basic principles of data mining tasks, data measurement, exploratory data analysis and visualization, model structures, and scoring and evaluation, as well as brief reviews of techniques for classification, regression, and clustering. The remaining 6 weeks or so will cover specific application areas in depth. Topics likely to be included (roughly one per week) are text mining and information extraction, Web data mining, credit scoring and list scoring, transaction data such as market basket data, and bioinformatics data.
Note that the prerequisite for this class is a prior class in Machine Learning, such as CS 273 or CS 274, or a course that covers similar material.
Here are the due dates for the three homeworks and the project.
| Week | Tuesday | Thursday |
| 0 | | Sep 27: |
| 1 | Oct 2: (Homework 1 available here) | Oct 4: |
| 2 | Oct 9: Homework 1 due | Oct 11: |
| 3 | Oct 16: Project proposal due (Homework 2 available here) | Oct 18: |
| 4 | Oct 23: | Oct 25: |
| 5 | Oct 30: Project progress report 1 due (instructions here) | Nov 1: |
| 6 | Nov 6: Homework 2 due (Homework 3 available here) | Nov 8: |
| 7 | Nov 13: Project progress report 2 due | Nov 15: |
| 8 | Nov 20: Homework 3 due. Guest lecture: Dr. Ashish Bhan | Nov 22: NO CLASS (Thanksgiving) |
| 9 | Nov 27: | Nov 29: |
| 10 | Dec 4: NO CLASS | Dec 6: Project presentations in class (1st group) |
| 11 | Dec 11: NO CLASS (Finals week) | Dec 13: Final (4-6pm): Project presentations in class (2nd group). Final Project Report due |
This is an estimate of the lecture schedule. We will adapt/adjust the schedule as necessary during the quarter.
Week 0:
Week 1:
Week 2:
Week 3:
Week 4:
Week 5:
Week 6:
Week 7:
Week 8:
Week 9:
Week 10:
Week 11:
We will use Mining the Web: Discovering Knowledge from Hypertext Data by Soumen Chakrabarti, 2003. This text is a good general introduction to data mining, with applications to the internet and an emphasis on text mining.
There will be 3 homework assignments.
An important part of this class for each student will be the class project. Details will be announced shortly on the project information Web page, and projects will also be discussed in class.
The grading for this class will be based on:
- 30% for the homeworks
- 70% for the class project
Academic honesty is taken seriously. For homework problems or programming assignments, you are allowed to discuss the problems or assignments verbally with other class members, but under no circumstances may you look at or copy anyone else's written solutions or code relating to homework problems or programming assignments. All problem solutions submitted must be material you have personally written during this quarter. Failure to adhere to this policy can result in a student receiving a failing grade in the class. It is the responsibility of each student to be familiar with UCI's current academic honesty policies. Please take the time to read the current UCI Senate Academic Honesty Policies. You may also want to look at the ICS Department's policies on academic honesty.
277 Data Mining (4). (Course code: 35363). Introduction to the general principles of inferring useful knowledge from large data sets (commonly known as data mining or knowledge discovery). Relevant concepts from statistics, databases and data structures, optimization, artificial intelligence, and visualization are discussed in an integrated manner. Prerequisite: CS 273 or 274 or consent of instructor.
This syllabus and the class lectures, homeworks, and project are closely based on the same class offered by Professor Padhraic Smyth. This material is used with the permission of Professor Smyth.