Dr. Kalev Kask - University of California at Irvine ZOT!


CompSci 273P: Machine Learning and Data Mining, Spring 2018


Course Outline

  • When: Tuesday & Thursday, 5:00 - 6:20p
  • Where: SSL 228 (UCI campus map)
  • Course Code: 35515
  • Lab/Discussion section: Tuesday, 7:00 - 7:50p, SSL 270
  • Instructor: Kalev Kask
    • Email: kkask@uci.edu; when sending email, put CS273P in the subject line
    • Office hours: TBD
  • TA: Filjor Broka
  • Reader: Ananthakrishnan Pushpendran


Course Overview:


How can a machine learn from experience and become better at a given task? How can we automatically extract knowledge or make sense of massive quantities of data? These are the fundamental questions of machine learning. Machine learning and data mining algorithms use techniques from statistics, optimization, and computer science to create automated systems that can sift through large volumes of data at high speed to make predictions or decisions without human intervention.

This class will familiarize you with a broad cross-section of basic/popular models and algorithms for machine learning, and prepare you for industry application of machine learning techniques.


Background:


We will assume basic familiarity with the concepts of probability, statistics, calculus, and linear algebra. Some programming will be required; we will primarily use Python with the libraries "numpy" and "matplotlib", as well as course code.


Assignments:


There will be a few homework assignments (on average, one every two weeks), two projects, and a final.


Course-Grade:

  • Homework: 20%
  • 2 projects: 20% each
  • Final: 40%


Projects:


You will be required to complete 2 projects:
  • Project 1 is regression; due approx. week 9
  • Project 2 is classification; due approx. week 11
  • Each project is done by a team of 3 students working together
  • Each team will submit its results to a Kaggle competition
  • Further details TBA


Textbook and Reading:


There is no required textbook for the class. However, useful books on the subject for supplementary reading include:
  • Duda, Hart, Stork, "Pattern Classification"
  • Daume "A Course in Machine Learning"
  • Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning"
  • Murphy "Machine Learning: A Probabilistic Perspective"
  • Bishop "Pattern Recognition and Machine Learning"
  • Sutton "Reinforcement Learning"


Python:


While you can use any environment/language/platform for the programming assignments, we recommend and support Python. I strongly suggest the "full SciPy stack", which includes NumPy, Matplotlib, SciPy, and the IPython notebook for interactive work and visualization; see HERE for installation information.

Here is a simple introduction to numpy and plotting for the course; and of course you can find complete documentation for these libraries as well as many more tutorial guides online.

While Python 2.7 is still widely used, try to write code that is compatible with Python 3; if you find that parts of the course code do not work with more recent versions of Python, please let us know and we will try to fix it.
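
As an illustration of what "Python 3 compatible" can mean in practice, here is a minimal sketch that should behave the same under Python 2.7 and Python 3. The function names (mean, mean_squared_error) are hypothetical and are not part of the course code; the point is only the "from __future__" line, which gives Python 2.7 the Python 3 print function and true division.

```python
# Minimal sketch (not course code): the same file runs under Python 2.7 and 3.
# The __future__ import must be the first statement in the file.
from __future__ import division, print_function


def mean(values):
    """Average of a sequence; '/' is true division under both versions."""
    values = list(values)  # also accepts iterators such as zip() or map()
    return sum(values) / len(values)


def mean_squared_error(y_true, y_pred):
    """Hypothetical helper: average squared difference of two sequences."""
    errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return mean(errors)


if __name__ == "__main__":
    # print(...) with parentheses is a function call in both versions.
    print("mean:", mean([1, 2]))  # 1.5, not 1, thanks to true division
    print("floor division:", 7 // 2)  # use // when integer division is intended
    print("MSE:", mean_squared_error([1, 2, 3], [1, 2, 5]))
```

Without the division import, mean([1, 2]) would return 1 under Python 2.7, which is the kind of silent difference that is easy to miss in numerical code.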


Lab and Discussion:


There is a lab/discussion section on Tuesdays at 7:00pm, shortly after class, in SSL 270. This is where you can discuss course material, get help with programming (Python), and discuss project-related issues and questions.

We will use a course Piazza page for questions and discussion. Please post your questions there; you can post privately if you prefer, or if (for example) your question needs to reveal your solution to a homework problem. I prefer to use Piazza for all class contact, since it allows responses from me, the TA, or (for public posts) fellow students, which should get you answers more quickly.

Note: when posting privately, please post to "Instructors" (which includes the instructor & TAs).


Syllabus:

Subject to change

Week 1
  • Topics:
    • Class setup; Concepts; Complexity
    • Bayes classifiers; Naive Bayes
  • Dates: 04-03, 04-05
  • Reading: Intro; Python tutorial; Python ipynb
  • Slides: Set 1, Set 2
  • Homework: HW1-code, HW1

Week 2
  • Topics:
    • Nearest neighbor models
    • Linear regression; Gradient Descent
  • Dates: 04-10, 04-12
  • Reading: kNN
  • Slides: Set 3

Week 3
  • Topics:
    • Regularization; Cross-validation
    • Linear classifiers; Perceptrons; Logistic Regression
  • Dates: 04-17, 04-19
  • Reading: LinReg; LinClass
  • Slides: Set 4, Set 5
  • Homework: HW2-code, HW2

Week 4
  • Topics:
    • VC Dimension
    • Support Vector Machines
  • Dates: 04-24, 04-26
  • Reading: VCdim
  • Slides: Set 6
  • Homework: HW3-code, HW3

Week 5
  • Topics:
    • SVM Kernels
    • Neural Networks
  • Dates: 05-01, 05-03
  • Slides: Set 7, Set 8
  • Homework: Project 1

Week 6
  • Topics:
    • Neural Networks
    • Decision Trees
  • Dates: 05-08, 05-10
  • Reading: Dtree
  • Slides: Set 9

Week 7
  • Topics:
    • Ensembles: Bagging
    • Ensembles: Boosting
  • Dates: 05-15, 05-17
  • Slides: Set 10
  • Homework: HW4-code, HW4

Week 8
  • Topics:
    • Clustering: k-means, EM
    • Latent space models; PCA / SVD
  • Dates: 05-22, 05-24
  • Slides: Set 11, Set 12

Week 9
  • Topics:
    • Project 1 report is due (05/29)
    • Latent space models; Collaborative filtering & recommender systems
    • Markov models; Markov decision processes
  • Dates: 05-29, 05-31
  • Slides: Set 13
  • Homework: Project 2, HW5-code, HW5

Week 10
  • Topics:
    • Reinforcement learning; bandits
    • Sarsa, Q-learning
  • Dates: 06-05, 06-07
  • Slides: Set 14

Week 11
  • Topics:
    • Project 2 report is due (06/14)
    • Final exam (06/14, 4-6pm, SSL 228)
  • Dates: 06-14


Online Lectures:

Online Notes: