Machine Learning CS 273A Fall 2011

Classnotes (last updated xxxx)

Q&A Blog on Machine Learning (last updated xxxx)

CompSci: 273A
Instructor: Max Welling

I am available for questions after every lecture. 


- There will be no class on Friday, November 25.

- Project reports are due Sunday, December 4, at midnight (no extensions!).

- During the last week of classes, every group will be scheduled for a 15-minute project presentation.




The goal of this class is to familiarize you with various state-of-the-art machine learning techniques for classification, regression, clustering and dimensionality reduction.
The emphasis of this course is on gaining experience through the implementation of various algorithms for homework and for your project.

Homework & Exams:

Important note for MS students: This is a CS core course and as such is part of the MS comprehensive exam. MS students who are not enrolled in this course can participate in the exam on Monday, Dec. 5, 4-6pm. If you want to do this, you should let me know well in advance. Please make sure you have read the new policy concerning the MS comprehensive exam.

Every week there will be homework, mostly consisting of a coding assignment. Coding will be done in MATLAB or R. Finishing all your homework assignments is required to pass this class. If you do not hand in your homework before the weekly deadline, you will accumulate penalty points (or, viewed more positively, if you hand in your homework on time every week you will accumulate bonus points). I may ask students to demo their implementation of the homework assignment in class.

There will be short quizzes every week on Fridays, starting in week 2. There is no midterm.
Regular quizzes will be multiple choice, and you will need to buy green Scantron test forms at the UCI bookstore (either variety of the large green form will work, but not the very small ones).

There will also be a project, starting right from day 1. Details are given below. You may work in groups of up to 3 students, but every member must do their own coding work. Each group will also be required to give a presentation and write a brief report (as a group). It is important that the individual contributions of the group members are made transparent. Your project report is due Sunday December 4 in a designated EEE dropbox (see the announcement above). Please keep it brief (say, up to 5 pages). Describe what you did. Make very sure you explain your own contribution and how it differs from what the others in your group have done. Also make sure you are precise about all input you received, literature references, etc. A scientific analysis is also desirable: for example, report on how the various methods you tried compare, and what the weaknesses and strengths of each method are.

There will also be a comprehensive final exam on Monday Dec. 5, 4-6pm.

Your final grade will be determined as a combination of homework (10%), quizzes (20%), project (30%), and the final exam (40%).


You are required to work on either the Heritage Health competition or the "Give Me Some Credit" competition, both hosted on the Kaggle website.

Please subscribe to one of these competitions and download the data.
For the Heritage competition we have provided you with a simple set of features to get you started (thanks to Sungjin Ahn!), but you should feel free to replace these features with anything else. The login and password to download these features will be provided in class.

You may form a team of up to 3 people. Use the following naming convention for your team: UCI-CS273A-WelShaEin, where "WelShaEin" consists of the first 3 letters of each member's last name; in this example, mine (Welling) and my collaborators Shannon and Einstein. You are required to submit your results regularly to the website and appear on the leaderboard. A high position on the leaderboard will earn you extra bonus points (e.g. the team that wins the competition and receives the $3M prize will receive an A+).
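To get onto the leaderboard quickly, here is a minimal MATLAB sketch of a constant-prediction baseline. The file names, the header row, and the positions of the id and target columns below are assumptions, not the actual format of the provided files; adapt them accordingly.

    % Constant-prediction baseline for a Kaggle-style competition (a sketch;
    % file names and column layout are assumptions, not the real format).
    train = csvread('train_features.csv', 1, 0);   % assumed: numeric CSV with one header row
    test  = csvread('test_features.csv', 1, 0);
    y     = train(:, end);                         % assumed: target in the last column
    ids   = test(:, 1);                            % assumed: member id in the first column
    pred  = repmat(mean(y), size(test, 1), 1);     % predict the training mean everywhere
    dlmwrite('submission.csv', [ids pred], 'precision', '%.6f');

Even a trivial baseline like this lets you verify the submission format end-to-end before you invest effort in real models.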

By December 4 (midnight) you need to submit your final reports (see above). Don't make them too long (I can't read 20 reports of 10 pages each). However, you do need to report what you did and provide the appropriate analysis and interpretation of your best-scoring method. Also make it abundantly clear where you got your information and who did what in your team.


(the following text was plagiarized from D. Kay)

Don't do it: Plagiarism means presenting somebody else's work as if it's your own. You may use whatever outside sources (books, friends, interviews, periodicals) are appropriate for an assignment, so long as you cite them: Any time you use two or more words in a row that you didn't think up and write yourself, you must put the words in quotation marks and indicate where they came from. (There could be situations where this two-word rule isn't appropriate. If you think you have one, check with us.) Even if you paraphrase (state in your own words) someone else's work or ideas, you should cite the source (e.g., "Dijkstra says that unrestricted branching is dangerous."). Plagiarism is academically dishonest, and we expect that nobody in the class will engage in it.

Turning in another person's work as your own violates the honesty policies of ICS and UCI. The School of ICS takes academic honesty very seriously and imposes serious penalties on students who violate its guidelines. Detected violations could result in your failing the course, having a letter filed with the school, and losing a variety of other benefits and privileges. We do check for academic dishonesty both manually and automatically. It is an unfortunate fact that nearly every quarter, some students in ICS classes are found to have violated these policies; to protect the privacy of the guilty, violations are not made public, but sadly, they do occur. Compared to the consequences of academic dishonesty, one low assignment score is a minor disadvantage. If you feel as if you're falling behind or have other difficulties, see the instructor; we will help you work around your trouble. No matter how pressured you feel, don't plagiarize; it's not worth it.

Most importantly, realize that getting "the answer" is only the last part of each assignment. Equally important is the process of getting the solution—including the false starts, bugs, misconceptions, and mistakes—because the learning occurs in the doing. Completely apart from the ethical issues, copying a solution deprives you of the whole point of the assignment.


Slides (these will change during the quarter):

-Introduction, Scatter Plots, kNN, Logistic Regression, Cross-validation, Overfitting [ppt] [pdf]
-Decision Theory, Loss Functions, ROC Curves [ppt] [pdf]
-Clustering [ppt] [pdf]
-Convex Optimization (see classnotes and book)
-SVMs [ppt] [pdf]
-Kernel-PCA (see classnotes and book)
-Spectral Clustering (classnotes)
-Fisher Discriminant Analysis (classnotes)
-Ridge Regression & Kernel Ridge Regression (classnotes)
-Neural Networks [ppt] [pdf]
-Bias-Variance Tradeoff [ppt] [pdf]
-Decision Trees [ppt] [pdf]
-Boosting [ppt] [pdf]

For your own interest:

-Classifier Evaluation [ppt] [pdf]
-Canonical Correlation Analysis & Maximum Autocorrelation Factor Analysis (classnotes)

Practice Exams

Final Exam 2009

Final Exam 2011

Data Sets and Online Resources:

Homework dataset (map the three species to two classes for the HW): Iris Flower
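As a starting point, here is a minimal MATLAB sketch of one such mapping. It assumes the Statistics Toolbox, which ships with Fisher's iris data; the setosa-vs.-rest split is just one possible choice.

    % Load Fisher's iris data: meas is 150x4 (features), species is a 150x1 cell array of labels.
    load fisheriris
    X = meas;                                 % features: sepal/petal length and width
    y = double(strcmp(species, 'setosa'));    % class 1 = setosa, class 0 = the other two species
    fprintf('class sizes: %d vs. %d\n', sum(y == 1), sum(y == 0));

Any other binary grouping (e.g. versicolor vs. the rest) works just as well, and if you download the raw Iris data instead, the same idea applies to its species column.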

Homework (always due the following Tuesday at 12pm):

These homework exercises will change during the quarter.

Week 1: Homework 1 [doc] [pdf]
Week 2: Homework 2 [doc] [pdf]
Week 3: Homework 3 [doc] [pdf]
Week 4: Homework 4 [doc] [pdf]
Week 5: Homework 5 [doc] [pdf]
Week 6: Homework 6 [doc] [pdf]
Week 7: Homework 7 [doc] [pdf]
Week 8: Homework 8 [doc] [pdf]
Week 9: Homework 9 [doc] [pdf]

For your own interest:

Week 10: Homework 10 [doc] [pdf]


I am writing my own set of class notes, which will be posted here and at the top of this webpage. Note that I will update them continuously during the course; feedback is appreciated.

The textbook that will be used for this course is:

1. C.M. Bishop: Pattern Recognition and Machine Learning

Optional side readings are:

2. D. MacKay: Information Theory, Inference and Learning Algorithms
3. R.O. Duda, P.E. Hart, D. Stork: Pattern Classification
4. C.M. Bishop: Neural Networks for Pattern Recognition
5. T. Hastie, R. Tibshirani, J.H. Friedman: The Elements of Statistical Learning
6. B.D. Ripley: Pattern Recognition and Neural Networks
7. T. Mitchell: Machine Learning