Winter 2015 — Information and Computer Sciences, UC Irvine

Informatics 141 / Computer Science 121: Course Reference
INFORMATION RETRIEVAL

Instructor: David G. Kay, 5056 Donald Bren Hall (kay@uci.edu). TAs: Siva Thirunavukkarasu (sthiruna@uci.edu) and Shriti Raj (shritir@uci.edu).


Course goals: Today we can gain new insights and make better decisions to a degree impossible even ten years ago due to the advent of faster and smaller processors, denser and cheaper storage, proliferating mobile devices, and widespread internet use. This phenomenon is sometimes called "big data." This course introduces the enabling factors, implementation techniques, and implications of this data deluge, with an emphasis on using and implementing software tools. [Acknowledgements: These course materials borrow heavily, with permission, from those of Professors Crista Lopes and Don Patterson.]

Enrollment: Unfortunately, more people wish to enroll in this course than we can accommodate. A longer description of the circumstances is available. At the instructional level, the only possible last-minute resolution is finding a classroom with more physical seats, but we can't be sure that's possible. Everyone who is not officially enrolled should have an alternative plan that does not involve this course this quarter. However, if all of the following apply to you, contact the ICS Student Affairs Office right away by Email at ucounsel@uci.edu so that at least the demand can be assessed: you're not enrolled, Infx 141/CS 121 is a graduation requirement for you, no other course can be substituted, and you would satisfy all other graduation requirements by the end of Summer 2015.

Prerequisite courses and concepts: The official prerequisites for this course are at least four quarters of programming, familiarity with Java, and a course in statistics. Those without the official prerequisites will have lower priority for enrollment than those who have them. If it should happen that you are able to enroll without the official prerequisites, you do so at your own risk: It is possible that completing the coursework will require prerequisite knowledge that we don't have time to teach and that will take you more time to acquire than you have available.

We also expect you to have these basic computing skills: Searching and browsing the Web, reading and sending Email, downloading files, viewing and printing PDF (Adobe Acrobat) documents, and creating or saving documents for Email and other purposes in plain ASCII text form (not HTML or Word attachments).

Meeting place and times: Lecture meets Tuesdays and Thursdays from 11:00 to 12:20 in ICS 174. Discussion meets Wednesdays from 5:00 to 5:50 in Rowland Hall 101. Details and advice on the assignments will be covered in the discussion section.

The projection screen and audio of each class will be recorded and available through UCI Replay; after each class, you will receive electronic mail with the link for access. We must note, however, that this process is not 100% reliable; some classes may end up not being recorded. Moreover, the recordings do not capture questions, interactive activities, or work on the whiteboard. They're convenient if you're forced to miss a class, but they're not an equivalent substitute for showing up in person.

Office hours: I will be in or near my office during these scheduled times, during which course-related matters will have first priority: Tuesdays from 12:30 to 1:15 and Thursdays from 9:30 to 10:00. I may need to adjust these times after the first week. Of course emergencies may come up, but I will try to give advance notice of any change. If I'm not immersed in something else, I'll be glad to answer short questions whenever I'm in my office, so feel free to drop by any time. The TAs will hold scheduled hours as needed at a time and place to be announced. We'll also be happy to make appointments for other times during the week.

Questions and announcements: You can usually get a response to your course-related questions within a few hours (perhaps a bit longer on the weekends) by sending electronic mail to the address i141@uci.edu or cs121@uci.edu. We will never intentionally ignore a message, so if you don't receive a response, write again; sometimes overactive spam filters snag a legitimate message. Using course-specific subject lines and your UCInet Email address will help your messages get noticed.

Email you send to the addresses above is private between you, the TA, the reader, and me. We have also set up a more public discussion forum at www.piazza.com; we think Piazza will be particularly useful for advice and tips as you work on your assignments.

We may also send course announcements by Email to the official course mailing list, so you should check your Email regularly. Note that this mailing list goes to the Email address that the registrar has for you (your UCInet ID). If you prefer to read your Email on another account, you should set your UCInet account to forward your Email to your preferred account (you can do this on the web at http://www.oit.uci.edu/email/deliverypoint.html). Don't let this slide; if you miss official announcements, your grade could suffer.

This course has a home page at http://www.ics.uci.edu/~kay/courses/i141/; an archive of official course Email is available on your MyEEE page.

Textbook: Introduction to Information Retrieval by Manning, Raghavan, and Schütze. More details are available on the resources page.

Course structure:
Assignments (55% of the course grade, with later assignments weighted more heavily than earlier ones)
Class participation (10%)
Two quizzes, on dates to be announced, each covering the most recent four or five weeks of the course (together 35%)
In keeping with the project-oriented nature of the course, there will not be a formal, written final exam. However, demonstrations of your projects will be required and may be scheduled during exam week.

We determine final grades neither on a formal curve (with equal numbers of As and Fs, Bs and Ds, and so on) nor on a straight, fixed scale. We recommend that you focus not on letter grades but on learning what's necessary to complete the projects and earn high scores; the grades will follow from that.

We're required to say that in unusual circumstances, these criteria could change, but we do not expect that to happen.

Special needs: Any student who feels he or she may need an accommodation due to a disability should contact the UCI Disability Services Center at (949) 824-7494 as soon as possible to explore the possible range of accommodations. We encourage all students having difficulty, whether or not due to a disability, to consult privately with the instructor at any time.

What you must do right now to get started in Informatics 141 / CS 121:
— If you do not have a UCInet ID, get one. See http://activate.uci.edu/.
— If you prefer to read your electronic mail on an account other than your UCInet account, redirect your mail at http://www.oit.uci.edu/email/deliverypoint.html.
— Go to checkmate.ics.uci.edu, log in with your UCInet ID, choose "Course Listing" and "Winter 2015," click "Go" next to Informatics 141, and then click "List me for this course." You'll submit most of your work electronically; this step is necessary to set that up.
— Sign yourself up for Informatics 141 / CS 121 on Piazza.com.

Good advice and helpful hints:

Check your electronic mail regularly; this is an official channel for course announcements. When sending course-related mail, start the subject line with "Infx 141" or "CS 121" or "IR class".

Always keep your own copy of each assignment, both electronically and on paper; if an assignment should get lost in the shuffle (or if a file server should crash, which has happened in the past), we'll expect you to be able to supply a replacement easily.

If you find yourself having trouble or getting behind, speak with the instructor. But never take the shortcut of copying someone else's work and turning it in; the consequences can be far worse than just a low score on one assignment. The ICS department takes academic honesty very seriously; for a more complete discussion, see the course collaboration guidelines and the ICS academic honesty policy: http://www.ics.uci.edu/ugrad/policies/index.php.

Approximate course outline:

Week  Date         Topics
1.    6 January    Introduction to the course and "big data"
      8 January    Web search basics (Ch. 19 in the Manning text)
2.    13 January   Text processing
      15 January   Search engine optimization
3.    20 January   Web crawling (Ch. 20)
      22 January   Web crawling
4.    27 January   Index construction and scoring (Ch. 4)
      29 January   Querying, scoring, term weighting, vector space model (Ch. 1, Ch. 6)
5.    3 February   Querying, scoring, term weighting, vector space model
      5 February   — First Quiz —
6.    10 February  Link analysis (Ch. 21)
      12 February  Link analysis
7.    17 February  Matrix decompositions and latent semantic indexing (Ch. 18)
      19 February  Evaluation in IR (Ch. 8)
8.    24 February  Law and ethics; intellectual property law
      26 February  Intellectual property law
9.    3 March      Privacy issues
      5 March      — No class meeting —
10.   10 March     — Second Quiz —
      12 March     Looking back and looking forward
F.    Exam Week    No formal written final exam, but some project demos may be scheduled this week.


David G. Kay, kay@uci.edu
Monday, February 23, 2015 9:21 PM