Informatics 141: Information Retrieval: Calendar: Winter 2009


Department of Informatics->Donald Bren School of Information and Computer Sciences ->University of California, Irvine


Week 01 : Web Search Basics

1/5
1/6 Lecture 01: Notes Slides (PDF, Flash)
1/7
1/8 Lecture 02: Notes Slides (PDF, Flash)
1/9 Assignment 01 due

Reading List

  • Textbook Chapter 19: Web Search Basics
  • Wikipedia entry on Vannevar Bush
  • "As We May Think" The Atlantic Monthly, July, 1945. (reprinted in ACM CHI Interactions, March 1996)
  • "Stuff I’ve Seen: A system for personal information retrieval and re-use" by S. Dumais, E. Cutrell, J. Cadiz, G. Jancke, R. Sarin, and D. Robbins (SIGIR 2003)
    1. Commentary: "This paper addresses an increasingly important problem – how to search and manage personal collections of electronic information. ... it addresses an important user-centered problem. ...this paper presents a practical user interface to make the system useful. ..., the paper includes large scale, user-oriented testing that demonstrates the efficacy of the system. ..., the evaluation uses both quantitative and qualitative data to make its case. I think this paper is destined to be a classic because it may eventually define how people manage their files for a decade. Moreover, it is well-written and can serve as a good model for developers doing system design and evaluation, and for students learning about IR systems and evaluation."


Week 02: Web Search Basics (continued)

 

1/12 Discussion 01: Slides (PDF)
1/13 Lecture 03: Notes Slides (PDF, Flash)
1/14
1/15 Lecture 04: Notes Slides (PDF, Flash) Quiz 01 on week 1 reading
1/16 Assignment 02 due

Reading List

  1. "Simple, Proven Approaches to Text Retrieval" by Robertson and Sparck Jones
    1. Commentary: "This paper provides a brief but well informed and technically accurate overview of the state of the art in text retrieval, at least up to 1997. It introduces the ideas of terms and matching, term weighting strategies, relevance weighting, a little on data structures and the evidence for their effectiveness. In my view it does an exemplary job of introducing the terminology of IR and the main issues in text retrieval for a numerate and technically well informed audience. It also has a very well chosen list of references."

 


Week 03: Web Crawling and Indices

1/19 Holiday: Martin Luther King Jr. Day (Service Opportunities)
1/20 Lecture 05 Notes Slides (PDF, Flash)
1/21
1/22 Lecture 06 Notes Slides (PDF, Flash)
1/23 Assignment 02 due

Reading List

  1. Textbook Chapter 20 : Web Crawling and Indices
  2. "The Web As a Graph" by (R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal, PODS 2000)
    1. Abstract: "The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph has about a billion nodes today, several billion links, and appears to grow exponentially with time. There are many reasons—mathematical, sociological, and commercial—for studying the evolution of this graph. We first review a set of algorithms that operate on the Web graph, addressing problems from Web search, automatic community discovery, and classification. We then recall a number of measurements and properties of the Web graph. Noting that traditional random graph models do not explain these observations, we propose a new family of random graph models."
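The graph view described in this abstract can be made concrete with a small sketch. This is only an illustration, not code from the paper or the course: the function name `build_web_graph` and the toy URLs are hypothetical, and pages are assumed to be identified by plain strings.

```python
from collections import defaultdict


def build_web_graph(links):
    """Build out-link and in-link adjacency sets from (source, target) pairs.

    Each pair is one hyperlink, i.e. one directed edge source -> target.
    """
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for source, target in links:
        out_links[source].add(target)
        in_links[target].add(source)
    return out_links, in_links


# Hypothetical toy data: four hyperlinks among three pages.
toy_links = [
    ("a.example", "b.example"),
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("c.example", "a.example"),
]

out_links, in_links = build_web_graph(toy_links)

# In-degree (number of distinct pages linking to a page) is one of the
# simplest measurements that can be taken on such a graph.
in_degree = {page: len(sources) for page, sources in in_links.items()}
```

Keeping both directions of each edge makes it cheap to ask either "what does this page link to?" or "what links to this page?", at the cost of storing every link twice.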

 


Week 04: Index Construction

1/26 Discussion 02 Slides (PDF)
1/27 Lecture 07 Notes Slides (PDF, Flash)
1/28
1/29 Lecture 08 Notes Slides (PDF, Flash) Quiz 02 on week 2 and week 3 reading
1/30

Reading List

  1. Textbook Chapter 4 : Index Construction
  2. "The WebGraph Framework I: Compression Techniques" by (P. Boldi and S. Vigna, WWW 2004)
    1. Abstract: "Studying web graphs is often difficult due to their large size. Recently, several proposals have been published about various techniques that allow to store a web graph in memory in a limited space, exploiting the inner redundancies of the web. The WebGraph framework is a suite of codes, algorithms and tools that aims at making it easy to manipulate large web graphs. This paper presents the compression techniques used in WebGraph, which are centred around referentiation and intervalisation (which in turn are dual to each other). WebGraph can compress the WebBase graph (118 Mnodes, 1 Glinks) in as little as 3.08 bits per link, and its transposed version in as little as 2.89 bits per link."
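To put the bits-per-link figures from this abstract in perspective, the arithmetic can be sketched as follows. Two assumptions are made here that are not in the abstract: "1 Glinks" is treated as exactly 10^9 links, and the uncompressed baseline is taken to be one 32-bit integer per link. The helper name `graph_size_bytes` is hypothetical.

```python
def graph_size_bytes(num_links, bits_per_link):
    """Return the storage needed for num_links links, in bytes."""
    return num_links * bits_per_link / 8


# Assumption: "1 Glinks" is treated as exactly one billion links.
NUM_LINKS = 1_000_000_000

compressed = graph_size_bytes(NUM_LINKS, 3.08)   # figure quoted for the graph
transposed = graph_size_bytes(NUM_LINKS, 2.89)   # figure quoted for its transpose
naive = graph_size_bytes(NUM_LINKS, 32)          # assumed baseline: 32-bit target id per link

for label, size in [("compressed", compressed),
                    ("transposed", transposed),
                    ("naive 32-bit", naive)]:
    print(f"{label}: {size / 1e6:.0f} MB")
```

Under these assumptions the compressed graph needs roughly 385 MB, against about 4 GB for the naive layout, which is why the graph can be held in memory on a single machine.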


Week 05: Index Construction (continued)

2/2 Discussion 03 Assignment 03 due
2/3 Lecture 09 Notes Slides (PDF, Flash)
2/4
2/5 Lecture 10 Notes Slides (PDF_01,PDF_02,PDF_03) (Flash_01,Flash_02,Flash_03)
2/6 Mid-Term Evaluation due

Reading List

  1. Nothing this week

 


Week 06: Querying, Scoring, Term Weighting and the Vector Space model

2/9 Discussion 04 Slides (PDF)
2/10 Lecture 11 Notes Slides (PDF, Flash)
2/11
2/12 Lecture 12 Notes Slides (PDF, Flash)
2/13

Reading List

  1. Textbook Chapter 1 : Boolean Retrieval
  2. Textbook Chapter 6 : Scoring, term weighting & the vector space model
  3. "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by (S. Brin and L. Page; this link is to the long version, the short version was published in WWW 1998)
    1. Commentary: "This paper (and the work it reports) has had more impact on everyday life than any other in the IR area. A major contribution of the paper is the recognition that some relevant search results are greatly more valued by searchers than others. By reflecting this in their evaluation procedures, Brin and Page were able to see the true value of web-specific methods like anchor text. The paper presents a highly efficient, scalable implementation of a ranking method which now delivers very high quality results to a billion people over billions of pages at about 6,000 queries per second. It also hints at the technology which Google users now take for granted: spam rejection, high speed query-based summaries, source clustering, and context(location)-sensitive search. IR and bibliometrics researchers had done it all (relevance, proximity, link analysis, efficiency, scalability, summarization, evaluation) before 1998 but this paper showed how to make it work on the web. For any non-IR engineer attempting to build a web-based retrieval system from scratch, this must be the first port of call."

 


Week 07: Querying, Scoring, Term Weighting and the Vector Space model (continued)

2/16 Holiday: President's Day
2/17 Lecture 13 Notes Slides 01 (PDF, Flash) Slides 02 (PDF, Flash)
2/18
2/19 Lecture 14 Notes Slides (PDF, Flash) Quiz 03 on week 6 reading
2/20 Assignment 05 due: MapReduce posting list lite

Reading List

  1. None

 


Week 08: Link Analysis

2/23 Discussion 05 Slides (PDF)
2/24 Lecture 15 Notes Slides 01 (PDF, Flash) Slides 02 (PDF, Flash)
2/25
2/26 Lecture 16 Notes Slides (PDF, Flash)
2/27

Reading List

  1. Textbook Chapter 21 : Link Analysis

 


Week 09: Matrix decompositions and latent semantic indexing

3/2 Discussion 06 Slides (PDF)
3/3 Lecture 17 (Sick) Assignment 06 due: Use MapReduce to index Wikipedia
3/4
3/5 Lecture 18 Notes Slides (PDF, Flash)
3/6

Reading List

  1. Textbook Chapter 18 : Matrix Decompositions and latent semantic indexing
  2. "Indexing by latent semantic analysis" by (Deerwester, Dumais, et al.)
    1. Commentary: " IR, as a field, hasn’t directly considered the issue of semantic knowledge representation. The above paper is one of the few that does in the following way. LSI is latent semantic analysis (LSA) applied to document retrieval. LSA is actually a variant of a growing ensemble of cognitively-motivated models referred to by the term “semantic space”. LSA has an encouraging track record of compatibility with human information processing across a variety of information processing tasks. LSA seems to capture the meaning of words in a way which accords with the representations we carry around in our heads. Finally, the above paper is often cited and interest in LSI seems to have increased markedly in recent years. The above paper has also made an impact outside our field. For example, recent work on latent semantic kernels (machine learning) draws heavily on LSI. "

 


Week 10: Matrix decompositions and latent semantic indexing (continued)

3/9 Discussion 07 Slides (PDF)
3/10 Lecture 19 Notes Slides (PDF, Flash)
3/11
3/12 Lecture 20 Notes Slides (PDF, Flash) Quiz on Chapters 21, 18, and 8
3/13

Reading List

  1. Textbook Chapter 8 : Evaluation in Information Retrieval
  2. "Unsupervised Named-Entity Extraction from the Web: An Experimental Study" (Etzioni, et al.)
    1. This paper represents a new generation of IR work that goes beyond building a bag of words for information retrieval and attempts to make some sense of the information as well.

 


Finals Week

3/15 (Sunday) Assignment 07 due: Implement rapid cosine scoring & optional UI
3/16 Assignment 07 demonstration
3/17
3/18
3/19 Google Tour, 10am - 12:15 (meet at the University Club parking lot at 10am)
3/20
