Course Personnel
Instructor: Dmitri V. Kalashnikov
office: DBH 2072
office hours: By appointment
Meeting Times & Places
First meeting (Apr 2, 2013)
time: 5:00 pm – 6:20 pm
place: DBH 1431
After that, starting Apr 11, 2013, we will meet on Thursdays only
time: Thu 5:00 - 6:20 PM
place: DBH 1431
(Note: the class meets at the same location and time, but there are no classes on Tuesdays.)
Course Objectives
The effectiveness of data-driven technologies as tools for decision support, data exploration, and scientific discovery is closely tied to the quality of the data to which they are applied. It is well recognized that the outcome of an analysis is only as good as the quality of the data on which it is performed. That is why organizations today spend a significant percentage of their budgets on cleaning tasks, such as removing duplicates, correcting errors, and filling in missing values, to improve data quality before pushing data through the analysis pipeline. Forrester Research has estimated that the market for data quality passed the $1 billion mark in 2008.
The objective of this course is to deepen our understanding of recent trends in information quality research; it is not a comprehensive data quality course. We will focus specifically, but not exclusively, on data management techniques for solving the entity resolution (ER) problem. The ER challenge arises because objects in the real world are referred to using references or descriptions that are not always unique identifiers of the objects, leading to ambiguity. This ambiguity must be resolved, or taken into account, when analyzing the data to produce meaningful results.
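To make the ER problem concrete, here is a minimal illustrative sketch in Python. It is not taken from the course materials: the function names (tokens, jaccard, resolve), the token-overlap similarity, and the 0.5 threshold are all assumptions chosen for illustration. It groups textual references that appear to describe the same real-world object.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def tokens(reference: str) -> FrozenSet[str]:
    """Lowercase a reference, replace punctuation with spaces, and return its token set."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in reference.lower())
    return frozenset(cleaned.split())


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(references: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Group references whose token sets are similar, merging groups transitively.

    The threshold is a hypothetical value picked for this example only.
    """
    parent = list(range(len(references)))  # union-find forest over reference indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    token_sets = [tokens(r) for r in references]
    for i, j in combinations(range(len(references)), 2):
        if jaccard(token_sets[i], token_sets[j]) >= threshold:
            parent[find(j)] = find(i)  # merge the two groups

    clusters: Dict[int, List[str]] = {}
    for i, ref in enumerate(references):
        clusters.setdefault(find(i), []).append(ref)
    return list(clusters.values())


refs = ["John A. Smith", "Smith, John", "J. Smith", "Jane Doe", "Doe, Jane"]
print(resolve(refs))
```

With these inputs, "John A. Smith" and "Smith, John" end up in one group and "Jane Doe" and "Doe, Jane" in another, while "J. Smith" remains on its own: an abbreviated reference that simple token overlap cannot resolve, which is the kind of ambiguity ER techniques aim to address.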
The course will be based on student presentations of prominent publications in the area of information quality. There will be a list of publications to be covered in class. Students will choose the publications they want to present from that pool and will decide the dates of their presentations. Presenting papers from outside the pool is also encouraged, but please get approval from the instructor well in advance.
Prerequisites
Basic understanding of databases and machine learning.
Textbooks
There is no required textbook. Although no comprehensive textbook exists, those interested in furthering their knowledge of the area might want to read:
Tentative List of Publications
This list was created for your convenience. Please feel free to present a paper that is not on this list, but first get the instructor's approval of the paper you choose. The paper must be on the course topic of entity resolution or data quality.
Midterm & Final Exam
None.
Grading Criteria
In assigning the final grade, the following factors will be considered:
Prominent Active IQ/ER Research Groups
Some prominent entity resolution & data quality research groups and projects: