Exploiting Context Analysis for Combining Multiple Entity Resolution Systems

Appeared in ACM SIGMOD 2009


Zhaoqi Stella Chen, Dmitri V. Kalashnikov, Sharad Mehrotra

Computer Science Department
University of California, Irvine
GDF project (http://www.ics.uci.edu/~dvk/GDF)

Abstract

Entity Resolution (ER) is an important real-world problem that has attracted significant research interest over the past few years. It deals with determining which object descriptions co-refer in a dataset. Due to its practical significance for data mining and data analysis tasks, many different ER approaches have been developed to address the ER challenge. This paper proposes a new ER Ensemble framework. The task of ER Ensemble is to combine the results of multiple base-level ER systems into a single solution with the goal of increasing the quality of ER. The framework proposed in this paper leverages the observation that no single ER method consistently outperforms the other ER techniques in terms of quality. Instead, different ER solutions perform better in different contexts. The framework employs two novel combining approaches, both based on supervised learning. The two approaches learn a mapping of the clustering decisions of the base-level ER systems, together with the local context, into a combined clustering decision. The paper empirically studies the framework by applying it to different domains. The experiments demonstrate that the proposed framework achieves significantly higher disambiguation quality than current state-of-the-art solutions.
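The core idea above, learning from labeled data which base-level ER system to trust in which context, can be illustrated with a deliberately simple sketch. This is NOT either of the paper's two combining approaches (those are described in the full paper); it is a hypothetical context-based selector, assuming each pair of references has already been assigned a discrete context label (the labels `common_name` and `rare_name`, and the functions `train_context_selector` and `combine`, are illustrative inventions). For each context it picks the base system that was most accurate on training pairs, and it falls back to a majority vote for contexts it has never seen.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical training example: (context label, per-system decisions, ground truth).
# A decision is True if that base ER system says the two references co-refer.
Example = Tuple[str, Sequence[bool], bool]


def train_context_selector(examples: List[Example]) -> Dict[str, int]:
    """For each context, return the index of the base system most accurate there."""
    correct: Dict[str, List[int]] = {}
    for context, decisions, truth in examples:
        counts = correct.setdefault(context, [0] * len(decisions))
        for i, decision in enumerate(decisions):
            if decision == truth:
                counts[i] += 1
    # Ties resolve to the lowest system index, since max() keeps the first maximum.
    return {ctx: max(range(len(c)), key=lambda i: c[i]) for ctx, c in correct.items()}


def combine(selector: Dict[str, int], context: str, decisions: Sequence[bool]) -> bool:
    """Combined co-reference decision for one pair of references."""
    if context in selector:
        return decisions[selector[context]]
    # Unseen context: fall back to a strict majority vote over the base systems.
    return sum(decisions) * 2 > len(decisions)


# Usage with made-up data for two base ER systems.
training = [
    ("common_name", (True, False), False),
    ("common_name", (True, False), False),
    ("rare_name", (True, False), True),
    ("rare_name", (True, True), True),
]
selector = train_context_selector(training)
print(selector)  # {'common_name': 1, 'rare_name': 0}
print(combine(selector, "common_name", (True, False)))  # False
```

Here system 0 over-merges references in the `common_name` context but is reliable for `rare_name`, so the selector defers to a different system in each context. The paper's approaches go further by learning a mapping from the base decisions plus local context features to a combined clustering decision, rather than choosing a single system per discrete context.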


Categories and Subject Descriptors

H.2.m [Database Management]: Miscellaneous - Entity Resolution;
H.2.m [Database Management]: Miscellaneous - Data Cleaning;
H.2.m [Database Management]: Miscellaneous - Information Quality;


Keywords

ER Ensemble, Entity Resolution, Context Analysis


Downloadable Files

Paper: SIGMOD09_dvk.pdf
Presentation: SIGMOD09_dvk.ppt

BibTeX Entry

@inproceedings{SIGMOD09::dvk,
   author    = {Zhaoqi Stella Chen and Dmitri V.\ Kalashnikov and Sharad Mehrotra},
   title     = {Exploiting Context Analysis for Combining Multiple Entity Resolution Systems},
   booktitle = {Proc.\ of ACM SIGMOD International Conference on Management of Data (ACM SIGMOD 2009)},
   year      = {2009},
   month     = {June 29--July 2},
   address   = {Providence, RI, USA}
}



© 2011 Dmitri V. Kalashnikov. All Rights Reserved.