Query Processing and Optimization in Information-Integration Systems
Chen Li
Ph.D. Thesis, Stanford University, August 2001

The advent of the Internet gives us access to many autonomous and heterogeneous information sources. The purpose of information integration is to support seamless access to these data sources. To deal with source heterogeneity, many systems use a mediation architecture, in which a mediator processes user queries by accessing source data. There are two approaches to information integration: the query-centric approach (taken by the TSIMMIS system at Stanford), and the source-centric approach (taken by many systems, such as the Information Manifold at AT&T). My thesis focuses on efficient query processing in both approaches. For the first approach, I study how to process queries efficiently when sources have limited query-answering capabilities. For the second approach, I develop query-optimization techniques, including how to use mediator caching to improve query performance and how to generate efficient rewritings of queries using views. Most of my thesis work was carried out within the TSIMMIS project at Stanford University.