Press release
September 9, 2009

Carey, Li Awarded NSF Grant to Develop Technologies for Storing and Analyzing Semi-Structured Data

Award is part of a $2.7 million grant across three University of California campuses

Michael Carey and Chen Li, professors of Computer Science at the University of California, Irvine’s Donald Bren School of Information and Computer Sciences, have been awarded a $1.8 million grant from the National Science Foundation’s Data-Intensive Computing Program. The project entitled “ASTERIX: A Highly Scalable Parallel Platform for Semistructured Data Management and Analysis” will research and develop new technologies for storing and analyzing semi-structured data.

Semi-structured data is a type of data that does not conform to the formal structure of tables and data models associated with databases. It does, however, contain tags or other markers to separate semantic elements and hierarchies of records and fields within the data. The amount of semi-structured data is increasing rapidly as the Internet has allowed information sets beyond traditional full-text documents and databases to exist.

“The evolution of the ‘human Web’, powered by HTML and HTTP, has revolutionized the way that people find information, buy things, communicate, and collaborate,” says Carey. “Web services and semi-structured data formats are having a similar impact on the ‘machine Web’.”

Semi-structured data formats like XML (Extensible Markup Language) and JSON (JavaScript Object Notation) – sets of rules for encoding documents electronically – are enriching our ability to find and interchange information. Industry-specific needs have created XML-based data exchange standards, and XML backbones have gained adoption in support of service-oriented architectures and software-as-a-service initiatives. Other semi-structured formats, like JSON, are playing similar roles for Web-based services, and XML is increasingly being used for its original purpose of semantic document markup. Project ASTERIX will develop new technologies for storing, querying, analyzing, and mining the rapidly increasing quantity of such semi-structured data.

“Semi-structured data has been widely used in many popular Web services such as Google Map and eBay,” says Li.

The funds are a part of a larger $2.7 million grant to be distributed over three years across three UC campuses. Carey and Li will be collaborating with Yannis Papakonstantinou and Alin Deutsch of UC San Diego and Vassilis Tsotras of UC Riverside. Previously, Carey and Li were awarded $132,000 in seed funding in support of ASTERIX from a UC Discovery Grant and eBay.

Carey is a Bren Professor in Information and Computer Sciences. His research interests are in database systems, information integration, service-oriented computing, middleware, distributed systems, and computer system performance evaluation.

Li's research interests are in the fields of databases and information systems, including text search, data cleansing, data integration, and distributed data systems.

Media interested in interviewing ICS faculty, students or alumni should contact Matt Miller at (949) 824-1562 or via email at