Press release
May 28, 2010

Li, Xie awarded Intel grant to study compression of personal human genome data

Computer science professors Chen Li and Xiaohui Xie have been awarded a $100,000 grant from Intel. The grant will support research on genome compression and direct querying of compressed genomic data. Li and Xie will collaborate on the research with colleagues at China's Northeastern University.

The research on compressing genomic data comes at a time when personalized medicine has become the new medical paradigm and the cost of mapping an individual's genome is rapidly declining. Some estimates suggest that the cost to sequence an individual genome may fall below $1,000 within a couple of years. A haploid human genome is large, consisting of about 3 billion letters representing the nucleic acid bases (A, C, G, T) that make up our DNA.

Storage and searchability of these data sets are critical in personalized medicine, where individual patient care is designed and optimized to that patient's preventative and therapeutic needs. In a paper published last year in the journal Bioinformatics, Li and Xie demonstrated that these large data sets can be compressed, without losing any information, to less than 4 million bytes (about 4 megabytes), small enough to send as an email attachment.

The Intel grant will support Professors Li and Xie's efforts to further optimize the compression algorithm they proposed. In addition, the award will support a new research project on how to query human genomes, such as returning the sequence at a particular location or identifying all sequences that exactly or approximately match a query string. The goal of the research is to make manipulation of genome data much easier and more efficient, and to significantly improve the practice of personal genomics.
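
For readers who want a concrete picture of the two kinds of query mentioned above, the following is a minimal sketch on a plain, uncompressed string. It is not the researchers' method, which targets compressed data; the function names `fetch` and `find_matches` are hypothetical, and "approximate" is assumed here to mean a limited number of substituted letters.

```python
def fetch(genome: str, start: int, length: int) -> str:
    """Return `length` bases beginning at 0-based position `start`."""
    return genome[start:start + length]


def find_matches(genome: str, query: str, max_mismatches: int = 0) -> list[int]:
    """Return 0-based start positions where `query` matches the genome
    with at most `max_mismatches` substituted letters (assumed definition
    of an approximate match; 0 means an exact match)."""
    hits = []
    for i in range(len(genome) - len(query) + 1):
        mismatches = 0
        for a, b in zip(genome[i:i + len(query)], query):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences at this position
        else:
            hits.append(i)  # loop finished without exceeding the limit
    return hits


genome = "ACGTACGTTACG"  # toy sequence, illustrative only
print(fetch(genome, 4, 4))                            # ACGT
print(find_matches(genome, "ACG"))                    # [0, 4, 9]
print(find_matches(genome, "ACT", max_mismatches=1))  # [0, 4, 9]
```

This simple scan examines every position in turn, which would be slow on a 3-billion-letter genome; that cost is part of why efficient querying is a research problem in its own right.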

Associate Professor Li's research interests are in the fields of database and information systems, including Web search, data integration, data cleansing, and data-intensive computing.

Assistant Professor Xie's research focuses on machine learning, bioinformatics, computational biology and neural computation. He is interested both in developing novel machine learning theory and algorithms and in applying them to practical problems in areas such as biology and medical science.