Chen Li
Archived News
From August 2012 to June 2013, I was the Vice Chair of the Department of Computer Science.
(6/10/2013) We are very excited to release
our AsterixDB Beta! Here are some pictures at our celebration lunch in Laguna Beach.
(4/2013)
Attending DASFAA
2013 in Wuhan, China. Sharad and I gave a talk for our 10-year best
paper award. Here's
a picture at
the ceremony. Here are our slides:
[Chen's PPT],
[Sharad's PPT].
(4/2013) Two papers co-authored with my Chinese colleagues were
accepted by SIGMOD 2013, one titled "String Similarity Measures and
Joins with Synonyms" with Jiaheng Lu and Chunbin Lin at Renmin University, and one titled
"Improving Regular-Expression Matching on Strings Using Negative
Factors" with Xiaochun Yang et al. at Northeastern University.
(4/6/2013) We are very excited to release
our AsterixDB
alpha! Here are some pictures at our celebration dinner. Stay tuned for the beta release, which is coming soon!
(3/2013) Prof. Xiaohui Xie and I received
an NIH
grant of $662K on assembling complete individual genomes. I am
working with a team to do efficient genome assembly using parallel computing in our ASTERIX project.
(2/2013) Our PhD student, Alex Behm, has graduated and will join
Cloudera. See the pictures taken at his party.
(1/13/2013) Our DASFAA 2003 paper titled "Efficient Record Linkage in
Large Data Sets" received the 10-year Best Paper Award for DASFAA 2013.
It was my first paper in the area of data cleaning and approximate
string search in the context of
the Flamingo project.
(1/7/2013) This quarter I am teaching CS122B
titled "Projects in Databases and Web Applications."
(11/6/2012) On Election Day I gave an invited talk about Election and ASTERIX at
the ACM GIS BigSpatial workshop in Redondo Beach, CA.
(11/5/2012) I was invited to write an article titled "Entrepreneurship
in Data Management Research" at the ACM SIGMOD Blog.
(11/1/2012) "Full professor-ed" :-)
(9/27/2012) This quarter I am teaching CS222/CS122C
titled "Principles of Data Management." For the first time it's co-listed
as an undergraduate course CS122C since we want to encourage undergraduate
students to get familiar with "what's inside a DBMS system" earlier.
(9/2012) I visited several universities and companies in China to talk
about our research on powerful search and ASTERIX.
(8/2012) I gave a talk
titled "Search as You Type: From Research to Commercialization" at the
DBRank 2012 workshop at VLDB in Istanbul, Turkey.
(8/2012) I gave a talk titled "Supporting Efficient Top-k Queries in
Type-Ahead Search" at SIGIR.
(5/2012) Our paper titled "Supporting Efficient Top-k Queries in
Type-Ahead Search" with Tsinghua colleagues (Guoliang Li, Jiannan Wang,
and Jianhua Feng) got accepted by SIGIR. It is amazing to see how
reviewers from different communities (Databases and Information
Retrieval) have such different tastes :-)
(5/2012) Our paper titled "Executing SQL over encrypted data in the
database-service-provider model" received
ACM
SIGMOD 2012 Test-of-Time Award. The
paper, published 10 years ago, envisioned the "Database as a service"
model.
(4/2012) This quarter I am again
teaching CS122B
titled "Projects in Databases and Web Applications". I am also
organizing the CS Seminar Series.
(3/2012) I gave a talk at University of Toronto
titled Improving
Search for Emerging Applications.
(3/2012) We recently released a paper
titled Analysis
of Instant Search Query Logs. It is based on our study to analyze the log of
our instant, fuzzy search system
called PSearch. We compared
it with a traditional search
system and showed the benefits of the new search paradigm. Some
user behavior patterns are very interesting.
(2/2012) I am glad to receive the 2012 ICS Dean's Award for Graduate
Student Mentoring.
(1/2012) We released an improved version of the source code of
the Hobbes project.
(12/2011) Our paper titled Hobbes: optimized gram-based methods for
efficient read alignment was published by Nucleic Acids Research.
(9/2011) This quarter I am
teaching CS122B
titled "Projects in Databases and Web Applications". I am also
organizing the CS Seminar Series.
(9/2011) Check OmniPlaces.com, a location-based
search engine to demonstrate the technology of Bimaple. It also has an
iPhone App.
(9/2011) Check a cool system built by our students, Sattam Alsubaiee
and Zachary Heilbron, to support spatial aggregation on Twitter data
using ASTERIX.
(8/26/2011) Our PhD student, Rares Vernica, co-advised by Prof. Mike
Carey, has successfully graduated and will join HP Labs. Here's
a picture
of our celebration. We will surely miss Rares!
(8/2011) Our MS student, Nagesh Honnalli, has successfully graduated and will join Amazon. Here's
a picture
of our celebration.
(7/2011) Check my blog on instant search.
(7/8/2011) We are glad to release the first
software to
support instant fuzzy search on large data sets.
(6/24/2011) Check the video clip on
the Bimaple homepage to show
location-based instant, fuzzy search on iPhone and
a live demo on more than
17 million records.
(6/17/2011) I advised a group of students who participated in the
Microsoft
Speller Challenge and won third place. Congratulations to the
team! Here is
our qSpeller project page for the Microsoft Speller Challenge.
(5/18/2011) Bimaple released a prototype to do location-based
instant, fuzzy search. To the best of our knowledge, it is the first system
that can do this type of search in a unified framework.
(5/2011) We (my Tsinghua colleagues and I) released
our CHIME demo to support
error-tolerant Chinese input. It's based on our coming IJCAI 2011
paper.
(4/22/2011) I gave an invited talk titled "The Flamingo Software
Package on Approximate String Queries" at
the DQIS
2011 workshop in Hong Kong. Here is the Powerpoint file.
(4/2011) Our paper titled "ASTERIX: Towards a Scalable, Semistructured Data
Platform for Evolving-World Models" by the ASTERIX project has been
accepted for publication in Distributed and Parallel Databases.
(4/2011) Our paper titled "An Efficient Error-Tolerant Chinese Pinyin Input Method"
with Tsinghua collaborators (Yabin Zheng and Maosong Sun) has been
accepted for publication in IJCAI 2011. It's my first paper in this
conference :-)
(4/2011) Our paper titled "Location-Based Instant Search" with my
graduated student Shengyue Ji has been accepted by
the SSDBM conference.
(4/2011) I am glad to launch
the Hobbes project on genome
sequence mapping.
(3/26/2011) This quarter I am again
teaching CS122B:
Projects in Databases and Web Applications.
(2/2011) My PhD student, Shengyue Ji, has just graduated and joined
the "Don't be evil" company.
(2/2011) Check a new system prototype Bimaple built to support instant, error-tolerant
search on Stack Overflow messages.
(1/2/2011) This quarter I am
teaching CS122B:
Projects in Databases and Web Applications.
(1/2/2011) The company I am starting, Bimaple, is hiring:
http://www.bimaple.com/jobs.html.
(12/5/2010) On the weekend of Dec. 4-5, I attended
the Random
Hacks of Kindness (RhoK) in Chicago. Together with three other people on
a team and my UCI students, Manik Sikka, Vijay Rajakumar, and Inci
Cetindil, we did a project of
supporting full-text
search on
the Person
Finder project on
the Google App Engine
platform. Our project won the third-best-project prize.
(11/2010) Check my new "photo" above. Thanks
to Heri Ramampiaro for taking the
nice picture :-)
(10/2010) Our paper titled "Answering Approximate String Queries on
Large Data Sets Using External Memory" with Alexander Behm and Michael
Carey has been accepted by ICDE 2011.
(10/23/2010) We are glad to release
the Flamingo Package
Version 4.0.
(10/2010) My student, Shengyue Ji, received a Yahoo! Best Dissertation
Student Award.
(9/2010) My student, Alex Behm, received an ARCS scholar award.
(9/2010) Together with Professor Xiaohui Xie, I am receiving
an NIH
grant to support our research on
the iPubMed system.
(9/2010) I am teaching CS222: Principles of Data Management this quarter.
(8/2010) On August 14, 2010, I gave a talk about scalable interactive
search at
the NFIC
conference. Here are my talk slides.
(6/2010) On June 29, I gave a talk about set-similarity joins using
Hadoop at the Yahoo
Hadoop Summit. Here is my talk file.
(5/2010) Together with Prof. Xiaohui Xie, we received an Intel grant to study compression of personal human genome data. See the ICS news for details. This is a collaboration with our colleagues, Bin Wang and Xiaochun Yang, at the
Northeastern University in China.
(4/2010) Ray wins a Yahoo! Key Scientific Challenge award:
Here is
the Yahoo announcement and
ICS news.
(4/2010) DASFAA excellent demo: Our demo won a DASFAA excellent
demo award.
(3/2010) Source-code/Demo Releases: My research team released
the flamingo package
version
3.0, source
code of fuzzy joins using MapReduce, and
demos
of supporting fuzzy keyword search on spatial data (such as maps).
(3/2010) Teaching: CS223 - Transaction Processing and Distributed Data Management
(3/2010) New NSF Grant: We are glad to receive an NSF
award 1030002
to support research on powerful keyword search with efficient indexing
structures and algorithms in a cloud-computing environment, especially in
the domain of
family reunification in disasters
such as the Haiti Earthquake.
(2/28/2010) Chile Earthquake Family Reunification: My team is
working on family reunification in the Chile Earthquake. Here is
the project home page.
(2/28/2010) ICDE 2010: Busy with local arrangements at ICDE 2010 in Long Beach.
(2/2010) Media article on our Haiti Project: On Feb. 8,
the UCI homepage published
an article
to report our Haiti Family
Reunification Project
(2/2010) SIGMOD 2010 paper: Our paper titled "Efficient
Parallel Set-Similarity Joins Using MapReduce" with Rares Vernica and Mike
Carey has been accepted by ACM SIGMOD 2010. The paper studies how to do
set-similarity joins (such as record linkage) on large amounts of data
using MapReduce.
(1/2010) Haiti Earthquake Family Reunification: My team is
working on getting data about missing people in the Haiti Earthquake and
doing powerful search on it. Here is
the project home page.
(1/2010) Teaching: This quarter I am again teaching CS122B: Projects in Database Management.
(11/2009) iPubMed: Check out our new iPubMed system
co-developed by my team and Tsinghua University to support type-ahead, fuzzy search on more than 18 million
MEDLINE records.
(9/2009) Life after Sabbatical: I am teaching two courses this quarter:
CS122B: Projects in Database Management, and
CS295: Database Management
and Information Retrieval.
(9/2009) VLDB 2009 Tutorial: Marios
Hadjieleftheriou and I gave a tutorial at VLDB 2009 on approximate string matching.
Here are the slides: [Part I],
[Part II].
Here are the slides of our ICDE09 tutorial:
[Part I],
[Part II].
(9/2009) NSF Funding for ASTERIX: The multi-UC-campus project ASTERIX led by Prof. Mike Carey and me has been funded at $2.7M for three years from the
NSF Data Intensive Computing program. The project, based at UCI, also includes UCSD and UCR participants. UCI's share is $1.8M.
(6/2009) Summer: I will be visiting colleagues at Tsinghua University, China in the summer. I will also work with several colleagues in China during the visit.
(5/2009) PSearch News:
Read this NACS news article about our PSearch prototype.
(5/2009) Our research needs a student: We are looking for an undergraduate
or MS student for a research project. The details
are here.
(4/2009) Students' award: I am proud that two of our ISG students,
Shengyue Ji and Mingya Gao, together with Wen Pu from UIUC, have been selected as one of the
five finalist teams for the SIGMOD 2009 programming
contest (Main Memory Transactional Index).
(4/2009) Dean's Award for Mid-Career Research: I am glad to receive the
ICS Dean's Award for Mid-Career Research.
(3/27/2009) Pictures of the home where I grew up: I made a
trip to my hometown of Jinan, Shandong, China. I took several pictures
of the home where I grew up as a child.
(3/2009) Startup: I have officially started a company BiMaple to support a
novel, powerful way to do search.
(3/2009) Launching new project: I am glad to officially launch TASTIER: a joint research project with Tsinghua University on
efficient auto-complete and type-ahead search on large data sets.
(3/2009) New SIGMOD 2009 paper: Our paper titled "Type-Ahead Search on Relational Data: a TASTIER Approach" by Guoliang Li, Shengyue Ji, Chen Li, and Jianhua Feng has been accepted by the SIGMOD 2009 conference.
(2/2009) New NSF award: We are glad to receive an NSF award IIS-0844574 from the NSF CluE program to support our research on large-scale data cleaning using MapReduce/Hadoop environments. In addition to receiving the NSF support, we will also use software and services on a
Google-IBM cluster
to explore innovative research ideas in data-intensive computing.
(1/2009) New WWW2009 paper: Our paper titled "Efficient Interactive Fuzzy Keyword Search" by Shengyue Ji, Guoliang Li, Chen Li, and Jianhua Feng has been accepted by the WWW 2009 conference.
(11/2008) Launch of our new ISG group home page: Check out
this new page of our Information
Systems Group (ISG)!
(11/2008) First paper on bioinformatics: My
first paper on bioinformatics titled "Human genomes as email
attachments" has been published in the
journal Bioinformatics.
We used novel techniques to compress a human genome from 3.2GB to 4.1MB.
From the date we submitted the paper (Oct. 7, 2008) to the date it was
published online (Nov. 7, 2008), it took just one month! The PDF is
available
here.
It was once the No. 1 most-frequently read article in the Journal of
Bioinformatics in January and February of 2009 according to the
following link
(as of March 2009).
(10/2008) Flamingo Release 2.0: we are glad to
release version
2.0 of the package to support fuzzy string search.
Version 2.0.1
(released on Nov. 7, 2008) fixed
compatibility issues for GCC 4.3.2.
(9/2008) New funding award from China: Together
with Prof. Xiaochun Yang from
Northeastern University of China, I received a funding award from the
"Research
Funds for Oversea Scholars" program of the
National Natural
Science Foundation of China. It will support our research on fuzzy
search on text documents.
(9/2008) Sabbatical: I am on sabbatical this year. I will be
mainly at UCI.
(9/2008) New PhD students: Two new PhD students, Minh Doan
and Sattam Mubark Alsubaiee, have joined our research team.
(9/2008) New ICDE2009 Publications: We have two full research
papers accepted by ICDE 2009:
"Space-Constrained Gram-Based Indexing for Efficient Approximate String
Search," by Alexander Behm, Shengyue Ji, Chen Li, and Jiaheng Lu;
"Best-Effort Top-k Query Processing Under Budgetary Constraints," by
Michal Shmueli-Scheuer, Chen Li, Yosi Mass, Haggai Roitman, Ralf Schenkel,
and Gerhard Weikum. In addition, I will be presenting a tutorial titled
"Efficient Approximate Search on String Collections" with Marios
Hadjieleftheriou (from AT&T Labs--Research).
(8/2008) Mike Carey joined us! We are extremely happy that
Prof. Mike Carey
has joined
our department.
(7/3/2008) Launching Search@ICS: I am glad that our research
prototype has been launched on the ICS
Homepage that can support interactive, fuzzy search for ICS people and
general pages at ICS.UCI.EDU.
(4/1/2008) Launching PSearch: I am glad to release
the PSearch Prototype to support
interactive, fuzzy search for UCI Directory.
(3/31/2008) This quarter I am teaching CS122B
and CS224.
(2/22/2008) New SIGMOD08 paper: The conference has accepted
our paper titled "Cost-Based Variable-Length-Gram Selection for String Collections to
Support Approximate Queries Efficiently", a joint work with Bin Wang and Xiaochun
Yang when they visited our place last fall. The paper solves several open,
important problems not addressed in our VLDB07 VGRAM paper.
(2/1/2008) New Visitor: I am glad that Guoliang Li from
Tsinghua University is visiting my research team for about four months.
(12/12/2007) Today I attended a local computer industry forum
about the computer cluster workforce in Orange County. There is
an excellent survey
on the needs of computer cluster workforce in the county. One interesting
finding is that the county is facing the challenge of not being able
to find enough workers in the IT industry. The survey also gives
us some thoughts on how we design our education curriculum to meet the
need of the industry.
(12/2007) I am looking for a motivated BS/MS student for an
independent research project. Requirements: strong Java
programming skills. Please contact me if you are interested.
(10/2007) New paper on approximate string matching: Our
recent paper titled "Efficient Merging and Filtering Algorithms for
Approximate String Searches" by Chen Li, Jiaheng Lu, and Yiming Lu
will appear in ICDE 2008. We developed new algorithms and indexing
structures that can significantly improve the performance of
approximate string search.
(10/2007) New NSF Grant: We received an NSF grant of $95K
for our proposal titled "SGER: Answering Approximate String Queries
Using Variable-Length Grams."
(8/2007) Visitors: Bin Wang and Xiaochun Yang are visiting our
team again this summer. We will continue working on topics related to
approximate query answering.
(8/2007) New PhD student: I am glad that Alex Behm has joined
our research team as a new PhD student.
(6/2007) Summer: My students, Ray and Yiming, will be doing
summer internships at Microsoft Research and IBM T.J. Watson,
respectively. I will be traveling early summer in China, attending
conferences and visiting schools and companies. After that, I will be
working with my students, postdoc, and visitors at UCI. There are
several
very exciting ideas I would like to pursue.
(6/2007) Tenured.
(6/2007) VGRAM for VLDB07: Our paper titled "VGRAM:
Improving
Performance of Approximate Queries on String Collections
Using Variable-Length Grams" by
Chen Li, Bin Wang, and Xiaochun Yang will appear in VLDB 2007.
I am glad that the reviewers liked the work as much as we do.
(4/17/2007) Flamingo 1.0 Release: I am glad to release our
Flamingo Package
1.0 on approximate string matching.
(4/17/2007) Release of Web-object-history data: I am glad to release our
data set of the
history of data objects collected from 6 web sites in 1.5 years.
(4/2007) SIGMOD07 Undergraduate Scholarship Program: I am
chairing this program. Click here for more
information.
(4/2007) Teaching: This quarter I am teaching CS223 (formerly ICS214B) -
Transaction Processing and Distributed Data Management.
(1/2007) Teaching: This quarter I am teaching CS122B (formerly ICS185), Projects in Database Management.
(12/2006) Research Funds: I received an ICS
Ted & Janice Smith Faculty Seed Fund and an ICS CORCLR research/travel fund.
(12/2006) NSF Proposals: My team and I submitted two proposals
to the NSF
IIS program. Both proposals are based on our observations on several
critical problems the solutions of which are greatly needed by many real
applications.
(9/2006) New Project on Family Reunification: Ray and I have
started working on a new project called Family Reunification. It's a
data-integration project using real data from many Web sources. It's part
of the RESCUE project. More information will come soon.
(9/2006) Release of SEPIA 1.0: Ray has released SEPIA
1.0 on selectivity estimation of fuzzy string predicates based on
our VLDB 2005 paper.
(9/2006) New Junior Specialist: We have a new junior
specialist, Jiaheng Lu, who is joining our research team. He's
expecting his PhD from the National University of Singapore. He will
be working on projects related to data integration.
(9/2006) Google Research Award: I received a Google Research
Award in the amount of $37,500 renewable for a second year. It will be
used to support my research on data cleaning, especially on approximate
string searching. I am very thankful for their support, especially since
this is the largest support I have received from industry.
(7/2006) Work on Data Exchange: Recently I finished a
technical report with Foto Afrati and Vassia Pavlaki (at NTUA, Greece)
titled "Data Exchange with Arithmetic Comparisons." We have been
working on it for almost a year: all of us went to Stanford
for one week, and Vassia visited UCI twice. It took us a lot of time
to think about all the subtle issues that are not covered in the excellent
paper on data exchange by Fagin et al. I am glad that finally we
completed the work, and I really like it.
(6/2006) Summer: My student, Ray, is doing a summer internship at
Yahoo!. My other students are working with me during the summer. I
will have two visitors (Xiaochun Yang and Bin Wang).
I will visit a few places (IBM, SRI,
Yahoo, Google, possibly Toronto, and VLDB in Korea). Well, these will
keep me busy enough, not to mention I have two sons to play with :-)
(5/2006) New PhD Student: I am glad that a new student, Yiming Lu,
is joining our PhD program soon. He graduated from Shanghai Jiaotong
University with a BS and an MS, and has been working on data quality
at Microsoft
Research Asia.
(5/2006) Work on Query Relaxation: Our paper titled Relaxing Join and
Selection Queries (joint work with Nick Koudas, Anthony Tung, and
my student, Rares Vernica) will appear in VLDB 2006, Seoul, Korea. It
is about how to relax empty-answer SQL queries in RDBMS in order to
compute answers for users with a minimal relaxation. We use skyline
as our relaxation framework, in which we need to consider join
conditions as well. The work extends our previous work on supporting
approximate query answering in applications such as data cleaning.
See our two VLDB'2005 papers on similar topics.
(5/2006) CleanDB Workshop: I am currently organizing the
CleanDB Workshop with Dongwon Lee. It will be
colocated with VLDB2006 in Seoul,
Korea.
(5/2006) New Release of StringMap: I spent some days
cleaning the StringMap code that supports approximate string searches
and joins. The new release is available here.
(4/2006) $$ from M$R: In April 2006, I received an unrestricted
gift fund from Microsoft Research. I want to thank them for their
generous support. It's very encouraging, and I wish to receive more
support from the industry in the future.