Chen Li's Publications

Refereed Conference Full Papers

1.     AsterixDB: A Scalable, Open Source BDMS, Sattam Alsubaiee, Yasser Altowim, Hotham Altwaijry, Alexander Behm, Vinayak R. Borkar, Yingyi Bu, Michael J. Carey, Inci Cetindil, Madhusudan Cheelangi, Khurram Faraaz, Eugenia Gabrielova, Raman Grover, Zachary Heilbron, Young-Seok Kim, Chen Li, Guangqiang Li, Ji Mahn Ok, Nicola Onose, Pouria Pirzadeh, Vassilis J. Tsotras, Rares Vernica, Jian Wen, Till Westmann. PVLDB 7(14): 1905-1916 (2014)

2.     Storage Management in AsterixDB, Sattam Alsubaiee, Alexander Behm, Vinayak R. Borkar, Zachary Heilbron, Young-Seok Kim, Michael J. Carey, Markus Dreseler, Chen Li, PVLDB 7(10): 841-852 (2014)

3.     Efficient instant-fuzzy search with proximity ranking, Inci Cetindil, Jamshid Esmaelnezhad, Taewoo Kim, Chen Li. ICDE 2014: 328-339

4.     Efficient direct search on compressed genomic data, Xiaochun Yang, Bin Wang, Chen Li, Jiaying Wang, Xiaohui Xie, ICDE 2013: 961-972

5.     Improving regular-expression matching on strings using negative factors, Xiaochun Yang, Bin Wang, Tao Qiu, Yaoshu Wang, Chen Li, SIGMOD Conference 2013: 361-372

6.     String similarity measures and joins with synonyms, Jiaheng Lu, Chunbin Lin, Wei Wang, Chen Li, Haiyong Wang, SIGMOD Conference 2013: 373-384

7.     Supporting Efficient Top-k Queries in Type-Ahead Search, Guoliang Li, Jiannan Wang, Chen Li, Jianhua Feng, SIGIR 2012. [PDF], [PPTX], [Demo]

  1. Inside “Big Data Management”: Ogres, Onions, or Parfaits? Vinayak Borkar, Michael J. Carey, and Chen Li, EDBT 2012. [PDF]
  2. Location-Based Instant Search, Shengyue Ji, Chen Li, SSDBM 2011: 17-36. [PDF]
  3. CHIME: An Efficient Error-Tolerant Chinese Pinyin Input Method, Yabin Zheng, Chen Li, Maosong Sun, IJCAI 2011, 2551-2556. [PDF], [Demo]
  4. Answering Approximate String Queries on Large Data Sets Using External Memory, Alexander Behm, Chen Li, and Michael Carey, ICDE 2011. [PDF] [Source Code]
  5. Supporting Location-Based Approximate-Keyword Queries, Sattam Alsubaiee, Alexander Behm, and Chen Li, ACM GIS 2010. [PDF] [PPT] [Source Code and Demos]
  6. Hybrid Indexing and Seamless Ranking of Spatial and Textual Features of Web Documents, Ali Khodaei, Cyrus Shahabi, Chen Li, DEXA 2010. [PDF]
  7. Efficient Parallel Set-Similarity Joins Using MapReduce. Rares Vernica, Michael J. Carey, Chen Li, SIGMOD 2010. [PDF], [Source Code]
  8. Type-Ahead Search on Relational Data: a TASTIER Approach, Guoliang Li, Shengyue Ji, Chen Li, and Jianhua Feng, SIGMOD 2009. [PDF], [PPTX].
  9. Efficient Interactive Fuzzy Keyword Search, Shengyue Ji, Guoliang Li, Chen Li, and Jianhua Feng, WWW 2009. [PDF], [PPTX]
  10. Best-Effort Top-k Query Processing Under Budgetary Constraints, Michal Shmueli-Scheuer, Chen Li, Yosi Mass, Haggai Roitman, Ralf Schenkel, and Gerhard Weikum, ICDE 2009. [PDF], [PPT]
  11. Space-Constrained Gram-Based Indexing for Efficient Approximate String Search, Alexander Behm, Shengyue Ji, Chen Li, and Jiaheng Lu, ICDE 2009. [PDF], [PPTX]
  12. Cost-Based Variable-Length-Gram Selection for String Collections to Support Approximate Queries Efficiently, Xiaochun Yang, Bin Wang, and Chen Li, ACM SIGMOD 2008. [PDF], [PPT]
  13. Efficient Merging and Filtering Algorithms for Approximate String Searches, Chen Li, Jiaheng Lu, and Yiming Lu. ICDE 2008. [PDF], [PPT], [Source Code].
  14. Data Exchange with Arithmetic Comparisons, Foto Afrati, Chen Li, and Vassia Pavlaki. EDBT 2008. [PDF]
  15. VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams, Chen Li, Bin Wang, and Xiaochun Yang. VLDB 2007. [PDF], [PPT]
  16. Processing Spatial-Keyword (SK) Queries in Geographic Information Retrieval (GIR) Systems, Ramaswamy Hariharan, Bijit Hore, Chen Li, Sharad Mehrotra, SSDBM 2007. [PDF]
  17. Protecting Individual Information Against Inference Attacks in Data Publishing, Chen Li, Houtan Shirani-Mehr, and Xiaochun Yang. DASFAA 2007. [PDF]
  18. Supporting Approximate Similarity Queries with Quality Guarantees in P2P Systems, Qi Zhong, Iosif Lazaridis, Mayur Deshpande, Chen Li, Sharad Mehrotra, Hal Stern, COMAD 2006, December 14-16, 2006, Delhi, India. [PDF]
  19. Relaxing Join and Selection Queries. Nick Koudas, Chen Li, Anthony Tung, and Rares Vernica. VLDB 2006, Seoul, Korea, 2006.  (13.2% accepted) [PDF], [PPT], [Source Code]
  20. Selectivity Estimation for Fuzzy String Predicates in Large Data Sets, Liang Jin and Chen Li. VLDB 2005, Trondheim, Norway, August 30 - September 2, 2005. (16% accepted) [PDF], [PPT], [Source Code].
  21. Indexing Mixed Types for Approximate Retrieval, Liang Jin, Nick Koudas, Chen Li, Anthony K.H. Tung. VLDB 2005, Trondheim, Norway, August 30 - September 2, 2005. (16% accepted) [PDF], [PPT], [Source Code].
  22. Secure XML Publishing without Information Leakage in the Presence of Data Inference. Xiaochun Yang and Chen Li. VLDB, Toronto, Canada, August 29 - September 3, 2004. [PDF], [PPT]. (16% accepted)
  23. NNH: Improving Performance of Nearest-Neighbor Searches Using Histograms. Liang Jin, Nick Koudas, Chen Li. EDBT, Crete, Greece, March 2004. (14% accepted) [PDF], [Full version], [PPT]
  24. On Containment of Conjunctive Queries with Arithmetic Comparisons. Foto Afrati, Chen Li, Prasenjit Mitra. EDBT, Crete, Greece, March 2004. (14% accepted) [PDF].
  25. Materializing Views with Minimal Size to Answer Queries. Rada Chirkova and Chen Li. ACM PODS, June 2003, San Diego, CA. (20% accepted). [PDF], [PPT]
  26. Efficient Record Linkage in Large Data Sets, Liang Jin, Chen Li, and Sharad Mehrotra, in the 8th International Conference on Database Systems for Advanced Applications (DASFAA 2003) 26 - 28 March, 2003, Kyoto, Japan. (33% accepted) [PS], [PDF], [PPT], [Source Code]. Received DASFAA 2013 10-year Best Paper Award.
  27. Executing SQL over Encrypted Data in the Database-Service-Provider Model. Hakan Hacigumus, Bala Iyer, Chen Li, and Sharad Mehrotra. In ACM SIGMOD, June 3-6, 2002 Madison, Wisconsin. (18% accepted). Received SIGMOD 2012 10-year Test-of-Time Award. [PDF]
  28. Answering Queries Using Views with Arithmetic Comparisons. Foto Afrati, Chen Li, and Prasenjit Mitra. In ACM Symposium on Principles of Database Systems (PODS), June 3-6, 2002 Madison, Wisconsin. (22% accepted)
  29. Generating Efficient Plans for Queries Using Views. Foto Afrati, Chen Li, and Jeff Ullman. In the Proc. of the 30th ACM SIGMOD Conference, Santa Barbara, CA, May, 2001. (15% accepted) [PS] [PDF] [PPT]
  30. Minimizing View Sets without Losing Query-Answering Power. Chen Li, Mayank Bawa, and Jeff Ullman. In the 8th International Conference on Database Theory (ICDT), London, UK, January, 2001. [PS] [PDF], [PPT]. Full version: [PS] [PDF]. (35% accepted)
  31. On Answering Queries in the Presence of Limited Access Patterns. Chen Li and Edward Chang. In the 8th International Conference on Database Theory (ICDT), London, UK, January, 2001. [PS] [PDF] [PPT]. (35% accepted)
  32. Query Planning with Limited Source Capabilities. Chen Li and Edward Chang. International Conference on Data Engineering (ICDE), pages 401-412, San Diego, CA, February, 2000. (14% accepted) [PS] [PDF] [PPT]. Full version: [PS] [PDF]
  33. Computing Capabilities of Mediators. Ramana Yerneni, Chen Li, Hector Garcia-Molina, Jeffrey Ullman. SIGMOD'99, Philadelphia, PA, May 1999. (20% accepted) [PS] [PDF]. Full version: [PS] [PDF]
  34. Optimizing Large Join Queries in Mediation Systems. Ramana Yerneni, Chen Li, Jeffrey Ullman, Hector Garcia-Molina. International Conference on Database Theory (ICDT), Jerusalem, Israel, January, 1999. (29% accepted) [PS] [PDF]. Full version: [PS] [PDF]
  35. Searching Near-Replicas of Images via Clustering. Edward Chang, Chen Li, James Wang, Peter Mork, and Gio Wiederhold. Proc. of SPIE Symposium of Voice, Video, and Data Communications, Multimedia Storage and Archiving Systems VI, pages 281-292, Boston, MA, September, 1999. [PS] [PDF]
  36. RIME: A Replicated Image Detector for the World-Wide Web. Edward Chang, James Ze Wang, Chen Li, and Gio Wiederhold. Proceedings of SPIE Symposium of Voice, Video, and Data Communications, pages 58-67, Boston, MA, November 1998. [PS] [PDF]
  37. 2D BubbleUp: Managing Parallel Disks for Media Servers. Edward Chang, Hector Garcia-Molina, and Chen Li. The 5th International Conference of Foundations of Data Organization (FODO), pages 221-230, Kobe, Japan, 1998. [PS] [PDF]
  38. Performance Analysis of the Communication Mechanism for POE Workstation Cluster. Weiqiang Zhuang, Chen Li, Meiming Shen. Microcomputer & Micro-system, Jan, 1995

Refereed Journal Articles

  1. Hobbes: optimized gram-based methods for efficient read alignment, Athena Ahmadi, Alexander Behm, Nagesh Honnalli, Chen Li, Lingjie Weng, and Xiaohui Xie, Nucleic Acids Research 2011; doi: 10.1093/nar/gkr1246. [PDF]
  2. SKIF-P: a point-based indexing and ranking of web documents for spatial-keyword search, Ali Khodaei, Cyrus Shahabi, and Chen Li, Geoinformatica, Springer, 2011. [PDF]
  3. Supporting BioMedical Information Retrieval: The BioTracer Approach, Heri Ramampiaro and Chen Li, In Transactions on Large-Scale Data- and Knowledge-Centered Systems (TLDKS), 2011, No.4. Vol. 6990, Springer. pp. 73–94. [PDF]
  4. ASTERIX: towards a scalable, semistructured data platform for evolving-world models. Alexander Behm, Vinayak R. Borkar, Michael J. Carey, Raman Grover, Chen Li, Nicola Onose, Rares Vernica, Alin Deutsch, Yannis Papakonstantinou, Vassilis J. Tsotras, Distributed and Parallel Databases,  2011, 29(3), 185-216. [PDF]
  5. Efficient fuzzy full-text type-ahead search, Guoliang Li, Shengyue Ji, Chen Li, Jianhua Feng. VLDB J. 20(4): 617-640 (2011). [PDF]
  6. Interactive and Fuzzy Search: A Dynamic Way to Explore MEDLINE, Jiannan Wang, Inci Cetindil, Shengyue Ji, Chen Li, Xiaohui Xie, Guoliang Li, Jianhua Feng, Journal of Bioinformatics, 2010. [PDF]
  7. Rewriting Queries using Views, Chen Li: Encyclopedia of Database Systems 2009: 2438-2441. [PDF]
  8. SAIL: Structure-aware indexing for effective and progressive top-k keyword search over XML documents, Guoliang Li, Chen Li, Jianhua Feng, Lizhu Zhou: Inf. Sci. 179(21): 3745-3762 (2009). [PDF]
  9. Human genomes as email attachments. Scott Christley, Yiming Lu, Chen Li, and Xiaohui Xie, Bioinformatics 25: 274-275 (2009). [PDF]. [Source Code]. It was the most downloaded article on the Web site of the Journal of Bioinformatics for two months.
  10. SEPIA: Estimating Selectivities of Approximate String Predicates in Large Databases. Liang Jin, Chen Li, and Rares Vernica.  VLDB Journal, Volume 17, Number 5, pages 1213-1229, August 2008. [PDF]
  11. Using Views to Generate Efficient Evaluation Plans for Queries. Foto Afrati, Chen Li, and Jeff Ullman, Journal of Computer and System Sciences, Volume 73, Issue 5, pages 703-724, August 2007. [PDF]
  12. Rewriting Queries Using Views in the Presence of Arithmetic Comparisons, Foto Afrati, Chen Li, and Prasenjit Mitra, Theoretical Computer Science, Volume 368, Numbers 1-2, pages 88-123, 2006. [PDF]
  13. Supporting Efficient Record Linkage for Large Data Sets Using Mapping Techniques, Chen Li, Liang Jin, and Sharad Mehrotra, World Wide Web Journal, Volume 9, Number 4, pages 557-584, December 2006. [PDF]
  14. Achieving Communication Efficiency through Push-Pull Partitioning of Semantic Spaces to Disseminate Dynamic Information, Amitabha Bagchi, Amitabh Chaudhary, Michael T. Goodrich, Chen Li, and Michal Shmueli-Scheuer. IEEE Transactions on Knowledge and Data Engineering (TKDE), October 2006 (Vol. 18, No. 10). [PDF]
  15. Answering Queries Using Materialized Views with Minimum Size. Rada Chirkova, Chen Li, and Jia Li. VLDB Journal (2006), Volume 15, Number 3, 191-210. [PDF]
  16. Recent Progress on Selected Topics on Database Research -- A Report from Nine Young Chinese Researchers Working in the United States. Zhiyuan Chen, Chen Li, Jian Pei, Yufei Tao, Haixun Wang, Wei Wang, Jiong Yang, Jun Yang, and Donghui Zhang. The Journal of Computer Science and Technology. Vol. 18, No. 5, Pages 538 - 552, September 2003. [PDF]
  17. Computing Complete Answers to Queries in the Presence of Limited Access Patterns. Chen Li. The VLDB Journal (2003) 12: 211-227 [PS] [PDF]
  18. Answering Queries with Useful Bindings. Chen Li and Edward Chang. ACM Transactions on Database Systems (TODS), Volume 26, Issue 3 (September 2001). [PS] [PDF]
  19. Clustering for Approximate Similarity Search in High-Dimensional Spaces. Chen Li, Edward Chang, Hector Garcia-Molina, and Gio Wiederhold. IEEE Transactions on Knowledge and Data Engineering, Volume 14, Number 4, pp. 792-808, July/August 2002. [PS] [PDF]


Refereed Workshop, Conference Demo Papers, Tutorials, and Other Publications

1.     ASTERIX: An Open Source System for "Big Data" Management and Analysis, Sattam Alsubaiee, Yasser Altowim, Hotham Altwaijry, Alexander Behm, Vinayak R. Borkar, Yingyi Bu, Michael J. Carey, Raman Grover, Zachary Heilbron, Young-Seok Kim, Chen Li, Nicola Onose, Pouria Pirzadeh, Rares Vernica, Jian Wen. PVLDB 2012 (demo).

2.     Big data platforms: what's next? Vinayak R. Borkar, Michael J. Carey, Chen Li. ACM Crossroads 19(1): 44-49, 2012.

3.     qSpell: Spelling Correction of Web Search Queries using Ranking Models and Iterative Correction. Yasser Ganjisaffar, Andrea Zilio, Sara Javanmardi, Inci Cetindil, Manik Sikka, Sandeep Katumalla, Narges Khatib, Chen Li, Cristina Lopes, Spelling Alteration for Web Search Workshop, July 2011. [PDF], [Dataset] (The authors won third place in Microsoft's Speller Challenge in 2011.)

  1. The Flamingo Software Package on Approximate String Queries. Chen Li, DASFAA Workshops 2011, 477. [PDF], [Source Code]
  2. Seaform: Search-As-You-Type in Forms, Hao Wu, Guoliang Li, Chen Li, Lizhu Zhou, VLDB 2010 (Demo). [PDF]
  3. Search-As-You-Type: Opportunities and Challenges, Chen Li, Guoliang Li, IEEE Data Eng. Bull. 33(1): 37-45 (2010). [PDF]
  4. Fuzzy Keyword Search on Spatial Data, Sattam Alsubaiee, Chen Li: DASFAA, Excellent Demo Award, 2010: 464-467. [PDF], [Demos]
  5. Efficient top-k algorithms for fuzzy search in string collections, Rares Vernica, Chen Li, KEYS 2009: 9-14, [PDF], [Talk Slides]
  6. Efficient Approximate Search on String Collections (Tutorial), Marios Hadjieleftheriou and Chen Li, VLDB 2009. [PDF], [Part I], [Part II].
  7. Efficient Approximate Search on String Collections (Tutorial), Marios Hadjieleftheriou, Chen Li, ICDE 2009, [PPT-Part1], [PPT-part2].
  8. Quality-Aware Retrieval of Data Objects from Autonomous Sources for Web-Based Repositories, Houtan Shirani-Mehr, Chen Li, Gang Liang, Michal Shmueli-Scheuer, ICDE 2008 (poster). [PDF] [Technical Report]
  9. Communication-Efficient Query Answering with Quality Guarantees in Client-Server Applications. Michal Shmueli-Scheuer, Amitabh Chaudhary, Avigdor Gal, Chen Li. WebDB 2007. [PDF]
  10. Quality-Driven Approximate Methods for GIS Data Integration. Ramaswamy Hariharan, Michal Shmueli-Scheuer, Chen Li, and Sharad Mehrotra. ACM GIS 2005, November 4-5th, 2005 Bremen, Germany. [PDF]
  11. Answering Aggregation Queries on Hierarchical Web Sites Using Adaptive Sampling. Foto Afrati, Paraskevas Lekeas, and Chen Li. Technical Report, UCI ICS, August 2005. A short version appears in CIKM'2005, 31st October - 5th November, 2005 Bremen, Germany.
  12. XGuard: A System for Publishing XML Documents without Information Leakage in the Presence of Data Inference. Xiaochun Yang, Chen Li, Ge Yu, and Lei Shi. Proc. of ICDE'2005, demo track, Tokyo, Japan, March 2005.
  13. RACCOON: A Peer-Based System for Data Integration and Sharing. Chen Li, Jia Li, Qi Zhong. Proc. of ICDE'2004, demo track. [PDF]
  14. Schema-Guided Wrapper Maintenance for Web-Data Extraction. Xiaofeng Meng, Dongdong Hu, Chen Li. To appear in the Fifth International Workshop on Web Information and Data Management (WIDM'03), New Orleans, Louisiana. [PDF] [PPT].
  15. A Supervised Visual Wrapper Generator for Web-Data Extraction. Xiaofeng Meng, Haiyan Wang, Dongdong Hu, Chen Li. COMPSAC 2003: 657-662. [PDF]
  16. Using Constraints to Describe Source Contents in Data Integration Systems. Chen Li. IEEE Intelligent Systems 18(5): 49-53 (2003). [PDF]
  17. Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems. Chen Li. IJCAI 2003 workshop on Information Integration on the Web, August 2003, Acapulco, Mexico. [PDF], [PPT]
  18. Towards Perception-Based Image Retrieval. Edward Chang, Beitao Li, and Chen Li. Proceedings of IEEE Workshop on Content-based Access of Image and Video Libraries, p. 401-412, South Carolina, June, 2000. [PS] [PDF]
  19. Managing Parallel Disks for Continuous Media Data. Edward Chang, Chen Li, and Hector Garcia-Molina. A Book Chapter in Information Organization & Databases, p.107-120, Kluwer Publisher, 2000. [PS] [PDF]
  20. Answering Queries with Database Restrictions (Research Summary). Chen Li. Symposium on Abstraction, Reformulation and Approximation (SARA), pages 328 - 329, July, 2000, Horseshoe Bay (Lake LBJ), Texas. [PS] [PDF]
  21. I wrote a report of the Workshop on Data Mining in the Internet Age, which was held May 1 - 2, 2000, IBM Almaden Center, San Jose, California. [PS] [PDF]
  22. Capability Based Mediation in TSIMMIS. Chen Li, Ramana Yerneni, Vasilis Vassalos, Hector Garcia-Molina, Yannis Papakonstantinou, Jeffrey Ullman, Murty Valiveti. Proc. of ACM SIGMOD'98, demo track, pages 564 - 566, Seattle, WA, June, 1998. [PS] [PDF]
  23. HiComm -- A New Technique for Improving Communication Performance in Workstation Cluster. Chen Li, Weiqiang Zhuang, Meiming Shen, Dingxing Wang, Weimin Zheng, Proc. of International Workshop on Advanced Parallel Processing Technologies (APPT), October, 1995, Beijing, China.


Ph.D. Thesis

Query Processing and Optimization in Information-Integration Systems. Chen Li. Ph.D. Thesis, Computer Science Department, Stanford University, August, 2001.