Yingyi Bu
Office: Room 2064 DBH
Tel: 949-8789-0298
Department of Computer Science
University of California, Irvine
Irvine, CA 92697
Email: yingyib@ics.uci.edu
I am now a Ph.D. student in the Department of Computer Science, University of California, Irvine, working with Prof. Michael J. Carey. Before that, I received a B.Sc. from Nanjing University, China, and an M.Phil. from The Chinese University of Hong Kong, and worked full-time in the Microsoft SQL Server group.
News:
- 2013/06/11 I was awarded the 2013 Google Fellowship in Structured Data!
- 2013/06/06 AsterixDB 0.8.0 and Pregelix 0.2.6 are released!
- 2013/04/06 We are very excited to release our AsterixDB alpha!
- 2013/03/27 We are happy to announce the second release of Pregelix with quite a few new features!
- 2013/03/26 Our paper on the bloat-aware design for Big Data applications was accepted to SIGPLAN ISMM'13!
- 2013/03/25 I was selected as a Facebook Fellowship finalist.
- 2012/10/28 We are happy to announce the first release of Pregelix -- an open-source Big Graph analytics system!
- 2012/06/26 Started my summer internship at Google Research, a lot of fun!
- 2012/03/01 Interested in Big Data machine learning? Check out our fresh and exciting technical report!
Research Interests
My primary research interest is large-scale data management systems, especially data-intensive computing systems.
Current projects:
- Pregelix. Pregelix is an open-source implementation of Google's Pregel programming model. We architect the Pregel programming model on top of the Hyracks general-purpose data-parallel execution engine. This leads to a much simpler design and implementation than building from scratch. Pregelix also corresponds to the Pregel part of our technical report!
- AsterixDB. We are working towards an open-source data-intensive computing platform, with new technologies for ingesting, storing, managing, indexing, querying, analyzing, and subscribing to large-scale semi-structured data.
Past projects:
- HaLoop. In HaLoop (with Bill, Magda, and Michael), we designed and implemented a modified version of the Hadoop MapReduce framework to efficiently support data-intensive iterative data analysis. A paper describing the HaLoop system appeared in VLDB 2010. HaLoop was also sponsored by the Yahoo! KSC Program!
- In the past, I also worked on stream monitoring (a paper in KDD 2009 and a paper in SDM 2007), and data privacy (a paper in VLDB 2008).
Publications (DBLP entry)
A Bloat-Aware Design for Big Data Applications [PDF][PPT]
(Open-source systems using our design paradigm: AsterixDB, Hyracks, Pregelix)
In Proceedings of the 2013 ACM SIGPLAN International Symposium on Memory Management (ISMM 2013)
Seattle, WA, June 20-21, 2013.
Declarative Systems for Large-Scale Machine Learning [PDF]
IEEE Data Engineering Bulletin. Volume 35, Number 2, June 2012
The HaLoop Approach to Large-Scale Iterative Data Analysis [PDF][Implementation]
The VLDB Journal (VLDBJ), Volume 21, Number 2, April 2012.
Combined Static and Dynamic Automated Test Generation [PDF][Implementation]
Sai Zhang, David Saff, Yingyi Bu, Michael D. Ernst
In Proceedings of the 11th International Symposium on Software Testing and Analysis (ISSTA 2011)
Toronto, ON, Canada, July 17 - 21, 2011 (acceptance rate: 35/121=28.9%)
HaLoop: Efficient Iterative Data Processing on Large Clusters [PDF][PPT][Talk in Berkeley][Implementation]
(selected for Best of VLDB 2010 issue of VLDB Journal)
In Proceedings of the 36th International Conference on Very Large Data Bases (VLDB 2010)
Singapore, 11-17 September, 2010. (Acceptance Rate: 33/204 = 16.1%)
In Proceedings of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2009)
Paris, France, June 28-July 1, 2009. (Acceptance Rate: 105/537 = 19.6%)
In Proceedings of the 34th International Conference on Very Large Data Bases (VLDB 2008)
Auckland, New Zealand, August 24-30, 2008. (Acceptance Rate: 46/273 = 16.8%)
WAT: Finding Top-K Discords in Time Series Database [PDF][Source Code]
In Proceedings of the 2007 SIAM International Conference on Data Mining (SDM 2007)
Minneapolis, MN, USA, April 26-28, 2007. (Acceptance Rate: 25%)
System Demos
ASTERIX: An Open Source System for "Big Data" Management and Analysis [PDF]
In Proceedings of the 38th International Conference on Very Large Data Bases (VLDB 2012)
İstanbul, Turkey, August 27-31, 2012.
Honors
- 2013-2015 Google Fellowship in Structured Data
- 2013-2014 Facebook Fellowship Finalist
- 2010 Yahoo! Key Scientific Challenge Award