About Me

I am a second-year PhD student in the Department of Computer Science at the University of California, Irvine. Before that, I obtained my Master's degree in Software Engineering from the School of Software, Tsinghua University, in July 2016, and my bachelor's degree in Software Engineering from the School of Software, Tongji University, in July 2013.

I'm working on database systems under the supervision of Professor Michael J. Carey. Here is my CV.



LSM-Based Storage Management in AsterixDB. AsterixDB is a BDMS (Big Data Management System) with a rich feature set that sets it apart from other Big Data platforms. This project aims to improve the LSM-based storage management engine inside AsterixDB. Specifically, we attempt to answer the following questions about LSM-based storage systems: (1) how to build and manage secondary indexes efficiently for LSM stores, (2) how to define and enforce the maximum ingestion throughput of an LSM store without long merge pauses, (3) how to organize memory and immutable data in a modern, complex storage hierarchy, and (4) how to scale out efficiently as maximum-sized components accumulate.

Umzi: Unified Multi-Zone Indexing in HTAP Systems. The rising demand for real-time analytics has emphasized the need for Hybrid Transactional and Analytical Processing (HTAP) systems that can handle both fast transactions and large-scale analytics concurrently. In these systems, efficient indexing is critical to enable fast lookups and transaction processing. However, indexing the distributed data in HTAP systems is highly non-trivial because of its scale and evolving nature. To address these indexing challenges, we propose Umzi, a multi-version and multi-zone LSM-like indexing method. In addition to the flush and merge operations of traditional LSM-like indexes, Umzi dynamically evolves as data is moved from one zone to another.

PSpec-SQL. Organizations often share business data with third parties to perform data analytics. However, a major concern is privacy. In this project, we collaborated with researchers at Intel Labs China to build a privacy-integrated data analytics system. The system allows data owners to specify their data usage policies in a privacy specification, which is automatically checked against submitted queries. The system is implemented on top of Spark SQL, and the source code is available on GitHub.

Large-Scale Software Model Inference. Software systems are often built without precise models. One approach to this problem is to infer software models automatically from execution logs. However, real-world software systems often produce large amounts of logs. In this project, I combined formal methods with modern parallel data processing techniques and used MapReduce to infer software behavioral models from large logs.





Adapted from a template by Andreas Viklund