- Khanh Nguyen, Lu Fang, Christian Navasca, Guoqing Xu, Brian Demsky, and Shan Lu.
"Skyway: Connecting Managed Heaps in Distributed Big Data Systems",
the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Williamsburg, VA, March 24-28, 2018.
(Acceptance rate: 56/319, 18%)
- Khanh Nguyen, Kai Wang, Yingyi Bu, Lu Fang, and Guoqing Xu.
"Understanding and Combating Memory Bloat in Managed Data-Intensive Systems",
ACM Transactions on Software Engineering and Methodology (TOSEM),
Accepted for publication
- Khanh Nguyen, Lu Fang, Guoqing Xu, Brian Demsky, Shan Lu, Sanazsadat Alamian, and Onur Mutlu.
"Yak: A High-Performance Big-Data-Friendly Garbage Collector",
the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Savannah, GA, November 2-4, 2016.
(Acceptance rate: 47/267, 18%)
- Khanh Nguyen, Lu Fang, Guoqing Xu, and Brian Demsky.
"Speculative Region-based Memory Management for Big Data Systems",
the 8th Workshop on Programming Languages and Operating Systems (PLOS), Monterey, CA, October 4, 2015.
(Acceptance rate: 7/16, 44%)
- Lu Fang, Khanh Nguyen, Guoqing Xu, Brian Demsky, and Shan Lu.
"Interruptible Tasks: Treating Memory Pressure As Interrupts for Highly Scalable Data-Parallel Programs",
the 25th ACM Symposium on Operating Systems Principles (SOSP), Monterey, CA, October 4-7, 2015.
(Acceptance rate: 30/186, 16%)
- Khanh Nguyen, Kai Wang, Yingyi Bu, Lu Fang, Jianfei Hu, and Guoqing Xu.
"Facade: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications",
the 20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Istanbul, Turkey, March 14-18, 2015.
(Acceptance rate: 48/287, 17%)
- Khanh Nguyen and Guoqing Xu.
"Cachetor: Detecting Cacheable Data to Remove Bloat",
Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), Saint Petersburg, Russia, August 18-26, 2013.
(Acceptance rate: 51/251, 20%)
- Skyway: Connecting managed heaps in distributed Big Data systems
- Abstract: Managed languages such as Java and
Scala, while prevalently used in the development of
large-scale distributed systems, incur non-trivial
overheads. For instance, when performing data transfer
across machines, a task frequently conducted in a Big Data system,
the source system needs to serialize
objects while the receiver must perform
deserialization. This process is both
inefficient and labor-intensive. Skyway is a
JVM-based technique that can directly connect managed
heaps of different JVM processes. Under Skyway, objects
in the source heap can be directly written into a remote
heap. Skyway provides performance benefits to any
JVM-based system by completely eliminating the need (1)
to invoke serialization/deserialization functions,
thus saving CPU time, and (2) for developers
to hand-write serialization functions.
- The paper was accepted to ASPLOS'18. [PDF].
- A poster of this work was presented at the Computer Science Research Showcase at UC Irvine in Spring 2017.
- This work was presented at the SoCal PLS Workshop in Fall 2017 at UC Riverside.
- BigSAT: SAT solving at scale
- Abstract: Mainstream SAT solvers are all
based on the DPLL (Davis-Putnam-Logemann-Loveland)
algorithm, which is memory-efficient but difficult to
parallelize. To build a truly scalable SAT solver, we
shift our focus to a long-overlooked algorithm, the DP
(Davis-Putnam) procedure, which performs explicit
resolution over clauses. DP consumes more memory but
exhibits data parallelism, lending itself to systems
optimizations that achieve efficiency and scalability by
utilizing large computing resources.
- This work is under development. We will report the results soon.
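To make the DP procedure mentioned in the BigSAT abstract concrete, here is a minimal single-machine sketch of Davis-Putnam variable elimination by resolution. It is a textbook illustration only, not the BigSAT implementation; the function names (`resolve`, `is_tautology`, `dp_satisfiable`) and the integer encoding of literals (`3` for a variable, `-3` for its negation) are assumptions made for this example.

```python
def resolve(pos_clause, neg_clause, var):
    """Resolvent of a clause containing var and a clause containing -var."""
    return (pos_clause - {var}) | (neg_clause - {-var})


def is_tautology(clause):
    """A clause holding both a literal and its negation is always true."""
    return any(-lit in clause for lit in clause)


def dp_satisfiable(cnf):
    """Decide satisfiability of a CNF formula by eliminating one variable at a time.

    cnf is an iterable of clauses; each clause is an iterable of nonzero ints.
    """
    clauses = {frozenset(c) for c in cnf}
    clauses = {c for c in clauses if not is_tautology(c)}
    while True:
        if frozenset() in clauses:
            return False  # the empty clause was derived: unsatisfiable
        if not clauses:
            return True  # every clause was eliminated: satisfiable
        # Pick any variable that still occurs in the formula.
        var = abs(next(iter(next(iter(clauses)))))
        pos = {c for c in clauses if var in c}
        neg = {c for c in clauses if -var in c}
        rest = clauses - pos - neg
        # Each (p, n) pair is resolved independently of the others, which is
        # the data parallelism the abstract refers to. The resolvent set can
        # grow quadratically, which is the memory cost it mentions.
        resolvents = {resolve(p, n, var) for p in pos for n in neg}
        clauses = rest | {r for r in resolvents if not is_tautology(r)}


sat_example = dp_satisfiable([[1, 2], [-1, 2], [-2, 3]])
unsat_example = dp_satisfiable([[1, 2], [1, -2], [-1, 2], [-1, -2]])
```

In a distributed setting, the pairwise resolution step is the natural candidate for partitioning across workers, though how BigSAT actually does this is not described here.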
- Yak: a high-performance Big-Data-friendly garbage collector
- Abstract: Big Data applications are often
written in managed languages and have a clear logical
distinction between a control path and a data path. These
two paths follow different heap usage
patterns. State-of-the-art garbage collectors, while
effective in the control path, are fundamentally limited
in the data path, where most objects are created. This
project aims to provide an automated and systematic
JVM-based solution while requiring minimal user
effort. Our results show that we can effectively reduce
the GC time and improve end-to-end performance.
- The paper was accepted to OSDI'16. [PDF].
- A poster was presented at the PLDI'16 SRC.
- A poster of this work was presented at the Computer Science Research Showcase at UC Irvine in Spring 2016.
- This work was presented at the SoCal PLS Workshop in Fall 2016 at UC Irvine.
- Interruptible Tasks: Treating memory pressure as interrupts
- Abstract: Real-world data-parallel tasks
developed in managed languages such as Java and C#
commonly suffer from great memory pressure. This leads to
excessive GC effort and out-of-memory errors,
significantly hurting system performance and
scalability. An interruptible task is a new type of
data-parallel task that can be interrupted upon memory
pressure (with part or all of its used memory
reclaimed) and resumed when the pressure goes
away. Experiments on two state-of-the-art platforms, Hadoop
and Hyracks, show the effectiveness of the technique. All
13 reproduced real-world out-of-memory problems reported
on Hadoop are solved using our system. A second set of
experiments with 5 already well-tuned programs in Hyracks
on datasets of different sizes shows that our versions
are 1.5–3× faster and scale to 3–24×+ larger datasets.
- The paper was accepted to SOSP'15. [PDF] [1-column PDF].
- A poster was presented at SOSP'15, and an earlier version at the ASPLOS'15 SRC.
- Facade: a compiler and runtime for (almost) object-bounded Big Data applications
- Abstract: Popular Big Data platforms use
managed object-oriented programming languages such as Java
due to their quick development cycles and rich community
resources. However, when object-orientation meets Big
Data, the cost of the managed runtime system is
significantly magnified and becomes a
scalability-prohibiting bottleneck. This project aims to
remove this bottleneck in Big Data applications by
introducing a novel compiler framework called
Facade. Facade can automatically generate highly efficient
data manipulation code by transforming the data path of an
existing Big Data application. The key to efficiency is
that in the generated code, the number of heap objects
created for data types in each thread is statically
bounded regardless of how much data an application has to
process. Using Facade, one can obtain significantly
reduced memory management costs and improved scalability.
- The paper was accepted to ASPLOS'15. [PDF].
- With this project, I competed in the ACM Student Research Competition at PLDI'14 and won third prize (bronze medal).
The poster presented at PLDI'14 is here.
- This work was presented at the SoCal PLS Workshop in Spring 2014 at Harvey Mudd College.
- Cachetor: Detecting cacheable data to remove bloat
- Abstract: Modern object-oriented software
commonly suffers from runtime bloat that significantly
affects its performance and scalability. One pattern of
bloat is the work repeatedly done to compute the same data
values. Cachetor is a novel runtime profiling tool that is
effective in exposing caching opportunities. Substantial
performance gains can be achieved by modifying
the program to cache the reported data.
- The paper was accepted to ESEC/FSE'13. [PDF]
- This work was presented at the SoCal PLS Workshop in Fall 2012 at UC Riverside.
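As a small illustration of the bloat pattern described in the Cachetor abstract (work repeatedly done to compute the same data values), the sketch below contrasts a function that recomputes its result on every call with a memoized version. It shows only the kind of fix a developer might apply after a profiler reports a caching opportunity, not how Cachetor itself detects one. The function names, the `calls` counter, and the toy checksum are hypothetical.

```python
from functools import lru_cache

# Counts how many times each function body actually runs.
calls = {"uncached": 0, "cached": 0}


def checksum_uncached(text):
    """Recomputes the value on every call, even for repeated inputs."""
    calls["uncached"] += 1
    return sum(ord(ch) for ch in text) % 65521


@lru_cache(maxsize=None)
def checksum_cached(text):
    """Same computation, but results for repeated inputs come from a cache."""
    calls["cached"] += 1
    return sum(ord(ch) for ch in text) % 65521


inputs = ["alpha", "beta", "alpha", "alpha", "beta"]
uncached_results = [checksum_uncached(s) for s in inputs]
cached_results = [checksum_cached(s) for s in inputs]
```

Both versions return the same values, but the cached one runs its body once per distinct input rather than once per call. Caching like this is only safe when the computation is free of side effects and depends solely on its arguments.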
Aside from my research and programming, I am interested in
individual taxation law. From 2011 to 2014, I volunteered in
the IRS-sponsored VITA program, which helps low-income
individuals and families prepare their tax returns.
In 2015, I was the UCI liaison and project manager for the
Orange County United Way's Free Tax Preparation Campaign. I
managed the development of an online scheduling and volunteer
management platform with several
I am the research group's coordinator, working with
prospective summer interns. I'm proud to say that in the
School of ICS, our group is one of the few that have
been accepting high school students and undergraduates for
summer internships for a number of years. The experience has
been very positive for the interns and the group.
Starting Fall 2017, I am supported by a Google PhD Fellowship. Thanks, Google!
I greatly appreciate the Donald Bren School of Information
and Computer Sciences for awarding me the Dean's Fellowship
with four years of support for my study. My gratitude also goes to
organizations such as NSF, ACM, USENIX, ACM SIGSOFT, ACM
SIGPLAN (PAC), and ACM SIGOPS for their travel grants.