My research focuses on merging the areas of systems and programming languages. In particular, I am interested in how graph systems could benefit scalability problems in program analysis. My advisor is Prof. Harry Xu.
There is more than a decade-long history of using static analysis to find bugs in systems such as Linux. Most existing static analyses developed for these systems are simple checkers that find bugs based on pattern matching. Despite many sophisticated interprocedural analyses, few of them have been employed to improve checkers for systems code due to their complex implementations and their poor scalability and parallelizability.
In this project, we revisit the scalability problem of interprocedural static analysis from a “Big Data” perspective. That is, we turn Big Code analysis into Big Data analytics and leverage novel data processing techniques to solve this traditional programming language problem. We develop Graspan, a disk-based parallel graph system that uses an edge-pair centric computation model to compute dynamic transitive closures on large program graphs.
We implement fully context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations. Moreover, we show that these analyses can be used to augment existing checkers; these augmented checkers uncovered 132 new NULL pointer bugs and 1308 unnecessary NULL tests, as well as reported 401 fewer false positives in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
Software restructuring techniques based on hierarchical agglomerative clustering (HAC) algorithms have been widely used to restructure large modules with low cohesion into smaller modules with high cohesion. These techniques generate clustering trees of modules, that are sliced at different cut-points in order to obtain desired restructurings. Choosing appropriate cut-points has always been a difficult problem in clustering. Previous HACs generate clustering trees that have large number of cut-points. Moreover, many of those cut-points return clusters of which only a few lead to meaningful restructurings.
We designed a new hierarchical clustering technique for restructuring software at the function level that generates clustering trees with lower number of cut-points, which yield a lower number of redundant clusters. Our technique yielded good restructurings in comparison with four previous HAC techniques, when applied on real-life industrial programs.
* Graspan: A Single-machine Disk-based Graph System for Interprocedural Static Analyses of Large-scale Systems Code, Kai Wang, Aftab Hussain, Zhiqiang Zuo, Guoqing Xu, Ardalan Amiri Sani, Architectural Support for Programming Languages and Operating Systems (ASPLOS '17), Xi'an, ACM, 2017 (to appear)
* From query to usable code: an analysis of stack overflow code snippets, Di Yang, Aftab Hussain, Cristina Videira Lopes, 13th International Workshop on Mining Software Repositories (MSR '16'), ACM, Austin, 2016
* L-Shaped Drawings of Series-Parallel Graphs, Md. Iqbal Hossain, Shaheena Sultana, Aftab Hussain, Nazmun Nessa Moon, Md. Saidur Rahman, Dhaka, 2013
* A new hierarchical clustering technique for restructuring software at the function level, Aftab Hussain, Md. Saidur Rahman, 6th India Software Engineering Conference (ISEC '13'), ACM, New Delhi, 2013