Cache-Aware Synchronization and Scheduling of Data-Parallel Programs for Multi-Core Processors

Multi-core (parallel) processors are becoming ubiquitous. The use of such systems is key to science, engineering, finance, and other major areas of the economy. However, increased applications performance on such systems can only be achieved with advances in mapping such applications to multi-core machines. This task is made more difficult by the presence of complex memory organizations which is perhaps the key bottleneck to efficient execution, and which was not previously addressed effectively. This research involves making the mapping of the program to the machine aware of the complexities of the memory-hierarchy in all phases of the compilation process. This will ensure a good fit between the application code and the actual machine and thereby guarantee much more effective utilization of the hardware (and thus efficient/fast execution) than was previously possible.

Modern processors (multi-cores) employ increasingly complex memory hierarchies. Management of such hierarchies is becoming critical to the overall success of the compilation process since effective utilization of the memory hierarchy dominates overall performance. This research develops a new cache-hierarchy-aware compilation and runtime system (i.e., including compilation, scheduling, and static/dynamic processor mapping of parallel programs). These tasks have one thing in common: they all need accurate estimates of data element (iteration, task) computation and memory access times which are currently beyond the (cache-oblivious) state-of-the-art. This research thus develops new techniques for iteration space partitioning, scheduling, and synchronization which capture the variability due to cache, memory, and conditional statement behavior and their interaction. This research will have a broad impact on the computer industry as it will allow the ubiquitous multi-core systems of the future to be efficiently exploited by parallel programs.