The Schark Research Group's Work for NASA REE
The Schark Research Group
Isaac D. Scherson
Luis Miguel Campos
About the REE Project
REE is the Remote Exploration and Experimentation Project for NASA. The goal of the REE Project is to enable the use of high-performance parallel processing on board spacecraft. Because of fundamental downlink-bandwidth limitations and speed-of-light delays, the only way to increase the scientific return for many spacecraft is to implement high-performance computing on board. This must be done under very severe power constraints, with computing systems in the range of several hundred MOPS per watt being desired. COTS-based (Commercial Off-The-Shelf) parallel processing clusters are the architecture of choice to achieve scalable performance and to maximize cost-effectiveness. In addition, it is desirable to use a COTS operating system based on UNIX and to use MPI for communication between processors, in order to ensure that scientists' ground-developed applications can be ported to REE testbeds and spacecraft computers quickly and at low cost.
The REE Project is interested in research that facilitates the development of fault-tolerant, COTS-based parallel processors for use in space. Specific areas of interest include:
- developing test techniques to localize and characterize the effects of Single-Event Upsets under terrestrial or space-based radiation testing
- developing synthetic workload models for use in characterizing REE system performance over a wide range of application types and fault conditions
- fault-tolerant/real-time scheduling and resource management
- developing models to evaluate reliability, availability and performability of REE architectures while running applications
The Schark Group's Research for REE
Our research group has already made important contributions to the area of performance evaluation using benchmark sets. We proposed a novel approach to benchmarking that characterizes both the features of the machine under test and the quality of the benchmark used for the test. Our benchmarking strategy combines the principles of coarse-grain and fine-grain benchmarks: it measures fine-grain architectural properties with real-world workloads. The technique is referred to as machine characterization. In addition, the technique provides a figure of merit for the set of programs that compose the benchmark set. This figure of merit is dubbed compliance.
The problem of generating synthetic workloads and characterizing system performance is a natural extension of our work in machine characterization. Synthetic workloads can be generated by examining the kernels and primitive operations of programs used in REE computers and then creating synthetic "dummy" operations that simulate the workload. The synthetic workloads can be evaluated directly and precisely using performance vectors generated from a compliant benchmark set. The resulting data can then be used as input to our evaluation tool, called the Simulator, to yield an accurate and flexible representation of the performance of any given set of programs.
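As a rough illustration of the idea above, the following Python sketch scales a profiled mix of primitive operations into a synthetic workload and estimates its run time from a per-operation performance vector. It is a minimal sketch under stated assumptions, not the group's actual method: the operation names (fadd, load, send), the cost values, the functions synthesize_workload and predict_time, and the simple additive cost model are all hypothetical and chosen only for illustration.

```python
from typing import Dict

# Hypothetical types: the operation names and costs used below are
# illustrative assumptions, not values from the REE project.
PerformanceVector = Dict[str, float]  # seconds per primitive operation
OperationMix = Dict[str, int]         # count of each primitive operation


def synthesize_workload(profile: OperationMix, total_ops: int) -> OperationMix:
    """Scale a profiled operation mix to roughly total_ops "dummy" operations,
    preserving the relative proportions of each primitive."""
    profiled_total = sum(profile.values())
    if profiled_total <= 0:
        raise ValueError("profile must contain at least one operation")
    return {
        op: round(count * total_ops / profiled_total)
        for op, count in profile.items()
    }


def predict_time(workload: OperationMix, machine: PerformanceVector) -> float:
    """Estimate run time as the sum of count * cost over all primitives.
    This additive model is an assumption; it ignores overlap and contention."""
    missing = set(workload) - set(machine)
    if missing:
        raise KeyError(f"no measured cost for operations: {sorted(missing)}")
    return sum(count * machine[op] for op, count in workload.items())


if __name__ == "__main__":
    # Operation mix observed in a (hypothetical) application kernel.
    profile = {"fadd": 60, "load": 30, "send": 10}
    # Per-operation costs, as might be measured by a benchmark set.
    machine = {"fadd": 1e-9, "load": 2e-9, "send": 1e-6}

    synthetic = synthesize_workload(profile, total_ops=1000)
    print(synthetic)
    print(predict_time(synthetic, machine))
```

In this simplified model, comparing two machines amounts to evaluating the same synthetic workload against two different performance vectors.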
The Simulator is a tool developed and maintained by the Schark Research Group to accurately simulate massively parallel computer systems. It is very flexible and extremely customizable, so it can simulate virtually any set of programs on any given architecture. The Simulator, along with its documentation, source code, and API, is available online for download.