ACM Computing Surveys 28A(4), December 1996, http://www.cs.jhu.edu/~goodrich/pubs/io.html. Copyright © 1996 by the Association for Computing Machinery, Inc. See the permissions statement below.
Thomas H. Cormen
Abstract: We present the challenge of synthesizing a coherent model that combines the best aspects of the Parallel Disk Model and Bulk Synchronous Parallel models to develop and analyze algorithms that use parallel I/O, computation, and communication.
Categories and Subject Descriptors: B.3.2 [Memory Structures]: Design Styles - Mass storage (e.g., magnetic, optical), Primary memory; B.4.4 [Input/Output and Data Communications]: Performance Analysis and Design Aids - Formal models, Worst-case analysis; D.1.3 [Programming Techniques]: Concurrent Programming - Parallel programming; D.4.2 [Operating Systems]: Storage Management - Secondary Storage; D.4.4 [Operating Systems]: Communications Management - Input/Output; Message sending; Network communication; E.2 [Data Storage Representations]: Contiguous representations; E.5 [Files]: Sorting/searching; F.1.2 [Computation by Abstract Devices]: Modes of Computation - Parallelism and concurrency; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems - Sorting and searching;
General Terms: Algorithms, Design, Languages, Performance, Theory.
Additional Key Words and Phrases: I/O, external memory, secondary memory, communication, disk drive, parallel disks, sorting, Parallel Disk Model, Bulk Synchronous Parallel Model.
The past decade has seen the introduction of new and useful models for analyzing the computational and communication complexities of parallel algorithms, as well as useful models to measure I/O complexity. Yet no useful model measures all of computational, communication, and I/O complexity simultaneously.
Usefulness of a model implies two characteristics. First, the model should be realistic in the sense that its prediction for any algorithm should correspond to observed behavior of real systems. Second, the model should be simple enough to use and understand that one can design, analyze, and implement algorithms without having had extensive experience with the model.
We maintain that the Bulk Synchronous Parallel, or BSP, model [Valiant 1990] and LogP [Culler et al. 1993] models are useful for computational and communication complexities of parallel algorithms. The Parallel Disk Model, or PDM, [Vitter Shriver 1994] is useful for I/O complexity. The BSP and LogP models, however, ignore I/O, and the PDM does not account for computation or communication. Because we think of I/O as so much slower than computation or communication, the PDM apparently captures the most salient factor in the wall-clock time for algorithms that use parallel I/O.
What is apparent may not be the case, however.
Early experiences with algorithms implemented in the PDM indicate that although wall-clock time for a given algorithm follows the prediction of the model, the algorithms themselves are not I/O bound. Even with synchronous (i.e., blocking) I/O, the time spent waiting for I/O is typically less than 50% of the total wall-clock time. This behavior suggests that each parallel disk access gives rise to a given amount of computation and communication for a particular algorithm.
A sorting algorithm, for example, might repeatedly process "memoryloads" of data by performing a large parallel read, an in-memory sort across all processors, and a large parallel write. The time to perform the in-memory sorts might exceed the combined times of the parallel reads and writes, although it is roughly the same among the memoryloads. Typical algorithms developed for the PDM are similar to this hypothetical sorting algorithm in that they make repeated passes over the data and each pass repeatedly reads in a large amount of data, processes it, and writes out a large amount of data. Processing time (including communication) tends to be about the same each time for a given algorithm. Of course it varies from algorithm to algorithm.
If these early observations continue to hold as we gain more experience in implementing algorithms for the PDM, we will draw the conclusion that the PDM's predictive power is helpful (for analyzing I/O time) but limited (by omitting computation and communication).
Note, however, a striking similarity between the BSP model and typical PDM algorithms: bulk processing. In the BSP model, for example, communcation in a network of processors is considered to be the prime computational bottleneck; hence, a computation is organized as a sequence of rounds, where each round consists of each processor performing computations on its internal memory, followed by the sending and receiving of a limited number of messages. Rounds and communication in BSP algorithms are like memoryload processing and I/O, respectively, in PDM algorithms.
The challenge, therefore, is to synthesize a coherent model that combines the best aspects of the PDM and BSP models to develop and analyze algorithms that use parallel I/O, computation, and communication. Along with such a model, we need programming primitives to enable algorithm implementation under the model. These primitives must be portable and have performance matching the model's requirements.
We view developing such a model as a challenge because we believe that it will be difficult to simultaneously satisfy the two requirements that it be realistic yet easy to use. Our concern is that a realistic model may have so many parameters as to make it unusable. The PDM, without considering processing, has four parameters: problem size, memory size, disk block size, and disk count. The BSP model also has four parameters: problem size, processor count, latency of the network, and the "gap" time between consecutive messages in pipelined computations. Thus, some natural questions to ask include the following:
In summary, we think that it would be valuable to propose a bridging parallel computational model that incorporates computation, communication and I/O in an accurate and easy to use manner. We hope that discussions at the workshop will lead to such a model.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or firstname.lastname@example.org.