CompSci 230: Winter 2011
Distributed Systems

Reading List

Virtual Time and Global State in Distributed Systems

  1. L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System", Communications of the ACM, 1978
  2. K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems, 1985
  3. D. Jefferson, "Distributed Simulation and the Time Warp Operating System", ACM Symposium on Operating Systems Principles, 1987
  4. F. Mattern, "Virtual Time and Global States of Distributed Systems", Proc. Workshop on Parallel and Distributed Algorithms, 1989
  5. C. Fetzer and F. Cristian, "An optimal internal clock synchronization algorithm", COMPASS 1995
  6. F. Cristian and C. Fetzer, "Fault-tolerant external clock synchronization", ICDCS 1995
  7. A. Kshemkalyani, M. Raynal and M. Singhal, "An introduction to snapshot algorithms in distributed computing"

Distributed Operating Systems

  • Remote Procedure Calls and Distributed Shared Memory
    1. A. Birrell and B. Nelson, "Implementing remote procedure calls", ACM Transactions on Computer Systems, 1984
    2. P. G. Soares, "On remote procedure call", Proc. of the 1992 conference of the Centre for Advanced Studies on Collaborative research, 1992
    3. A. L. Ananda, B. H. Tay and E. K. Koh, "A survey of asynchronous remote procedure calls", SIGOPS Operating Systems Review, 1992
    4. A lecture on RPC, "http://www.cs.cf.ac.uk/Dave/C/node33.html"
  • Mutual Exclusion
    1. G. Ricart and A. Agrawala, "An optimal algorithm for mutual exclusion in computer networks", Communications of the ACM, 1981
    2. L. Lamport, "The Mutual Exclusion Problem", Part 1 and Part 2, Journal of the ACM, 1986
    3. L. Lamport, "A Fast Mutual Exclusion Algorithm", ACM Transactions on Computer Systems, 1987
    4. K. Raymond, "A Tree Based Algorithm for Distributed Mutual Exclusion", ACM Transactions on Computer Systems, 1989
  • Leader Election
    1. H. Garcia-Molina, "Elections in a Distributed Computing System"
  • Distributed Deadlocks
    1. A. K. Elmagarmid, "A survey of distributed deadlock detection algorithms", ACM SIGMOD, 1986
    2. M. Singhal, "Deadlock detection in distributed systems", IEEE Computer, 1989
  • Distributed File Systems
    1. M. Satyanarayanan, "A Survey of Distributed File Systems", Annual Review of Computer Science, 1989
    2. B. Noble and M. Satyanarayanan, "An Empirical Study of a Highly Available File System", ACM Sigmetrics, 1994
    3. M. Spasojevic and M. Satyanarayanan, "An Empirical Study of a Wide-Area Distributed File System", ACM Transactions on Computer Systems, 1996
    4. J. Kubiatowicz et al., "OceanStore: An Architecture for Global-Scale Persistent Storage", ACM ASPLOS 2000
    5. S. Ghemawat, H. Gobioff and S.-T. Leung, "The Google File System", ACM SOSP, 2003
  • Process Migration
    1. J. M. Smith, "A survey of process migration mechanisms", ACM SIGOPS Operating Systems Review, 1988
  • Processing and Load Balancing
    1. M. H. Willebeek-LeMair, A. P. Reeves, "Strategies for Dynamic Load Balancing on Highly Parallel Computers", IEEE Transactions on Parallel and Distributed Systems, 1993
    2. N. Venkatasubramanian, S. Ramanathan, "Load Management in Distributed Video Servers", ICDCS 1997
    3. V. Cardellini, M. Colajanni, "Dynamic Load Balancing on Web-server Systems", IEEE Internet Computing, 1999
    4. T. Schnekenburger, "Load Balancing in CORBA: A Survey, Response to the Aggregated Computing RFI".
  • Distributed Operating Systems
    1. W. J. Bolosky, R. P. Draves, R. P. Fitzgerald, C. W. Fraser, M. B. Jones, T. B. Knoblock and R. Rashid, "Operating System Directions for the Next Millennium", Proc. of the 6th Workshop on Hot Topics in Operating Systems, 1997
    2. M. Rozier, V. Abrossimov, F. Armand et al., "Overview of the Chorus Distributed Operating System"
    3. A. S. Tanenbaum, M. F. Kaashoek, R. van Renesse and H. E. Bal, "The Amoeba Distributed Operating System - A Status Report"
  • Case Studies
    1. Distributed Computing Frameworks: DCE, "http://www.opengroup.org/dce/"
    2. Object-based Middleware: CORBA specification, www.omg.org
    3. Java
      1. Jini: "Architectural Overview", Sun Microsystems
      2. Java RMI: "Java RMI Tutorial"
      3. EJB: "Enterprise JavaBeans Technology", Sun Developer Network
      4. J2EE: "Overview", Sun Developer Network
    4. Service Oriented Architectures
      1. Web services: "Part of the lectures" by M. Fisher
      2. .NET: "The .NET Framework"
      3. SOAP: "Specification"

Messaging and Group Communication in Distributed Systems

  1. D. Dolev and D. Malkhi, "The Transis Approach to High Availability Cluster Communication". Other interesting reading: documentation and papers about Transis are also available at "http://www.cs.huji.ac.il/labs/transis/"
  2. Y. Amir, et al, "Group Communication as an Infrastructure for Distributed System Management", Proc. of the 3rd Workshop on Services in Distributed and Networked Environments, 1996
  3. Y. Amir, et al, "The Spread Wide Area Group Communication System"
  4. R. van Renesse, K. P. Birman, and S. Maffeis, "Horus: A Flexible Group Communication System", Communications of the ACM, 1996
  5. S. Banerjee, B. Bhattacharjee and C. Kommareddy, "Scalable Application Layer Multicast", ACM SIGCOMM 2002
  6. Y. Amir, C. Nita-Rotaru, J. Stanton, G. Tsudik, "Secure Spread: An Integrated Architecture for Secure Group Communication", IEEE Transactions on Dependable and Secure Computing, 2005
  7. M. Deshpande, B. Xing, I. Lazaridis, B. Hore, N. Venkatasubramanian and S. Mehrotra, "CREW: A Gossip-based Flash-Dissemination System", ICDCS 2006
  8. K. Kim, N. Venkatasubramanian and S. Mehrotra, "FaReCast: Fast, Reliable Application Layer Multicast for Flash Dissemination", ACM Middleware 2010

Fault Tolerance and Reliability

  • Consensus
    1. M. J. Fischer, N. A. Lynch, and M. S. Paterson, "Impossibility of Distributed Consensus with One Faulty Process", Journal of ACM, 1985
    2. D. Dolev, C. Dwork, L. Stockmeyer, "On the Minimal Synchronism Needed for Distributed Consensus", Journal of ACM, 1987
  • Failure Detectors
    1. T. D. Chandra and S. Toueg, "Unreliable Failure Detectors for Reliable Distributed Systems", Journal of ACM, 1996
    2. T. D. Chandra, V. Hadzilacos and S. Toueg, "The Weakest Failure Detector for Solving Consensus", Journal of ACM, 1996
    3. M. K. Aguilera, W. Chen, and S. Toueg, "Heartbeat: A Timeout-Free Failure Detector for Quiescent Reliable Communication", Cornell, 1997
  • Replication
    1. H. S. Sandhu and S. Zhou, "Cluster-based file replication in large-scale distributed systems", ACM SIGMETRICS, 1992
    2. J. Gray, P. Helland, P. O'Neil and D. Shasha, "The dangers of replication and a solution", ACM SIGMOD, 1996
  • Logging
    1. A. P. Sistla and J. L. Welch, "Efficient distributed recovery using message logging", ACM SIGOPS, 1989

Mobile Computing

  1. I. Giurgiu, O. Riva, D. Juric, I. Krivulev, and G. Alonso, "Calling the Cloud: Enabling mobile phones as interfaces to cloud applications", Middleware 2009
  2. B. Chun, S. Ihm, P. Maniatis, M. Naik, A. Patti, "CloneCloud: Elastic Execution between Mobile Device and Cloud", to appear in Proceedings of the 6th European Conference on Computer Systems (EuroSys 2011), April 2011
  3. Continue updating...