CS237
Distributed Systems Middleware

Course Materials

How to read a paper:

How to Read a Paper, S. Keshav, David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada

Reference Books:

Coulouris et al., Distributed Systems: Concepts & Design, 4th ed. ISBN: 0-321-26354-5.

Tanenbaum & van Steen, Distributed Systems: Principles and Paradigms, 2nd ed.
ISBN: 0-132-39227-5.

M. Ben-Ari, Principles of Concurrent and Distributed Programming, Prentice-Hall International
Series in Computer Science, 1990.

Sape Mullender, Distributed Systems, Second Edition, Addison-Wesley, 1998.

Hagit Attiya and Jennifer Welch, Distributed Computing: Fundamentals, Simulations and Advanced Topics,
McGraw Hill, 1998.

Robert Orfali and Dan Harkey, Client/Server Programming with Java and CORBA, Second Edition,
John Wiley and Sons Inc., 1998.

Middleware and Distributed Systems Fundamentals:

Middleware, David E. Bakken, Encyclopedia of Distributed Computing, Kluwer Academic Publisher.
Managing Complexity: Middleware Explained, Andrew T. Campbell, Geoff Coulson, and Michael E. Kounavis, IT Professional, Vol. 1, No. 5, September/October 1999.
Middleware: a model for distributed system services, Philip A. Bernstein, Commun. ACM 39, 2 (Feb. 1996), pages 86-98.

Virtual Time and Global State in Distributed Systems:

L. Lamport, "Time, Clocks and the Ordering of Events in a Distributed System", Communications of the ACM, 1978
K. M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems, 1985
D. Jefferson, "Distributed Simulation and the Time Warp Operating System", ACM Symposium on Operating Systems Principles, 1987.
F. Mattern, "Virtual Time and Global States of Distributed Systems", Proc. Workshop on Parallel and Distributed Algorithms, 1989
C. Fetzer and F. Cristian, "An optimal internal clock synchronization algorithm", COMPASS 1995
F. Cristian and C. Fetzer, "Fault-tolerant external clock synchronization", ICDCS 1995
A. Kshemkalyani, M. Raynal and M. Singhal, "An introduction to snapshot algorithms in distributed computing"
C. J. Fidge, Timestamps in message-passing systems that preserve the partial ordering, Australian Computer Sci. Comm. 10 (1) (February 1988) 56-66.
C. J. Fidge, Fundamentals of distributed system observation, IEEE Software 13 (6) (November 1996) 77-83.
M. Raynal and M. Singhal, Logical time: Capturing causality in distributed systems, Computer, vol. 29, pp. 49-56, 1996.
C. J. Fidge, A limitation of vector timestamps for reconstructing distributed computations, Information Processing Letters, Elsevier Science, 1998, pp. 87-91.
Mukesh Singhal and Ajay Kshemkalyani, An efficient implementation of vector clocks, Elsevier Science Publishers.
Facebook's Cassandra uses synchronized clocks for its 'Last Write Wins' policy for conflict resolution
Spanner: Google’s Globally-Distributed Database estimates worst-case clock drift.
LinkedIn's Project Voldemort uses vector clocks for versioning, conflict resolution, and repairing replicas.
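
As a rough illustration of the vector-clock versioning mentioned in the last entry, the sketch below compares two versions of a value and reports whether one supersedes the other or whether they conflict. It is a minimal teaching example, not Voldemort's implementation; the function names (`increment`, `merge`, `compare`) and the node labels are hypothetical.

```python
from typing import Dict

# A vector clock maps a node name to the number of updates seen from that node.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum, used after two versions have been reconciled."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: 'before', 'after', 'equal', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


# Hypothetical scenario: node A writes, then B and C each update A's version
# independently, producing two versions that conflict.
v1 = increment({}, "A")
v2 = increment(v1, "B")
v3 = increment(v1, "C")
```

Here `compare(v1, v2)` reports that `v1` is an ancestor of `v2`, while `compare(v2, v3)` reports a conflict that the application has to resolve. A 'Last Write Wins' policy would instead keep whichever version carries the larger timestamp.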

Distributed Operating Systems:

Remote Procedure Calls and Distributed Shared Memory:
A. Birrell, and B. Nelson, "Implementing remote procedure calls", ACM Transactions on Computer Systems, 1984
P. G. Soares, "On remote procedure call", Proc. of the 1992 conference of the Centre for Advanced Studies on Collaborative research, 1992
A. L. Ananda, B. H. Tay and E. K. Koh, "A survey of asynchronous remote procedure calls", SIGOPS Operating Systems Review, 1992
A lecture on RPC, http://www.cs.cf.ac.uk/Dave/C/node33.html

Mutual Exclusion:
G. Ricart and A. Agrawala, "An optimal algorithm for mutual exclusion in computer networks", Communications of the ACM, 1981
L. Lamport, "The Mutual Exclusion Problem", part 1 and part 2, Journal of the ACM, 1986
L. Lamport, "A Fast Mutual Exclusion Algorithm", ACM Transactions on Computer Systems, 1987
K. Raymond, "A Tree Based Algorithm for Distributed Mutual Exclusion", ACM Transactions on Computer Systems, 1989

Leader Election:
H. Garcia-Molina, "Elections in a Distributed Computing System"

Distributed Deadlocks:
A. K. Elmagarmid, "A survey of distributed deadlock detection algorithms", ACM SIGMOD, 1986
M. Singhal, "Deadlock detection in distributed systems", IEEE Computer, 1989

Distributed File Systems:
M. Satyanarayanan, "A Survey of Distributed File Systems", Annual Review of Computer Science, 1989
B. Noble and M. Satyanarayanan, "An Empirical Study of a Highly Available File System", ACM Sigmetrics, 1994
M. Spasojevic and M. Satyanarayanan, "An Empirical Study of a Wide-Area Distributed File System", ACM Transactions on Computer Systems, 1996
J. Kubiatowicz, "OceanStore : An Architecture for Global-Scale Persistent Storage ", ACM ASPLOS 2000
S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google File System", ACM SOSP, 2003.

Process Migration:
J. M. Smith, "A survey of process migration mechanisms", ACM SIGOPS Operating Systems Review, 1988.
A. Barak, O. Laden, and Y. Yarom, "The NOW MOSIX and its preemptive process migration scheme", 1995.

Processing and Load Balancing:
M. H. Willebeek-LeMair, A. P. Reeves, "Strategies for Dynamic Load Balancing on Highly Parallel Computers", IEEE Transactions on Parallel and Distributed Systems, 1993
N. Venkatasubramanian, S. Ramanathan, "Load Management in Distributed Video Servers", ICDCS 1997
V. Cardellini, M. Colajanni, "Dynamic Load Balancing on Web-server Systems", IEEE Internet Computing, 1999
T. Schnekenburger, "Load Balancing in CORBA: A Survey, Response to the Aggregated Computing RFI".

Distributed Operating Systems:
W. J. Bolosky, R. P. Draves, R. P. Fitzgerald, C. W. Fraser, M. B. Jones, T. B. Knoblock and R. Rashid, "Operating System Directions for the Next Millennium", Proc. of the 6th Workshop on Hot Topics in Operating Systems, 1997
M. Rozier, V. Abrossimov, F. Armand et al., Overview of the Chorus Distributed Operating System
Andrew S. Tanenbaum, M. Frans Kaashoek, Robert van Renesse, Henri E. Bal, The Amoeba Distributed Operating System - A Status Report

Messaging Technologies:
A Case for Message Oriented Middleware, G. Banavar et al.
D. Dolev and D. Malkhi, "The Transis Approach to High Availability Cluster Communication". Other interesting reading: documentation and papers about Transis are also available at http://www.cs.huji.ac.il/labs/transis/
Y. Amir, et al, "Group Communication as an Infrastructure for Distributed System Management", Proc. of the 3rd Workshop on Services in Distributed and Networked Environments, 1996
Y. Amir, et al, "The Spread Wide Area Group Communication System"
R. V. Renesse, K. P. Birman, and S. Maffeis, "Horus: A Flexible Group Communication System", Communications of the ACM, 1996
S. Banerjee, B. Bhattacharjee and C. Kommareddy, "Scalable Application Layer Multicast", ACM SIGCOMM 2002
Y. Amir, C. Nita-Rotaru, J. Stanton, G. Tsudik, "Secure Spread: An Integrated Architecture for Secure Group Communication", IEEE Transactions on Dependable and Secure Computing, 2005

XML Based Middleware
The Many Faces of Publish/Subscribe, Patrick Th. Eugster

Middleware Frameworks:

Distributed Computing Frameworks:
DCE
The DCE security service, Hewlett-Packard Journal, 1995.
MapReduce: simplified data processing on large clusters
Hadoop: The Hadoop Distributed File System: Architecture and Design
Yahoo! Hadoop Tutorial

Object-based Middleware:

CORBA specification, www.omg.org
RT CORBA: Real-time CORBA
Fault-tolerant CORBA: A Fault Tolerance Framework for CORBA
ZEN: Optimizing the ORB Core to Enhance Real-time CORBA Predictability and Performance
Data Access and Integration: ODBC/JDBC
Java Jini: "Architectural Overview", Sun Microsystems
Java RMI: "Java RMI Tutorial"
EJB: "Enterprise JavaBeans Technology", Sun Developer Network
J2EE: "Overview", Sun Developer Network

Service Oriented Architectures and Web Services:

Web services: "Part of the lectures" by M. Fisher
.NET: "The .NET Framework"
SOAP Web Service: (http://www.w3.org/TR/soap/)
A comparison of SOAP and REST implementations of a service based interaction independence middleware framework
SOAP-binQ: high-performance SOAP with continuous quality management
SOA: Service-Oriented Computing: State of the Art and Research Challenges
RESTful Web Services: the original work was done by Roy Fielding at UCI as his Ph.D. thesis (http://roy.gbiv.com/vita.html)
Principled design of the modern Web architecture

Middleware queues for job submission, messaging, etc.

Cloud Computing, Mobile Cloud Computing Platforms:

I. Giurgiu, O. Riva, D. Juric, I. Krivulev, and G. Alonso, Calling the Cloud: Enabling mobile phones as interfaces to cloud applications, Middleware 2009.
B. Chun, S. Ihm, P. Maniatis, M. Naik, A. Patti, CloneCloud: Elastic Execution between Mobile Device and Cloud, In Proceedings of the 6th European Conference on Computer Systems (EuroSys 2011), April 2011.
Above the Clouds: A Berkeley View of Cloud Computing: Technical Report No. UCB/EECS-2009-28.
Y. Wen, W. Zhang, and H. Luo, "Energy Optimal Mobile Application Execution: Taming Resource-Poor Mobile Devices with Cloud Clones", In IEEE INFOCOM 2012.
Michael P. Papazoglou, "Cloud Blueprints for Integrating and Managing Cloud Federations", In Springer Software Service and Application Engineering, 2012.
Tobias Kurze, Markus Klems, David Bermbach, Alexander Lenk, Stefan Tai and Marcel Kunze, "Cloud Federation".
"Towards Characterizing Cloud Backend Workloads: Insights from Google Compute Clusters".
"CloudNaaS: A Cloud Networking Platform for Enterprise Applications".
"Effects of virtualization and cloud computing on data center networks".
"The Hadoop Distributed File System: Architecture and Design".
"MapReduce: Simplified Data Processing on Large Clusters".
"The Case for Enterprise-Ready Virtual Private Clouds".
What is (isn't) Google App Engine?, https://developers.google.com/appengine/training/intro/whatisgae
Introducing Azure, http://azure.microsoft.com/en-us/documentation/articles/fundamentals-introduction-to-azure/

Fault Tolerance and Reliability:

Consensus
M. J. Fischer, N. A. Lynch, and M. S. Paterson, "Impossibility of Distributed Consensus with One Faulty Process", Journal of ACM, 1985
D. Dolev, C. Dwork, L. Stockmeyer, "On the Minimal Synchronism Needed for Distributed Consensus", Journal of ACM, 1987.

Failure Detectors
T. D. Chandra and S. Toueg, "Unreliable Failure Detectors for Reliable Distributed Systems", Journal of ACM, 1996
T. D. Chandra, V. Hadzilacos and S. Toueg, "The Weakest Failure Detector for Solving Consensus", Journal of ACM, 1996
M. K. Aguilera, W. Chen, and S. Toueg, "Heartbeat: A Timeout-Free Failure Detector for Quiescent Reliable Communication", Cornell University, 1997

Replication
H. S. Sandhu and S. Zhou, "Cluster-based file replication in large-scale distributed systems", ACM SIGMETRICS, 1992
J. Gray, P. Helland, P. O'Neil and D. Shasha, "The dangers of replication and a solution", ACM SIGMOD, 1996

Logging
A. P. Sistla and J. L. Welch, "Efficient distributed recovery using message logging", ACM SIGOPS, 1989