CS237 Distributed Systems Middleware

Lecture Notes

Lecture 1: Middleware and Distributed Systems Fundamentals
Lecture 2: Virtual Time and Global States in Distributed Systems
Lecture 3: Distributed Operating Systems Concepts
Lecture 4: Distributed OS Case Studies (Amoeba)
Lecture 5: Messaging Middlewares, Messaging Group, Distributed Pub/Sub
Lecture 6: Fault-Tolerance, Middleware Frameworks: DCE
Midterm: Midterm Review, Sample
Lecture 7: Middleware Frameworks: CORBA
Lecture 8: Middleware Frameworks: Java-based Technologies, Jini, EJB
Lecture 9: Middleware Frameworks: XML, Web Services, Service Oriented Architectures
Lecture 10: Middleware for Cloud Computing
Middleware for QoS-Enabled Environments
Middleware for Embedded Environments
Middleware for Secure Environments
Middleware for Mobile and Ubiquitous Environments

Course Reading Materials

How to read a paper:

S. Keshav, "How to Read a Paper", David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada

Reference Books:

Coulouris et al., Distributed Systems: Concepts & Design, 4th ed. ISBN: 0-321-26354-5.
Tanenbaum & van Steen, Distributed Systems: Principles and Paradigms, 2nd ed. ISBN: 0-132-39227-5.
Ben-Ari, Principles of Concurrent and Distributed Programming, Prentice-Hall International Series in Computer Science, 1990.
Sape Mullender, Distributed Systems, Second Edition, Addison-Wesley, 1998.
Hagit Attiya and Jennifer Welch, Distributed Computing: Fundamentals, Simulations and Advanced Topics, McGraw Hill, 1998.
Robert Orfali and Dan Harkey, Client/Server Programming with Java and CORBA, Second Edition, John Wiley and Sons Inc., 1998.

Middleware and Distributed Systems Fundamentals (NO REVIEW REQUIRED):

Middleware, David E. Bakken, Encyclopedia of Distributed Computing, Kluwer Academic Publishers.
Managing Complexity: Middleware Explained, Andrew T. Campbell, Geoff Coulson, and Michael E. Kounavis, IT Professional, Vol. 1, No. 5, September/October 1999.
Middleware: a model for distributed system services, Philip A. Bernstein, Commun. ACM 39, 2 (Feb. 1996), pages 86-98.

Virtual Time and Global State in Distributed Systems (review required: any TWO papers):

Lamport, "Time, Clocks and the Ordering of Events in a Distributed System", Communications of the ACM, 1978
M. Chandy and L. Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems, 1985
Jefferson, "Distributed Simulation and the Time Warp Operating System", ACM Symposium on Operating Systems Principles, 1987.
Mattern, "Virtual Time and Global States of Distributed Systems", Proc. Workshop on Parallel and Distributed Algorithms, 1989
Fetzer and F. Cristian, "An optimal internal clock synchronization algorithm", COMPASS 1995
Cristian and C. Fetzer, "Fault-tolerant external clock synchronization", ICDCS 1995
Kshemkalyani, M. Raynal and M. Singhal, "An introduction to snapshot algorithms in distributed computing"
C. J. Fidge, "Timestamps in message-passing systems that preserve the partial ordering", Australian Computer Sci. Comm. 10 (1) (February 1988) 56-66.
C. J. Fidge, "Fundamentals of distributed system observation", IEEE Software 13 (6) (November 1996) 77-83.
M. Raynal and M. Singhal, "Logical time: Capturing causality in distributed systems", Computer, vol. 29, pp. 49-56, 1996.
C. J. Fidge, "A limitation of vector timestamps for reconstructing distributed computations", Information Processing Letters, Elsevier Science, 1998, pp. 87-91.
Mukesh Singhal and Ajay Kshemkalyani, "An efficient implementation of vector clocks", Elsevier Science Publishers.
Facebook's Cassandra uses synchronized clocks for its 'Last Write Wins' policy for conflict resolution
Spanner: Google’s Globally-Distributed Database estimates worst-case clock drift.
LinkedIn's Project Voldemort uses vector clocks for versioning, conflict resolution, and repairing replicas.

Distributed Operating Systems (review required: any TWO papers):

Remote Procedure Calls and Distributed Shared Memory:

Birrell and B. Nelson, "Implementing remote procedure calls", ACM Transactions on Computer Systems, 1984
G. Soares, "On remote procedure call", Proc. of the 1992 conference of the Centre for Advanced Studies on Collaborative research, 1992
L. Ananda, B. H. Tay and E. K. Koh, "A survey of asynchronous remote procedure calls", SIGOPS Operating Systems Review, 1992
A lecture on RPC: http://www.cs.cf.ac.uk/Dave/C/node33.html

Mutual Exclusion:

Ricart and A. Agrawala, "An optimal algorithm for mutual exclusion in computer networks", Communications of the ACM, 1981
Lamport, "Mutual Exclusion Problem: Part 1" and "Part 2", Journal of the ACM, 1986
Lamport, "A Fast Mutual Exclusion Algorithm", ACM Transactions on Computer Systems, 1987
Raymond, "A Tree Based Algorithm for Distributed Mutual Exclusion", ACM Transactions on Computer Systems, 1989

Leader Election:

Garcia-Molina, "Elections in a Distributed Computing System"

Distributed Deadlocks:

K. Elmagarmid, "A survey of distributed deadlock detection algorithms", ACM SIGMOD, 1986
Singhal, "Deadlock detection in distributed systems", IEEE Computer, 1989

Distributed File Systems:

Satyanarayanan, "A Survey of Distributed File Systems", Annual Review of Computer Science, 1989
Noble and M. Satyanarayanan, "An Empirical Study of a Highly Available File System", ACM Sigmetrics, 1994
Spasojevic and M. Satyanarayanan, "An Empirical Study of a Wide-Area Distributed File System", ACM Transactions on Computer Systems, 1996
Kubiatowicz, "OceanStore : An Architecture for Global-Scale Persistent Storage", ACM ASPLOS 2000
S. Ghemawat, H. Gobioff and S.-T. Leung, "The Google File System", ACM SOSP, 2003.

Process Migration:

M. Smith, "A survey of process migration mechanisms", ACM SIGOPS Operating Systems, 1988.
A. Barak, O. Laden and Y. Yarom, "The NOW MOSIX and its preemptive process migration scheme", 1995.

Processing and Load Balancing:

H. Willebeek-LeMair, A. P. Reeves, "Strategies for Dynamic Load Balancing on Highly Parallel Computers", IEEE Transactions on Parallel and Distributed Systems, 1993
Venkatasubramanian, S. Ramanathan, "Load Management in Distributed Video Servers", ICDCS 1997
Cardellini, M. Colajanni, "Dynamic Load Balancing on Web-server Systems", Journal IEEE Internet Computing, 1999
Schnekenburger, "Load Balancing in CORBA: A Survey, Response to the Aggregated Computing RFI".

Distributed Operating Systems:

J. Bolosky, R. P. Draves, R. P. Fitzgerald, C. W. Fraser, M. B. Jones, T. B. Knoblock and R. Rashid, "Operating System Directions for the Next Millennium", Proc. of the 6th Workshop on Hot Topics in Operating Systems, 1997
Rozier, V. Abrossimov, F. Armand et al., "Overview of the Chorus Distributed Operating System"
Andrew S. Tanenbaum, M. Frans Kaashoek, Robert van Renesse, Henri E. Bal, "The Amoeba Distributed Operating System - A Status Report"

Messaging Technologies (review required: any TWO papers):

A Case for Message Oriented Middleware, G. Banavar et al.
Dolev and D. Malkhi, "The Transis Approach to High Availability Cluster Communication". Other interesting reading: documentation and papers about Transis are also available at http://www.cs.huji.ac.il/labs/transis/
Amir, et al, "Group Communication as an Infrastructure for Distributed System Management", Proc. of the 3rd Workshop on Services in Distributed and Networked Environments, 1996
Amir, et al, "The Spread Wide Area Group Communication System"
V. Renesse, K. P. Birman, and S. Maffeis, "Horus: A Flexible Group Communication System", Communications of the ACM, 1996
Banerjee, B. Bhattacharjee and C. Kommareddy, "Scalable Application Layer Multicast", ACM SIGCOMM 2002
Amir, C. Nita-Rotaru, J. Stanton, G. Tsudik, "Secure Spread: An Integrated Architecture for Secure Group Communication", IEEE Transactions on Dependable and Secure Computing, 2005
The Many Faces of Publish/Subscribe, Patrick Th. Eugster et al.

Fault Tolerance and Reliability (review required: any TWO papers):

Consensus

J. Fischer, N. A. Lynch, and M. S. Paterson, "Impossibility of Distributed Consensus with One Faulty Process", Journal of ACM, 1985
Dolev, C. Dwork, L. Stockmeyer, "On the Minimal Synchronism Needed for Distributed Consensus", Journal of ACM, 1987.

Failure Detectors

D. Chandra and S. Toueg, "Unreliable Failure Detectors for Reliable Distributed Systems", Journal of ACM, 1996
D. Chandra, V. Hadzilacos and S. Toueg, "The Weakest Failure Detector for Solving Consensus", Journal of ACM, 1996
K. Aguilera, W. Chen, and S. Toueg, "Heartbeat: A Timeout-Free Failure Detector for Quiescent Reliable Communication", Cornell, 1997

Replication

S. Sandhu and S. Zhou, "Cluster-based file replication in large-scale distributed systems", ACM SIGMETRICS, 1992
Gray, P. Helland, P. O'Neil and D. Shasha, "The dangers of replication and a solution", ACM SIGMOD, 1996

Logging

P. Sistla and J. L. Welch, "Efficient distributed recovery using message logging", ACM SIGOPS, 1989

Middleware Frameworks (NO REVIEW REQUIRED):

Distributed Computing Frameworks:

Object-based Middleware:

CORBA specification, www.omg.org
RT CORBA: Real-time CORBA
Fault-tolerant CORBA: A Fault Tolerance Framework for CORBA
ZEN: Optimizing the ORB Core to Enhance Real-time CORBA Predictability and Performance
Data Access and Integration: ODBC/JDBC
Java Jini: "Architectural Overview", Sun Microsystems
Java RMI: "Java RMI Tutorial"
EJB: "Enterprise JavaBeans Technology", Sun Developer Network
J2EE: "Overview", Sun Developer Network

Service Oriented Architectures and Web Services:

Web services: "Part of the lectures" by M. Fisher
.NET: "The .NET Framework"
SOAP Web Service: (http://www.w3.org/TR/soap/)
A comparison of SOAP and REST implementations of a service based interaction independence middleware framework
SOAP-binQ: high-performance SOAP with continuous quality management
SOA: Service-Oriented Computing: State of the Art and Research Challenges
RESTful Web Services: the original work was done by Roy Fielding at UCI as his Ph.D. thesis (http://roy.gbiv.com/vita.html)
Principled design of the modern Web architecture
Middleware queues for job submission, messaging, etc.

Cloud Computing, Mobile Cloud Computing Platforms (NO REVIEW REQUIRED):

Giurgiu, O. Riva, D. Juric, I. Krivulev, and G. Alonso, "Calling the Cloud: Enabling mobile phones as interfaces to cloud applications", Middleware 2009.
Chun, S. Ihm, P. Maniatis, M. Naik, A. Patti, "CloneCloud: Elastic Execution between Mobile Device and Cloud", Proceedings of the 6th European Conference on Computer Systems (EuroSys 2011), April 2011.
Above the Clouds: A Berkeley View of Cloud Computing: Technical Report No. UCB/EECS-2009-28.
Wen, W. Zhang, and H. Luo, "Energy Optimal Mobile Application Execution: Taming Resource-Poor Mobile Devices with Cloud Clones", IEEE INFOCOM 2012.
Michael P. Papazoglou, "Cloud Blueprints for Integrating and Managing Cloud Federations", In Springer Software Service and Application Engineering, 2012.
Tobias Kurze, Markus Klems, David Bermbach, Alexander Lenk, Stefan Tai and Marcel Kunze, "Cloud Federation".
"Towards Characterizing Cloud Backend Workloads: Insights from Google Compute Clusters".
"CloudNaaS: A Cloud Networking Platform for Enterprise Applications".
"Effects of virtualization and cloud computing on data center networks".
"The Hadoop Distributed File System: Architecture and Design".
"MapReduce: Simplified Data Processing on Large Clusters".
"The Case for Enterprise-Ready Virtual Private Clouds".
What is (isn't) Google App Engine?, https://developers.google.com/appengine/training/intro/whatisgae
Introducing Azure, http://azure.microsoft.com/en-us/documentation/articles/fundamentals-introduction-to-azure/