Recommended references
o Distributed Systems: Principles and Paradigms (2nd Edition), by Andrew S. Tanenbaum and Maarten van Steen, Prentice Hall, 2006.
o Distributed Systems: Concepts and Design (5th Edition), by George Coulouris, Jean Dollimore, Tim Kindberg, and Gordon Blair, Addison Wesley, 2011.
o Online references will be listed in the course schedule.
o April 24: The final exam is posted (due May 11). Thanks to Amazon for funding access to Amazon Web Services.
o March 28: Solutions to the midterm exam are posted.
o March 24: The schedule for paper presentations is posted. Please find the panels that will lead the discussion of each presentation.
o Midterm exam on March 14.
o Solutions to the homework are posted here.
o Project proposal is due February 24.
o February 22: Homework 2 is announced (due March 7).
o February 8: Homework 1 is announced (due February 22).
o February 7: A list of suggested papers for presentations is posted.
o February 1: Handout - Term Project.
o January 18: Starting January 25, class will meet in F223 on Wednesdays, 4:30pm-7:20pm.
o January 18: Welcome to CSCE 6680-001. Here is the Course Syllabus.
In the past decade, we have witnessed the explosive growth of network-centric distributed applications, ranging from web-based information sharing and dissemination to high-performance distributed computing on clusters, grids, and the Internet, and recently to on-demand computing on the cloud. This course is designed to provide graduate students in computer science and engineering with deep insights into key enabling technologies of advanced distributed computing systems and applications.
The objectives of this course are
Students are expected to do significant reading and programming, as well as present assigned papers and projects in class.
Time and synchronization
Distributed mutual exclusion
Fault recovery and fault tolerance
Security in distributed systems
Autonomic management
Datacenter and cloud computing
Upon completion of this course, students are expected to be able to design and implement distributed approaches for various scientific, engineering, and cloud computing applications.
Tentative schedule
o (1/18) Course Administration
o (1/25) Overview of Distributed Systems and OS Principles
o (2/1) Time, Clocks, and Distributed Synchronization
• Time, clocks, and the ordering of events in a distributed system, Lamport, Communications of ACM 1978.
o (2/8) Global States and Distributed Snapshots
• Distributed snapshots: determining global states of distributed systems, Chandy and Lamport, ACM TOCS 1985.
• Impossibility of distributed consensus with one faulty process, Fischer, Lynch and Paterson, Journal of ACM 1985.
o (2/15) Consensus Protocols and Distributed Mutual Exclusion
• A sqrt(N) Algorithm for Mutual Exclusion in Decentralized Systems, Maekawa, ACM TOCS 1985.
• Cheating Husbands and Other Stories: A Case Study of Knowledge, Action, and Communication, Moses et al., ACM PODC 1985.
• A Practical Distributed Mutual Exclusion Protocol in Dynamic Peer-to-Peer Systems, Lin et al., IPTPS 2004.
o (2/22) Distributed Mutual Exclusion and Fault Tolerance
• An Optimal Algorithm for Mutual Exclusion in Computer Networks, Ricart and Agrawala, Communications of ACM 1981.
• Distributed Systems: Principles and Paradigms, Chapter 8.
o (2/29) Distributed Resource Management
• Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors, Kwok and Ahmad, ACM Computing Surveys 1999.
o (3/7) Cloud Computing and Cloud Programming
• Above the Clouds: A Berkeley View of Cloud Computing, Armbrust et al., UC-Berkeley 2009.
o (3/14) Midterm Exam
o (3/21) Spring Break (No Class)
o (3/28) Distributed Mutual Exclusion
• A Practical Distributed Mutual Exclusion Protocol in Dynamic Peer-to-Peer Systems, Lin et al., IPTPS 2004. (Sonal Tanpure)
• A sqrt(N) Algorithm for Mutual Exclusion in Decentralized Systems, Maekawa, ACM TOCS 1985. (Qiang Guan)
o (4/4) Distributed Storage Systems
• Design Implications for Enterprise Storage Systems via Multi-Dimensional Trace Analysis, Chen et al., ACM SOSP 2011. (Mahendra Talasila)
• Availability in Globally Distributed Storage Systems, Ford et al., USENIX OSDI 2010. (Mano Nandu Manem)
o (4/11) Performance and Resource Management
• WebProphet: Automating Performance Prediction for Web Services, Li et al., USENIX NSDI 2010. (Aly Esparza)
• Cells: A Virtual Mobile Smartphone Architecture, Andrus et al., ACM SOSP 2011. (Geng Zheng)
o (4/18) Data Centers and Utility Clouds
• Optimal Power Cost Management Using Stored Energy in Data Centers, Urgaonkar et al., ACM SIGMETRICS 2011. (Ziming Zhang)
• Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency, Calder et al., ACM SOSP 2011. (Devender Singh)
o (4/25)
• Concurrent Control with Readers and Writers, Courtois et al., CACM 1971. (Uttara Sawant)
• Developing Dependable and Energy-Efficient Cloud Computing Systems. (Dr. Fu's research)
o (5/2) Final Review
o (5/9) Project Presentations (Project reports are due May 11)
Final exam [due May 11]
Homework 2 [due Mar. 7] (Solutions)
Homework 1 [due Feb. 22] (Solutions: Sonal's answers)
Students are required to attend all lectures, understand and critique papers, give presentations, and participate in class discussion. In addition, students are expected to carry out a semester-long project.
The success of this class depends heavily on students' active participation in class discussion. To keep class discussion fruitful, students are expected to read the papers in advance. The reading list will be posted on the course homepage. At the beginning of each class, students are required to turn in a short critique of the paper to be discussed. You may skip critiques for up to 30% of the papers without penalty. No late critiques will be accepted.
Each student will be expected to give one presentation and lead class discussion on the topic of the presented papers. Guidelines for presentations will be handed out in class.
There will be one midterm exam covering the fundamental concepts presented in this course, and one take-home final.
The goal of the term project is to develop a deep understanding of the technologies learned in class, to gain hands-on experience applying them to real-life problems, and possibly to advance the state of the art in the field through your own research. You are expected to complete a term project in line with this objective. The project must be accompanied by a detailed project proposal, a midpoint progress report, and a comprehensive final report describing the problem, the implementation, the experiments, and the results together with their interpretation. Projects should be conducted individually or in a group of up to two members. Projects will focus on new applications of distributed computing, new mechanisms implemented in the context of an existing distributed system, performance evaluation of distributed computing mechanisms or systems, etc. [ Project handout ]
The final grade will be determined by the following factors:
Performance and Resource Management
Cloud Computing
Security and Virtualization
System Reliability