Cloud Computing - Theory and Practice

Dan C. Marinescu

Acquiring Editor: Steve Elliot
Editorial Project Manager: Lindsay Lawrence
Project Manager: Anitha Kittusamy Ramasamy
Cover Designer: Russell Purdy

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

Copyright © 2013 Elsevier Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.

ISBN: 978-0-12-404627-6
Printed in the United States of America

To Vera Rae and Luke Bell

Contents

Preface
Foreword

Chapter 1 Introduction
1.1 Network-Centric Computing and Network-Centric Content
1.2 Peer-to-Peer Systems
1.3 Cloud Computing: An Old Idea Whose Time Has Come
1.4 Cloud Computing Delivery Models and Services
1.5 Ethical Issues in Cloud Computing
1.6 Cloud Vulnerabilities
1.7 Major Challenges Faced by Cloud Computing
1.8 Further Reading
1.9 History Notes
1.10 Exercises and Problems

Chapter 2 Parallel and Distributed Systems
2.1 Parallel Computing
2.2 Parallel Computer Architecture
2.3 Distributed Systems
2.4 Global State of a Process Group
2.5 Communication Protocols and Process Coordination
2.6 Logical Clocks
2.7 Message Delivery Rules; Causal Delivery
2.8 Runs and Cuts; Causal History
2.9 Concurrency
2.10 Atomic Actions
2.11 Consensus Protocols
2.12 Modeling Concurrency with Petri Nets
2.13 Enforced Modularity: The Client-Server Paradigm
2.14 Further Reading
2.15 History Notes
2.16 Exercises and Problems

Chapter 3 Cloud Infrastructure
3.1 Cloud Computing at Amazon
3.2 Cloud Computing: The Google Perspective
3.3 Microsoft Windows Azure and Online Services
3.4 Open-Source Software Platforms for Private Clouds
3.5 Cloud Storage Diversity and Vendor Lock-in
3.6 Cloud Computing Interoperability: The Intercloud
3.7 Energy Use and Ecological Impact of Large-Scale Data Centers
3.8 Service- and Compliance-Level Agreements
3.9 Responsibility Sharing Between User and Cloud Service Provider
3.10 User Experience
3.11 Software Licensing
3.12 Further Reading
3.13 History Notes
3.14 Exercises and Problems

Chapter 4 Cloud Computing: Applications and Paradigms
4.1 Challenges for Cloud Computing
4.2 Existing Cloud Applications and New Application Opportunities
4.3 Architectural Styles for Cloud Applications
4.4 Workflows: Coordination of Multiple Activities
4.5 Coordination Based on a State Machine Model: The ZooKeeper
4.6 The MapReduce Programming Model
4.7 A Case Study: The GrepTheWeb Application
4.8 Clouds for Science and Engineering
4.9 High-Performance Computing on a Cloud
4.10 Cloud Computing for Biology Research
4.11 Social Computing, Digital Content, and Cloud Computing
4.12 Further Reading
4.13 Exercises and Problems

Chapter 5 Cloud Resource Virtualization
5.1 Virtualization
5.2 Layering and Virtualization
5.3 Virtual Machine Monitors
5.4 Virtual Machines
5.5 Performance and Security Isolation
5.6 Full Virtualization and Paravirtualization
5.7 Hardware Support for Virtualization
5.8 Case Study: Xen, a VMM Based on Paravirtualization
5.9 Optimization of Network Virtualization in Xen 2.0
5.10 vBlades: Paravirtualization Targeting an x86-64 Itanium Processor
5.11 A Performance Comparison of Virtual Machines
5.12 The Darker Side of Virtualization
5.13 Software Fault Isolation
5.14 Further Reading
5.15 History Notes
5.16 Exercises and Problems

Chapter 6 Cloud Resource Management and Scheduling
6.1 Policies and Mechanisms for Resource Management
6.2 Applications of Control Theory to Task Scheduling on a Cloud
6.3 Stability of a Two-Level Resource Allocation Architecture
6.4 Feedback Control Based on Dynamic Thresholds
6.5 Coordination of Specialized Autonomic Performance Managers
6.6 A Utility-Based Model for Cloud-Based Web Services
6.7 Resource Bundling: Combinatorial Auctions for Cloud Resources
6.8 Scheduling Algorithms for Computing Clouds
6.9 Fair Queuing
6.10 Start-Time Fair Queuing
6.11 Borrowed Virtual Time
6.12 Cloud Scheduling Subject to Deadlines
6.13 Scheduling MapReduce Applications Subject to Deadlines
6.14 Resource Management and Dynamic Application Scaling
6.15 Further Reading
6.16 Exercises and Problems

Chapter 7 Networking Support
7.1 Packet-Switched Networks
7.2 The Internet
7.3 Internet Migration to IPv6
7.4 The Transformation of the Internet
7.5 Web Access and the TCP Congestion Control Window
7.6 Network Resource Management
7.7 Interconnection Networks for Computer Clouds
7.8 Storage Area Networks
7.9 Content-Delivery Networks
7.10 Overlay Networks and Small-World Networks
7.11 Scale-Free Networks
7.12 Epidemic Algorithms
7.13 Further Reading
7.14 History Notes
7.15 Exercises and Problems

Chapter 8 Storage Systems
8.1 The Evolution of Storage Technology
8.2 Storage Models, File Systems, and Databases
8.3 Distributed File Systems: The Precursors
8.4 General Parallel File System
8.5 Google File System
8.6 Apache Hadoop
8.7 Locks and Chubby: A Locking Service
8.8 Transaction Processing and NoSQL Databases
8.9 BigTable
8.10 Megastore
8.11 History Notes
8.12 Further Reading
8.13 Exercises and Problems

Chapter 9 Cloud Security
9.1 Cloud Security Risks
9.2 Security: The Top Concern for Cloud Users
9.3 Privacy and Privacy Impact Assessment
9.4 Trust
9.5 Operating System Security
9.6 Virtual Machine Security
9.7 Security of Virtualization
9.8 Security Risks Posed by Shared Images
9.9 Security Risks Posed by a Management OS
9.10 Xoar: Breaking the Monolithic Design of the TCB
9.11 A Trusted Virtual Machine Monitor
9.12 Further Reading
9.13 Exercises and Problems

Chapter 10 Complex Systems and Self-Organization
10.1 Complex Systems
10.2 Abstraction and Physical Reality
10.3 Quantifying Complexity
10.4 Emergence and Self-Organization
10.5 Composability Bounds and Scalability
10.6 Modularity, Layering, and Hierarchy
10.7 More on the Complexity of Computing and Communication Systems
10.8 Systems of Systems: Challenges and Solutions
10.9 Further Reading
10.10 Exercises and Problems

Chapter 11 Cloud Application Development
11.1 Amazon Web Services: EC2 Instances
11.2 Connecting Clients to Cloud Instances Through Firewalls
11.3 Security Rules for Application and Transport Layer Protocols in EC2
11.4 How to Launch an EC2 Linux Instance and Connect to It
11.5 How to Use S3 in Java
11.6 How to Manage SQS Services in C#
11.7 How to Install the Simple Notification Service on Ubuntu 10.04
11.8 How to Create an EC2 Placement Group and Use MPI
11.9 How to Install Hadoop on Eclipse on a Windows System
11.10 Cloud-Based Simulation of a Distributed Trust Algorithm
11.11 A Trust Management Service
11.12 A Cloud Service for Adaptive Data Streaming
11.13 Cloud-Based Optimal FPGA Synthesis
11.14 Exercises and Problems

Literature
Glossary
Index

Preface

The idea that computing may be organized as a public utility, like water and electricity, was formulated in the 1960s by John McCarthy, a visionary computer scientist who championed mathematical logic in artificial intelligence. Four decades later, utility computing was embraced by major IT companies such as Amazon, Apple, Google, HP, IBM, Microsoft, and Oracle.

Cloud computing is a movement that started around the middle of the first decade of the new millennium. The movement is motivated by the idea that information processing can be done more efficiently on large farms of computing and storage systems accessible via the Internet. In this book we attempt to sift through the large volume of information and dissect the main ideas related to cloud computing.

Computer clouds support a paradigm shift from local to network-centric computing and network-centric content, in which computing and storage resources are provided by distant data centers. Scientific and engineering applications, data mining, computational finance, gaming and social networking, and many other computational and data-intensive activities can benefit from cloud computing.

Storing information "on the cloud" has significant advantages and has been embraced by cloud service providers. For example, in 2011 Apple announced iCloud, a network-centric alternative for content such as music, videos, movies, and personal information.
Content previously confined to personal devices such as workstations, laptops, tablets, and smartphones no longer needs to be stored locally; it can be shared by all these devices and is accessible whenever a device is connected to the Internet.

The appeal of cloud computing is that it offers scalable and elastic computing and storage services. The resources used for these services can be metered, and users can be charged only for the resources they use.

Cloud computing is a business reality today, as increasing numbers of organizations are adopting this paradigm. Cloud computing is cost-effective because of the multiplexing of resources. Application data is stored closer to the site where it is used, in a manner that is device- and location-independent; potentially, this data storage strategy increases reliability as well as security. Maintenance and security are ensured by service providers, who can operate more efficiently due to economies of scale.

Cloud computing is a technical and social reality today; at the same time, it is an emerging technology. At this time one can only speculate how the infrastructure for this new paradigm will evolve and what applications will migrate to it. The economic, social, ethical, and legal implications of this shift in technology, whereby users rely on services provided by large data centers and store private data and software on systems they do not control, are likely to be significant.

Cloud computing represents a dramatic shift in the design of systems capable of providing vast amounts of computing cycles and storage space. During the previous four decades, one-of-a-kind systems were built with the most advanced components available at the time, at a high cost; today's clouds use off-the-shelf, low-cost components. Gordon Bell argued in the early 1990s that one-of-a-kind systems are not only expensive to build, but the cost of rewriting applications for them is prohibitive [45].
Cloud computing reinforces the idea that computing and communication are deeply intertwined; advances in one field are critical for the other. Indeed, cloud computing could not emerge as a feasible alternative to the traditional paradigms for data-intensive applications before the Internet was able to support high-bandwidth, low-latency, reliable, low-cost communication; at the same time, modern networks could not function without powerful computing systems to manage them. High-performance switches are critical elements of both networks and computer clouds.

There are virtually no bounds on the composition of digital systems controlled by software, so we are tempted to build increasingly complex systems. The behavior and the properties of such systems are not always well understood; thus, we should not be surprised that computing clouds will occasionally exhibit unexpected behavior and system failures.

The architecture, the coordination algorithms, the design methodology, and the analysis techniques for large-scale complex systems like computing clouds will evolve in response to changes in technology, the environment, and the social impact of cloud computing. Some of these changes will reflect the changes in the Internet itself in terms of speed, reliability, security, capacity to accommodate a larger addressing space by migration to IPv6, and so on.

In December 2011, 32.7% of the world population, of slightly less than 7 billion, were Internet users. The 528% growth rate of Internet users during the period 2000–2011 is expected to be replicated, if not exceeded, in the next decade. Some of these new Internet users will discover the appeal of computing clouds and use cloud services explicitly, whereas a very large segment of the population will benefit from services supported by computing clouds without knowing the role the clouds play in their lives.
A recent posting on ZDNet reveals that EC2 was made up of 454,600 servers in January 2012; when one adds the number of servers supporting other AWS services, the total number of Amazon systems dedicated to cloud computing is much larger. An unofficial estimate puts the number of servers used by Google in January 2012 close to 1.8 million; this number was expected to be close to 2.4 million by early 2013.

The complexity of such systems is unquestionable and raises questions such as: How can we manage such systems? Do we have to consider radically new ideas, such as self-management and self-repair, for future clouds consisting of millions of servers? Should we migrate from a strictly deterministic view of such complex systems to a nondeterministic one? Answers to these questions provide a rich set of research topics for the computer science and engineering community.

The cloud movement is not without skeptics and critics. The critics argue that cloud computing is just a marketing ploy, that users may become dependent on proprietary systems, and that the failure of a large system such as the cloud could have significant consequences for a very large group of users who depend on the cloud for their computing and storage needs. Security and privacy are major concerns for cloud computing users. The skeptics question what a cloud actually is, what is new about it, how it differs from other types of large-scale distributed systems, and why cloud computing could be successful when grid computing had only limited success. The CEO of Oracle said, "I think the Internet was the last big change. The Internet is maturing. They don't call it the Internet anymore.
They call it cloud computing." In 2012, the Oracle Cloud was announced; the website of the company acknowledges: "Cloud computing represents a fantastic opportunity for technology companies to help customers simplify IT, that often-baffling and always-changing sector of the corporate world that's become increasingly valuable in today's global economy."

A very important question is whether, under pressure from the user community, the current standardization efforts spearheaded by the National Institute of Standards and Technology (NIST) will succeed. The alternative, the continuing dominance of proprietary cloud computing environments, is likely to have a negative impact on the field. The three cloud delivery models, Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), will continue to coexist for the foreseeable future. Services based on SaaS will probably be increasingly popular because they are more accessible to lay people, whereas services based on IaaS will be the domain of computer-savvy individuals. If the standardization effort succeeds, we may see PaaS designed to migrate from one infrastructure to another and overcome the concerns related to vendor lock-in.

This book attempts to provide a snapshot of the state of the art of a dynamic field likely to experience significant developments in the near future. The first chapter is an informal introduction to network-centric computing and network-centric content, to the entities involved in cloud computing, the paradigms and the services, and the ethical issues. Chapter 2 is a review of basic concepts in parallel and distributed computing; the chapter covers a range of subjects, from the global state of a process group to causal history, atomic actions, modeling concurrency with Petri nets, and consensus protocols. The next two chapters address questions of interest for the users of cloud computing.
The cloud infrastructure is the subject of Chapter 3; we discuss the cloud services provided by Amazon, Google, and Microsoft, then analyze the open-source platforms for private clouds, service-level and compliance-level agreements, and software licensing. Next we cover the energy use and the social impact of large-scale data centers and the user experience.

Chapter 4 discusses cloud applications; after a brief review of workflows, we analyze coordination using ZooKeeper and then the MapReduce programming model. The applications of clouds in science and engineering, biology research, and social computing are then discussed, followed by a presentation of benchmarks for high-performance computing on a cloud.

Chapters 5 through 9 cover the architecture, algorithms, communication, storage, and cloud security. Chapter 5 is dedicated to virtualization; we discuss virtual machines, virtual machine monitors, architectural support for virtualization, and performance and security isolation, and illustrate the concepts with an in-depth analysis of Xen and vBlades and with a performance comparison of virtual machines. Chapter 5 closes with a discussion of virtual machine security and software fault isolation.

Resource management and scheduling are the topics of Chapter 6. First, we present a utility model for cloud-based Web services, then we discuss the applications of control theory to scheduling, two-level resource allocation strategies, and coordination of multiple autonomic performance managers. We emphasize the concept of resource bundling and introduce combinatorial auctions for cloud resources. Next, we analyze fair queuing, start-time fair queuing, and borrowed virtual time scheduling algorithms, as well as cloud scheduling subject to deadlines.

Chapter 7 presents several aspects of networking pertinent to cloud computing.
After a brief discussion of the evolution of the Internet we review basic ideas regarding network resource management strategies, interconnects for warehouse-scale computers, and storage area networks. Then we overview content delivery networks and analyze in some depth overlay networks and their potential applications to cloud computing. Finally, we discuss epidemic algorithms. In Chapter 8 we discuss storage systems. First, we review the distributed file systems of the early 1980s: the Network File System developed by Sun Microsystems, the Andrew File System developed at Carnegie Mellon University as part of the Andrew project, and the Sprite Network File System developed at the University of California, Berkeley as a component of the Unix-like distributed operating system called Sprite. Then we present the General Parallel File System developed at IBM in the early 2000s. The in-depth discussions of the Google File System, Bigtable, and Megastore illustrate the new challenges posed to the design of datastores by network-centric computing and network-centric content and the shift from traditional relational database systems to databases capable of supporting online transaction-processing systems. Cloud security is covered in Chapter 9. After a general discussion of cloud security risks, privacy, and trust, the chapter analyzes the security of virtualization and the security risks posed by shared images and by the management operating system. The implementation of a hypervisor based on microkernel design principles and a trusted virtual machine monitor are then presented. Chapter 10 presents topics related to complex systems and self-organization. The chapter starts with an introduction to complex systems, followed by an analysis of the relationship between abstractions and the physical reality. A review of the possible means to quantify complexity is followed by a discussion of emergence and self-organization.
The discussion of the complexity of computing and communication systems starts with a presentation of composability bounds and scalability, followed by other means to cope with complexity, including modularity, layering, and hierarchy. Finally we discuss the challenges posed by systems of systems. The last chapter of the book, Chapter 11, is dedicated to practical aspects of application development. Here we are only concerned with applications developed for Amazon Web Services (AWS). The chapter starts with a discussion of security-related issues and the practical means by which clients connect to cloud instances through firewalls. The chapter provides recipes for using different AWS services; two AWS applications, one related to trust management in a cognitive network and the other to adaptive data streaming to and from a cloud, are discussed in detail. More than 385 references are cited in the text. Many references present recent research results in several areas related to cloud computing; others are classical references on major topics in parallel and distributed systems. A glossary covers terms grouped in several categories, from general to services, virtualization, desirable attributes, and security. The history notes at the end of many chapters present the milestones in a particular field; they serve as reminders of how recently important concepts, now considered classical in the field, have been developed. They also show the impact of technological developments that have challenged the community and motivated radical changes in our thinking. The contents of this book reflect a series of lectures given to graduate classes on cloud computing.
The applications discussed in Chapter 11 were developed by several students as follows: Tim Preston contributed to 11.3; Shameek Bhattacharjee to 11.4, 11.10, and 11.11; Charles Schneider to 11.5; Michael Riera to 11.6 and to 11.13; Kumiki Ogawa to 11.7; Wei Dai to 11.8; Gettha Priya Balasubramanian to 11.9; and Ashkan Paya to 11.2. The author is grateful to several students who contributed ideas, suggested ways to improve the manuscript, and helped identify and correct errors: David Adams, Ragu N. Aula, Surbhi Bhardwaj, Solmaz Gurkan, Brendan Lynch, Kyle Martin, Bart Miller, Ganesh Sundaresan, and Paul Szerlip. Special thanks to Ramya Pradhan and William Strickland for their insightful comments and suggestions. The author wants to express his appreciation for the constant guidance and help provided by Steve Elliot and Lindsay Lawrence from the publisher, Morgan Kaufmann. We also acknowledge Gabriela Marinescu’s effort during the final stages of manuscript preparation.

Supplemental Materials

Supplemental materials for instructors or students can be downloaded from Elsevier: http://store.

Foreword

This book is a timely, comprehensive introduction to cloud computing. The phrase cloud computing, which was almost never used a decade ago, is now part of the standard vocabulary. Millions of people around the world use cloud services, and the numbers are growing rapidly. Even education is being transformed in radical ways by cloud computing in the form of massive open online courses (MOOCs). This book is particularly valuable at this time because the phrase cloud computing covers so many different types of computing services, and the many people participating in conversations about clouds need to be aware of the space that it spans.
The introductory material in this book explains the key concepts of cloud computing and is accessible to almost everybody; such basic, but valuable, information should be required reading for the many people who use some form of cloud computing today. The book provides a signal service by describing the wide range of applications of cloud computing. Most people are aware of cloud services such as email and social networks, but many are not familiar with its applications in science and medicine. Teams of scientists, collaborating around the world, find that cloud computing is efficient. This book will help people dealing with a variety of applications evaluate the benefit of cloud computing for their specific needs. This book describes the wide range of cloud services available today and gives examples of services from multiple vendors. The examples are particularly helpful because they give readers an idea of how applications work on different platforms. The market for cloud computing is dynamic, and as time goes on new vendors and new platforms will become available. The examples provided in the book will help readers develop a framework for understanding and evaluating new platforms as they become available. Cloud computing is based on many decades of work on parallel and distributed computing systems. This book describes some of the central ideas in this work as it applies to cloud computing. Relatively few books integrate theory with applications and with practical examples from a variety of vendors; this book is an excellent source for the increasing numbers of students interested in the area. Server farms consume an increasing amount of the nation’s energy. Sustainability requires mechanisms for server farms to provide the same quality of cloud services while reducing the amount of energy required. This book discusses this important issue as well as other critical issues such as security and privacy.
Indeed, this is an excellent single source for the range of critical issues in cloud computing. The wide span of material covered, from the introductory to the advanced; the integration of theory and practice; the range of applications; and the number of examples the book includes make this an excellent book for a variety of readers.

K. Mani Chandy
Simon Ramo Professor and Professor of Computer Science, California Institute of Technology

CHAPTER 1 Introduction

The last decades have reinforced the idea that information processing can be done more efficiently centrally, on large farms of computing and storage systems accessible via the Internet. When computing resources in distant data centers are used rather than local computing systems, we talk about network-centric computing and network-centric content. Advancements in networking and other areas are responsible for the acceptance of the two new computing models and led to the grid computing movement in the early 1990s and, since 2005, to utility computing and cloud computing. In utility computing the hardware and software resources are concentrated in large data centers and users can pay as they consume computing, storage, and communication resources. Utility computing often requires a cloud-like infrastructure, but its focus is on the business model for providing the computing services. Cloud computing is a path to utility computing embraced by major IT companies such as Amazon, Apple, Google, HP, IBM, Microsoft, Oracle, and others. Cloud computing delivery models, deployment models, defining attributes, resources, and organization of the infrastructure discussed in this chapter are summarized in Figure 1.1. There are three cloud delivery models: Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS), deployed as public, private, community, and hybrid clouds.
The defining attributes of the new philosophy for delivering computing services are as follows:

• Cloud computing uses Internet technologies to offer elastic services. The term elastic computing refers to the ability to dynamically acquire computing resources and support a variable workload. A cloud service provider maintains a massive infrastructure to support elastic services.
• The resources used for these services can be metered and the users can be charged only for the resources they use.
• Maintenance and security are ensured by service providers.
• Economy of scale allows service providers to operate more efficiently due to specialization and centralization.
• Cloud computing is cost-effective due to resource multiplexing; lower costs for the service provider are passed on to the cloud users.
• The application data is stored closer to the site where it is used in a device- and location-independent manner; potentially, this data storage strategy increases reliability and security and, at the same time, it lowers communication costs.

Cloud computing is a technical and social reality and an emerging technology. At this time, one can only speculate how the infrastructure for this new paradigm will evolve and what applications will migrate to it. The economical, social, ethical, and legal implications of this shift in technology, in which users rely on services provided by large data centers and store private data and software on systems they do not control, are likely to be significant.

FIGURE 1.1 Cloud computing: Delivery models, deployment models, defining attributes, resources, and organization of the infrastructure.

Scientific and engineering applications, data mining, computational financing, gaming, and social networking as well as many other computational and data-intensive activities can benefit from cloud computing. A broad range of data, from the results of high-energy physics experiments to financial or enterprise management data to personal data such as photos, videos, and movies, can be stored on the cloud. In early 2011 Apple announced the iCloud, a network-centric alternative for storing content such as music, videos, movies, and personal information; this content was previously confined to personal devices such as workstations, laptops, tablets, or smartphones. The obvious advantage of network-centric content is the accessibility of information from any site where users can connect to the Internet. Clearly, information stored on a cloud can be shared easily, but this approach raises major concerns: Is the information safe and secure? Is it accessible when we need it? Do we still own it? In the next few years, the focus of cloud computing is expected to shift from building the infrastructure, today’s main front of competition among the vendors, to the application domain. This shift in focus is reflected by Google’s strategy to build a dedicated cloud for government organizations in the United States. The company states: “We recognize that government agencies have unique regulatory and compliance requirements for IT systems, and cloud computing is no exception.
So we’ve invested a lot of time in understanding government’s needs and how they relate to cloud computing.” In a discussion of technology trends, noted computer scientist Jim Gray emphasized that in 2003 the cost of communication in a wide area network had decreased dramatically and will continue to do so. Thus, it makes economical sense to store the data near the application [144], in other words, to store it in the cloud where the application runs. This insight leads us to believe that several new classes of cloud computing applications could emerge in the next few years [25]. As always, a good idea has generated a high level of excitement that translated into a flurry of publications, some of a scholarly depth, others with little merit or even bursting with misinformation. In this book we attempt to sift through the large volume of information and dissect the main ideas related to cloud computing. We first discuss applications of cloud computing and then analyze the infrastructure for the technology. Several decades of research in parallel and distributed computing have paved the way for cloud computing. Through the years we have discovered the challenges posed by the implementation, as well as the algorithmic level, and the ways to address some of them and avoid the others. Thus, it is important to look back at the lessons we learned from this experience through the years; for this reason we start our discussion with an overview of parallel computing and distributed systems.
1.1 Network-centric computing and network-centric content

The concepts and technologies for network-centric computing and content evolved through the years and led to several large-scale distributed system developments:

• The Web and the semantic Web are expected to support composition of services (not necessarily computational services) available on the Web.1
• The Grid, initiated in the early 1990s by National Laboratories and Universities, is used primarily for applications in the area of science and engineering.
• Computer clouds, promoted since 2005 as a form of service-oriented computing by large IT companies, are used for enterprise computing, high-performance computing, Web hosting, and storage for network-centric content.

The need to share data from high-energy physics experiments motivated Sir Tim Berners-Lee, who worked at the European Organization for Nuclear Research (CERN) in the late 1980s, to put together the two major components of the World Wide Web: HyperText Markup Language (HTML) for data description and HyperText Transfer Protocol (HTTP) for data transfer. The Web opened a new era in data sharing and ultimately led to the concept of network-centric content. The semantic Web2 is an effort to enable laypeople to more easily find, share, and combine information available on the Web. In this vision, the information can be readily interpreted by machines, so machines can perform more of the tedious work involved in finding, combining, and acting upon information on the Web. Several technologies are necessary to provide a formal description of concepts, terms, and relationships within a given knowledge domain; they include the Resource Description Framework (RDF), a variety of data interchange formats, and notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL).

1 The Web is dominated by unstructured or semistructured data, whereas the semantic Web advocates inclusion of semantic content in Web pages.
2 The term semantic Web was coined by Tim Berners-Lee to describe “a web of data that can be processed directly and indirectly by machines.” It is a framework for data sharing among applications based on the Resource Description Framework (RDF). The semantic Web is “largely unrealized,” according to Berners-Lee.

Gradually, the need to make computing more affordable and to liberate users from the concerns regarding system and software maintenance reinforced the idea of concentrating computing resources in data centers. Initially, these centers were specialized, each running a limited palette of software systems as well as applications developed by the users of these systems. In the early 1980s major research organizations such as the National Laboratories and large companies had powerful computing centers supporting large user populations scattered throughout wide geographic areas. Then the idea to link such centers in an infrastructure resembling the power grid was born; the model known as network-centric computing was taking shape. A computing grid is a distributed system consisting of a large number of loosely coupled, heterogeneous, and geographically dispersed systems in different administrative domains. The term computing grid is a metaphor for accessing computer power with similar ease as we access power provided by the electric grid. Software libraries known as middleware have been furiously developed since the early 1990s to facilitate access to grid services. The vision of the grid movement was to give a user the illusion of a very large virtual supercomputer. The autonomy of the individual systems and the fact that these systems were connected by wide-area networks with latency higher than the latency of the interconnection network of a supercomputer posed serious challenges to this vision.
Nevertheless, several “Grand Challenge” problems, such as protein folding, financial modeling, earthquake simulation, and climate and weather modeling, run successfully on specialized grids. The Enabling Grids for E-sciencE (EGEE) project is arguably the largest computing grid; along with the LHC Computing Grid (LCG), the EGEE project aims to support the experiments using the Large Hadron Collider (LHC) at CERN which generate several gigabytes of data per second, or 10 PB (petabytes) per year. In retrospect, two basic assumptions about the infrastructure prevented the grid movement from having the impact its supporters were hoping for. The first is the heterogeneity of the individual systems interconnected by the grid; the second is that systems in different administrative domains are expected to cooperate seamlessly. Indeed, the heterogeneity of the hardware and of system software poses significant challenges for application development and for application mobility. At the same time, critical areas of system management, including scheduling, optimization of resource allocation, load balancing, and fault tolerance, are extremely difficult in a heterogeneous system. The fact that resources are in different administrative domains further complicates many already difficult problems related to security and resource management. Although very popular in the science and engineering communities, the grid movement did not address the major concerns of the enterprise computing communities and did not make a noticeable impact on the IT industry. Cloud computing is a technology largely viewed as the next big step in the development and deployment of an increasing number of distributed applications. The companies promoting cloud computing seem to have learned the most important lessons from the grid movement. Computer clouds are typically homogeneous.
An entire cloud shares the same security, resource management, cost and other policies, and last but not least, it targets enterprise computing. These are some of the reasons that several agencies of the US Government, including Health and Human Services (HHS), the Centers for Disease Control (CDC), the National Aeronautics and Space Administration (NASA), the Navy’s Next Generation Enterprise Network (NGEN), and the Defense Information Systems Agency (DISA), have launched cloud computing initiatives and conduct actual system development intended to improve the efficiency and effectiveness of their information processing needs.

The term content refers to any type or volume of media, be it static or dynamic, monolithic or modular, live or stored, produced by aggregation, or mixed. Information is the result of functions applied to content. The creation and consumption of audio and visual content are likely to transform the Internet to support increased quality in terms of resolution, frame rate, color depth, and stereoscopic information, and it seems reasonable to assume that the Future Internet3 will be content-centric. The content should be treated as having meaningful semantic connotations rather than a string of bytes; the focus will be the information that can be extracted by content mining when users request named data and content providers publish data objects. Content-centric routing will allow users to fetch the desired data from the most suitable location in terms of network latency or download time. There are also some challenges, such as providing secure services for content manipulation, ensuring global rights management, control over unsuitable content, and reputation management.

Network-centric computing and network-centric content share a number of characteristics:

• Most applications are data-intensive. Computer simulation becomes a powerful tool for scientific research in virtually all areas of science, from physics, biology, and chemistry to archeology. Sophisticated tools for computer-aided design, such as Catia (Computer Aided Three-dimensional Interactive Application), are widely used in the aerospace and automotive industries. The widespread use of sensors contributes to increases in the volume of data. Multimedia applications are increasingly popular; the ever-larger media increase the load placed on storage, networking, and processing systems.
• Virtually all applications are network-intensive. Indeed, transferring large volumes of data requires high-bandwidth networks; parallel computing, computation steering,4 and data streaming are examples of applications that can only run efficiently on low-latency networks.
• The systems are accessed using thin clients running on systems with limited resources. In June 2011 Google released Google Chrome OS, designed to run on primitive devices and based on the browser with the same name.
• The infrastructure supports some form of workflow management. Indeed, complex computational tasks require coordination of several applications; composition of services is a basic tenet of Web 2.0.

The advantages of network-centric computing and network-centric content paradigms are, at the same time, sources for concern; we discuss some of them:

• Computing and communication resources (CPU cycles, storage, network bandwidth) are shared and resources can be aggregated to support data-intensive applications. Multiplexing leads to a higher resource utilization; indeed, when multiple applications share a system, their peak demands for resources are not synchronized and the average system utilization increases. On the other hand, the management of large pools of resources poses new challenges as complex systems are subject to phase transitions. New resource management strategies, such as self-organization, and decisions based on approximate knowledge of the state of the system must be considered. Ensuring quality-of-service (QoS) guarantees is extremely challenging in such environments because total performance isolation is elusive.
• Data sharing facilitates collaborative activities. Indeed, many applications in science, engineering, and industrial, financial, and governmental applications require multiple types of analysis of shared data sets and multiple decisions carried out by groups scattered around the globe. Open software development sites are another example of such collaborative activities. Data sharing poses not only security and privacy challenges but also requires mechanisms for access control by authorized users and for detailed logs of the history of data changes.
• Cost reduction. Concentration of resources creates the opportunity to pay as you go for computing and thus eliminates the initial investment and reduces significantly the maintenance and operation costs of the local computing infrastructure.
• User convenience and elasticity, that is, the ability to accommodate workloads with very large peak-to-average ratios.

3 The term Future Internet is a generic concept referring to all research and development activities involved in the development of new architectures and protocols for the Internet.
4 Computation steering in numerical simulation means to interactively guide a computational experiment toward a region of interest.

It is very hard to point out a single technological or architectural development that triggered the movement toward network-centric computing and network-centric content.
This movement is the result of a cumulative effect of developments in microprocessor, storage, and networking technologies coupled with architectural advancements in all these areas and, last but not least, with advances in software systems, tools, programming languages, and algorithms to support distributed and parallel computing. Through the years we have witnessed the breathtaking evolution of solid-state technologies which led to the development of multicore and many-core processors. Quad-core processors such as the AMD Phenom II X4, the Intel i3, i5, and i7 and hexa-core processors such as the AMD Phenom II X6 and Intel Core i7 Extreme Edition 980X are now used in the servers populating computer clouds. The proximity of multiple cores on the same die allows the cache coherency circuitry to operate at a much higher clock rate than would be possible if the signals were to travel off-chip. Storage technology has also evolved dramatically. For example, solid-state disks such as RamSan-440 allow systems to manage very high transaction volumes and larger numbers of concurrent users. RamSan-440 uses DDR2 (double-data-rate) RAM to deliver 600,000 sustained random input/output operations per second (IOPS) and over 4 GB/s of sustained random read or write bandwidth, with latency of less than 15 microseconds, and it is available in 256 GB and 512 GB configurations. The price of memory has dropped significantly; at the time of this writing the price of a 1 GB module for a PC is approaching $10. Optical storage technologies and Flash memories are widely used nowadays. The thinking in software engineering has also evolved and new models have emerged. The three-tier model is a software architecture and a software design pattern.
The presentation tier is the topmost level of the application; typically, it runs on a desktop PC or workstation, uses a standard graphical user interface (GUI) and displays information related to services such as browsing merchandise, purchasing products, and managing shopping cart contents. The presentation tier communicates with other tiers by sending the results to the browser/client tier and all other tiers in the network. The application/logic tier controls the functionality of an application and may consist of one or more separate modules running on a workstation or application server; it may be multitiered itself, in which case the architecture is called an n-tier architecture. The data tier controls the servers where the information is stored; it runs a relational database management system (RDBMS) on a database server or a mainframe and contains the computer data storage logic. The data tier keeps data independent from application servers or processing logic and improves scalability and performance. Any of the tiers can be replaced independently; for example, a change of operating system in the presentation tier would only affect the user interface code.

1.2 Peer-to-peer systems

The distributed systems discussed in Chapter 2 allow access to resources in a tightly controlled environment. System administrators enforce security rules and control the allocation of physical rather than virtual resources. In all models of network-centric computing prior to utility computing, a user maintains direct control of the software and the data residing on remote systems. This user-centric model, in place since the early 1960s, was challenged in the 1990s by the peer-to-peer (P2P) model. P2P systems can be regarded as one of the precursors of today’s clouds.
This new model for distributed computing promoted the idea of low-cost access to storage and central processing unit (CPU) cycles provided by participant systems; in this case, the resources are located in different administrative domains. Often the P2P systems are self-organizing and decentralized, whereas the servers in a cloud are in a single administrative domain and have a central management. P2P systems exploit the network infrastructure to provide access to distributed computing resources. Decentralized applications developed in the 1980s, such as Simple Mail Transfer Protocol (SMTP), a protocol for email distribution, and Network News Transfer Protocol (NNTP), an application protocol for dissemination of news articles, are early examples of P2P systems. Systems developed in the late 1990s, such as the music-sharing system Napster, gave participants access to storage distributed over the network, while the first volunteer-based scientific computing, SETI@home, used free cycles of participating systems to carry out compute-intensive tasks. The P2P model represents a significant departure from the client-server model, the cornerstone of distributed applications for several decades. P2P systems have several desirable properties [306]:

• They require a minimally dedicated infrastructure, since resources are contributed by the participating systems.
• They are highly decentralized.
• They are scalable; the individual nodes are not required to be aware of the global state.
• They are resilient to faults and attacks, since few of their elements are critical for the delivery of service and the abundance of resources can support a high degree of replication.
• Individual nodes do not require excessive network bandwidth the way servers used in the client-server model do.
• Last but not least, the systems are shielded from censorship due to the dynamic and often unstructured system architecture.
The undesirable properties of peer-to-peer systems are also notable: Decentralization raises the question of whether P2P systems can be managed effectively and provide the security required by various applications. The fact that they are shielded from censorship makes them a fertile ground for illegal activities, including distribution of copyrighted content. In spite of its problems, the new paradigm was embraced by applications other than file sharing. Since 1999 new P2P applications such as the ubiquitous Skype, a Voice-over-Internet Protocol (VoIP) telephony service,5 data-streaming applications such as Cool Streaming [386] and BBC’s online video service, content distribution networks such as CoDeeN [368], and volunteer computing applications based on the Berkeley Open Infrastructure for Network Computing (BOINC) platform [21] have proved their appeal to users. For example, Skype reported in 2008 that 276 million registered Skype users have used more than 100 billion minutes for voice and video calls. The BOINC site reports that at the end of June 2012 volunteer computing involved more than 275,000 individuals and more than 430,000 computers providing a monthly average of almost 6.3 petaFLOPS. It is also reported that peer-to-peer traffic accounts for a very large fraction of Internet traffic, with estimates ranging from 40% to more than 70%. Many groups from industry and academia rushed to develop and test new ideas, taking advantage of the fact that P2P applications do not require a dedicated infrastructure. Applications such as Chord [334] and Credence [366] address issues critical to the effective operation of decentralized systems.

5 Skype allows close to 700 million registered users from many countries around the globe to communicate using a proprietary VoIP protocol. The system developed in 2003 by Niklas Zennström and Janus Friis was acquired by Microsoft in 2011 and nowadays is a hybrid P2P and client-server system.
Chord is a distributed lookup protocol to identify the node where a particular data item is stored. The routing tables are distributed and, whereas other algorithms for locating an object require the nodes to be aware of most of the nodes of the network, Chord maps a key related to an object to a node of the network using routing information about only a few nodes.

Credence is an object reputation and ranking scheme for large-scale P2P file-sharing systems. Reputation is of paramount importance for systems that often include many unreliable and malicious nodes. In the decentralized algorithm used by Credence, each client uses local information to evaluate the reputation of other nodes and shares its own assessment with its neighbors. The credibility of a node depends only on the votes it casts; each node computes the reputation of another node based solely on the degree of matching with its own votes and relies on like-minded peers.

OverCite [337] is a P2P application to aggregate documents based on a three-tier design. The Web front-ends accept queries and display the results, while servers crawl the Web to generate indexes and to perform keyword searches; the back-ends store documents, metadata, and coordination state on the participating systems.

The rapid acceptance of the new paradigm triggered the development of a new communication protocol allowing hosts at the network periphery to cope with the limited network bandwidth available to them. BitTorrent is a peer-to-peer file-sharing protocol that enables a node to download/upload large files from/to several hosts simultaneously.

P2P systems differ in their architecture. Some do not have any centralized infrastructure, whereas others have a dedicated controller, but this controller is not involved in resource-intensive operations. For example, Skype has a central site to maintain user accounts; users sign in and pay for specific activities at this site.
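The Chord-style mapping of keys to nodes described above can be illustrated with a toy consistent-hashing sketch. This is a deliberately simplified, hypothetical illustration, not the Chord implementation: the node names and the identifier size M are made up, and real Chord additionally maintains per-node finger tables so that a lookup visits only O(log N) nodes rather than consulting a full sorted ring.

```python
import hashlib
from bisect import bisect_left

M = 16  # bits of the identifier space; real deployments use a far larger space

def ident(name: str) -> int:
    """Hash a node name or object key into the circular identifier space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

def locate(key: str, nodes: list[str]) -> str:
    """Return the node storing `key`: the first node clockwise from ident(key)."""
    ring = sorted((ident(n), n) for n in nodes)
    ids = [i for i, _ in ring]
    # wrap around the ring when the key falls past the largest node identifier
    return ring[bisect_left(ids, ident(key)) % len(ring)][1]

nodes = ["peer-a", "peer-b", "peer-c", "peer-d"]
owner = locate("some-song.mp3", nodes)
```

Because both nodes and keys live in the same circular space, the departure or arrival of a node moves only the keys between that node and its predecessor — the property that lets a structured overlay operate without any node knowing the global membership.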
The controller for a BOINC platform maintains membership and is involved in task distribution to participating systems. In systems without any centralized infrastructure, nodes with abundant resources often act as supernodes and maintain information useful to increasing system efficiency, such as indexes of the available content.

Regardless of the architecture, P2P systems are built around an overlay network, a virtual network superimposed over the real network. Methods to construct such an overlay, discussed in Section 7.10, consider a graph G = (V, E), where V is the set of N vertices and E is the set of links between them. Each node maintains a table of overlay links connecting it with other nodes of this virtual network, each node being identified by its IP address. Two types of overlay networks, unstructured and structured, are used by P2P systems. Systems joining an unstructured overlay usually use random walks starting from a few bootstrap nodes. Each node of a structured overlay has a unique key that determines its position in the structure; the keys are selected to guarantee a uniform distribution in a very large name space. Structured overlay networks use key-based routing (KBR); given a starting node v0 and a key k, the function KBR(v0, k) returns the path in the graph from v0 to the vertex with key k. Epidemic algorithms, discussed in Section 7.12, are often used by unstructured overlays to disseminate network topology.

1.3 Cloud computing: an old idea whose time has come

Once the technological elements were in place, it was only a matter of time until the economic advantages of cloud computing became apparent. Due to economies of scale, large data centers – centers with more than 50,000 systems – are more economical to operate than medium-sized centers that have around 1,000 systems.
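These economies of scale can be made concrete with the cost figures reported in [25] for medium-sized (around 1,000 servers) versus large (more than 50,000 servers) data centers. The script below is only a back-of-the-envelope check of the published ratios, not a cost model; the dollar amounts are taken from that study.

```python
# Approximate resource costs reported in [25]: medium-sized vs. large centers.
costs = {
    "network ($/Mbps/month)":         {"medium": 95.0,  "large": 13.0},
    "storage ($/GB/month)":           {"medium": 2.2,   "large": 0.4},
    "administration (servers/admin)": {"medium": 140.0, "large": 1000.0},
}

for item, c in costs.items():
    # for administration the advantage is servers per admin, so the ratio flips
    if "admin" in item:
        ratio = c["large"] / c["medium"]
    else:
        ratio = c["medium"] / c["large"]
    print(f"{item}: large centers are ~{ratio:.1f}x more cost-effective")
```

Running it shows large centers roughly 7.3 times cheaper for networking, 5.5 times cheaper for storage, and about 7.1 times more efficient in administrative staffing.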
Large data centers equipped with commodity computers experience a five- to seven-fold decrease in resource costs, including energy, compared to medium-sized centers [25]. The networking costs, in dollars per Mbit/s per month, are 95/13 ≈ 7.3 times larger, and the storage costs, in dollars per Gbyte per month, are 2.2/0.4 = 5.5 times larger for medium-sized centers. Medium-sized centers also have a larger administrative overhead – one system administrator for 140 systems, versus one for 1,000 systems in large centers.

Data centers are very large consumers of electric energy, needed to keep servers and the networking infrastructure running and for cooling. For example, there are 6,000 data centers in the United States, and in 2006 they reportedly consumed 61 billion kWh, 1.5% of all electric energy in the U.S., at a cost of $4.5 billion. The power demanded by data centers was predicted to double from 2006 to 2011. Peak instantaneous demand was predicted to increase from 7 GW in 2006 to 12 GW in 2011, requiring the construction of 10 new power plants. In the United States energy costs differ from state to state; for example, 1 kWh costs 3.6 cents in Idaho, 10 cents in California, and 18 cents in Hawaii. Thus, data centers should be placed at sites with low energy cost.

The term computer cloud is overloaded, since it covers infrastructures of different sizes, with different management and different user populations. Several types of cloud are envisioned:

• Private cloud. The infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on or off the premises of the organization.
• Community cloud. The infrastructure is shared by several organizations and supports a specific community that has shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be managed by the organizations or a third party and may exist on premises or off premises.
• Public cloud.
The infrastructure is made available to the general public or a large industry group and is owned by an organization selling cloud services.
• Hybrid cloud. The infrastructure is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

A private cloud could provide the computing resources needed by a large organization, such as a research institution, a university, or a corporation. The argument that a private cloud does not support utility computing is based on the observation that the organization has to invest in the infrastructure even though the users of the private cloud pay only as they consume resources [25]. Nevertheless, a private cloud could use the same hardware infrastructure as a public one; its security requirements will be different from those for a public cloud, and the software running on the cloud is likely to be restricted to a specific domain.

A natural question to ask is: Why could cloud computing be successful when other paradigms have failed? The reasons can be grouped into several general categories: technological advances, a realistic system model, user convenience, and financial advantages. A nonexhaustive list of reasons for the success of cloud computing includes these points:

• Cloud computing is in a better position to exploit recent advances in software, networking, storage, and processor technologies. Cloud computing is promoted by the large IT companies where these new technological developments take place, and these companies have a vested interest in promoting the new technologies.
• A cloud consists of a homogeneous set of hardware and software resources in a single administrative domain.
In this setup, security, resource management, fault tolerance, and quality of service are less challenging than in a heterogeneous environment with resources in multiple administrative domains.
• Cloud computing is focused on enterprise computing; its adoption by industrial organizations, financial institutions, healthcare organizations, and so on has a potentially huge impact on the economy.
• A cloud provides the illusion of infinite computing resources; its elasticity frees application designers from the confinement of a single system.
• A cloud eliminates the need for up-front financial commitment, and it is based on a pay-as-you-go approach. This has the potential to attract new applications and new users for existing applications, fomenting a new era of industrywide technological advancements.

In spite of the technological breakthroughs that have made cloud computing feasible, there are still major obstacles for this new technology; these obstacles provide opportunities for research. We list a few of the most obvious:

• Availability of service. What happens when the service provider cannot deliver? Can a large company such as General Motors move its IT to the cloud and have assurances that its activity will not be negatively affected by cloud overload? A partial answer to this question is provided by service-level agreements (SLAs).⁶ A temporary fix with negative economic implications is overprovisioning, that is, having enough resources to satisfy the largest projected demand.
• Vendor lock-in. Once a customer is hooked on one provider, it is hard to move to another. The standardization efforts at the National Institute of Standards and Technology (NIST) attempt to address this problem.
• Data confidentiality and auditability. This is indeed a serious problem; we analyze it in Chapter 9.
• Data transfer bottlenecks. Many applications are data-intensive.
A very important strategy is to store the data as close as possible to the site where it is needed. Transferring 1 TB of data over a 1 Mbps network takes 8 million seconds, roughly 93 days; it is faster and cheaper to use a courier service and send the data recorded on some medium than to send it over the network. Very high-speed networks will alleviate this problem in the future; for example, a 1 Gbps network would reduce this time to 8,000 s, or slightly more than 2 h.
• Performance unpredictability. This is one of the consequences of resource sharing. Strategies for performance isolation are discussed in Section 5.5.
• Elasticity, the ability to scale up and down quickly. New algorithms for controlling resource allocation and workload placement are necessary. Autonomic computing based on self-organization and self-management seems to be a promising avenue.

There are other perennial problems with no clear solutions at this time, including software licensing and system bugs.

⁶ SLAs are discussed in Section 3.8.

1.4 Cloud computing delivery models and services

According to the NIST reference model in Figure 1.2 [260], the entities involved in cloud computing are the service consumer, the entity that maintains a business relationship with and uses services from service providers; the service provider, the entity responsible for making a service available to service consumers; the carrier, the intermediary that provides connectivity and transport of cloud services between providers and consumers; the broker, an entity that manages the use, performance, and delivery of cloud services and negotiates relationships between providers and consumers; and the auditor, a party that can conduct independent assessments of cloud services, information system operations, performance, and security of the cloud implementation.
An audit is a systematic evaluation of a cloud system that measures how well it conforms to a set of established criteria. For example, a security audit evaluates cloud security, a privacy-impact audit evaluates cloud privacy assurance, and a performance audit evaluates cloud performance.

FIGURE 1.2 The entities involved in service-oriented computing and, in particular, in cloud computing, according to NIST. The carrier provides connectivity among service providers, service consumers, brokers, and auditors.

We start with the observation that it is difficult to distinguish the services associated with cloud computing from those that any computer operations center would include [332]. Many of the services discussed in this section could be provided by a cloud architecture, but note that they are available in noncloud architectures as well.

Figure 1.3 presents the structure of the three delivery models, SaaS, PaaS, and IaaS, according to the Cloud Security Alliance [98]. Software-as-a-Service (SaaS) gives the capability to use applications supplied by the service provider in a cloud infrastructure.
The applications are accessible from various client devices through a thin-client interface such as a Web browser (e.g., Web-based email). The user does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

FIGURE 1.3 The structure of the three delivery models, SaaS, PaaS, and IaaS. SaaS gives users the capability to use applications supplied by the service provider but allows no control of the platform or the infrastructure. PaaS gives the capability to deploy consumer-created or acquired applications using programming languages and tools supported by the provider. IaaS allows the user to deploy and run arbitrary software, which can include operating systems and applications.

Services offered include:

• Enterprise services such as workflow management, groupware and collaborative tools, supply chain, communications, digital signature, customer relationship management (CRM), desktop software, financial management, geospatial, and search [32].
• Web 2.0 applications such as metadata management, social networking, blogs, wiki services, and portal services.

SaaS is not suitable for applications that require real-time response or those for which data is not allowed to be hosted externally. The most likely candidates for SaaS are applications for which:

• Many competitors use the same product, such as email.
• Periodically there is a significant peak in demand, such as billing and payroll.
• There is a need for Web or mobile access, such as mobile sales management software.
• There is only a short-term need, such as collaborative software for a project.

Platform-as-a-Service (PaaS) gives the capability to deploy consumer-created or acquired applications using programming languages and tools supported by the provider. The user does not manage or control the underlying cloud infrastructure, including network, servers, operating systems, or storage. The user has control over the deployed applications and, possibly, over the application hosting environment configurations. Such services include session management, device integration, sandboxes, instrumentation and testing, content management, knowledge management, and Universal Description, Discovery, and Integration (UDDI), a platform-independent Extensible Markup Language (XML)-based registry providing a mechanism to register and locate Web service applications. PaaS is not particularly useful when the application must be portable, when proprietary programming languages are used, or when the underlying hardware and software must be customized to improve the performance of the application. The major PaaS application areas are in software development, where multiple developers and users collaborate and the deployment and testing services should be automated.

Infrastructure-as-a-Service (IaaS) is the capability to provision processing, storage, networks, and other fundamental computing resources; the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of some networking components, such as host firewalls.
Services offered by this delivery model include server hosting, Web servers, storage, computing hardware, operating systems, virtual instances, load balancing, Internet access, and bandwidth provisioning. The IaaS cloud computing delivery model has a number of characteristics: the resources are distributed and support dynamic scaling, it is based on a utility pricing model with variable cost, and the hardware is shared among multiple users. This model is particularly useful when demand is volatile, when a new business needs computing resources but does not want to invest in a computing infrastructure, or when an organization is expanding rapidly.

A number of activities are necessary to support the three delivery models; they include:

1. Service management and provisioning, including virtualization, service provisioning, call center, operations management, systems management, QoS management, billing and accounting, asset management, SLA management, technical support, and backups.
2. Security management, including ID and authentication, certification and accreditation, intrusion prevention, intrusion detection, virus protection, cryptography, physical security, incident response, access control, audit and trails, and firewalls.
3. Customer services such as customer assistance and online help, subscriptions, business intelligence, reporting, customer preferences, and personalization.
4. Integration services, including data management and development.

This list shows that a service-oriented architecture involves multiple subsystems and complex interactions among them. Individual subsystems can be layered; for example, in Figure 1.2 we see that the service layer sits on top of a resource abstraction layer, which controls the physical resource layer.

1.5 Ethical issues in cloud computing

Cloud computing is based on a paradigm shift with profound implications for computing ethics.
The main elements of this shift are: (i) control is relinquished to third-party services; (ii) data is stored on multiple sites administered by several organizations; and (iii) multiple services interoperate across the network. Unauthorized access, data corruption, infrastructure failure, and service unavailability are some of the risks of relinquishing control to third-party services; moreover, whenever a problem occurs, it is difficult to identify its source and the entity causing it. Systems can span the boundaries of multiple organizations and cross security borders, a process called deperimeterization. As a result of deperimeterization, “not only the border of the organization’s IT infrastructure blurs, also the border of the accountability becomes less clear” [350].

The complex structure of cloud services can make it difficult to determine who is responsible in case something undesirable happens. In a complex chain of events or systems, many entities contribute to an action with undesirable consequences. Although some of them had the opportunity to prevent these consequences, no single one of them can be held responsible – the so-called “problem of many hands.”

Ubiquitous and unlimited data sharing and storage among organizations test the self-determination of information: the right or ability of individuals to exercise personal control over the collection, use, and disclosure of their personal data by others; this tests the confidence and trust in today’s evolving information society. Identity fraud and theft are made possible by unauthorized access to personal data in circulation and by new forms of dissemination through social networks; they could also pose a danger to cloud computing. Cloud service providers have already collected petabytes of sensitive personal information stored in data centers around the world.
The acceptance of cloud computing will therefore be determined by the privacy issues addressed by these companies and by the countries where the data centers are located. Privacy is affected by cultural differences; some cultures favor privacy while others emphasize community, and this leads to an ambivalent attitude toward privacy on the Internet, which is a global system.

The question of what can be done proactively about the ethics of cloud computing does not have easy answers; many undesirable phenomena in cloud computing will only appear in time. However, the need for rules and regulations for the governance of cloud computing is obvious. The term governance means the manner in which something is governed or regulated, the method of management, or the system of regulations. Explicit attention to ethics must be paid by governmental organizations providing research funding for cloud computing; private companies are less constrained by ethics oversight, and their governance arrangements are more conducive to profit generation.

Accountability is a necessary ingredient of cloud computing; adequate information about how data is handled within the cloud and about the allocation of responsibility are key elements for enforcing ethics rules in cloud computing. Recorded evidence allows us to assign responsibility, but there can be tension between privacy and accountability, and it is important to establish what is being recorded and who has access to the records.

Unwanted dependency on a cloud service provider, the so-called vendor lock-in, is a serious concern, and the current standardization efforts at NIST attempt to address this problem. Another concern for users is a future with only a handful of companies that dominate the market and dictate prices and policies.

1.6 Cloud vulnerabilities

Clouds are affected by malicious attacks and failures of the infrastructure (e.g., power failures).
Such events can affect Internet domain name servers and prevent access to a cloud or can directly affect the clouds themselves. For example, an attack at Akamai on June 15, 2004 caused a domain name outage and a major blackout that affected Google, Yahoo!, and many other sites. In May 2009 Google was the target of a serious denial-of-service (DoS) attack that took down services such as Google News and Gmail for several days.

Lightning caused a prolonged downtime at Amazon on June 29 and 30, 2012; the AWS cloud in the Eastern region of the United States, which consists of 10 data centers across four availability zones, was initially troubled by utility power fluctuations, probably caused by an electrical storm. The June 29, 2012 storm on the East Coast took down some Virginia-based Amazon facilities and affected companies using systems exclusively in this region, among them the photo-sharing service Instagram. The recovery from the failure took a very long time and exposed a range of problems. For example, one of the 10 centers failed to switch to backup generators before exhausting the power that could be supplied by uninterruptible power supply (UPS) units. AWS uses “control planes” to allow users to switch to resources in a different region, and this software component also failed. The booting process was faulty and extended the time to restart the EC2 (Elastic Compute Cloud) and EBS (Elastic Block Store) services. Another critical problem was a bug in the elastic load balancer (ELB), which is used to route traffic to servers with available capacity. A similar bug affected the recovery process of the Relational Database Service (RDS). This event brought to light “hidden” problems that occur only under special circumstances.

A recent paper [126] identifies stability risks due to interacting services.
A cloud application provider, a cloud storage provider, and a network provider could implement different policies, and the unpredictable interactions between load balancing and other reactive mechanisms could lead to dynamic instabilities. The unintended coupling of independent controllers that manage the load, the power consumption, and the elements of the infrastructure could lead to undesirable feedback and instability similar to that experienced by policy-based routing in the Internet’s Border Gateway Protocol (BGP). For example, the load balancer of an application provider could interact with the power optimizer of the infrastructure provider. Some of these couplings may manifest only under extreme conditions and be very hard to detect under normal operating conditions, but they could have disastrous consequences when the system attempts to recover from a hard failure, as in the case of the AWS failure of 2012.

Clustering resources in data centers located in different geographical areas is one of the means used today to lower the probability of catastrophic failures. This geographic dispersion of resources could have additional positive side effects: it can reduce communication traffic and energy costs by dispatching computations to sites where electric energy is cheaper, and it can improve performance through an intelligent and efficient load-balancing strategy. Sometimes a user has the option to decide where to run an application; we shall see in Section 3.1 that an AWS user can choose the regions where the instances of his or her applications will run, as well as the regions of the storage sites. System objectives (e.g., maximize throughput, resource utilization, and financial benefits) have to be carefully balanced with user needs (e.g., low cost, low response time, and maximum availability). The price to pay for any system optimization is increased system complexity, as we shall see in Section 10.7.
For example, the latency of communication over a wide area network (WAN) is considerably larger than that over a local area network (LAN) and requires the development of new algorithms for global decision making.

1.7 Major challenges faced by cloud computing

Cloud computing inherits some of the challenges of parallel and distributed computing discussed in Chapter 2; at the same time, it faces major challenges of its own. The specific challenges differ for the three cloud delivery models, but in all cases the difficulties are created by the very nature of utility computing, which is based on resource sharing and resource virtualization and requires a different trust model than the ubiquitous user-centric model we have been accustomed to for a very long time.

The most significant challenge is security [19]; gaining the trust of a large user base is critical for the future of cloud computing. It is unrealistic to expect that a public cloud will provide a suitable environment for all applications. Highly sensitive applications related to the management of critical infrastructure, healthcare applications, and others will most likely be hosted by private clouds. Many real-time applications will probably still be confined to private clouds. Some applications may be best served by a hybrid cloud setup; such applications could keep sensitive data on a private cloud and use a public cloud for some of the processing.

The SaaS model faces similar challenges as other online services required to protect private information, such as financial or healthcare services. In this case a user interacts with cloud services through a well-defined interface; thus, in principle it is less challenging for the service provider to close some of the attack channels. Still, such services are vulnerable to DoS attacks, and users are fearful of malicious insiders.
Data in storage is most vulnerable to attack, so special attention should be devoted to the protection of storage servers. Data replication, necessary to ensure continuity of service in case of storage system failure, increases vulnerability. Data encryption may protect data in storage, but eventually data must be decrypted for processing, and then it is exposed to attack.

The IaaS model is by far the most challenging to defend against attacks. Indeed, an IaaS user has considerably more degrees of freedom than users of the other two cloud delivery models. An additional source of concern is that the considerable resources of a cloud could be used to initiate attacks against the network and the computing infrastructure. Virtualization is a critical design option for this model, but it exposes the system to new sources of attack. The trusted computing base (TCB) of a virtual environment includes not only the hardware and the hypervisor but also the management operating system. As we shall see in Section 9.7, the entire state of a virtual machine (VM) can be saved to a file to allow migration and recovery, both highly desirable operations; yet this possibility challenges the strategies to bring the servers belonging to an organization to a desirable and stable state. Indeed, an infected VM can be inactive when the systems are cleaned up; it can wake up later and infect other systems. This is another example of the deep intertwining of desirable and undesirable effects of basic cloud computing technologies.

The next major challenge is related to resource management on a cloud. Any systematic rather than ad hoc resource management strategy requires the existence of controllers tasked to implement several classes of policies: admission control, capacity allocation, load balancing, energy optimization, and, last but not least, QoS guarantees.
To implement these policies the controllers need accurate information about the global state of the system. Determining the state of a complex system with 10⁶ servers or more, distributed over a large geographic area, is not feasible. Indeed, the external load, as well as the state of individual resources, changes very rapidly. Thus, controllers must be able to function with incomplete or approximate knowledge of the system state. It seems reasonable to expect that such a complex system can only function based on self-management principles. But self-management and self-organization raise the bar for the implementation of the logging and auditing procedures critical to security and trust in a provider of cloud computing services. Under self-management it becomes next to impossible to identify the reasons that a certain action resulting in a security breach was taken.

The last major challenge we want to address is related to interoperability and standardization. Vendor lock-in, the fact that a user is tied to a particular cloud service provider, is a major concern for cloud users (see Section 3.5). Standardization would support interoperability and thus alleviate some of the fears that a service critical for a large organization may not be available for an extended period of time. But imposing standards at a time when a technology is still evolving is not only challenging, it can be counterproductive because it may stifle innovation.

From this brief discussion the reader should realize the complexity of the problems posed by cloud computing and understand the wide range of technical and social problems it raises. If successful, the effort to migrate the IT activities of many government agencies to public and private clouds will have a lasting effect on cloud computing. Cloud computing can have a major impact on education, but we have seen little effort in this area.
1.8 Further reading

A very good starting point for understanding the major issues in cloud computing is the 2009 paper "Above the clouds: a Berkeley view of cloud computing" [25]. A comprehensive survey of peer-to-peer systems was published in 2010 [306]. Content distribution systems are discussed in [368]. The BOINC platform is presented in [21]. Chord [334] and Credence [366] are important references in the area of peer-to-peer systems. Ethical issues in cloud computing are discussed in [350]. A recent book covers topics in the area of distributed systems, including grids, peer-to-peer systems, and clouds [173]. The standardization effort at NIST is described by a wealth of documents [259–267] on its Web site.

1.9 History notes

John McCarthy was a visionary in computer science; in the early 1960s he formulated the idea that computation may be organized as a public utility, like water and electricity. In 1992 Gordon Bell delivered an invited address at a conference on parallel computation with the provocative title Massively parallel computers: why not parallel computers for the masses? [45]; he argued that one-of-a-kind systems are not only expensive to build, but the cost of rewriting applications for them is also prohibitive.

Google Inc. was founded by Page and Brin, two graduate students in computer science at Stanford University; in 1998 the company was incorporated in California after receiving a contribution of $100,000 from Andy Bechtolsheim, the co-founder and chief hardware designer of Sun Microsystems. Amazon EC2 was initially released as a limited public beta cloud computing service on August 25, 2006. The system was developed by a team from Cape Town, South Africa. In October 2008 Microsoft announced the Windows Azure platform; in June 2010 the platform became commercially available.
iCloud, a cloud storage and cloud computing service from Apple Inc., stores content such as music, photos, calendars, and documents and allows users to access it from Apple devices. The system was announced on June 6, 2011. In 2012 the Oracle Cloud was announced (see corporate/features/oracle-cloud/index.html).

1.10 Exercises and problems

Problem 1. Mobile devices could benefit from cloud computing; explain the reasons you think that this statement is true, or provide arguments supporting the contrary. Discuss several cloud applications for mobile devices; then explain which one of the three cloud computing delivery models, SaaS, PaaS, or IaaS, would be used by each one of the applications and why.

Problem 2. Do you believe that the homogeneity of large-scale distributed systems is an advantage? Discuss the reasons for your answer. What aspects of hardware homogeneity are the most relevant in your view, and why? What aspects of software homogeneity do you believe are the most relevant, and why?

Problem 3. Peer-to-peer systems and clouds share a few goals but not the means to accomplish them. Compare the two classes of systems in terms of architecture, resource management, scope, and security.

Problem 4. Compare the three cloud computing delivery models, SaaS, PaaS, and IaaS, from the point of view of application developers and users. Discuss the security and the reliability of each model. Analyze the differences between PaaS and IaaS.

Problem 5. Overprovisioning is the reliance on extra capacity to satisfy the needs of a large community of users when the peak-to-average resource demand ratio is very high. Give an example of a large-scale system using overprovisioning and discuss whether overprovisioning is sustainable in that case and what its limitations are. Is cloud elasticity based on overprovisioning sustainable? Give arguments to support your answer.

Problem 6.
Discuss the possible solution for stabilizing cloud services mentioned in [126], inspired by BGP (Border Gateway Protocol) routing [145,359].

Problem 7. An organization debating whether to install a private cloud or to use a public cloud (e.g., AWS) for its computational and storage needs asks for your advice. What information will you require to formulate your recommendation, and how will you use each of the following items? (a) The description of the algorithms and the types of applications the organization will run; (b) the system software used by these applications; (c) the resources needed by each application; (d) the size of the user population; (e) the relative experience of the user population; and (f) the costs involved.

Problem 8. A university is debating the question in Problem 7. What will be your advice, and why? Should software licensing be an important element of the decision?

Problem 9. An IT company decides to provide free access to a public cloud dedicated to higher education. Which one of the three cloud computing delivery models, SaaS, PaaS, or IaaS, should it embrace, and why? What applications would be most beneficial for the students? Will this solution have an impact on distance learning? Why or why not?

CHAPTER 2 Parallel and Distributed Systems

Cloud computing is based on a large number of ideas and the experience accumulated since the first electronic computer was used to solve computationally challenging problems. In this chapter we overview parallel and distributed systems concepts that are important to understanding the basic challenges in the design and use of computer clouds.

Cloud computing is intimately tied to parallel and distributed computing. Cloud applications are based on the client-server paradigm, with relatively simple software, a thin client, running on the user's machine while the computations are carried out on the cloud.
Many cloud applications are data-intensive and use a number of instances that run concurrently. Transaction processing systems, such as Web-based services, represent a large class of applications hosted by computing clouds; such applications run multiple instances of the service and require reliable and in-order delivery of messages.

The concepts introduced in this section are very important in practice. Communication protocols support coordination of distributed processes and transport information through noisy and unreliable communication channels that may lose messages or deliver duplicate, distorted, or out-of-order messages. To ensure reliable and in-order delivery of messages, such protocols stamp each message with a sequence number; in turn, a receiver sends an acknowledgment with its own sequence number to confirm the receipt of a message. The clocks of a sender and a receiver may not be synchronized, so these sequence numbers act as logical clocks. Timeouts are used to request the retransmission of lost or delayed messages.

The concepts of consistent cuts and distributed snapshots are at the heart of checkpoint-restart procedures for long-lasting computations. Indeed, many cloud computations are data-intensive and run for extended periods of time on multiple computers in the cloud. Checkpoints are taken periodically in anticipation of the need to restart a software process when one or more systems fail; when a failure occurs, the computation is restarted from the last checkpoint rather than from the beginning.

Many functions of a computer cloud require information provided by monitors, system components that collect state information from the individual systems. For example, controllers for cloud resource management, discussed in Chapter 6, require accurate state information; security and reliability can only be implemented using information provided by specialized monitors.
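The sequence-number mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular protocol; the class and method names are ours. The receiver delivers messages in order, discards duplicates and out-of-order arrivals (which a real sender would retransmit after a timeout), and returns a cumulative acknowledgment.

```python
class Receiver:
    """Delivers messages in order, discards duplicates, acks each arrival."""

    def __init__(self):
        self.expected = 0      # sequence number of the next in-order message
        self.delivered = []    # the application-visible message stream

    def on_message(self, seq, payload):
        if seq == self.expected:        # in order: deliver and advance
            self.delivered.append(payload)
            self.expected += 1
        # duplicates and out-of-order messages are dropped; the sender
        # retransmits them when its timeout expires
        return ("ACK", self.expected - 1)   # cumulative acknowledgment

r = Receiver()
# the channel delivers a duplicate of seq 0 and message 2 before message 1
for seq, payload in [(0, "a"), (0, "a"), (2, "c"), (1, "b"), (2, "c")]:
    r.on_message(seq, payload)
print(r.delivered)   # → ['a', 'b', 'c']
```

Even though the channel duplicated and reordered messages, the application sees exactly one copy of each message, in order.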
Coordination of multiple instances is a critical function of an application controller.

2.1 Parallel computing

As demonstrated by nature, the ability to work in parallel as a group represents a very efficient way to reach a common target; human beings have learned to aggregate themselves and to assemble man-made devices into organizations in which each entity may have modest ability, but a network of entities can organize itself to accomplish goals that an individual entity cannot. Thus, we should not be surprised that the thought that individual systems should work in concert to solve complex applications was formulated early in the computer age.

Parallel computing allows us to solve large problems by splitting them into smaller ones and solving them concurrently. Parallel computing was considered for many years the "holy grail" for solving data-intensive problems encountered in many areas of science, engineering, and enterprise computing; it required major advances in several areas, including algorithms, programming languages and environments, performance monitoring, computer architecture, interconnection networks, and, last but not least, solid-state technologies. Parallel hardware and software systems allow us to solve problems demanding more resources than those provided by a single system and, at the same time, to reduce the time required to obtain a solution.

The speed-up measures the effectiveness of parallelization; in the general case the speed-up of the parallel computation is defined as

    S(N) = T(1) / T(N),    (2.1)

with T(1) the execution time of the sequential computation and T(N) the execution time when N parallel computations are carried out. Amdahl's Law¹ gives the potential speed-up of a parallel computation; it states that the portion of the computation that cannot be parallelized determines the overall speed-up.
If α is the fraction of running time a sequential program spends on nonparallelizable segments of the computation, then

    S = 1 / α.    (2.2)

To prove this result, call σ the sequential time and π the parallel time, and start from the definitions of T(1), T(N), and α:

    T(1) = σ + π,   T(N) = σ + π/N,   and   α = σ / (σ + π).    (2.3)

Then

    S = T(1) / T(N) = (σ + π) / (σ + π/N) = (1 + π/σ) / (1 + (π/σ) × (1/N)).    (2.4)

But

    π/σ = (1 − α) / α.    (2.5)

Thus, for large N,

    S = [1 + (1 − α)/α] / [1 + (1 − α)/(Nα)] = 1 / [α + (1 − α)/N] ≈ 1/α.    (2.6)

Amdahl's Law applies to a fixed problem size; in this case the amount of work assigned to each one of the parallel processes decreases when the number of processes increases, and this affects the efficiency of the parallel execution.

¹ Gene Amdahl is a theoretical physicist turned computer architect who contributed significantly to the development of several IBM systems, including System/360, and then started his own company, Amdahl Corporation. His company produced high-performance systems in the 1970s. Amdahl is best known for Amdahl's Law, formulated in 1967.

When the problem size is allowed to change, Gustafson's Law gives the scaled speed-up with N parallel processes as

    S(N) = N − α(N − 1).    (2.7)

As before, we call σ the sequential time; now π is the fixed parallel time per process, and α is given by Equation 2.3. The sequential execution time, T(1), and the parallel execution time with N parallel processes, T(N), are

    T(1) = σ + Nπ   and   T(N) = σ + π.    (2.8)

Then the scaled speed-up is

    S(N) = T(1)/T(N) = (σ + Nπ)/(σ + π) = σ/(σ + π) + Nπ/(σ + π) = α + N(1 − α) = N − α(N − 1).    (2.9)

Amdahl's Law expressed by Equation 2.2 and the scaled speed-up given by Equation 2.7 assume that all processes are assigned the same amount of work. The scaled speed-up assumes that the amount of work assigned to each process is the same, regardless of the problem size.
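The contrast between the two laws is easy to see numerically. The helper names `amdahl` and `gustafson` below are ours; they implement Equations 2.6 and 2.7 for a sequential fraction α.

```python
def amdahl(alpha, n):
    """Fixed-size speed-up, S = 1 / (alpha + (1 - alpha)/N)  (Eq. 2.6)."""
    return 1.0 / (alpha + (1.0 - alpha) / n)

def gustafson(alpha, n):
    """Scaled speed-up, S(N) = N - alpha*(N - 1)  (Eq. 2.7)."""
    return n - alpha * (n - 1)

alpha = 0.1   # 10% of the running time is inherently sequential
for n in (10, 100, 1000):
    print(n, round(amdahl(alpha, n), 2), round(gustafson(alpha, n), 2))
# The fixed-size speed-up saturates near 1/alpha = 10 no matter how many
# processes are used, while the scaled speed-up keeps growing almost
# linearly with N.
```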
Then, to maintain the same execution time, the number of parallel processes must increase with the problem size. The scaled speed-up captures the essence of efficiency, namely that the limitations of the sequential part of a code can be balanced by increasing the problem size.

Coordination of concurrent computations can be quite challenging and involves overhead, which ultimately reduces the speed-up of parallel computations. Often the parallel computation involves multiple stages, and all concurrent activities must finish one stage before starting the execution of the next one; this barrier synchronization further reduces the speed-up.

The subtasks of a parallel program are called processes, whereas threads are lightweight subtasks. Concurrent execution can be very challenging (e.g., it could lead to race conditions, an undesirable effect in which the results of concurrent execution depend on the sequence of events). Often, shared resources must be protected by locks to ensure serial access. Another potential problem for concurrent execution of multiple processes or threads is the presence of deadlocks; a deadlock occurs when processes or threads competing with one another for resources are forced to wait for additional resources held by other processes or threads and none of the processes or threads can finish. The four Coffman conditions must hold simultaneously for a deadlock to occur:

1. Mutual exclusion. At least one resource must be nonsharable, and only one process/thread may use the resource at any given time.
2. Hold and wait. At least one process/thread must hold one or more resources and wait for others.
3. No preemption. The scheduler or a monitor should not be able to force a process/thread holding a resource to relinquish it.
4. Circular wait. Given the set of n processes/threads {P_1, P_2, P_3, ..., P_n}, P_1 waits for a resource held by P_2, P_2 waits for a resource held by P_3, and so on, and P_n waits for a resource held by P_1.
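A deadlock can be avoided by breaking any one of the four conditions. A common tactic, sketched below, breaks circular wait by imposing a global acquisition order on locks: both threads want the same two locks in opposite argument order, but each acquires them in the same canonical order, so no cycle of waiting threads can form. (The id-based ordering is just one illustrative convention.)

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()
results = []

def transfer(first, second, work):
    # Break circular wait: always acquire locks in a fixed global order,
    # regardless of the order the caller named them in.
    lo, hi = sorted((first, second), key=id)
    with lo, hi:
        work()

t1 = threading.Thread(target=transfer,
                      args=(lock_a, lock_b, lambda: results.append("t1")))
t2 = threading.Thread(target=transfer,
                      args=(lock_b, lock_a, lambda: results.append("t2")))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(results))   # → ['t1', 't2']  (both threads finish; no deadlock)
```

Without the sorting step, t1 could hold `lock_a` while waiting for `lock_b` and t2 could hold `lock_b` while waiting for `lock_a`, satisfying all four Coffman conditions at once.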
There are other potential problems related to concurrency. When two or more processes or threads continually change their state in response to changes in the other processes, we have a livelock condition; the result is that none of the processes can complete its execution. Very often, processes/threads running concurrently are assigned priorities and scheduled based on these priorities. Priority inversion occurs when a higher-priority process or task is indirectly preempted by a lower-priority one.

Concurrent processes/tasks can communicate using messages or shared memory. Multicore processors sometimes use shared memory, but shared memory is seldom used in modern supercomputers because shared-memory systems are not scalable. Message passing is the communication method used exclusively in large-scale distributed systems, and our discussion is restricted to this communication paradigm.

Shared memory is extensively used by the system software; the stack is an example of shared memory used to save the state of a process or thread. The kernel of an operating system uses control structures such as processor and core tables for multiprocessor and multicore system management, process and thread tables for process/thread management, page tables for virtual memory management, and so on. Multiple application threads running on a multicore processor often communicate via the shared memory of the system. Debugging a message-passing application is considerably easier than debugging a shared-memory application.

We distinguish fine-grain from coarse-grain parallelism; in the former case relatively small blocks of code can be executed in parallel without the need to communicate or synchronize with other threads or processes, while in the latter case large blocks of code can be executed in parallel.
The speed-up of applications displaying fine-grain parallelism is considerably lower than that of coarse-grained applications; indeed, the processor speed is orders of magnitude higher than the communication speed, even on systems with a fast interconnect.

In many cases, discovering parallelism is quite challenging, and the development of parallel algorithms requires a considerable effort. For example, many numerical analysis problems, such as solving large systems of linear equations or solving systems of partial differential equations (PDEs), require algorithms based on domain decomposition methods.

Data parallelism is based on partitioning the data into several blocks and running multiple copies of the same program concurrently, each copy running on a different data block – thus the name of the paradigm, Same Program Multiple Data (SPMD).

Decomposition of a large problem into a set of smaller problems that can be solved concurrently is sometimes trivial. For example, assume that we want to manipulate the display of a three-dimensional object represented as a 3D lattice of n × n × n points; to rotate the image we would apply the same transformation to each one of the n³ points. Such a transformation can be done by a geometric engine, a hardware component that can carry out the transformation of a subset of the n³ points concurrently.

Suppose that we want to search for the occurrence of an object in a set of n images, or of a string of characters in n records; such a search can be conducted in parallel. In all these instances the time required to carry out the computational task using N processing elements is reduced by a factor of N.

A very appealing class of applications of cloud computing consists of numerical simulations of complex systems that require an optimal design; in such instances multiple design alternatives must be compared and optimal ones selected based on several optimization criteria.
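The pattern just described, evaluating N design alternatives concurrently and keeping the best one, can be sketched in a few lines. Everything specific here is invented for illustration: `evaluate` stands in for hours of simulation, and the wire-length/fan-out cost weights are a fictitious power model.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(config):
    """Stand-in for a long simulation: score one design alternative."""
    wire_length, fanout = config
    return wire_length * 0.7 + fanout * 1.3   # fictitious power estimate

# four hypothetical (wire_length, fanout) design alternatives
configs = [(12, 4), (9, 7), (15, 2), (10, 5)]

# evaluate all alternatives concurrently and keep the minimum-power one
with ThreadPoolExecutor() as pool:
    scores = list(pool.map(evaluate, configs))
best = configs[scores.index(min(scores))]
print(best)   # → (15, 2)
```

With N workers and independent alternatives, the wall-clock time of the exploration is reduced by roughly a factor of N, which is exactly why such design-space searches are attractive cloud workloads.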
Consider, for example, the design of a circuit using field-programmable gate arrays (FPGAs). An FPGA is an integrated circuit designed to be configured by the customer using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC). Because multiple choices exist for the placement of components and for interconnecting them, the designer could run concurrently N versions of the design choices and choose the one with the best performance, such as minimum power consumption. Alternative optimization objectives could be to reduce cross-talk among the wires or to minimize the overall noise. Each alternative configuration requires hours or maybe days of computing; hence, running them concurrently reduces the design time considerably.

The list of companies that aimed to support parallel computing and ended up as casualties of this effort is long and includes names such as Ardent, Convex, Encore, Floating Point Systems, Inmos, Kendall Square Research, MasPar, nCube, Sequent, Tandem, and Thinking Machines. The difficulties of developing new programming models and the effort to design programming environments for parallel applications added to the challenges faced by all these companies.

From the very beginning it was clear that parallel computing requires specialized hardware and system software. It was also clear that the interconnection fabric is critical for the performance of parallel computing systems. We now take a closer look at parallelism at different levels and the means to exploit it.

2.2 Parallel computer architecture

Our discussion of parallel computer architectures starts with the recognition that parallelism can be exploited at different levels. These levels are:

• Bit-level parallelism. The number of bits processed per clock cycle, often called the word size, has increased gradually from 4-bit processors to 8-bit, 16-bit, 32-bit, and, since 2004, 64-bit.
This has reduced the number of instructions required to process larger operands and allowed a significant performance improvement. During this evolutionary process the number of address bits has also increased, allowing instructions to reference a larger address space.

• Instruction-level parallelism. Today's computers use multi-stage processing pipelines to speed up execution. Once an n-stage pipeline is full, an instruction is completed at every clock cycle. A "classic" pipeline of a Reduced Instruction Set Computing (RISC) architecture consists of five stages²: instruction fetch, instruction decode, instruction execution, memory access, and write back. A Complex Instruction Set Computing (CISC) architecture could have a much larger number of pipeline stages; for example, an Intel Pentium 4 processor has a 35-stage pipeline.

• Data parallelism or loop parallelism. The program loops can be processed in parallel.

• Task parallelism. The problem can be decomposed into tasks that can be carried out concurrently. A widely used type of task parallelism is the Same Program Multiple Data (SPMD) paradigm. As the name suggests, individual processors run the same program but on different segments of the input data. Data dependencies cause different flows of control in individual tasks.

In 1966 Michael Flynn proposed a classification of computer architectures based on the number of concurrent control/instruction and data streams: Single Instruction, Single Data (SISD); Single Instruction, Multiple Data (SIMD); and Multiple Instruction, Multiple Data (MIMD).³

The SIMD architecture supports vector processing. When an SIMD instruction is issued, the operations on individual vector components are carried out concurrently. For example, to add two vectors

² The number of pipeline stages in different RISC processors varies. For example, the ARM7 and earlier implementations of ARM processors have a three-stage pipeline: fetch, decode, and execute.
Higher-performance designs, such as the ARM9, have deeper pipelines; the Cortex-A8 has 13 stages.

³ Another category, Multiple Instructions, Single Data (MISD), is a fourth possible architecture, but it is very rarely used, mostly for fault tolerance.

(a_1, a_2, ..., a_50) and (b_1, b_2, ..., b_50), all 50 pairs of vector elements are added concurrently, and all the sums (a_i + b_i), 1 ≤ i ≤ 50, are available at the same time.

The first use of SIMD instructions was in vector supercomputers such as the CDC Star-100 and the Texas Instruments ASC in the early 1970s. Vector processing was especially popularized by Cray in the 1970s and 1980s, by attached vector processors such as those produced by Floating Point Systems (FPS), and by supercomputers such as the Thinking Machines CM-1 and CM-2. Sun Microsystems introduced SIMD integer instructions in its VIS instruction set extensions in 1995, in its UltraSPARC I microprocessor; the first widely deployed SIMD extension for gaming was Intel's MMX extension to the x86 architecture. IBM and Motorola then added AltiVec to the POWER architecture, and there have been several extensions to the SIMD instruction sets for both architectures.

The desire to support real-time graphics with vectors of two, three, or four dimensions led to the development of graphics processing units (GPUs). GPUs are very efficient at manipulating computer graphics, and their highly parallel structures based on SIMD execution support parallel processing of large blocks of data. GPUs produced by Intel, Nvidia, and AMD/ATI are used in embedded systems, mobile phones, personal computers, workstations, and game consoles.

An MIMD architecture refers to a system with several processors that function asynchronously and independently; at any time, different processors may be executing different instructions on different data.
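The 50-element vector addition described above can be reproduced from Python with NumPy, whose elementwise array operations are typically carried out by vectorized (SIMD) machine instructions on modern CPUs. This is only a high-level illustration of the programming model, not of any particular instruction set.

```python
import numpy as np

a = np.arange(1, 51)     # (a_1, ..., a_50) = 1, 2, ..., 50
b = np.arange(51, 101)   # (b_1, ..., b_50) = 51, 52, ..., 100

# One vector operation expresses all 50 additions; no explicit loop.
c = a + b
print(c[0], c[-1])   # → 52 150
```

Contrast this with a scalar (SISD-style) loop that would issue one add at a time; the vector form states the data parallelism explicitly, which is what lets the hardware exploit it.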
The processors of an MIMD system may share a common memory, and we distinguish several types of shared-memory systems: Uniform Memory Access (UMA), Cache Only Memory Access (COMA), and Non-Uniform Memory Access (NUMA). An MIMD system could also have a distributed memory; in this case the processors and the memory modules communicate with one another using an interconnection network, such as a hypercube, a 2D torus, a 3D torus, an omega network, or another network topology. Today most supercomputers are MIMD machines, and some use GPUs instead of traditional processors. Multicore processors with multiple processing units are now ubiquitous.

Modern supercomputers derive their power from architecture and parallelism rather than from increases in processor speed. The supercomputers of today consist of a very large number of processors and cores communicating via very fast custom interconnects. In mid-2012 the most powerful supercomputer was a Linux-based IBM Sequoia BlueGene/Q system powered by Power BQC 16-core processors running at 1.6 GHz. The system, installed at Lawrence Livermore National Laboratory, has a total of 1,572,864 cores and 1,572,864 GB of memory; it achieves a sustained speed of 16.32 petaFLOPS and consumes 7.89 MW of power.

More recently, a Cray XK7 system called Titan, installed at the Oak Ridge National Laboratory (ORNL) in Tennessee, was crowned the fastest supercomputer in the world. Titan has 560,640 processors, including 261,632 Nvidia K20x accelerator cores; it achieved a speed of 17.59 petaFLOPS on the Linpack benchmark. Several of the most powerful systems on the "Top 500 supercomputers" list are powered by the Nvidia 2050 GPU; three of the top 10 use an InfiniBand⁴ interconnect.
The next natural step was triggered by advances in communication networks, when low-latency and high-bandwidth wide area networks (WANs) allowed individual systems, many of them multiprocessors, to be geographically separated. Large-scale distributed systems were first used for scientific and engineering applications and took advantage of the advancements in system software, programming models, tools, and algorithms developed for parallel processing.

⁴ InfiniBand is a switched fabric communications link used in high-performance computing and in data centers.

2.3 Distributed systems

A distributed system is a collection of autonomous computers that are connected through a network and distribution software called middleware, which enables computers to coordinate their activities and to share the resources of the system. A distributed system's users perceive the system as a single integrated computing facility.

A distributed system has several characteristics: Its components are autonomous, scheduling and other resource management and security policies are implemented by each system, there are multiple points of control and multiple points of failure, and the resources may not be accessible at all times. Distributed systems can be scaled by adding additional resources and can be designed to maintain availability even at low levels of hardware/software/network reliability.

Distributed systems have been around for several decades. For example, distributed file systems and network file systems have been used for user convenience and to improve the reliability and functionality of file systems for many years. Modern operating systems allow a user to mount a remote file system and access it the same way a local file system is accessed, yet with a performance penalty due to larger communication costs.
The remote procedure call (RPC) supports interprocess communication and allows a procedure on a system to invoke a procedure running in a different address space, possibly on a remote system. RPCs were introduced in the early 1980s by Bruce Nelson and used for the first time at Xerox PARC; the Network File System (NFS), introduced in 1984, was based on Sun's RPC. Many programming languages support RPCs; for example, Java Remote Method Invocation (Java RMI) provides functionality similar to that of UNIX RPC methods, and XML-RPC uses XML to encode calls carried over HTTP.

The middleware should support a set of desirable properties of a distributed system:

• Access transparency. Local and remote information objects are accessed using identical operations.
• Location transparency. Information objects are accessed without knowledge of their location.
• Concurrency transparency. Several processes run concurrently using shared information objects without interference among them.
• Replication transparency. Multiple instances of information objects are used to increase reliability without the knowledge of users or applications.
• Failure transparency. Faults are concealed from users and applications.
• Migration transparency. Information objects in the system can be moved without affecting the operations performed on them.
• Performance transparency. The system can be reconfigured based on the load and quality-of-service requirements.
• Scaling transparency. The system and the applications can scale without a change in the system structure and without affecting the applications.

2.4 Global state of a process group

To understand the important properties of distributed systems, we use a model, an abstraction based on two critical components: processes and communication channels. A process is a program in execution, and a thread is a lightweight process. A thread of execution is the smallest unit of processing that can be scheduled by an operating system.
A process is characterized by its state; the state is the ensemble of information we need to restart a process after it has been suspended. An event is a change of state of a process. The events affecting the state of a process p_i are numbered sequentially as e_i^1, e_i^2, e_i^3, ..., as shown in the space-time diagram in Figure 2.1(a). A process p_i is in state σ_i^j immediately after the occurrence of event e_i^j and remains in that state until the occurrence of the next event, e_i^(j+1).

FIGURE 2.1 Space-time diagrams display local and communication events during a process lifetime. Local events are small black circles. Communication events in different processes are connected by lines originating at a send event and terminated by an arrow at the receive event. (a) All events of a single process p_1 are local; the process is in state σ_1^1 immediately after the occurrence of event e_1^1 and remains in that state until the occurrence of event e_1^2. (b) Two processes p_1 and p_2; event e_1^2 is a communication event, p_1 sends a message to p_2; event e_2^3 is a communication event, process p_2 receives the message sent by p_1. (c) Three processes interact by means of communication events.

A process group is a collection of cooperating processes; these processes work in concert and communicate with one another to reach a common goal. For example, a parallel algorithm to solve a system of partial differential equations (PDEs) over a domain D may partition the data into several segments and assign each segment to one of the members of the process group. The processes in the group must cooperate with one another and iterate until the common boundary values computed by one process agree with the common boundary values computed by another.
A communication channel provides the means for processes or threads to communicate with one another and coordinate their actions by exchanging messages. Without loss of generality, we assume that communication among processes is done only by means of send(m) and receive(m) communication events, where m is a message. We use the term message for a structured unit of information, which can be interpreted only in a semantic context by the sender and the receiver. The state of a communication channel is defined as follows: Given two processes p_i and p_j, the state of the channel from p_i to p_j, ξ_{i,j}, consists of the messages sent by p_i but not yet received by p_j.

These two abstractions allow us to concentrate on critical properties of distributed systems without the need to discuss the detailed physical properties of the entities involved. The model presented is based on the assumption that a channel is a unidirectional bit pipe of infinite bandwidth and zero latency, but unreliable; messages sent through a channel may be lost or distorted, or the channel may fail, losing its ability to deliver messages. We also assume that the time a process needs to traverse a set of states is of no concern and that processes may fail or be aborted.

A protocol is a finite set of messages exchanged among processes to help them coordinate their actions. Figure 2.1(c) illustrates the case when communication events are dominant in the local histories of processes p_1, p_2, and p_3. In this case only e_1^5 is a local event; all others are communication events. The particular protocol illustrated in Figure 2.1(c) requires processes p_2 and p_3 to send messages to the other processes in response to a message from process p_1.

The informal definition of the state of a single process can be extended to collections of communicating processes.
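The channel-state definition above can be modeled directly: if sends append to a FIFO and receives remove from it, then the channel state ξ_{i,j} is exactly the messages still in flight. This tiny sketch uses names of our own choosing.

```python
from collections import deque

class Channel:
    """Unidirectional channel from p_i to p_j; its state xi_{i,j} is the
    FIFO of messages sent by p_i but not yet received by p_j."""

    def __init__(self):
        self.in_flight = deque()

    def send(self, m):            # send(m) event at p_i
        self.in_flight.append(m)

    def receive(self):            # receive(m) event at p_j
        return self.in_flight.popleft()

ch = Channel()
ch.send("m1")
ch.send("m2")                     # p_i has sent two messages
ch.receive()                      # p_j has received the first one
print(list(ch.in_flight))         # → ['m2']  (the channel state xi_{i,j})
```

This makes concrete why the channel state need not appear separately in the global state below: it is fully determined by which send events have occurred at p_i and which receive events have occurred at p_j.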
The global state of a distributed system consisting of several processes and communication channels is the union of the states of the individual processes and channels [34]. Call $h_i^j$ the history of process $p_i$ up to and including its $j$-th event, $e_i^j$, and call $\sigma_i^j$ the local state of process $p_i$ following event $e_i^j$. Consider a system consisting of $n$ processes, $p_1, p_2, \ldots, p_i, \ldots, p_n$, with $\sigma_i^{j_i}$ the local state of process $p_i$; then the global state of the system is an $n$-tuple of local states

$$\Sigma^{(j_1, j_2, \ldots, j_n)} = \left(\sigma_1^{j_1}, \sigma_2^{j_2}, \ldots, \sigma_i^{j_i}, \ldots, \sigma_n^{j_n}\right). \quad (2.10)$$

The state of the channels does not appear explicitly in this definition of the global state because the state of the channels is encoded as part of the local state of the processes communicating through the channels.

The global states of a distributed computation with $n$ processes form an $n$-dimensional lattice. The elements of this lattice are global states $\Sigma^{(j_1, j_2, \ldots, j_n)} = \left(\sigma_1^{j_1}, \sigma_2^{j_2}, \ldots, \sigma_n^{j_n}\right)$. Figure 2.2(a) shows the lattice of global states of the distributed computation in Figure 2.2(b). This is a two-dimensional lattice because we have two processes, $p_1$ and $p_2$. The lattice of global states for the distributed computation in Figure 2.1(c) is a three-dimensional lattice; that computation consists of three concurrent processes, $p_1$, $p_2$, and $p_3$.

FIGURE 2.2 (a) The lattice of the global states of the two processes with the space-time diagrams in Figure 2.2(b). Only the first two events for each thread are shown in Figure 2.2(b). (b) The six possible sequences of events leading to the state $\Sigma^{(2,2)}$.
The initial state of the system in Figure 2.2(b) is the state before the occurrence of any event, and it is denoted by $\Sigma^{(0,0)}$; the only global states reachable from $\Sigma^{(0,0)}$ are $\Sigma^{(1,0)}$ and $\Sigma^{(0,1)}$. The communication events limit the global states the system may reach; in this example the system cannot reach the state $\Sigma^{(4,0)}$ because process $p_1$ enters state $\sigma_1^4$ only after process $p_2$ has entered state $\sigma_2^1$. Figure 2.2(b) shows the six possible sequences of events leading to the global state $\Sigma^{(2,2)}$:

$$\left(e_1^1, e_1^2, e_2^1, e_2^2\right), \left(e_1^1, e_2^1, e_1^2, e_2^2\right), \left(e_1^1, e_2^1, e_2^2, e_1^2\right), \left(e_2^1, e_2^2, e_1^1, e_1^2\right), \left(e_2^1, e_1^1, e_1^2, e_2^2\right), \left(e_2^1, e_1^1, e_2^2, e_1^2\right). \quad (2.11)$$

An interesting question is how many paths exist to reach a given global state. The more paths there are, the harder it is to identify the events leading to a state when we observe an undesirable behavior of the system; a large number of paths increases the difficulty of debugging the system.

We conjecture that in the case of the two threads in Figure 2.2(b) the number of paths from the global state $\Sigma^{(0,0)}$ to $\Sigma^{(m,n)}$ is

$$N_p^{(m,n)} = \frac{(m+n)!}{m!\,n!}. \quad (2.12)$$

We have already seen that there are six paths leading to state $\Sigma^{(2,2)}$; indeed,

$$N_p^{(2,2)} = \frac{(2+2)!}{2!\,2!} = \frac{24}{4} = 6. \quad (2.13)$$

To prove Equation 2.12 we use a method resembling induction; we notice first that the global state $\Sigma^{(1,1)}$ can only be reached from the states $\Sigma^{(1,0)}$ and $\Sigma^{(0,1)}$ and that $N_p^{(1,1)} = 2!/(1!\,1!) = 2$; thus, the formula is true for $m = n = 1$. Then we show that if the formula is true for the $(m-1, n-1)$ case, it will also be true for the $(m, n)$ case. If our conjecture is true, then

$$N_p^{(m-1,n)} = \frac{[(m-1)+n]!}{(m-1)!\,n!} \quad (2.14)$$

and

$$N_p^{(m,n-1)} = \frac{[m+(n-1)]!}{m!\,(n-1)!}. \quad (2.15)$$

We observe that the global state $\Sigma^{(m,n)}$, $\forall\, m, n \geq 1$, can only be reached from two states, $\Sigma^{(m-1,n)}$ and $\Sigma^{(m,n-1)}$ (see Figure 2.3); thus,

$$N_p^{(m,n)} = N_p^{(m-1,n)} + N_p^{(m,n-1)}. \quad (2.16)$$

It is easy to see that, indeed,

$$\frac{[(m-1)+n]!}{(m-1)!\,n!} + \frac{[m+(n-1)]!}{m!\,(n-1)!} = (m+n-1)!\left[\frac{1}{(m-1)!\,n!} + \frac{1}{m!\,(n-1)!}\right] = \frac{(m+n)!}{m!\,n!}. \quad (2.17)$$

This shows that our conjecture is true; thus, Equation 2.12 gives the number of paths to reach the global state $\Sigma^{(m,n)}$ from $\Sigma^{(0,0)}$ when two threads are involved. This expression can be generalized to the case of $q$ threads; using the same strategy, it is easy to see that the number of paths from the state $\Sigma^{(0,0,\ldots,0)}$ to the global state $\Sigma^{(n_1,n_2,\ldots,n_q)}$ is

$$N_p^{(n_1,n_2,\ldots,n_q)} = \frac{(n_1 + n_2 + \cdots + n_q)!}{n_1!\,n_2!\,\cdots\,n_q!}. \quad (2.18)$$

FIGURE 2.3 In the two-dimensional case, the global state $\Sigma^{(m,n)}$, $\forall\, m, n \geq 1$, can only be reached from two states, $\Sigma^{(m-1,n)}$ and $\Sigma^{(m,n-1)}$.

Indeed, it is easy to see that

$$N_p^{(n_1,n_2,\ldots,n_q)} = N_p^{(n_1-1,n_2,\ldots,n_q)} + N_p^{(n_1,n_2-1,\ldots,n_q)} + \cdots + N_p^{(n_1,n_2,\ldots,n_q-1)}. \quad (2.19)$$

Equation 2.18 gives us an indication of how difficult it is to debug a system with a large number of concurrent threads.

Many problems in distributed systems are instances of the global predicate evaluation problem (GPE), whose goal is to evaluate a Boolean expression whose elements are functions of the global state of the system.

2.5 Communication protocols and process coordination

A major concern in any parallel and distributed system is communication in the presence of channel failures. There are multiple modes for a channel to fail, and some lead to messages being lost. In the general case, it is impossible to guarantee that two processes will reach an agreement in case of channel failures (see Figure 2.4). Given two processes $p_1$ and $p_2$ connected by a communication channel that can lose a message with probability $\epsilon > 0$, no protocol capable of guaranteeing that the two processes will reach agreement exists, regardless of how small the probability $\epsilon$ is.

The proof of this statement is by contradiction. Assume that such a protocol exists and that it consists of $n$ messages; recall that a protocol is a finite sequence of messages.
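The counts in Equations 2.18 and 2.19 can be checked numerically. The sketch below (an illustration of the counting argument, not code from the text) computes the number of paths both from the closed-form multinomial coefficient and from the recurrence, and confirms that the two agree.

```python
from math import factorial
from functools import lru_cache

def paths_closed_form(counts):
    """Number of paths from (0,...,0) to (n1,...,nq), Eq. 2.18:
    (n1 + n2 + ... + nq)! / (n1! n2! ... nq!)."""
    total = factorial(sum(counts))
    for n in counts:
        total //= factorial(n)
    return total

@lru_cache(maxsize=None)
def paths_recurrence(counts):
    """Same count via the recurrence of Eq. 2.19: a state is reached
    from each neighbor with one coordinate decremented."""
    if all(n == 0 for n in counts):
        return 1  # one (empty) path reaches the initial state
    return sum(
        paths_recurrence(counts[:i] + (counts[i] - 1,) + counts[i + 1:])
        for i, n in enumerate(counts) if n > 0
    )

print(paths_closed_form((2, 2)))     # 6, the six sequences of Eq. 2.11
print(paths_recurrence((2, 2)))      # 6 as well
print(paths_closed_form((3, 3, 3)))  # 1680 paths for three threads
```

Even for modest event counts the number of paths explodes combinatorially, which is the quantitative content of the remark about debugging difficulty.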
Since any message might be lost with probability $\epsilon$, the protocol should be able to function when only $n-1$ messages reach their destination, the last one being lost. Induction on the number of messages proves that indeed no such protocol exists; the same reasoning leads us to conclude that the protocol should function correctly with $n-2$ messages, and so on.

In practice, error detection and error correction allow processes to communicate reliably through noisy digital channels. The redundancy of a message is increased by additional bits, packaging the message as a code word; the recipient of the message is then able to decide whether the sequence of bits received is a valid code word and, if the code satisfies certain distance properties, the recipient is able to extract the original message from a bit string in error.

FIGURE 2.4 Process coordination in the presence of errors; each message may be lost with probability $\epsilon$. If a protocol consisting of $n$ messages exists, then the protocol should be able to function properly with $n-1$ messages reaching their destination, one of them being lost.

Communication protocols implement not only error control mechanisms but also flow control and congestion control. Flow control provides feedback from the receiver; it forces the sender to transmit only the amount of data the receiver is able to buffer and then process. Congestion control ensures that the offered load of the network does not exceed the network capacity. In store-and-forward networks, individual routers may drop packets when the network is congested and the sender is forced to retransmit. Based on its estimate of the round-trip time (RTT), the sender can detect congestion and reduce the transmission rate.
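To make the distance property concrete, the sketch below (our own illustration; the book does not prescribe a particular code) implements a Hamming(7,4) code: four data bits are packaged into a seven-bit code word, and because valid code words differ in at least three bit positions, the receiver can locate and correct any single-bit error.

```python
from itertools import product

def encode(d):
    """Hamming(7,4): package 4 data bits into a 7-bit code word.
    Layout (1-based positions): p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4,   # p1 checks positions 1, 3, 5, 7
            d1 ^ d3 ^ d4,   # p2 checks positions 2, 3, 6, 7
            d1,
            d2 ^ d3 ^ d4,   # p3 checks positions 4, 5, 6, 7
            d2, d3, d4]

def decode(c):
    """Correct at most one flipped bit, then extract the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # 1-based error position; 0 means no error
    if pos:
        c[pos - 1] ^= 1          # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]]

# Every single-bit error in every code word is corrected:
for data in product([0, 1], repeat=4):
    word = encode(list(data))
    for i in range(7):
        corrupted = word[:]
        corrupted[i] ^= 1        # the channel distorts one bit
        assert decode(corrupted) == list(data)
print("all single-bit errors corrected")
```

The three parity bits are the redundancy added to the message; the syndrome computed by the receiver is what lets it decide whether the received string is a valid code word.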
The implementation of these mechanisms requires the measurement of time intervals, the time elapsed between two events; we also need a global concept of time shared by all entities that cooperate with one another. For example, a computer chip has an internal clock, and a predefined set of actions occurs at each clock tick. Each chip has an interval timer that helps enhance the system's fault tolerance; when the effects of an action are not sensed after a predefined interval, the action is repeated.

When the entities communicating with one another are networked computers, the precision of the clock synchronization is critical [205]. The event rates are very high and each system goes through state changes at a very fast pace; modern processors run at 2-4 GHz clock rates. That explains why we need to measure time very accurately; indeed, we have atomic clocks with an accuracy of about $10^{-6}$ seconds per year.

An isolated system can be characterized by its history, expressed as a sequence of events, each event corresponding to a change of the state of the system. Local timers provide relative time measurements. A more accurate description adds to the system's history the time of occurrence of each event as measured by the local timer. Messages sent by processes may be lost or distorted during transmission. Without additional restrictions regarding message delays and errors, there are no means to ensure a perfect synchronization of local clocks and there are no obvious methods to ensure a global ordering of events occurring in different processes. Determining the global state of a large-scale distributed system is a very challenging problem.

The mechanisms described here are insufficient once we approach the problem of cooperating entities. To coordinate their actions, two entities need a common perception of time. Timers are not enough. Clocks provide the only way to measure distributed duration, that is, actions that start in one process and terminate in another.
Global agreement on time is necessary to trigger actions that should occur concurrently (e.g., in a real-time control system of a power plant, several circuits must be switched on at the same time). Agreement on the time when events occur is necessary for distributed recording of events, for example, to determine a precedence relation through a temporal ordering of events. To ensure that a system functions correctly, we need to determine that the event causing a change of state occurred before the state change; for instance, the sensor triggering an alarm has to change its value before the emergency procedure handling the event is activated. Another example of the need for agreement on the time of occurrence of events is replicated actions; in this case several replicas of a process must log the time of an event in a consistent manner.

Time stamps are often used for event ordering using a global time base constructed on local virtual clocks [235]. The $\Delta$-protocols [94] achieve total temporal order using a global time base. Assume that local virtual clock readings do not differ by more than $\pi$, called the precision of the global time base. Call $g$ the granularity of physical clocks. First, observe that the granularity should not be smaller than the precision; given two events $a$ and $b$ occurring in different processes, if $t_b - t_a \leq \pi + g$ we cannot tell which one occurred first [361]. Based on these observations, it follows that the order discrimination of clock-driven protocols cannot be better than twice the clock granularity.

System specification, design, and analysis require a clear understanding of cause-effect relationships. During the system specification phase we view the system as a state machine and define the actions that cause transitions from one state to another. During the system analysis phase we need to determine the cause that brought the system to a certain state.
The activity of any process is modeled as a sequence of events; hence, the binary cause-effect relationship should be expressed in terms of events and should capture our intuition that the cause must precede the effects. Again, we need to distinguish between local events and communication events. The latter affect more than one process and are essential for constructing a global history of an ensemble of processes. Let $h_i$ denote the local history of process $p_i$ and let $e_i^k$ denote the $k$-th event in this history. The binary cause-effect relationship between two events has the following properties:

1. Causality of local events can be derived from the process history:

$$\text{if } e_i^k, e_i^l \in h_i \text{ and } k < l, \text{ then } e_i^k \to e_i^l. \quad (2.20)$$

2. Causality of communication events:

$$\text{if } e_i^k = send(m) \text{ and } e_j^l = receive(m), \text{ then } e_i^k \to e_j^l. \quad (2.21)$$

3. Transitivity of the causal relationship:

$$\text{if } e_i^k \to e_j^l \text{ and } e_j^l \to e_m^n, \text{ then } e_i^k \to e_m^n. \quad (2.22)$$

Two events in the global history may be unrelated; if so, neither one is the cause of the other, and such events are said to be concurrent events.

2.6 Logical clocks

A logical clock (LC) is an abstraction necessary to ensure the clock condition in the absence of a global clock. Each process $p_i$ maps events to positive integers. Call $LC(e)$ the local variable associated with event $e$. Each process time-stamps each message $m$ it sends with the value of the logical clock at the time of sending, $TS(m) = LC(send(m))$. The rules to update the logical clock are specified by the following relationship:

$$LC(e) = \begin{cases} LC + 1 & \text{if } e \text{ is a local event or a } send(m) \text{ event} \\ \max\left(LC, TS(m) + 1\right) & \text{if } e = receive(m). \end{cases} \quad (2.23)$$

The concept of logical clocks is illustrated in Figure 2.5 using a modified space-time diagram in which the events are labeled with the logical clock value. Messages exchanged between processes are shown as lines from the sender to the receiver; the communication events corresponding to sending and receiving messages are marked on these diagrams.
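The update rules of Equation 2.23 translate directly into code. The minimal sketch below (an illustration; the class and the example event sequence are our own) keeps one counter per process and applies the local/send and receive rules.

```python
class Process:
    """A process with a logical clock, updated per Eq. 2.23."""
    def __init__(self, name):
        self.name = name
        self.lc = 0                       # logical clock value LC

    def local_event(self):
        self.lc += 1                      # LC = LC + 1 for a local event
        return self.lc

    def send(self):
        self.lc += 1                      # LC = LC + 1 for a send(m) event
        return self.lc                    # TS(m) = LC(send(m)) travels with m

    def receive(self, ts):
        self.lc = max(self.lc, ts + 1)    # LC = max(LC, TS(m) + 1)
        return self.lc

p1, p2 = Process("p1"), Process("p2")
p1.local_event()        # p1: LC = 1
ts = p1.send()          # p1: LC = 2; the message carries TS(m) = 2
p2.local_event()        # p2: LC = 1
p2.receive(ts)          # p2: LC = max(1, 2 + 1) = 3
print(p1.lc, p2.lc)     # 2 3
```

The receive rule is what lets a message pull the receiver's clock forward, so that a send is always labeled with a smaller value than the corresponding receive.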
Each process labels local events and send events sequentially until it receives a message marked with a logical clock value larger than the next local logical clock value, as shown in Equation 2.23.

FIGURE 2.5 Three processes and their logical clocks. The usual labeling of events as $e_1^1, e_1^2, e_1^3, \ldots$ is omitted to avoid overloading the figure; only the logical clock values for the local and communication events are marked. The correspondence between the events and the logical clock values is obvious: $e_1^1, e_2^1, e_3^1 \to 1$, $e_1^5 \to 5$, $e_2^4 \to 7$, $e_3^4 \to 10$, $e_1^6 \to 12$, and so on. Global ordering of all events is not possible; there is no way to establish the ordering of events $e_1^1$, $e_2^1$, and $e_3^1$.

It follows that logical clocks do not allow a global ordering of all events. For example, there is no way to establish the ordering of events $e_1^1$, $e_2^1$, and $e_3^1$ in Figure 2.5. Nevertheless, communication events allow different processes to coordinate their logical clocks; for example, process $p_2$ labels the event $e_2^3$ as 6 because of message $m_2$, which carried the information that the logical clock value was 5 at the time $m_2$ was sent. Recall that $e_i^j$ is the $j$-th event in process $p_i$.

Logical clocks lack an important property, gap detection; given two events $e$ and $e'$ and their logical clock values, $LC(e)$ and $LC(e')$, it is impossible to establish whether an event $e''$ exists such that $LC(e) < LC(e'') < LC(e')$.