Architecture Guide March 17, 2015 current

OpenStack Architecture Design Guide
current (2015-03-17)
Copyright © 2014, 2015 OpenStack Foundation. Some rights reserved.

To reap the benefits of OpenStack, you should plan, design, and architect your cloud properly, taking users' needs into account and understanding the use cases.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Except where otherwise noted, this document is licensed under Creative Commons Attribution ShareAlike 3.0 License. http://creativecommons.org/licenses/by-sa/3.0/legalcode

Table of Contents

Preface
    Conventions
    Document change history
1. Introduction
    Intended audience
    How this book is organized
    Why and how we wrote this book
    Methodology
2. General purpose
    User requirements
    Technical considerations
    Operational considerations
    Architecture
    Prescriptive example
3. Compute focused
    User requirements
    Technical considerations
    Operational considerations
    Architecture
    Prescriptive examples
4. Storage focused
    User requirements
    Technical considerations
    Operational considerations
    Architecture
    Prescriptive examples
5. Network focused
    User requirements
    Technical considerations
    Operational considerations
    Architecture
    Prescriptive examples
6. Multi-site
    User requirements
    Technical considerations
    Operational considerations
    Architecture
    Prescriptive examples
7. Hybrid
    User requirements
    Technical considerations
    Operational considerations
    Architecture
    Prescriptive examples
8. Massively scalable
    User requirements
    Technical considerations
    Operational considerations
9. Specialized cases
    Multi-hypervisor example
    Specialized networking example
    Software-defined networking
    Desktop-as-a-Service
    OpenStack on OpenStack
    Specialized hardware
10. References
A. Community support
    Documentation
    ask.openstack.org
    OpenStack mailing lists
    The OpenStack wiki
    The Launchpad Bugs area
    The OpenStack IRC channel
    Documentation feedback
    OpenStack distribution packages
Glossary

Preface

Conventions

The OpenStack documentation uses several typesetting conventions.

Notices

Notices take these forms:

Note
A handy tip or reminder.

Important
Something you must be aware of before proceeding.

Warning
Critical information about the risk of data loss or security issues.

Command prompts

$ prompt
Any user, including the root user, can run commands that are prefixed with the $ prompt.

# prompt
The root user must run commands that are prefixed with the # prompt.
You can also prefix these commands with the sudo command, if available, to run them.

Document change history

This version of the guide replaces and obsoletes all earlier versions. The following table describes the most recent changes:

Revision Date       Summary of Changes
October 15, 2014    • Incorporate edits to follow OpenStack style.
July 21, 2014       • Initial release.

1. Introduction

Table of Contents
    Intended audience
    How this book is organized
    Why and how we wrote this book
    Methodology

OpenStack is a leader in the cloud technology gold rush, as organizations of all stripes discover the increased flexibility and speed to market that self-service cloud and Infrastructure-as-a-Service (IaaS) provide. However, in order to reap those benefits, the cloud must be designed and architected properly.

A well-architected cloud provides a stable IT environment that offers easy access to needed resources, usage-based expenses, extra capacity on demand, disaster recovery, and a secure environment. A well-architected cloud does not magically build itself. It requires careful consideration of a multitude of factors, both technical and non-technical.

There is no single architecture that is "right" for an OpenStack cloud deployment. OpenStack can be used for any number of different purposes, each with its own particular requirements and architectural peculiarities.

This book is designed to examine some of the most common uses for OpenStack clouds (and some less common uses) and to provide knowledge and advice to help explain the issues that require consideration.
These examples, coupled with a wealth of knowledge and advice, will help an organization design and build a well-architected OpenStack cloud to fit its unique requirements.

Intended audience

This book has been written for architects and designers of OpenStack clouds. This book is not intended for people who are deploying OpenStack. For a guide on deploying and operating OpenStack, please refer to the OpenStack Operations Guide (http://docs.openstack.org/openstack-ops).

The reader should have prior knowledge of cloud architecture and principles, experience in enterprise system design, Linux and virtualization experience, and a basic understanding of networking principles and protocols.

How this book is organized

This book is organized into chapters that define the use cases associated with making architectural choices for an OpenStack cloud installation. Each chapter is intended to stand alone so that it can be read individually. However, each chapter also contains information that may be applicable in situations covered by other chapters. Cloud architects may use this book as a comprehensive guide by reading all of the use cases, but it is also possible to review only the chapters that pertain to a specific use case. When choosing to read specific use cases, note that it may be necessary to read more than one section of the guide to formulate a complete design for the cloud. The use cases covered in this guide include:

• General purpose: A cloud built with common components that should address 80% of common use cases.
• Compute focused: A cloud designed to address compute-intensive workloads such as high performance computing (HPC).
• Storage focused: A cloud focused on storage-intensive workloads such as data analytics with parallel file systems.
• Network focused: A cloud depending on high performance and reliable networking, such as a content delivery network (CDN).
• Multi-site: A cloud built with multiple sites available for application deployments for geographical, reliability, or data locality reasons.
• Hybrid cloud: An architecture where multiple disparate clouds are connected either for failover, hybrid cloud bursting, or availability.
• Massively scalable: An architecture that is intended for cloud service providers or other extremely large installations.

A chapter titled Specialized cases provides information on architectures that are not covered by the defined use cases.

Each chapter in the guide is then further broken down into the following sections:

• Introduction: Provides an overview of the architectural use case.
• User requirements: Defines the set of user considerations that typically come into play for that use case.
• Technical considerations: Covers the technical issues that must be accounted for when dealing with this use case.
• Operational considerations: Covers the ongoing operational tasks associated with this use case and architecture.
• Architecture: Covers the overall architecture associated with the use case.
• Prescriptive examples: Presents one or more scenarios where this architecture could be deployed.

A glossary covers the terms used in the book.

Why and how we wrote this book

The velocity at which OpenStack environments are moving from proofs of concept to production deployments is raising more and more questions and issues related to architecture design. By and large, these considerations are not addressed in the existing documentation, which typically focuses on the specifics of deployment and configuration options or operational considerations, rather than the bigger picture.
We wrote this book to guide readers in designing an OpenStack architecture that meets the needs of their organization. This guide concentrates on identifying important design considerations for common cloud use cases and provides examples based on these design guidelines. This guide does not aim to provide explicit instructions for installing and configuring the cloud, but rather focuses on design principles as they relate to user requirements as well as technical and operational considerations. For specific guidance on installation and configuration, a number of resources are already available in the OpenStack documentation.

This book was written in a book sprint format, which is a facilitated, rapid development production method for books. For more information, see the Book Sprints website (www.booksprints.net).

This book was written in five days during July 2014 while exhausting the M&M, Mountain Dew and healthy options supply, complete with juggling entertainment during lunches at VMware's headquarters in Palo Alto. The event was also documented on Twitter using the #OpenStackDesign hashtag. The Book Sprint was facilitated by Faith Bosworth and Adam Hyde.

We would like to thank VMware for their generous hospitality, as well as our employers, Cisco, Cloudscaling, Comcast, EMC, Mirantis, Rackspace, Red Hat, Verizon, and VMware, for enabling us to contribute our time. We would especially like to thank Anne Gentle and Kenneth Hui for all of their shepherding and organization in making this happen.
The author team includes:

• Kenneth Hui (EMC) @hui_kenneth
• Alexandra Settle (Rackspace) @dewsday
• Anthony Veiga (Comcast) @daaelar
• Beth Cohen (Verizon) @bfcohen
• Kevin Jackson (Rackspace) @itarchitectkev
• Maish Saidel-Keesing (Cisco) @maishsk
• Nick Chase (Mirantis) @NickChase
• Scott Lowe (VMware) @scott_lowe
• Sean Collins (Comcast) @sc68cal
• Sean Winn (Cloudscaling) @seanmwinn
• Sebastian Gutierrez (Red Hat) @gutseb
• Stephen Gordon (Red Hat) @xsgordon
• Vinny Valdez (Red Hat) @VinnyValdez

Methodology

The magic of the cloud is that it can do anything. It is both robust and flexible, the best of both worlds. Yes, the cloud is highly flexible and it can do almost anything, but to get the most out of a cloud investment, it is important to define how the cloud will be used by creating and testing use cases. This section describes the thought process behind designing a cloud architecture that best suits the intended use.

The diagram shows at a very abstract level the process for capturing requirements and building use cases. Once a set of use cases has been defined, it can then be used to design the cloud architecture.

Use case planning can seem counter-intuitive. After all, it takes about five minutes to sign up for a server with Amazon. Amazon does not know in advance what any given user is planning on doing with it, right? Wrong. Amazon's product management department spends plenty of time figuring out exactly what would be attractive to their typical customer and honing the service to deliver it. For the enterprise, the planning process is no different. Instead of planning for an external paying customer, however, the enterprise might be planning for internal application developers or a web portal. The following is a list of the high-level objectives to keep in mind when creating a use case.
Overall business objectives

• Develop a clear definition of business goals and requirements.
• Increase project support and engagement with business, customers, and end users.

Technology

• Coordinate the OpenStack architecture across the project and leverage OpenStack community efforts more effectively.
• Architect for automation as much as possible to speed development and deployment.
• Use the appropriate tools for the development effort.
• Create more and better test metrics and test harnesses to support continuous and integrated development, test processes, and automation.

Organization

• Communicate management support of team efforts more effectively.
• Develop a better cultural understanding of open source, cloud architectures, Agile methodologies, continuous development, test and integration, and overall development concepts in general.

As an example of how this works, consider a business goal of using the cloud for the company's e-commerce website. This goal means planning for applications that will support thousands of sessions per second, variable workloads, and lots of complex and changing data. By identifying the key metrics, such as number of concurrent transactions per second, size of database, and so on, it is possible to then build a method for testing the assumptions.

Develop functional user scenarios. Develop functional user scenarios that can be turned into test cases for measuring overall project trajectory. If the organization is not ready to commit to an application or applications that can be used to develop user requirements, it needs to create requirements to build valid test harnesses and develop usable metrics. Once the metrics are established, it is easier to respond quickly as requirements change, without having to worry overly much about setting the exact requirements in advance.
Think of this as creating ways to configure the system, rather than redesigning it every time there is a requirements change.

Limit cloud feature set. Create requirements that address the pain points, but do not recreate the entire OpenStack tool suite. The requirement to build OpenStack, only better, is self-defeating. It is important to limit scope creep by concentrating on developing a platform that addresses tool limitations for the requirements, without recreating the entire suite of tools. Work with technical product owners to establish critical features that are needed for a successful cloud deployment.

Application cloud readiness

Although the cloud is designed to make things easier, it is important to realize that "using cloud" is more than just firing up an instance and dropping an application on it. The "lift and shift" approach works in certain situations, but there is a fundamental difference between clouds and traditional bare-metal-based environments, or even traditional virtualized environments.

In traditional environments, with traditional enterprise applications, the applications and the servers they run on are "pets". They're lovingly crafted and cared for, the servers have names like Gandalf or Tardis, and if they get sick, someone nurses them back to health. All of this is designed so that the application does not experience an outage.

In cloud environments, on the other hand, servers are more like cattle. There are thousands of them, they get names like NY-1138-Q, and if they get sick, they get put down and a sysadmin installs another one. Traditional applications that are unprepared for this kind of environment will naturally suffer outages, lost data, or worse.

There are other reasons to design applications with cloud in mind.
Some are defensive: because applications cannot be certain of exactly where or on what hardware they will be launched, they need to be flexible, or at least adaptable. Others are proactive. For example, one of the advantages of using the cloud is scalability, so applications need to be designed in such a way that they can take advantage of those and other opportunities.

Determining whether an application is cloud-ready

There are several factors to take into consideration when looking at whether an application is a good fit for the cloud.

Structure
A large, monolithic, single-tiered legacy application typically isn't a good fit for the cloud. Efficiencies are gained when load can be spread over several instances, so that a failure in one part of the system can be mitigated without affecting other parts of the system, or so that scaling can take place where the app needs it.

Dependencies
Applications that depend on specific hardware, such as a particular chip set or an external device such as a fingerprint reader, might not be a good fit for the cloud, unless those dependencies are specifically addressed. Similarly, if an application depends on an operating system or set of libraries that cannot be used in the cloud, or cannot be virtualized, that is a problem.

Connectivity
Self-contained applications, or those that depend on resources that are not reachable by the cloud in question, will not run. In some situations, you can work around these issues with a custom network setup, but how well this works depends on the chosen cloud environment.

Durability and resilience
Despite the existence of SLAs, things break: servers go down, network connections are disrupted, or too many tenants on a server make a server unusable. An application must be sturdy enough to contend with these issues.
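
The four factors above can be captured as a simple review checklist. The following is a minimal sketch, not part of any OpenStack tooling: the AppProfile fields and the readiness_concerns function are hypothetical names invented for illustration, and each field is a yes/no simplification of a factor that in practice needs a fuller assessment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AppProfile:
    """Hypothetical yes/no summary of an application under review."""
    name: str
    monolithic: bool               # Structure: single-tiered, load cannot be spread
    needs_special_hardware: bool   # Dependencies: chip set, fingerprint reader, ...
    os_virtualizable: bool         # Dependencies: OS and libraries usable in the cloud
    resources_reachable: bool      # Connectivity: required resources are reachable
    tolerates_instance_loss: bool  # Durability: survives server or network failures


def readiness_concerns(app: AppProfile) -> list[str]:
    """Return the factors that need attention before moving the app to a cloud."""
    concerns = []
    if app.monolithic:
        concerns.append("structure: load cannot be spread over several instances")
    if app.needs_special_hardware:
        concerns.append("dependencies: relies on specific hardware")
    if not app.os_virtualizable:
        concerns.append("dependencies: OS or libraries cannot be virtualized")
    if not app.resources_reachable:
        concerns.append("connectivity: required resources are not reachable")
    if not app.tolerates_instance_loss:
        concerns.append("durability: cannot contend with instance or network loss")
    return concerns


if __name__ == "__main__":
    legacy = AppProfile(
        name="legacy-erp",
        monolithic=True,
        needs_special_hardware=True,
        os_virtualizable=True,
        resources_reachable=False,
        tolerates_instance_loss=False,
    )
    for concern in readiness_concerns(legacy):
        print(f"{legacy.name}: {concern}")
```

An empty result does not prove that an application is cloud-ready. It only means that none of the four factors listed here raised an obvious flag.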
Designing for the cloud

Here are some guidelines to keep in mind when designing an application for the cloud:

• Be a pessimist: Assume everything fails and design backwards. Love your chaos monkey.
• Put your eggs in multiple baskets: Leverage multiple providers, geographic regions, and availability zones to accommodate local availability issues. Design for portability.
• Think efficiency: Inefficient designs will not scale. Efficient designs become cheaper as they scale. Kill off unneeded components or capacity.
• Be paranoid: Design for defense in depth and zero tolerance by building in security at every level and between every component. Trust no one.
• But not too paranoid: Not every application needs the platinum solution. Architect for different SLAs, service tiers, and security levels.
• Manage the data: Data is usually the most inflexible and complex area of a cloud and cloud integration architecture. Don't shortchange the effort in analyzing and addressing data needs.
• Hands off: Leverage automation to increase consistency and quality and reduce response times.
• Divide and conquer: Pursue partitioning and parallel layering wherever possible. Make components as small and portable as possible. Use load balancing between layers.
• Think elasticity: Increasing resources should result in a proportional increase in performance and scalability. Decreasing resources should have the opposite effect.
• Be dynamic: Enable dynamic configuration changes such as auto scaling, failure recovery, and resource discovery to adapt to changing environments, faults, and workload volumes.
• Stay close: Reduce latency by moving highly interactive components and data near each other.
• Keep it loose: Loose coupling, service interfaces, separation of concerns, abstraction, and well-defined APIs deliver flexibility.
• Be cost aware: Autoscaling, data transmission, virtual software licenses, reserved instances, and so on can rapidly increase monthly usage charges. Monitor usage closely.

2. General purpose

Table of Contents
    User requirements
    Technical considerations
    Operational considerations
    Architecture
    Prescriptive example

An OpenStack general purpose cloud is often considered a starting point for building a cloud deployment. General purpose clouds are designed to balance the components and do not emphasize any particular aspect of the overall computing environment. Cloud design must give equal weight to the compute, network, and storage components. General purpose clouds are found in private, public, and hybrid environments, lending themselves to many different use cases.

Note
General purpose clouds are homogeneous deployments and are not suited to specialized environments or edge case situations.

Common uses of a general purpose cloud include:

• Providing a simple database
• A web application runtime environment
• A shared application development platform
• Lab test bed

Use cases that benefit from scale-out rather than scale-up approaches are good candidates for general purpose cloud architecture.

A general purpose cloud is designed to have a range of potential uses or functions; it is not specialized for specific use cases. General purpose architecture is designed to address 80% of potential use cases available. The infrastructure, in itself, is a specific use case, enabling it to be used as a base model for the design process.
General purpose clouds are designed to be platforms that are suited for general purpose applications.

General purpose clouds are limited to the most basic components, but they can include additional resources such as:

• Virtual-machine disk image library
• Raw block storage
• File or object storage
• Firewalls
• Load balancers
• IP addresses
• Network overlays or virtual local area networks (VLANs)
• Software bundles

User requirements

When building a general purpose cloud, you should follow the Infrastructure-as-a-Service (IaaS) model, a platform best suited for use cases with simple requirements. General purpose cloud user requirements are not complex. However, it is important to capture them even if the project has minimum business and technical requirements, such as a proof of concept (PoC), or a small lab platform.

Note
The following user considerations are written from the perspective of the cloud builder, not from the perspective of the end user.

Cost
Financial factors are a primary concern for any organization. Cost is an important criterion as general purpose clouds are considered the baseline from which all other cloud architecture environments derive. General purpose clouds do not always provide the most cost-effective environment for specialized applications or situations. Unless razor-thin margins and costs have been mandated as a critical factor, cost should not be the sole consideration when choosing or designing a general purpose architecture.

Time to market
The ability to deliver services or products within a flexible time frame is a common business factor when building a general purpose cloud. In today's high-speed business world, the ability to deliver a product in six months instead of two years is a driving force behind the decision to build general purpose clouds.
General purpose clouds allow users to self-provision and gain access to compute, network, and storage resources on demand, thus decreasing time to market.

Revenue opportunity
Revenue opportunities for a cloud will vary greatly based on the intended use case of that particular cloud. Some general purpose clouds are built for commercial customer-facing products, but there are alternatives that might make the general purpose cloud the right choice. For example, a small cloud service provider (CSP) might want to build a general purpose cloud rather than a massively scalable cloud because they do not have the deep financial resources needed, or because they do not, or will not, know in advance the purposes for which their customers are going to use the cloud. For some users, the advantages the cloud itself offers mean an enhancement of revenue opportunity. For others, the fact that a general purpose cloud provides only baseline functionality will be a disincentive for use, leading to a stagnation of potential revenue opportunities.

Legal requirements

Many jurisdictions have legislative and regulatory requirements governing the storage and management of data in cloud environments. Common areas of regulation include:

• Data retention policies ensuring storage of persistent data and records management to meet data archival requirements.
• Data ownership policies governing the possession and responsibility for data.
• Data sovereignty policies governing the storage of data in foreign countries or otherwise separate jurisdictions.
• Data compliance policies governing certain types of information that must reside in certain locations due to regulatory issues and, more importantly, cannot reside in other locations for the same reason.
Examples of such legal frameworks include the data protection framework of the European Union and the requirements of the Financial Industry Regulatory Authority in the United States. Consult a local regulatory body for more information.

Technical requirements

Technical cloud architecture requirements should be weighed against the business requirements.

Performance
As a baseline product, general purpose clouds do not provide optimized performance for any particular function. While a general purpose cloud should provide enough performance to satisfy average user considerations, performance is not what drives customers to a general purpose cloud.

No predefined usage model
The lack of a predefined usage model enables the user to run a wide variety of applications without having to know the application requirements in advance. This provides a degree of independence and flexibility that no other cloud scenarios are able to provide.

On-demand and self-service application
By definition, a cloud provides end users with the ability to self-provision computing power, storage, networks, and software in a simple and flexible way. The user must be able to scale their resources up to a substantial level without disrupting the underlying host operations. One of the benefits of using a general purpose cloud architecture is the ability to start with limited resources and increase them over time as the user demand grows.

Public cloud
For a company interested in building a commercial public cloud offering based on OpenStack, the general purpose architecture model might be the best choice. Designers are not always going to know the purposes or workloads for which the end users will use the cloud.

Internal consumption (private) cloud
Organizations need to determine if it is logical to create their own clouds internally.
Using a private cloud, organizations are able to maintain complete control over architectural and cloud components.

Note
Users may want to combine using the internal cloud with access to an external cloud. If that case is likely, it might be worth exploring the possibility of taking a multi-cloud approach with regard to at least some of the architectural elements.

Designs that incorporate the use of multiple clouds, such as a private cloud and a public cloud offering, are described in the "Multi-Cloud" scenario; see Chapter 6, “Multi-site”.

Security
Security should be implemented according to asset, threat, and vulnerability risk assessment matrices. For cloud domains that require increased computer security, network security, or information security, a general purpose cloud is not considered an appropriate choice.

Technical considerations
General purpose clouds are often expected to include these base services:
• Compute
• Network
• Storage
Each of these services has different resource requirements. As a result, you must make design decisions relating directly to the service, as well as provide a balanced infrastructure for all services.

Consider the unique aspects of each service that requires design, since individual characteristics and service mass can impact the hardware selection process. Hardware designs are generated for each of the following types of resource pools:
• Compute
• Network
• Storage
Hardware decisions are also made in relation to network architecture and facilities planning. These factors play heavily into the overall architecture of an OpenStack cloud.

Designing compute resources
When designing compute resource pools, a number of factors can impact your design decisions. For example, decisions related to processors, memory, and storage within each hypervisor are just one element of designing compute resources.
In addition, decide whether to provide compute resources in a single pool or in multiple pools. We recommend that the compute design allocate multiple pools of resources to be addressed on demand.

A compute design that allocates multiple pools of resources makes best use of application resources running in the cloud. Each independent resource pool should be designed to provide service for specific flavors of instances or groupings of flavors. Designing multiple resource pools helps to ensure that, as instances are scheduled onto compute hypervisors, each independent node's resources will be allocated to make the most efficient use of available hardware. This is commonly referred to as bin packing.

Using a consistent hardware design among the nodes that are placed within a resource pool also helps support bin packing. Hardware nodes selected for being a part of a compute resource pool should share a common processor, memory, and storage layout. By choosing a common hardware design, it becomes easier to deploy, support, and maintain those nodes throughout their life cycle in the cloud.

An overcommit ratio is the ratio of available virtual resources to available physical resources. OpenStack is able to configure the overcommit ratio for CPU and memory. The default CPU overcommit ratio is 16:1 and the default memory overcommit ratio is 1.5:1. Determining the tuning of the overcommit ratios for both of these options during the design phase is important as it has a direct impact on the hardware layout of your compute nodes.

For example, consider that an m1.small instance uses 1 vCPU, 20 GB of ephemeral storage, and 2,048 MB of RAM. When designing a hardware node as a compute resource pool to service instances, take into consideration the number of processor cores available on the node as well as the required disk and memory to service instances running at capacity.
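The sizing arithmetic implied by these overcommit ratios can be sketched in a few lines of Python. This is a minimal illustration, not part of OpenStack: the function names and parameters are hypothetical, and the server shape (2 sockets, 10 cores each, hyperthreading on) together with the m1.small flavor are taken from the example in this section. It considers only CPU and memory, not disk or host overhead.

```python
def max_instances(sockets, cores_per_socket, threads_per_core, cpu_overcommit):
    """Number of 1-vCPU instances that fit on a node, limited by CPU only."""
    logical_cpus = sockets * cores_per_socket * threads_per_core
    return int(logical_cpus * cpu_overcommit)


def required_ram_gb(instances, ram_mb_per_instance, ram_overcommit):
    """Physical RAM in GB (1 GB = 1,024 MB) needed to back the instances."""
    return instances * ram_mb_per_instance / ram_overcommit / 1024


# Default ratios from the text: 16:1 for CPU, 1.5:1 for memory.
instances = max_instances(sockets=2, cores_per_socket=10,
                          threads_per_core=2, cpu_overcommit=16.0)
ram_gb = required_ram_gb(instances, ram_mb_per_instance=2048,
                         ram_overcommit=1.5)

print(instances)       # 640
print(round(ram_gb))   # 853
```

Lowering either ratio in the call shows how quickly the instance count or the memory requirement changes, which is why the ratios are worth settling during the design phase.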
For a server with 2 CPUs of 10 cores each, with hyperthreading turned on, the default CPU overcommit ratio of 16:1 would allow for 640 (2 × 10 × 2 × 16) total m1.small instances. By the same reasoning, using the default memory overcommit ratio of 1.5:1 you can determine that the server will need at least 853 GB (640 × 2,048 MB / 1.5) of RAM. When sizing nodes for memory, it is also important to consider the additional memory required to service operating system and service needs.

Processor selection is an extremely important consideration in hardware design, especially when comparing the features and performance characteristics of different processors. Processors can include features specific to virtualized compute hosts, including hardware assisted virtualization and technology related to memory paging (such as Extended Page Tables, or EPT). These types of features can have a significant impact on the performance of your virtual machines running in the cloud.

It is also important to consider the compute requirements of resource nodes within the cloud. Resource nodes refer to non-hypervisor nodes providing the following in the cloud:
• Controller
• Object storage
• Block storage
• Networking services
The number of processor cores and threads has a direct correlation to the number of worker threads which can be run on a resource node. As a result, you must make design decisions relating directly to the service, as well as provide a balanced infrastructure for all services.

Workload profiles are unpredictable in a general purpose cloud. Additional compute resource pools can be added to the cloud later, reducing the stress of unpredictability. In some cases, the demand on certain instance types or flavors may not justify individual hardware design. In either of these cases, initiate the design by allocating hardware designs that are capable of servicing the most common instance requests.
If you are looking to add additional hardware designs to the overall architecture, this can be done at a later time.

Designing network resources
OpenStack clouds traditionally have multiple network segments, each of which provides access to resources within the cloud to both operators and tenants. The network services themselves also require network communication paths which should be separated from the other networks. When designing network services for a general purpose cloud, we recommend planning for a physical or logical separation of network segments that will be used by operators and tenants. We further suggest the creation of an additional network segment for access to internal services such as the message bus and database used by the various cloud services. Segregating these services onto separate networks helps to protect sensitive data and protects against unauthorized access to services.

Based on the requirements of instances being serviced in the cloud, the choice of network service will be the next decision that affects your design architecture.

The choice between legacy networking (nova-network), as a part of OpenStack Compute, and OpenStack Networking (neutron) has a huge impact on the architecture and design of the cloud network infrastructure.

Legacy networking (nova-network)
The legacy networking (nova-network) service is primarily a layer-2 networking service that functions in two modes. In legacy networking, the two modes differ in their use of VLANs. When using legacy networking in a flat network mode, all network hardware nodes and devices throughout the cloud are connected to a single layer-2 network segment that provides access to application data.

When the network devices in the cloud support segmentation using VLANs, legacy networking can operate in the second mode.
In this design model, each tenant within the cloud is assigned a network subnet which is mapped to a VLAN on the physical network. It is especially important to remember that a maximum of 4096 VLANs can be used within a spanning tree domain. This limitation places a hard limit on the amount of growth possible within the data center. When designing a general purpose cloud intended to support multiple tenants, we recommend the use of legacy networking with VLANs, and not in flat network mode.

Another consideration regarding networking is the fact that legacy networking is entirely managed by the cloud operator; tenants do not have control over network resources. If tenants require the ability to manage and create network resources such as network segments and subnets, it will be necessary to install the OpenStack Networking service to provide network access to instances.

OpenStack Networking (neutron)
OpenStack Networking (neutron) is a first class networking service that gives full control over creation of virtual network resources to tenants. This is often accomplished in the form of tunneling protocols which establish encapsulated communication paths over existing network infrastructure in order to segment tenant traffic. These methods vary depending on the specific implementation, but some of the more common methods include tunneling over GRE, encapsulating with VXLAN, and VLAN tags.

Initially, it is suggested to design at least three network segments, the first of which will be used for access to the cloud's REST APIs by tenants and operators. This is referred to as a public network. In most cases, the controller nodes and swift proxies within the cloud will be the only devices necessary to connect to this network segment. In some cases, this network might also be serviced by hardware load balancers and other network devices.
The next segment is used by cloud administrators to manage hardware resources and is also used by configuration management tools when deploying software and services onto new hardware. In some cases, this network segment might also be used for internal services, including the message bus and database services, to communicate with each other. Because of the sensitive nature of the traffic on this network segment, it should be secured against unauthorized access. This network will likely need to communicate with every hardware node within the cloud.

The last network segment is used by applications and consumers to provide access to the physical network and also for users accessing applications running within the cloud. This network is generally segregated from the one used to access the cloud APIs and is not capable of communicating directly with the hardware resources in the cloud. Compute resource nodes will need to communicate on this network segment, as will any network gateway services which allow application data to access the physical network outside of the cloud.

Designing storage resources
OpenStack has two independent storage services to consider, each with its own specific design requirements and goals. In addition to services which provide storage as their primary function, there are additional design considerations with regard to compute and controller nodes which will affect the overall cloud architecture.

Designing OpenStack Object Storage
When designing hardware resources for OpenStack Object Storage, the primary goal is to maximize the amount of storage in each resource node while also ensuring that the cost per terabyte is kept to a minimum. This often involves utilizing servers which can hold a large number of spinning disks.
Whether choosing to use 2U server form factors with directly attached storage or an external chassis that holds a larger number of drives, the main goal is to maximize the storage available in each node.

We do not recommend investing in enterprise class drives for an OpenStack Object Storage cluster. The consistency and partition tolerance characteristics of OpenStack Object Storage will ensure that data stays up to date and survives hardware faults without the use of any specialized data replication devices.

One of the benefits of OpenStack Object Storage is the ability to mix and match drives by making use of weighting within the swift ring. When designing your swift storage cluster, we recommend making use of the most cost effective storage solution available at the time. Many server chassis on the market can hold 60 or more drives in 4U of rack space; therefore, we recommend maximizing the amount of storage per rack unit at the best cost per terabyte. Furthermore, we do not recommend the use of RAID controllers in an object storage node.

To achieve durability and availability of data stored as objects, it is important to design object storage resource pools to ensure they can provide the suggested availability. Considering rack-level and zone-level designs to accommodate the number of replicas configured to be stored in the Object Storage service (the default number of replicas is three) is important when designing beyond the hardware node level. Each replica of data should exist in its own availability zone with its own power, cooling, and network resources available to service that specific zone.

Object storage nodes should be designed so that the number of requests does not hinder the performance of the cluster. The object storage service uses a chatty protocol; therefore, making use of multiple processors that have higher core counts will ensure the IO requests do not inundate the server.
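To make the effect of replication and drive weighting concrete, the following Python sketch estimates usable capacity and relative ring weights for a node with mixed drive sizes. It is only an illustration under stated assumptions: the function names are hypothetical, the drive mix (30 × 4 TB plus 30 × 6 TB in one 60-drive chassis) is invented for the example, and setting a drive's weight in proportion to its capacity is a common convention rather than a requirement.

```python
def usable_capacity_tb(drive_sizes_tb, replicas=3):
    """Capacity left for unique data when every object is stored `replicas` times."""
    return sum(drive_sizes_tb) / replicas


def relative_weights(drive_sizes_tb, weight_per_tb=100):
    """Ring weights proportional to drive size (assumed convention)."""
    return [size * weight_per_tb for size in drive_sizes_tb]


# Hypothetical 60-drive chassis mixing two drive sizes.
drives = [4] * 30 + [6] * 30

print(sum(drives))                   # raw capacity in TB: 300
print(usable_capacity_tb(drives))    # 100.0 with the default of three replicas
print(relative_weights([4, 6]))      # [400, 600]
```

With three replicas, only about one third of the raw capacity holds unique data, which is one reason cost per terabyte dominates the hardware choice.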
Designing OpenStack Block Storage
When designing OpenStack Block Storage resource nodes, it is helpful to understand the workloads and requirements that will drive the use of block storage in the cloud. We recommend designing block storage pools so that tenants can choose appropriate storage solutions for their applications. By creating multiple storage pools of different types, in conjunction with configuring an advanced storage scheduler for the block storage service, it is possible to provide tenants with a large catalog of storage services with a variety of performance levels and redundancy options.

Block storage also takes advantage of a number of enterprise storage solutions. These are addressed via a plug-in driver developed by the hardware vendor. A large number of enterprise storage plug-in drivers ship out-of-the-box with OpenStack Block Storage (and many more are available via third party channels). General purpose clouds are more likely to use directly attached storage in the majority of block storage nodes, so it may be necessary to provide additional levels of service to tenants that only enterprise class storage solutions can offer.

Redundancy and availability requirements impact the decision to use a RAID controller card in block storage nodes. The input-output per second (IOPS) demand of your application will influence whether or not you should use a RAID controller, and which level of RAID is required. Making use of higher performing RAID volumes is suggested when considering performance. However, where redundancy of block storage volumes is more important, we recommend making use of a redundant RAID configuration such as RAID 5 or RAID 6. Some specialized features, such as automated replication of block storage volumes, may require the use of third-party plug-ins and enterprise block storage solutions in order to meet the high demand on storage.
Furthermore, where extreme performance is a requirement, it may also be necessary to make use of high speed SSD drives or other high performing flash storage solutions.

Software selection
The software selection process plays a large role in the architecture of a general purpose cloud. The following have a large impact on the design of the cloud:
• Choice of operating system
• Selection of OpenStack software components
• Choice of hypervisor
• Selection of supplemental software
Operating system (OS) selection plays a large role in the design and architecture of a cloud. There are a number of OSes which have native support for OpenStack, including:
• Ubuntu
• Red Hat Enterprise Linux (RHEL)
• CentOS
• SUSE Linux Enterprise Server (SLES)

Note
Native support is not a constraint on the choice of OS; users are free to choose just about any Linux distribution (or even Microsoft Windows) and install OpenStack directly from source (or compile their own packages). However, many organizations will prefer to install OpenStack from distribution-supplied packages or repositories (although using the distribution vendor's OpenStack packages might be a requirement for support).

OS selection also directly influences hypervisor selection. A cloud architect who selects Ubuntu, RHEL, or SLES has some flexibility in hypervisor; KVM, Xen, and LXC are supported virtualization methods available under OpenStack Compute (nova) on these Linux distributions. However, a cloud architect who selects Hyper-V is limited to Windows Server. Similarly, a cloud architect who selects XenServer is limited to the CentOS-based dom0 operating system provided with XenServer.

The primary factors that play into OS-hypervisor selection include:

User requirements
The selection of OS-hypervisor combination first and foremost needs to support the user requirements.

Support
The selected OS-hypervisor combination needs to be supported by OpenStack.
Interoperability
The OS-hypervisor needs to be interoperable with other features and services in the OpenStack design in order to meet the user requirements.

Hypervisor
OpenStack supports a wide variety of hypervisors, one or more of which can be used in a single cloud. These hypervisors include:
• KVM (and QEMU)
• XCP/XenServer
• vSphere (vCenter and ESXi)
• Hyper-V
• LXC
• Docker
• Bare-metal
A complete list of supported hypervisors and their capabilities can be found at OpenStack Hypervisor Support Matrix.

We recommend general purpose clouds use hypervisors that support the most general purpose use cases, such as KVM and Xen. More specific hypervisors should be chosen to account for specific functionality or a supported feature requirement. In some cases, there may also be a mandated requirement to run software on a certified hypervisor, including solutions from VMware, Microsoft, and Citrix.

The features offered through the OpenStack cloud platform determine the best choice of a hypervisor. As an example, for a general purpose cloud that predominantly supports a Microsoft-based migration, or is managed by staff that has a particular skill for managing certain hypervisors and operating systems, Hyper-V would be the best available choice. While the decision to use Hyper-V does not limit the ability to run alternative operating systems, be mindful of those that are deemed supported. Each hypervisor also has its own hardware requirements, which may affect the decisions around designing a general purpose cloud. For example, using VMware's live migration feature, vMotion, requires an installation of vCenter/vSphere and the use of the ESXi hypervisor, which increases the infrastructure requirements.
In a mixed hypervisor environment, specific aggregates of compute resources, each with defined capabilities, enable workloads to utilize software and hardware specific to their particular requirements. This functionality can be exposed explicitly to the end user, or accessed through defined metadata within a particular flavor of an instance.

OpenStack components
A general purpose OpenStack cloud design should incorporate the core OpenStack services to provide a wide range of services to end users. The OpenStack core services recommended in a general purpose cloud are:
• OpenStack Compute (nova)
• OpenStack Networking (neutron)
• OpenStack Image Service (glance)
• OpenStack Identity (keystone)
• OpenStack dashboard (horizon)
• Telemetry module (ceilometer)
A general purpose cloud may also include OpenStack Object Storage (swift) and OpenStack Block Storage (cinder). These may be selected to provide storage to applications and instances.

Note
However, depending on the use case, these could be optional.

Supplemental software
A general purpose OpenStack deployment consists of more than just OpenStack-specific components. A typical deployment involves services that provide supporting functionality, including databases and message queues, and may also involve software to provide high availability of the OpenStack environment. Design decisions around the underlying message queue, as well as the technology chosen to provide highly resilient database functionality (such as MariaDB with Galera), might affect the required number of controller services. In such a scenario, replication of services relies on quorum. Therefore, the underlying database, for example, should consist of at least 3 nodes to account for the recovery of a failed Galera node. When increasing the number of nodes to support a feature of the software, consideration of rack space and switch port density becomes important.
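The reason quorum-based replication leads to a three-node minimum can be shown with simple majority arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes a plain majority rule with equally weighted nodes.

```python
def quorum_size(nodes):
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """Nodes that can fail while the survivors still hold a majority."""
    return nodes - quorum_size(nodes)


for n in (2, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```

A two-node cluster cannot lose either node without losing its majority, while three nodes tolerate one failure. A fourth node adds rack space and switch ports without tolerating any additional failures, which is why odd node counts are usually preferred.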
While many general purpose deployments use hardware load balancers to provide highly available API access and SSL termination, software solutions, for example HAProxy, can also be considered. It is vital to ensure that such software implementations are also made highly available. High availability can be achieved by using software such as Keepalived or Pacemaker with Corosync. Pacemaker and Corosync can provide active-active or active-passive highly available configuration depending on the specific service in the OpenStack environment. Using this software can affect the design, as it assumes at least a 2-node controller infrastructure where one of those nodes may be running certain services in standby mode.

Memcached is a distributed memory object caching system, and Redis is a key-value store. Both are deployed on general purpose clouds to assist in alleviating load on the Identity service. The memcached service caches tokens, and due to its distributed nature it can help alleviate some bottlenecks in the underlying authentication system. Using memcached or Redis does not affect the overall design of your architecture, as they tend to be deployed onto the infrastructure nodes providing the OpenStack services.

Performance
Performance of an OpenStack deployment is dependent on a number of factors related to the infrastructure and controller services. The user requirements can be split into general network performance, performance of compute resources, and performance of storage systems.

Controller infrastructure
The Controller infrastructure nodes provide management services to the end user as well as providing services internally for the operation of the cloud. The Controllers run message queuing services that carry system messages between each service. Performance issues related to the message bus would lead to delays in sending a message to where it needs to go.
The result of this condition would be delays in operational functions such as spinning up and deleting instances, provisioning new storage volumes, and managing network resources. Such delays could adversely affect an application’s ability to react to certain conditions, especially when using auto-scaling features. It is important to properly design the hardware used to run the controller infrastructure as outlined above in the Hardware Selection section.

Performance of the controller services is not limited only by processing power; restrictions may also emerge in serving concurrent users. Ensure that the APIs and Horizon services are load tested to confirm that you are able to serve your customers. Particular attention should be paid to the OpenStack Identity Service (Keystone), which provides the authentication and authorization for all services, both internally to OpenStack itself and to end users. This service can lead to a degradation of overall performance if it is not sized appropriately.

Network performance
In a general purpose OpenStack cloud, the requirements of the network help determine performance capabilities. For example, small deployments may employ 1 Gigabit Ethernet (GbE) networking, whereas larger installations serving multiple departments or many users would be better architected with 10 GbE networking. The performance of the running instances will be limited by these speeds. It is possible to design OpenStack environments that run a mix of networking capabilities. By utilizing the different interface speeds, the users of the OpenStack environment can choose networks that are fit for their purpose.
For example, web application instances may run on a public network presented through OpenStack Networking that has 1 GbE capability, whereas the back-end database uses an OpenStack Networking network that has 10 GbE capability to replicate its data. In some cases, the design may incorporate link aggregation for greater throughput.

Network performance can be boosted considerably by implementing hardware load balancers to provide front-end service to the cloud APIs. The hardware load balancers also perform SSL termination if that is a requirement of your environment. When implementing SSL offloading, it is important to understand the SSL offloading capabilities of the devices selected.

Compute host
The choice of hardware specifications used in compute nodes, including CPU, memory, and disk type, directly affects the performance of the instances. Other factors which can directly affect performance include tunable parameters within the OpenStack services, for example the overcommit ratio applied to resources. The defaults in OpenStack Compute set a 16:1 overcommit of the CPU and a 1.5:1 overcommit of the memory. Running at such high ratios leads to an increase in "noisy-neighbor" activity. Care must be taken when sizing your Compute environment to avoid this scenario. For running general purpose OpenStack environments it is possible to keep to the defaults, but make sure to monitor your environment as usage increases.

Storage performance
When considering performance of OpenStack Block Storage, hardware and architecture choice is important. Block Storage can use enterprise back-end systems such as NetApp or EMC, scale out storage such as GlusterFS and Ceph, or simply use the capabilities of directly attached storage in the nodes themselves. Block Storage may be deployed so that traffic traverses the host network, which could affect, and be adversely affected by, the front-side API traffic performance. As such, consider using a dedicated data storage network with dedicated interfaces on the Controller and Compute hosts.

When considering performance of OpenStack Object Storage, a number of design choices will affect performance. A user’s access to the Object Storage is through the proxy services, which sit behind hardware load balancers. By the very nature of a highly resilient storage system, replication of the data would affect performance of the overall system. In this case, 10 GbE (or better) networking is recommended throughout the storage network architecture.

Availability
In OpenStack, the infrastructure is integral to providing services and should always be available, especially when operating with SLAs. Ensuring network availability is accomplished by designing the network architecture so that no single point of failure exists. A consideration of the number of switches, routes, and redundancies of power should be factored into core infrastructure, as well as the associated bonding of networks to provide diverse routes to your highly available switch infrastructure.

The OpenStack services themselves should be deployed across multiple servers that do not represent a single point of failure. Ensuring API availability can be achieved by placing these services behind highly available load balancers that have multiple OpenStack servers as members.

OpenStack lends itself to deployment in a highly available manner where it is expected that at least 2 servers be utilized. These can run all the services involved, from the message queuing service (for example, RabbitMQ or Qpid) to an appropriately deployed database service such as MySQL or MariaDB. As services in the cloud are scaled out, back-end services will need to scale too. Monitoring and reporting on server utilization and response times, as well as load testing your systems, will help determine scale out decisions.
Care must be taken when deciding network functionality. Currently, OpenStack supports both the legacy networking (nova-network) system and the newer, extensible OpenStack Networking (neutron). Both have their pros and cons when it comes to providing highly available access. Legacy networking, which provides networking access maintained in the OpenStack Compute code, provides a feature that removes a single point of failure when it comes to routing, and this feature is currently missing in OpenStack Networking. Legacy networking’s multi-host functionality restricts failure domains to the host running that instance.

When using OpenStack Networking, the OpenStack controller servers or separate Networking hosts handle routing. For a deployment that requires features available only in Networking, it is possible to remove this restriction by using third party software that helps maintain highly available L3 routes. Doing so allows for common APIs to control network hardware, or to provide complex multi-tier web applications in a secure manner. It is also possible to completely remove routing from Networking, and instead rely on hardware routing capabilities. In this case, the switching infrastructure must support L3 routing.

OpenStack Networking and legacy networking both have their advantages and disadvantages. They are both valid and supported options that fit different network deployment models described in the OpenStack Operations Guide.

Ensure your deployment has adequate back-up capabilities. As an example, in a deployment that has two infrastructure controller nodes, the design should include controller availability. If a single controller is lost, cloud services will continue to run from the remaining controller.
Where the design has higher availability requirements, it is important to meet those requirements by designing the proper redundancy and availability of controller nodes.

Application design must also be factored into the capabilities of the underlying cloud infrastructure. If the compute hosts do not provide a seamless live migration capability, then it must be expected that when a compute host fails, the instances on that host and any data local to them will be deleted. Conversely, when providing an expectation to users that instances have a high level of uptime guarantees, the infrastructure must be deployed in a way that eliminates any single point of failure when a compute host disappears. This may include utilizing shared file systems on enterprise storage or OpenStack Block Storage to provide a level of guarantee to match service features.

For more information on high availability in OpenStack, see the OpenStack High Availability Guide.

Security
A security domain comprises users, applications, servers, or networks that share common trust requirements and expectations within a system. Typically they have the same authentication and authorization requirements and users.

These security domains are:
• Public
• Guest
• Management
• Data
These security domains can be mapped to an OpenStack deployment individually, or combined. For example, some deployment topologies combine both guest and data domains onto one physical network, whereas in other cases these networks are physically separated. In each case, the cloud operator should be aware of the appropriate security concerns. Security domains should be mapped out against your specific OpenStack deployment topology. The domains and their trust requirements depend upon whether the cloud instance is public, private, or hybrid.

The public security domain is an entirely untrusted area of the cloud infrastructure.
It can refer to the Internet as a whole or simply to networks over which you have no authority. This domain should always be considered untrusted.

Typically used for compute instance-to-instance traffic, the guest security domain handles compute data generated by instances on the cloud but not services that support the operation of the cloud, such as API calls. Public cloud providers and private cloud providers who do not have stringent controls on instance use or who allow unrestricted Internet access to instances should consider this domain to be untrusted. Private cloud providers may want to consider this network as internal and therefore trusted only if they have controls in place to assert that they trust instances and all their tenants.

The management security domain is where services interact. Sometimes referred to as the "control plane", the networks in this domain transport confidential data such as configuration parameters, user names, and passwords. In most deployments this domain is considered trusted.

The data security domain is concerned primarily with information pertaining to the storage services within OpenStack. Much of the data that crosses this network has high integrity and confidentiality requirements and, depending on the type of deployment, may also have strong availability requirements. The trust level of this network is heavily dependent on other deployment decisions.

When deploying OpenStack in an enterprise as a private cloud it is usually behind the firewall and within the trusted network alongside existing systems. Users of the cloud are, traditionally, employees who are bound by the security requirements set forth by the company. This tends to push most of the security domains towards a more trusted model. However, when deploying OpenStack in a public facing role, no assumptions can be made and the attack vectors significantly increase.
For example, the API endpoints, along with the software behind them, become vulnerable to bad actors wanting to gain unauthorized access or prevent access to services, which could lead to loss of data, functionality, and reputation. These services must be protected through auditing and appropriate filtering.

Consideration must be taken when managing the users of the system for both public and private clouds. The Identity service allows for LDAP to be part of the authentication process. Including such systems in an OpenStack deployment may ease user management if integrating into existing systems.

It is important to understand that user authentication requests include sensitive information such as user names, passwords and authentication tokens. For this reason, placing the API services behind hardware that performs SSL termination is strongly recommended.

For more information on OpenStack security, see the OpenStack Security Guide.

Operational considerations

In the planning and design phases of the build out, it is important to include the operations function. Operational factors affect the design choices for a general purpose cloud, and operations staff are often tasked with the maintenance of cloud environments for larger installations.

Knowing when and where to implement redundancy and high availability is directly affected by expectations set by the terms of the Service Level Agreements (SLAs). SLAs are contractual obligations that provide assurances for service availability. They define the levels of availability that drive the technical design, often with penalties for not meeting contractual obligations. SLA terms that will affect the design include:

• API availability guarantees implying multiple infrastructure services, and highly available load balancers.
• Network uptime guarantees affecting switch design, which might require redundant switching and power.
• Network security policy requirements that need to be factored into deployments.

Support and maintainability

To be able to support and maintain an installation, OpenStack cloud management requires operations staff to understand the design architecture. The skill level of the operations and engineering staff, and the level of separation between them, depend on the size and purpose of the installation. Large cloud service providers, or telecom providers, are more likely to be managed by a specially trained, dedicated operations organization. Smaller implementations are more likely to rely on support staff who need to take on combined engineering, design and operations functions.

Maintaining OpenStack installations requires a variety of technical skills. For example, if you incorporate features into an architecture and design that reduce the operations burden, it is advisable to automate the operations functions. It may, however, be beneficial to use third party management companies with special expertise in managing OpenStack deployments.

Monitoring

OpenStack clouds require appropriate monitoring platforms to ensure errors are caught and managed appropriately. Specific metrics that are critically important to monitor include:

• Image disk utilization
• Response time to the Compute API

Leveraging existing monitoring systems is an effective check to ensure OpenStack environments can be monitored.

Downtime

To effectively run cloud installations, initial downtime planning includes creating processes and architectures that support the following:

• Planned (maintenance)
• Unplanned (system faults)

Resiliency of the overall system and of individual components is dictated by the requirements of the SLA, meaning that designing for high availability (HA) can have cost ramifications.
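The relationship between an SLA availability target and the downtime it permits can be made concrete with simple arithmetic. The following is a minimal sketch only: the function name and the sample availability targets are illustrative assumptions and do not come from this guide or from any OpenStack API.

```python
# Hypothetical helper: converts an SLA availability percentage into the
# amount of downtime it allows. Names and sample targets are illustrative.

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Return the hours of downtime permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_hours * (100 - availability_percent) / 100


if __name__ == "__main__":
    # Sample targets chosen for illustration only.
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print("%.2f%% availability allows %.2f hours of downtime per year"
              % (target, hours))
```

Comparing the planned maintenance time against this budget gives a rough indication of how much room is left for unplanned faults, and therefore how much redundancy the design may need.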
For example, if a compute host failed, this would be an operational consideration, requiring the restoration of instances from a snapshot or re-spawning an instance. The overall application design is impacted: general purpose clouds should not need to provide abilities to migrate instances from one host to another. Additional considerations need to be made around supporting instance migration if the expectation is that the application will be designed to tolerate failure. Extra support services, including shared storage attached to compute hosts, might need to be deployed in this example.

Capacity planning

Capacity constraints for a general purpose cloud environment include:

• Compute limits
• Storage limits

A relationship exists between the size of the compute environment and the number of OpenStack infrastructure controller nodes required to support it.

Increasing the size of the compute environment increases the network traffic and messages, adding load to the controller or networking nodes. Effective monitoring of the environment will help with capacity decisions on scaling.

Compute nodes automatically attach to OpenStack clouds, resulting in a horizontally scaling process when adding extra compute capacity to an OpenStack cloud. Additional processes are required to place nodes into appropriate availability zones and host aggregates. When adding compute nodes to environments, ensure identical or functionally compatible CPUs are used, otherwise live migration features will break. It is necessary to add rack capacity or network switches, as scaling out compute hosts directly affects network and data center resources.

Assessing the average workloads and increasing the number of instances that can run within the compute environment by adjusting the overcommit ratio is another option.
It is important to remember that changing the CPU overcommit ratio can have a detrimental effect and potentially increase noisy neighbor problems. The additional risk of increasing the overcommit ratio is that more instances fail when a compute host fails.

Compute host components can also be upgraded to account for increases in demand; this is known as vertical scaling. Upgrading CPUs with more cores, or increasing the overall server memory, can add extra needed capacity depending on whether the running applications are more CPU intensive or memory intensive.

Insufficient disk capacity could also have a negative effect on overall performance, including CPU and memory usage. Depending on the back-end architecture of the OpenStack Block Storage layer, adding capacity includes adding disk shelves to enterprise storage systems or installing additional block storage nodes. Upgrading directly attached storage installed in compute hosts, and adding capacity to the shared storage to provide additional ephemeral storage to instances, may be necessary.

For a deeper discussion on many of these topics, refer to the OpenStack Operations Guide.

Architecture

Hardware selection involves three key areas:

• Compute
• Network
• Storage

Selecting hardware for a general purpose OpenStack cloud should reflect a cloud with no pre-defined usage model. General purpose clouds are designed to run a wide variety of applications with varying resource usage requirements. These applications include any of the following:

• RAM-intensive
• CPU-intensive
• Storage-intensive

Hardware chosen for a general purpose OpenStack cloud must provide balanced access to all major resources.

Certain hardware form factors may better suit a general purpose OpenStack cloud due to the requirement for equal (or nearly equal) balance of resources.
Server hardware must provide the following:

• Equal (or nearly equal) balance of compute capacity (RAM and CPU)
• Network capacity (number and speed of links)
• Storage capacity (gigabytes or terabytes as well as Input/Output Operations Per Second (IOPS))

Server hardware is evaluated around four conflicting dimensions.

Server density
A measure of how many servers can fit into a given measure of physical space, such as a rack unit [U].

Resource capacity
The number of CPU cores, how much RAM, or how much storage a given server will deliver.

Expandability
The number of additional resources that can be added to a server before it has reached its limit.

Cost
The relative purchase price of the hardware weighted against the level of design effort needed to build the system.

Increasing server density means sacrificing resource capacity or expandability. However, increasing resource capacity and expandability increases cost and decreases server density. As a result, determining the best server hardware for a general purpose OpenStack architecture means understanding how choice of form factor will impact the rest of the design. The following list outlines the form factors to choose from:

• Blade servers typically support dual-socket multi-core CPUs, which is the configuration generally considered to be the "sweet spot" for a general purpose cloud deployment. Blades also offer outstanding density. As an example, both HP BladeSystem and Dell PowerEdge M1000e support up to 16 servers in only 10 rack units. However, the blade servers themselves often have limited storage and networking capacity. Additionally, the expandability of many blade servers can be limited.
• 1U rack-mounted servers occupy only a single rack unit. Their benefits include high density, support for dual-socket multi-core CPUs, and support for reasonable RAM amounts.
This form factor offers limited storage capacity, limited network capacity, and limited expandability.
• 2U rack-mounted servers offer the expanded storage and networking capacity that 1U servers tend to lack, but with a corresponding decrease in server density (half the density offered by 1U rack-mounted servers).
• Larger rack-mounted servers, such as 4U servers, will tend to offer even greater CPU capacity, often supporting four or even eight CPU sockets. These servers often have much greater expandability so will provide the best option for upgradability. This means, however, that the servers have a much lower server density and a much greater hardware cost.
• "Sled servers" are rack-mounted servers that support multiple independent servers in a single 2U or 3U enclosure. This form factor offers increased density over typical 1U-2U rack-mounted servers but tends to suffer from limitations in the amount of storage or network capacity each individual server supports.

The best form factor for server hardware supporting a general purpose OpenStack cloud is driven by outside business and cost factors. No single reference architecture will apply to all implementations; the decision must flow from user requirements, technical considerations, and operational considerations. Here are some of the key factors that influence the selection of server hardware:

Instance density
Sizing is an important consideration for a general purpose OpenStack cloud. The expected or anticipated number of instances that each hypervisor can host is a common metric used in sizing the deployment. The selected server hardware needs to support the expected or anticipated instance density.

Host density
Physical data centers have limited physical space, power, and cooling. The number of hosts (or hypervisors) that can be fitted into a given metric (rack, rack unit, or floor tile) is another important method of sizing.
Floor weight is an often overlooked consideration. The data center floor must be able to support the weight of the proposed number of hosts within a rack or set of racks. These factors need to be applied as part of the host density calculation and server hardware selection.

Power density
Data centers have a specified amount of power fed to a given rack or set of racks. Older data centers may have a power density as low as 20 amps per rack, while more recent data centers can be architected to support power densities as high as 120 amps per rack. The selected server hardware must take power density into account.

Network connectivity
The selected server hardware must have the appropriate number of network connections, as well as the right type of network connections, in order to support the proposed architecture. Ensure that there are at least two diverse network connections coming into each rack. For architectures requiring even more redundancy, it might be necessary to confirm that the network connections are from diverse telecom providers. Many data centers have that capacity available.

The selection of form factors or architectures affects the selection of server hardware. For example, if the design is a scale-out storage architecture, then the server hardware selection will require careful consideration when matching the requirements set to the commercial solution.

Ensure that the selected server hardware is configured to support enough storage capacity (or storage expandability) to match the requirements of the selected scale-out storage solution. For example, if a centralized storage solution is required, such as a centralized storage array from a storage vendor that has InfiniBand or FDDI connections, the server hardware will need to have appropriate network adapters installed to be compatible with the storage array vendor's specifications.
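The host density, floor weight, and power density factors described above can be combined into a rough per-rack estimate, where the tightest constraint determines how many hosts fit. The following is a minimal sketch under stated assumptions: the function name, the 42U rack, the per-host power draw, and the weight figures are all hypothetical values chosen for illustration, not vendor data or recommendations from this guide.

```python
# Illustrative sketch: estimate how many hosts fit in one rack, given
# space, power, and floor weight limits. All figures are assumptions.


def hosts_per_rack(rack_units, rack_amps, rack_weight_kg,
                   host_units, host_amps, host_weight_kg):
    """Return (host_count, limiting_factor) for a single rack."""
    limits = {
        "space": int(rack_units // host_units),
        "power": int(rack_amps // host_amps),
        "weight": int(rack_weight_kg // host_weight_kg),
    }
    # The smallest limit is the one that actually constrains the rack.
    limiting = min(limits, key=limits.get)
    return limits[limiting], limiting


if __name__ == "__main__":
    # Hypothetical 1U host drawing 1.5 amps and weighing 18 kg.
    host = dict(host_units=1, host_amps=1.5, host_weight_kg=18)

    # Older data center: 20 amps per rack.
    print(hosts_per_rack(42, 20, 900, **host))

    # Newer data center: 120 amps per rack.
    print(hosts_per_rack(42, 120, 900, **host))
```

With these assumed figures, the 20 amp rack is limited by power well before it runs out of rack units, while the 120 amp rack is limited by physical space, which illustrates why power density belongs in the host density calculation.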
Similarly, the network architecture will have an impact on the server hardware selection and vice versa. For example, make sure that the server is configured with enough additional network ports and expansion cards to support all of the networks required. There is variability in network expansion cards, so it is important to be aware of potential impacts or interoperability issues with other components in the architecture.

Selecting storage hardware

Storage hardware architecture is largely determined by the selected storage architecture. The selection of storage architecture, as well as the corresponding storage hardware, is determined by evaluating possible solutions against the critical factors, the user requirements, technical considerations, and operational considerations. Factors that need to be incorporated into the storage architecture include:

Cost
Storage can be a significant portion of the overall system cost. For an organization that is concerned with vendor support, a commercial storage solution is advisable, although it comes with a higher price tag. If initial capital expenditure must be minimized, designing a system based on commodity hardware is appropriate. The trade-off is potentially higher support costs and a greater risk of incompatibility and interoperability issues.

Scalability
Scalability, along with expandability, is a major consideration in a general purpose OpenStack cloud. It might be difficult to predict the final intended size of the implementation as there are no established usage patterns for a general purpose cloud. It might become necessary to expand the initial deployment in order to accommodate growth and user demand.

Expandability
Expandability is a major architecture factor for storage solutions in a general purpose OpenStack cloud. A storage solution that expands to 50 PB is considered more expandable than a solution that only scales to 10 PB.
This metric is related to, but different from, scalability, which is a measure of the solution's performance as it expands. For example, the storage architecture for a cloud that is intended for a development platform may not have the same expandability and scalability requirements as a cloud that is intended for a commercial product.

Using a scale-out storage solution with direct-attached storage (DAS) in the servers is well suited for a general purpose OpenStack cloud. For example, it is possible to populate storage either in the compute hosts, similar to a grid computing solution, or in hosts dedicated to providing block storage exclusively. When deploying storage in the compute hosts, appropriate hardware that can support both the storage and compute services on the same hardware will be required.

Understanding the requirements of cloud services will help determine which scale-out solution should be used, and whether a single, highly expandable, vertically scalable, centralized storage array should be included in the design. Once an approach has been determined, the storage hardware needs to be selected based on these criteria.

This list expands upon the potential impacts of including a particular storage architecture (and corresponding storage hardware) in the design for a general purpose OpenStack cloud:

Connectivity
Ensure that, if storage protocols other than Ethernet are part of the storage solution, the appropriate hardware has been selected. If a centralized storage array is selected, ensure that the hypervisor will be able to connect to that storage array for image storage.

Usage
How the particular storage architecture will be used is critical for determining the architecture.
Some of the configurations that will influence the architecture include whether it will be used by the hypervisors for ephemeral instance storage or if OpenStack Object Storage will use it for object storage.

Instance and image locations
Where instances and images will be stored will influence the architecture.

Server hardware
If the solution is a scale-out storage architecture that includes DAS, it will affect the server hardware selection. This could ripple into the decisions that affect host density, instance density, power density, OS-hypervisor, management tools and others.

A general purpose OpenStack cloud has multiple storage options. The key factors that will have an influence on selection of storage hardware for a general purpose OpenStack cloud are as follows:

Capacity
Hardware resources selected for the resource nodes should be capable of supporting enough storage for the cloud services. Defining the initial requirements and ensuring the design can support adding capacity is important. Hardware nodes selected for object storage should be capable of supporting a large number of inexpensive disks with no reliance on RAID controller cards. Hardware nodes selected for block storage should be capable of supporting high speed storage solutions and RAID controller cards to provide performance and redundancy to storage at a hardware level. Selecting hardware RAID controllers that automatically repair damaged arrays will assist with the replacement and repair of degraded or destroyed storage devices.

Performance
Disks selected for object storage services do not need to be fast performing disks. We recommend that object storage nodes take advantage of the best cost per terabyte available for storage. In contrast, disks chosen for block storage services should take advantage of performance boosting features that may entail the use of SSDs or flash storage to provide high performance block storage pools.
Storage performance of ephemeral disks used for instances should also be taken into consideration. If compute pools are expected to have a high utilization of ephemeral storage, or require very high performance, it would be advantageous to deploy hardware solutions similar to those used for block storage.

Fault tolerance
Object storage resource nodes have no requirements for hardware fault tolerance or RAID controllers. It is not necessary to plan for fault tolerance within the object storage hardware because the object storage service provides replication between zones as a feature of the service. Block storage nodes, compute nodes and cloud controllers should all have fault tolerance built in at the hardware level by making use of hardware RAID controllers and varying levels of RAID configuration. The level of RAID chosen should be consistent with the performance and availability requirements of the cloud.

Selecting networking hardware

Selecting network architecture determines which network hardware will be used. Networking software is determined by the selected networking hardware. For example, selecting networking hardware that only supports Gigabit Ethernet (GbE) will impact the overall design. Similarly, deciding to use 10 Gigabit Ethernet (10 GbE) will have a number of impacts on various areas of the overall design.

There are more subtle design impacts that need to be considered. The selection of certain networking hardware (and the networking software) affects the management tools that can be used. There are exceptions to this; the rise of "open" networking software that supports a range of networking hardware means that there are instances where the relationship between networking hardware and networking software is not as tightly defined. An example of this type of software is Cumulus Linux, which is capable of running on a number of switch vendors' hardware solutions.
Some of the key considerations that should be included in the selection of networking hardware include:

Port count
The design will require networking hardware that has the requisite port count.

Port density
The network design will be affected by the physical space that is required to provide the requisite port count. A higher port density is preferred, as it leaves more rack space for compute or storage components that may be required by the design. This can also lead into concerns about fault domains and power density that should be considered. Higher density switches are more expensive, and this should also be considered, as it is important not to over-design the network if it is not required.

Port speed
The networking hardware must support the proposed network speed, for example: 1 GbE, 10 GbE, or 40 GbE (or even 100 GbE).

Redundancy
The level of network hardware redundancy required is influenced by the user requirements for high availability and cost considerations. Network redundancy can be achieved by adding redundant power supplies or paired switches. If this is a requirement, the hardware will need to support this configuration.

Power requirements
Ensure that the physical data center provides the necessary power for the selected network hardware.

Note: This may be an issue for spine switches in a leaf and spine fabric, or end of row (EoR) switches.

There is no single best practice architecture for the networking hardware supporting a general purpose OpenStack cloud that will apply to all implementations. Some of the key factors that will have a strong influence on selection of networking hardware include:

Connectivity
All nodes within an OpenStack cloud require network connectivity. In some cases, nodes require access to more than one network segment.
The design must encompass sufficient network capacity and bandwidth to ensure that all communications within the cloud, both north-south and east-west traffic, have sufficient resources available.

Scalability
The network design should encompass a physical and logical network design that can be easily expanded upon. Network hardware should offer the appropriate types of interfaces and speeds that are required by the hardware nodes.

Availability
To ensure that access to nodes within the cloud is not interrupted, we recommend that the network architecture identify any single points of failure and provide some level of redundancy or fault tolerance. With regard to the network infrastructure itself, this often involves use of networking protocols such as LACP, VRRP or others to achieve a highly available network connection. In addition, it is important to consider the networking implications on API availability. In order to ensure that the APIs, and potentially other services in the cloud, are highly available, we recommend you design a load balancing solution within the network architecture to accommodate these requirements.

Software selection

Software selection for a general purpose OpenStack architecture design needs to include these three areas:

• Operating system (OS) and hypervisor
• OpenStack components
• Supplemental software

Operating system and hypervisor

The operating system (OS) and hypervisor have a significant impact on the overall design. Selecting a particular operating system and hypervisor can directly affect server hardware selection. Make sure the storage hardware and topology support the selected operating system and hypervisor combination. Also ensure the networking hardware selection and topology will work with the chosen operating system and hypervisor combination.
For example, if the design uses Link Aggregation Control Protocol (LACP), the OS and hypervisor both need to support it.

Some areas that could be impacted by the selection of OS and hypervisor include:

Cost
Selecting a commercially supported hypervisor, such as Microsoft Hyper-V, will result in a different cost model than community-supported open source hypervisors such as KVM or Xen. When comparing open source OS solutions, choosing Ubuntu over Red Hat (or vice versa) will have an impact on cost due to support contracts.

Supportability
Depending on the selected hypervisor, staff should have the appropriate training and knowledge to support the selected OS and hypervisor combination. If they do not, training will need to be provided, which could have a cost impact on the design.

Management tools
The management tools used for Ubuntu and KVM differ from the management tools for VMware vSphere. Although both OS and hypervisor combinations are supported by OpenStack, there will be very different impacts to the rest of the design as a result of the selection of one combination versus the other.

Scale and performance
Ensure that selected OS and hypervisor combinations meet the appropriate scale and performance requirements. The chosen architecture will need to meet the targeted instance-host ratios with the selected OS-hypervisor combinations.

Security
Ensure that the design can accommodate regular periodic installations of application security patches while maintaining required workloads. The frequency of security patches for the proposed OS-hypervisor combination will have an impact on performance, and the patch installation process could affect maintenance windows.

Supported features
Determine which features of OpenStack are required. This will often determine the selection of the OS-hypervisor combination. Some features are only available with specific OSs or hypervisors.
For example, if certain features are not available, the design might need to be modified to meet the user requirements.

Interoperability
You will need to consider how the OS and hypervisor combination interacts with other operating systems and hypervisors, including other software solutions. Operational troubleshooting tools for one OS-hypervisor combination may differ from the tools used for another OS-hypervisor combination and, as a result, the design will need to address whether the two sets of tools need to interoperate.

OpenStack components

Selecting which OpenStack components are included in the overall design can have a significant impact. Some OpenStack components, like Compute and Image Service, are required in every architecture. Other components, like Orchestration, are not always required.

Excluding certain OpenStack components can limit or constrain the functionality of other components. For example, if the architecture includes Orchestration but excludes Telemetry, then the design will not be able to take advantage of Orchestration's auto-scaling functionality. It is important to research the component interdependencies in conjunction with the technical requirements before deciding on the final architecture.

Supplemental components

While OpenStack is a fairly complete collection of software projects for building a platform for cloud services, there are invariably additional pieces of software that need to be considered in any given OpenStack design.

Networking software

OpenStack Networking provides a wide variety of networking services for instances. There are many additional networking software packages that can be useful when managing OpenStack components.
Some examples include:

• Software to provide load balancing
• Network redundancy protocols
• Routing daemons

Some of these software packages are described in more detail in the Network controller cluster stack chapter of the OpenStack High Availability Guide.

For a general purpose OpenStack cloud, the OpenStack infrastructure components need to be highly available. If the design does not include hardware load balancing, networking software packages like HAProxy will need to be included.

Management software

The selected supplemental software solutions affect the overall OpenStack cloud design. This includes software for providing clustering, logging, monitoring and alerting.

Inclusion of clustering software, such as Corosync or Pacemaker, is determined primarily by the availability requirements. The impact of including (or not including) these software packages is primarily determined by the availability of the cloud infrastructure and the complexity of supporting the configuration after it is deployed. The OpenStack High Availability Guide provides more details on the installation and configuration of Corosync and Pacemaker, should these packages need to be included in the design.

Requirements for logging, monitoring, and alerting are determined by operational considerations. Each of these sub-categories includes a number of options. For example, in the logging sub-category one might consider Logstash, Splunk, VMware Log Insight, or some other log aggregation-consolidation tool. Logs should be stored in a centralized location to make it easier to perform analytics against the data. Log data analytics engines can also provide automation and issue notification by providing a mechanism to both alert and automatically attempt to remediate some of the more commonly known issues.
If these software packages are required, the design must account for the additional resource consumption (CPU, RAM, storage, and network bandwidth). Some other potential design impacts include:

• OS-hypervisor combination: Ensure that the selected logging, monitoring, or alerting tools support the proposed OS-hypervisor combination.
• Network hardware: The network hardware selection needs to be supported by the logging, monitoring, and alerting software.

Database software

OpenStack components often require access to back-end database services to store state and configuration information. Selecting an appropriate back-end database that satisfies the availability and fault tolerance requirements of the OpenStack services is required. OpenStack services support connecting to any database that is supported by the SQLAlchemy Python drivers; however, most database deployments use MySQL or a variant of it. We recommend that the database that provides back-end services within a general purpose cloud be made highly available, using a technology that can accomplish that goal.

Addressing performance-sensitive workloads

Although one of the key defining factors for a general purpose OpenStack cloud is that performance is not a determining factor, there may still be some performance-sensitive workloads deployed on the general purpose OpenStack cloud. For design guidance on performance-sensitive workloads, we recommend that you refer to the focused scenarios later in this guide. The resource-focused guides can be used as a supplement to this guide to help with decisions regarding performance-sensitive workloads.

Compute-focused workloads

In an OpenStack cloud that is compute-focused, there are some design choices that can help accommodate those workloads. Compute-focused workloads demand more CPU and memory resources, with lower priority given to storage and network performance.
For guidance on designing for this type of cloud, please refer to Chapter 3, “Compute focused”.

Network-focused workloads

In a network-focused OpenStack cloud, some design choices can improve the performance of these types of workloads. Network-focused workloads have extreme demands on network bandwidth and services that require specialized consideration and planning. For guidance on designing for this type of cloud, please refer to Chapter 5, “Network focused”.

Storage-focused workloads

Storage-focused OpenStack clouds need to be designed to accommodate workloads that have extreme demands on either object or block storage services. For guidance on designing for this type of cloud, please refer to Chapter 4, “Storage focused”.

Prescriptive example

An online classified advertising company wants to run web applications consisting of Tomcat, Nginx, and MariaDB in a private cloud. To meet policy requirements, the cloud infrastructure will run in their own data center. The company has predictable load requirements, but requires scaling to cope with nightly increases in demand. Their current environment does not have the flexibility to align with their goal of running an open source API environment. The current environment consists of the following:

• Between 120 and 140 installations of Nginx and Tomcat, each with 2 vCPUs and 4 GB of RAM
• A three-node MariaDB and Galera cluster, each node with 4 vCPUs and 8 GB of RAM

The company runs hardware load balancers and multiple web applications serving their websites, and orchestrates environments using combinations of scripts and Puppet. The website generates large amounts of log data daily that requires archiving.

The solution would consist of the following OpenStack components:

• A firewall, switches, and load balancers on the public-facing network connections.
• OpenStack Controller services running Image, Identity, and Networking, combined with support services such as MariaDB and RabbitMQ, configured for high availability on at least three controller nodes.
• OpenStack Compute nodes running the KVM hypervisor.
• OpenStack Block Storage for use by compute instances requiring persistent storage (such as databases for dynamic sites).
• OpenStack Object Storage for serving static objects (such as images).

Running up to 140 web instances and the small number of MariaDB instances requires 292 vCPUs available, as well as 584 GB of RAM. On a typical 1U server using dual-socket hex-core Intel CPUs with Hyper-Threading, and assuming a 2:1 CPU overcommit ratio, this would require 8 OpenStack Compute nodes.

The web application instances run from local storage on each of the OpenStack Compute nodes. The web application instances are stateless, meaning that any of the instances can fail and the application will continue to function.

MariaDB server instances store their data on shared enterprise storage, such as NetApp or Solidfire devices. If a MariaDB instance fails, storage would be expected to be re-attached to another instance and rejoined to the Galera cluster.

Logs from the web application servers are shipped to OpenStack Object Storage for processing and archiving.

Additional capabilities can be realized by moving static web content to be served from OpenStack Object Storage containers, and backing the OpenStack Image Service with OpenStack Object Storage.

Note: Increasing the use of OpenStack Object Storage means network bandwidth needs to be taken into consideration. Running OpenStack Object Storage with network connections offering 10 GbE or better connectivity is advised.

Leveraging the Orchestration and Telemetry modules is also a potential consideration when providing auto-scaling, orchestrated web application environments.
Defining the web applications in Heat Orchestration Templates (HOT) removes the reliance on the current scripted Puppet solution.

OpenStack Networking can be used to control hardware load balancers through the use of plug-ins and the Networking API. This allows users to control hardware load balancer pools and instances as members in these pools, but their use in production environments must be carefully weighed against current stability.

3. Compute focused

Table of Contents
User requirements
Technical considerations
Operational considerations
Architecture
Prescriptive examples

A compute-focused cloud is a specialized subset of the general purpose OpenStack cloud architecture. Unlike the general purpose OpenStack architecture, which is built to host a wide variety of workloads and applications and does not heavily tax any particular computing aspect, a compute-focused cloud is designed and built specifically to support compute-intensive workloads, and its design must be tailored accordingly. Compute-intensive workloads may be CPU intensive, RAM intensive, or both. However, they are not typically storage intensive or network intensive.
Compute-focused workloads may include the following use cases:

• High performance computing (HPC)
• Big data analytics using Hadoop or other distributed data stores
• Continuous integration/continuous deployment (CI/CD)
• Platform-as-a-Service (PaaS)
• Signal processing for network function virtualization (NFV)

Based on the use case requirements, such clouds might need to provide additional services such as a virtual machine disk library, file or object storage, firewalls, load balancers, IP addresses, and network connectivity in the form of overlays or virtual local area networks (VLANs). A compute-focused OpenStack cloud will not typically use raw block storage services, since the applications hosted on a compute-focused OpenStack cloud generally do not need persistent block storage.

User requirements

Compute-intensive workloads are defined by their high utilization of CPU, RAM, or both. User requirements will determine if a cloud must be built to accommodate anticipated performance demands.

Cost: Cost is not generally a primary concern for a compute-focused cloud; however, some organizations might be concerned with cost avoidance. Repurposing existing resources to tackle compute-intensive tasks instead of needing to acquire additional resources may offer cost reduction opportunities.

Time to market: Compute-focused clouds can be used to deliver products more quickly, for example, by speeding up a company's software development life cycle (SDLC) for building products and applications.

Revenue opportunity: Companies that are interested in building services or products that rely on the power of the compute resources will benefit from a compute-focused cloud. Examples include the analysis of large data sets (via Hadoop or Cassandra) or completing computationally intensive tasks such as rendering, scientific computation, or simulations.
Legal requirements

Many jurisdictions have legislative and regulatory requirements governing the storage and management of data in cloud environments. Common areas of regulation include:

• Data retention policies ensuring storage of persistent data and records management to meet data archival requirements.
• Data ownership policies governing the possession and responsibility for data.
• Data sovereignty policies governing the storage of data in foreign countries or otherwise separate jurisdictions.
• Data compliance policies governing types of information that must reside in certain locations because of regulatory issues and, more importantly, cannot reside in other locations for the same reason.

Examples of such legal frameworks include the data protection framework of the European Union and the requirements of the Financial Industry Regulatory Authority in the United States. Consult a local regulatory body for more information.

Technical considerations

The following are some technical requirements that need to be incorporated into the architecture design.

Performance: If a primary technical concern is for the environment to deliver high performance capability, then a compute-focused design is an obvious choice because it is specifically designed to host compute-intensive workloads.

Workload persistence: Workloads can be either short-lived or long-running. Short-lived workloads might include continuous integration and continuous deployment (CI/CD) jobs, where large numbers of compute instances are created simultaneously to perform a set of compute-intensive tasks. The results or artifacts are then copied from the instance into long-term storage before the instance is destroyed. Long-running workloads, like a Hadoop or high-performance computing (HPC) cluster, typically ingest large data sets, perform the computational work on those data sets, then push the results into long-term storage.
Unlike short-lived workloads, long-running workloads remain idle after the computational work is completed, until the next job is pushed to them. Long-running workloads are often larger and more complex, so the effort of building them is offset by keeping them active between jobs. Another example of long-running workloads is legacy applications that typically are persistent over time.

Storage: Workloads targeted for a compute-focused OpenStack cloud generally do not require any persistent block storage (although some usages of Hadoop with HDFS may dictate the use of persistent block storage). A shared filesystem or object store will maintain the initial data set(s) and serve as the destination for saving the computational results. Avoiding the input-output (I/O) overhead significantly enhances workload performance. Depending on the size of the data set(s), it might be necessary to scale the object store or shared file system to match the storage demand.

User interface: Like any other cloud architecture, a compute-focused OpenStack cloud requires an on-demand and self-service user interface. End users must be able to provision computing power, storage, networks, and software simply and flexibly. This includes scaling the infrastructure up to a substantial level without disrupting host operations.

Security: Security is going to be highly dependent on the business requirements. For example, a computationally intense drug discovery application will have much higher security requirements than a cloud that is designed for processing market data for a retailer. As a general start, the security recommendations and guidelines provided in the OpenStack Security Guide are applicable.

Operational considerations

From the operational perspective, the requirements of a compute-intensive cloud are similar to those of the general-purpose cloud.
More details on operational requirements can be found in the general-purpose design section.

Technical considerations

In a compute-focused OpenStack cloud, the type of instance workloads being provisioned heavily influences technical decision making. For example, specific use cases that demand multiple short-running jobs present different requirements than those that specify long-running jobs, even though both situations are considered "compute focused."

Public and private clouds require deterministic capacity planning to support elastic growth in order to meet user SLA expectations. Deterministic capacity planning is the path to predicting the effort and expense of making a given process consistently performant. This process is important because, when a service becomes a critical part of a user's infrastructure, the user's fate becomes wedded to the SLAs of the cloud itself. In cloud computing, a service's performance will not be measured by its average speed but rather by the consistency of its speed.

There are two aspects of capacity planning to consider: planning the initial deployment footprint, and planning expansion of it to stay ahead of the demands of cloud users.

Planning the initial footprint for an OpenStack deployment is typically done based on existing infrastructure workloads and estimates based on expected uptake.

The starting point is the core count of the cloud. By applying relevant ratios, the user can gather information about:

• The number of instances expected to be available concurrently: (overcommit fraction × cores) / virtual cores per instance
• How much storage is required: flavor disk size × number of instances

These ratios can be used to determine the amount of additional infrastructure needed to support the cloud. For example, consider a situation in which you require 1600 instances, each with 2 vCPUs and 50 GB of storage.
Assuming the default overcommit rate of 16:1, the math works out as follows:

• 1600 = (16 × (number of physical cores)) / 2
• storage required = 50 GB × 1600

On the surface, the equations reveal the need for 200 physical cores and 80 TB of storage for /var/lib/nova/instances/. However, it is also important to look at patterns of usage to estimate the load that the API services, database servers, and queue servers are likely to encounter.

Consider, for example, the differences between a cloud that supports a managed web-hosting platform and one running integration tests for a development project that creates one instance per code commit. In the former, the heavy work of creating an instance happens only every few months, whereas the latter puts constant heavy load on the cloud controller. The average instance lifetime must be considered, as a larger number generally means less load on the cloud controller.

Aside from the creation and termination of instances, the impact of users accessing the service must be considered, particularly on nova-api and its associated database. Listing instances garners a great deal of information and, given the frequency with which users run this operation, a cloud with a large number of users can increase the load significantly. This can even occur unintentionally. For example, the OpenStack Dashboard instances tab refreshes the list of instances every 30 seconds, so leaving it open in a browser window can cause unexpected load.

Consideration of these factors can help determine how many cloud controller cores are required. A server with 8 CPU cores and 8 GB of RAM would be sufficient for up to a rack of compute nodes, given the above caveats.

Key hardware specifications are also crucial to the performance of user instances.
Be sure to consider budget and performance needs, including storage performance (spindles/core), memory availability (RAM/core), network bandwidth (Gbps/core), and overall CPU performance (CPU/core).

The cloud resource calculator is a useful tool for examining the impacts of different hardware and instance load outs. It is available at: https://github.com/noslzzp/cloud-resource-calculator/blob/master/cloud-resource-calculator.ods

Expansion planning

A key challenge faced when planning the expansion of cloud compute services is the elastic nature of cloud infrastructure demands. Previously, new users or customers would be forced to plan for and request the infrastructure they required ahead of time, allowing time for reactive procurement processes. Cloud computing users have come to expect the agility provided by having instant access to new resources as they are required. Consequently, plan for typical usage but also, more importantly, for sudden bursts in usage.

Planning for expansion can be a delicate balancing act. Planning too conservatively can lead to unexpected oversubscription of the cloud and dissatisfied users. Planning for cloud expansion too aggressively can lead to unexpected underutilization of the cloud and funds spent on operating infrastructure that is not being used efficiently.

The key is to carefully monitor the spikes and valleys in cloud usage over time. The intent is to measure the consistency with which services can be delivered, not the average speed or capacity of the cloud. Using this information to model capacity and performance enables users to more accurately determine the current and future capacity of the cloud.

CPU and RAM

(Adapted from: http://docs.openstack.org/openstack-ops/content/compute_nodes.html#cpu_choice)

In current generations, CPUs have up to 12 cores.
If an Intel CPU supports Hyper-Threading, those 12 cores are doubled to 24 cores. If a server is purchased that supports multiple CPUs, the number of cores is further multiplied. Hyper-Threading is Intel's proprietary simultaneous multi-threading implementation, used to improve parallelization on their CPUs. Consider enabling Hyper-Threading to improve the performance of multithreaded applications.

Whether the user should enable Hyper-Threading on a CPU depends upon the use case. For example, disabling Hyper-Threading can be beneficial in intense computing environments. Performance testing conducted by running local workloads with Hyper-Threading both on and off can help determine what is more appropriate in any particular case.

If the libvirt/KVM hypervisor driver is to be used, then the CPUs used in the compute nodes must support virtualization by way of the VT-x extensions for Intel chips and AMD-V extensions for AMD chips to provide full performance.

OpenStack enables the user to overcommit CPU and RAM on compute nodes. This allows an increase in the number of instances running on the cloud at the cost of reducing the performance of the instances. OpenStack Compute uses the following ratios by default:

• CPU allocation ratio: 16:1
• RAM allocation ratio: 1.5:1

The default CPU allocation ratio of 16:1 means that the scheduler allocates up to 16 virtual cores per physical core. For example, if a physical node has 12 cores, the scheduler sees 192 available virtual cores. With typical flavor definitions of 4 virtual cores per instance, this ratio would provide 48 instances on a physical node.

Similarly, the default RAM allocation ratio of 1.5:1 means that the scheduler allocates instances to a physical node as long as the total amount of RAM associated with the instances is less than 1.5 times the amount of RAM available on the physical node.
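The allocation-ratio arithmetic used in this chapter can be sketched in a few lines of Python. This is a minimal illustration only, not the OpenStack Compute scheduler's implementation, and the function names are hypothetical. It reproduces the guide's own figures: a 12-core node at 16:1 with 4-vCPU flavors, the 1600-instance footprint with 2 vCPUs and 50 GB of disk per instance, and a 48 GB node at a 1.5:1 RAM ratio with 8 GB instances.

```python
import math


def max_instances(physical_cores, cpu_allocation_ratio, vcpus_per_instance):
    """Instances that fit: (overcommit ratio x cores) / vCPUs per instance."""
    return int(physical_cores * cpu_allocation_ratio // vcpus_per_instance)


def physical_cores_needed(instances, cpu_allocation_ratio, vcpus_per_instance):
    """Invert the formula above to size the physical core count."""
    return math.ceil(instances * vcpus_per_instance / cpu_allocation_ratio)


def storage_needed_gb(instances, flavor_disk_gb):
    """Ephemeral storage: flavor disk size x number of instances."""
    return instances * flavor_disk_gb


def schedulable_ram_gb(physical_ram_gb, ram_allocation_ratio):
    """Total instance RAM the scheduler may place on one node."""
    return physical_ram_gb * ram_allocation_ratio


if __name__ == "__main__":
    # 12 physical cores at the default 16:1 ratio, 4 vCPUs per instance
    print(max_instances(12, 16, 4))             # 48
    # 1600 instances with 2 vCPUs each at 16:1
    print(physical_cores_needed(1600, 16, 2))   # 200
    # 1600 instances with a 50 GB flavor disk (80,000 GB, about 80 TB)
    print(storage_needed_gb(1600, 50))          # 80000
    # 48 GB node at the default 1.5:1 RAM ratio, 8 GB per instance
    ram = schedulable_ram_gb(48, 1.5)
    print(ram, int(ram // 8))                   # 72.0 9
```

Rounding the core count up rather than down is a design choice made here so that the estimate never undersizes the deployment. The sketch ignores controller, API, and queue load, which the guide treats as separate considerations.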
For example, if a physical node has 48 GB of RAM, the scheduler allocates instances to that node until the sum of the RAM associated with the instances reaches 72 GB (such as nine instances, in the case where each instance has 8 GB of RAM).

The appropriate CPU and RAM allocation ratio must be selected based on particular use cases.

Additional hardware

Certain use cases may benefit from exposure to additional devices on the compute node. Examples might include:

• High performance computing jobs that benefit from the availability of graphics processing units (GPUs) for general-purpose computing.
• Cryptographic routines that benefit from the availability of hardware random number generators to avoid entropy starvation.
• Database management systems that benefit from the availability of SSDs for ephemeral storage to maximize read/write performance when it is required.

Host aggregates are used to group hosts that share similar characteristics, which can include hardware similarities. The addition of specialized hardware to a cloud deployment is likely to add to the cost of each node, so careful consideration must be given to whether all compute nodes, or just a subset which is targetable using flavors, need the additional customization to support the desired workloads.

Utilization

Infrastructure-as-a-Service offerings, including OpenStack, use flavors to provide standardized views of virtual machine resource requirements that simplify the problem of scheduling instances while making the best use of the available physical resources.

In order to facilitate packing of virtual machines onto physical hosts, the default selection of flavors is constructed so that the second largest flavor is half the size of the largest flavor in every dimension. It has half the vCPUs, half the vRAM, and half the ephemeral disk space. The next largest flavor is half that size again.
As a result, packing a server for general purpose computing might look conceptually something like this figure:

[Figure: flavor packing on a general purpose server]

On the other hand, a CPU-optimized packed server might look like the following figure:

[Figure: flavor packing on a CPU-optimized server]

These default flavors are well suited to typical load outs for commodity server hardware. To maximize utilization, however, it may be necessary to customize the flavors or create new ones, to better align instance sizes to the available hardware.

Workload characteristics may also influence hardware choices and flavor configuration, particularly where they present different ratios of CPU versus RAM versus HDD requirements.

For more information on flavors, refer to: http://docs.openstack.org/openstack-ops/content/flavors.html

Performance

The infrastructure of a cloud should not be shared, so that workloads can consume as many resources as are made available, and accommodations should be made for large-scale workloads.

The duration of batch processing differs depending on the individual workloads that are launched. Run times range from seconds to minutes to hours, and as a result it is difficult to predict when resources will be used, for how long, and even which resources will be used.

Security

The security considerations needed for this scenario are similar to those of the other scenarios discussed in this book.

A security domain comprises users, applications, servers, or networks that share common trust requirements and expectations within a system. Typically they have the same authentication and authorization requirements and users.

These security domains are:

1. Public
2. Guest
3. Management
4. Data

These security domains can be mapped individually to the installation, or they can also be combined.
For example, some deployment topologies combine both guest and data domains onto one physical network, whereas in other cases these networks are physically separated. In each case, the cloud operator should be aware of the appropriate security concerns. Security domains should be mapped out against the specific OpenStack deployment topology. The domains and their trust requirements depend upon whether the cloud instance is public, private, or hybrid.

The public security domain is an entirely untrusted area of the cloud infrastructure. It can refer to the Internet as a whole or simply to networks over which the user has no authority. This domain should always be considered untrusted.

Typically used for compute instance-to-instance traffic, the guest security domain handles compute data generated by instances on the cloud, but not services that support the operation of the cloud, such as API calls. Public cloud providers and private cloud providers who do not have stringent controls on instance use or who allow unrestricted Internet access to instances should consider this domain to be untrusted. Private cloud providers may want to consider this network as internal and therefore trusted only if they have controls in place to assert that they trust instances and all their tenants.

The management security domain is where services interact. Sometimes referred to as the "control plane", the networks in this domain transport confidential data such as configuration parameters, user names, and passwords. In most deployments this domain is considered trusted.

The data security domain is concerned primarily with information pertaining to the storage services within OpenStack. Much of the data that crosses this network has high integrity and confidentiality requirements and, depending on the type of deployment, there may also be strong availability requirements.
The trust level of this network is heavily dependent on deployment decisions and as such we do not assign this any default level of trust.

When deploying OpenStack in an enterprise as a private cloud, it is assumed to be behind a firewall and within the trusted network alongside existing systems. Users of the cloud are typically employees or trusted individuals that are bound by the security requirements set forth by the company. This tends to push most of the security domains towards a more trusted model. However, when deploying OpenStack in a public-facing role, no assumptions can be made and the attack vectors significantly increase. For example, the API endpoints and the software behind them will be vulnerable to potentially hostile entities wanting to gain unauthorized access or prevent access to services. This can result in loss of reputation and must be protected against through auditing and appropriate filtering.

Consideration must be taken when managing the users of the system, whether operating public or private clouds. The Identity service allows LDAP to be part of the authentication process, and integrating an OpenStack deployment with such existing systems may ease user management.

It is strongly recommended that the API services are placed behind hardware that performs SSL termination. API services transmit user names, passwords, and generated tokens between client machines and API endpoints and therefore must be secured.

More information on OpenStack Security can be found at http://docs.openstack.org/security-guide/

OpenStack components

Due to the nature of the workloads that will be used in this scenario, a number of components will be highly beneficial in a compute-focused cloud.
This includes the typical OpenStack components:

• OpenStack Compute (nova)
• OpenStack Image Service (glance)
• OpenStack Identity (keystone)

Also consider several specialized components:

• Orchestration module (heat)

It is safe to assume that, given the nature of the applications involved in this scenario, these will be heavily automated deployments. Making use of Orchestration will be highly beneficial in this case. Deploying a batch of instances and running an automated set of tests can be scripted; however, it makes sense to use the Orchestration module to handle all these actions.

• Telemetry module (ceilometer)

Telemetry and the alarms it generates are required to support autoscaling of instances using Orchestration. Users that are not using the Orchestration module do not need to deploy the Telemetry module and may choose to use other external solutions to fulfill their metering and monitoring requirements.

See also: http://docs.openstack.org/openstack-ops/content/logging_monitoring.html

• OpenStack Block Storage (cinder)

Due to the burstable nature of the workloads and the applications and instances that will be used for batch processing, this cloud will utilize mainly memory or CPU, so the need for add-on storage to each instance is not a likely requirement. This does not mean that OpenStack Block Storage (cinder) will not be used in the infrastructure, but typically it will not be used as a central component.

• Networking

When choosing a networking platform, ensure that it either works with all desired hypervisor and container technologies and their OpenStack drivers, or includes an implementation of an ML2 mechanism driver. Networking platforms that provide ML2 mechanism drivers can be mixed.

Operational considerations

Operationally, there are a number of considerations that affect the design of compute-focused OpenStack clouds.
Some examples might include enforcing strict API availability requirements, understanding and dealing with failure scenarios, or managing host maintenance schedules.

A service-level agreement (SLA) is a contractual obligation that gives assurances around the availability of a provided service. As such, factoring in promises of availability implies a certain level of redundancy and resiliency when designing an OpenStack cloud.

• Guarantees for API availability imply multiple infrastructure services combined with highly available load balancers.
• Network uptime guarantees will affect the switch design and might require redundant switching and power.
• Network security policy requirements need to be factored into deployments.

Knowing when and where to implement redundancy and high availability (HA) is directly affected by the terms contained in any associated SLA, if one is present.

Support and maintainability

OpenStack cloud management requires operations staff to be able to understand the architecture design on some level. The level of skills and the level of separation of the operations and engineering staff are dependent on the size and purpose of the installation. A large cloud service provider or a telecom provider is more inclined to be managed by a specially trained, dedicated operations organization. A smaller implementation is more inclined to rely on a smaller support staff that might need to take on the combined engineering, design, and operations functions.

Maintaining OpenStack installations requires a variety of technical skills. Some of these skills may include the ability to debug Python log output to a basic level as well as an understanding of networking concepts.

Consider incorporating features into the architecture and design that reduce the operational burden.
Some examples include automating some of the operations functions, or alternatively exploring the possibility of using a third party management company with special expertise in managing OpenStack deployments.
Monitoring
Like any other infrastructure deployment, OpenStack clouds need an appropriate monitoring platform to ensure errors are caught and managed appropriately. Consider whether any existing monitoring system can effectively monitor an OpenStack environment. While there are many aspects that need to be monitored, specific metrics that are critically important to capture include image disk utilization and response time to the Compute API.
Expected and unexpected server downtime
At some point, servers will fail. The SLAs in place affect how the design has to address recovery time. Recovery of a failed host may mean restoring instances from a snapshot, or respawning those instances on another available host, which then has consequences for the design of the applications running on the OpenStack cloud.
It might be acceptable to design a compute-focused cloud without the ability to migrate instances from one host to another, because the expectation is that the application developer must handle failure within the application itself. Conversely, a compute-focused cloud might be provisioned to provide extra resilience as a requirement of that business. In this scenario, it is expected that extra supporting services are also deployed, such as shared storage attached to hosts to aid in recovery and resiliency of services in order to meet strict SLAs.
Capacity planning
Adding extra capacity to an OpenStack cloud is a straightforward horizontal scaling process, as consistently configured nodes automatically attach to an OpenStack cloud. Be mindful, however, of any additional work to place the nodes into appropriate Availability Zones and Host Aggregates if necessary.
The same (or very similar) CPUs are recommended when adding extra nodes to the environment because this reduces the chance of breaking any live-migration features that are present. Scaling out hypervisor hosts also has a direct effect on network and other data center resources, so factor in this increase when reaching rack capacity or when extra network switches are required.
Compute hosts can also have internal components changed to account for increases in demand, a process also known as vertical scaling. Swapping a CPU for one with more cores, or increasing the memory in a server, can help add extra needed capacity depending on whether the running applications are more CPU intensive or memory based (as would be expected in a compute-focused OpenStack cloud).
Another option is to assess the average workloads and increase the number of instances that can run within the compute environment by adjusting the overcommit ratio. This is only appropriate in some environments: raising the CPU overcommit ratio increases the potential for noisy neighbor issues, and it means that more instances will fail when a compute host fails. In a compute-focused OpenStack design architecture, increasing the CPU overcommit ratio is therefore not recommended.
Architecture
The hardware selection covers three areas:
• Compute
• Network
• Storage
An OpenStack cloud with extreme demands on processor and memory resources is considered to be compute-focused, and requires hardware that can handle these demands. This can mean choosing hardware which might not perform as well on storage or network capabilities. In a compute-focused architecture, storage and networking are required while loading a data set into the computational cluster, but are not otherwise in heavy demand.
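The overcommit trade-off described under capacity planning can be made concrete with a little arithmetic. The following Python sketch is illustrative only: the host size, flavor size, and function name are assumptions made for this example, and the ratios stand in for the CPU and RAM overcommit settings of the Compute service (commonly exposed as cpu_allocation_ratio and ram_allocation_ratio). It shows how raising the CPU ratio increases the number of instances per host and, with it, the number of instances lost when that host fails.

```python
from dataclasses import dataclass


@dataclass
class ComputeHost:
    """Hypothetical compute host used only for this illustration."""
    physical_cores: int
    ram_gb: int


def instances_per_host(host, flavor_vcpus, flavor_ram_gb,
                       cpu_ratio=1.0, ram_ratio=1.0):
    """Return how many instances of a single flavor fit on one host.

    The host is limited by whichever resource runs out first after the
    overcommit ratios are applied.
    """
    by_cpu = int(host.physical_cores * cpu_ratio) // flavor_vcpus
    by_ram = int(host.ram_gb * ram_ratio) // flavor_ram_gb
    return min(by_cpu, by_ram)


# Assumed example values: a 32-core, 256 GB host and a 4 vCPU / 8 GB flavor.
host = ComputeHost(physical_cores=32, ram_gb=256)
for ratio in (1.0, 2.0, 4.0):
    count = instances_per_host(host, flavor_vcpus=4, flavor_ram_gb=8,
                               cpu_ratio=ratio)
    print(f"CPU overcommit {ratio}: {count} instances affected by a host failure")
```

With no CPU overcommit the host is CPU-bound and carries few instances, which is the expected position for a compute-focused design. Each increase in the ratio packs more instances onto the same cores, which is where noisy neighbor effects and a larger failure impact come from.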
Compute (server) hardware must be evaluated against four dimensions:
Server density
A measure of how many servers can fit into a given amount of physical space, such as a rack unit (U).
Resource capacity
The number of CPU cores, how much RAM, or how much storage a given server will deliver.
Expandability
The number of additional resources that can be added to a server before it has reached its limit.
Cost
The relative purchase price of the hardware weighed against the level of design effort needed to build the system.
The dimensions need to be weighed against each other to determine the best design for the desired purpose. For example, increasing server density means sacrificing resource capacity or expandability. Increasing resource capacity and expandability can increase cost but decreases server density. Decreasing cost can mean decreasing supportability, server density, resource capacity, and expandability.
A compute-focused cloud should have an emphasis on server hardware that can offer more CPU sockets, more CPU cores, and more RAM. Network connectivity and storage capacity are less critical. The hardware will need to be configured to provide enough network connectivity and storage capacity to meet minimum user requirements, but they are not the primary consideration.
Some server hardware form factors are better suited than others, as CPU and RAM capacity have the highest priority. Some considerations for selecting hardware:
• Most blade servers can support dual-socket multi-core CPUs. To avoid this CPU limit, select "full width" or "full height" blades, although this also decreases server density. For example, high density blade servers (like HP BladeSystem or Dell PowerEdge M1000e) support up to 16 servers in only ten rack units. Half-height blades are twice as dense as full-height blades, so using full-height blades results in only eight servers per ten rack units.
• 1U rack-mounted servers (servers that occupy only a single rack unit) may be able to offer greater server density than a blade server solution. It is possible to place forty 1U servers in a rack, which still leaves space for the top of rack (ToR) switches, compared to 32 full width blade servers. However, as of the Icehouse release, 1U servers from the major vendors are limited to dual-socket, multi-core CPU configurations. To obtain greater than dual-socket support in a 1U rack-mount form factor, you will need to buy systems from original design manufacturers (ODMs) or second-tier manufacturers.
• 2U rack-mounted servers provide quad-socket, multi-core CPU support, but with a corresponding decrease in server density (half the density offered by 1U rack-mounted servers).
• Larger rack-mounted servers, such as 4U servers, often provide even greater CPU capacity, commonly supporting four or even eight CPU sockets. These servers have greater expandability, but they have much lower server density and are often more expensive.
• "Sled servers" (rack-mounted servers that support multiple independent servers in a single 2U or 3U enclosure) deliver increased density as compared to typical 1U or 2U rack-mounted servers. For example, many sled servers offer four independent dual-socket nodes in 2U for a total of eight CPU sockets in 2U. However, given the dual-socket limitation on individual nodes, the density gain may not be sufficient to offset their additional cost and configuration complexity.
Consider these facts when choosing server hardware for a compute-focused OpenStack design architecture:
Instance density
In a compute-focused architecture, instance density is lower, which means CPU and RAM over-subscription ratios are also lower. Because instance density is lower, more hosts will be required to support the anticipated scale, especially if the design uses dual-socket hardware.
Host density
Another option to address the higher host count that might be needed with dual socket designs is to use a quad socket platform. Taking this approach will decrease host density, which increases rack count. This configuration may affect the network requirements, the number of power connections, and possibly the cooling requirements.
Power and cooling density
The power and cooling density requirements might be lower than with blade, sled, or 1U server designs because of lower host density (by using 2U, 3U or even 4U server designs). For data centers with older infrastructure, this may be a desirable feature.
Compute-focused OpenStack design architecture server hardware selection results in a "scale up" versus "scale out" decision. Whether the better solution is a smaller number of larger hosts or a larger number of smaller hosts depends on a combination of factors: cost, power, cooling, physical rack and floor space, support-warranty, and manageability.
Storage hardware selection
For a compute-focused OpenStack design architecture, the selection of storage hardware is not the primary criterion, but it is still important. There are a number of different factors that a cloud architect must consider:
Cost
The overall cost of the solution will play a major role in what storage architecture (and resulting storage hardware) is selected.
Performance
The performance of the solution also plays a big role and can be measured by observing the latency of storage I/O requests. In a compute-focused OpenStack cloud, storage latency can be a major consideration. In some compute-intensive workloads, minimizing the delays that the CPU experiences while fetching data from the storage can have a significant impact on the overall performance of the application.
Scalability
This section uses the term "scalability" to refer to how well the storage solution performs as it is expanded up to its maximum size. A storage solution that performs well in small configurations but has degrading performance as it expands would not be considered scalable. On the other hand, a solution that continues to perform well at maximum expansion would be considered scalable.
Expandability
Expandability refers to the overall ability of the solution to grow. A storage solution that expands to 50 PB is considered more expandable than a solution that only scales to 10 PB. Note that this metric is related to, but different from, scalability, which is a measure of the solution's performance as it expands.
For a compute-focused OpenStack cloud, latency of storage is a major consideration. Using solid-state disks (SSDs) to minimize latency for instance storage and reduce CPU delays caused by waiting for the storage will increase performance. Consider using RAID controller cards in compute hosts to improve the performance of the underlying disk subsystem.
The selection of storage architecture, and the corresponding storage hardware (if there is the option), is determined by evaluating possible solutions against the key factors listed above. This will determine if a scale-out solution (such as Ceph, GlusterFS, or similar) should be used, or if a single, highly expandable and scalable centralized storage array would be a better choice. If a centralized storage array is the right fit for the requirements, the hardware will be determined by the array vendor. It is also possible to build a storage array using commodity hardware with Open Source software, but there needs to be access to people with expertise to build such a system. Conversely, a scale-out storage solution that uses direct-attached storage (DAS) in the servers may be an appropriate choice.
If so, then the server hardware needs to be configured to support the storage solution.
The following lists some of the potential impacts that may affect a particular storage architecture, and the corresponding storage hardware, of a compute-focused OpenStack cloud:
Connectivity
Based on the storage solution selected, ensure the connectivity matches the storage solution requirements. If a centralized storage array is selected, it is important to determine how the hypervisors will connect to the storage array. Connectivity could affect latency and thus performance, so check that the network characteristics will minimize latency to boost the overall performance of the design.
Latency
Determine if the use case will have consistent or highly variable latency.
Throughput
To improve overall performance, make sure that the storage solution throughput is optimized. While it is not likely that a compute-focused cloud will have major data I/O to and from storage, this is an important factor to consider.
Server Hardware
If the solution uses DAS, this affects the server hardware choice, which in turn ripples into host density, instance density, power density, OS-hypervisor, and management tools, among other things.
Where instances need to be made highly available, or need to be capable of migration between hosts, use a shared storage file system to store instance ephemeral data so that compute services can run uninterrupted in the event of a node failure.
Selecting networking hardware
Some of the key considerations that should be included in the selection of networking hardware include:
Port count
The design will require networking hardware that has the requisite port count.
Port density
The network design will be affected by the physical space that is required to provide the requisite port count.
A switch that can provide 48 10 GbE ports in 1U has a much higher port density than a switch that provides 24 10 GbE ports in 2U. A higher port density is preferred, as it leaves more rack space for compute or storage components that might be required by the design. This also leads into concerns about fault domains and power density that must also be considered. Higher density switches are more expensive, and this cost should also be considered, as it is important not to over design the network if it is not required.
Port speed
The networking hardware must support the proposed network speed, for example: 1 GbE, 10 GbE, or 40 GbE (or even 100 GbE).
Redundancy
The level of network hardware redundancy required is influenced by the user requirements for high availability and cost considerations. Network redundancy can be achieved by adding redundant power supplies or paired switches. If this is a requirement, the hardware will need to support this configuration. User requirements will determine if a completely redundant network infrastructure is required.
Power requirements
Ensure that the physical data center provides the necessary power for the selected network hardware. This is not an issue for top of rack (ToR) switches, but may be an issue for spine switches in a leaf and spine fabric, or end of row (EoR) switches.
It is important to first understand additional factors as well as the use case because these additional factors heavily influence the cloud network architecture. Once these key considerations have been decided, the proper network can be designed to best serve the workloads being placed in the cloud.
We recommend designing the network architecture using a scalable network model that makes it easy to add capacity and bandwidth. A good example of such a model is the leaf-spine model.
In this type of network design, it is possible to easily add additional bandwidth as well as scale out to additional racks of gear. It is important to select network hardware that will support the required port count, port speed and port density while also allowing for future growth as workload demands increase. It is also important to evaluate where in the network architecture it is valuable to provide redundancy. Increased network availability and redundancy come at a cost, so we recommend weighing the cost against the benefit gained from deploying redundant network switches and using bonded interfaces at the host level.
Software selection
Software selection for a compute-focused OpenStack architecture design must cover three main areas:
• Operating system (OS) and hypervisor
• OpenStack components
• Supplemental software
Design decisions made in each of these areas impact the rest of the OpenStack architecture design.
Operating system and hypervisor
The selection of operating system (OS) and hypervisor has a significant impact on the end point design. Selecting a particular operating system and hypervisor could affect server hardware selection. For example, a selected combination needs to be supported on the selected hardware. Also ensure that the storage hardware selection and topology support the selected operating system and hypervisor combination. Additionally, make sure that the networking hardware selection and topology will work with the chosen operating system and hypervisor combination. For example, if the design uses Link Aggregation Control Protocol (LACP), the hypervisor needs to support it.
Some areas that could be impacted by the selection of OS and hypervisor include:
Cost
Selecting a commercially supported hypervisor such as Microsoft Hyper-V results in a different cost model than choosing a community-supported open source hypervisor like KVM or Xen. Even within the ranks of open source solutions, choosing Ubuntu over Red Hat (or vice versa) will have an impact on cost due to support contracts. On the other hand, business or application requirements might dictate a specific or commercially supported hypervisor.
Supportability
Depending on the selected hypervisor, the staff should have the appropriate training and knowledge to support the selected OS and hypervisor combination. If they do not, training will need to be provided, which could have a cost impact on the design.
Management tools
The management tools used for Ubuntu and KVM differ from the management tools for VMware vSphere. Although both OS and hypervisor combinations are supported by OpenStack, there will be very different impacts to the rest of the design as a result of the selection of one combination versus the other.
Scale and performance
Ensure that selected OS and hypervisor combinations meet the appropriate scale and performance requirements. The chosen architecture will need to meet the targeted instance-host ratios with the selected OS-hypervisor combination.
Security
Ensure that the design can accommodate the regular periodic installation of application security patches while maintaining the required workloads. The frequency of security patches for the proposed OS-hypervisor combination will have an impact on performance, and the patch installation process could affect maintenance windows.
Supported features
Determine what features of OpenStack are required. This will often determine the selection of the OS-hypervisor combination.
Certain features are only available with specific OSs or hypervisors. If certain features are not available, the design might need to be modified to meet the user requirements.
Interoperability
Consideration should be given to the ability of the selected OS-hypervisor combination to interoperate or co-exist with other OS-hypervisors, or other software solutions in the overall design (if required). Operational and troubleshooting tools for one OS-hypervisor combination may differ from the tools used for another OS-hypervisor combination and, as a result, the design will need to address if the two sets of tools need to interoperate.
OpenStack components
The selection of which OpenStack components will actually be included in the design and deployed has a significant impact. Certain components will always be present (Compute and Image Service, for example), yet other services might not need to be present. For example, a certain design may not require the Orchestration module. Omitting Orchestration (heat) would not typically have a significant impact on the overall design. However, if the architecture uses a replacement for OpenStack Object Storage for its storage component, this could potentially have significant impacts on the rest of the design.
For a compute-focused OpenStack design architecture, the following components would be used:
• Identity (keystone)
• Dashboard (horizon)
• Compute (nova)
• Object Storage (swift, Ceph or a commercial solution)
• Image (glance)
• Networking (neutron)
• Orchestration (heat)
OpenStack Block Storage would potentially not be incorporated into a compute-focused design because persistent block storage is not a significant requirement for the types of workloads that would be deployed onto instances running in a compute-focused cloud.
However, there may be some situations where the need for performance dictates that a block storage component be used to improve data I/O.
The exclusion of certain OpenStack components might also limit or constrain the functionality of other components. If a design opts to include the Orchestration module but excludes the Telemetry module, then the design will not be able to take advantage of Orchestration's auto scaling functionality (which relies on information from Telemetry). Because Orchestration can be used to spin up a large number of instances to perform the compute-intensive processing, including Orchestration in a compute-focused architecture design is strongly recommended.
Supplemental software
While OpenStack is a fairly complete collection of software projects for building a platform for cloud services, there are invariably additional pieces of software that might need to be added to any given OpenStack design.
Networking software
OpenStack Networking provides a wide variety of networking services for instances. There are many additional networking software packages that might be useful to manage the OpenStack components themselves. Some examples include software to provide load balancing, network redundancy protocols, and routing daemons. Some of these software packages are described in more detail in the OpenStack High Availability Guide (http://docs.openstack.org/high-availability-guide/content).
For a compute-focused OpenStack cloud, the OpenStack infrastructure components will need to be highly available. If the design does not include hardware load balancing, networking software packages like HAProxy will need to be included.
Management software
The selected supplemental software solution affects the overall OpenStack cloud design. This includes software for providing clustering, logging, monitoring and alerting.
Inclusion of clustering software, such as Corosync or Pacemaker, is determined primarily by the availability design requirements. Therefore, the impact of including (or not including) these software packages is primarily determined by the availability of the cloud infrastructure and the complexity of supporting the configuration after it is deployed. The OpenStack High Availability Guide provides more details on the installation and configuration of Corosync and Pacemaker, should these packages need to be included in the design.
Requirements for logging, monitoring, and alerting are determined by operational considerations. Each of these sub-categories includes a number of options. For example, in the logging sub-category one might consider Logstash, Splunk, Log Insight, or some other log aggregation-consolidation tool. Logs should be stored in a centralized location to make it easier to perform analytics against the data. Log data analytics engines can also provide automation and issue notification by providing a mechanism to both alert and automatically attempt to remediate some of the more commonly known issues.
If any of these software packages are needed, then the design must account for the additional resource consumption (CPU, RAM, storage, and network bandwidth for a log aggregation solution, for example). Some other potential design impacts include:
• OS-hypervisor combination: Ensure that the selected logging, monitoring, or alerting tools support the proposed OS-hypervisor combination.
• Network hardware: The network hardware selection needs to be supported by the logging, monitoring, and alerting software.
Database software
A large majority of the OpenStack components require access to back-end database services to store state and configuration information.
Select an appropriate back-end database that satisfies the availability and fault tolerance requirements of the OpenStack services. OpenStack services support connecting to any database that is supported by the SQLAlchemy Python drivers, however most common database deployments make use of MySQL or some variation of it. We recommend that the database that provides back-end services within the cloud be made highly available using a technology suited to that goal. Some of the more common software solutions used include Galera, MariaDB and MySQL with multi-master replication.
Prescriptive examples
The Conseil Européen pour la Recherche Nucléaire (CERN), also known as the European Organization for Nuclear Research, provides particle accelerators and other infrastructure for high-energy physics research.
As of 2011 CERN operated these two compute centers in Europe with plans to add a third.
Data center           Approximate capacity
Geneva, Switzerland   • 3.5 megawatts
                      • 91000 cores
                      • 120 PB HDD
                      • 100 PB tape
                      • 310 TB memory
Budapest, Hungary     • 2.5 megawatts
                      • 20000 cores
                      • 6 PB HDD
To support a growing number of compute heavy users of experiments related to the Large Hadron Collider (LHC), CERN ultimately elected to deploy an OpenStack cloud using Scientific Linux and RDO. This effort aimed to simplify the management of the center's compute resources with a view to doubling compute capacity by adding a data center in 2013 while maintaining the same levels of compute staff.
The CERN solution uses cells for segregation of compute resources and to transparently scale between different data centers. This decision meant trading off support for security groups and live migration. In addition, some details like flavors needed to be manually replicated across cells.
In spite of these drawbacks, cells were determined to provide the required scale while exposing a single public API endpoint to users.
A compute cell was created for each of the two original data centers and a third was created when a new data center was added in 2013. Each cell contains three availability zones to further segregate compute resources and at least three RabbitMQ message brokers configured to be clustered with mirrored queues for high availability.
The API cell, which resides behind a HAProxy load balancer, is located in the data center in Switzerland and directs API calls to compute cells using a customized variation of the cell scheduler. The customizations allow certain workloads to be directed to a specific data center or "all" data centers, with cell selection determined by cell RAM availability in the latter case.
There is also some customization of the filter scheduler that handles placement within the cells:
• ImagePropertiesFilter - To provide special handling depending on the guest operating system in use (Linux-based or Windows-based).
• ProjectsToAggregateFilter - To provide special handling depending on the project the instance is associated with.
• default_schedule_zones - Allows the selection of multiple default availability zones, rather than a single default.
The MySQL database server in each cell is managed by a central database team and configured in an active/passive configuration with a NetApp storage back end. Backups are performed every 6 hours.
Network architecture
To integrate with existing CERN networking infrastructure, customizations were made to legacy networking (nova-network). This was in the form of a driver to integrate with CERN's existing database for tracking MAC and IP address assignments.
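As a rough illustration of the kind of address selection such a driver performs, the Python sketch below picks a MAC and IP pair from a per-node list and records which instance received it. It is not CERN's implementation: the AddressRegistry class, its method names, and the in-memory dictionaries standing in for the external database are all assumptions made for this example.

```python
class AddressRegistry:
    """Hypothetical in-memory stand-in for an external MAC/IP database."""

    def __init__(self, pre_registered):
        # Maps a compute node name to its unassigned (mac, ip) pairs.
        self._free = {node: list(pairs) for node, pairs in pre_registered.items()}
        # Maps an instance ID to the (node, mac, ip) it was given.
        self.assigned = {}

    def allocate(self, node, instance_id):
        """Take the next free pair registered for the node and record it."""
        pairs = self._free.get(node)
        if not pairs:
            raise LookupError(f"no free addresses registered for {node}")
        mac, ip = pairs.pop(0)
        self.assigned[instance_id] = (node, mac, ip)
        return mac, ip


# Example data is invented for the illustration.
registry = AddressRegistry({
    "compute-01": [("02:00:00:00:00:01", "10.0.0.11"),
                   ("02:00:00:00:00:02", "10.0.0.12")],
})
mac, ip = registry.allocate("compute-01", "instance-a")
print(mac, ip)
```

The point of the sketch is the ordering: the scheduler decides the compute node first, and only then is an address chosen from that node's own list, which keeps address assignment consistent with an externally managed network inventory.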
The driver considers the compute node that the scheduler placed an instance on and then selects a MAC address and IP address from the pre-registered list associated with that node in the database. The database is then updated to reflect the instance the addresses were assigned to.
Storage architecture
The OpenStack Image Service is deployed in the API cell and configured to expose version 1 (V1) of the API. As a result the image registry is also required. The storage back end in use is a 3 PB Ceph cluster.
A small set of "golden" Scientific Linux 5 and 6 images is maintained, onto which applications can in turn be placed using orchestration tools. Puppet is used for instance configuration management and customization, but Orchestration deployment is expected.
Monitoring
Although direct billing is not required, the Telemetry module is used to perform metering for the purposes of adjusting project quotas. A sharded, replicated MongoDB back end is used. To spread API load, instances of the nova-api service were deployed within the child cells for Telemetry to query against. This also meant that some supporting services, including keystone, glance-api and glance-registry, also needed to be configured in the child cells.
Additional monitoring tools in use include Flume, Elastic Search, Kibana, and the CERN developed Lemon project.
References
The authors of the Architecture Design Guide would like to thank CERN for publicly documenting their OpenStack deployment in these resources, which formed the basis for this chapter:
• http://openstack-in-production.blogspot.fr
• Deep dive into the CERN Cloud Infrastructure
4. Storage focused
Table of Contents
User requirements ................................................................................
82
Technical considerations ....................................................................... 84
Operational considerations .................................................................. 85
Architecture ......................................................................................... 91
Prescriptive examples ......................................................................... 102
Cloud storage is a model of data storage that stores digital data in logical pools and physical storage that spans across multiple servers and locations. Cloud storage commonly refers to a hosted object storage service, however the term also includes other types of data storage that are available as a service, for example block storage.
Cloud storage runs on virtualized infrastructure and resembles broader cloud computing in terms of accessible interfaces, elasticity, scalability, multi-tenancy, and metered resources. You can use cloud storage services from an off-premises service or deploy on-premises.
Cloud storage consists of many distributed, synonymous resources, which are often referred to as integrated storage clouds. Cloud storage is highly fault tolerant through redundancy and the distribution of data. It is highly durable through the creation of versioned copies, and can be consistent with regard to data replicas.
At large scale, management of data operations is a resource intensive process for an organization. Hierarchical storage management (HSM) systems and data grids help annotate and report a baseline data valuation to make intelligent decisions and automate data decisions. HSM enables automated tiering and movement, as well as orchestration of data operations. A data grid is an architecture, or an evolving set of services, that brings together sets of services enabling users to manage large data sets.
Example applications deployed with cloud storage characteristics:
• Active archive, backups and hierarchical storage management.
• General content storage and synchronization. An example of this is a private Dropbox-like service.
• Data analytics with parallel file systems.
• Unstructured data store for services. For example, social media back-end storage.
• Persistent block storage.
• Operating system and application image store.
• Media streaming.
• Databases.
• Content distribution.
• Cloud storage peering.

User requirements

Requirements for data define storage-focused clouds. These include:

• Performance
• Access patterns
• Data structures

A balance between cost and user requirements dictates what methods and technologies to use in a cloud architecture.

Cost
    The user pays only for the storage they actually use. This limit typically reflects average user consumption during a month. This does not mean that cloud storage is less expensive, only that it incurs operating expenses rather than capital expenses.

Legal requirements
    Multiple jurisdictions have legislative and regulatory requirements governing the storage and management of data in cloud environments. Common areas of regulation include data retention policies and data ownership policies.

Legal requirements

Many jurisdictions have legislative and regulatory requirements governing the storage and management of data in cloud environments.

Note
    Examples of such legal frameworks include the data protection framework of the European Union and the requirements of the Financial Industry Regulatory Authority in the United States. Consult a local regulatory body for more information.

Common areas of regulation include:

Data retention
    Policies ensuring storage of persistent data and records management to meet data archival requirements.

Data ownership
    Policies governing the possession and responsibility for data.

Data sovereignty
    Policies governing the storage of data in foreign countries or otherwise separate jurisdictions.
Data compliance
    Policies governing types of information that must reside in certain locations due to regulatory issues and cannot reside in other locations for the same reason.

Technical requirements

You can incorporate the following technical requirements into the architecture design:

Storage proximity
    In order to provide high performance or large amounts of storage space, the design may have to accommodate storage that is attached to each hypervisor or served from a central storage device.

Performance
    To boost performance, the organization may want to make use of different technologies to cache disk activity.

Availability
    Specific requirements regarding availability influence the technology used to store and protect data. These requirements influence cost and the implemented solution.

Security
    You must protect data both in transit and at rest.

Technical considerations

Some of the key technical considerations that are critical to a storage-focused OpenStack design architecture include:

Input-output requirements
    Input-output performance requirements need to be researched and modeled before deciding on a final storage framework. Running benchmarks for input-output performance provides a baseline for expected performance levels. If these tests include details, then the resulting data can help model behavior and results during different workloads. Running smaller scripted benchmarks during the life cycle of the architecture helps record the system health at different points in time. The data from these scripted benchmarks assists in future scoping and in gaining a deeper understanding of an organization's needs.

Scale
    Scaling storage solutions in a storage-focused OpenStack architecture design is driven by initial requirements, including IOPS, capacity, bandwidth, and future needs. Planning capacity based on projected needs over the course of a budget cycle is important for a design. The architecture should balance cost and capacity, while also allowing flexibility to implement new technologies and methods as they become available.

Security
    Designing security around data has multiple points of focus that vary depending on SLAs, legal requirements, industry regulations, and certifications needed for systems or people. Consider compliance with HIPAA, ISO 9000, and SOX based on the type of data. For certain organizations, levels of access control are important.

OpenStack compatibility
    Interoperability and integration with OpenStack can be paramount in deciding on storage hardware and a storage management platform. Interoperability and integration include factors such as OpenStack Block Storage interoperability, OpenStack Object Storage compatibility, and hypervisor compatibility (which affects the ability to use storage for ephemeral instance storage).

Storage management
    You must address a range of storage management-related considerations in the design of a storage-focused OpenStack cloud. These considerations include, but are not limited to, backup strategy (and restore strategy, since a backup that cannot be restored is useless), data valuation and hierarchical storage management, retention strategy, data placement, and workflow automation.

Data grids
    Data grids are helpful when answering questions around data valuation. Data grids improve decision making through correlation of access patterns, ownership, and business-unit revenue with other metadata values to deliver actionable information about data.

When building a storage-focused OpenStack architecture, strive to build a flexible design based on an industry standard core. One way of accomplishing this might be through the use of different back ends serving different use cases.
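The capacity planning described under Scale above can be sketched as a simple projection. The following is a minimal illustration only, assuming compound monthly growth of usable data; the function name, the growth model, the replica count, and the headroom factor are hypothetical choices made for this example and do not come from this guide.

```python
def projected_raw_capacity_tb(current_usable_tb, monthly_growth_rate,
                              months, replicas=3, headroom=0.2):
    """Estimate raw capacity (TB) needed at the end of a budget cycle.

    Assumptions (illustrative only): usable data grows by a fixed
    percentage each month, every byte is stored `replicas` times, and
    `headroom` is the fraction of extra space kept free.
    """
    if months < 0 or replicas < 1:
        raise ValueError("months must be >= 0 and replicas >= 1")
    usable = current_usable_tb * (1 + monthly_growth_rate) ** months
    return usable * replicas * (1 + headroom)


if __name__ == "__main__":
    # Hypothetical inputs: 100 TB today, 5% monthly growth, 12-month cycle.
    needed = projected_raw_capacity_tb(100, 0.05, 12)
    print(f"Raw capacity to plan for: {needed:.1f} TB")
```

Re-running such a projection with measured growth at each budget cycle helps keep the balance between cost and capacity visible.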
Operational considerations

Operational factors affect the design choices for a general purpose cloud, and operations staff receive tasks regarding the maintenance of cloud environments for larger installations.

Maintenance tasks
    The storage solution should take into account storage maintenance and the impact on underlying workloads.

Reliability and availability
    Reliability and availability depend on wide area network availability and on the level of precautions taken by the service provider.

Flexibility
    Organizations need to have the flexibility to choose between off-premises and on-premises cloud storage options. This concept relies on relevant decision criteria that are complementary to initial direct cost savings potential, such as continuity of operations, disaster recovery, security, and records retention laws, regulations, and policies.

Monitoring and alerting services are vitally important in cloud environments with high demands on storage resources. These services provide a real-time view into the health and performance of the storage systems. An integrated management console, or other dashboards capable of visualizing SNMP data, is helpful when discovering and resolving issues that arise within the storage cluster.

A storage-focused cloud design should include:

• Monitoring of physical hardware resources.
• Monitoring of environmental resources such as temperature and humidity.
• Monitoring of storage resources such as available storage, memory and CPU.
• Monitoring of advanced storage performance data to ensure that storage systems are performing as expected.
• Monitoring of network resources for service disruptions which would affect access to storage.
• Centralized log collection.
• Log analytics capabilities.
• Ticketing system (or integration with a ticketing system) to track issues.
• Alerting and notification of responsible teams or automated systems which remediate problems with storage as they arise.
• Network Operations Center (NOC) staffed and always available to resolve issues.

Management efficiency

Operations personnel are often required to replace failed drives or nodes and provide ongoing maintenance of the storage hardware.

Provisioning and configuration of new or upgraded storage is another important consideration when it comes to management of resources. The ability to easily deploy, configure, and manage storage hardware results in a solution that is easy to manage. This also makes use of management systems that can automate other pieces of the overall solution, such as replication, retention, data backup, and recovery.

Application awareness

Well-designed applications should be aware of underlying storage subsystems, in order to use cloud storage solutions effectively.

If native replication is not available, operations personnel must be able to design or modify the application so that it can provide its own replication service. An application designed to detect underlying storage systems can function in a wide variety of infrastructures, and still have the same basic behavior regardless of the differences in the underlying infrastructure.

Fault tolerance and availability

Designing for fault tolerance and availability of storage systems in an OpenStack cloud is vastly different when comparing the Block Storage and Object Storage services. The Object Storage service design features consistency and partition tolerance as a function of the application. Therefore, it does not have any reliance on hardware RAID controllers to provide redundancy for physical disks.
Block Storage fault tolerance and availability

Block Storage resource nodes are commonly configured with advanced RAID controllers and high performance disks to provide fault tolerance at the hardware level.

Deploy high performing storage solutions such as SSD disk drives or flash storage systems in cases where applications require extreme performance out of Block Storage devices.

In environments that place extreme demands on Block Storage, we recommend using multiple storage pools. In this case, each pool of devices should have a similar hardware design and disk configuration across all hardware nodes in that pool. This allows for a design that provides applications with access to a wide variety of Block Storage pools, each with their own redundancy, availability, and performance characteristics. When deploying multiple pools of storage it is also important to consider the impact on the Block Storage scheduler which is responsible for provisioning storage across resource nodes. Ensuring that applications can schedule volumes in multiple regions, each with their own network, power, and cooling infrastructure, can give tenants the ability to build fault tolerant applications that are distributed across multiple availability zones.

In addition to the Block Storage resource nodes, it is important to design for high availability and redundancy of the APIs and related services that are responsible for provisioning and providing access to storage. We recommend designing a layer of hardware or software load balancers in order to achieve high availability of the appropriate REST API services to provide uninterrupted service. In some cases, it may also be necessary to deploy an additional layer of load balancing to provide access to back-end database services responsible for servicing and storing the state of Block Storage volumes.
We also recommend designing a highly available database solution to store the Block Storage databases. A number of highly available database solutions such as Galera and MariaDB can be leveraged to help keep database services online to provide uninterrupted access so that tenants can manage Block Storage volumes.

In a cloud with extreme demands on Block Storage, the network architecture should take into account the amount of East-West bandwidth required for instances to make use of the available storage resources. The selected network devices should support jumbo frames for transferring large blocks of data. In some cases, it may be necessary to create an additional back-end storage network dedicated to providing connectivity between instances and Block Storage resources so that there is no contention of network resources.

Object Storage fault tolerance and availability

While consistency and partition tolerance are both inherent features of the Object Storage service, it is important to design the overall storage architecture to ensure that the implemented system meets those goals. The OpenStack Object Storage service places a specific number of data replicas as objects on resource nodes. These replicas are distributed throughout the cluster based on a consistent hash ring which exists on all nodes in the cluster.

Design the Object Storage system with a sufficient number of zones to provide quorum for the number of replicas defined. For example, with three replicas configured in the Swift cluster, the recommended number of zones to configure within the Object Storage cluster in order to achieve quorum is 5. While it is possible to deploy a solution with fewer zones, the implied risk of doing so is that some data may not be available and API requests to certain objects stored in the cluster might fail.
For this reason, ensure you properly account for the number of zones in the Object Storage cluster.

Each Object Storage zone should be self-contained within its own availability zone. Each availability zone should have independent access to network, power and cooling infrastructure to ensure uninterrupted access to data. In addition, a pool of Object Storage proxy servers providing access to data stored on the object nodes should service each availability zone. Object proxies in each region should leverage local read and write affinity so that local storage resources facilitate access to objects wherever possible. We recommend deploying upstream load balancing to ensure that proxy services are distributed across the multiple zones and, in some cases, it may be necessary to make use of third party solutions to aid with geographical distribution of services.

A zone within an Object Storage cluster is a logical division. Any of the following may represent a zone:

• A disk within a single node
• One zone per node
• Zone per collection of nodes
• Multiple racks
• Multiple data centers (DCs)

Selecting the proper zone design is crucial for allowing the Object Storage cluster to scale while providing an available and redundant storage system. It may be necessary to configure storage policies that have different requirements with regard to replicas, retention and other factors that could heavily affect the design of storage in a specific zone.

Scaling storage services

Adding storage capacity and bandwidth is a very different process when comparing the Block and Object Storage services. While adding Block Storage capacity is a relatively simple process, adding capacity and bandwidth to the Object Storage systems is a complex task that requires careful planning and consideration during the design phase.
Scaling Block Storage

You can upgrade Block Storage pools to add storage capacity without interruption to the overall Block Storage service. Add nodes to the pool by installing and configuring the appropriate hardware and software and then allowing that node to report in to the proper storage pool via the message bus. This is because Block Storage nodes report into the scheduler service advertising their availability. Once the node is online and available, tenants can make use of those storage resources instantly.

In some cases, the demand on Block Storage from instances may exhaust the available network bandwidth. As a result, design network infrastructure that services Block Storage resources in such a way that you can add capacity and bandwidth easily. This often involves the use of dynamic routing protocols or advanced networking solutions to add capacity to downstream devices easily. Both the front-end and back-end storage network designs should encompass the ability to quickly and easily add capacity and bandwidth.

Scaling Object Storage

Adding back-end storage capacity to an Object Storage cluster requires careful planning and consideration. In the design phase it is important to determine the maximum partition power required by the Object Storage service, which determines the maximum number of partitions which can exist. Object Storage distributes data among all available storage, but a partition cannot span more than one disk, so the number of disks that can hold data can only be as high as the number of partitions.

For example, a system that starts with a single disk and a partition power of 3 can have 8 (2^3) partitions. Adding a second disk means that each disk has 4 partitions. Because a partition cannot span disks, this system can never have more than 8 disks, limiting its scalability. However, a system that starts with a single disk and a partition power of 10 can have up to 1024 (2^10) disks.
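The partition power arithmetic in the example above can be checked with a short calculation. This is a minimal sketch of the arithmetic only, not of how the Object Storage ring is built; the function names are hypothetical and chosen for this illustration.

```python
def partition_count(partition_power):
    """Total number of partitions for a given partition power (2^power)."""
    return 2 ** partition_power


def partitions_per_disk(partition_power, disks):
    """Average partitions held by each disk.

    Raises ValueError when there are more disks than partitions, because
    a partition cannot span more than one disk and the extra disks would
    hold no data.
    """
    total = partition_count(partition_power)
    if disks < 1:
        raise ValueError("at least one disk is required")
    if disks > total:
        raise ValueError(
            f"{disks} disks exceed the {total} partitions available "
            f"at partition power {partition_power}"
        )
    return total / disks


if __name__ == "__main__":
    print(partition_count(3))          # partitions at power 3
    print(partitions_per_disk(3, 2))   # partitions per disk with two disks
    print(partition_count(10))         # upper bound on disks at power 10
```

Because the partition count caps the number of useful disks, running this kind of check against the largest cluster size you expect helps when choosing the partition power during the design phase.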
As you add back-end storage capacity to the system, the partition maps redistribute data amongst the storage nodes. In some cases, this replication consists of extremely large data sets. In these cases, we recommend using back-end replication links that do not contend with tenants' access to data.

As more tenants begin to access data within the cluster and their data sets grow, it is necessary to add front-end bandwidth to service data access requests. Adding front-end bandwidth to an Object Storage cluster requires careful planning and design of the Object Storage proxies that tenants use to gain access to the data, along with the high availability solutions that enable easy scaling of the proxy layer. We recommend designing a front-end load balancing layer that tenants and consumers use to gain access to data stored within the cluster. This load balancing layer may be distributed across zones, regions or even across geographic boundaries, which may also require that the design encompass geo-location solutions.

In some cases, you must add bandwidth and capacity to the network resources servicing requests between proxy servers and storage nodes. For this reason, the network architecture used for access to storage nodes and proxy servers should make use of a design which is scalable.

Architecture

There are three areas to consider when selecting storage hardware:

• Cost
• Performance
• Reliability

Storage-focused OpenStack clouds must reflect that the workloads are storage intensive. These workloads are not compute intensive, nor are they consistently network intensive. The network may be heavily utilized to transfer storage, but the workloads are not otherwise network intensive.

For a storage-focused OpenStack design architecture, the selection of storage hardware determines the overall performance and scalability of the design architecture.
Several factors impact the design process:

Cost
    The cost of components affects which storage architecture and hardware you choose.

Performance
    The latency of storage I/O requests indicates performance. Performance requirements affect which solution you choose.

Scalability
    Scalability refers to how the storage solution performs as it expands to its maximum size. Storage solutions that perform well in small configurations but have degraded performance in large configurations are not scalable. A solution that performs well at maximum expansion is scalable. Large deployments require a storage solution that performs well as it expands.

Expandability
    Expandability is the overall ability of the solution to grow. A storage solution that expands to 50 PB is more expandable than a solution that only scales to 10 PB.

    Note
        This metric is related to scalability.

Latency is a key consideration in a storage-focused OpenStack cloud. Using solid-state disks (SSDs) to minimize latency for instance storage, and to reduce CPU delays caused by waiting for the storage, increases performance. We recommend evaluating the gains from using RAID controller cards in compute hosts to improve the performance of the underlying disk subsystem.

The selection of storage architecture determines if a scale-out solution should be used or if a single, highly expandable and scalable centralized storage array would be a better choice. If a centralized storage array is the right fit for the requirements, then the array vendor determines the hardware selection. It is possible to build a storage array using commodity hardware with open source software, but doing so requires people with the expertise to build such a system.

On the other hand, a scale-out storage solution that uses direct-attached storage (DAS) in the servers may be an appropriate choice. This requires configuration of the server hardware to support the storage solution.
Considerations affecting storage architecture (and corresponding storage hardware) of a storage-focused OpenStack cloud:

Connectivity
    Based on the selected storage solution, ensure the connectivity matches the storage solution requirements. If you select a centralized storage array, determine how the hypervisors connect to the storage array. Connectivity can affect latency and thus performance. We recommend confirming that the network characteristics minimize latency to boost the overall performance of the design.

Latency
    Determine if the use case has consistent or highly variable latency.

Throughput
    Ensure that the storage solution throughput is optimized based on application requirements.

Server hardware
    Use of DAS impacts the server hardware choice and affects host density, instance density, power density, OS-hypervisor, and management tools.

Compute (server) hardware selection

Evaluate Compute (server) hardware against four opposing dimensions:

Server density
    A measure of how many servers can fit into a given measure of physical space, such as a rack unit (U).

Resource capacity
    The number of CPU cores, how much RAM, or how much storage a given server delivers.

Expandability
    The number of additional resources you can add to a server before it reaches capacity.

Cost
    The relative cost of the hardware weighted against the level of design effort needed to build the system.

You must weigh the dimensions against each other to determine the best design for the desired purpose. For example, increasing server density can mean sacrificing resource capacity or expandability. Increasing resource capacity and expandability can increase cost but decrease server density. Decreasing cost often means decreasing supportability, server density, resource capacity, and expandability.

Compute capacity (CPU cores and RAM capacity) is a secondary consideration for selecting server hardware.
As a result, the required server hardware must supply adequate CPU sockets, additional CPU cores, and more RAM; network connectivity and storage capacity are not as critical. The hardware needs to provide enough network connectivity and storage capacity to meet the user requirements; however, they are not the primary consideration.

Some server hardware form factors are better suited to storage-focused designs than others. The following is a list of these form factors:

• Most blade servers typically support dual-socket multi-core CPUs. To avoid this limit, choose either full width or full height blades. High density blade servers support up to 16 servers in only 10 rack units using half height or half width blades.

  Warning
      This decreases density by 50% (only 8 servers in 10 U) if a full width or full height option is used.

• 1U rack-mounted servers have the ability to offer greater server density than a blade server solution, but are often limited to dual-socket, multi-core CPU configurations.

  Note
      As of the Icehouse release, neither HP, IBM, nor Dell offered 1U rack servers with more than 2 CPU sockets.

  To obtain greater than dual-socket support in a 1U rack-mount form factor, customers need to buy their systems from Original Design Manufacturers (ODMs) or second-tier manufacturers.

  Warning
      This may cause issues for organizations that have preferred vendor policies or concerns with support and hardware warranties of non-tier 1 vendors.

• 2U rack-mounted servers provide quad-socket, multi-core CPU support but with a corresponding decrease in server density (half the density offered by 1U rack-mounted servers).

• Larger rack-mounted servers, such as 4U servers, often provide even greater CPU capacity, commonly supporting four or even eight CPU sockets.
  These servers have greater expandability but have much lower server density and usually greater hardware cost.

• Rack-mounted servers that support multiple independent servers in a single 2U or 3U enclosure, "sled servers", deliver increased density as compared to typical 1U-2U rack-mounted servers.

  For example, many sled servers offer four independent dual-socket nodes in 2U for a total of 8 CPU sockets in 2U. However, the dual-socket limitation on individual nodes may not be sufficient to offset their additional cost and configuration complexity.

Other factors strongly influence server hardware selection for a storage-focused OpenStack design architecture. The following is a list of these factors:

Instance density
    In this architecture, instance density and CPU-RAM oversubscription are lower. You require more hosts to support the anticipated scale, especially if the design uses dual-socket hardware designs.

Host density
    Another option to address the higher host count is to use a quad-socket platform. Taking this approach decreases host density, which also increases rack count. This configuration affects the number of power connections and also impacts network and cooling requirements.

Power and cooling density
    The power and cooling density requirements might be lower than with blade, sled, or 1U server designs due to lower host density (by using 2U, 3U or even 4U server designs). For data centers with older infrastructure, this might be a desirable feature.

Storage-focused OpenStack design architecture server hardware selection should focus on a "scale up" versus "scale out" solution. The determination of which is the best solution, a smaller number of larger hosts or a larger number of smaller hosts, depends on a combination of factors including cost, power, cooling, physical rack and floor space, support-warranty, and manageability.
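The density figures quoted for the form factors above can be compared side by side with a small calculation. This is an illustrative sketch only: the class name, the choice of "CPU sockets per rack unit" as a comparison metric, and the labels are assumptions made for this example, while the server counts, socket counts, and rack units are the ones given in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormFactor:
    """One enclosure or server type, using the figures from the text."""
    name: str
    servers: int              # independent servers per enclosure
    sockets_per_server: int   # CPU sockets in each server
    rack_units: int           # rack units the enclosure occupies

    @property
    def servers_per_u(self):
        return self.servers / self.rack_units

    @property
    def sockets_per_u(self):
        return self.servers * self.sockets_per_server / self.rack_units


FORM_FACTORS = [
    FormFactor("half-height blades (16 in 10U)", 16, 2, 10),
    FormFactor("1U dual-socket rack server", 1, 2, 1),
    FormFactor("2U quad-socket rack server", 1, 4, 2),
    FormFactor("2U sled enclosure (4 nodes)", 4, 2, 2),
]

if __name__ == "__main__":
    for ff in FORM_FACTORS:
        print(f"{ff.name}: {ff.servers_per_u:.1f} servers/U, "
              f"{ff.sockets_per_u:.1f} sockets/U")
```

A table like this covers only the density dimension; cost, expandability, and power and cooling still need to be weighed separately.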
Networking hardware selection

Key considerations for the selection of networking hardware include:

Port count
    The user requires networking hardware that has the requisite port count.

Port density
    The physical space required to provide the requisite port count affects the network design. A switch that provides 48 10 GbE ports in 1U has a much higher port density than a switch that provides 24 10 GbE ports in 2U. In general, a higher port density is preferred because it leaves more rack space for compute or storage components. It is also important to consider fault domains and power density. Finally, higher density switches are more expensive, so it is important not to over-design the network.

Port speed
    The networking hardware must support the proposed network speed, for example: 1 GbE, 10 GbE, or 40 GbE (or even 100 GbE).

Redundancy
    User requirements for high availability and cost considerations influence the required level of network hardware redundancy. Achieve network redundancy by adding redundant power supplies or paired switches.

    Note
        If this is a requirement, the hardware must support this configuration. User requirements determine if a completely redundant network infrastructure is required.

Power requirements
    Ensure that the physical data center provides the necessary power for the selected network hardware. This is not typically an issue for top of rack (ToR) switches, but may be an issue for spine switches in a leaf and spine fabric, or end of row (EoR) switches.

Protocol support
    It is possible to gain more performance out of a single storage system by using specialized network technologies such as RDMA, SRP, iSER and SCST. The specifics for using these technologies are beyond the scope of this book.
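The port density comparison above reduces to simple arithmetic. The sketch below uses the two switch examples from the text; the function names and the required port count of 96 are hypothetical values picked for illustration.

```python
import math


def port_density(ports, rack_units):
    """Ports provided per rack unit of switch height."""
    return ports / rack_units


def rack_units_for_ports(required_ports, ports_per_switch, rack_units_per_switch):
    """Rack units consumed by enough whole switches to reach a port count."""
    switches = math.ceil(required_ports / ports_per_switch)
    return switches * rack_units_per_switch


if __name__ == "__main__":
    # The two switches compared in the text.
    print(port_density(48, 1))   # 48-port switch in 1U
    print(port_density(24, 2))   # 24-port switch in 2U
    # Hypothetical requirement of 96 ports.
    print(rack_units_for_ports(96, 48, 1))
    print(rack_units_for_ports(96, 24, 2))
```

The rack units saved by the denser switch are what remain available for compute or storage components, which is the trade-off the port density entry describes.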
Software selection

Selecting software for a storage-focused OpenStack architecture design includes three areas:

• Operating system (OS) and hypervisor
• OpenStack components
• Supplemental software

Design decisions made in each of these areas impact the rest of the OpenStack architecture design.

Operating system and hypervisor

Selecting the OS and hypervisor has a significant impact on the overall design and also affects server hardware selection. Ensure that the selected operating system and hypervisor combination supports the storage hardware and works with the networking hardware selection and topology. For example, Link Aggregation Control Protocol (LACP) requires support from both the OS and hypervisor.

OS and hypervisor selection affect the following areas:

Cost
    Selection of a commercially supported hypervisor, such as Microsoft Hyper-V, results in a different cost model than a community-supported open source hypervisor like KVM or Xen. Similarly, choosing Ubuntu over Red Hat (or vice versa) impacts cost due to support contracts. However, business or application requirements might dictate a specific or commercially supported hypervisor.

Supportability
    Staff must have training with the chosen hypervisor. Consider the cost of training when choosing a solution. The support of a commercial product such as Red Hat, SUSE, or Windows is the responsibility of the OS vendor. If an open source platform is chosen, the support comes from in-house resources.

Management tools
    Ubuntu and KVM use different management tools than VMware vSphere. Although both OS and hypervisor combinations are supported by OpenStack, there are varying impacts to the rest of the design as a result of the selection of one combination versus the other.
Scale and performance
    Ensure that the selected OS and hypervisor combination meets the appropriate scale and performance requirements needed for this storage-focused OpenStack cloud. The chosen architecture must meet the targeted instance-host ratios with the selected OS-hypervisor combination.

Security
    Ensure that the design can accommodate the regular periodic installation of application security patches while maintaining the required workloads. The frequency of security patches for the proposed OS-hypervisor combination impacts performance and the patch installation process could affect maintenance windows.

Supported features
    Determine the required features of OpenStack. This often determines the selection of the OS-hypervisor combination. Certain features are only available with specific OSes or hypervisors. For example, if certain features are not available, you might need to modify the design to meet user requirements.

Interoperability
    Choose any OS-hypervisor combination based on its interoperability with other OS-hypervisor combinations. Operational and troubleshooting tools for one OS-hypervisor combination may differ from the tools used for another OS-hypervisor combination. As a result, the design must address whether the two sets of tools need to interoperate.

OpenStack components

Which OpenStack components you choose can have a significant impact on the overall design. While there are certain components that are always present, Compute and Image Service, for example, there are other services that may not need to be present. As an example, a certain design may not require the Orchestration module.
Omitting Orchestration would not typically have a significant impact on the overall design. However, if the architecture uses a replacement for OpenStack Object Storage for its storage component, this could potentially have significant impacts on the rest of the design.

A storage-focused design might require the ability to use Orchestration to launch instances with Block Storage volumes to perform storage-intensive processing.

A storage-focused OpenStack design architecture typically uses the following components:

• OpenStack Identity (keystone)
• OpenStack dashboard (horizon)
• OpenStack Compute (nova) (including the use of multiple hypervisor drivers)
• OpenStack Object Storage (swift) (or another object storage solution)
• OpenStack Block Storage (cinder)
• OpenStack Image Service (glance)
• OpenStack Networking (neutron) or legacy networking (nova-network)

Excluding certain OpenStack components may limit or constrain the functionality of other components. If a design opts to include Orchestration but exclude Telemetry, then the design cannot take advantage of Orchestration's auto scaling functionality (which relies on information from Telemetry). Because you can use Orchestration to spin up a large number of instances to perform the compute-intensive processing, we strongly recommend including Orchestration in a compute-focused architecture design.

Supplemental software

While OpenStack is a fairly complete collection of software projects for building a platform for cloud services, you may need to add other pieces of software.

Networking software

OpenStack Networking (neutron) provides a wide variety of networking services for instances. There are many additional networking software packages that may be useful to manage the OpenStack components themselves. Some examples include HAProxy, keepalived, and various routing daemons (like Quagga).
The OpenStack High Availability Guide describes some of these software packages, HAProxy in particular. See the Network controller cluster stack chapter of the OpenStack High Availability Guide.

Management software

Management software includes software for providing:

• Clustering
• Logging
• Monitoring
• Alerting

Important: The factors for determining which software packages in this category to select are outside the scope of this design guide.

The availability design requirements determine the selection of clustering software, such as Corosync or Pacemaker. The availability of the cloud infrastructure and the complexity of supporting the configuration after deployment determine the impact of including these software packages. The OpenStack High Availability Guide provides more details on the installation and configuration of Corosync and Pacemaker.

Operational considerations determine the requirements for logging, monitoring, and alerting. Each of these sub-categories includes options. For example, in the logging sub-category you could select Logstash, Splunk, Log Insight, or another log aggregation-consolidation tool. Store logs in a centralized location to facilitate performing analytics against the data. Log data analytics engines can also provide automation and issue notification, by providing a mechanism to both alert and automatically attempt to remediate some of the more commonly known issues.

If you require any of these software packages, the design must account for the additional resource consumption (CPU, RAM, storage, and network bandwidth for a log aggregation solution, for example). Some other potential design impacts include:

• OS-Hypervisor combination: Ensure that the selected logging, monitoring, or alerting tools support the proposed OS-hypervisor combination.
• Network hardware: The network hardware selection needs to be supported by the logging, monitoring, and alerting software.

Database software

Most OpenStack components require access to back-end database services to store state and configuration information. Choose an appropriate back-end database which satisfies the availability and fault tolerance requirements of the OpenStack services.

MySQL is the default database for OpenStack, but other compatible databases are available.

Note: Telemetry uses MongoDB.

The chosen high availability database solution changes according to the selected database. MySQL, for example, provides several options. Use a replication technology such as Galera for active-active clustering. For active-passive use some form of shared storage. Each of these potential solutions has an impact on the design:

• Solutions that employ Galera/MariaDB require at least three MySQL nodes.
• MongoDB has its own design considerations for high availability.
• OpenStack design, generally, does not include shared storage. However, for some high availability designs, certain components might require it depending on the specific implementation.

Prescriptive examples

Storage-focused architecture highly depends on the specific use case. This section discusses three specific example use cases:

• An object store with a RESTful interface
• Compute analytics with parallel file systems
• High performance database

The example below shows a REST interface without a high performance requirement.

Swift is a highly scalable object store that is part of the OpenStack project. This diagram explains the example architecture:

The example REST interface, presented as a traditional Object store running on traditional spindles, does not require a high performance caching tier.
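The sizing of an object store such as this one reduces to simple arithmetic, sketched below. The input figures (10 servers, twelve 4 TB disks per server, three proxies with two 10 GbE back-end links each) are the ones quoted in this example's component list. The replica count of three is an assumption consistent with the quoted usable capacity, the function names are hypothetical and not part of any OpenStack API, and file system and other overheads are ignored.

```python
def raw_capacity_tb(servers: int, disks_per_server: int, disk_tb: float) -> float:
    """Raw capacity in TB: every disk in every storage server."""
    return servers * disks_per_server * disk_tb


def usable_capacity_tb(raw_tb: float, replicas: int) -> float:
    """Usable capacity in TB when each object is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_tb / replicas


def aggregate_bandwidth_gbps(proxies: int, links_per_proxy: int, link_gbps: float) -> float:
    """Total bandwidth in Gb/s across all bonded links on all proxies."""
    return proxies * links_per_proxy * link_gbps


# Figures from the component list; replicas=3 is an assumption.
raw = raw_capacity_tb(servers=10, disks_per_server=12, disk_tb=4)
usable = usable_capacity_tb(raw, replicas=3)
backend = aggregate_bandwidth_gbps(proxies=3, links_per_proxy=2, link_gbps=10)

print(f"raw capacity:      {raw} TB")
print(f"usable capacity:   {usable} TB")
print(f"back-end bandwidth: {backend} Gb/s")
```

Changing the replica count or disk size in the calls above shows how quickly usable capacity moves relative to raw capacity, which is worth checking before committing to hardware.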
This example uses the following components:

Network:

• 10 GbE horizontally scalable spine-leaf back-end storage and front-end network.

Storage hardware:

• 10 storage servers, each with 12x4 TB disks, equaling 480 TB total space with approximately 160 TB of usable space after replicas.

Proxy:

• 3x proxies
• 2x10 GbE bonded front end
• 2x10 GbE back-end bonds
• Approximately 60 Gb/s of total bandwidth to the back-end storage cluster

Note: It may be necessary to implement a 3rd-party caching layer for some applications to achieve suitable performance.

Compute analytics with Data processing service

Analytics of large data sets are highly dependent on the performance of the storage system. Clouds using storage systems such as Hadoop Distributed File System (HDFS) have inefficiencies which can cause performance issues.

One potential solution to this problem is the implementation of storage systems designed for performance. Parallel file systems have previously filled this need in the HPC space and are suitable for large scale performance-oriented systems.

OpenStack has integration with Hadoop to manage the Hadoop cluster within the cloud. This diagram shows an OpenStack store with a high performance requirement:

The hardware requirements and configuration are similar to those of the High Performance Database example below. In this case, the architecture uses Ceph's Swift-compatible REST interface, along with features that allow a caching pool to be connected to accelerate the presented pool.

High performance database with Database service

Databases are a common workload that benefits from high performance storage back ends. Although enterprise storage is not a requirement, many environments have existing storage that an OpenStack cloud can use as back ends. You can create a storage pool to provide block devices with OpenStack Block Storage for instances as well as object interfaces.
In this example, the database I/O requirements are high and demand storage presented from a fast SSD pool.

A storage system presents a LUN backed by a set of SSDs using a traditional storage array with OpenStack Block Storage integration or a storage platform such as Ceph or Gluster.

This system can provide additional performance. For example, in the database example below, a portion of the SSD pool can act as a block device to the Database server. In the high performance analytics example, the inline SSD cache layer accelerates the REST interface.

In this example, Ceph presents a Swift-compatible REST interface, as well as block level storage from a distributed storage cluster. It is highly flexible and has features that enable reduced cost of operations such as self healing and auto balancing. Using erasure coded pools is a suitable way of maximizing the amount of usable space.

Note: There are special considerations around erasure coded pools. For example, they have higher computational requirements and limit the operations allowed on an object; erasure coded pools do not support partial writes.

Using Ceph as an applicable example, a potential architecture would have the following requirements:

Network:

• 10 GbE horizontally scalable spine-leaf back-end storage and front-end network

Storage hardware:

• 5 storage servers for caching layer 24x1 TB SSD
• 10 storage servers, each with 12x4 TB disks, which equals 480 TB total space with approximately 160 TB of usable space after 3 replicas

REST proxy:

• 3x proxies
• 2x10 GbE bonded front end
• 2x10 GbE back-end bonds
• Approximately 60 Gb/s of total bandwidth to the back-end storage cluster

Using an SSD cache layer, you can present block devices directly to Hypervisors or instances. The REST interface can also use the SSD cache systems as an inline cache.

5. Network focused

Table of Contents

User requirements
Technical considerations
Operational considerations
Architecture
Prescriptive examples

All OpenStack deployments depend, to some extent, on network communication in order to function properly because of their service-based nature. In some cases, however, use cases dictate that the network is elevated beyond simple infrastructure. This chapter is a discussion of architectures that are more reliant or focused on network services. These architectures are heavily dependent on the network infrastructure and need to be architected so that the network services perform and are reliable in order to satisfy user and application requirements.

Some possible use cases include:

Content delivery network: This could include streaming video, photographs or any other cloud based repository of data that is distributed to a large number of end users. Mass market streaming video will be very heavily affected by the network configurations that would affect latency, bandwidth, and the distribution of instances. Not all video streaming is consumer focused. For example, multicast videos (used for media, press conferences, corporate presentations, web conferencing services, and so on) can also utilize a content delivery network. Content delivery will be affected by the location of the video repository and its relationship to end users. Performance is also affected by network throughput of the backend systems, as well as the WAN architecture and the cache methodology.
Network management functions: A cloud that provides network service functions would be built to support the delivery of back-end network services such as DNS, NTP or SNMP and would be used by a company for internal network management.

Network service offerings: A cloud can be used to run customer facing network tools to support services. For example, VPNs, MPLS private networks, GRE tunnels and others.

Web portals or web services: Web servers are a common application for cloud services and we recommend an understanding of the network requirements. The network will need to be able to scale out to meet user demand and deliver webpages with a minimum of latency. Internal east-west and north-south network bandwidth must be considered depending on the details of the portal architecture.

High speed and high volume transactional systems: These types of applications are very sensitive to network configurations. Examples include many financial systems, credit card transaction applications, trading and other extremely high volume systems. These systems are sensitive to network jitter and latency. They also have a high volume of both east-west and north-south network traffic that needs to be balanced to maximize efficiency of the data delivery. Many of these systems have large high performance database back ends that need to be accessed.

High availability: These types of use cases are highly dependent on the proper sizing of the network to maintain replication of data between sites for high availability. If one site becomes unavailable, the extra sites will be able to serve the displaced load until the original site returns to service. It is important to size network capacity to handle the loads that are desired.

Big data: Clouds that will be used for the management and collection of big data (data ingest) will have a significant demand on network resources.
Big data often uses partial replicas of the data to maintain data integrity over large distributed clouds. Other big data applications that require a large amount of network resources are Hadoop, Cassandra, NuoDB, Riak and other NoSQL and distributed databases.

Virtual desktop infrastructure (VDI): This use case is very sensitive to network congestion, latency, jitter and other network characteristics. Like video streaming, the user experience is very important. Unlike video streaming, however, caching is not an option to offset the network issues. VDI requires both upstream and downstream traffic and cannot rely on caching for the delivery of the application to the end user.

Voice over IP (VoIP): This is extremely sensitive to network congestion, latency, jitter and other network characteristics. VoIP has a symmetrical traffic pattern and it requires network quality of service (QoS) for best performance. It may also require an active queue management implementation to ensure delivery. Users are very sensitive to latency and jitter fluctuations and can detect them at very low levels.

Video conference or web conference: This also is extremely sensitive to network congestion, latency, jitter and other network flaws. Video conferencing has a symmetrical traffic pattern, but unless the network is on an MPLS private network, it cannot use network quality of service (QoS) to improve performance. Similar to VoIP, users will be sensitive to network performance issues even at low levels.

High performance computing (HPC): This is a complex use case that requires careful consideration of the traffic flows and usage patterns to address the needs of cloud clusters. It has high east-west traffic patterns for distributed computing, but there can be substantial north-south traffic depending on the specific application.

User requirements

Network focused architectures vary from the general purpose designs.
They are heavily influenced by a specific subset of applications that place heavier demands on the network. Some of the business requirements that will influence the design include:

• User experience: User experience is impacted by network latency through slow page loads, degraded video streams, and low quality VoIP sessions. Users are often not aware of how network design and architecture affect their experiences. Both enterprise customers and end-users rely on the network for delivery of an application. Network performance problems can result in a negative experience for the end-user, as well as productivity and economic loss.

• Regulatory requirements: Networks need to take into consideration any regulatory requirements about the physical location of data as it traverses the network. For example, Canadian medical records cannot pass outside of Canadian sovereign territory. Another network consideration is maintaining network segregation of private data flows and ensuring that the network between cloud locations is encrypted where required. Network architectures are affected by regulatory requirements for encryption and protection of data in flight as the data moves through various networks.

Many jurisdictions have legislative and regulatory requirements governing the storage and management of data in cloud environments. Common areas of regulation include:

• Data retention policies ensuring storage of persistent data and records management to meet data archival requirements.
• Data ownership policies governing the possession and responsibility for data.
• Data sovereignty policies governing the storage of data in foreign countries or otherwise separate jurisdictions.
• Data compliance policies governing where information needs to reside in certain locations due to regulatory issues and, more importantly, where it cannot reside in other locations for the same reason.
Examples of such legal frameworks include the data protection framework of the European Union (http://ec.europa.eu/justice/data-protection/) and the requirements of the Financial Industry Regulatory Authority (http://www.finra.org/Industry/Regulation/FINRARules) in the United States. Consult a local regulatory body for more information.

High availability issues

OpenStack installations with high demand on network resources have high availability requirements that are determined by the application and use case. Financial transaction systems will have a much higher requirement for high availability than a development application. Network features such as quality of service (QoS) can be used to improve the network performance of sensitive applications such as VoIP and video streaming.

Often, high performance systems will have SLA requirements for a minimum QoS with regard to guaranteed uptime, latency and bandwidth. The level of the SLA can have a significant impact on the network architecture and requirements for redundancy in the systems.

Risks

Network misconfigurations: Configuring incorrect IP addresses, VLANs, and routes can cause outages to areas of the network or, in the worst-case scenario, the entire cloud infrastructure. Misconfigurations can cause disruptive problems, so configuration should be automated to minimize the opportunity for operator error.

Capacity planning: Cloud networks need to be managed for capacity and growth over time. There is a risk that the network will not grow to support the workload. Capacity planning includes the purchase of network circuits and hardware that can potentially have lead times measured in months or more.

Network tuning: Cloud networks need to be configured to minimize link loss, packet loss, packet storms, broadcast storms, and loops.
Single point of failure (SPOF): High availability must be taken into account even at the physical and environmental layers. If there is a single point of failure due to only one upstream link, or only one power supply, an outage becomes unavoidable.

Complexity: An overly complex network design becomes difficult to maintain and troubleshoot. While automated tools that handle overlay networks or device level configuration can mitigate this, non-traditional interconnects between functions and specialized hardware need to be well documented or avoided to prevent outages.

Non-standard features: There are additional risks that arise from configuring the cloud network to take advantage of vendor specific features. One example is multi-chassis link aggregation (MLAG), which is used to provide redundancy at the aggregator switch level of the network. MLAG is not a standard and, as a result, each vendor has its own proprietary implementation of the feature. MLAG architectures are not interoperable across switch vendors, which leads to vendor lock-in and can delay or prevent component upgrades.

Security

Security is often overlooked or added after a design has been implemented. Consider security implications and requirements before designing the physical and logical network topologies. Some of the factors that need to be addressed include making sure the networks are properly segregated and traffic flows are going to the correct destinations without crossing through locations that are undesirable. Some examples of factors that need to be taken into consideration are:

• Firewalls
• Overlay interconnects for joining separated tenant networks
• Routing through or avoiding specific networks

Another security vulnerability that must be taken into account is how networks are attached to hypervisors.
If a network must be separated from other systems at all costs, it may be necessary to schedule instances for that network onto dedicated compute nodes. This may also be done to mitigate the risk of a hypervisor breakout that gives an attacker access to networks from a compromised instance.

Technical considerations

When you design an OpenStack network architecture, you must consider layer-2 and layer-3 issues. Layer-2 decisions involve those made at the data-link layer, such as the decision to use Ethernet versus Token Ring. Layer-3 decisions involve those made about the protocol layer and the point when IP comes into the picture. As an example, a completely internal OpenStack network can exist at layer 2 and ignore layer 3. However, in order for any traffic to go outside of that cloud, to another network, or to the Internet, a layer-3 router or switch must be involved.

The past few years have seen two competing trends in networking. One trend leans towards building data center network architectures based on layer-2 networking. Another trend treats the cloud environment essentially as a miniature version of the Internet. This approach is radically different from the network architecture approach that is used in the staging environment: the Internet is based entirely on layer-3 routing rather than layer-2 switching.

A network designed on layer-2 protocols has advantages over one designed on layer-3 protocols. In spite of the difficulties of using a bridge to perform the network role of a router, many vendors, customers, and service providers choose to use Ethernet in as many parts of their networks as possible. The benefits of selecting a layer-2 design are:

• Ethernet frames contain all the essentials for networking. These include, but are not limited to, globally unique source addresses, globally unique destination addresses, and error control.
• Ethernet frames can carry any kind of packet.
  Networking at layer 2 is independent of the layer-3 protocol.
• More layers added to the Ethernet frame only slow the networking process down. This is known as 'nodal processing delay'.
• Adjunct networking features, for example class of service (CoS) or multicasting, can be added to Ethernet as readily as IP networks.
• VLANs are an easy mechanism for isolating networks.

Most information starts and ends inside Ethernet frames. Today this applies to data, voice (for example, VoIP) and video (for example, web cameras). The concept is that, if more of the end-to-end transfer of information from a source to a destination can be done in the form of Ethernet frames, more of the benefits of Ethernet can be realized on the network. Though it is not a substitute for IP networking, networking at layer 2 can be a powerful adjunct to IP networking.

Layer-2 Ethernet usage has these advantages over layer-3 IP network usage:

• Speed
• Reduced overhead of the IP hierarchy.
• No need to keep track of address configuration as systems are moved around. Whereas the simplicity of layer-2 protocols might work well in a data center with hundreds of physical machines, cloud data centers have the additional burden of needing to keep track of all virtual machine addresses and networks. In these data centers, it is not uncommon for one physical node to support 30-40 instances.

Important: Networking at the frame level says nothing about the presence or absence of IP addresses at the packet level. Almost all ports, links, and devices on a network of LAN switches still have IP addresses, as do all the source and destination hosts. There are many reasons for the continued need for IP addressing. The largest one is the need to manage the network. A device or link without an IP address is usually invisible to most management applications.
Utilities including remote access for diagnostics, file transfer of configurations and software, and similar applications cannot run without IP addresses as well as MAC addresses.

Layer-2 architecture limitations

Outside of the traditional data center, the limitations of layer-2 network architectures become more obvious.

• The number of VLANs is limited to 4096.
• The number of MACs stored in switch tables is limited.
• The need to maintain a set of layer-4 devices to handle traffic control must be accommodated.
• MLAG, often used for switch redundancy, is a proprietary solution that does not scale beyond two devices and forces vendor lock-in.
• It can be difficult to troubleshoot a network without IP addresses and ICMP.
• Configuring ARP is considered complicated on large layer-2 networks.
• All network devices need to be aware of all MACs, even instance MACs, so there is constant churn in MAC tables and network state changes as instances are started or stopped.
• Migrating MACs (instance migration) to different physical locations is a potential problem if ARP table timeouts are not set properly.

It is important to know that layer 2 has a very limited set of network management tools. It is very difficult to control traffic, as it does not have mechanisms to manage the network or shape the traffic, and network troubleshooting is very difficult. One reason for this difficulty is that network devices have no IP addresses. As a result, there is no reasonable way to check network delay in a layer-2 network.

On large layer-2 networks, configuring ARP learning can also be complicated. The setting for the MAC address timer on switches is critical and, if set incorrectly, can cause significant performance problems. As an example, the Cisco default MAC address timer is extremely long. Migrating MACs to different physical locations to support instance migration can be a significant problem.
In this case, the network information maintained in the switches could be out of sync with the new location of the instance.

In a layer-2 network, all devices are aware of all MACs, even those that belong to instances. The network state information in the backbone changes whenever an instance is started or stopped. As a result there is far too much churn in the MAC tables on the backbone switches.

Layer-3 architecture advantages

In the layer-3 case, there is no churn in the routing tables due to instances starting and stopping. The only time there would be a routing state change would be in the case of a Top of Rack (ToR) switch failure or a link failure in the backbone itself. Other advantages of using a layer-3 architecture include:

• Layer-3 networks provide the same level of resiliency and scalability as the Internet.
• Controlling traffic with routing metrics is straightforward.
• Layer 3 can be configured to use BGP confederation for scalability so core routers have state proportional to the number of racks, not to the number of servers or instances.
• Routing keeps instance MAC and IP addresses out of the network core, reducing state churn. Routing state changes only occur in the case of a ToR switch failure or backbone link failure.
• There are a variety of well tested tools, for example ICMP, to monitor and manage traffic.
• Layer-3 architectures allow for the use of Quality of Service (QoS) to manage network performance.

Layer-3 architecture limitations

The main limitation of layer 3 is that there is no built-in isolation mechanism comparable to the VLANs in layer-2 networks. Furthermore, the hierarchical nature of IP addresses means that an instance will also be on the same subnet as its physical host. This means that it cannot be migrated outside of the subnet easily.
For these reasons, network virtualization needs to use IP encapsulation and software at the end hosts, both for isolation and for separating the addressing in the virtual layer from the addressing in the physical layer. Other potential disadvantages of layer 3 include the need to design an IP addressing scheme, rather than relying on the switches to keep track of the MAC addresses automatically, and the need to configure the interior gateway routing protocol in the switches.

Network recommendations overview

OpenStack has complex networking requirements for several reasons. Many components interact at different levels of the system stack, which adds complexity. Data flows are complex. Data in an OpenStack cloud moves both between instances across the network (also known as East-West), as well as in and out of the system (also known as North-South). Physical server nodes have network requirements that are independent of those of instances, which need to be isolated from the core network to account for scalability. It is also recommended to functionally separate the networks for security purposes and tune performance through traffic shaping.

A number of important general technical and business factors need to be taken into consideration when planning and designing an OpenStack network. They include:

• A requirement for vendor independence. To avoid hardware or software vendor lock-in, the design should not rely on specific features of a vendor's router or switch.
• A requirement to massively scale the ecosystem to support millions of end users.
• A requirement to support indeterminate platforms and applications.
• A requirement to design for cost efficient operations to take advantage of massive scale.
• A requirement to ensure that there is no single point of failure in the cloud ecosystem.
• A requirement for high availability architecture to meet customer SLA requirements.
• A requirement to be tolerant of rack level failure.
• A requirement to maximize flexibility to architect future production environments.

Keeping all of these in mind, the following network design recommendations can be made:

• Layer-3 designs are preferred over layer-2 architectures.
• Design a dense multi-path network core to support multi-directional scaling and flexibility.
• Use hierarchical addressing because it is the only viable option to scale the network ecosystem.
• Use virtual networking to isolate instance service network traffic from the management and internal network traffic.
• Isolate virtual networks using encapsulation technologies.
• Use traffic shaping for performance tuning.
• Use eBGP to connect to the Internet up-link.
• Use iBGP to flatten the internal traffic on the layer-3 mesh.
• Determine the most effective configuration for the block storage network.

Additional considerations

There are numerous topics to consider when designing a network-focused OpenStack cloud.

OpenStack Networking versus legacy networking (nova-network) considerations

Selecting the type of networking technology to implement depends on many factors. OpenStack Networking (neutron) and legacy networking (nova-network) both have their advantages and disadvantages. They are both valid and supported options that fit different use cases as described in the following table.
Legacy networking (nova-network)    OpenStack Networking
Simple, single agent                Complex, multiple agents
More mature, established            Newer, maturing
Flat or VLAN                        Flat, VLAN, Overlays, L2-L3, SDN
No plug-in support                  Plug-in support for 3rd parties
Scales well                         Scaling requires 3rd party plug-ins
No multi-tier topologies            Multi-tier topologies

Redundant networking: ToR switch high availability risk analysis

One technical consideration of networking is whether switching gear in the data center should be installed with backup switches in case of hardware failure.

Research puts the mean time between failures (MTBF) of switches at between 100,000 and 200,000 hours. This number is dependent on the ambient temperature of the switch in the data center. When properly cooled and maintained, this translates to between 11 and 22 years before failure. Even in the worst case of poor ventilation and high ambient temperatures in the data center, the MTBF is still 2-3 years. This is based on published research found at http://www.garrettcom.com/techsupport/papers/ethernet_switch_reliability.pdf and http://www.n-tron.com/pdf/network_availability.pdf.

In most cases, it is much more economical to use only a single switch with a small pool of spare switches to replace failed units than it is to outfit an entire data center with redundant switches. Applications should also be able to tolerate rack level outages without affecting normal operations since network and compute resources are easily provisioned and plentiful.

Preparing for the future: IPv6 support

One of the most important networking topics today is the impending exhaustion of IPv4 addresses.
In early 2014, ICANN announced that they started allocating the final IPv4 address blocks to the Regional Internet Registries (http://www.internetsociety.org/deploy360/blog/2014/05/goodbye-ipv4-iana-starts-allocating-final-address-blocks/). This means the IPv4 address space is close to being fully allocated. As a result, it will soon become difficult to allocate more IPv4 addresses to an application that has experienced growth, or is expected to scale out, due to the lack of unallocated IPv4 address blocks.

For network focused applications the future is the IPv6 protocol. IPv6 increases the address space significantly, fixes long standing issues in the IPv4 protocol, and will become essential for network focused applications in the future.

OpenStack Networking supports IPv6 when configured to take advantage of the feature. To enable it, simply create an IPv6 subnet in Networking and use IPv6 prefixes when creating security groups.

Asymmetric links

When designing a network architecture, the traffic patterns of an application will heavily influence the allocation of total bandwidth and the number of links that are used to send and receive traffic. Applications that provide file storage for customers will allocate bandwidth and links to favor incoming traffic, whereas video streaming applications will allocate bandwidth and links to favor outgoing traffic.

Performance

It is important to analyze the applications' tolerance for latency and jitter when designing an environment to support network focused applications. Certain applications, for example VoIP, are less tolerant of latency and jitter. Where latency and jitter are concerned, certain applications may require tuning of QoS parameters and network device queues to ensure that their packets are queued for transmission immediately or are guaranteed a minimum bandwidth.
Since OpenStack currently does not support these functions, some considerations may need to be made for the network plug-in selected.

The location of a service may also impact the application or consumer experience. If an application is designed to serve differing content to differing users it will need to be designed to properly direct connections to those specific locations. Use a multi-site installation for these situations, where appropriate.

Networking can be implemented in two separate ways. The legacy networking (nova-network) provides a flat DHCP network with a single broadcast domain. This implementation does not support tenant isolation networks or advanced plug-ins, but it is currently the only way to implement a distributed layer-3 agent using the multi_host configuration. OpenStack Networking (neutron) is the official networking implementation and provides a pluggable architecture that supports a large variety of network methods. Some of these include a layer-2 only provider network model, external device plug-ins, or even OpenFlow controllers.

Networking at large scales becomes a set of boundary questions. The determination of how large a layer-2 domain needs to be is based on the number of nodes within the domain and the amount of broadcast traffic that passes between instances. Breaking layer-2 boundaries may require the implementation of overlay networks and tunnels. This decision is a balancing act between the need for a smaller overhead and the need for a smaller domain.

When selecting network devices, be aware that making this decision based on the greatest port density often comes with a drawback. Aggregation switches and routers have not all kept pace with Top of Rack switches and may induce bottlenecks on north-south traffic. As a result, it may be possible for massive amounts of downstream network utilization to impact upstream network devices, impacting service to the cloud.
Since OpenStack does not currently provide a mechanism for traffic shaping or rate limiting, it is necessary to implement these features at the network hardware level.

Operational considerations

Network focused OpenStack clouds have a number of operational considerations that will influence the selected design. Topics including, but not limited to, dynamic routing of static routes, service level agreements, and ownership of user management all need to be considered.

One of the first required decisions is the selection of a telecom company or transit provider. This is especially true if the network requirements include external or site-to-site network connectivity.

Additional design decisions need to be made about monitoring and alarming. These can be an internal responsibility or the responsibility of the external provider. In the case of using an external provider, SLAs will likely apply. In addition, other operational considerations such as bandwidth, latency, and jitter can be part of a service level agreement.

The ability to upgrade the infrastructure is another subject for consideration. As demand for network resources increases, operators will be required to add IP address blocks and bandwidth capacity. Managing hardware and software life cycle events, for example upgrades, decommissioning, and outages, while avoiding service interruptions for tenants, will also need to be considered.

Maintainability will also need to be factored into the overall network design. This includes the ability to manage and maintain IP addresses as well as the use of overlay identifiers including VLAN tag IDs, GRE tunnel IDs, and MPLS tags. As an example, if all of the IP addresses have to be changed on a network, a process known as renumbering, then the design needs to support the ability to do so.
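As a rough illustration of the renumbering case, the sketch below maps an address in an old prefix to the same host offset in a new prefix using Python's standard ipaddress module. The renumber helper and the example prefixes are hypothetical and are not part of OpenStack. A real renumbering also touches DHCP leases, DNS records, security group rules, and overlay identifiers, none of which are modeled here.

```python
import ipaddress


def renumber(address, old_prefix, new_prefix):
    """Map an address from old_prefix to the same host offset in new_prefix.

    Hypothetical helper for illustration only; it assumes both prefixes
    have the same length so that every host offset stays valid.
    """
    old_net = ipaddress.ip_network(old_prefix)
    new_net = ipaddress.ip_network(new_prefix)
    addr = ipaddress.ip_address(address)

    if old_net.version != new_net.version:
        raise ValueError("prefixes must be the same IP version")
    if old_net.prefixlen != new_net.prefixlen:
        raise ValueError("prefixes must have the same length")
    if addr not in old_net:
        raise ValueError(f"{addr} is not inside {old_net}")

    # Host offset within the old prefix, re-applied to the new prefix.
    offset = int(addr) - int(old_net.network_address)
    return new_net.network_address + offset


if __name__ == "__main__":
    # Example prefixes are documentation ranges, not real allocations.
    for host in ("192.0.2.10", "192.0.2.200"):
        print(host, "->", renumber(host, "192.0.2.0/24", "198.51.100.0/24"))
```

Keeping the host offset stable is only one possible policy. It is chosen here because it lets per-host records be rewritten mechanically rather than reassigned one by one.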
Network focused applications themselves need to be considered with respect to certain operational realities, for example the impending exhaustion of IPv4 addresses, the migration to IPv6, and the use of private networks to segregate different types of traffic that an application receives or generates. In the case of IPv4 to IPv6 migrations, applications should follow best practices for storing IP addresses. It is further recommended to avoid relying on IPv4 features that were not carried over to the IPv6 protocol or have differences in implementation.

When using private networks to segregate traffic, applications should create private tenant networks for database and data storage network traffic, and utilize public networks for client-facing traffic. By segregating this traffic, quality of service and security decisions can be made to ensure that each network has the correct level of service that it requires.

Finally, decisions must be made about the routing of network traffic. For some applications, a more complex policy framework for routing must be developed. The economic cost of transmitting traffic over expensive links versus cheaper links, in addition to bandwidth, latency, and jitter requirements, can be used to create a routing policy that will satisfy business requirements.

How to respond to network events must also be taken into consideration. As an example, how load is transferred from one link to another during a failure scenario could be a factor in the design. If network capacity is not planned correctly, failover traffic could overwhelm other ports or network links and create a cascading failure scenario. In this case, traffic that fails over to one link overwhelms that link and then moves to the subsequent links until all network traffic stops.

Architecture

Network focused OpenStack architectures have many similarities to other OpenStack architecture use cases.
There are a number of very specific considerations to keep in mind when designing for a network-centric or network-heavy application environment.

Networks exist to serve as a medium of transporting data between systems. It is inevitable that an OpenStack design has inter-dependencies with non-network portions of OpenStack as well as on external systems. Depending on the specific workload, there may be major interactions with storage systems both within and external to the OpenStack environment. For example, if the workload is a content delivery network, then the interactions with storage will be two-fold. There will be traffic flowing to and from the storage array for ingesting and serving content in a north-south direction. In addition, there is replication traffic flowing in an east-west direction.

Compute-heavy workloads may also induce interactions with the network. Some high performance compute applications require network-based memory mapping and data sharing and, as a result, will induce a higher network load when they transfer results and data sets. Others may be highly transactional and issue transaction locks, perform their functions and rescind transaction locks at very high rates. This also has an impact on the network performance.

Some network dependencies are going to be external to OpenStack. While OpenStack Networking is capable of providing network ports, IP addresses, some level of routing, and overlay networks, there are some other functions that it cannot provide. For many of these, external systems or equipment may be required to fill in the functional gaps. Hardware load balancers are an example of equipment that may be necessary to distribute workloads or offload certain functions.
Note that, as of the Icehouse release, dynamic routing is currently in its infancy within OpenStack and may need to be implemented either by an external device or a specialized service instance within OpenStack. Tunneling is a feature provided by OpenStack Networking; however, it is constrained to a Networking-managed region. If the need arises to extend a tunnel beyond the OpenStack region to either another region or an external system, it is necessary to implement the tunnel itself outside OpenStack or by using a tunnel management system to map the tunnel or overlay to an external tunnel. OpenStack does not currently provide quotas for network resources. Where network quotas are required, it is necessary to implement quality of service management outside of OpenStack. In many of these instances, similar solutions for traffic shaping or other network functions will be needed.

Depending on the selected design, Networking itself might not even support the required layer-3 network functionality. If you choose to use the provider networking mode without running the layer-3 agent, you must install an external router to provide layer-3 connectivity to outside systems.

Interaction with orchestration services is inevitable in larger-scale deployments. The Orchestration module is capable of allocating network resources defined in templates to map to tenant networks and for port creation, as well as allocating floating IPs. If there is a requirement to define and manage network resources using orchestration, we recommend that the design include the Orchestration module to meet the demands of users.

Design impacts

A wide variety of factors can affect a network focused OpenStack architecture. While there are some considerations shared with a general use case, specific workloads related to network requirements will influence network design decisions.
One decision includes whether or not to use Network Address Translation (NAT) and where to implement it. If there is a requirement for floating IPs to be available instead of using public fixed addresses, then NAT is required. This can be seen in network management applications that rely on an IP endpoint. An example of this is a DHCP relay that needs to know the IP of the actual DHCP server. In these cases it is easier to automate the infrastructure to apply the target IP to a new instance rather than reconfigure legacy or external systems for each new instance.

NAT for floating IPs managed by Networking will reside within the hypervisor but there are also versions of NAT that may be running elsewhere. If there is a shortage of IPv4 addresses there are two common methods to mitigate this externally to OpenStack. The first is to run a load balancer either within OpenStack as an instance, or use an external load balancing solution. In the internal scenario, load balancing software, such as HAProxy, can be managed with Networking's Load-Balancer-as-a-Service (LBaaS). This is specifically to manage the Virtual IP (VIP) while a dual-homed connection from the HAProxy instance connects the public network with the tenant private network that hosts all of the content servers. In the external scenario, a load balancer would need to serve the VIP and also be joined to the tenant overlay network through external means or routed to it via private addresses.

Another kind of NAT that may be useful is protocol NAT. In some cases it may be desirable to use only IPv6 addresses on instances and operate either an instance or an external service to provide a NAT-based transition technology such as NAT64 and DNS64. This provides the ability to have a globally routable IPv6 address while only consuming IPv4 addresses as necessary or in a shared manner.

Application workloads will affect the design of the underlying network architecture.
If a workload requires network-level redundancy, the routing and switching architecture will have to accommodate this. There are differing methods for providing this that are dependent on the network hardware selected, the performance of the hardware, and which networking model is deployed. Some examples of this are the use of Link aggregation (LAG) or Hot Standby Router Protocol (HSRP). There are also the considerations of whether to deploy OpenStack Networking or legacy networking (nova-network) and which plug-in to select for OpenStack Networking. If using an external system, Networking will need to be configured to run layer 2 with a provider network configuration. For example, it may be necessary to implement HSRP to terminate layer-3 connectivity.

Depending on the workload, overlay networks may or may not be a recommended configuration. Where application network connections are small, short lived or bursty, running a dynamic overlay can generate as much bandwidth as the packets it carries. It also can induce enough latency to cause issues with certain applications. There is an impact to the device generating the overlay which, in most installations, will be the hypervisor. This will cause performance degradation on packets per second and connections per second rates.

Overlays also come with a secondary option that may or may not be appropriate to a specific workload. While all of them will operate in full mesh by default, there might be good reasons to disable this function because it may cause excessive overhead for some workloads. Conversely, other workloads will operate without issue. For example, most web services applications will not have major issues with a full mesh overlay network, while some network monitoring tools or storage replication workloads will have performance issues with throughput or excessive broadcast traffic.
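The point about small packets and overlay bandwidth can be checked with simple arithmetic. The sketch below compares the bytes added by encapsulation with the size of the tenant packet being carried. It assumes GRE over IPv4 carrying Ethernet frames, with a per-packet cost of an outer IPv4 header (20 bytes), a GRE header with key (8 bytes), and an inner Ethernet header (14 bytes). The function name and the sample packet sizes are illustrative assumptions, and tunnel setup or control traffic is not counted.

```python
# Assumed per-packet encapsulation cost for GRE over IPv4 carrying
# Ethernet frames: outer IPv4 (20) + GRE with key (8) + inner Ethernet (14).
GRE_OVERHEAD_BYTES = 20 + 8 + 14


def overlay_overhead_ratio(inner_packet_bytes, overhead_bytes=GRE_OVERHEAD_BYTES):
    """Return encapsulation bytes added per byte of tenant packet."""
    if inner_packet_bytes <= 0:
        raise ValueError("inner_packet_bytes must be positive")
    return overhead_bytes / inner_packet_bytes


if __name__ == "__main__":
    # Illustrative tenant IP packet sizes: a bare TCP ACK, a small query,
    # and a near-full-size data segment.
    for size in (40, 80, 1400):
        ratio = overlay_overhead_ratio(size)
        print(f"{size:>5} B packet: +{ratio:.0%} encapsulation overhead")
```

Under these assumptions the encapsulation is comparable in size to the smallest packets and only a few percent of large ones, which is why the workload's packet size distribution matters when deciding whether to use an overlay.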
Many people overlook an important design decision: the choice of layer-3 protocols. While OpenStack was initially built with only IPv4 support, Networking now supports IPv6 and dual-stacked networks. Note that, as of the Icehouse release, this only includes stateless address autoconfiguration but work is in progress to support stateless and stateful DHCPv6 as well as IPv6 floating IPs without NAT. Because these options are available, some workloads become possible through the use of IPv6 and IPv6 to IPv4 reverse transition mechanisms such as NAT64 and DNS64 or 6to4. This will alter the requirements for any address plan as single-stacked and transitional IPv6 deployments can alleviate the need for IPv4 addresses.

As of the Icehouse release, OpenStack has limited support for dynamic routing; however, there are a number of options available by incorporating third party solutions to implement routing within the cloud including network equipment, hardware nodes, and instances. Some workloads will perform well with nothing more than static routes and default gateways configured at the layer-3 termination point. In most cases this will suffice; however, some cases require the addition of at least one type of dynamic routing protocol if not multiple protocols. Having a form of interior gateway protocol (IGP) available to the instances inside an OpenStack installation opens up the possibility of use cases for anycast route injection for services that need to use it as a geographic location or failover mechanism. Other applications may wish to directly participate in a routing protocol, either as a passive observer as in the case of a looking glass, or as an active participant in the form of a route reflector.
Since an instance might have a large amount of compute and memory resources, it is trivial to hold an entire unpartitioned routing table and use it to provide services such as network path visibility to other applications or as a monitoring tool.

Path maximum transmission unit (MTU) failures are lesser known but harder to diagnose. The MTU must be large enough to handle normal traffic, overhead from an overlay network, and the desired layer-3 protocol. When you add externally built tunnels, the MTU packet size is reduced. In this case, you must pay attention to the fully calculated MTU size because some systems are configured to ignore or drop path MTU discovery packets.

Tunable networking components

Configurable networking components to consider when designing an OpenStack architecture for network intensive workloads include MTU and QoS. Some workloads will require a larger MTU than normal based on a requirement to transfer large blocks of data. When providing network service for applications such as video streaming or storage replication, it is recommended to ensure that both OpenStack hardware nodes and the supporting network equipment are configured for jumbo frames where possible. This will allow for a better utilization of available bandwidth. Configuration of jumbo frames should be done across the complete path the packets will traverse. If one network component is not capable of handling jumbo frames then the entire path will revert to the default MTU.

Quality of Service (QoS) also has a great impact on network intensive workloads by providing expedited service to packets that have a higher priority because they are more sensitive to poor network performance. In applications such as Voice over IP (VoIP), differentiated services code points are a near requirement for proper operation.
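To make the reference to differentiated services code points concrete, the sketch below shows one way an application might mark its own traffic. It assumes a Linux-style sockets API in which the IP_TOS option is available. The code point values are the standard ones (Expedited Forwarding is 46, Class Selector 1 is 8, default is 0), while the dictionary and function names are hypothetical. Whether the marking is honored depends entirely on the QoS policy configured on the network devices, which is outside this example.

```python
import socket

# Standard DSCP code points; the dictionary itself is illustrative.
DSCP = {
    "expedited_forwarding": 46,  # commonly used for VoIP media
    "class_selector_1": 8,       # often used for low-priority or scavenger traffic
    "best_effort": 0,
}


def dscp_to_tos(dscp):
    """Shift a 6-bit DSCP value into the upper bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def open_marked_udp_socket(dscp):
    """Create a UDP socket whose outgoing IPv4 packets carry the DSCP mark."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    voice = open_marked_udp_socket(DSCP["expedited_forwarding"])
    print("TOS byte:", hex(dscp_to_tos(DSCP["expedited_forwarding"])))
    voice.close()
```

Marking at the application is only one option. The same classification can be applied by the hypervisor or by upstream devices, which is often preferable when instances are not trusted to set their own priority.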
QoS can also be used in the opposite direction for mixed workloads to prevent low priority but high bandwidth applications, for example backup services, video conferencing or file sharing, from blocking bandwidth that is needed for the proper operation of other workloads. It is possible to tag file storage traffic as a lower class, such as best effort or scavenger, to allow the higher priority traffic through. In cases where regions within a cloud might be geographically distributed it may also be necessary to plan accordingly to implement WAN optimization to combat latency or packet loss.

Prescriptive examples

A large-scale web application has been designed with cloud principles in mind. The application is designed to scale horizontally in a bursting fashion and will generate a high instance count. The application requires an SSL connection to secure data and must not lose connection state to individual servers.

An example design for this workload is depicted in the figure below. In this example, a hardware load balancer is configured to provide SSL offload functionality and to connect to tenant networks in order to reduce address consumption. This load balancer is linked to the routing architecture as it will service the VIP for the application. The router and load balancer are configured with the GRE tunnel ID of the application's tenant network and provided an IP address within the tenant subnet but outside of the address pool. This is to ensure that the load balancer can communicate with the application's HTTP servers without requiring the consumption of a public IP address.

Because sessions persist until they are closed, the routing and switching architecture is designed for high availability. Switches are meshed to each hypervisor and each other, and also provide an MLAG implementation to ensure that layer-2 connectivity does not fail.
Routers are configured with VRRP and fully meshed with switches to ensure layer-3 connectivity. Since GRE is used as an overlay network, Networking is installed and configured to use the Open vSwitch agent in GRE tunnel mode. This ensures all devices can reach all other devices and that tenant networks can be created for private addressing links to the load balancer.

A web service architecture has many options and optional components. Due to this, it can fit into a large number of other OpenStack designs; however, a few key components will need to be in place to handle the nature of most web-scale workloads. The user needs the following components:

• OpenStack Controller services (Image, Identity, Networking and supporting services such as MariaDB and RabbitMQ)
• OpenStack Compute running KVM hypervisor
• OpenStack Object Storage
• Orchestration module
• Telemetry module

Beyond the normal Identity, Compute, Image Service and Object Storage components, the Orchestration module is a recommended component to handle properly scaling the workloads to adjust to demand. Due to the requirement for auto-scaling, the design includes the Telemetry module. Web services tend to be bursty in load, have very defined peak and valley usage patterns and, as a result, benefit from automatic scaling of instances based upon traffic. At a network level, a split network configuration will work well with databases residing on private tenant networks since these do not emit a large quantity of broadcast traffic and may need to interconnect to some databases for content.

Load balancing

Load balancing was included in this design to spread requests across multiple instances. This workload scales well horizontally across large numbers of instances. This allows instances to run without publicly routed IP addresses and simply rely on the load balancer for the service to be globally reachable.
Many of these services do not require direct server return. This aids in address planning and utilization at scale since only the virtual IP (VIP) must be public.

Overlay networks

The overlay functionality design includes OpenStack Networking in Open vSwitch GRE tunnel mode. In this case, the layer-3 external routers are paired with VRRP and switches should be paired with an implementation of MLAG running to ensure that you do not lose connectivity with the upstream routing infrastructure.

Performance tuning

Network level tuning for this workload is minimal. Quality-of-Service (QoS) will be applied to these workloads for a middle ground Class Selector depending on existing policies. It will be higher than a best effort queue but lower than an Expedited Forwarding or Assured Forwarding queue. Since this type of application generates larger packets with longer-lived connections, bandwidth utilization can be optimized for long duration TCP. Normal bandwidth planning applies here with regards to benchmarking a session's usage multiplied by the expected number of concurrent sessions with overhead.

Network functions

Network functions is a broad category but encompasses workloads that support the rest of a system's network. These workloads tend to consist of large amounts of small packets that are very short lived, such as DNS queries or SNMP traps. These messages need to arrive quickly and do not deal with packet loss as there can be a very large volume of them. There are a few extra considerations to take into account for this type of workload and this can change a configuration all the way to the hypervisor level. For an application that generates 10 TCP sessions per user with an average bandwidth of 512 kilobytes per second per flow and expected user count of ten thousand concurrent users, the expected bandwidth plan is roughly 400 gigabits per second (10 flows × 512 kilobytes per second × 10,000 users is about 50 gigabytes per second).
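The bandwidth planning arithmetic can be sketched as a small function: concurrent users multiplied by flows per user multiplied by the per-flow rate, converted from bytes to bits, with an optional allowance on top. The function name and the headroom parameter are assumptions made for this illustration. The result is very sensitive to units: reading the per-flow rate as kilobits rather than kilobytes changes it by a factor of eight, and counting one flow per user rather than ten changes it by a further factor of ten.

```python
def aggregate_bandwidth_bps(users, flows_per_user, bytes_per_second_per_flow,
                            headroom=0.0):
    """Return planned aggregate bandwidth in bits per second.

    headroom is a hypothetical fractional allowance for protocol overhead
    and bursts (0.2 means 20% extra).
    """
    if min(users, flows_per_user, bytes_per_second_per_flow) < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    total_bytes = users * flows_per_user * bytes_per_second_per_flow
    return total_bytes * 8 * (1.0 + headroom)


if __name__ == "__main__":
    KIB = 1024
    GIB = 1024 ** 3
    # Inputs taken from the example in the text, read as kilobytes per flow.
    bps = aggregate_bandwidth_bps(
        users=10_000,
        flows_per_user=10,
        bytes_per_second_per_flow=512 * KIB,
    )
    print(f"{bps / GIB:.1f} Gibit/s before headroom")
```

Making the units explicit in the parameter names is a deliberate choice here, because the largest planning errors tend to come from mixing bits with bytes or per-user rates with per-flow rates.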
The supporting network for this type of configuration needs to have a low latency and evenly distributed availability. This workload benefits from having services local to the consumers of the service. A multi-site approach is used as well as deploying many copies of the application to handle load as close as possible to consumers. Since these applications function independently, they do not warrant running overlays to interconnect tenant networks. Overlays also have the drawback of performing poorly with rapid flow setup and may incur too much overhead with large quantities of small packets and are therefore not recommended.

QoS is desired for some workloads to ensure delivery. DNS has a major impact on the load times of other services and needs to be reliable and provide rapid responses. Configure rules in upstream devices to apply a higher Class Selector to DNS to ensure faster delivery or a better spot in queuing algorithms.

Cloud storage

Another common use case for OpenStack environments is to provide a cloud-based file storage and sharing service. You might consider this a storage-focused use case, but its network-side requirements make it a network-focused use case.

For example, consider a cloud backup application. This workload has two specific behaviors that impact the network. Because this workload is an externally-facing service and an internally-replicating application, it has both north-south and east-west traffic considerations, as follows:

north-south traffic
    When a user uploads and stores content, that content moves into the OpenStack installation. When users download this content, the content moves from the OpenStack installation. Because this service is intended primarily as a backup, most of the traffic moves southbound into the environment. In this situation, it benefits you to configure a network to be asymmetrically downstream because the traffic that enters the OpenStack installation is greater than the traffic that leaves the installation.

east-west traffic
    Likely to be fully symmetric. Because replication originates from any node and might target multiple other nodes algorithmically, it is less likely for this traffic to have a larger volume in any specific direction. However this traffic might interfere with north-south traffic.

This application prioritizes the north-south traffic over east-west traffic: the north-south traffic involves customer-facing data.

The network design in this case is less dependent on availability and more dependent on being able to handle high bandwidth. As a direct result, it is beneficial to forgo redundant links in favor of bonding those connections. This increases available bandwidth. It is also beneficial to configure all devices in the path, including OpenStack, to generate and pass jumbo frames.

6. Multi-site

Table of Contents
User requirements
Technical considerations
Operational considerations
Architecture
Prescriptive examples

A multi-site OpenStack environment is one in which services, located in more than one data center, are used to provide the overall solution. Usage requirements of different multi-site clouds may vary widely, but they share some common needs. OpenStack is capable of running in a multi-region configuration.
This enables some parts of OpenStack to effectively manage a group of sites as a single cloud. With careful planning in the design phase, OpenStack can act as an excellent multi-site cloud solution for a multitude of needs.

Some use cases that might indicate a need for a multi-site deployment of OpenStack include:

• An organization with a diverse geographic footprint.
• Geo-location sensitive data.
• Data locality, in which specific data or functionality should be close to users.

User requirements

A multi-site architecture is complex and has its own risks and considerations, therefore it is important to make sure when contemplating the design of such an architecture that it meets the user and business requirements.

Many jurisdictions have legislative and regulatory requirements governing the storage and management of data in cloud environments. Common areas of regulation include:

• Data retention policies ensuring storage of persistent data and records management to meet data archival requirements.
• Data ownership policies governing the possession and responsibility for data.
• Data sovereignty policies governing the storage of data in foreign countries or otherwise separate jurisdictions.
• Data compliance policies governing types of information that need to reside in certain locations due to regulatory issues and, more importantly, cannot reside in other locations for the same reason.

Examples of such legal frameworks include the data protection framework of the European Union (http://ec.europa.eu/justice/data-protection) and the requirements of the Financial Industry Regulatory Authority (http://www.finra.org/Industry/Regulation/FINRARules) in the United States. Consult a local regulatory body for more information.

Workload characteristics

The expected workload is a critical requirement that needs to be captured to guide decision-making.
An understanding of the workloads in the context of the desired multi-site environment and use case is important. Another way of thinking about a workload is to think of it as the way the systems are used. A workload could be a single application or a suite of applications that work together. It could also be a duplicate set of applications that need to run in multiple cloud environments. Often in a multi-site deployment the same workload will need to work identically in more than one physical location.

This multi-site scenario likely includes one or more of the other scenarios in this book with the additional requirement of having the workloads in two or more locations. The following are some possible scenarios:

For many use cases the proximity of the user to their workloads has a direct influence on the performance of the application and therefore should be taken into consideration in the design. Certain applications require zero to minimal latency that can only be achieved by deploying the cloud in multiple locations. These locations could be in different data centers, cities, countries or geographical regions, depending on the user requirement and location of the users.

Consistency of images and templates across different sites

It is essential that the deployment of instances is consistent across the different sites. This needs to be built into the infrastructure. If the OpenStack Object Storage is used as a back end for the Image Service, it is possible to create repositories of consistent images across multiple sites. Having central endpoints with multiple storage nodes will allow for consistent centralized storage for each and every site.

Not using a centralized object store will increase the operational overhead of maintaining a consistent image library.
This could include development of a replication mechanism to handle the transport of images and the changes to the images across multiple sites.

High availability

If high availability is a requirement to provide continuous infrastructure operations, define a baseline requirement for high availability.

The OpenStack management components need to have a basic and minimal level of redundancy. The simplest example is that the loss of any single site has no significant impact on the availability of the OpenStack services of the entire infrastructure.

The OpenStack High Availability Guide contains more information on how to provide redundancy for the OpenStack components.

Multiple network links should be deployed between sites to provide redundancy for all components. This includes storage replication, which should be isolated to a dedicated network or VLAN with the ability to assign QoS to control the replication traffic or provide priority for this traffic. Note that if the data store is highly changeable, the network requirements could have a significant effect on the operational cost of maintaining the sites.

The ability to maintain object availability in both sites has significant implications for the object storage design and implementation. It will also have a significant impact on the WAN network design between the sites.

Connecting more than two sites increases the challenges and adds more complexity to the design considerations. Multi-site implementations require extra planning to address the additional topology complexity used for internal and external connectivity. Some options include full mesh topology, hub and spoke, spine-leaf, or 3D torus.

Not all the applications running in a cloud are cloud-aware. If that is the case, there should be clear measures and expectations to define what the infrastructure can support and, more importantly, what it cannot.
An example would be shared storage between sites. It is possible; however, such a solution is not native to OpenStack and requires a third-party hardware vendor to fulfill such a requirement. Another example can be seen in applications that are able to consume resources in object storage directly. These applications need to be cloud-aware to make good use of an OpenStack Object Store.

Application readiness

Some applications are tolerant of the lack of synchronized object storage, while others may need those objects to be replicated and available across regions. Understanding how the cloud implementation impacts new and existing applications is important for risk mitigation and the overall success of a cloud project. Applications may have to be written to expect an infrastructure with little to no redundancy. Existing applications not developed with the cloud in mind may need to be rewritten.

Cost

The requirement of having more than one site has a cost attached to it. The greater the number of sites, the greater the cost and complexity. Costs can be broken down into the following categories:

• Compute resources
• Networking resources
• Replication
• Storage
• Management
• Operational costs

Site loss and recovery

Outages can cause loss of partial or full functionality of a site. Strategies should be implemented to understand and plan for recovery scenarios.

• The deployed applications need to continue to function and, more importantly, consideration should be taken of the impact on the performance and reliability of the application when a site is unavailable.
• It is important to understand what will happen to the replication of objects and data between the sites when a site goes down. If this causes queues to start building up, consider how long these queues can safely grow before they cause failures.
• Determine the method for resuming proper operations of a site when it comes back online after a disaster. We recommend you architect the recovery to avoid race conditions.

Compliance and geo-location

An organization could have certain legal obligations and regulatory compliance measures which could require certain workloads or data to not be located in certain regions.

Auditing

A well thought-out auditing strategy is important in order to be able to quickly track down issues. Keeping track of changes made to security groups and tenant changes can be useful in rolling back the changes if they affect production. For example, if all security group rules for a tenant disappeared, the ability to quickly track down the issue would be important for operational and legal reasons.

Separation of duties

A common requirement is to define different roles for the different cloud administration functions. An example would be a requirement to segregate the duties and permissions by site.

Authentication between sites

Ideally it is best to have a single authentication domain and not need a separate implementation for each and every site. This will, of course, require an authentication mechanism that is highly available and distributed to ensure continuous operation. Authentication server locality might also be needed and should be planned for.

Technical considerations

There are many technical considerations to take into account with regard to designing a multi-site OpenStack implementation. An OpenStack cloud can be designed in a variety of ways to handle individual application needs. A multi-site deployment will have additional challenges compared to single site installations and will therefore be a more complex solution.
When determining capacity options be sure to take into account not just the technical issues, but also the economic or operational issues that might arise from specific decisions.

Inter-site link capacity describes the capabilities of the connectivity between the different OpenStack sites. This includes parameters such as bandwidth, latency, whether or not a link is dedicated, and any business policies applied to the connection. The capability and number of the links between sites will determine what kind of options may be available for deployment. For example, if two sites have a pair of high-bandwidth links available between them, it may be wise to configure a separate storage replication network between the two sites to support a single Swift endpoint and a shared object storage capability between them. (An example of this technique, as well as a configuration walkthrough, is available at http://docs.openstack.org/developer/swift/replication_network.html#dedicated-replication-network). Another option in this scenario is to build a dedicated set of tenant private networks across the secondary link using overlay networks with a third party mapping the site overlays to each other.

The capacity requirements of the links between sites will be driven by application behavior. If the latency of the links is too high, certain applications that use a large number of small packets, for example RPC calls, may encounter issues communicating with each other or operating properly. Additionally, OpenStack may encounter similar types of issues. To mitigate this, tuning of the Identity service call timeouts may be necessary to prevent issues authenticating against a central Identity service.

Another capacity consideration when it comes to networking for a multi-site deployment is the available amount and performance of overlay networks for tenant networks.
If using shared tenant networks across zones, it is imperative that an external overlay manager or controller be used to map these overlays together. It is necessary to ensure that the number of possible IDs is identical between the zones. Note that, as of the Icehouse release, OpenStack Networking was not capable of managing tunnel IDs across installations. This means that if one site runs out of IDs, but another does not, that tenant's network will be unable to reach the other site.

Capacity can take other forms as well. The ability for a region to grow depends on scaling out the number of available compute nodes. This topic is covered in greater detail in the section for compute-focused deployments. However, it should be noted that cells may be necessary to grow an individual region beyond a certain point. This point depends on the size of your cluster and the ratio of virtual machines per hypervisor.

A third form of capacity comes in the multi-region-capable components of OpenStack. Centralized Object Storage is capable of serving objects through a single namespace across multiple regions. Since this works by accessing the object store via swift proxy, it is possible to overload the proxies. There are two options available to mitigate this issue. The first is to deploy a large number of swift proxies. The drawback to this is that the proxies are not load-balanced and a large file request could continually hit the same proxy. The other way to mitigate this is to front-end the proxies with a caching HTTP proxy and load balancer. Since swift objects are returned to the requester via HTTP, this load balancer would alleviate the load required on the swift proxies.

Utilization

While constructing a multi-site OpenStack environment is the goal of this guide, the real test is whether an application can utilize it.

Identity is normally the first interface for the majority of OpenStack users.
Interacting with the Identity service is required for almost all major operations within OpenStack. Therefore, it is important to ensure that you provide users with a single URL for Identity service authentication. Equally important is proper documentation and configuration of regions within the Identity service. Each of the sites defined in your installation is considered to be a region in Identity nomenclature. This is important for users of the system when they read Identity documentation, because they must specify the region name when sending actions to an API endpoint or using the dashboard.

Load balancing is another common issue with multi-site installations. While it is still possible to run HAProxy instances with Load-Balancer-as-a-Service, these will be local to a specific region. Some applications may be able to cope with this via internal mechanisms. Others, however, may require the implementation of an external system including global services load balancers or anycast-advertised DNS.

Depending on the storage model chosen during site design, storage replication and availability will also be a concern for end-users. If an application is capable of understanding regions, then it is possible to keep the object storage system separated by region. In this case, users who want to have an object available to more than one region will need to do the cross-site replication themselves. With a centralized swift proxy, however, the user may need to benchmark the replication timing of the Object Storage back end. Benchmarking allows the operational staff to provide users with an understanding of the amount of time required for a stored or modified object to become available to the entire environment.

Performance

Determining the performance of a multi-site installation involves considerations that do not come into play in a single-site deployment.
Because they are distributed, multi-site deployments incur a few extra performance penalties in certain situations.

Since multi-site systems can be geographically separated, they may have worse than normal latency or jitter when communicating across regions. This can especially impact systems like the OpenStack Identity service when making authentication attempts from regions that do not contain the centralized Identity implementation. It can also affect certain applications which rely on remote procedure call (RPC) for normal operation. An example of this can be seen in High Performance Computing workloads.

Storage availability can also be impacted by the architecture of a multi-site deployment. A centralized Object Storage service requires more time for an object to be available to instances locally in regions where the object was not created. Some applications may need to be tuned to account for this effect. Block Storage does not currently have a method for replicating data across multiple regions, so applications that depend on available block storage will need to manually cope with this limitation by creating duplicate block storage entries in each region.

Security

Securing a multi-site OpenStack installation also brings extra challenges. Tenants may expect a tenant-created network to be secure. In a multi-site installation the use of a non-private connection between sites may be required. This may mean that traffic would be visible to third parties and, in cases where an application requires security, this issue will require mitigation. Installing a VPN or encrypted connection between sites is recommended in such instances.

Another security consideration with regard to multi-site deployments is Identity. Authentication in a multi-site deployment should be centralized.
Centralization provides a single authentication point for users across the deployment, as well as a single point of administration for traditional create, read, update and delete operations. Centralized authentication is also useful for auditing purposes because all authentication tokens originate from the same source.

Just as tenants in a single-site deployment need isolation from each other, so do tenants in multi-site installations. The extra challenges in multi-site designs revolve around ensuring that tenant networks function across regions. Unfortunately, OpenStack Networking does not presently support a mechanism to provide this functionality; therefore, an external system may be necessary to manage these mappings. Tenant networks may contain sensitive information requiring that this mapping be accurate and consistent to ensure that a tenant in one site does not connect to a different tenant in another site.

OpenStack components

Most OpenStack installations require a bare minimum set of pieces to function. These include OpenStack Identity (keystone) for authentication, OpenStack Compute (nova) for compute, OpenStack Image Service (glance) for image storage, OpenStack Networking (neutron) for networking, and potentially an object store in the form of OpenStack Object Storage (swift). Bringing multi-site into play also demands extra components in order to coordinate between regions. A centralized Identity service is necessary to provide the single authentication point. A centralized dashboard is also recommended to provide a single login point and a mapped experience to the API and CLI options available. If necessary, a centralized Object Storage service may be used and will require the installation of the swift proxy service.

It may also be helpful to install a few extra options in order to facilitate certain use cases.
For instance, installing designate may assist in automatically generating DNS domains for each region with an automatically-populated zone full of resource records for each instance. This facilitates using DNS as a mechanism for determining which region would be selected for certain applications.

Another useful tool for managing a multi-site installation is Orchestration (heat). The Orchestration module allows the use of templates to define a set of instances to be launched together or for scaling existing sets. It can also be used to set up matching or differentiated groupings based on regions. For instance, if an application requires an equally balanced number of nodes across sites, the same heat template can be used to cover each site with small alterations to only the region name.

Operational considerations

Deployment of a multi-site OpenStack cloud using regions requires that the service catalog contains per-region entries for each service deployed other than the Identity service itself. There is limited support amongst currently available off-the-shelf OpenStack deployment tools for defining multiple regions in this fashion.

Deployers must be aware of this and provide the appropriate customization of the service catalog for their site either manually or via customization of the deployment tools in use.

Note that, as of the Icehouse release, documentation for implementing this feature is in progress. See this bug for more information: https://bugs.launchpad.net/openstack-manuals/+bug/1340509.

Licensing

Multi-site OpenStack deployments present additional licensing considerations over and above regular OpenStack clouds, particularly where site licenses are in use to provide cost efficient access to software licenses.
The licensing for host operating systems, guest operating systems, OpenStack distributions (if applicable), software-defined infrastructure including network controllers and storage systems, and even individual applications needs to be evaluated in light of the multi-site nature of the cloud.

Topics to consider include:

• The specific definition of what constitutes a site in the relevant licenses, as the term does not necessarily denote a geographic or otherwise physically isolated location in the traditional sense.
• Differentiations between "hot" (active) and "cold" (inactive) sites where significant savings may be made in situations where one site is a cold standby for disaster recovery purposes only.
• Certain locations might require local vendors to provide support and services for each site, which presents challenges that vary with the licensing agreement in place.

Logging and monitoring

Logging and monitoring do not significantly differ for a multi-site OpenStack cloud. The same well known tools described in the Logging and monitoring chapter of the Operations Guide remain applicable. Logging and monitoring can be provided both on a per-site basis and in a common centralized location.

When attempting to deploy logging and monitoring facilities to a centralized location, care must be taken with regards to the load placed on the inter-site networking links.

Upgrades

In multi-site OpenStack clouds deployed using regions each site is, effectively, an independent OpenStack installation which is linked to the others by using centralized services such as Identity which are shared between sites. At a high level the recommended order of operations to upgrade an individual OpenStack environment is (see the Upgrades chapter of the Operations Guide for details):

1. Upgrade the OpenStack Identity service (keystone).
2. Upgrade the OpenStack Image Service (glance).
3. Upgrade OpenStack Compute (nova), including networking components.
4. Upgrade OpenStack Block Storage (cinder).
5. Upgrade the OpenStack dashboard (horizon).

The process for upgrading a multi-site environment is not significantly different:

1. Upgrade the shared OpenStack Identity service (keystone) deployment.
2. Upgrade the OpenStack Image Service (glance) at each site.
3. Upgrade OpenStack Compute (nova), including networking components, at each site.
4. Upgrade OpenStack Block Storage (cinder) at each site.
5. Upgrade the OpenStack dashboard (horizon), at each site or in the single central location if it is shared.

Note that, as of the OpenStack Icehouse release, compute upgrades within each site can also be performed in a rolling fashion. Compute controller services (API, Scheduler, and Conductor) can be upgraded prior to upgrading of individual compute nodes. This maximizes the ability of operations staff to keep a site operational for users of compute services while performing an upgrade.

Quota management

To prevent system capacities from being exhausted without notification, OpenStack provides operators with the ability to define quotas. Quotas are used to set operational limits and are currently enforced at the tenant (or project) level rather than at the user level.

Quotas are defined on a per-region basis. Operators may wish to define identical quotas for tenants in each region of the cloud to provide a consistent experience, or even create a process for synchronizing allocated quotas across regions. It is important to note that only the operational limits imposed by the quotas will be aligned; consumption of quotas by users will not be reflected between regions.

For example, given a cloud with two regions, if the operator grants a user a quota of 25 instances in each region then that user may launch a total of 50 instances spread across both regions.
They may not, however, launch more than 25 instances in any single region.

For more information on managing quotas refer to the Managing projects and users chapter of the OpenStack Operations Guide.

Policy management

OpenStack provides a default set of Role Based Access Control (RBAC) policies, defined in a policy.json file, for each service. Operators edit these files to customize the policies for their OpenStack installation. If the application of consistent RBAC policies across sites is considered a requirement, then it is necessary to ensure proper synchronization of the policy.json files to all installations.

This must be done using normal system administration tools such as rsync as no functionality for synchronizing policies across regions is currently provided within OpenStack.

Documentation

Users must be able to leverage cloud infrastructure and provision new resources in the environment. It is important that user documentation is accessible by users of the cloud infrastructure to ensure they are given sufficient information to help them leverage the cloud. As an example, by default OpenStack will schedule instances on a compute node automatically. However, when multiple regions are available, it is left to the end user to decide in which region to schedule the new instance. The dashboard will present the user with the first region in your configuration. The API and CLI tools will not execute commands unless a valid region is specified. It is therefore important to provide documentation to your users describing the region layout as well as calling out that quotas are region-specific. If a user reaches his or her quota in one region, OpenStack will not automatically build new instances in another. Documenting specific examples will help users understand how to operate the cloud, thereby reducing calls and tickets filed with the help desk.
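As a rough illustration of the region-specific quota behavior that such documentation needs to explain, the following Python sketch models a tenant with an instance quota in each of two regions. The class names, region names, and the launch method are hypothetical and exist only for this example. They are not part of any OpenStack API, and real quota enforcement happens inside the services of each region.

```python
# Hypothetical model of per-region instance quotas; not an OpenStack API.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RegionQuota:
    limit: int       # instances the tenant may run in this region
    used: int = 0    # instances currently running in this region


@dataclass
class Tenant:
    quotas: Dict[str, RegionQuota] = field(default_factory=dict)

    def launch(self, region: str, count: int = 1) -> bool:
        """Try to launch `count` instances in a single region.

        Only that region's quota is consulted: unused quota in another
        region is never borrowed, and nothing spills over automatically.
        """
        quota = self.quotas[region]
        if quota.used + count > quota.limit:
            return False
        quota.used += count
        return True

    def total_capacity(self) -> int:
        """Sum of the per-region limits (an upper bound across the cloud)."""
        return sum(q.limit for q in self.quotas.values())


tenant = Tenant({"RegionOne": RegionQuota(25), "RegionTwo": RegionQuota(25)})
print(tenant.total_capacity())          # 50
print(tenant.launch("RegionOne", 25))   # True
print(tenant.launch("RegionOne", 1))    # False: RegionOne is full
print(tenant.launch("RegionTwo", 25))   # True: RegionTwo has its own limit
```

The failed launch in RegionOne is the case worth documenting for users: the request is refused even though RegionTwo still has free quota, so the user has to choose the other region explicitly.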
Architecture

This graphic is a high level diagram of a multiple site OpenStack architecture. Each site is an OpenStack cloud but it may be necessary to architect the sites on different versions. For example, if the second site is intended to be a replacement for the first site, they would be different. Another common design would be a private OpenStack cloud with a replicated site that would be used for high availability or disaster recovery. The most important design decision is how to configure the storage. It can be configured as a single shared pool or separate pools, depending on the user and technical requirements.

OpenStack services architecture

The OpenStack Identity service, which is used by all other OpenStack components for authorization and the catalog of service endpoints, supports the concept of regions. A region is a logical construct that can be used to group OpenStack services that are in close proximity to one another. The concept of regions is flexible; a region may contain OpenStack service endpoints located within a distinct geographic region, or regions. It may be smaller in scope, where a region is a single rack within a data center or even a single blade chassis, with multiple regions existing in adjacent racks in the same data center.

The majority of OpenStack components are designed to run within the context of a single region. The OpenStack Compute service is designed to manage compute resources within a region, with support for subdivisions of compute resources by using availability zones and cells. The OpenStack Networking service can be used to manage network resources in the same broadcast domain or collection of switches that are linked. The OpenStack Block Storage service controls storage resources within a region with all storage resources residing on the same storage network.
Like the OpenStack Compute service, the OpenStack Block Storage service also supports the availability zone construct which can be used to subdivide storage resources.

The OpenStack dashboard, OpenStack Identity, and OpenStack Object Storage services are components that can each be deployed centrally in order to serve multiple regions.

Storage

With multiple OpenStack regions, having a single OpenStack Object Storage service endpoint that delivers shared file storage for all regions is desirable. The Object Storage service internally replicates files to multiple nodes. The advantage of this is that, if a file placed into the Object Storage service is visible to all regions, it can be used by applications or workloads in any or all of the regions. This simplifies high availability failover and disaster recovery rollback.

In order to scale the Object Storage service to meet the workload of multiple regions, multiple proxy workers are run and load-balanced, storage nodes are installed in each region, and the entire Object Storage Service can be fronted by an HTTP caching layer. This is done so client requests for objects can be served out of caches rather than directly from the storage modules themselves, reducing the actual load on the storage network. In addition to an HTTP caching layer, use a caching layer like Memcached to cache objects between the proxy and storage nodes.

If the cloud is designed without a single Object Storage Service endpoint for multiple regions, and instead a separate Object Storage Service endpoint is made available in each region, applications are required to handle synchronization (if desired) and other management operations to ensure consistency across the nodes. For some applications, having multiple Object Storage Service endpoints located in the same region as the application may be desirable due to reduced latency, reduced cross-region bandwidth, and ease of deployment.
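When each region has its own Object Storage endpoint, the synchronization work described above falls to the application. The following Python sketch shows one simple way an application might write an object to every region and read it back from the first region that has a copy. RegionStore is a hypothetical in-memory stand-in for a per-region endpoint, and put_all and get_first are names invented for this example. A real application would call an Object Storage client and would need its own retry and conflict handling.

```python
# Illustrative only: RegionStore stands in for one region's Object Storage
# endpoint. A real application would use an Object Storage client instead.
from typing import Dict, List, Optional, Tuple


class RegionStore:
    def __init__(self, region: str, reachable: bool = True) -> None:
        self.region = region
        self.reachable = reachable
        self.objects: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        if not self.reachable:
            raise ConnectionError(f"{self.region} is unreachable")
        self.objects[name] = data

    def get(self, name: str) -> Optional[bytes]:
        if not self.reachable:
            raise ConnectionError(f"{self.region} is unreachable")
        return self.objects.get(name)


def put_all(stores: List[RegionStore], name: str, data: bytes) -> List[str]:
    """Write the object to every region; return the regions that failed.

    The caller decides what to do with failures (retry later, alert, ...).
    """
    failed = []
    for store in stores:
        try:
            store.put(name, data)
        except ConnectionError:
            failed.append(store.region)
    return failed


def get_first(
    stores: List[RegionStore], name: str
) -> Tuple[Optional[str], Optional[bytes]]:
    """Read from regions in preference order; the first copy found wins."""
    for store in stores:
        try:
            data = store.get(name)
        except ConnectionError:
            continue
        if data is not None:
            return store.region, data
    return None, None


local = RegionStore("RegionOne")
remote = RegionStore("RegionTwo", reachable=False)
print(put_all([local, remote], "logo.png", b"\x89PNG"))   # ['RegionTwo']
print(get_first([remote, local], "logo.png")[0])          # RegionOne
```

The list of failed regions returned by put_all is the part a centralized endpoint would otherwise hide: the application has to remember which regions are behind and bring them up to date later.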
For the Block Storage service, the most important decisions are the selection of the storage technology and whether or not a dedicated network is used to carry storage traffic from the storage service to the compute nodes.

Networking

When connecting multiple regions together there are several design considerations. The overlay network technology choice determines how packets are transmitted between regions and how the logical network and addresses present to the application. If there are security or regulatory requirements, encryption should be implemented to secure the traffic between regions. For networking inside a region, the overlay network technology for tenant networks is equally important. The overlay technology and the network traffic that an application generates or receives can be either complementary or at cross purposes. For example, using an overlay technology for an application that transmits a large number of small packets could add excessive latency or overhead to each packet if not configured properly.

Dependencies

The architecture for a multi-site installation of OpenStack is dependent on a number of factors. One major dependency to consider is storage. When designing the storage system, the storage mechanism needs to be determined. Once the storage type is determined, how it will be accessed is critical. For example, we recommend that storage should use a dedicated network. Another concern is how the storage is configured to protect the data, for example, the recovery point objective (RPO) and the recovery time objective (RTO). How quickly recovery from a fault must be completed determines how often data replication is required. Ensure that enough storage is allocated to support the data protection strategy.
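To make the relationship between the recovery point objective and replication frequency concrete, the sketch below works through the arithmetic for a simple periodic replication model. The model is an assumption made for illustration and is not taken from this guide: changes accumulate for one interval and are then copied over the inter-site link, so the worst-case data loss is one interval plus the time needed to transfer that interval's changes. The function names, change rate, and link speed are hypothetical.

```python
# Assumed model: changes accumulate for `interval_s` seconds, then the
# accumulated delta is sent over the inter-site link in one transfer.

def worst_case_loss_s(interval_s: float, change_bps: float, link_bps: float) -> float:
    """Worst-case age, in seconds, of data missing at the remote site."""
    transfer_s = (change_bps * interval_s) / link_bps
    return interval_s + transfer_s


def max_interval_s(rpo_s: float, change_bps: float, link_bps: float) -> float:
    """Longest replication interval that still meets the RPO.

    Solves interval * (1 + change_bps / link_bps) <= rpo_s for interval.
    """
    if change_bps >= link_bps:
        # Each transfer would take at least as long as the interval itself,
        # so the backlog would never drain under this model.
        raise ValueError("link cannot keep up with the change rate")
    return rpo_s / (1 + change_bps / link_bps)


# Example: data changes at 250 Mbit/s and the link carries 1 Gbit/s.
print(worst_case_loss_s(600, 250e6, 1e9))   # 750.0 seconds
print(max_interval_s(750, 250e6, 1e9))      # 600.0 seconds
```

Under these assumptions, replicating every 10 minutes exposes up to 12.5 minutes of changes, so an RPO tighter than that calls for a shorter interval, a faster link, or both.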
Networking decisions include the encapsulation mechanism that will be used for the tenant networks, how large the broadcast domains should be, and the contracted SLAs for the interconnects.

Prescriptive examples

Based on the needs of the intended workloads, there are multiple ways to build a multi-site OpenStack installation. Below are example architectures based on different requirements. These examples are meant as a reference, and not a hard and fast rule for deployments. Use the previous sections of this chapter to assist in selecting specific components and implementations based on specific needs.

A large content provider needs to deliver content to customers that are geographically dispersed. The workload is very sensitive to latency and needs a rapid response to end-users. After reviewing the user, technical and operational considerations, it is determined beneficial to build a number of regions local to the customer's edge. In this case rather than build a few large, centralized data centers, the intent of the architecture is to provide a pair of small data centers in locations that are closer to the customer. In this use case, spreading applications out allows for different horizontal scaling than a traditional compute workload scale. The intent is to scale by creating more copies of the application in closer proximity to the users that need it most, in order to ensure faster response time to user requests. This provider will deploy two datacenters at each of the four chosen regions. The implications of this design are based around the method of placing copies of resources in each of the remote regions. Swift objects, Glance images, and block storage will need to be manually replicated into each region. This may be beneficial for some systems, such as the case of content service, where only some of the content needs to exist in some but not all regions.
A centralized Keystone is recommended to ensure that authentication and access to the API endpoints are easily manageable.

Installation of an automated DNS system such as Designate is highly recommended. Unless an external Dynamic DNS system is available, application administrators will need a way to manage the mapping of which application copy exists in each region and how to reach it. Designate will assist by making the process automatic and by populating the records in each region's zone.

Telemetry for each region is also deployed, as each region may grow differently or be used at a different rate. Ceilometer will run to collect each region's metrics from each of the controllers and report them back to a central location. This is useful both to the end user and the administrator of the OpenStack environment. The end user will find this method useful, in that it is possible to determine if certain locations are experiencing higher load than others, and take appropriate action. Administrators will also benefit by being able to forecast growth per region, rather than expanding the capacity of all regions simultaneously, thereby maximizing the cost-effectiveness of the multi-site design.

One of the key decisions of running this sort of infrastructure is whether or not to provide a redundancy model. Two types of redundancy and high availability models will be implemented in this configuration. The first type revolves around the availability of the central OpenStack components. Keystone will be made highly available in three central data centers that will host the centralized OpenStack components. This prevents the loss of any one of the regions from causing an outage in service. It also has the added benefit of being able to run a central storage repository as a primary cache for distributing content to each of the regions. The second redundancy topic is that of the edge data center itself.
A second data center in each of the edge regional locations will house a second region near the first. This ensures that the application will not suffer degraded performance in terms of latency and availability.

This figure depicts the solution designed to have both a centralized set of core data centers for OpenStack services and paired edge data centers:

Geo-redundant load balancing

A large-scale web application has been designed with cloud principles in mind. The application is designed to provide service to an application store on a 24/7 basis. The company has a typical 2-tier architecture with a web front end servicing the customer requests and a NoSQL database back end storing the information. Recently there have been several outages at a number of major public cloud providers, usually because the affected applications were running out of a single geographical location. The design therefore should mitigate the chance of a single site causing an outage for the business.

The solution would consist of the following OpenStack components:

• A firewall, switches, and load balancers on the public facing network connections.

• OpenStack Controller services running Networking, dashboard, Block Storage, and Compute locally in each of the three regions. The other services, Identity, Orchestration, Telemetry, Image Service, and Object Storage, will be installed centrally, with nodes in each of the regions providing a redundant OpenStack Controller plane throughout the globe.

• OpenStack Compute nodes running the KVM hypervisor.

• OpenStack Object Storage for serving static objects such as images will be used to ensure that all images are standardized across all the regions, and replicated on a regular basis.

• A distributed DNS service available to all regions that allows for dynamic update of DNS records of deployed instances.
• A geo-redundant load balancing service will be used to service the requests from the customers based on their origin.

An autoscaling Heat template will be used to deploy the application in the three regions. This template will include:

• Web servers, running Apache.

• Appropriate user_data to populate the central DNS servers upon instance launch.

• Appropriate Telemetry alarms that maintain state of the application and allow for handling of region or instance failure.

Another autoscaling Heat template will be used to deploy a distributed MongoDB shard over the three locations, with the option of storing required data on a globally available swift container. Depending on the usage and load on the database server, additional shards will be provisioned according to the thresholds defined in Telemetry.

Three regions were selected because of the concern about abnormal load on a single region in the event of a failure. Two data centers would have been sufficient had the requirements been met.

Orchestration is used because of the built-in functionality of autoscaling and auto healing in the event of increased load. Additional configuration management tools, such as Puppet or Chef, could also have been used in this scenario, but were not chosen because Orchestration had the appropriate built-in hooks into the OpenStack cloud, whereas the other tools were external and not native to OpenStack. In addition, since this deployment scenario was relatively straightforward, the external tools were not needed.

OpenStack Object Storage is used here to serve as a back end for the Image Service since it was the most suitable solution for globally distributed storage, with its own replication mechanism.
Home grown solutions could also have been used, including the handling of replication, but were not chosen because Object Storage is already an integral part of the infrastructure and a proven solution.

An external load balancing service was used and not the LBaaS in OpenStack because the solution in OpenStack is not redundant and does not have any awareness of geo location.

Location-local service

A common use for a multi-site deployment of OpenStack is for creating a Content Delivery Network. An application that uses a location-local architecture will require low network latency and proximity to the user in order to provide an optimal user experience. This architecture also reduces the cost of bandwidth and transit, since the content resides on sites closer to the customer instead of in a centralized content store that would require utilizing higher cost cross-country links.

This architecture usually includes a geo-location component that places user requests at the closest possible node. In this scenario, 100% redundancy of content across every site is a goal rather than a requirement, with the intent being to maximize the amount of content available that is within a minimum number of network hops for any given end user. Despite these differences, the storage replication configuration has significant overlap with that of a geo-redundant load balancing use case.

In this example, the location-aware application utilizing this multi-site OpenStack installation would launch web server or content serving instances on the compute cluster in each site. Requests from clients will first be sent to a global services load balancer that determines the location of the client, then routes the request to the closest OpenStack site where the application completes the request.

7. Hybrid

Table of Contents

User requirements
Technical considerations
Operational considerations
Architecture
Prescriptive examples

Hybrid cloud, by definition, means that the design spans more than one cloud. An example of this kind of architecture may include a situation in which the design involves more than one OpenStack cloud (for example, an OpenStack-based private cloud and an OpenStack-based public cloud), or it may be a situation incorporating an OpenStack cloud and a non-OpenStack cloud (for example, an OpenStack-based private cloud that interacts with Amazon Web Services). Bursting into an external cloud is the practice of creating new instances to alleviate extra load where there is no available capacity in the private cloud.

Some situations that could involve hybrid cloud architecture include:

• Bursting from a private cloud to a public cloud
• Disaster recovery
• Development and testing
• Federated cloud, enabling users to choose resources from multiple providers
• Hybrid clouds built to support legacy systems as they transition to cloud

As a hybrid cloud design deals with systems that are outside of the control of the cloud architect or organization, a hybrid cloud architecture requires considering aspects of the architecture that might not have otherwise been necessary. For example, the design may need to deal with hardware, software, and APIs under the control of a separate organization.
Similarly, the degree to which the architecture is OpenStack-based will have an effect on the cloud operator or cloud consumer's ability to accomplish tasks with native OpenStack tools. By definition, this is a situation in which no single cloud can provide all of the necessary functionality. In order to manage the entire system, users, operators, and consumers will need an overarching tool known as a cloud management platform (CMP). Any organization that is working with multiple clouds already has a CMP, even if that CMP is the operator who logs into an external web portal and launches a public cloud instance.

There are commercially available options, such as Rightscale, and open source options, such as ManageIQ (http://manageiq.org), but there is no single CMP that can address all needs in all scenarios. Whereas most of the sections of this book talk about the aspects of OpenStack that an architect needs to consider when designing an OpenStack architecture, this section will also discuss the things the architect must address when choosing or building a CMP to run a hybrid cloud design, even if the CMP will be a manually built solution.

User requirements

Hybrid cloud architectures introduce additional complexities, particularly those that use heterogeneous cloud platforms. As a result, it is important to make sure that design choices match requirements in such a way that the benefits outweigh the inherent additional complexity and risks.

Business considerations to make when designing a hybrid cloud deployment include:

Cost
A hybrid cloud architecture involves multiple vendors and technical architectures. These architectures may be more expensive to deploy and maintain. Operational costs can be higher because of the need for more sophisticated orchestration and brokerage tools than in other architectures.
In contrast, overall operational costs might be lower by virtue of using a cloud brokerage tool to deploy the workloads to the most cost effective platform.

Revenue opportunity
Revenue opportunities vary greatly based on the intent and use case of the cloud. If it is being built as a commercial customer-facing product, consider the drivers for building it over multiple platforms and whether the use of multiple platforms makes the design more attractive to target customers, thus enhancing the revenue opportunity.

Time to market
One of the most common reasons to use cloud platforms is to speed the time to market of a new product or application. A business requirement to use multiple cloud platforms may arise because there is an existing investment in several applications and it is faster to tie them together than to migrate components and refactor to a single platform.

Business or technical diversity
Organizations already leveraging cloud-based services may wish to embrace business diversity and utilize a hybrid cloud design to spread their workloads across multiple cloud providers so that no application is hosted in a single cloud provider.

Application momentum
A business with existing applications that are already in production on multiple cloud environments may find that it is more cost effective to integrate the applications on multiple cloud platforms rather than migrate them to a single platform.

Legal requirements

Many jurisdictions have legislative and regulatory requirements governing the storage and management of data in cloud environments. Common areas of regulation include:

• Data retention policies ensuring storage of persistent data and records management to meet data archival requirements.

• Data ownership policies governing the possession and responsibility for data.

• Data sovereignty policies governing the storage of data in foreign countries or otherwise separate jurisdictions.
• Data compliance policies governing where certain types of information need to reside due to regulatory issues and, more importantly, where they cannot reside for the same reason.

Examples of such legal frameworks include the data protection framework of the European Union (http://ec.europa.eu/justice/data-protection/) and the requirements of the Financial Industry Regulatory Authority (http://www.finra.org/Industry/Regulation/FINRARules/) in the United States. Consult a local regulatory body for more information.

Workload considerations

Defining what the word "workload" means in the context of a hybrid cloud environment is important. Workload can be defined as the intended way the systems will be utilized, which is often referred to as a "use case". A workload can be a single application or a suite of applications that work in concert. It can also be a duplicate set of applications that need to run on multiple cloud environments. In a hybrid cloud deployment, the same workload will often need to function equally well on radically different public and private cloud environments. The architecture needs to address these potential conflicts, complexity, and platform incompatibilities.

Some possible use cases for a hybrid cloud architecture include:

• Dynamic resource expansion or "bursting": Another common reason to use a multiple cloud architecture is a "bursty" application that needs additional resources at times. An example of this case could be a retailer that needs additional resources during the holiday selling season, but does not want to build expensive cloud resources to meet the peak demand. They might have an OpenStack private cloud but want to burst to AWS or some other public cloud for these peak load periods. These bursts could be for long or short cycles, recurring anywhere from hourly to monthly or yearly.
• Disaster recovery-business continuity: The cheaper storage and instance management makes a good case for using the cloud as a secondary site. The public cloud is already heavily used for these purposes in combination with an OpenStack public or private cloud.

• Federated hypervisor-instance management: Adding self-service, charge back and transparent delivery of the right resources from a federated pool can be cost effective. In a hybrid cloud environment, this is a particularly important consideration. Look for a cloud that provides cross-platform hypervisor support and robust instance management tools.

• Application portfolio integration: An enterprise cloud delivers better application portfolio management and more efficient deployment by leveraging self-service features and rules for deployments based on types of use. A common driver for building hybrid cloud architecture is to stitch together multiple existing cloud environments that are already in production or development.

• Migration scenarios: A common reason to create a hybrid cloud architecture is to allow the migration of applications between different clouds. This may be because the application will be migrated permanently to a new platform, or it might be because the application needs to be supported on multiple platforms going forward.

• High availability: Another important reason for wanting a multiple cloud architecture is to address the needs for high availability. By using a combination of multiple locations and platforms, a design can achieve a level of availability that is not possible with a single platform. This approach does add a significant amount of complexity.

In addition to thinking about how the workload will work on a single cloud, the design must accommodate the added complexity of needing the workload to run on multiple cloud platforms.
The complexity of transferring workloads across clouds needs to be explored at the application, instance, cloud platform, hypervisor, and network levels.

Tools considerations

When working with designs spanning multiple clouds, the design must incorporate tools to facilitate working across those multiple clouds. Some of the user requirements drive the need for tools that perform the following functions:

• Broker between clouds: Since the multiple cloud architecture assumes that there will be at least two different and possibly incompatible platforms that are likely to have different costs, brokering software is designed to evaluate relative costs between different cloud platforms. These solutions are sometimes referred to as Cloud Management Platforms (CMPs). Examples include Rightscale, Gravitant, Scalr, CloudForms, and ManageIQ. These tools allow the designer to determine the right location for the workload based on predetermined criteria.

• Facilitate orchestration across the clouds: CMPs are tools used to tie everything together. Cloud orchestration tools are used to improve the management of IT application portfolios as they migrate onto public, private, and hybrid cloud platforms. These tools are an important consideration. Cloud orchestration tools are used for managing a diverse portfolio of installed systems across multiple cloud platforms. The typical enterprise IT application portfolio still comprises a few thousand applications scattered over legacy hardware, virtualized infrastructure, and now dozens of disjointed shadow public Infrastructure-as-a-Service (IaaS) and Software-as-a-Service (SaaS) providers and offerings.

Network considerations

The network services functionality is an important factor to assess when choosing a CMP and cloud provider. Considerations are functionality, security, scalability, and HA.
Verification and ongoing testing of the critical features of the cloud endpoint used by the architecture are important tasks.

• Once the network functionality framework has been decided, a minimum functionality test should be designed. This will ensure that testing and functionality persist during and after upgrades.

• Scalability across multiple cloud providers may dictate which underlying network framework you will choose in different cloud providers. It is important to have the network API functions presented and to verify that functionality persists across all cloud endpoints chosen.

• High availability implementations vary in functionality and design. Examples of some common methods are active-hot-standby, active-passive, and active-active. High availability and a test framework need to be developed to ensure that the functionality and limitations are well understood.

• Security considerations include how data is secured between client and endpoint and any traffic that traverses the multiple clouds, from eavesdropping to DoS activities.

Risk mitigation and management considerations

Hybrid cloud architectures introduce additional risk because they add complexity and potentially conflicting or incompatible components or tools. However, they also reduce risk by spreading workloads over multiple providers. This means that if one provider were to go out of business, the organization could remain operational.

Risks that will be heightened by using a hybrid cloud architecture include:

Provider availability or implementation details
This can range from the company going out of business to the company changing how it delivers its services. Cloud architectures are inherently designed to be flexible and changeable; paradoxically, the cloud is perceived to be both rock solid and ever flexible at the same time.
Differing SLAs
Users of hybrid cloud environments potentially encounter some losses through differences in service level agreements. A hybrid cloud design needs to accommodate the different SLAs provided by the various clouds involved in the design, and must address the actual enforceability of the providers' SLAs.

Security levels
Securing multiple cloud environments is more complex than securing a single cloud environment. Concerns need to be addressed at, but not limited to, the application, network, and cloud platform levels. One issue is that different cloud platforms approach security differently, and a hybrid cloud design must address and compensate for differences in security approaches. For example, AWS uses a relatively simple model that relies on user privilege combined with firewalls.

Provider API changes
APIs are crucial in a hybrid cloud environment. As a consumer of a provider's cloud services, an organization will rarely have any control over provider changes to APIs. Cloud services that might have previously had compatible APIs may no longer work. This is particularly a problem with AWS and OpenStack AWS-compatible APIs. OpenStack was originally planned to maintain compatibility with changes in AWS APIs. However, over time, the APIs have become more divergent in functionality. One way to address this issue is to focus on using only the most common and basic APIs to minimize potential conflicts.

Technical considerations

A hybrid cloud environment requires inspection and understanding of technical issues that are not only outside of an organization's data center, but potentially outside of an organization's control. In many cases, it is necessary to ensure that the architecture and CMP chosen can adapt not only to different environments, but also to the possibility of change.
In this situation, applications are crossing diverse platforms and are likely to be located in diverse locations. All of these factors will influence and add complexity to the design of a hybrid cloud architecture.

The only situation where cloud platform incompatibilities are not going to be an issue is when working with clouds that are based on the same version and the same distribution of OpenStack. Otherwise incompatibilities are virtually inevitable.

Incompatibility should be less of an issue for clouds that exclusively use the same version of OpenStack, even if they use different distributions. The newer the distribution in question, the less likely it is that there will be incompatibilities between versions. This is because the OpenStack community has established an initiative to define core functions that need to remain backward compatible between supported versions. The DefCore initiative defines basic functions that every distribution must support in order to bear the name "OpenStack".

Some vendors, however, add proprietary customizations to their distributions. If an application or architecture makes use of these features, it will be difficult to migrate to or use other types of environments.

Anyone considering incorporating versions of OpenStack prior to Havana should think carefully before attempting to integrate functionality between versions. Internal differences in older versions may be so great that the best approach might be to consider the versions to be essentially diverse platforms, as different as OpenStack and Amazon Web Services or Microsoft Azure.

The situation is more predictable if using different cloud platforms is incorporated from inception. If the other clouds are not based on OpenStack, then all pretense of compatibility vanishes, and CMP tools must account for the myriad of differences in the way operations are handled and services are implemented.
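The compatibility rules of thumb above can be written down as a small classifier, which may help when a CMP has to reason about pairs of clouds. This is a minimal sketch only: the `Cloud` type, the `compatibility_risk` function, and the risk labels ("none", "low", "medium", "high") are hypothetical names chosen for illustration, and the labels are a coarse reading of the prose rather than anything defined by OpenStack or DefCore.

```python
from dataclasses import dataclass

# OpenStack release names in release order, up to the date of this guide.
RELEASES = ["austin", "bexar", "cactus", "diablo", "essex", "folsom",
            "grizzly", "havana", "icehouse", "juno", "kilo"]


@dataclass(frozen=True)
class Cloud:
    platform: str            # for example "openstack", "aws", "azure"
    release: str = ""        # OpenStack release name, if applicable
    distribution: str = ""   # vendor distribution label (illustrative)


def compatibility_risk(a: Cloud, b: Cloud) -> str:
    """Coarse, hypothetical risk label for integrating two clouds."""
    if a.platform != "openstack" or b.platform != "openstack":
        return "high"      # non-OpenStack: the CMP must absorb all differences
    if a.release == b.release:
        # Same version: only the distribution can differ.
        return "none" if a.distribution == b.distribution else "low"
    oldest = min(RELEASES.index(a.release), RELEASES.index(b.release))
    if oldest < RELEASES.index("havana"):
        return "high"      # pre-Havana: treat as essentially diverse platforms
    return "medium"        # different versions, Havana or newer
```

For example, two Juno clouds from different vendors would be labeled "low", while pairing a Grizzly cloud with a Juno cloud would be labeled "high". Proprietary vendor extensions are not modeled here and would raise the risk in any of these cases.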
Some situations in which these incompatibilities can arise include differences between the way in which a cloud:

• Deploys instances
• Manages networks
• Treats applications
• Implements services

Capacity planning

One of the primary reasons many organizations turn to a hybrid cloud system is to increase capacity without having to make large capital investments. However, capacity planning is still necessary when designing an OpenStack installation even if it is augmented with external clouds.

Specifically, overall capacity and placement of workloads need to be accounted for when designing for a mostly internally-operated cloud with the occasional capacity burst. The long-term capacity plan for such a design needs to incorporate growth over time to prevent the need to permanently burst into, and occupy, a potentially more expensive external cloud. In order to avoid this scenario, account for the future applications and capacity requirements and plan growth appropriately.

One of the drawbacks of capacity planning is unpredictability. It is difficult to predict the amount of load a particular application might incur if the number of users fluctuates or the application experiences an unexpected increase in popularity. It is possible to define application requirements in terms of vCPU, RAM, bandwidth, or other resources and plan appropriately, but other clouds may not use the same metric or even the same oversubscription rates.

Oversubscription is a method to emulate more capacity than may physically be present. For example, a physical hypervisor node with 32 GB RAM may host 24 instances, each provisioned with 2 GB RAM. As long as all 24 of them are not concurrently utilizing 2 full gigabytes, this arrangement is a non-issue. However, some hosts take oversubscription to extremes and, as a result, performance can frequently be inconsistent. If at all possible, determine what the oversubscription rates of each host are and plan capacity accordingly.
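The arithmetic behind the RAM example above can be made explicit with a short sketch. The function names are hypothetical, and the calculation ignores memory reserved for the hypervisor and host operating system, which is a simplifying assumption.

```python
def oversubscription_ratio(provisioned_gb, physical_gb):
    """Ratio of RAM promised to instances versus RAM physically present."""
    return provisioned_gb / physical_gb


def max_fully_loaded_instances(physical_gb, per_instance_gb):
    """How many instances can use their full allocation at the same time."""
    return physical_gb // per_instance_gb


# Values from the example: a 32 GB host with 24 instances of 2 GB each.
instances, per_instance_gb, physical_gb = 24, 2, 32
ratio = oversubscription_ratio(instances * per_instance_gb, physical_gb)
headroom = max_fully_loaded_instances(physical_gb, per_instance_gb)
print(f"ratio {ratio}:1, at most {headroom} instances at full allocation")
```

In this case 48 GB is provisioned against 32 GB of physical RAM, a ratio of 1.5:1, and at most 16 of the 24 instances can consume their full 2 GB concurrently. Comparing this ratio across providers is one way to judge whether an instance size in one cloud is equivalent to the nominally identical size in another.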
Security

The nature of a hybrid cloud environment removes complete control over the infrastructure. Security becomes a stronger requirement because data or applications may exist in a cloud that is outside of an organization's control. Security domains become an important distinction when planning for a hybrid cloud environment and its capabilities. A security domain comprises users, applications, servers or networks that share common trust requirements and expectations within a system.

The security domains are:

1. Public
2. Guest
3. Management
4. Data

These security domains can be mapped individually to the organization's installation or combined. For example, some deployment topologies combine both guest and data domains onto one physical network, whereas other topologies may physically separate these networks. In each case, the cloud operator should be aware of the appropriate security concerns. Security domains should be mapped out against the specific OpenStack deployment topology. The domains and their trust requirements depend upon whether the cloud instance is public, private, or hybrid.

The public security domain is an entirely untrusted area of the cloud infrastructure. It can refer to the Internet as a whole or simply to networks over which an organization has no authority. This domain should always be considered untrusted. When considering hybrid cloud deployments, any traffic traversing beyond and between the multiple clouds should always be considered to reside in this security domain and is therefore untrusted.

Typically used for instance-to-instance traffic within a single data center, the guest security domain handles compute data generated by instances on the cloud but not services that support the operation of the cloud such as API calls.
Public cloud providers used in a hybrid cloud configuration that an organization does not control, and private cloud providers who do not have stringent controls on instance use or who allow unrestricted Internet access to instances, should consider this domain to be untrusted. Private cloud providers may consider this network as internal and therefore trusted only if there are controls in place to assert that instances and tenants are trusted.

The management security domain is where services interact. Sometimes referred to as the "control plane", the networks in this domain transport confidential data such as configuration parameters, user names, and passwords. In deployments behind an organization's firewall, this domain is considered trusted. In a public cloud model which could be part of an architecture, this would have to be assessed with the public cloud provider to understand the controls in place.

The data security domain is concerned primarily with information pertaining to the storage services within OpenStack. Much of the data that crosses this network has high integrity and confidentiality requirements and, depending on the type of deployment, there may also be strong availability requirements. The trust level of this network is heavily dependent on deployment decisions and as such this is not assigned a default level of trust.

Care must be taken when managing the users of the system, whether operating or utilizing public or private clouds. The Identity service allows for LDAP to be part of the authentication process. Including such systems in your OpenStack deployments may ease user management if integrating into existing systems. When utilizing third-party clouds, explore the authentication options applicable to the installation to help manage and keep user authentication consistent.
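The default trust levels described for the four security domains can be summarized as a lookup, which can be handy when mapping domains against a specific deployment topology. The sketch below is illustrative only: the `Trust` enum, the `default_trust` function, and the two boolean flags are hypothetical stand-ins for the deployment facts an operator would actually have to verify.

```python
from enum import Enum


class Trust(Enum):
    UNTRUSTED = "untrusted"
    TRUSTED = "trusted"
    ASSESS = "assess per deployment"


def default_trust(domain, *, behind_own_firewall=False, tenant_controls=False):
    """Starting trust level for a security domain, following the prose above.

    behind_own_firewall: the management networks sit behind the
        organization's own firewall (hypothetical flag).
    tenant_controls: controls exist that assert instances and tenants
        are trusted (hypothetical flag).
    """
    if domain == "public":
        return Trust.UNTRUSTED   # always untrusted, including inter-cloud traffic
    if domain == "guest":
        return Trust.TRUSTED if tenant_controls else Trust.UNTRUSTED
    if domain == "management":
        return Trust.TRUSTED if behind_own_firewall else Trust.ASSESS
    if domain == "data":
        return Trust.ASSESS      # no default level of trust is assigned
    raise ValueError(f"unknown security domain: {domain!r}")
```

In a hybrid design the flags will usually differ per cloud, so the same domain may start as trusted in the private cloud and as untrusted or unassessed in the public one.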
Due to the process of passing user names, passwords, and generated tokens between client machines and API endpoints, placing API services behind hardware that performs SSL termination is strongly recommended.

Within cloud components themselves, another component that needs security scrutiny is the hypervisor. In a public cloud, organizations typically do not have control over the choice of hypervisor. (Amazon uses its own particular version of Xen, for example.) In some cases, hypervisors may be vulnerable to a type of attack called "hypervisor breakout" if they are not properly secured. Hypervisor breakout describes the event of a compromised or malicious instance breaking out of the resource controls of the hypervisor and gaining access to the bare metal operating system and hardware resources.

If the security of instances is not considered important, there may not be an issue. In most cases, however, enterprises need to avoid this kind of vulnerability, and the only way to do that is to avoid a situation in which the instances are running on a public cloud. That does not mean that there is a need to own all of the infrastructure on which an OpenStack installation operates; it suggests avoiding situations in which hardware may be shared with others.

There are other services worth considering that provide a bare metal instance instead of a cloud. In other cases, it is possible to replicate a second private cloud by integrating with a private Cloud-as-a-Service deployment, in which an organization does not buy hardware, but also does not share it with other tenants. It is also possible to use a provider that hosts a bare-metal "public" cloud instance for which the hardware is dedicated only to one customer, or a provider that offers private Cloud-as-a-Service.

Finally, it is important to realize that each cloud implements services differently.
What keeps data secure in one cloud may not do the same in anoth- er. Be sure to know the security requirements of every cloud that handles the organization's data or workloads. More information on OpenStack Security can be found in the OpenStack Security Guide. Utilization When it comes to utilization, it is important that the CMP understands what workloads are running, where they are running, and their preferred utilizations. For example, in most cases it is desirable to run as many work- loads internally as possible, utilizing other resources only when necessary. On the other hand, situations exist in which the opposite is true. The inter- nal cloud may only be for development and stressing it is undesirable. In most cases, a cost model of various scenarios helps with this decision, how- ever this analysis is heavily influenced by internal priorities. The important thing is the ability to efficiently make those decisions on a programmatic basis. The Telemetry module (ceilometer) is designed to provide information on the usage of various OpenStack components. There are two limitations to consider: first, if there is to be a large amount of data (for example, if monitoring a large cloud, or a very active one) it is desirable to use a NoSQL back end for Ceilometer, such as MongoDB. Second, when connect- ing to a non-OpenStack cloud, there will need to be a way to monitor that usage and to provide that monitoring data back to the CMP. Performance Performance is of primary importance in the design of a cloud. When it comes to a hybrid cloud deployment, many of the same issues for multi-site deployments apply, such as network latency between sites. It is also impor- tant to think about the speed at which a workload can be spun up in an- other cloud, and what can be done to reduce the time necessary to accom- Architecture Guide March 17, 2015 current 167 plish that task. 
That may mean moving data closer to applications or, conversely, applications closer to the data they process. It may mean grouping functionality so that connections that require low latency take place within a single cloud rather than spanning clouds. It may also mean ensuring that the CMP has the intelligence to know which cloud can most efficiently run which types of workloads.

As with utilization, native OpenStack tools are available to assist. Telemetry can measure performance and, if necessary, the Orchestration module can be used to react to changes in demand by spinning up more resources. It is important to note, however, that Orchestration requires special configurations in the client to enable functioning with solution offerings from Amazon Web Services. When dealing with other types of clouds, it is necessary to rely on the features of the CMP.

Components

The number and types of native OpenStack components that are available for use depend on whether the deployment is exclusively an OpenStack cloud or not. If so, all of the OpenStack components will be available for use, and in many ways the issues that need to be considered will be similar to those for a multi-site deployment.

That said, in any situation in which more than one cloud is being used, at least four OpenStack tools will be considered:

• OpenStack Compute (nova): Regardless of deployment location, hypervisor choice has a direct effect on how difficult it is to integrate with one or more additional clouds. For example, integrating a Hyper-V based OpenStack cloud with Azure will have fewer compatibility issues than if KVM is used.

• Networking: Whether OpenStack Networking (neutron) or legacy networking (nova-network) is used, the network is one place where integration capabilities need to be understood in order to connect between clouds.
• Telemetry module (ceilometer): Use of Telemetry depends, in large part, on what the other parts of the cloud are using.

• Orchestration module (heat): Similarly, Orchestration can be a valuable tool in orchestrating tasks a CMP decides are necessary in an OpenStack-based cloud.

Special considerations

Hybrid cloud deployments also involve two more issues that are not common in other situations:

Image portability: Note that, as of the Icehouse release, there is no single common image format that is usable by all clouds. This means that images will need to be converted or recreated when porting between clouds. To make things simpler, launch the smallest and simplest images feasible, installing only what is necessary, preferably using a deployment manager such as Chef or Puppet. This means forgoing golden images as a way of speeding up the process. However, if the same images are being deployed repeatedly, it may make more sense to use golden images instead of provisioning applications on lighter images each time.

API differences: The most profound issue that cannot be avoided when using a hybrid cloud deployment with more than just OpenStack (or with different versions of OpenStack) is that the APIs needed to perform certain functions are different. The CMP needs to know how to handle all necessary versions. To get around this issue, some implementers build portals to achieve a hybrid cloud environment, but a heavily developer-focused organization will get more use out of a hybrid cloud broker SDK such as jClouds.

Operational considerations

Hybrid cloud deployments present complex operational challenges. There are several factors to consider that affect the way each cloud is deployed and how users and operators will interact with each cloud. Each cloud provider implements infrastructure components differently.
This can lead to incompatible interactions with workloads, or with a specific Cloud Management Platform (CMP). Different cloud providers may also offer different levels of integration with competing cloud offerings.

Monitoring is an important aspect to consider when selecting a CMP. Gaining valuable insight into each cloud is critical to gaining a holistic view of all involved clouds. It is vital to determine whether an existing CMP supports monitoring of all the clouds involved, or if compatible APIs are available to be queried for necessary information. Once all the information about each cloud has been gathered, appropriate actions can be taken on the offline data to avoid impacting workloads.

Agility

The implementation of a hybrid cloud solution provides application availability across different cloud environments and technologies. This availability enables the deployment to survive disaster in any single cloud environment. Each cloud should provide the means to quickly spin up new instances in the case of capacity issues or complete unavailability of a single cloud installation.

Application readiness

It is important to understand the type of application workload that is to be deployed across a hybrid cloud environment. Enterprise workloads that depend on the underlying infrastructure for availability are not designed to run on OpenStack. Although these types of applications can run on an OpenStack cloud, if the application is not able to tolerate infrastructure failures, it is likely to require significant operator intervention to recover. Cloud workloads, by contrast, are designed to be fault tolerant. The SLA of the application is not tied to the underlying infrastructure. Ideally, cloud applications are designed to recover when entire racks and even data centers full of infrastructure experience an outage.

Upgrades

If the deployment includes a public cloud, predicting upgrades may not be possible.
Examine the advertised SLA for any public cloud provider being used.

Note
At massive scale, even when dealing with a cloud that offers an SLA with a high percentage of uptime, workloads must be able to recover at short notice.

When upgrading private cloud deployments, care must be taken to minimize disruption by making incremental changes and providing a facility to either roll back or continue to roll forward when using a continuous delivery model.

Upgrades to the CMP may need to be completed in coordination with any of the hybrid cloud upgrades. This is necessary whenever API changes are made.

Network Operation Center

It is important to recognize who controls which infrastructure particulars when planning the Network Operation Center (NOC) for a hybrid cloud environment. If a significant portion of the cloud is on externally managed systems, prepare for situations where it may not be possible to make changes. Additionally, situations of conflict may arise in which multiple providers have differing points of view on the way infrastructure must be managed and exposed. This can lead to delays in root cause analysis, where each provider insists the blame lies with the other.

It is important to ensure that the structure put in place enables connection of the networking of both clouds to form an integrated system, keeping in mind the state of handoffs. These handoffs must both be as reliable as possible and include as little latency as possible to ensure the best performance of the overall system.

Maintainability

Operating hybrid clouds is a situation in which there is a greater reliance on third-party systems and processes. As a result of a lack of control of various pieces of a hybrid cloud environment, it is not possible to guarantee proper maintenance of the overall system. Instead, the user must be prepared to abandon workloads and spin them up again in an improved state.
Having a hybrid cloud deployment does, however, provide agility for these situations by allowing the migration of workloads to alternative clouds in response to cloud-specific issues.

Architecture

Once business and application requirements have been defined, the first step for designing a hybrid cloud solution is to map out the dependencies between the expected workloads and the diverse cloud infrastructures that need to support them. By mapping the applications and the targeted cloud environments, you can architect a solution that enables the broadest compatibility between cloud platforms and minimizes the need to create workarounds and processes to fill identified gaps. As part of this mapping, evaluate the monitoring and orchestration APIs available on each cloud platform and the relative levels of support for them in the chosen cloud management platform.

Image portability

The majority of cloud workloads currently run on instances using hypervisor technologies such as KVM, Xen, or ESXi. The challenge is that each of these hypervisors uses an image format that may be only partly compatible, or not compatible at all, with the others. In a private or hybrid cloud solution, this can be mitigated by standardizing on the same hypervisor and instance image format, but this is not always feasible. This is particularly evident if one of the clouds in the architecture is a public cloud that is outside of the control of the designers.

There are conversion tools such as virt-v2v (http://libguestfs.org/virt-v2v) and virt-edit (http://libguestfs.org/virt-edit.1.html) that can be used in those scenarios, but they are often not suitable beyond very basic cloud instance specifications. An alternative is to build a thin operating system image as the base for new instances. This facilitates rapid creation of cloud instances using cloud orchestration or configuration management tools, driven by the CMP, for more specific templating.
Another, more expensive, option is to use a commercial image migration tool. The issue of image portability is not limited to a one-time migration. If the intention is to use multiple clouds for disaster recovery, application diversity, or high availability, the images and instances are likely to be moved between the different cloud platforms regularly.

Upper-layer services

Many clouds offer complementary services over and above the basic compute, network, and storage components. These additional services are often used to simplify the deployment and management of applications on a cloud platform.

Take care when moving workloads that have upper-layer service dependencies on the source cloud platform to a destination cloud platform that may not have a comparable service, or that may implement the service in a different way or with a different technology. For example, moving an application that uses a NoSQL database service such as MongoDB, delivered as a service on the source cloud, to a destination cloud that does not offer that service or may only use a relational database such as MySQL, could cause difficulties in maintaining the application between the platforms.

There are a number of options that might be appropriate for the hybrid cloud use case:

• Create a baseline of upper-layer services that are implemented across all of the cloud platforms. For platforms that do not support a given service, create a service on top of that platform and apply it to the workloads as they are launched on that cloud. For example, through the Database Service for OpenStack (trove), OpenStack supports MySQL as a service but not NoSQL databases in production. To either move from or run alongside AWS, a NoSQL workload must use an automation tool, such as the Orchestration module (heat), to recreate the NoSQL database on top of OpenStack.
• Deploy a Platform-as-a-Service (PaaS) technology such as Cloud Foundry or OpenShift that abstracts the upper-layer services from the underlying cloud platform. The unit of application deployment and migration is then the PaaS. The application leverages the services of the PaaS and consumes only the base infrastructure services of the cloud platform. The downside to this approach is that the PaaS itself then potentially becomes a source of lock-in.

• Use only the base infrastructure services that are common across all cloud platforms. Use automation tools to create the required upper-layer services, which are then portable across all cloud platforms. For example, instead of using any database services that are inherent in the cloud platforms, launch cloud instances and deploy the databases onto those instances using scripts or various configuration and application deployment tools.

Network services

Network services functionality is a significant barrier for multiple cloud architectures. It could be an important factor to assess when choosing a CMP and cloud provider. Considerations are functionality, security, scalability, and high availability (HA). Verification and ongoing testing of the critical features of the cloud endpoint used by the architecture are important tasks.

• Once the network functionality framework has been decided, a minimum functionality test should be designed to confirm that the functionality is in fact compatible. This will ensure that testing and functionality persist during and after upgrades. Note that over time, the diverse cloud platforms are likely to de-synchronize if care is not taken to maintain compatibility. This is a particular issue with APIs.

• Scalability across multiple cloud providers may dictate which underlying network framework is chosen for the different cloud providers.
It is important to have the network API functions presented and to verify that the desired functionality persists across all chosen cloud endpoints.

• High availability (HA) implementations vary in functionality and design. Examples of some common methods are active-hot-standby, active-passive, and active-active. High availability and a test framework need to be developed to ensure that the functionality and limitations are well understood.

• Security considerations, such as how data is secured between client and endpoint and any traffic that traverses the multiple clouds, from eavesdropping to DoS activities, must be addressed. Business and regulatory requirements dictate the security approach that needs to be taken.

Data

Replication has been the traditional method for protecting object store implementations. A variety of different implementations have existed in storage architectures. Examples of this are both synchronous and asynchronous mirroring. Most object stores and back-end storage systems have a method for replication that can be implemented at the storage subsystem layer. Object stores also have implemented replication techniques that can be tailored to fit a cloud's needs. An organization must find the right balance between data integrity and data availability. Replication strategy may also influence the disaster recovery methods implemented.

Replication across different racks, data centers, and geographical regions has led to an increased focus on determining and ensuring data locality. The ability to guarantee that data is accessed from the nearest or fastest storage can be necessary for applications to perform well. An example of this is Hadoop running in a cloud. The user either runs with a native HDFS, when applicable, or on a separate parallel file system such as those provided by Hitachi and IBM.
Take special care when running embedded object store methods not to cause extra data replication, which can create unnecessary performance issues. Another example of ensuring data locality is the use of Ceph. Ceph has a data container abstraction called a pool. Pools can be created with replicas or erasure code. Replica-based pools can also have a rule set defined so that data is written to a "local" set of hardware, which would be the primary access and modification point.

Prescriptive examples

Multi-cloud environments are designed for these use cases:

• Bursting workloads from private to public OpenStack clouds

• Bursting workloads from private to public non-OpenStack clouds

• High availability across clouds (for technical diversity)

This chapter discusses examples of environments that address each of these use cases.

Company A's data center is running dangerously low on capacity. The option of expanding the data center will not be possible in the foreseeable future. In order to accommodate the continuously growing need for development resources in the organization, Company A decided to use resources in the public cloud.

Company A has an established data center with a substantial amount of hardware. Migrating the workloads to a public cloud is not feasible.

The company has an internal cloud management platform that will direct requests to the appropriate cloud, depending on the local capacity.

Note
This is a custom in-house application written for this specific purpose.

This solution is described in the figure below.

This example shows two clouds, with a Cloud Management Platform (CMP) connecting them.

Note
This guide does not attempt to cover a specific CMP, but describes how the Orchestration and Telemetry services handle, manage, and control workloads. This is shown in the diagram above.

The private OpenStack cloud has at least one controller, and at least one compute node.
It includes metering provided by the Telemetry module. The Telemetry module captures the load increase, and the CMP processes the information. If there is available capacity, the CMP uses the OpenStack API to call the Orchestration service. This creates instances on the private cloud in response to user requests. When capacity is not available on the private cloud, the CMP issues a request to the Orchestration service API of the public cloud. This creates the instance on the public cloud.

In this example, "Company A" decided not to direct the deployment to an external public cloud over concerns regarding resource control, security, and increased operational expense.

Bursting to a public non-OpenStack cloud

The second example looks into bursting workloads from the private cloud into a non-OpenStack public cloud using Amazon Web Services (AWS) to take advantage of additional capacity and scale applications.

For an OpenStack-to-AWS hybrid cloud, the architecture looks similar to the figure below:

Company B states that the developers were already using AWS and did not want to change the cloud provider.

If the CMP is capable of connecting to an external cloud provider with the appropriate API, the workflow process will remain the same as in the previous scenario. The actions the CMP takes, such as monitoring loads and creating new instances, stay the same. However, the CMP will perform actions in the public cloud using applicable API calls.

If the public cloud is AWS, the CMP would use the EC2 API to create a new instance and assign an Elastic IP. That IP can then be added to HAProxy in the private cloud. The CMP can also reference AWS-specific tools such as CloudWatch and CloudFormation.

Several open source tool kits for building CMPs are available and can handle this kind of translation. These include ManageIQ, jClouds, and JumpGate.
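The bursting workflow in the two examples above can be sketched as a small placement routine. This is an illustrative sketch only: the `Cloud` class, its fields, and the `choose_cloud` function are hypothetical names invented for this example and are not part of any OpenStack or AWS API. A real CMP would obtain the capacity figures from metering data and then submit the request to the Orchestration or EC2 API of the selected cloud.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cloud:
    """Hypothetical view of one cloud as seen by the CMP."""
    name: str
    is_private: bool
    total_vcpus: int
    used_vcpus: int  # assumed to be fed by metering data in a real CMP

    def free_vcpus(self) -> int:
        return self.total_vcpus - self.used_vcpus


def choose_cloud(clouds: List[Cloud], requested_vcpus: int) -> Optional[Cloud]:
    """Prefer private clouds; burst to a public cloud only when needed."""
    # Sort so that private clouds are tried first (False sorts before True).
    for cloud in sorted(clouds, key=lambda c: not c.is_private):
        if cloud.free_vcpus() >= requested_vcpus:
            return cloud
    return None  # no cloud can satisfy the request


# Illustrative capacity figures, not taken from the guide.
private = Cloud("private-openstack", True, total_vcpus=64, used_vcpus=60)
public = Cloud("public-cloud", False, total_vcpus=1000, used_vcpus=100)

small = choose_cloud([public, private], requested_vcpus=2)
large = choose_cloud([public, private], requested_vcpus=16)
print(small.name)
print(large.name)
```

With these figures, the small request fits on the private cloud, while the large request exceeds the remaining private capacity and is placed on the public cloud. The same routine applies whether the public cloud is OpenStack or AWS; only the API call that follows the decision differs.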
High availability/disaster recovery

Company C requires their local data center to be able to recover from failure. Some of the workloads currently in use are running on their private OpenStack cloud. Protecting the data involves Block Storage, Object Storage, and a database. The architecture is designed to support the failure of large components of the system while ensuring that the system continues to deliver services. While the services remain available to users, the failed components are restored in the background based on standard best practice DR policies. To achieve the objectives, data is replicated to a second cloud in a geographically distant location. The logical diagram of the system is described in the figure below:

This example includes two private OpenStack clouds connected with a CMP. The source cloud, OpenStack Cloud 1, includes a controller and at least one instance running MySQL. It also includes at least one Block Storage volume and one Object Storage volume. This is so that the data is available to the users at all times. The details of the method for protecting each of these sources of data differ.

Object Storage relies on the replication capabilities of the Object Storage provider. OpenStack Object Storage is enabled so that it creates geographically separated replicas that take advantage of this feature. It is configured so that at least one replica exists in each cloud. In order to make this work, a single array spanning both clouds is configured with OpenStack Identity. Using Federated Identity, it talks to both clouds, communicating with OpenStack Object Storage through the Swift proxy.

For Block Storage, the replication is a little more difficult, and involves tools outside of OpenStack itself. The OpenStack Block Storage volume is not set as the drive itself but as a logical object that points to a physical back end.
Disaster recovery for Block Storage is configured to use synchronous backup for the highest level of data protection, but asynchronous backup could have been set as an alternative that is not as latency sensitive. For asynchronous backup, the Block Storage API makes it possible to export the data and also the metadata of a particular volume, so that it can be moved and replicated elsewhere. More information can be found here: https://blueprints.launchpad.net/cinder/+spec/cinder-backup-volume-metadata-support.

The synchronous backups create an identical volume in both clouds and choose the appropriate flavor so that each cloud has an identical back end. This was done by creating volumes through the CMP. The CMP knows to create identical volumes in both clouds. Once this is configured, a solution involving DRBD is used to synchronize the actual physical drives.

The database component is backed up using synchronous backups. MySQL does not support geographically diverse replication, so disaster recovery is provided by replicating the file itself. As it is not possible to use Object Storage as the back end of a database like MySQL, Swift replication was not an option. It was decided not to store the data on another geo-tiered storage system, such as Ceph, as Block Storage. This would have given another layer of protection. Another option would have been to store the database on an OpenStack Block Storage volume and back it up just as any other Block Storage.

8. Massively scalable

Table of Contents
User requirements
Technical considerations
Operational considerations

A massively scalable architecture is defined as a cloud implementation that is either a very large deployment, such as one that would be built by a commercial service provider, or one that has the capability to support user requests for large amounts of cloud resources. An example would be an infrastructure in which requests to service 500 instances or more at a time are not uncommon. In a massively scalable infrastructure, such a request is fulfilled without completely consuming all of the available cloud infrastructure resources. While the high capital cost of implementing such a cloud architecture means that it is currently pursued by only a few organizations, many organizations are planning for massive scalability in the future.

A massively scalable OpenStack cloud design presents a unique set of challenges and considerations. For the most part it is similar to a general purpose cloud architecture, as it is built to address a non-specific range of potential use cases or functions. Massively scalable clouds are rarely designed or specialized for particular workloads. Like the general purpose cloud, the massively scalable cloud is most often built as a platform for a variety of workloads. Massively scalable OpenStack clouds are generally built as commercial public cloud offerings, since single private organizations rarely have the resources or need for this scale.
Services provided by a massively scalable OpenStack cloud will include:

• Virtual-machine disk image library

• Raw block storage

• File or object storage

• Firewall functionality

• Load balancing functionality

• Private (non-routable) and public (floating) IP addresses

• Virtualized network topologies

• Software bundles

• Virtual compute resources

Like a general purpose cloud, the instances deployed in a massively scalable OpenStack cloud will not necessarily use any specific aspect of the cloud offering (compute, network, or storage). As the cloud grows in scale, the number of workloads can cause stress on all of the cloud components. Additional stresses are introduced to supporting infrastructure, including databases and message brokers. The architecture design for such a cloud must account for these performance pressures without negatively impacting user experience.

User requirements

More so than in other scenarios, defining user requirements for a massively scalable OpenStack design architecture dictates approaching the design from two different, yet sometimes opposing, perspectives: the cloud user and the cloud operator. The expectations and perceptions of the consumption and management of resources of a massively scalable OpenStack cloud from the user point of view are distinctly different from those of the cloud operator.

Many jurisdictions have legislative and regulatory requirements governing the storage and management of data in cloud environments. Common areas of regulation include:

• Data retention policies ensuring storage of persistent data and records management to meet data archival requirements.

• Data ownership policies governing the possession and responsibility for data.

• Data sovereignty policies governing the storage of data in foreign countries or otherwise separate jurisdictions.
• Data compliance policies governing which types of information must reside in certain locations due to regulatory issues and, more importantly, cannot reside in other locations for the same reason.

Examples of such legal frameworks include the data protection framework of the European Union and the requirements of the Financial Industry Regulatory Authority in the United States. Consult a local regulatory body for more information.

User requirements

Massively scalable OpenStack clouds have the following user requirements:

• The cloud user expects repeatable, dependable, and deterministic processes for launching and deploying cloud resources. This could be delivered through a web-based interface or publicly available API endpoints. All appropriate options for requesting cloud resources need to be available through some type of user interface, a command-line interface (CLI), or API endpoints.

• Cloud users expect a fully self-service and on-demand consumption model. When an OpenStack cloud reaches the "massively scalable" size, it is expected to be consumed "as a service" in each and every way.

• For a user of a massively scalable OpenStack public cloud, there will be no expectations for control over security, performance, or availability. Only SLAs related to uptime of API services are expected, along with very basic SLAs for the services offered. Users understand that it is their responsibility to address these issues on their own. The exception to this expectation is the rare case of a massively scalable cloud infrastructure built for a private or government organization that has specific requirements.

As might be expected, the cloud user requirements or expectations that determine the design are all focused on the consumption model.
The user expects to be able to easily consume cloud resources in an automated and deterministic way, without any need for knowledge of the capacity, scalability, or other attributes of the cloud's underlying infrastructure.

Operator requirements

Whereas the cloud user should be completely unaware of the underlying infrastructure of the cloud and its attributes, the operator must be able to build and support the infrastructure, and must understand how it needs to operate at scale. This presents a very demanding set of requirements for building such a cloud from the operator's perspective:

• First and foremost, everything must be capable of automation. From the deployment of new compute, storage, or networking hardware to the installation and configuration of the supporting software, everything must be capable of being automated. Manual processes will not suffice in a massively scalable OpenStack design architecture.

• The cloud operator requires that capital expenditure (CapEx) is minimized at all layers of the stack. Operators of massively scalable OpenStack clouds require the use of dependable commodity hardware and freely available open source software components to reduce deployment costs and operational expenses. Initiatives like OpenCompute (more information available at http://www.opencompute.org) provide additional information and pointers. To cut costs, many operators sacrifice redundancy, for example, redundant power supplies, redundant network connections, and redundant rack switches.

• Companies operating a massively scalable OpenStack cloud also require that operational expenditures (OpEx) be minimized as much as possible. Using cloud-optimized hardware is a recommended approach to managing operational overhead. Some of the factors that need to be considered include power, cooling, and the physical design of the chassis.
Because of the scale of these implementations, it is possible to customize the hardware and systems so that they are optimized for this type of workload.

• Massively scalable OpenStack clouds require extensive metering and monitoring functionality to maximize operational efficiency by keeping the operator informed about the status and state of the infrastructure. This includes full-scale metering of the hardware and software status. A corresponding framework of logging and alerting is also required to store the metrics provided by the metering and monitoring solutions and to allow operations staff to act upon them. The cloud operator also needs a solution that uses the data provided by the metering and monitoring solution to provide capacity planning and capacity trending analysis.

• A massively scalable OpenStack cloud will be a multi-site cloud. Therefore, the user-operator requirements for a multi-site OpenStack architecture design also apply here. These include various legal requirements for data storage, data placement, and data retention; other jurisdictional legal or compliance requirements; image consistency and availability; storage replication and availability (both block and file/object storage); and authentication, authorization, and auditing (AAA), to name a few. Refer to Chapter 6, "Multi-site", for more details on requirements and considerations for multi-site OpenStack clouds.

• Considerations around physical facilities such as space, floor weight, rack height and type, environmental considerations, power usage and power usage effectiveness (PUE), and physical security must also be addressed by the design architecture of a massively scalable OpenStack cloud.

Technical considerations

Converting an existing OpenStack environment that was designed for a different purpose to be massively scalable is a formidable task.
When building a massively scalable environment from the ground up, make sure the initial deployment is built with the same principles and choices that will apply as the environment grows. For example, a good approach is to deploy the first site as a multi-site environment. This allows the same deployment and segregation methods to be used as the environment grows to separate locations across dedicated links or wide area networks.

In a hyperscale cloud, scale trumps redundancy. Applications must be modified with this in mind, relying on the scale and homogeneity of the environment to provide reliability rather than on redundant infrastructure provided by non-commodity hardware solutions.

Infrastructure segregation

Fortunately, OpenStack services are designed to support massive horizontal scale. Be aware that this is not the case for the entire supporting infrastructure. This is particularly a problem for the database management systems and message queues that the various OpenStack services use for data storage and remote procedure call communications.

Traditional clustering techniques are typically used to provide high availability and some additional scale for these environments. In the quest for massive scale, however, additional steps need to be taken to relieve the performance pressure on these components so that they do not negatively impact the overall performance of the environment. It is important to make sure that all the components are in balance so that, if and when the massively scalable environment fails, all the components are at, or close to, maximum capacity.

Regions are used to segregate completely independent installations linked only by an Identity and Dashboard (optional) installation. Services are installed with separate API endpoints for each region, complete with separate database and queue installations.
This exposes some awareness of the environment's fault domains to users and gives them the ability to ensure some degree of application resiliency, while also imposing the requirement to specify which region their actions must be applied to.

Environments operating at massive scale typically need their regions or sites subdivided further without exposing the requirement to specify the failure domain to the user. This provides the ability to further divide the installation into failure domains while also providing a logical unit for maintenance and the addition of new hardware. At hyperscale, instead of adding single compute nodes, administrators may add entire racks or even groups of racks at a time, with each new addition of nodes exposed via one of the segregation concepts mentioned herein.

Cells provide the ability to subdivide the compute portion of an OpenStack installation, including regions, while still exposing a single endpoint. In each region an API cell is created along with a number of compute cells where the workloads actually run. Each cell gets its own database and message queue setup (ideally clustered), providing the ability to subdivide the load on these subsystems and improving overall performance.

Within each compute cell a complete compute installation is provided, with full database and queue installations, scheduler, conductor, and multiple compute hosts. The cells scheduler handles placement of user requests from the single API endpoint to a specific cell from those available. The normal filter scheduler then handles placement within the cell.

The downside of using cells is that they are not well supported by any of the OpenStack services other than Compute. Also, they do not adequately support some relatively standard OpenStack functionality such as security groups and host aggregates.
Due to their relative newness and specialized use, they receive relatively little testing in the OpenStack gate. Despite these issues, however, cells are used in some very well known OpenStack installations operating at massive scale, including those at CERN and Rackspace.

Host aggregates

Host aggregates enable partitioning of OpenStack Compute deployments into logical groups for load balancing and instance distribution. Host aggregates may also be used to further partition an availability zone. Consider a cloud that uses host aggregates to partition an availability zone into groups of hosts that either share common resources, such as storage and network, or have a special property, such as trusted computing hardware. Host aggregates are not explicitly user-targetable; instead they are implicitly targeted via the selection of instance flavors with extra specifications that map to host aggregate metadata.

Availability zones

Availability zones provide another mechanism for subdividing an installation or region. They are, in effect, host aggregates that are exposed for (optional) explicit targeting by users.

Unlike cells, they do not have their own database server or queue broker but simply represent an arbitrary grouping of compute nodes. Typically, nodes are grouped into availability zones according to a shared failure domain defined by a physical characteristic such as a shared power source, physical network connection, and so on. Availability zones are exposed to the user because they can be targeted; however, users are not required to target them. An alternative approach is for the operator to set a default availability zone for scheduling instances that differs from the default availability zone of nova.

Segregation example

In this example the cloud is divided into two regions, one for each site, with two availability zones in each based on the power layout of the data centers.
A number of host aggregates have also been defined so that flavors can target virtual machine instances at hosts that share special capabilities, such as SSDs, 10 GbE networks, or GPU cards.

Operational considerations

In order to run at massive scale, it is important to plan on the automation of as many of the operational processes as possible. Automation includes the configuration of provisioning, monitoring, and alerting systems. Part of the automation process includes the capability to determine when human intervention is required and who should act. The objective is to increase the ratio of running systems to operational staff as much as possible to reduce maintenance costs. In a massively scaled environment, it is impossible for staff to give each system individual care.

Configuration management tools such as Puppet or Chef allow operations staff to categorize systems into groups based on their role and thus create configurations and system states that are enforced through the provisioning system. Systems that fall out of the defined state due to errors or failures are quickly removed from the pool of active nodes and replaced.

At large scale the resource cost of diagnosing individual systems that have failed is far greater than the cost of replacement. It is more economical to immediately replace the system with a new system that can be provisioned and configured automatically and quickly brought back into the pool of active nodes. By automating tasks that are labor-intensive, repetitive, and critical to operations, cloud operations teams can work more efficiently because fewer resources are needed for these babysitting tasks. Administrators are then free to tackle tasks that cannot be easily automated and that have longer-term impacts on the business, such as capacity planning.
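
The replace-rather-than-repair policy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of OpenStack, Puppet, or Chef: the `Node` type, its fields, the state labels, and the `reconcile` function are hypothetical names, and the simple string comparison stands in for whatever state report a configuration management tool would actually produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical record for one managed system."""
    name: str
    role: str            # group the node belongs to, e.g. "compute"
    reported_state: str  # state label from an (assumed) config management run


def reconcile(nodes, desired_state_by_role):
    """Split nodes into those kept in the active pool and those to replace.

    A node whose reported state differs from the desired state for its role
    is not diagnosed individually; it is simply scheduled for replacement.
    """
    keep, replace = [], []
    for node in nodes:
        if node.reported_state == desired_state_by_role.get(node.role):
            keep.append(node)
        else:
            replace.append(node)
    return keep, replace


if __name__ == "__main__":
    desired = {"compute": "v42", "storage": "v17"}  # illustrative labels
    pool = [
        Node("c-001", "compute", "v42"),
        Node("c-002", "compute", "v41"),  # drifted from the defined state
        Node("s-001", "storage", "v17"),
    ]
    keep, replace = reconcile(pool, desired)
    print("keep:", [n.name for n in keep])
    print("replace:", [n.name for n in replace])
```

The design choice worth noting is that the function never inspects why a node drifted. At this scale the decision is binary, which is what keeps the process fully automatable.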
The bleeding edge

Running OpenStack at massive scale requires striking a balance between stability and features. For example, it might be tempting to run an older stable release branch of OpenStack to make deployments easier. However, known issues that are of only minor concern or have minimal impact in smaller deployments could become pain points at massive scale. If the issue is well known, in many cases it may be resolved in more recent releases. The OpenStack community can help resolve any issues reported by applying the collective expertise of the OpenStack developers.

The number of organizations running at a similar scale is a relatively tiny proportion of the OpenStack community. When issues crop up, it is therefore important to share them with the community and be a vocal advocate for resolving them. Some issues only manifest when operating at large scale, and the number of organizations able to duplicate and validate an issue is small, so it will be important to document these issues and dedicate resources to their resolution.

In some cases, the resolution to the problem is ultimately to deploy a more recent version of OpenStack. Alternatively, when the issue needs to be resolved in a production environment where rebuilding the entire environment is not an option, it is possible to deploy just the more recent separate underlying components required to resolve issues or gain significant performance improvements. At first glance, this could be perceived as potentially exposing the deployment to increased risk and instability. However, in many cases it could be an issue that has not been discovered yet.

It is advisable to cultivate a development and operations organization that is responsible for creating desired features, diagnosing and resolving issues, and building the infrastructure for large-scale continuous integration tests and continuous deployment.
This helps catch bugs early and makes deployments quicker and less painful. In addition to development resources, the recruitment of experts in the fields of message queues, databases, distributed systems, networking, cloud, and storage is also advisable.

Growth and capacity planning

An important consideration in running at massive scale is projecting growth and utilization trends to plan capital expenditures for the near and long term. Utilization metrics for compute, network, and storage, as well as a historical record of these metrics, are required. While securing major anchor tenants can lead to rapid jumps in the utilization rates of all resources, the steady adoption of the cloud inside an organization or by public consumers in a public offering will also create a steady trend of increased utilization.

Skills and training

Projecting growth for storage, networking, and compute is only one aspect of a growth plan for running OpenStack at massive scale. Growing and nurturing development and operational staff is an additional consideration. Sending team members to OpenStack conferences and meetup events, and encouraging active participation in the mailing lists and committees, is a very important way to maintain skills and forge relationships in the community. A list of OpenStack training providers in the marketplace can be found here: http://www.openstack.org/marketplace/training/.

9. Specialized cases

Table of Contents
Multi-hypervisor example
Specialized networking example
Software-defined networking
Desktop-as-a-Service
OpenStack on OpenStack
Specialized hardware

Although most OpenStack architecture designs fall into one of the seven major scenarios outlined in other sections (compute focused, network focused, storage focused, general purpose, multi-site, hybrid cloud, and massively scalable), there are a few other use cases that are unique enough that they cannot be neatly categorized into one of the other major sections. This section discusses some of these unique use cases with some additional details and design considerations for each use case:

• Specialized networking: This describes running networking-oriented software that may involve reading packets directly from the wire or participating in routing protocols.

• Software-defined networking (SDN): This use case details both running an SDN controller from within OpenStack and participating in a software-defined network.

• Desktop-as-a-Service: This is for organizations that want to run a virtualized desktop environment on a cloud. This can apply to private or public clouds.

• OpenStack on OpenStack: Some organizations are finding that it makes technical sense to build a multi-tiered cloud by running OpenStack on top of an OpenStack installation.

• Specialized hardware: Some highly specialized situations will require the use of specialized hardware devices from within the OpenStack environment.

Multi-hypervisor example

A financial company requires a migration of its applications from a traditional virtualized environment to an API-driven, orchestrated environment. A number of its applications have strict support requirements that limit which hypervisors they are supported on; the rest do not have such restrictions and do not need the same features. Because of these requirements, the overall target environment needs multiple hypervisors.
The current environment consists of a vSphere environment with 20 VMware ESXi hypervisors supporting 300 instances of various sizes. Approximately 50 of the instances must be run on ESXi but the rest have more flexible requirements.

The company has decided to bring the management of the overall system into a common platform provided by OpenStack.

The approach is to run a host aggregate consisting of KVM hypervisors for the general purpose instances and a separate host aggregate for instances requiring ESXi. This way, workloads that must be run on ESXi can be targeted at those hypervisors, but the rest can be targeted at the KVM hypervisors.

Images in the OpenStack Image Service have particular hypervisor metadata attached so that when a user requests a certain image, the instance will spawn on the relevant aggregate. Images for ESXi are stored in VMDK format. QEMU disk images can be converted to VMDK and VMFS flat disks, which include thin, thick, zeroed-thick, and eager-zeroed-thick disks. Note that once a VMFS thin disk is exported from VMFS to a non-VMFS location, like the OpenStack Image Service, it becomes a preallocated flat disk. This increases the transfer time from the OpenStack Image Service to the data store, because the full preallocated flat disk, rather than the thin disk, must be transferred.

This example has the additional complication that instances are not spawned directly on a hypervisor simply by calling a specific host aggregate using the metadata of the image. Instead, the VMware host aggregate compute nodes communicate with vCenter, which then requests that the instance be scheduled to run on an ESXi hypervisor. As of the Icehouse release, this functionality requires that VMware Distributed Resource Scheduler (DRS) be enabled on a cluster and set to "Fully Automated". Due to the DRS requirement, note that vSphere requires shared storage (DRS uses vMotion, which requires shared storage).
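
The image-metadata targeting used in this example can be sketched as a small filter. This is a simplified illustration, not the Compute scheduler's actual code: the function name, the host dictionaries, the host names, and the use of a `hypervisor_type` image property with the values `qemu` and `vmware` are assumptions made for the sketch.

```python
def candidate_hosts(image_properties, hosts):
    """Return names of hosts whose hypervisor matches the image's requirement.

    image_properties: dict of image metadata; the "hypervisor_type" key is
        an assumption made for this sketch.
    hosts: list of dicts with assumed keys "name", "aggregate", and
        "hypervisor_type".
    An image that declares no hypervisor requirement may land on any host.
    """
    wanted = image_properties.get("hypervisor_type")
    if wanted is None:
        return [h["name"] for h in hosts]
    return [h["name"] for h in hosts if h["hypervisor_type"] == wanted]


if __name__ == "__main__":
    # Hypothetical hosts: two KVM nodes and one node fronting vCenter.
    hosts = [
        {"name": "kvm-01", "aggregate": "general", "hypervisor_type": "qemu"},
        {"name": "kvm-02", "aggregate": "general", "hypervisor_type": "qemu"},
        {"name": "vc-compute-01", "aggregate": "esxi", "hypervisor_type": "vmware"},
    ]
    print(candidate_hosts({"hypervisor_type": "vmware"}, hosts))
    print(candidate_hosts({"hypervisor_type": "qemu"}, hosts))
```

In a design like the one described here, an operator would probably want every image to carry the property, since an untagged image in this sketch could be placed on either aggregate.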
The solution uses shared storage to provide Block Storage capabilities to the KVM instances while also providing the storage for vSphere. The environment uses a dedicated data network to provide this functionality; therefore, the compute hosts should have dedicated NICs to support this dedicated traffic. vSphere supports the use of OpenStack Block Storage to present storage from a VMFS datastore to an instance, so the use of Block Storage in this architecture supports both hypervisors.

In this case, network connectivity is provided by OpenStack Networking with the VMware NSX plug-in driver configured. Alternatively, the system could use legacy networking (nova-network), which is supported by both hypervisors used in this design, but has limitations. Specifically, security groups are not supported on vSphere with legacy networking. With VMware NSX as part of the design, however, when a user launches an instance within either of the host aggregates, the instance is attached to the appropriate overlay-based logical networks as defined by the user.

Note that care must be taken with this approach, as there are design considerations around the OpenStack Compute integration. When using vSphere with OpenStack, the nova-compute service that is configured to communicate with vCenter shows up as a single large hypervisor representing the entire ESXi cluster (multiple instances of nova-compute can be run to represent multiple ESXi clusters or to connect to multiple vCenter servers). If the process running the nova-compute service crashes, the connection to that particular vCenter Server, and to any ESXi clusters behind it, is severed, and it will not be possible to provision more instances on that vCenter, despite the fact that vSphere itself could be configured for high availability. Therefore, it is important to monitor the nova-compute service that connects to vSphere for any disruptions.
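
A minimal sketch of such a check follows. It assumes that service records have already been collected into dictionaries whose keys are modeled on the Binary, Host, Status, and State columns of `nova service-list` output; the function name and the host names are hypothetical, and how the records are fetched and how alerts are raised are left out.

```python
def compute_services_needing_attention(services, hosts_to_watch):
    """Return watched nova-compute services that are down or disabled.

    services: list of dicts with keys "binary", "host", "status", "state"
        (field names modeled on `nova service-list` columns; the data
        source itself is an assumption of this sketch).
    hosts_to_watch: set of host names running the nova-compute service
        that connects to vCenter.
    """
    return [
        s for s in services
        if s["binary"] == "nova-compute"
        and s["host"] in hosts_to_watch
        and (s["state"] != "up" or s["status"] != "enabled")
    ]


if __name__ == "__main__":
    services = [
        {"binary": "nova-compute", "host": "vc-compute-01",
         "status": "enabled", "state": "down"},
        {"binary": "nova-compute", "host": "kvm-01",
         "status": "enabled", "state": "up"},
        {"binary": "nova-scheduler", "host": "ctrl-01",
         "status": "enabled", "state": "up"},
    ]
    for svc in compute_services_needing_attention(services, {"vc-compute-01"}):
        print("ALERT:", svc["host"], svc["status"], svc["state"])
```

Because one nova-compute process fronts an entire ESXi cluster, an alert from this check signals a loss of provisioning capacity for the whole cluster rather than for a single host.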
Specialized networking example

Some applications that interact with a network require more specialized connectivity. Applications such as a looking glass require the ability to connect to a BGP peer, and route participant applications may need to join a network at the layer-2 level.

Challenges

Connecting specialized network applications to their required resources alters the design of an OpenStack installation. Installations that rely on overlay networks are unable to support a routing participant, and may also block layer-2 listeners.

Possible solutions

Deploying an OpenStack installation using OpenStack Networking with a provider network will allow direct layer-2 connectivity to an upstream networking device. This design provides the layer-2 connectivity required to communicate via the Intermediate System-to-Intermediate System (IS-IS) protocol or to pass packets controlled via an OpenFlow controller. Using the Modular Layer 2 (ML2) plug-in with an agent such as Open vSwitch would allow a private connection through a VLAN directly to a specific port in a layer-3 device. This would allow a BGP point-to-point link to exist that will join the autonomous system. Avoid using layer-3 plug-ins as they will divide the broadcast domain and prevent router adjacencies from forming.

Software-defined networking

Software-defined networking (SDN) is the separation of the data plane and control plane. SDN has become a popular method of managing and controlling packet flows within networks. SDN uses overlays or directly controlled layer-2 devices to determine flow paths, and as such presents challenges to a cloud environment. Some designers may wish to run their controllers within an OpenStack installation. Others may wish to have their installations participate in an SDN-controlled network.

Challenges

SDN is a relatively new concept that is not yet standardized, so SDN systems come in a variety of different implementations. Because of this, a truly prescriptive architecture is not feasible. Instead, examine the differences between an existing or intended OpenStack design and the chosen SDN implementation, and determine where potential conflicts and gaps exist.

Possible solutions

If an SDN implementation requires layer-2 access because it directly manipulates switches, then running an overlay network or a layer-3 agent may not be advisable. If the controller resides within an OpenStack installation, it may be necessary to build an ML2 plug-in and schedule the controller instances to connect to tenant VLANs that then talk directly to the switch hardware. Alternatively, depending on the external device support, use a tunnel that terminates at the switch hardware itself.

Diagram

[Figure: OpenStack hosted SDN controller]

[Figure: OpenStack participating in an SDN controller network]

Desktop-as-a-Service

Virtual Desktop Infrastructure (VDI) is a service that hosts user desktop environments on remote servers. This application is very sensitive to network latency and requires a high performance compute environment. Traditionally these types of environments have not been put on cloud environments because few clouds are built to support such a demanding workload that is so exposed to end users. Recently, as cloud environments become more robust, vendors are starting to provide services that allow virtual desktops to be hosted in the cloud. In the not too distant future, OpenStack could be used as the underlying infrastructure to run a virtual infrastructure environment, either in-house or in the cloud.

Challenges

Designing an infrastructure that is suitable to host virtual desktops is a very different task from designing for most virtual workloads.
The infrastructure design needs to take into account, for example:

• Boot storms: what happens when hundreds or thousands of users log in during shift changes affects the storage design.

• The performance of the applications running in these virtual desktops.

• Operating system and compatibility with the OpenStack hypervisor.

Broker

The connection broker is a central component of the architecture that determines which remote desktop host will be assigned or connected to the user. The broker is often a full-blown management product allowing for the automated deployment and provisioning of remote desktop hosts.

Possible solutions

There are a number of commercial products available today that provide such a broker solution, but nothing that is native to the OpenStack project. Not providing a broker is also an option, but managing this manually would not suffice as a large-scale, enterprise solution.

Diagram

[Figure]

OpenStack on OpenStack

In some cases it is necessary to run OpenStack nested on top of another OpenStack cloud. This scenario allows for complete OpenStack cloud environments to be managed and provisioned on instances running on hypervisors and servers controlled by the underlying OpenStack cloud. Public cloud providers can use this technique to effectively manage the upgrade and maintenance process on complete OpenStack-based clouds. Developers and those testing OpenStack can also use this technique to provision their own OpenStack environments on available OpenStack Compute resources, whether public or private.

Challenges

The network aspect of deploying a nested cloud is the most complicated aspect of this architecture. When using VLANs, these need to be exposed to the physical ports on which the undercloud runs, as the bare metal cloud owns all the hardware, but they also need to be exposed to the nested levels.
Alternatively, network overlay technologies can be used on the overcloud (the OpenStack cloud running on OpenStack) to provide the required software-defined networking for the deployment.

Hypervisor

A key question to address in this scenario is which approach should be taken to provide a nested hypervisor in OpenStack. This decision influences which operating systems can be used for the nested OpenStack deployments.

Possible solutions: deployment

Deployment of a full stack can be challenging, but this difficulty can readily be mitigated by creating a Heat template to deploy the entire stack, or by using a configuration management system. Once the Heat template is created, deploying additional stacks is trivial and can be performed in an automated fashion.

The OpenStack-on-OpenStack project (TripleO) addresses this issue. Currently, however, the project does not completely cover nested stacks. For more information, see https://wiki.openstack.org/wiki/TripleO.

Possible solutions: hypervisor

In the case of running TripleO, the underlying OpenStack cloud deploys the Compute nodes as bare metal. OpenStack would then be deployed on these Compute bare-metal servers with the appropriate hypervisor, such as KVM.

In the case of running smaller OpenStack clouds for testing purposes, where performance is not a critical factor, QEMU can be utilized instead. It is also possible to run a KVM hypervisor in an instance (see http://davejingtian.org/2014/03/30/nested-kvm-just-for-fun/), though this is not a supported configuration and could be a complex solution for such a use case.

Diagram

[Figure]

Specialized hardware

Certain workloads require specialized hardware devices that are either difficult to virtualize or impossible to share.
Applications such as load balancers, highly parallel brute force computing, and direct-to-wire networking may need capabilities that basic OpenStack components do not provide.

Challenges

Some applications need access to hardware devices to either improve performance or provide capabilities that are not virtual CPU, RAM, network, or storage. These can be a shared resource, such as a cryptography processor, or a dedicated resource, such as a Graphics Processing Unit. OpenStack has ways of providing some of these, while others may need extra work.

Solutions

In order to provide cryptography offloading to a set of instances, it is possible to use Image Service configuration options to assign the cryptography chip to a device node in the guest. The OpenStack Command Line Reference contains further information on configuring this solution in the chapter Image Service property keys. Note that this approach allows all guests using the configured images to access the hypervisor cryptography device.

If direct access to a specific device is required, it can be dedicated to a single instance per hypervisor through the use of PCI pass-through. The OpenStack administrator needs to define a flavor that specifically requests the PCI device in order to properly schedule instances. More information regarding PCI pass-through, including instructions for implementing and using it, is available at https://wiki.openstack.org/wiki/Pci_passthrough.

10. References

Data Protection framework of the European Union: Guidance on data protection laws governed by the EU.

Depletion of IPv4 Addresses: Describes how the depletion of IPv4 addresses makes the migration to IPv6 inevitable.

Ethernet Switch Reliability: Research white paper on Ethernet switch reliability.

Financial Industry Regulatory Authority: Requirements of the Financial Industry Regulatory Authority in the USA.
Image Service property keys: Glance API property keys allow the administrator to attach custom characteristics to images.

LibGuestFS Documentation: Official LibGuestFS documentation.

Logging and Monitoring: Official OpenStack Operations documentation.

ManageIQ Cloud Management Platform: An open source cloud management platform for managing multiple clouds.

N-Tron Network Availability: Research white paper on network availability.

Nested KVM: Post on how to nest KVM under KVM.

Open Compute Project: The Open Compute Project Foundation's mission is to design and enable the delivery of the most efficient server, storage, and data center hardware designs for scalable computing.

OpenStack Flavors: Official OpenStack documentation.

OpenStack High Availability Guide: Information on how to provide redundancy for the OpenStack components.

OpenStack Hypervisor Support Matrix: Matrix of supported hypervisors and capabilities when used with OpenStack.

OpenStack Object Store (Swift) Replication Reference: Developer documentation of Swift replication.

OpenStack Operations Guide: The OpenStack Operations Guide provides information on setting up and installing OpenStack.

OpenStack Security Guide: The OpenStack Security Guide provides information on securing OpenStack deployments.

OpenStack Training Marketplace: The OpenStack market for training and vendors providing training on OpenStack.

PCI passthrough: The PCI API patches extend the servers/os-hypervisor to show PCI information for instance and compute node, and also provide a resource endpoint to show PCI information.

TripleO: TripleO is a program aimed at installing, upgrading, and operating OpenStack clouds using OpenStack's own cloud facilities as the foundation.

Appendix A. Community support

Table of Contents
Documentation
ask.openstack.org
OpenStack mailing lists
The OpenStack wiki
The Launchpad Bugs area
The OpenStack IRC channel
Documentation feedback
OpenStack distribution packages

The following resources are available to help you run and use OpenStack. The OpenStack community constantly improves and adds to the main features of OpenStack, but if you have any questions, do not hesitate to ask. Use the following resources to get OpenStack support and troubleshoot your installations.

Documentation

For the available OpenStack documentation, see docs.openstack.org.

To provide feedback on documentation, join and use the mailing list at OpenStack Documentation Mailing List, or report a bug.
The following books explain how to install an OpenStack cloud and its associated components:
• Installation Guide for openSUSE 13.1 and SUSE Linux Enterprise Server 11 SP3
• Installation Guide for Red Hat Enterprise Linux 7, CentOS 7, and Fedora 20
• Installation Guide for Ubuntu 14.04

The following books explain how to configure and run an OpenStack cloud:
• Architecture Design Guide
• Cloud Administrator Guide
• Configuration Reference
• Operations Guide
• High Availability Guide
• Security Guide
• Virtual Machine Image Guide

The following books explain how to use the OpenStack dashboard and command-line clients:
• API Quick Start
• End User Guide
• Admin User Guide
• Command-Line Interface Reference

The following documentation provides reference and guidance information for the OpenStack APIs:
• OpenStack API Complete Reference (HTML)
• API Complete Reference (PDF)
• OpenStack Block Storage Service API v2 Reference
• OpenStack Compute API v2 and Extensions Reference
• OpenStack Identity Service API v2.0 Reference
• OpenStack Image Service API v2 Reference
• OpenStack Networking API v2.0 Reference
• OpenStack Object Storage API v1 Reference

The Training Guides offer software training for cloud administration and management.

ask.openstack.org

During the setup or testing of OpenStack, you might have questions about how a specific task is completed, or be in a situation where a feature does not work correctly. Use the ask.openstack.org site to ask questions and get answers. When you visit the http://ask.openstack.org site, scan the recently asked questions to see whether your question has already been answered. If not, ask a new question. Be sure to give a clear, concise summary in the title and provide as much detail as possible in the description. Paste in your command output or stack traces, links to screenshots, and any other information that might be useful.
OpenStack mailing lists

A great way to get answers and insights is to post your question or problematic scenario to the OpenStack mailing list. You can learn from and help others who might have similar issues. To subscribe or view the archives, go to http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack. You might be interested in the other mailing lists for specific projects or development, which you can find on the wiki. A description of all mailing lists is available at http://wiki.openstack.org/MailingLists.

The OpenStack wiki

The OpenStack wiki contains a broad range of topics, but some of the information can be difficult to find or is a few pages deep. Fortunately, the wiki search feature enables you to search by title or content. If you search for specific information, such as networking or nova, you can find a large amount of relevant material. More is being added all the time, so be sure to check back often. You can find the search box in the upper-right corner of any OpenStack wiki page.

The Launchpad Bugs area

The OpenStack community values your setup and testing efforts and wants your feedback. To log a bug, you must sign up for a Launchpad account at https://launchpad.net/+login. You can view existing bugs and report bugs in the Launchpad Bugs area. Use the search feature to determine whether the bug has already been reported or already been fixed. If it still seems like your bug is unreported, fill out a bug report.

Some tips:
• Give a clear, concise summary.
• Provide as much detail as possible in the description. Paste in your command output or stack traces, links to screenshots, and any other information that might be useful.
• Be sure to include the software and package versions that you are using, especially if you are using a development branch, such as, "Juno release" vs git commit bc79c3ecc55929bac585d04a03475b72e06a3208.
• Any deployment-specific information is helpful, such as whether you are using Ubuntu 14.04 or are performing a multi-node installation.

The following Launchpad Bugs areas are available:
• Bugs: OpenStack Block Storage (cinder)
• Bugs: OpenStack Compute (nova)
• Bugs: OpenStack Dashboard (horizon)
• Bugs: OpenStack Identity (keystone)
• Bugs: OpenStack Image Service (glance)
• Bugs: OpenStack Networking (neutron)
• Bugs: OpenStack Object Storage (swift)
• Bugs: Bare Metal (ironic)
• Bugs: Data Processing Service (sahara)
• Bugs: Database Service (trove)
• Bugs: Orchestration (heat)
• Bugs: Telemetry (ceilometer)
• Bugs: Queue Service (marconi)
• Bugs: OpenStack API Documentation (developer.openstack.org)
• Bugs: OpenStack Documentation (docs.openstack.org)

The OpenStack IRC channel

The OpenStack community lives in the #openstack IRC channel on the Freenode network. You can hang out, ask questions, or get immediate feedback for urgent and pressing issues. To install an IRC client or use a browser-based client, go to http://webchat.freenode.net/. You can also use Colloquy (Mac OS X, http://colloquy.info/), mIRC (Windows, http://www.mirc.com/), or XChat (Linux). When you are in the IRC channel and want to share code or command output, the generally accepted method is to use a Paste Bin. The OpenStack project has one at http://paste.openstack.org. Paste longer amounts of text or logs into the web form to get a URL that you can paste into the channel. The OpenStack IRC channel is #openstack on irc.freenode.net. You can find a list of all OpenStack IRC channels at https://wiki.openstack.org/wiki/IRC.

Documentation feedback

To provide feedback on documentation, join and use the mailing list at OpenStack Documentation Mailing List, or report a bug.
OpenStack distribution packages

The following Linux distributions provide community-supported packages for OpenStack:
• Debian: http://wiki.debian.org/OpenStack
• CentOS, Fedora, and Red Hat Enterprise Linux: http://openstack.redhat.com/
• openSUSE and SUSE Linux Enterprise Server: http://en.opensuse.org/Portal:OpenStack
• Ubuntu: https://wiki.ubuntu.com/ServerTeam/CloudArchive

Glossary

6to4
A mechanism that allows IPv6 packets to be transmitted over an IPv4 network, providing a strategy for migrating to IPv6.

Address Resolution Protocol (ARP)
The protocol by which layer-3 IP addresses are resolved into layer-2 link local addresses.

Block Storage
The OpenStack core project that enables management of volumes, volume snapshots, and volume types. The project name of Block Storage is cinder.

Border Gateway Protocol (BGP)
The Border Gateway Protocol is a dynamic routing protocol that connects autonomous systems. Considered the backbone of the Internet, this protocol connects disparate networks to form a larger network.

bursting
The practice of utilizing a secondary environment to elastically build instances on-demand when the primary environment is resource constrained.

ceilometer
The project name for the Telemetry service, which is an integrated project that provides metering and measuring facilities for OpenStack.

cell
Provides logical partitioning of Compute resources in a child and parent relationship. Requests are passed from parent cells to child cells if the parent cannot provide the requested resource.

cinder
A core OpenStack project that provides block storage services for VMs.

Compute
The OpenStack core project that provides compute services. The project name of Compute service is nova.

content delivery network (CDN)
A content delivery network is a specialized network that is used to distribute content to clients, typically located close to the client for increased performance.
dashboard
The web-based management interface for OpenStack. An alternative name for horizon.

Database Service
An integrated project that provides scalable and reliable Cloud Database-as-a-Service functionality for both relational and non-relational database engines. The project name of Database Service is trove.

denial of service (DoS)
Denial of service (DoS) is a short form for denial-of-service attack. This is a malicious attempt to prevent legitimate users from using a service.

Desktop-as-a-Service
A platform that provides a suite of desktop environments that users can log in to and receive a desktop experience from any location. This may provide general use, development, or even homogeneous testing environments.

east-west traffic
Network traffic between servers in the same cloud or data center. See also north-south traffic.

encapsulation
The practice of placing one packet type within another for the purposes of abstracting or securing data. Examples include GRE, MPLS, or IPsec.

glance
A core project that provides the OpenStack Image Service.

heat
An integrated project that aims to orchestrate multiple cloud applications for OpenStack.

Heat Orchestration Template (HOT)
Heat input in the format native to OpenStack.

high availability (HA)
A high availability system design approach and associated service implementation ensures that a prearranged level of operational performance will be met during a contractual measurement period. High availability systems seek to minimize system downtime and data loss.

horizon
OpenStack project that provides a dashboard, which is a web interface.

hybrid cloud
A hybrid cloud is a composition of two or more clouds (private, community or public) that remain distinct entities but are bound together, offering the benefits of multiple deployment models. Hybrid cloud can also mean the ability to connect colocation, managed and/or dedicated services with cloud resources.
IaaS
Infrastructure-as-a-Service. IaaS is a provisioning model in which an organization outsources physical components of a data center, such as storage, hardware, servers, and networking components. A service provider owns the equipment and is responsible for housing, operating and maintaining it. The client typically pays on a per-use basis. IaaS is a model for providing cloud services.

Image Service
An OpenStack core project that provides discovery, registration, and delivery services for disk and server images. The project name of the Image Service is glance.

IOPS
IOPS (Input/Output Operations Per Second) are a common performance measurement used to benchmark computer storage devices like hard disk drives, solid state drives, and storage area networks.

kernel-based VM (KVM)
An OpenStack-supported hypervisor. KVM is a full virtualization solution for Linux on x86 hardware containing virtualization extensions (Intel VT or AMD-V), ARM, IBM Power, and IBM zSeries. It consists of a loadable kernel module that provides the core virtualization infrastructure and a processor-specific module.

keystone
The project that provides OpenStack Identity services.

Layer-2 network
Term used in the OSI network architecture for the data link layer. The data link layer is responsible for media access control, flow control, and detecting and possibly correcting errors that may occur in the physical layer.

Layer-3 network
Term used in the OSI network architecture for the network layer. The network layer is responsible for packet forwarding, including routing from one node to another.

Networking
A core OpenStack project that provides a network connectivity abstraction layer to OpenStack Compute. The project name of Networking is neutron.

neutron
A core OpenStack project that provides a network connectivity abstraction layer to OpenStack Compute.
north-south traffic
Network traffic between a user or client (north) and a server (south), or traffic into the cloud (south) and out of the cloud (north). See also east-west traffic.

nova
OpenStack project that provides compute services.

Object Storage
The OpenStack core project that provides eventually consistent and redundant storage and retrieval of fixed digital content. The project name of OpenStack Object Storage is swift.

Open vSwitch
Open vSwitch is a production quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (for example NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).

OpenStack
OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a data center, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface. OpenStack is an open source project licensed under the Apache License 2.0.

Orchestration
An integrated project that orchestrates multiple cloud applications for OpenStack. The project name of Orchestration is heat.

Platform-as-a-Service (PaaS)
Provides to the consumer the ability to deploy applications through a programming language or tools supported by the cloud platform provider. An example of Platform-as-a-Service is an Eclipse/Java programming platform provided with no downloads required.

swift
An OpenStack core project that provides object storage services.

Telemetry
An integrated project that provides metering and measuring facilities for OpenStack. The project name of Telemetry is ceilometer.

TripleO
OpenStack-on-OpenStack program. The code name for the OpenStack Deployment program.
trove
OpenStack project that provides database services to applications.

Xen
Xen is a hypervisor using a microkernel design, providing services that allow multiple computer operating systems to execute on the same computer hardware concurrently.