Proposal - Operational Intent for Docker Deployment #11187

Open
jainvipin opened this issue Mar 6, 2015 · 9 comments
Labels
kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny status/1-design-review

Comments

@jainvipin

Proposal - Operational Intent for Docker Deployment

Authors: @jainvipin, @tgraf, @dvorkinista

Introduction

Problem Statement

Containers consume and interact with a variety of software and hardware resources. Consumption of these resources requires an operational-intent so that the desired policies can be instantiated. For example:

  • Application Prioritization: In a deployment, all containers or compositions labeled as production require better prioritization of resource acquisition. This can apply to compute, network and storage resources and can be further extended to grouping behaviors across collections of hosts. The operational intent specifies what constitutes high priority, which can vary from one deployment to another.
  • Implicit-deny rule and its scope: This could be an operational intent with cluster-wide scope that requires the infrastructure to disallow any communication between apps except what is allowed by a composition. For example, in a composition web -> app -> db, the intention is to prevent the web container from communicating directly with the db tier, and to prevent any other container from communicating with the app or db tier.
  • An operational intent stating that all communication between certain apps must always be encrypted.

Operational-intent is a high-level specification that abstracts away the specifics of how the intent is implemented. Ultimately, the realization of the intent depends on infrastructure capabilities such as operating system capabilities, networking topology, and hardware capabilities of infrastructure elements like compute, storage, and network.

Operational-Intent and Docker-Compose

Docker Compose captures the orchestration-intent, whereas operational-intent describes the constraints, governance, rules and policies that must be observed in order to achieve the orchestration-intent.

Operational-Intent and Docker-Swarm

Docker Swarm schedules containers in a multi-host cluster, with pluggable backends. The operational-intent can be leveraged by the scheduling backend to achieve better utilization of not just compute, but also network and storage resources.

The Goal - Highly abstracted intent with separation of concern

The goal is to provide a highly abstracted, intent-based model alongside existing application composition tools. The model captures operational constraints and governance rules, decoupled from any underlying infrastructure or operating system specifics, and correlates them with application developer intent. No authority involved should require intimate knowledge outside of its own area of concern. For example, an operator must be able to impose rules affecting application developer intent without understanding the details of the application itself. Likewise, while the operational constraint "use an encrypted VXLAN tunnel" affects the communication between containers, the application developer does not need to be aware of its existence; the operator defines how this intent is achieved given the infrastructure present.

Separating this concern and decoupling operational constraints allows an application to be developed and tested in a small-scale development environment with all application developer intent captured, and then moved to an integration and production environment without any change to the application intent. All operational and governance constraints required to run in the production environment should be imposed independently, alongside the existing application intent. It also allows an application to be moved from one production environment to another while preserving the original intent; the intent can be translated to whatever infrastructure configuration is required in the new environment.

Operational-Intent Realization

The following diagram describes the functional elements desired in realizing the operational intent:

[Figure: opspolicymodel]

Intent Specification

  • Application Developer Intent and Profile: This specifies application characteristics, either specified or derived from monitoring/profiling. This intent is specified in the form of metadata/labels to be associated with an application, an application-composition, or an instantiation thereof. Any intent defined on this level is unaware of underlying infrastructure capabilities or the location of other applications. It assumes unlimited resource availability, the ability of the network to connect anyone with anything, and storage capabilities as desired by the application.

Examples: An application-composition describing itself as production launch, an application requiring block-io access privileges, an application running as a backup service.

  • Operational constraints: Rules required to run applications in a particular environment. Rules may apply to a specific application or to a group of applications categorized and identified using labels. When the two conflict, operational constraints take precedence over application intent. However, they may be overruled by governance policy, and they typically take state into account.

Example: Application cache must use SSD storage, application accounting must use storage with backup, network packets to web must be load balanced, all network overlays must use VXLAN encapsulation on port 8472, any application labeled network latency sensitive should do QoS as follows, an application labelled net-encrypt should be spawned on a host with hardware encryption capabilities, ...

  • Governance policy: High-level constraints required to fulfill governance requirements of the environment. These policies typically do not refer to particular applications but to a category of workloads grouped with labels. Governance policy is not a constraint, but a desire that always takes precedence over any other intent.

Examples: Encryption must use AES or stronger, a production tier labeled production must get higher compute/network/storage precedence than a development environment with the label devel, applications from multiple tenants must be isolated from each other with encryption applied, mirror all packets of a specific tenant for monitoring/analysis, ...

  • Infrastructure Capabilities: These are the capabilities of the underlying infrastructure: constraints imposed by the infrastructure itself and received as feedback, or capabilities of the operating system and hardware.

Examples: host2 is over capacity and should not spawn any additional containers, host3 is capable of providing SR-IOV acceleration, network infrastructure between host1 and host2 is capable of providing multi-destination network, ability to do native hardware encryption is available on host8, infrastructure has ability to allow multiple classes/priority of applications within the infrastructure, ...

Scope of Intent

The scope of intent varies and can be described at the following levels:

  • Global: Applies to all applications within the intent universe, e.g.
    1. Disallow any communication unless otherwise allowed explicitly i.e. implicit deny
    2. Use VXLAN overlay to interconnect applications
    3. Always encrypt the data
  • Specific to a tier of applications: e.g. a structural composition defined via Docker compose
    1. A production deployment of a composition must be given higher network, storage, and compute priority over a test deployment of the same composition
    2. A composition identified by a label must use encrypted network and storage
  • Specific to arbitrary group of containers:
    Examples:
    1. A set of production applications tagged according to labels, e.g. applications tagged as high-priority must be given higher priority over other applications
    2. A specified named list of containers must be given a low latency db access

Conflict Resolution

In the event that multiple authorities specify conflicting intent, a set of rules exists to resolve the conflicts:

  1. Governance rules always take precedence over any other type of intent.
    Examples: The infrastructure owner specifying that encryption is required overrides any operational constraint which does not specify encryption, the infrastructure owner may make an exception to an operational constraint for debugging purposes.
  2. Operational constraints may be orthogonal to the application developer intent, in which case they do not conflict with it. However, where operational constraints specify exceptions to the application developer's intent, the operational constraint always takes precedence over application developer intent.
    Examples: The operator disallowing application "accounting" to talk to "database" always overrides the application developer request, the operational constraint "maximum 1GB memory for proxy" overrides the application intent of allocating 2GB worth of memory.
  3. In the event of a conflict on the same level, intent with a more specific scope takes precedence over intent with a less specific scope.
    Examples: "Applications with label priority may consume 1GB of memory" overrides the generic intent "All applications may use up to 500MB of memory".

It should be noted that certain conflicts cannot be resolved automatically and require additional clarification from intent authorities. In such an event the conflict is reported to monitoring facilities. Automatically resolved conflicts are also reported to provide a feedback loop and enable validation of correct intent enforcement.
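The three precedence rules can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Intent` record, the rank tables, and the relative ordering of "composition" versus "group" scope are hypothetical and not part of the proposal.

```python
from dataclasses import dataclass

# Authority precedence from rules 1 and 2 (higher wins).
# The numeric values are an assumption made for this sketch.
LEVEL_RANK = {"application": 0, "operational": 1, "governance": 2}

# Scope specificity for rule 3 (higher is more specific).
# ASSUMPTION: the proposal does not order "composition" against "group".
SCOPE_RANK = {"global": 0, "composition": 1, "group": 2}


@dataclass(frozen=True)
class Intent:
    level: str      # "application", "operational" or "governance"
    scope: str      # "global", "composition" or "group"
    attribute: str  # hypothetical attribute name, e.g. "memory_mb"
    value: str


def rank(intent):
    """Authority level is compared first, scope specificity second."""
    return (LEVEL_RANK[intent.level], SCOPE_RANK[intent.scope])


def resolve(intents):
    """Return (winners, report): the winning Intent per attribute and
    a list of ("resolved" | "unresolved", attribute) events to report."""
    winners, report = {}, []
    for intent in intents:
        current = winners.get(intent.attribute)
        if current is None:
            winners[intent.attribute] = intent
            continue
        if current.value == intent.value:
            continue  # same value, nothing to resolve
        new_rank, old_rank = rank(intent), rank(current)
        if new_rank == old_rank:
            # Needs clarification from the intent authorities.
            report.append(("unresolved", intent.attribute))
            continue
        if new_rank > old_rank:
            winners[intent.attribute] = intent
        report.append(("resolved", intent.attribute))
    return winners, report


winners, report = resolve([
    Intent("application", "group", "memory_mb", "2048"),
    Intent("operational", "group", "memory_mb", "1024"),
])
print(winners["memory_mb"].value, report)
```

Both resolved and unresolved conflicts end up in `report`, mirroring the feedback loop described above.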

Implementation

The following functional elements are required to implement the operational-intent.

Intent Specification

The intent specification is a set of logical, intuitive YAML or JSON specifications that describe arbitrary attributes, which are generically categorized. Implementing intent attributes requires translating the intent to concrete data, as discussed further below. For example, a high-priority job may mean a different set of resources in different environments, yet the logical intent remains the same.

The intent specification is consumed by a tool, similar to Docker Compose, that is responsible for validating the intent specification and its format and for communicating the intent to the intent distribution logic.
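As a rough illustration of the validation step, the sketch below checks that every policy reference resolves to a defined policy. The key names are borrowed from the example policy file later in this proposal; the `validate` function and its error messages are hypothetical, and a real tool would likely check much more.

```python
import json

# Maps a reference key in a global policy to the section that must define it.
REFERENCES = {
    "NetworkPolicy": "NetworkPolicies",
    "StoragePolicy": "StoragePolicies",
    "ComputePolicy": "ComputePolicies",
}


def validate(spec):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    # Collect the policy names defined in each typed section.
    defined = {
        section: {p.get("Name") for p in spec.get(section, [])}
        for section in REFERENCES.values()
    }
    for policy in spec.get("Policies", []):
        name = policy.get("Name")
        if not name:
            errors.append("policy without a Name")
            continue
        for key, section in REFERENCES.items():
            target = policy.get(key)
            if target is not None and target not in defined[section]:
                errors.append(
                    f"{name}: {key} {target!r} is not defined in {section}"
                )
    return errors


# A trimmed-down sample using the attribute names from this proposal.
SPEC = json.loads("""
{
  "Policies": [
    {"Name": "HighPriority", "Labels": "production, high-priority",
     "NetworkPolicy": "HighPriority", "StoragePolicy": "HighPriority",
     "ComputePolicy": "BestEffort"},
    {"Name": "ImplicitDeny", "Labels": "all"}
  ],
  "NetworkPolicies": [{"Name": "HighPriority"}, {"Name": "BestEffort"}],
  "StoragePolicies": [{"Name": "HighPriority"}],
  "ComputePolicies": [{"Name": "BestEffort"}]
}
""")

print(validate(SPEC))
```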

The intent specification is extensible and customizable.

Intent Distribution

The scope of the operational-intent is expected to be at the cluster level. The distribution logic, implemented as a logically centralized entity, acts upon elements that have cluster-wide significance and is responsible for translating the logical intent into a set of instructions to the backend drivers within the infrastructure to achieve the desired result. The intent distribution logic produces metadata to be consumed by the drivers in an interoperable and extensible way. It utilizes a state distribution mechanism, such as libpack or an equivalent, to distribute the state in a multi-host cluster environment.

Intent Realization

The extension drivers or scheduler backends (for example a swarm-backend, network extensions, or storage extensions) are expected to implement the logic and become the enforcers of the network rules/policy/governance.

The drivers consume the metadata produced by the intent distribution logic and output operational state back into the distributed store. While the desired state, as requested by the intent distribution logic, may differ from the operational state, the goal is to always converge on the desired state indicated in the intent. The operational state distributed by the drivers is expected to be used for improving future scheduling, monitoring, etc.

The drivers also publish their capabilities so that they can be consumed by the intent distribution logic. The specification of the driver and scheduler backend APIs is documented in a separate proposal (#11188).
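The relationship between desired and operational state can be pictured as a small reconciliation step. This is a minimal sketch, assuming both states are flat string-to-string maps; `pending_changes`, `reconcile`, and the `apply` callback are hypothetical names, not part of the driver API in #11188.

```python
def pending_changes(desired, operational):
    """Return the desired attributes the operational state does not match yet."""
    return {
        key: value
        for key, value in desired.items()
        if operational.get(key) != value
    }


def reconcile(desired, operational, apply):
    """Try to apply each pending change and return the new operational state.

    `apply` stands in for a driver call (hypothetical); it returns True
    when the driver managed to enforce the attribute.
    """
    result = dict(operational)
    for key, value in pending_changes(desired, operational).items():
        if apply(key, value):
            result[key] = value
    return result


desired = {"Encryption": "aes", "Latency": "low"}
operational = {"Encryption": "none"}

# Pretend the driver can enable encryption but cannot guarantee low latency.
new_state = reconcile(desired, operational, lambda key, value: key != "Latency")
print(new_state)
print(pending_changes(desired, new_state))
```

Whatever remains in `pending_changes` after a pass is the gap between desired and operational state that could be reported back for scheduling or monitoring.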

Scheduler Integration

As documented in the driver APIs (#11188), the capabilities are captured generically and are acted upon by schedulers largely as opaque information. Thus, the scheduler implementation is not expected to change as new capabilities are discovered/described by the drivers and consequently specified in operational-intent.

The scheduler can use the information for placement of the jobs, or in some cases choose not to schedule a job. For example, a job is not scheduled when the application requires the capability 'network-encryption' and the only drivers exposing that capability are on hosts that are unavailable to take an additional job.
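A placement decision over opaque capabilities could look roughly like this. The host records and the `place` function are hypothetical; the point of the sketch is only that the scheduler compares capability strings without interpreting them.

```python
def place(required, hosts):
    """Return the name of the first available host that advertises every
    required capability, or None when the job should not be scheduled."""
    for host in hosts:
        if host["available"] and set(required) <= set(host["capabilities"]):
            return host["name"]
    return None


# Hypothetical cluster view: capabilities are plain strings published by drivers.
HOSTS = [
    {"name": "host1", "available": True, "capabilities": {"vxlan"}},
    {"name": "host8", "available": False,
     "capabilities": {"vxlan", "network-encryption"}},
]

print(place({"vxlan"}, HOSTS))
print(place({"network-encryption"}, HOSTS))
```

In the second call the only host exposing 'network-encryption' cannot take another job, so the sketch declines to place the job rather than falling back to a host without the capability.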

An end-to-end example

Please note: this is just an example of an infrastructure resource utilization policy and does not represent the only typical use case.

1. Specify the application intent (in docker-compose)

group.yml specifies the application composition, as described in docker-compose

$ docker-compose up -label="production"

Alternatively a label can be specified in the YAML composition

2. Define an operational policy (json format here, but could be YAML):

The policy described here is an example policy, and the attribute names are not expected to be known a priori to the scheduler or to modules other than the drivers and the operational-intent distribution logic. A typical policy definition consists of a discrete set of network policies, storage policies and compute policies. Ultimately, a combination of these policies is used in the global policies, which serve as attach points to applications or application-compositions.

$ cat policies.json
{
    "Policies" : [
        {
            "Name" : "HighPriority",
            "Labels" : "production, high-priority",
            "NetworkPolicy" : "HighPriority",
            "StoragePolicy" : "HighPriority",
            "ComputePolicy" : "BestEffort"
        },
        {
            "Name" : "ImplicitDeny",
            "Labels" : "all"
        },
        {
            "Name" : "HighPerformanceStorage",
            "Labels" : "production, interactive",
            "StoragePolicy" : "LowLatency"
        }
    ],
    "NetworkPolicies" : [
        {
            "Name" : "HighPriority",
            "BufferResources" : "60percent",
            "BandwidthAllocation" : "60percent",
            "Latency" : "BestEffort",
            "SchedulingMethod" : "WRR"
        },
        {
            "Name" : "BestEffort"
        }
    ],
    "StoragePolicies" : [
        {
            "Name" : "HighPriority",
            "BufferResources" : "60percent",
            "BandwidthAllocation" : "60percent"
        },
        {
            "Name" : "LowLatency",
            "MaxLatency" : "30ms"
        },
        {
            "Name" : "BestEffort"
        }
    ],
    "ComputePolicies" : [
        {
            "Name" : "BestEffort"
        }
    ]
}

A few things to observe in the intent specification above:

  • Labels are the glue points to application needs
  • The scheduler does not need to understand the attributes; it treats the capabilities as opaque.
  • The attribute names are customizable
  • Capabilities are exposed by the drivers (this depends on driver implementation and underlying software/hardware infrastructure)
  • The realization of operational-intent to drivers is done on a logically centralized entity but is not expected to be tightly coupled to the scheduler
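To illustrate how labels act as glue points, the sketch below selects the policies that apply to a container from its labels. It assumes a policy applies when it is labeled "all" or shares at least one label with the container; the proposal does not state whether a multi-label policy means "any" or "every" label, and `applicable_policies` is a hypothetical helper.

```python
def parse_labels(text):
    """Split a comma-separated "Labels" string into a set of label names."""
    return {label.strip() for label in text.split(",") if label.strip()}


def applicable_policies(policies, container_labels):
    """Return the names of the policies that apply to a container.

    ASSUMPTION: "all" matches every container, and otherwise one shared
    label is enough for a policy to apply.
    """
    matched = []
    for policy in policies:
        wanted = parse_labels(policy.get("Labels", ""))
        if "all" in wanted or wanted & set(container_labels):
            matched.append(policy["Name"])
    return matched


# The global policies from the example above, reduced to name and labels.
POLICIES = [
    {"Name": "HighPriority", "Labels": "production, high-priority"},
    {"Name": "ImplicitDeny", "Labels": "all"},
    {"Name": "HighPerformanceStorage", "Labels": "production, interactive"},
]

print(applicable_policies(POLICIES, {"production"}))
print(applicable_policies(POLICIES, {"devel"}))
```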

3. Instantiate the operational policy

$ sudo docker-deploy policies.json

Extending and customizing attributes

All attribute values in the proposal are expressed as strings. Extending or customizing the specification is therefore easily done by setting an attribute value to "mydomain.com/attribute_value_xyz", which is interpreted by the specific backends that implement such attributes. While the top-level attributes are always consistent and uniform, additional value-adds are possible for differentiation.
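One way a backend might tell a standard value from a vendor extension is sketched below. The parsing rule (a domain-like prefix containing a dot, followed by "/") is an assumption made for illustration, and `split_attribute_value` and `interpret` are hypothetical names.

```python
def split_attribute_value(value):
    """Return (namespace, name); namespace is None for standard values.

    ASSUMPTION: an extension value looks like "<domain>/<name>", where the
    domain contains a dot. Values such as "1000/s" stay standard.
    """
    namespace, sep, name = value.partition("/")
    if sep and "." in namespace and name:
        return namespace, name
    return None, value


def interpret(value, supported_namespaces):
    """Classify a value as seen by one backend (hypothetical driver logic)."""
    namespace, name = split_attribute_value(value)
    if namespace is None:
        return ("standard", name)
    if namespace in supported_namespaces:
        return ("extension", name)
    return ("ignored", value)


print(interpret("besteffort", {"mydomain.com"}))
print(interpret("mydomain.com/attribute_value_xyz", {"mydomain.com"}))
print(interpret("other.org/custom", {"mydomain.com"}))
```

Because everything remains a string, the scheduler and the distribution logic can pass such values through unchanged and leave interpretation to the backend that owns the namespace.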

Proposed Attributes in an operational policy

Subject to community discussion and conclusion, the proposal here speaks to a simple top-level hierarchy representing network, storage and compute attributes, and a way to combine these policies arbitrarily to create global-level policies. The following list is recommended as a starting point for the proposed standard attributes:

Policy Type | Attribute | Values | Description | Default
Network | Name | | Name of the network policy | Nil
Network | BufferClass | | Guaranteed percentage bandwidth allocation in %age, e.g. "60percent" | "besteffort"
Network | Latency | "low", "besteffort" | "low" latency is better, "besteffort" is non-guaranteed | "besteffort"
Network | SchedulingMethod | "wrr", "strict" | "strict" allows absolute prioritization, "wrr" = weighted round robin | "wrr"
Network | Encryption | "none", "aes" | Encryption method used for traffic originating from the associated containers | "none"
Network | Bandwidth | "high", "besteffort" | "high" means higher bandwidth allocation within the infrastructure | "besteffort"
Network | Service | | Service containers, e.g. "proxy", "lb", "fw", etc., to be consumed by containers to which this policy is applied | Nil
Network | AccessPolicy | "none", "composition" | "none" allows all apps to talk to one another, "composition" only allows apps permitted by the application-composition specification to communicate with each other | "none"
Storage | Name | | Name of the storage policy | Nil
Storage | BufferClass | | Guaranteed percentage buffering for storage traffic in %age, e.g. "50percent" | "besteffort"
Storage | Latency | "low", "besteffort" | "low" would result in SSD or memory-based storage | "besteffort"
Storage | IOPs | e.g. 1000/s | | "besteffort"
Storage | Throughput | | Percentage bandwidth allocation for this class of storage traffic, e.g. "40percent" | "besteffort"
Storage | Encryption | "none", "yes" | Suggests whether data for the associated containers should be encrypted. The specifics of the algorithm are TBD |
Compute | Name | | Name of the compute policy | Nil

The remaining compute attributes are the constraints being discussed in 'swarm', e.g. os-type and os-version, and host resources, e.g. "memory" and "cpu".

Open Items

  • Continue finalizing the set of attributes for first-cut implementation
  • Policy Hierarchy and applicability scope
    • The proposed method offers a simple hierarchy. However, the relationship and scope of a complex sub-system is best represented as a hierarchy, e.g. a tree of policies, where each node, or its subtree, can be the applicable scope for the policy. In such a case, policy determination and conflict resolution can start with the most specific leaf node and move towards the root of the tree.
    • Predicates to policy application: Within the hierarchy, a policy at each node in a tree or a graph can be qualified with various predicates, like 'override', 'augment', 'exception', etc. The model described here can be extended by specifying some simple predicates to structure the operational-intent.
  • Additional Use cases (that will drive standardization of more attributes):
    • Using services (e.g. load balancer) to interconnect app tiers
    • Leveraging specific hardware acceleration capabilities
    • TLS authentication between applications
    • Use of SELinux for inter-app security
@gourao
Contributor

gourao commented Mar 6, 2015

+1
We are working on storage interfaces around docker and have a mailing list at https://groups.google.com/forum/#!forum/sifc

We also see the need for the app developers' and the IT admins' intent to be passed all the way through to the execution point. I think this is a great start. There are more attributes we would need on the storage side, and I'll articulate them after we get some consensus on what they could be.

For what we have in our prototype product right now, we have been passing the intent around in a separate channel (we don't like this since it's yet something else someone has to maintain).

FWIW, I am sharing the intent parameters we currently use (and by no means is this complete or even the right set... but it's what we use in the prototype)

{
  "apiName": "composition",
  "apiVersion": "v1beta1",
  "applicationName": "helloWorld",
  "registryURL": "https://registry.hub.docker.com/v1/search",
  "containers": [
    {
      "name": "redis",
      "image": "dockerfile/redis",
      "exposedPorts": [
        6379, 6378
      ],
      "memoryLimit": "8GB",
      "cpuNiceLevel": "-20",
      "highlyAvailable": "yes",
      "storage": {
        "filesystem": {
          "type": "ext4",
          "initialFormat": "yes",
          "blockSize": "4k"
        },
        "erasureCoded": "yes",
        "flashTiering": "yes",
        "dedupe": "yes",
        "minVolumeSize": "5TB",
        "thinVolumeSize": "20TB",
        "snapshots": {
          "enabled": "yes",
          "interval": "1H",
          "queue": 24
        }
      }
    },
    {
      "name": "nginx",
      "image": "dockerfile/nginx",
      "exposedPorts": [
        80, 443
      ],
      "scale": 5,
      "storage": {
        "filesystem": {
          "type": "xfs",
          "initialFormat": "yes",
          "hint": "ephemeral",
          "blockSize": "8k"
        },
        "erasureCoded": "no",
        "flashTiering": "no",
        "dedupe": "no",
        "minVolumeSize": "1MB",
        "thinVolumeSize": "20TB",
        "snapshots": {
          "enabled": "no"
        }
      }
    }
  ],
  "applicationProperties": {
    "admin": {
      "allowedUsers": [
        "admin1", "admin2"
      ]
    }
  }
}

@crosbymichael
Contributor

ping @vieux @aluzzardi

@jainvipin
Author

@gourao - good to see that it fits how you see things coming out. Standardizing attributes would not only validate the needed use cases, but also create interoperable implementations for schedulers, drivers, policy-instantiation logic, etc.

A few comments regarding the format:

  • The attributes are easy to consume (by other entities like the scheduler) if the hierarchy is not exposed natively. That way it is extensible too: everything looks like strings to schedulers, etc. In your example, say we were to keep the filesystem type as:
    "filesystem/type": "ext4"
    where filesystem/type could be of type buckets, as mentioned in the drivers API doc. That would allow drivers to advertise the set of filesystem/type values they support, and the scheduler could match the app (container) requirement against the availability of those infrastructure resources.
  • Another comment on your definition of intent is that it contains multiple elements, as described above:
    1. Capabilities: e.g. filesystem/type as exposed by driver and consumed by scheduler, distribution logic
    2. Constraints/Governance e.g. "dedupe": "yes",

Both of the above can be expressed in a generic way that can be delivered to drivers in an opaque way, but is meaningful to the level needed for schedulers, etc.

      "storage": {
        "filesystem": {
          "type": "ext4",
          "initialFormat": "yes",
          "blockSize": "4k"
        },
        "erasureCoded": "yes",
        "flashTiering": "yes",
        "dedupe": "yes",
        "minVolumeSize": "5TB",
        "thinVolumeSize": "20TB",
        "snapshots": {
          "enabled": "yes",
          "interval": "1H",
          "queue": 24
        }
      }
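The "filesystem/type" idea above amounts to flattening the nested structure into slash-separated string keys. A minimal sketch, assuming nested JSON objects only (no lists) and string values throughout; `flatten` is a hypothetical helper:

```python
def flatten(node, prefix=""):
    """Flatten nested dicts into {"a/b": "value"} with string values."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            # Everything is exposed as a string so that schedulers can
            # treat the attributes as opaque.
            flat[path] = str(value)
    return flat


storage = {
    "filesystem": {"type": "ext4", "initialFormat": "yes", "blockSize": "4k"},
    "dedupe": "yes",
    "snapshots": {"enabled": "yes", "interval": "1H", "queue": 24},
}

for key, value in sorted(flatten(storage).items()):
    print(f"{key} = {value}")
```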

@aanm
Contributor

aanm commented Mar 17, 2015

I've started implementing labels on compose.
docker/compose#1124

@joeswaminathan

@jainvipin

The scope of intent varies and can be described at the following levels:
Global: Applies to all applications within the intent universe.e.g.
Disallow any communication unless otherwise allowed explicitly i.e. implicit deny
Use VXLAN overlay to interconnect applications
Always encrypt the data
....
Example: Application cache must use SSD storage, application accounting must use storage with backup, network packets to web must be load balanced, all network overlays must use VXLAN encapsulation on port 8472, any application labeled network latency sensitive should do QoS as follows, an application labelled net-encrypt should be spawned on a host with hardware encryption capabilities, ...

Should the intent specify a specific technology? For example, instead of saying "use VXLAN overlay", can it just say "isolate application traffic"? VXLAN implies switching, but there could be a device that achieves isolation without an overlay network, simply based on firewall rules. Amazon puts multiple tenants on the same subnet, but isolates tenants by firewall rules.

Similarly, instead of saying "must use SSD", can it say "the I/O latency should be below x units"? SSD is today's technology, and tomorrow there might be a better technology available.

If we could avoid any technology references and let the enforcer translate the intent into appropriate technologies, that would be nice in my opinion.

@jainvipin
Author

@joeswaminathan - valid points... intent needs to be abstract. Let us put out a first implementation in order to get some feedback and improve upon this. I am sure this is going to be an iterative process.

@jeremyeder

Hi @jainvipin, any further thoughts about this? What is the status? /cc @tgraf

@tgraf

tgraf commented Jun 3, 2015

@jeremyeder : Working on various implementation pieces around this idea based on labels and the new libnetwork API. Will be able to share a first preview of the implementation pretty soon.

@GordonTheTurtle

GordonTheTurtle commented Sep 3, 2016

@jainvipin It has been detected that this issue has not received any activity in over 1 year. Can you please let us know if it is still relevant:

  • For a bug: do you still experience the issue with the latest version?
  • For a feature request: was your request appropriately answered in a later version?

Thank you!

@bsousaa bsousaa added status/1-design-review kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny labels Jan 3, 2023

9 participants