Why We Chose Kubernetes Over ECS

In our last post, we saw how Docker changed the way we treat our infrastructure and what it brought to the domain of service orchestration.
In this post, we're going to take a tour of two of the leading Docker orchestration frameworks out there: ECS (Elastic Container Service) by AWS, and Kubernetes, an orchestration framework that began at Google and was later open-sourced.

Three months ago, when we at nanit.com came to evaluate which Docker orchestration framework to use, we gave ECS first priority. We were already familiar with AWS services, and since our whole infrastructure was already there, it was the default choice. After testing the service for a while we felt it was not mature enough and was missing some key features we needed (more on that later), so we went on to test another orchestration framework: Kubernetes. We were glad to discover that Kubernetes is far more comprehensive and had almost all the features we required. For us, Kubernetes beat ECS on ECS's home court, which is of course AWS. Let's go through the reasons that made Nanit choose K8s (short for Kubernetes, used throughout this article) as its Docker orchestration framework.

Note: Things might have changed on ECS since we made the evaluation 3 months ago. We tried to keep up with updates to the service, but we probably missed some of them. K8s was at v1.0.7 then and is at v1.1.1 now.

Cluster Setup

ECS: To start an ECS cluster, you create an Auto Scaling Group (ASG) based on an AMI Amazon provides specifically for ECS. You can edit the user data to attach EC2 instances to a specific ECS cluster (the ECS agent reads the cluster name from /etc/ecs/ecs.config, so appending ECS_CLUSTER=my-cluster to that file via the user data does the trick). After you set up the ASG and instances start, you should see them in your ECS console. From that moment on, you can start scheduling task definitions, which are ECS's docker-compose-like constructs.

K8s: To start a K8s cluster on AWS, you first launch an EC2 instance with the proper permissions (via IAM) and run a one-liner from the K8s getting started guide. This creates the various AWS constructs that support your cluster: a VPC, an ASG for the cluster instances, some Security Groups, and a K8s master instance. The cluster takes a few minutes to settle, and afterward you can start running your containers on it.
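
For reference, the one-liner at the time was along the lines of export KUBERNETES_PROVIDER=aws; curl -sS https://get.k8s.io | bash (which drives the kube-up.sh script behind the scenes); the getting started guide always has the current form.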

Bottom line: There is no clear winner here. Setting up a cluster is user-friendly in both frameworks.

Basic Service Setup

Our task is to run an NGINX Docker image containing our static website files and serve them to the public.

ECS: First, we have to create a public-facing ELB (Elastic Load Balancer) that forwards port 80 to port 80. Then we have to create a task definition that runs our Docker image on port 80. The last step is to create a Service that states how many instances of that task definition run simultaneously, and bind it to the ELB we just created.

K8s: The first step is to create a Replication Controller, which states the Docker image we want to run and how many replicas of it to run simultaneously. After we set it up, we create a Service object. This sets up an ELB for us and routes traffic from that ELB to the corresponding containers.
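
To make this concrete, here is a minimal sketch of the two objects in the v1 API that was current at the time; the names, image tag, and replica count are illustrative, not our actual setup:

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: website
spec:
  replicas: 2                 # how many copies to keep running
  selector:
    app: website
  template:
    metadata:
      labels:
        app: website
    spec:
      containers:
      - name: nginx
        image: nginx:1.9      # the image holding our static site
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: website
spec:
  type: LoadBalancer          # on AWS, this provisions an ELB automatically
  selector:
    app: website              # route to pods carrying this label
  ports:
  - port: 80                  # ELB port
    targetPort: 80            # container port
```

A single kubectl create -f website.yaml brings both up.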

Bottom line: K8s felt a bit more comfortable here. The procedure is slightly shorter, and you don't have to manually set up or manage the ELB at all. It is all managed for you by K8s: when you create a service, an ELB is created for you, and when you delete the service, the ELB is automatically deleted from AWS.

Service Discovery

When you use a microservices architecture in general, and Docker specifically, a good service discovery solution is crucial. Docker containers shift between different VMs all the time, and you must have a solid way to reach services both inside your cluster and outside of it.

ECS: ECS does not provide any solution for service discovery within the cluster. The best solution I could think of back then was to attach each service to its own internal load balancer. You can then use the load balancer's hostname, which doesn't change as long as it is up, as the endpoint for that service. Another way would be to integrate an external solution like Consul into your cluster.

K8s: I think this is one of the points where K8s really shines. K8s has a complete built-in service discovery solution. It is an add-on, so you can choose whether to use it, but I can't really find any reason not to. It works extremely well, and in conjunction with the namespaces feature it is even more useful.
Simply put, when you create a K8s service with a name, let's say redis, you can refer to the hostname redis from anywhere inside your cluster and it will be resolved accordingly, even across different virtual machines. It is like having Docker network links across your whole cluster, not just inside a specific VM.
Namespaces allow you to group services under a logical name. Let's assume we have a production and a staging namespace, both with a redis service in them. A container in the production namespace can refer to the hostname redis and it will be resolved to the production redis service. The same goes for a container in the staging namespace, which will be routed to the staging redis service. This lets you form isolated environments without the hassle of configuring things manually and tracking different hostnames for services in different environments. You can use the redis hostname across all of your namespaces and trust K8s to resolve it for you.
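
As a sketch of what this looks like in practice (the manifest below is illustrative, not taken from our setup), the same Service definition can be created in each namespace:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: production   # an identical Service can live in a "staging" namespace
spec:
  selector:
    app: redis            # route to the redis pods in this namespace
  ports:
  - port: 6379
```

A pod in the production namespace then simply connects to the hostname redis; the DNS add-on expands it to a fully qualified name scoped to that namespace (the exact schema, e.g. redis.production.cluster.local, varied a bit between K8s versions).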

Bottom line: We have a clear winner here. With K8s, you don't even have to deal with service discovery. You get it for free with zero effort.

Deployments

When we update a service, we want it to remain 100% available, even during the deployment. Our test was a simple NGINX service serving a static file. We ran a load test against it with 30 concurrent requests and deployed a new version of the service mid-test.

During the deployments, ECS dropped significantly more requests than K8s. On average, K8s dropped 0-2 requests per test, while ECS dropped around 9-14.
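
For context, rolling updates in K8s at that time were driven by kubectl rolling-update, which scales a new replication controller up while scaling the old one down, replacing pods one at a time.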

Bottom line: I was very disappointed with the numbers ECS presented. I was also disappointed by K8s's numbers, but they are much better than ECS's. I want to note that K8s v1.1.1 should improve these numbers further, with improvements to the rolling-update mechanism and general performance improvements across the system.

Persistent Volumes

We often need to permanently attach a filesystem to a certain container. A MySQL database is the classic example.

ECS: ECS sticks to the native solution Docker offers: you can run a data container and use volumes-from to mount its volumes into your container. Taking MySQL as an example: you first create a mysql-data container which only declares a volume and immediately exits. Then you create a second mysql-db container which uses volumes-from mysql-data to mount the volume. While this solution works well, it is host-specific, meaning your mysql-db container cannot move between hosts. You have to pin your mysql-db container to a specific host to prevent it from being rescheduled to another host and losing its persistent storage.

K8s: In addition to mounting volumes from a specific host, K8s offers the option to mount an EBS (Elastic Block Store) volume into a container. This means a container's persistent storage can move with it across different VMs. You don't have to force your MySQL container onto a specific VM just because it needs persistent storage.
An important note on this feature: it only works for services with a single instance (container). This is due to a limitation of EBS: a volume can be attached to only one VM at a time. If a service has two containers running on two different VMs, they won't be able to mount and share the EBS volume.
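
A minimal sketch of what this looks like (the volume ID, image, and password are illustrative; the EBS volume must be created beforehand in the cluster's availability zone):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: mysql-db
spec:
  replicas: 1                     # must stay 1: EBS attaches to one VM at a time
  selector:
    app: mysql-db
  template:
    metadata:
      labels:
        app: mysql-db
    spec:
      containers:
      - name: mysql
        image: mysql:5.6
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: change-me        # illustrative only
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-data
        awsElasticBlockStore:
          volumeID: vol-1234abcd  # a pre-created EBS volume
          fsType: ext4
```

If the pod is rescheduled to another VM, K8s detaches the volume and re-attaches it to the new host, so the data follows the container.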

Bottom line: K8s's EBS mounting feature is unique and very useful. Even though it is quite limited, it is very convenient to have at hand.

Health-Checks

Ensuring we have a certain capacity of a service is the core idea behind high availability and redundancy. Health checks are our way to make sure a service is not only running but also healthy and operating well.

ECS: ECS uses ELB (Elastic Load Balancer) health checks, which have three major disadvantages:
1. ELB health checks are limited to HTTP/TCP checks.
2. If you want to health-check a service which does not listen on a TCP port, well, you can't. You have to run an HTTP/TCP server just to be able to respond to health checks.
3. Even if a service already has an HTTP/TCP server but does not need an ELB, you have to create an ELB and bind the service to it, just to be able to health-check it.

K8s: In addition to HTTP GET and TCP Socket health checks, K8s offers a third type called Exec. Exec lets you run a command inside the container: if the command exits with status 0, the service is considered healthy; otherwise it is considered unhealthy and is replaced by a new instance.
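
For illustration, here is what an Exec probe looks like as a fragment of a container spec (the command and timings are our own example):

```yaml
containers:
- name: redis
  image: redis:3.0
  livenessProbe:
    exec:
      command: ["redis-cli", "ping"]  # exit status 0 means healthy
    initialDelaySeconds: 15           # give the process time to boot
    timeoutSeconds: 1
```

No HTTP server and no ELB are involved; the kubelet runs the command inside the container and watches the exit status.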

Bottom line: K8s health checks are much more flexible and easier to set up. You never have to run a redundant TCP/HTTP server just for health checks, and you can health-check services even if they are not bound to an ELB.

Ports Management

In our last post, we saw that port management is a bit harder in the Docker world. Let me take the simplest example to explain how K8s solves the problem of port management better than ECS:
We have a cluster of one VM and two websites we would like to serve on port 80. We can't have two processes listening on port 80 on a single VM, so we have to find a way around it.

ECS: On ECS, we have to manually make sure the two services don't use the same port. Since we have a single VM, we can only run one container that requires port 80. When we try to run the second one, we can't, since no VM has port 80 available. Practically, the number of containers, across all services, that require port X is limited by the number of VMs in the cluster. Fulfilling this demand is easy on a small cluster with a few services, but as your service count grows it becomes a real headache. You constantly have to make sure you don't run out of available ports when you scale up a service's container count.

K8s: K8s solves this very elegantly. It assigns a random port to each of the containers on the single VM. It then creates two ELBs: one directs port 80 to container A's random port, and the other directs port 80 to container B's random port. An internal routing mechanism makes sure each of the random ports designated to the containers reaches its destination.
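
A sketch of the two services (names are illustrative); both declare port 80, and the host-level ports that bridge each ELB to its containers are allocated automatically, so they never collide:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: site-a
spec:
  type: LoadBalancer    # gets its own ELB
  selector:
    app: site-a
  ports:
  - port: 80            # ELB port
    targetPort: 80      # container port; the node port in between is auto-assigned
---
apiVersion: v1
kind: Service
metadata:
  name: site-b          # second site, also on port 80, no conflict
spec:
  type: LoadBalancer
  selector:
    app: site-b
  ports:
  - port: 80
    targetPort: 80
```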

Bottom line: K8s frees you from the headache of port management by using “virtual” ports instead of binding the original ports on the VM.

Logging

What system does not require logging? Right, none.
I never considered logging a hard problem, but it is always nice to have problems solved for you, even easy ones. We already mentioned the K8s add-on that provides service discovery. This time, I would like to mention the logging add-ons. There are two different logging and metrics-collection mechanisms: the first is the famous ELK stack, which collects all the logs from your Docker containers and lets you query and visualize them through the awesome Kibana interface. The second is InfluxDB, with Grafana as a visualization tool, for inspecting system metrics like CPU and memory usage.

Bottom line: K8s add-ons are just great. Sure, you can live without them, but why would you, when they work as well as they do and fit 99% of use cases? ECS provides no built-in solution for logging. Integrating one isn't the hardest thing on earth, but there is no comparison to the logging abilities K8s brings with it.

Cloud Agnostic

There’s no real competition between the two here 🙂
ECS will always be exclusive to AWS. If you build your infrastructure around ECS and would like to move to another cloud provider in a year, you will have a hard time.

Kubernetes is cloud agnostic. You can run your cluster on AWS, Google Cloud, Microsoft's Azure, Rackspace, etc., and it should run more or less the same. I say "more or less" because some features are available on certain cloud providers and some are not. You still have to make sure your new cloud provider supports the features you use in K8s, but at least such a move is possible.

OSS

Kubernetes, in contrast to ECS, is an open source project. Everything, from the source code through bugs and issues to the roadmap and future versions, is publicly available to you.
Found a bug? You can open an issue or even submit a pull request to fix it. Issues get a response within hours of submission.
New features are added with each release. The number of contributors and pull requests is staggering.

ECS has a different nature. I couldn't find its roadmap anywhere on the net. A list of bugs and issues is not available, and you have to dig through the forums to get some answers. When you do find an issue, the response is sometimes lacking, with no real will to help (see here). Maybe I just had a bad personal experience, but it was very disappointing nonetheless.

Bottom line: Personally, I am very fond of OSS. I like the openness of K8s and the fact that everyone can contribute and participate in discussions. I strongly believe in the power of the community to make a great and stable product.

Multi-AZ

There is one thing that really bothers me about Kubernetes: it currently has no support for multi-availability-zone clusters on AWS. All of your EC2 instances live in a single AZ, which makes your cluster vulnerable to an outage in that AZ.

ECS has full support for cross-AZ clusters.

Bottom line: Like everything in K8s, there is an open issue on the topic and some work has already been done. I am more than sure that a solution will be integrated into the next versions. So ECS wins here but not for long 🙂

Summary

As many companies start to use Docker as their main infrastructure and delivery mechanism, orchestration frameworks become the heart of the system and affect the way we develop, ship, run, and update software. When I came to evaluate ECS and K8s, I couldn't find any article on the topic. I think it is extremely important to make our experience public so that others have a better starting point than we did.

For us at nanit.com, Kubernetes was a clear winner. I would be more than happy to hear if that wasn't the case for you, and why.
