A Simple Way To Dockerize Applications

Dockerizing an application is the process of converting an application to run within a Docker container. While dockerizing most applications is straightforward, there are a few problems that need to be worked around each time.

Two common problems that occur during dockerization are:

  1. Making an application use environment variables when it relies on configuration files
  2. Sending application logs to STDOUT/STDERR when it defaults to files in the container’s file system

This post introduces a new tool, dockerize, that simplifies these two common dockerization issues.
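
As a rough illustration of the first problem, the sketch below fills in a configuration file template from environment variables. It is not how dockerize itself works; the function name `render_config` and the `${NAME}` placeholder syntax are assumptions chosen for this example.

```python
import os
from string import Template


def render_config(template_text, env=None):
    """Fill ${NAME} placeholders in a config template from environment variables.

    `render_config` is a hypothetical helper for illustration only.
    Raises KeyError if the template names a variable that is not set.
    """
    env = os.environ if env is None else env
    return Template(template_text).substitute(env)


if __name__ == "__main__":
    # Example: the container is started with PORT set in its environment.
    print(render_config("listen ${PORT};", {"PORT": "8080"}))
```

Running a step like this at container start lets an application that only reads configuration files still be configured through the environment.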

Read On →

Squashing Docker Images

A common problem when building docker images is that they can get big quickly. A base image can be tens to hundreds of MB in size. Installing a few packages and running a build can easily create an image that is 1 GB or larger. If you build an application in your container, build artifacts can stick around and end up getting deployed.

Large images are problematic when you start publishing images to a registry. More layers create more requests, and larger layers take longer to transfer. Unfortunately, deleting things in later layers does not actually remove them from the image due to the way AUFS layers work.

There are a few options to address this problem, but this post will show you how to squash your images to make them smaller without requiring big changes to your development and deployment workflow.
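
To see why deleting files in a later layer does not shrink an image, here is a toy model of layered storage. It is a simplification for illustration, not Docker's or AUFS's actual implementation: each layer is a dictionary of path to size in MB, and a value of `None` stands in for a deletion marker. All names and sizes are made up.

```python
def image_size(layers):
    """Total stored size: every layer is kept, so deleted files still count."""
    return sum(size for layer in layers for size in layer.values() if size is not None)


def visible_files(layers):
    """Files seen inside the container after stacking the layers in order."""
    files = {}
    for layer in layers:
        for path, size in layer.items():
            if size is None:
                files.pop(path, None)  # deletion marker hides the file
            else:
                files[path] = size
    return files


def squash(layers):
    """Collapse all layers into one that holds only the visible files."""
    return [visible_files(layers)]


if __name__ == "__main__":
    layers = [
        {"/usr/lib/base": 100},      # base image
        {"/build/artifact.o": 900},  # build step leaves an artifact
        {"/build/artifact.o": None}, # later step deletes it
    ]
    print(image_size(layers))          # size before squashing
    print(image_size(squash(layers)))  # size after squashing
```

In this model the deleted artifact still contributes to the stored size until the layers are collapsed, which is the effect squashing aims for.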

Read On →

Docker Service Discovery Using Etcd and Haproxy

In a previous post, I showed a way to create an automated nginx reverse proxy for docker containers running on the same host. That setup works fine for front-end web apps, but is not ideal for backend services since they are typically spread across multiple hosts.

This post describes a solution to the backend service problem using service discovery for docker containers.

The architecture we’ll build is modelled after SmartStack, but uses etcd instead of Zookeeper and two docker containers running docker-gen and haproxy instead of nerve and synapse.

Read On →

Automated Nginx Reverse Proxy for Docker

A reverse proxy server is a server that typically sits in front of other web servers in order to provide additional functionality that the web servers may not provide themselves.

For example, a reverse proxy can provide SSL termination, load balancing, request routing, caching, compression or even A/B testing.

When running web services in docker containers, it can be useful to run a reverse proxy in front of the containers to simplify deployment.

Read On →

Docker Log Management Using Fluentd

Docker is an open-source project to easily create lightweight, portable and self-sufficient containers for applications. Docker allows you to run many isolated applications on a single host without the weight of running virtual machines.

One of the problems with the current versions of docker is managing logs. Each container runs a single process and the output of that process is saved by docker to a location on the host.

There are a few operational issues with this currently:

  • This log file grows indefinitely. Docker logs each line as a JSON message which can cause this file to grow quickly and exceed the disk space on the host since it’s not rotated automatically.
  • The docker logs command returns all recorded logs each time it’s run. Any long-running process that is a little verbose can be difficult to examine.
  • Logs under the container’s /var/log or other locations are not easily visible or accessible.
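
As a small illustration of working with the JSON log messages mentioned above, the sketch below returns only the most recent entries rather than the full history. It assumes each line is a JSON object with a `log` field holding the message text; that field name and the helper `tail_container_log` are assumptions for this example, so check them against the log files on your own host.

```python
import json


def tail_container_log(lines, count=10):
    """Return the message text of the last `count` JSON log lines.

    Assumes each line looks like {"log": "...", "stream": "...", "time": "..."}.
    `tail_container_log` is a hypothetical helper, not a docker command.
    """
    messages = []
    for raw in lines[-count:]:
        entry = json.loads(raw)
        messages.append(entry["log"].rstrip("\n"))
    return messages


if __name__ == "__main__":
    sample = [
        '{"log": "starting\\n", "stream": "stdout", "time": "t1"}',
        '{"log": "ready\\n", "stream": "stdout", "time": "t2"}',
    ]
    print(tail_container_log(sample, count=1))
```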

Read On →

Open-Source Service Discovery

Service discovery is a key component of most distributed systems and service-oriented architectures. The problem seems simple at first: How do clients determine the IP and port for a service that exists on multiple hosts?

Usually, you start off with some static configuration which gets you pretty far. Things get more complicated as you start deploying more services. With a live system, service locations can change quite frequently due to auto or manual scaling, new deployments of services, as well as hosts failing or being replaced.

Dynamic service registration and discovery becomes much more important in these scenarios in order to avoid service interruption.
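
To make the idea of dynamic registration concrete, here is a minimal in-memory sketch: services register a host and port with a time-to-live, and entries that are not refreshed drop out of lookups. It does not represent any of the projects discussed in this post; the `Registry` class, its methods, and the TTL values are hypothetical.

```python
import time


class Registry:
    """Toy service registry: entries expire unless re-registered within `ttl` seconds."""

    def __init__(self, ttl=10.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock   # injectable so expiry can be exercised without sleeping
        self._entries = {}   # (service, host, port) -> expiry time

    def register(self, service, host, port):
        """Register an instance, or refresh it if already present (a heartbeat)."""
        self._entries[(service, host, port)] = self.clock() + self.ttl

    def lookup(self, service):
        """Return the sorted (host, port) pairs that have not expired."""
        now = self.clock()
        return sorted(
            (host, port)
            for (name, host, port), expires in self._entries.items()
            if name == service and expires > now
        )


if __name__ == "__main__":
    registry = Registry(ttl=5.0)
    registry.register("api", "10.0.0.1", 8080)
    print(registry.lookup("api"))
```

A host that fails simply stops sending heartbeats, so clients stop seeing it once its entry expires, without anyone editing static configuration.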

This problem has been addressed in many different ways and is continuing to evolve. We’re going to look at some open-source or openly-discussed solutions to this problem to understand how they work. Specifically, we’ll look at how each solution uses strongly or weakly consistent storage, runtime dependencies, client integration options and what the tradeoffs of those features might be.

We’ll start with some strongly consistent projects such as Zookeeper, Doozer and Etcd, which are typically used as coordination services but are also used as service registries.

We’ll then look at some interesting solutions specifically designed for service registration and discovery. We’ll examine Airbnb’s SmartStack, Netflix’s Eureka, Bitly’s NSQ, Serf, Spotify and DNS and finally SkyDNS.

Read On →

Fluentd vs Logstash

Fluentd and Logstash are two open-source projects that focus on the problem of centralized logging. Both projects address the collection and transport aspect of centralized logging using different approaches.

This post will walk through a sample deployment to see how each differs from the other. We’ll look at the dependencies, features, deployment architecture and potential issues. The point is not to figure out which one is the best, but rather to see which one would be a better fit for your environment.

Read On →

Realtime Web Server Log Metrics

This is a sample config that uses nxlog to tail web access logs in Combined Log Format, pull out the status code and bytes sent and send them to statsd so they can be graphed using Graphite.

It’s a simple way to see if your web server is returning errors over time or how much data it’s sending. The same concept could be used for other log files.
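
The post itself uses an nxlog configuration, but the idea can be sketched in a few lines of Python for readers who want to see the parsing step. The regular expression, the `to_statsd` helper, and the metric names below are assumptions for illustration; the output uses the plain-text statsd counter form `name:value|c`.

```python
import re

# Matches the leading fields of a Combined Log Format line:
# host ident user [time] "request" status bytes ...
LOG_PATTERN = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def to_statsd(line, prefix="web"):
    """Turn one access log line into statsd counter messages.

    `to_statsd` and the metric names are hypothetical. Lines that do not
    match the pattern yield no metrics.
    """
    match = LOG_PATTERN.match(line)
    if match is None:
        return []
    # A "-" in the bytes field means no body was sent.
    sent = 0 if match.group("bytes") == "-" else int(match.group("bytes"))
    return [
        f"{prefix}.status.{match.group('status')}:1|c",
        f"{prefix}.bytes_sent:{sent}|c",
    ]


if __name__ == "__main__":
    line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
            '"GET /apache_pb.gif HTTP/1.0" 200 2326 "-" "Mozilla/4.08"')
    for metric in to_statsd(line):
        print(metric)
```

Each message would then be sent to the statsd daemon over UDP, where the counters are aggregated before reaching Graphite.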

Logster lets you do similar things but custom parsing is accomplished by writing Python plugins which can be a little more complicated than using configuration files.

Read On →

Centralized Logging Architecture

In Centralized Logging, I covered a few tools that help with the problem of centralized logging. Many of these tools address only a portion of the problem which means you need to use several of them together to build a robust solution.

The main aspects you will need to address are: collection, transport, storage, and analysis. In some special cases, you may also want an alerting capability.

Read On →

Optimizing MongoDB Indexes

Good indexes are an important part of running a well-performing application on MongoDB. MongoDB performs best when it can keep your indexes in RAM. Reducing the size of your indexes also leads to faster queries and the ability to manage more data with less RAM.

These are a few tips to reduce the size of your MongoDB indexes:

Read On →