Distributed Docker Containers

February 28, 2014

One thing I've been working with lately is Docker. You've probably seen it referenced in various tech articles as the next greatest thing for cloud computing. Docker runs "containers" from base "images", which essentially lets you run many lightweight virtual machines on any recent, Linux-based system. Internally, the magic behind it is lxc, although Docker adds a lot on top to make it more usable.

For a long time now I've used virtual machines for development – they allow me to better simulate how software runs on production servers. Historically, Vagrant + VirtualBox/VMware Fusion/EC2 have been great tools for that, but they have limitations and they tend to drift a bit from production architecture.

The Problem

When trying to duplicate production environments, it's typically not feasible for me to run more than one virtual machine on my laptop. I could split my single local virtual machine into multiple EC2 instances, but then it becomes more difficult to manage IP addresses for the various service dependencies as the instances get stopped and started between working sessions (in addition to the extra cost). VPCs with private IP addresses help with that a lot, as long as there's a sane way to manage those resources.

Another issue that comes up when combining services on a single host is dependency overlap. One example of this is shared libraries. Some newer features of nginx require a newer version of the openssl libraries, but PHP doesn't necessarily support the newer openssl without upgrading quite a few other components. While there may be workarounds, the inconvenience of it all typically just prompts me to avoid working on that particular feature, unfortunately.

Ultimately, I want to have the same software and network stack that I use in a production environment, but in a development environment and, if possible, locally on my laptop.

The Alternatives

This problem is certainly not unique, but a practical solution has been difficult for me to find. I've been experimenting with a few different technologies over the years trying to solve this sort of thing.

Vagrant is the obvious first practical solution. For me, it has been a functional one for quite a while, but not an optimal one. Like I mentioned before, it's a bit bulky when attempting to mimic non-trivial architectures on a standard laptop. For a while now I've been trying to find the motivation and time to migrate to a better setup.

With the advent of Docker, many of my software requirements become much simpler. Each piece of software can run in its own container, and I don't have to worry about dependency overlap. Multiple containers are significantly cheaper than trying to run multiple virtual machines. I could even reuse containers built on my development machine out in production. One thing Docker doesn't effectively solve is service dependencies: it supports them on a single host with links, but not across multiple hosts.

I've been keeping an eye out for other tools which may help solve these problems. Some of them are:

  • decking – seems to primarily build on top of Docker's built-in link functionality for service dependency within a single host
  • etcd – an excellent distributed, hierarchical key-value store; very useful for monitoring configuration values and being notified when they change (related: confd)
  • fig – seems like Foreman, but geared for Docker containers
  • flynn – originally I was very excited about this, but it still seems underdeveloped for service discovery of arbitrary services; I'm still very hopeful
  • serf – a very new client for distributing data across a cluster and taking action on it. To me it seems like more of a management tool (like half of the mcollective utility)

Recently, I've become more acquainted with bosh, an interesting tool for managing large deployments along with all their dependencies. To me, bosh has always seemed overly complicated for whatever I'd want to accomplish, and it has quite a few bosh-specific practices to learn. Its resource and service management is very thorough, although it takes a while to get comfortable with. It seems more like an infrastructure management tool than a service management tool, and I was hoping to keep those responsibilities separate and simpler. Ultimately, I think bosh could be made to work... but I was still hoping for something different, lighter, and built on more common open source tools that I was already familiar with.

The Ideas

I had a simple application in mind to roughly define my "minimum viable product":

  1. run a WordPress web application, a MySQL server, and a backup MySQL server as separate services
  2. runtime parity (between development and production)
    1. configure services the exact same way
    2. run services the exact same way
    3. depend on other services the exact same way
  3. architecture flexibility
    1. in production, run the services on three separate hosts across two separate data centers
    2. in development, run all services on a single virtual machine on my laptop
  4. service flexibility – be able to dynamically relocate services with minimal downtime and without manual reconfiguration
    • combine services into one or two hosts during quiet hours
    • move a service to a more powerful instance during high load
  5. self-provisioning – when a container requires a particular volume or network, make sure it can be automatically provisioned and de-provisioned

First off, I knew I wanted to run the services inside of Docker containers. I can only imagine Docker's ubiquity will continue to grow, and the ability to run completely arbitrary software anywhere with minimal host dependencies seemed like a perfect, lightweight solution.

I've used Puppet to configure servers and applications for a long time. While I dislike the overhead it requires for smaller use cases, I really like the consistency and declarative nature that it provides. Since I'll continue to use it for host server configuration, it's a small stretch to also use it for configuring the service runtimes.

When it comes down to it, I think there are two main questions that a service must answer:

  • How should I work? and
  • How do I connect with the rest of the world?

The first question can be managed and configured via Puppet. Once a service is configured and compiled to run as requested, it never needs to go through that process again. This approach lets compiled Docker images be consistently reused across time and servers.

The second question deals with pointing WordPress at the MySQL server, or pointing the MySQL server at its data directory, or running the MySQL backup server on a specific network segment. These decisions and connections have nothing to do with how the service should work, so they can be changed as needed. So far, I have four main kinds of dependencies describing how these containers get connected:

  1. volumes – giving containers a place to write persistent data (e.g. WordPress wp-content/uploads directory)
  2. provided services – a service that the container is running (e.g. http on 80/tcp)
  3. required services – a service that the container needs (e.g. mysql)
  4. network – how the container is attached to the network

I think these basic aspects effectively describe everything needed to manage a self-contained service.
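
To make that a little more concrete, here's a rough sketch of how those four connection types might be declared for the WordPress container. The field names and structure are purely illustrative for this post; they are not the actual scs-utils format.

    # Hypothetical illustration of the four connection types for the WordPress
    # container. The keys and values are made up for this post, not the real
    # scs-utils manifest format.
    wordpress_connections = {
        # 1. volumes: persistent places the container needs to write
        "volumes": {
            "uploads": "/var/www/wp-content/uploads",
        },
        # 2. provided services: what this container offers to the world
        "provides": [
            {"service": "http", "port": 80, "protocol": "tcp"},
        ],
        # 3. required services: what this container needs from somewhere else
        "requires": [
            {"service": "mysql"},
        ],
        # 4. network: how the container gets attached to the network
        "network": {
            "strategy": "bridge",
        },
    }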

The Implementation

The next step for an idea is to prototype it, and that's where I am today. There are several pieces that I've been working on, but they fall under three general topics...

Service Discovery

One of the most interesting concepts is service discovery. I wanted containers to be able to connect with each other across multiple hosts and data centers. I've been using DNS for host discovery and, while it works great, it doesn't seem entirely appropriate for "containerized" discovery. Through A records, DNS easily picks up on hosts changing, but it is not so good with dynamic ports. DNS SRV records seem much more appropriate, with attributes for both hostname and port, but SRV records are rarely used in internal APIs.
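
As a point of reference, this is roughly what consuming an SRV record looks like with the dnspython library; the record name below is made up for illustration.

    # Rough sketch: an SRV lookup returns both a target host and a port, which
    # is exactly what dynamically-assigned container ports need. The record
    # name below is hypothetical.
    import dns.resolver  # pip install dnspython

    answers = dns.resolver.query("_mysql._tcp.db.example.com", "SRV")
    for rdata in sorted(answers, key=lambda r: (r.priority, -r.weight)):
        print("mysql endpoint: %s:%d" % (rdata.target.to_text(), rdata.port))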

Originally I was using etcd to register and discover services, but I found it to be inefficient for filtering services and propagating changes. Instead, I created a specialized client/server protocol to handle the registration and discovery process. In conversational terms, the protocol works like the following...

WordPress needs a database, so before it starts the container, it connects with the disco server:

container: Hi, I need a mysql service to talk to – who's available?
disco: You should talk with 192.0.2.11:39313 – I'll keep you posted if it changes, but let me know if you no longer need it

The results are injected as environment variables when the container is started, and the container can use them however it likes. WordPress obviously runs a web server, so, once the container is started, the container manager connects with disco:

container: Hi, I'm wordpress and I have an http service available at 192.0.2.12 on port 39212
disco: Nice to meet you; let me know if you no longer provide it

Then things are running happily, and you could ask the disco server where to find wordpress/http to pull it up in your web browser. If the database server crashes and recovers elsewhere, a few things will happen. First, when disco realizes MySQL is no longer available (whether from a clean disconnect, a heartbeat timeout, or a dropped socket), it notifies everyone who is subscribed that the endpoint has been dropped:

disco: Looks like you were using mysql, but I'm sorry to tell you it's no longer available
container: Thanks for letting me know

The container manager then attaches to the container to run an update command that lets it know about the change. That command can take care of updating the runtime configuration and restarting the application server.

Eventually the new MySQL server will come back online and register itself. Once it's registered, disco realizes that WordPress is subscribed, so it passes along the news:

disco: Great news, I have a new mysql endpoint for you at 192.0.2.14:39414
container: Excellent, thanks

And it again runs the live update command, updating the environment and restarting the application server.

The disco protocol has a few more features (like using a single server for more than one WordPress/MySQL setup, or filtering services by arbitrary tags like availability zones to improve load balancing), but that's the general idea.
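
The actual protocol is its own small client/server implementation, but the client side of that conversation could be sketched roughly like this. The JSON-lines-over-TCP framing, message fields, and port number are assumptions made for illustration, not the real disco wire format.

    # Minimal, hypothetical sketch of the client side of the disco conversation
    # described above. The JSON-lines-over-TCP framing, message fields, and
    # port number are illustrative assumptions, not the real disco protocol.
    import json
    import socket


    class DiscoClient(object):
        def __init__(self, host, port=9999):
            self.sock = socket.create_connection((host, port))
            self.reader = self.sock.makefile("r")

        def _send(self, message):
            self.sock.sendall(json.dumps(message).encode("utf-8") + b"\n")

        def require(self, service):
            # "Hi, I need a mysql service to talk to; who's available?"
            self._send({"op": "require", "service": service})
            reply = json.loads(self.reader.readline())
            return reply["host"], reply["port"]

        def provide(self, name, service, host, port):
            # "Hi, I'm wordpress and I have an http service at host:port"
            self._send({"op": "provide", "name": name, "service": service,
                        "host": host, "port": port})

        def watch(self, on_drop, on_update):
            # Block and handle notifications pushed by disco about services
            # this container required earlier.
            for line in self.reader:
                event = json.loads(line)
                if event["op"] == "drop":
                    on_drop(event["service"])
                elif event["op"] == "update":
                    on_update(event["service"], event["host"], event["port"])


    # Example usage, mirroring the dialogue above:
    #   disco = DiscoClient("disco.example.com")
    #   host, port = disco.require("mysql")    # injected as env vars at start
    #   disco.provide("wordpress", "http", "192.0.2.12", 39212)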

Configuration Files

I'm using YAML files to describe images and containers. They get compiled to a static version and then cached based on the image configuration. For example, take a look at this example scs-wordpress image manifest. It describes the various connection points, docker details, and how it's configured. Next, take a look at the Puppet manifests, which enumerate all the configuration options that affect how the service will run. Finally, take a look at the sample config, which ties together what kind of image it needs to be able to run (configuration) and how that image will be connected to the world.
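
I won't reproduce those manifests here, but the compile-and-cache step boils down to deriving a stable cache key from the image's configuration. A rough sketch of that idea (not the actual scs-utils code) might look like:

    # Rough sketch of caching compiled images by configuration: hash the parsed
    # YAML so that an identical configuration reuses an already-built image.
    # Illustration only; this is not the actual scs-utils implementation.
    import hashlib
    import json

    import yaml  # pip install PyYAML


    def image_cache_key(manifest_path):
        with open(manifest_path) as handle:
            config = yaml.safe_load(handle)
        # Serialize with sorted keys so logically equal configs hash the same.
        canonical = json.dumps(config, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


    # e.g. tag the compiled image as scs-wordpress:<cache key> and skip the
    # rebuild whenever that tag already exists locally or in a registry.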

Self-Provisioning

I'm trying to make each of the four dependency/connection types (volumes, provided services, required services, network) suitable for both local development and AWS EC2 deployment. For example:

  • AWS EC2 volumes can be auto-created, attached to hosts, and mounted for use by docker containers. This allows services to drift across instances
  • Likewise, I can also just use a local path for a volume and avoid an actual network mount
  • Various other strategies can be added for each dependency (a rough sketch of the idea follows this list):
    • nfs-volume: to attach a docker mount point to an external NFS mount
    • aws-ec2-eni: to attach an ENI as the network interface for a docker container
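
To give a feel for how those pluggable strategies could be structured, here's a rough sketch with only the local-path case implemented. The EC2 variant is left as a stub rather than real API calls, and none of these names come from scs-utils itself.

    # Hypothetical sketch of pluggable volume-provisioning strategies. Only the
    # local-path strategy does real work; the EBS variant is a stub marking
    # where the actual create/attach/mount calls (e.g. via boto) would go.
    import os


    class LocalPathVolume(object):
        def __init__(self, path):
            self.path = path

        def provision(self):
            # For local development, "provisioning" is just making sure the
            # directory exists so docker can bind-mount it into the container.
            if not os.path.isdir(self.path):
                os.makedirs(self.path)
            return self.path

        def deprovision(self):
            pass  # leave local data in place


    class AwsEbsVolume(object):
        def __init__(self, size_gb, availability_zone):
            self.size_gb = size_gb
            self.availability_zone = availability_zone

        def provision(self):
            # Stub: create and attach an EBS volume, wait for the device to
            # appear, mount it, and return the mount point.
            raise NotImplementedError

        def deprovision(self):
            # Stub: unmount and detach so the volume can follow the service
            # to another instance.
            raise NotImplementedError


    def volume_for(config):
        # Pick a strategy based on the container's volume configuration.
        if config.get("strategy") == "aws-ec2-ebs":
            return AwsEbsVolume(config["size_gb"], config["zone"])
        return LocalPathVolume(config["path"])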

My goal is to provide a manifest configuration file to a machine and know that it will load up whatever it needs to run, including recompiling the image from scratch if it's not available in any caches.

The Prototype

So, all of those ideas are currently under development in my scs-utils repository. I've also created a repository called scs-example-blog, which is a functional implementation of my original MVP. It provides a Vagrantfile so you can easily try it out yourself, and it walks through the process of getting the containers running on a single virtual machine, accessing the services from the host, and then splitting them up across multiple virtual machines. It's more of a tutorial describing the steps – typically the service deployment would be managed by Puppet.

The Conclusion

All of these ideas are absolutely a work in progress and I'm still actively tweaking the implementation, but it's in a functional enough state to briefly discuss. So far it has been an excellent learning opportunity for Docker, custom network protocols, and splitting some of the services I've previously been running into more reusable components. Even if scs-utils isn't what I'm still using in two years, the refactoring it has motivated makes my services significantly easier to port to whatever more valuable tool surfaces further down the road.