Simplifying My BOSH-related Workflows
Over the last nine months I've been getting into BOSH quite a bit. Historically, I've been reluctant to invest in BOSH because I don't entirely agree with its architecture and because of its steep learning curve. BOSH describes itself with...
BOSH installs and updates software packages on large numbers of VMs over many IaaS providers with the absolute minimum of configuration changes.
BOSH orchestrates initial deployments and ongoing updates that are:
- Predictable, repeatable, and reliable
- Self-healing
- Infrastructure-agnostic
With the continued use and experience required by the logsearch project, I saw ways it would solve more critical problems for me than it would create. For that reason, I started experimenting and migrating some services over to BOSH to better evaluate it for my own uses. To help bridge the gap between BOSH inconveniences and some of my architectural/practical differences, I've been making a tool called cloque.
You might find the ideas more useful than the cloque code itself – it is, after all, experimental and written in PHP (since that's what I'm most productive in) whereas bosh is more Ruby/Go-oriented.
Infrastructure First
Generally speaking, BOSH needs some help with infrastructure (i.e. it can't create its own VPC, network routing tables, etc). Additionally, sometimes deployments don't even need the BOSH overhead. Within cloque, I've split management tasks into two components:
- Infrastructure – this is more of the "physical" layer, defining the networking, some independent services (e.g. NAT gateways, VPN servers), security groups, and other core or non-BOSH functionality.
- BOSH – everything related to BOSH (e.g. director, deployment, snapshots, releases, stemcells) which is deployed onto the infrastructure somewhere.
Since BOSH depends on some infrastructure, we'll get started with that first. One key to a cloque-managed environment is that each environment has its own directory with a network.yml at the top level. The network may be located in a single datacenter, or it could span multiple countries. The file defines all the basics about the network including subnets, reserved IPs, basic cloud properties, and some logical names.
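I won't reproduce the whole example here, but as a rough sketch (the key names below are illustrative guesses rather than cloque's actual schema), a network.yml describes something along these lines:

# illustrative sketch only; key names are guesses, not cloque's real schema
name: "acme-dev"
root: "cloque.example.com"
regions:
  aws-usw2:
    region: "us-west-2"
    cidr: "10.101.0.0/16"
    reserved:
      gateway: "10.101.0.4"   # referenced later when pinging the VPN gateway
  aws-apne1:
    region: "ap-northeast-1"
    cidr: "10.102.0.0/16"     # hypothetical range for the second region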
I've committed an example network to the share directory within cloque and will use that in the examples here. To get started, we'll copy the example and work with it...
$ # copy the sample environment
$ cp -r ~/cloque/share/example-multi ~/cloque-acme-dev
$ cd ~/cloque-acme-dev
$ # this will help the command know where to look for configs later
$ export CLOQUE_BASEDIR="$PWD"
If you take a look at the sample network.yml, you'll see a couple of regions with their individual network segments, VPN networks, and a few reserved IP addresses which can be referenced elsewhere. Once network.yml is created, the utility:initialize-network task can take care of bootstrapping the following:
- create stub folders for your different regions (e.g. aws-apne1/core, global/private)
- create a new SSH key (in global/private/cloque-{yyyymmdd}*.pem) and upload it to the AWS regions being used
- create a new IAM user, access key, and EC2 policy for BOSH to use
- create a certificate authority for OpenVPN usage
- create both client/server certificates for the inter-region VPN connections (requires interactive prompts for passwords/confirmations)
- create an S3 bucket for shared configuration storage
When run, it assumes AWS credentials can be discovered from the environment...
$ cloque utility:initialize-network
> local:fs/global -> created
...snip...
I created utility:initialize-network because I found myself reusing keys and buckets across multiple environments (such as development vs production) since they were annoying to manage by hand. I wanted to make security easier for myself and, in the process, simplify the workflow through automation.
The top-level global directory is intended for configuration which applies to all areas. In the example, I use it to create an additional IAM role which allows VPN gateways to securely download their VPN keys and configuration files...
$ ( cd global/core && cloque infra:put --aws-cloudformation 'Capabilities=["CAPABILITY_IAM"]' )
> validating...done
> checking...missing
> deploying...done
> waiting...CREATE_IN_PROGRESS...........................CREATE_COMPLETE...done
infra:put is the core command responsible for managing the low-level, infrastructure-related resources. It looks for an infrastructure.json file (see the example) and, since I'm focused on AWS, the files are CloudFormation scripts.
One thing I dislike about BOSH is how it uses a state file or global options to specify the director/deployment. It makes it very inconvenient to quickly switch between directors/deployments, even across multiple terminal sessions. To help with that, cloque respects environment variables (or command line options) to know where it should be working from. The CLOQUE_BASEDIR variable (exported earlier) is the most significant, and cloque was able to detect that it was working from the global region/director and core deployment based on the current directory.
Now that the global resources have been created, we can create our "core" resources for the us-west-2 region. If you take a look at the infrastructure.json file, you'll see it creates a VPC, multiple subnets for each availability zone, a couple base security groups, and a gateway instance which will function as a VPN server to allow inter-region communication. You'll also notice it's using Twig templating to load network.yml and simplify what would be a lot of repeated resources. We'll use the infra:put command again, but this time within the aws-usw2/core directory...
$ cd aws-usw2
$ ( cd core && cloque infra:put )
...snip...
> waiting...CREATE_IN_PROGRESS.........................CREATE_COMPLETE...done
BOSH supports ERB-templated deploy manifests, but with ERB I found myself repeating a lot of code in each manifest when trying to make it dynamic. After trying spiff (which I found a bit limited and difficult to understand), I decided to use a different approach – one that would allow the same dynamic behavior, peer-config referencing, and (later) transformational capabilities for both infrastructure configuration and BOSH deployment manifests.
Once the infra:put command finishes, the aws-usw2 part of the environment is complete, which means the OpenVPN server is ready for a client. First we'll need to create and sign a client certificate though...
$ # temporary directory
$ mkdir tmp-myovpn
$ cd tmp-myovpn
$ # create a key (named after the hostname and current date)
$ TMPOVPN_CN=$(hostname -s)-$(date +%Y%m%da)
$ openssl req \
-subj "/C=US/ST=CO/L=Denver/O=ACME Inc/OU=client/CN=${TMPOVPN_CN}/emailAddress=`git config user.email`" \
-days 3650 -nodes \
-new -out openvpn.csr \
-newkey rsa:2048 -keyout openvpn.key
Generating a 2048 bit RSA private key
.............................+++
................+++
writing new private key to 'openvpn.key'
-----
$ # sign the certificate (you'll need to enter the PKI password you used in the first step)
$ cloque openvpn:sign-certificate openvpn.csr
$ # now create the OpenVPN configuration profile for connecting to aws-usw2
$ ( \
cloque openvpn:generate-profile aws-usw2 $TMPOVPN_CN \
; echo '<key>' \
; cat openvpn.key \
; echo '</key>' \
) > acme-dev-aws-usw2.ovpn
$ # opening should install it with a GUI connection manager like Tunnelblick
$ open acme-dev-aws-usw2.ovpn
$ # cleanup
$ cd ../
$ rm -fr tmp-myovpn
$ unset TMPOVPN_CN
I created the openvpn:sign-certificate and, especially, openvpn:generate-profile commands to make these steps easily reproducible and to encourage better certificate practices by making them trivial to follow.
Since I'm using example.com in the share scripts as the domain, DNS won't resolve it. For now, the easiest solution is to manually add an entry to /etc/hosts...
$ echo "`cd core && cloque infra:get '.Z0GatewayEipId'` gateway.aws-usw2.acme-dev.cloque.example.com" \
| sudo tee -a /etc/hosts
The infra:get command allows me to programmatically fetch configuration details about the current deployment. For infrastructure, this allows me to extract the created resource IDs/names using jq statements. This makes it extremely easy to automate basic lookup tasks (as in this case), but also allows for more complex IP or security group enumeration which can be used for other composable, automated tasks.
Once /etc/hosts is updated, I can connect with an OpenVPN client like Tunnelblick and ping the network...
$ ping -c 5 10.101.0.4
PING 10.101.0.4 (10.101.0.4): 56 data bytes
64 bytes from 10.101.0.4: icmp_seq=0 ttl=64 time=59.035 ms
64 bytes from 10.101.0.4: icmp_seq=1 ttl=64 time=61.288 ms
64 bytes from 10.101.0.4: icmp_seq=2 ttl=64 time=78.194 ms
64 bytes from 10.101.0.4: icmp_seq=3 ttl=64 time=57.850 ms
64 bytes from 10.101.0.4: icmp_seq=4 ttl=64 time=57.956 ms
--- 10.101.0.4 ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 57.850/62.865/78.194/7.764 ms
BOSH Director
Now that we have a VPC and a private network to deploy things into, we can start a BOSH Director. Here it's important to note that I'm using "region", "network segment", and "director" interchangeably. Typically you'll have a single BOSH Director within an environment's region, and since that Director will tag its deployment resources with a "director" tag, I decided to make them all synonyms. The effect is twofold:
- when you see a "director" name (whether it's in the context of BOSH or not) it refers to where resources are provisioned
- you can consistently use a "director" tag (BOSH or not) to identify where something is deployed which makes AWS resource management much simpler (and AWS Billing reports by tag much more valuable).
Back to getting BOSH deployed though. First, we'll create some additional BOSH-specific, region-specific infrastructure (specifically, security groups for the director and agents)...
$ ( cd bosh && cloque infra:put )
...snip...
> waiting...CREATE_IN_PROGRESS...............CREATE_COMPLETE...done
Here I start using the bosh directory: I put Director-related configuration in the bosh deployment, and individual BOSH deployments get their own directory.
Once the security groups are available, we can create the BOSH Director. The boshdirector:* commands deal with the Director tasks (i.e. they don't depend on a specific deployment). To get started, the boshdirector:inception:start command takes care of provisioning the inception instance (it takes a few minutes to get everything installed and configured)...
$ cloque boshdirector:inception:start \
--security-group $( cloque --deployment=core infra:get '.TrustedPeerSecurityGroupId' ) \
--security-group $( cloque --deployment=core infra:get '.PublicGlobalEgressSecurityGroupId' ) \
$( cloque --deployment=core infra:get '.SubnetZ0PublicId' ) \
t2.micro
> finding instance...missing
> instance-id -> i-f84169f3
> tagging director -> acme-dev-aws-usw2
> tagging deployment -> cloque/inception
> tagging Name -> main
> waiting for instance...pending.........running...done
> waiting for ssh.......done
> installing...
...snip...
> uploading compiled/self...
...snip...
> uploading global/private...
...snip...
You'll notice the cloque --deployment=core infra:get usage to load the security groups. The --deployment option is an alternative to running cd ../core before the command. Another alternative would be to use the CLOQUE_DEPLOYMENT environment variable. Whatever the case, cloque is intelligent and flexible about figuring out where it should be working from.
Before continuing, there's still a manual process of finding the correct stemcell. If we were in us-east-1, we could use the "light-bosh" stemcell (which is really just an alias to a pre-compiled AMI that Cloud Foundry publishes). Unfortunately, we need to take the slower route of compiling our own AMI for us-west-2. To do this, we need to look up the latest stemcell URL from the published artifacts and then pass that URL to the next command...
$ cloque boshdirector:inception:provision \
https://s3.amazonaws.com/bosh-jenkins-artifacts/bosh-stemcell/aws/bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent.tgz
> finding instance...found
> instance-id -> i-f84169f3
> deploying...
WARNING! Your target has been changed to `https://10.101.16.8:25555'!
Deployment set to '/home/ubuntu/cloque/self/bosh/bosh.yml'
Verifying stemcell...
File exists and readable OK
Verifying tarball...
Read tarball OK
Manifest exists OK
Stemcell image file OK
Stemcell properties OK
Stemcell info
-------------
Name: bosh-aws-xen-ubuntu-trusty-go_agent
Version: 2710
Started deploy micro bosh
Started deploy micro bosh > Unpacking stemcell. Done (00:00:18)
Started deploy micro bosh > Uploading stemcell. Done (00:05:16)
Started deploy micro bosh > Creating VM from ami-8fe7a1bf. Done (00:00:19)
Started deploy micro bosh > Waiting for the agent. Done (00:01:19)
Started deploy micro bosh > Updating persistent disk
Started deploy micro bosh > Create disk. Done (00:00:02)
Started deploy micro bosh > Mount disk. Done (00:00:09)
Done deploy micro bosh > Updating persistent disk (00:00:19)
Started deploy micro bosh > Stopping agent services. Done (00:00:01)
Started deploy micro bosh > Applying micro BOSH spec. Done (00:00:21)
Started deploy micro bosh > Starting agent services. Done (00:00:01)
Started deploy micro bosh > Waiting for the director. Done (00:00:19)
Done deploy micro bosh (00:08:13)
Deployed `bosh/bosh.yml' to `https://10.101.16.8:25555', took 00:08:13 to complete
> fetching bosh-deployments.yml...
receiving file list ...
1 file to consider
bosh-deployments.yml
1025 100% 1000.98kB/s 0:00:00 (xfer#1, to-check=0/1)
sent 38 bytes received 723 bytes 101.47 bytes/sec
total size is 1025 speedup is 1.35
> tagging...done
The :start command took care of pushing the compiled manifest, but this :provision command is responsible for pushing everything to the director and, once complete, downloading the resulting configuration locally. I created these two commands because they were a common task and the manual, iterative process was getting tiresome. It also helps unify the initial provisioning and upgrade processes, as well as deploying from an AMI vs a TGZ. Instead of ~12 manual steps spread out over ~30 minutes, I only need to intervene at three points (including instance termination).
Once the provisioning step is complete, I can log in and talk to BOSH...
$ # default username/password is admin/admin
$ bosh target https://10.101.16.8:25555
$ bosh status
Config
/Users/dpb587/cloque-acme-dev/aws-usw2/.bosh_config
Director
Name acme-dev-aws-usw2
URL https://10.101.16.8:25555
Version 1.2710.0 (00000000)
User admin
UUID f38d685c-9a72-4fc0-bc84-558979cc80bf
CPI aws
dns enabled (domain_name: microbosh)
compiled_package_cache disabled
snapshots disabled
Deployment
not set
Since the BOSH Director is successfully running, it's safe to terminate the inception instance. Whenever there's a new BOSH version I want to deploy, I can just rerun the two start and provision commands (with an updated stemcell URL) and they will take care of upgrading it.
More on Stemcells
While inception was deploying the BOSH Director, it ended up creating a stemcell AMI that I can reuse for other BOSH deployments. Unfortunately, the Director doesn't know about it. The following command takes care of publishing it...
$ cloque boshutil:create-bosh-lite-stemcell-from-ami \
https://s3.amazonaws.com/bosh-jenkins-artifacts/bosh-stemcell/aws/light-bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent.tgz \
ami-8fe7a1bf
Uploaded Stemcell: https://example-cloque-acme-dev.s3.amazonaws.com/bosh-stemcell/aws/us-west-2/light-bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent.tgz
The command uses the URL (the light-bosh stemcell of the same version from the artifacts page) as a template and patches in the correct metadata for the local region. It then takes care of uploading it to the environment's S3 bucket and to the Director so it's immediately usable.
Another task I frequently need to do is convert the standard stemcells (which only support PV virtualization) into HVM stemcells that I can use with AWS's newer instance types. This next command takes care of all those steps and, once complete, there will be a new *-hvm stemcell ready for use on the Director.
$ cloque boshutil:convert-pv-stemcell-to-hvm \
https://example-cloque-acme-dev.s3.amazonaws.com/bosh-stemcell/aws/us-west-2/light-bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent.tgz \
ami-d13845e1 \
$( cloque --deployment=core infra:get '.SubnetZ0PrivateId , .TrustedPeerSecurityGroupId' )
Created AMI: ami-f3e3a5c3
Uploaded Stemcell: https://example-cloque-acme-dev.s3.amazonaws.com/bosh-stemcell/aws/us-west-2/light-bosh-stemcell-2710-aws-xen-ubuntu-trusty-go_agent-hvm.tgz
The command needs the light-bosh TGZ and AMI for the existing PV stemcell as well as a subnet and security group for it to provision the conversion instances in.
BOSH Deployment
Now that the BOSH Director is running, I can deploy something interesting onto it. Let's use logsearch as an example. First I'll need to clone the repository...
$ git clone https://github.com/logsearch/logsearch-boshrelease.git ~/logsearch-boshrelease
$ cd ~/logsearch-boshrelease
Since I've changed directories away from our environment, cloque will no longer know where to find its environment information. To help, I'll use a .env file...
$ ( \
echo 'export CLOQUE_BASEDIR=~/cloque-acme-dev' \
; echo 'export CLOQUE_DIRECTOR=aws-usw2' \
; echo 'export CLOQUE_DEPLOYMENT=logsearch' \
) > .env
I mentioned before that cloque uses the current working directory, environment variables, and command options to figure out where to look for things. If it's still missing information, it will check and load a .env file from the current directory as a last resort. This is normally only useful during development, where I already use .env for other project-specific bash aliases and variables.
Now I can upload the release...
$ cloque boshdirector:releases:put releases/logsearch-latest.yml
Since releases are Director-specific and unrelated to a particular deployment, the command uses the boshdirector:* namespace.
The example has the configuration files for infrastructure (EIP and security groups) and BOSH (deploy manifest), but I still need to generate a certificate locally...
$ openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
-keyout ~/cloque-acme-dev/aws-usw2/ssl.key \
-out ~/cloque-acme-dev/aws-usw2/ssl.crt
Having a directory per deployment helps keep everything scoped and organized when there are additional artifacts. The templating nature of cloque allows the files to be embedded into the deployment's own manifest, but also into other deployment manifests. With the logsearch example, this means I don't need to copy and paste the ssl.crt into other deployments; I can just embed it using a relative path (embeds are always relative to the config file – something BOSH ERB templates struggle with): {{ env.embed('../logsearch/ssl.crt') }}.
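As a small illustration, another deployment's manifest might reference the certificate along these lines, reusing the logsearch.logs.ssl_ca_certificate property that shows up again in the transformer example later:

properties:
  logsearch:
    logs:
      # the embed is resolved relative to this manifest's config file at render time
      ssl_ca_certificate: "{{ env.embed('../logsearch/ssl.crt') }}"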
Once the release is uploaded, I can use the infra:put command and the mirrored bosh:put command to push the infrastructure and the BOSH deployment (-n meaning non-interactive, just like with bosh)...
$ cloque infra:put
...snip...
> waiting...CREATE_IN_PROGRESS.....................CREATE_COMPLETE...done
$ cloque -n bosh:put
Getting deployment properties from director...
...snip...
Deployed `bosh.yml' to `acme-dev-aws-usw2'
Once complete, I can see the elasticsearch service running...
$ wget -qO- '10.101.17.26'
{
"status" : 200,
"name" : "elasticsearch/0",
"version" : {
"number" : "1.2.1",
"build_hash" : "6c95b759f9e7ef0f8e17f77d850da43ce8a4b364",
"build_timestamp" : "2014-06-03T15:02:52Z",
"build_snapshot" : false,
"lucene_version" : "4.8"
},
"tagline" : "You Know, for Search"
}
And I can see the ingestor listening on its EIP:
$ echo 'QUIT' | openssl s_client -showcerts -connect $( cloque infra:get '.Z0IngestorEipId' ):5614
CONNECTED(00000003)
And I can SSH into the instance...
$ cloque bosh:ssh
...snip...
bosh_j51114xze@c989cf2f-91e4-407e-a7d7-bdc03ef79511:~$
The bosh:ssh command is a little more intelligent than bosh ssh. It will peek at the manifest to see if there's only a single job running, in which case the job/index argument becomes unnecessary. Additionally, it will always use a default sudo password of c1oudc0w (avoiding the interactive delay and prompt that bosh ssh requires).
Package Development
When I need to create a new package, I've started using a convention of adding the origin URL where I found each blob/file. This provides me with more of an audit trail over time, but it also allows me to automatically turn a spec file which looks like this:
---
name: "nginx"
files:
# http://nginx.org/download/nginx-1.7.2.tar.gz
- "nginx-blobs/nginx-1.7.2.tar.gz"
# ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.35.tar.gz
- "nginx-blobs/pcre-8.35.tar.gz"
# https://www.openssl.org/source/openssl-1.0.1h.tar.gz
- "nginx-blobs/openssl-1.0.1h.tar.gz"
...snip...
Into a series of wget commands with the boshutil:package-downloads command...
$ cloque boshutil:package-downloads nginx
mkdir -p 'blobs/nginx-blobs'
[ -f 'blobs/nginx-blobs/nginx-1.7.2.tar.gz' ] || wget -O 'blobs/nginx-blobs/nginx-1.7.2.tar.gz' 'http://nginx.org/download/nginx-1.7.2.tar.gz'
[ -f 'blobs/nginx-blobs/pcre-8.35.tar.gz' ] || wget -O 'blobs/nginx-blobs/pcre-8.35.tar.gz' 'ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-8.35.tar.gz'
[ -f 'blobs/nginx-blobs/openssl-1.0.1h.tar.gz' ] || wget -O 'blobs/nginx-blobs/openssl-1.0.1h.tar.gz' 'https://www.openssl.org/source/openssl-1.0.1h.tar.gz'
...snip...
I was tired of having to manually download files, bosh add blob them with the correct parameters, and then manually delete the originals. This lets me completely avoid those steps and ensures I'm using the files I expect. Whenever a blob is an internal file or lives in src, I just take care of it manually like before.
When I'm working on a packaging script, I use Docker images to emulate the build environment. Since 99% of my build issues come from configure arguments and environment variables, this is normally sufficient. It also lets me iteratively debug my packaging scripts, as opposed to the slow, guess-and-check method of re-releasing and deploying the whole thing to BOSH to test fixes. The boshutil:package-docker-build command helps me here...
$ cloque boshutil:package-docker-build ubuntu:trusty nginx
> compile/packaging...done
> compile/nginx-blobs/nginx-1.7.2.tar.gz...done
> compile/nginx-blobs/pcre-8.35.tar.gz...done
> compile/nginx-blobs/openssl-1.0.1h.tar.gz...done
...snip...
Sending build context to Docker daemon 7.571 MB
Sending build context to Docker daemon
Step 0 : FROM ubuntu:trusty
---> ba5877dc9bec
Step 1 : RUN apt-get update && apt-get -y install build-essential cmake m4 unzip wget
...snip...
root@347c1d4ca07b:/var/vcap/data/compile/nginx#
This command mirrors the BOSH environment: it uses the spec file to add the referenced blobs, uploads the packaging script, configures the BOSH_COMPILE_TARGET and BOSH_INSTALL_TARGET variables, creates the directories, and switches to the compile directory, ready for me to type ./packaging or paste commands iteratively. It also has --import-package and --export-package options to import/dump the resulting /var/vcap/packages/{name} directory to support dependencies.
Snapshots
One handy feature BOSH has is snapshotting, which provides a full backup of its persistent disks. You can run its take snapshot command for a particular job or for an entire deployment. Or, if "dirty" snapshots are okay, the Director can schedule them automatically. To manage all those snapshots, I created a few commands. The first command takes care of snapshots that the BOSH Director creates of itself...
$ cloque boshdirector:snapshots:cleanup-self 3d
snap-4219f4fb -> 2014-09-13T06:01:14+00:00 -> deleted
snap-2e6588e4 -> 2014-09-13T06:03:55+00:00 -> deleted
snap-1acd90d3 -> 2014-09-13T06:06:36+00:00 -> deleted
snap-618c7da9 -> 2014-09-14T06:01:15+00:00 -> retained
snap-dce22315 -> 2014-09-14T06:03:55+00:00 -> retained
snap-a9e81a60 -> 2014-09-14T06:06:35+00:00 -> retained
snap-d35ea51a -> 2014-09-15T06:01:18+00:00 -> retained
snap-3742b88e -> 2014-09-15T06:03:58+00:00 -> retained
snap-0b8b40c2 -> 2014-09-15T06:06:38+00:00 -> retained
snap-ea16dfd3 -> 2014-09-16T06:01:18+00:00 -> retained
snap-913df459 -> 2014-09-16T06:03:58+00:00 -> retained
snap-82d5fc4b -> 2014-09-16T06:06:38+00:00 -> retained
This command is simplistic and trims all snapshots older than a given period (in this case three days). I got tired of (and forgetful about) regularly cleaning up snapshots from the AWS Console. It communicates directly with the AWS API since the bosh command doesn't seem to enumerate them.
The command for individual deployment snapshots is a bit more intelligent. It allows writing logic which, when passed a given snapshot, determines whether it should be retained or deleted. For example...
$ cloque boshdirector:snapshots:cleanup
...snip...
snap-7837f7d4 -> 2014-08-01T07:01:30+00:00 -> dirty -> retained
snap-62cca4de -> 2014-08-04T07:00:28+00:00 -> dirty -> retained
snap-bdd29512 -> 2014-08-04T22:51:57+00:00 -> clean -> retained
snap-4dd5a3e1 -> 2014-08-04T23:46:23+00:00 -> clean -> retained
snap-2bb7c784 -> 2014-08-11T07:00:46+00:00 -> dirty -> retained
snap-5239b7fc -> 2014-08-18T07:00:40+00:00 -> dirty -> retained
snap-cf6fcb6e -> 2014-08-25T07:00:39+00:00 -> dirty -> retained
snap-9d00103c -> 2014-08-28T13:34:39+00:00 -> clean -> retained
snap-9d80103d -> 2014-09-01T07:00:43+00:00 -> dirty -> retained
snap-79c18cda -> 2014-09-08T07:00:44+00:00 -> dirty -> retained
snap-87f47a24 -> 2014-09-09T07:00:57+00:00 -> dirty -> deleted
snap-5fec87fc -> 2014-09-10T07:00:55+00:00 -> dirty -> retained
snap-bdfeda1e -> 2014-09-11T07:00:58+00:00 -> dirty -> retained
snap-246b6987 -> 2014-09-12T07:00:54+00:00 -> dirty -> retained
snap-c234d870 -> 2014-09-13T07:00:43+00:00 -> dirty -> retained
snap-28ed128a -> 2014-09-14T07:00:55+00:00 -> dirty -> retained
snap-ef6ac34d -> 2014-09-15T07:00:55+00:00 -> dirty -> retained
snap-72c156d3 -> 2014-09-16T07:00:42+00:00 -> dirty -> retained
The command looks for a deployment-specific file which receives information about the snapshot (ID, date, clean/dirty) and returns true to delete it or false to retain it. This allows me to create very custom retention policies for individual deployments, depending on their requirements. In this example, clean snapshots are kept for 3 months, Monday snapshots for 6 months, first-of-month snapshots indefinitely, and everything else for 1 week.
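As a rough sketch of what such a policy file could contain (the closure signature below is hypothetical; I'm assuming a callback style similar to the transformer files shown later, not documenting cloque's actual interface), the policy described above might look something like this:

<?php return function ($snapshotId, \DateTime $createdAt, $isClean) {
    // hypothetical signature: return true to delete the snapshot, false to retain it
    $age = $createdAt->diff(new \DateTime());

    if ($age->days >= 7) {
        if ('01' == $createdAt->format('d')) {
            return false; // first of the month -> keep indefinitely
        } elseif (('Mon' == $createdAt->format('D')) && ($age->days <= 180)) {
            return false; // Mondays -> keep for roughly 6 months
        } elseif ($isClean && ($age->days <= 90)) {
            return false; // clean snapshots -> keep for roughly 3 months
        }

        return true; // everything else older than a week -> delete
    }

    return false; // always keep the most recent week
};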
Revitalizing
In the past I've typically used local VMs with VirtualBox or VMware Fusion for personal development. Unfortunately, they always seemed to drift from production servers, which made things inconvenient at best. With BOSH, it became trivial for me to start/stop deployments and guarantee they have a known environment. When my VMs were local, I always had scripts which would pull down backups, restore them, and clean up data for development. With cloque, I've been using a revitalize concept which allows me to restore data from snapshots or run arbitrary commands. For example, I can add the following to my database job to restore data from a slave's most recent snapshot...
jobs:
  - name: "mysql"
    ...snip...
    cloque.revitalize:
      - method: "snapshot_copy"
        director: "example-acme-aws-usw2"
        deployment: "wordpress-demo-hotcopy"
        job: "mysql"
      - method: "script"
        script: "{{ env.embed('revitalize.sh') }}"
The snapshot_copy method takes care of finding the most recent snapshot matching the given parameters and copies the data onto the local /var/vcap/store directory (trashing anything it replaces). The script method allows an arbitrary script to run; in this case, one that resets the MySQL users/passwords and cleans data for development purposes.
Whenever I want to reload my dev deployment with more recent production data (or after I've sufficiently polluted my dev data), I can just run the bosh:revitalize task...
$ cloque bosh:revitalize
> mysql/0
> finding 10.101.17.41...
> instance-id -> i-fe0e23f3
> availability-zone -> us-west-2w
> stopping services...
> waiting...............done
> snapshot_copy
> finding snapshot...
> snapshot-id -> snap-3867159a
> start-time -> 2014-09-16T06:58:31.000Z
> creating volume...
> volume-id -> vol-edc5bfe9
> waiting...creating...available...done
> attaching volume...
> waiting...in-use...done
> mounting volume...
> transferring data...
> removing mysql...done
> restoring mysql...done
> unmounting volume...
> detaching volume...
> waiting...in-use......available...done
> destroying volume...
> script...
> starting services...
...snip...
This also makes it easy for me to condense services which run on multiple machines in production onto a single machine for development by restoring from multiple snapshots (as long as the services' store directories are properly named).
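As a sketch of what that might look like (the second source deployment and its redis job are hypothetical, purely for illustration), a single dev job can list multiple snapshot_copy entries, each restoring its own directory under /var/vcap/store:

jobs:
  - name: "allinone"
    cloque.revitalize:
      - method: "snapshot_copy"
        director: "example-acme-aws-usw2"
        deployment: "wordpress-demo-hotcopy"
        job: "mysql"
      # hypothetical second source restored alongside the first
      - method: "snapshot_copy"
        director: "example-acme-aws-usw2"
        deployment: "wordpress-demo-cache"
        job: "redis"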
Configuration Transformations
I mentioned earlier that configuration files are templates. In addition to basic templating capabilities, I added some transformation options. Transformations allow a processor to receive the current state of the configuration, do some magic to it, and return a new configuration. The easiest example of this is logging – I want to centralize all my log messages and collectd measurements. Here I'll use logsearch-shipper-boshrelease, but regardless of how it's done, it typically requires adding a new release to your deployment, adding the job template to every job, and adding the correct properties. When you have multiple deployments, this becomes tedious, and this is where a transformation shines. The transform could take care of the following:
- add the logsearch properties (SSL key, bosh_director field to messages, EIP lookup for the ingestor)
- add the logsearch-shipper release to the deployment
- add the logsearch-shipper job template to every job
And the raw code for that transform could go in aws-usw2/logsearch/shipper-transform.php:
<?php return function ($config, array $options, array $params) {
    // add our required properties
    $config['properties']['logsearch'] = [
        'logs' => [
            '_defaults' => implode("\n", [
                '---',
                'files:',
                '  "**/*.log":',
                '    fields:',
                '      type: "unknown"',
                '      bosh_director: "' . $params['network_name'] . '-' . $params['director_name'] . '"',
            ]),
            'server' => $params['env']['self/infrastructure/logsearch']['Z0IngestorEipId'] . ':5614',
            'ssl_ca_certificate' => $params['env']->embed(__DIR__ . '/ssl.crt'),
        ],
        'metrics' => [
            'frequency' => 60,
        ],
    ];

    // add the template job to all jobs
    foreach ($config['jobs'] as &$job) {
        $job['templates'][] = [
            'release' => 'logsearch-shipper',
            'name' => 'logsearch-shipper',
        ];
    }

    // add the release, if it's not explicitly using a version
    if (!in_array('logsearch-shipper', array_map(function ($a) { return $a['name']; }, $config['releases']))) {
        $config['releases'][] = [
            'name' => 'logsearch-shipper',
            'version' => '1',
        ];
    }

    return $config;
};
And then whenever I want a deployment to forward its logs with logsearch-shipper, I only need to add the following to the root level of my bosh.yml deployment manifest...
_transformers:
- path: "../logsearch/shipper-transform.php"
This approach helps me keep my deployment manifests concise. Rather than being cluttered with ancillary configuration and sidekick jobs, they remain focused on the services they're actually providing.
Tagging
Since starting with BOSH, I've used AWS tags more heavily. I consistently use the director tag to represent the {network_name}-{region_name} (e.g. acme-dev-aws-usw2) and the deployment tag to represent the logical set of services (regardless of whether BOSH is managing them or not). I made another command which can enumerate relevant resources and ensure they have the expected tags:
$ cloque utility:tag-resources
> reviewing us-west-2...
> acme-dev-aws-usw2/bosh/microbosh -> i-298fb0c6
> /dev/xvda -> vol-d46fa79b
> adding director -> acme-dev-aws-usw2
> adding deployment -> microbosh
> adding Name -> microbosh/0/xvda
> /dev/sdb -> vol-8b6c46c6
> adding director -> acme-dev-aws-usw2
> adding deployment -> microbosh
> adding Name -> microbosh/0/sdb
> /dev/sdf -> vol-8a6d46c6
> adding director -> acme-dev-aws-usw2
> adding deployment -> microbosh
> adding Name -> microbosh/0/sdf
> acme-dev-aws-usw2/logsearch/main/0 -> i-46be80b9
> /dev/sda -> vol-fa4e57b5
> adding director -> acme-dev-aws-usw2
> adding deployment -> logsearch
> adding Name -> main/0/sda
> /dev/sdf -> vol-73e0ce3e
> acme-dev-aws-usw2/infra/core/z1/gateway -> i-8d60f6a2
> /dev/sda1 -> vol-7b5b7838
I added this command because I wanted to be sure my volumes were all accurately tagged. This helps me when using the AWS Console, but it also provides more detail in the AWS Billing Reports when the director and deployment tags are included for detailed billing.
Conclusion
BOSH is far from perfect, in my mind, but with a little help it is enabling me to be more productive and effective, in the areas most important to me, than other tools I've tried.