Watching Upstream Binaries with Concourse

Unifying how pipelines monitor third-party assets and versions.

When building software packages, it’s easy to accumulate dependencies on dozens of other, upstream software components. When building the first version of something, it’s easy to blindly download the source of the latest version off the packages’ website. However, once you’re past prototypes and need to deal with auditing or maintenance, it becomes important to have some [automated] processes in place.

I have written several posts over the years around experiments for automatically upgrading components to avoid repetitive work. Over time, and across projects, I noticed that a lot of the version and asset management responsibilities could be handled by more native Concourse concepts.

Difficulties Worth Solving

In my most recent iteration, check and get scripts are committed to the repository which encode the logic for finding the latest version and downloading the blobs. Then, a custom resource type was built for the repository so that each blob could be used as a native resource in the pipeline.

This approach had a few limitations:

  • it requires repository-specific Docker images to be maintained;
  • the Docker image may have different get/check logic from what is committed to the repository at a given time;
  • version constraints are difficult since they need to be encoded into the check script (e.g. grep -v 0.*); and
  • each get script was responsible for correctly downloading and performing its own checksum or signature verifications.

Since then, I have standardized how I track blobs by using metalinks because they support tracking download origins, checksum verification, signature verification, and lightweight version annotations. By reusing a metalink-like interface, I realized I could simplify many of these issues.

Planning an Interface

Concourse provides the check/get interface already, so I only needed to figure out how the resource should be configured. After reviewing the check behaviors I’ve used across projects, I settled on a few assumptions on how the resource should be configured:

  • simple command line tools would typically be sufficient for querying upstream versions (e.g. curl, git, grep, jq, sed);
  • instead of encoding version constraints in BASH commands:
    • return all known, upstream versions;
    • assume versions are semantically versioned; and
    • provide a resource configuration option for the user to specify a semver constraint.

The get behaviors were a bit more complicated, but eventually I settled on the user managing a script which receives a version environment variable and then the output is a metalink (in JSON format) of the blob details.

The basic functionality led to three main resource configuration options:

  • version_check - a bash script for returning a list of known versions
  • metalink_get a bash script for generating a metalink with the given version
  • version - an optional, semantic version constraint in the case that latest is not good enough

It seems a bit counter-intuitive that a Concourse resource would be configured with such dynamic commands (typically resources are much more deterministic in their configuration). But this seemed like a more practical way to start (when compared to managing a large number of Docker images, one per upstream, third-party software and a large number of resource types).

Eventually I settled on calling it dynamic-metalink and created the repository at dpb587/dynamic-metalink-resource.

Testing the Configuration

Using Go as an example, golang.org provides a simple API with the most recent patch of their most recent minor versions. The following curl is sufficient for getting the version list:

$ curl -s https://golang.org/dl/?mode=json | jq -r '.[].version[2:]'
1.11.2
1.10.5

Once the version is known, it can build the metalink-friendly structure of download and hash verification data:

$ version="1.11.2" curl -s https://golang.org/dl/?mode=json | jq '
  map(select(.version[2:] == env.version)) | map({
    "files": (.files | map({
      "name": .filename,
      "size": .size,
      "urls": [ { "url": "https://dl.google.com/go/\(.filename)" } ],
      "hashes": [ { "type": "sha-256", "hash": .sha256 } ] } ) ) } )[]'
{ "files": [
  { "name": "go1.11.2.src.tar.gz",
    "size": 21100145,
    "urls": [
      { "url": "https://dl.google.com/go/go1.11.2.src.tar.gz" } ],
    "hashes": [
      { "type": "sha-256",
        "hash": "042fba357210816160341f1002440550e952eb12678f7c9e7e9d389437942550" } ] },
  { "name": "go1.11.2.darwin-amd64.tar.gz",
...snip...

Testing the Pipeline

For a more specific example, I use the following for this website and automatically updating hugo. It watches for new versions, imports the new binary into the release, builds the site and deploys it, and then finally pushes it to the repository. When it succeeds or fails, it sends a notification to a Slack channel just to let me know.

jobs:
- name: upgrade-hugo-blob
  plan:
  - aggregate:
    - get: blob
      trigger: true
      resource: hugo-blob
    - get: repo
    - get: bosh-release-blobs-upgrader-pipeline
  - task: sync-blobs
    file: bosh-release-blobs-upgrader-pipeline/tasks/sync-blobs.yml
    params:
      blob: hugo
      track_files: .resource/metalink.meta4
  - task: test-deploy
    privileged: true
    file: repo/ci/tasks/test-deploy/config.yml
  - task: upload-blob
    file: bosh-release-blobs-upgrader-pipeline/tasks/upload-blobs.yml
    params:
      release_private_yml: |
        blobstore:
          options:
            access_key_id: ((access_key))
            secret_access_key: ((secret_key))
  - put: repo
    params:
      rebase: true
      repository: repo
  on_failure: *slack-notify-blob-failure
  on_success: *slack-notify-blob-success
resources:
- name: hugo-blob
  type: dynamic-metalink
  source:
    metalink_get: |
      jq -n '
        env.version | {
          "files": [
            { "name": "hugo_\(.)_Linux-64bit.tar.gz",
              "urls": [ { "url": "https://github.com/gohugoio/hugo/releases/download/v\(.)/hugo_\(.)_Linux-64bit.tar.gz" } ] } ] }'
    version_check: |
      git ls-remote --tags https://github.com/gohugoio/hugo.git \
        | cut -f2 \
        | grep -v '\^{}' \
        | grep -E '^refs/tags/v.+$' \
        | sed -E 's/^refs\/tags\/v(.+)$/\1/' \
        | grep -v '-' \
        | grep -E '^\d+\.\d+(\.\d+)?$'
- name: repo
  type: git
  source: ...snip...
- name: bosh-release-blobs-upgrader-pipeline
  type: git
  source: ...snip...

For example, Concourse automatically upgraded to v0.52 without me needing to worry about anything (you’ll also see I’m tracking the original download source information in those commits as well). To see more examples in pipeline-form, see the examples directory in the repository.

Usefulness

By having this dynamic-metalink resource available it has been even easier for me to implement automatic updating of upstream dependencies for my projects, especially BOSH releases. This sort of version management is a little different than package manager-based tools since it is intended to handle downloads that someone would otherwise handle manually. For example, if you are looking for something to keep your Ruby Gemfile or PHP composer.json up to date, tools like dependabot are much more useful since they can take advantage of the existing package and versioning systems. If there was a large, consolidated repository of package/binary/version assets for arbitrary projects, it would be a bit easier to have something like dependabot. But, in the meantime, the extra sections of pipeline configuration are helping me keep dependencies up to date.

If you are looking for more examples of this resource being used or to learn more, visit the repository at dpb587/dynamic-metalink-resource.