Metalink Repositories: Mirroring Third-Party Dependencies

December 30, 2018

When managing project dependencies which are outside of your control, it is often best practice to assume those artifacts may disappear (e.g. they may move, disappear, or become corrupt). For this reason, you may want to be mirroring your assets which, with metalink repositories, provides the functionality of:

  • Multiple URLs can be configured for where to find an artifact. This allows for documenting where files were originally discovered, but also supports retrying download from mirrors if one location fails.
  • Download locations can be prioritized and configured for locations which helps ensure you always use a local mirror which may be optimized for your environment.
  • Checksums in the metalink continue to verify data integrity between download locations.

Building on top of metalinks is the idea of having a shared repository to document them. When you separate the processes of mirroring files and consuming them, it allows your product workflows to remain simpler and focused on a single responsibility. For example, if you have a large team with many shared dependencies, perhaps the mirroring process is your own "internal product" with a single configuration/pipeline for mirroring dependencies into metalink repositories. Then, individual products do not need to worry about 1) knowing how to download a dependency (e.g. Go); 2) dealing with the overhead of mirroring; and 3) can reuse artifacts from a local, faster mirror.

Example

For a concrete example of mirroring, here is a Concourse pipeline which mirrors Go to a custom S3 bucket. It is based on the dynamic-metalink resource (learn more) and the metalink-repository resource. The inline comments provide more insight about what it is doing.

resources:

# This is a simple check which watches the go download endpoint for versions
# and provides the download locations and checksums in a metalink JSON format.
- name: golang
  type: dynamic-metalink
  source:
    version_check: |
      curl -s https://golang.org/dl/?mode=json | jq -r '.[].version[2:]'
    metalink_get: |
      curl -s https://golang.org/dl/?mode=json | jq '
        map(select(.version[2:] == env.version)) | map({
          "files": (.files | map({
            "name": .filename,
            "version": env.version,
            "size": .size,
            "urls": [ { "url": "https://dl.google.com/go/\(.filename)" } ],
            "hashes": [ { "type": "sha-256", "hash": .sha256 } ] } ) ) } )[]'

# This configures a GitHub repository to store our metalink and mirror data. It
# will put each golang version into its own file within the golang.org directory
# and upload them to an S3 bucket, using a checksum as the object key.
- name: golang-mirror
  type: metalink-repository
  source:
    uri: git+ssh://git@github.com:acme/mirrors.git//golang.org
    options:
      private_key: ((git_private_key))
    # Optionally, these next two settings will mirror the artifacts to a custom
    # S3 bucket to ensure continued access.
    mirror_files:
    - destination: s3://s3-external-1.amazonaws.com/acme-mirror-us-east-1/golang.org/{{.SHA256}}
      location: US
      priority: 10
    url_handlers:
    - type: s3
      options:
        access_key: ((s3_access_key))
        secret_key: ((s3_secret_key))

jobs:

# Whenever there is a new version of golang, get the generated metalink which
# refers to the official download URLs and checksums. Then, put that metalink
# into our own repository and, because the golang-mirror configures the
# mirror_files option, it will take care of reuploading them to our bucket and
# adding it to the list of download locations.
- name: mirror-golang
  plan:
  - get: golang
    trigger: true
    params:
      skip_download: true
  - put: golang-mirror
    params:
      metalink: golang/.resource/metalink.meta4
    get_params:
      skip_download: true

# These configure the custom resource types to support this pipeline.
resource_types:
- name: dynamic-metalink
  type: docker-image
  source:
    repository: dpb587/dynamic-metalink-resource
- name: metalink-repository
  type: docker-image
  source:
    repository: dpb587/metalink-repository-resource