A Generic Storage Interface

March 1, 2013

Websites often have many different assets and files for their various areas – content management systems, photo galleries, e-commerce product photos, etc. As a site grows, so do its storage demands and backup requirements, and eventually it typically becomes necessary to distribute those files across multiple servers or services.

One method for managing disparate file systems is to use custom PHP stream wrappers and configurable paths, but some extensions don't yet support custom wrappers for file access. An alternative that I've been using is an object-oriented, service-based approach that keeps my application code independent of the storage configuration.

Interface

At the core of my design is the asset storage interface, which looks something like this:

interface StorageEngineInterface {

    // store a file and return a token that can be used to retrieve it
    public function store(SplFileInfo $file);

    // retrieve a locally-accessible SplFileInfo based on the token
    public function retrieve($token);

    // remove data from storage based on the token
    public function purge($token);

}

The storage engine is responsible for generating a reusable token that can be used for later retrieval. Generally I simply have it generate a UUID as the token; however, tokens could also carry storage-specific meaning.
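For illustration, one way to generate such a token in PHP is a random (version 4) UUID built from random_bytes(). The function name generate_uuid_token is hypothetical, and random_bytes() assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: generate a random (version 4) UUID to use as a storage token.
// Assumes PHP 7+ for random_bytes().
function generate_uuid_token(): string
{
    $bytes = random_bytes(16);

    // Set the version field (high nibble of byte 6) to 4.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);

    // Set the variant field (top two bits of byte 8) to binary 10, per RFC 4122.
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // Format the 32 hex characters as 8-4-4-4-12 groups.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
```

A token with storage-specific meaning could simply be this value with a prefix prepended by the engine.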

Sample Storage Engines

I've used several base implementations:

LocalStorageEngine – the simplest storage using a local/NFS filesystem
AWSS3StorageEngine – using AWS S3 for storage
SftpStorageEngine – using PHP's ssh2 module to access files on servers via SFTP
AtlassianConfluenceStorageEngine – managing documents within Confluence wikis
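To make the interface concrete, here is a hypothetical sketch of what a LocalStorageEngine might look like. The constructor argument, the one-file-per-token directory layout and the exception types are assumptions for illustration; the interface is repeated so the snippet runs on its own (PHP 7+ for random_bytes()).

```php
<?php
// The interface from above, repeated so this sketch is self-contained.
interface StorageEngineInterface
{
    public function store(SplFileInfo $file);
    public function retrieve($token);
    public function purge($token);
}

// Minimal sketch of a local/NFS-backed engine: every stored file is copied
// into one directory under a randomly generated token as its file name.
class LocalStorageEngine implements StorageEngineInterface
{
    private $directory;

    public function __construct($directory)
    {
        $this->directory = rtrim($directory, '/');
        if (!is_dir($this->directory) && !mkdir($this->directory, 0775, true)) {
            throw new RuntimeException("Cannot create {$this->directory}");
        }
    }

    public function store(SplFileInfo $file)
    {
        // Random hex token; a UUID would work equally well.
        $token = bin2hex(random_bytes(16));
        if (!copy($file->getPathname(), $this->path($token))) {
            throw new RuntimeException("Failed to store {$file->getPathname()}");
        }
        return $token;
    }

    public function retrieve($token)
    {
        $path = $this->path($token);
        if (!is_file($path)) {
            throw new OutOfBoundsException("Unknown token: $token");
        }
        return new SplFileInfo($path);
    }

    public function purge($token)
    {
        $path = $this->path($token);
        if (is_file($path)) {
            unlink($path);
        }
    }

    private function path($token)
    {
        // Reject tokens that could escape the storage directory (e.g. "../x").
        if (!preg_match('/^[A-Za-z0-9_-]+$/', $token)) {
            throw new InvalidArgumentException("Invalid token: $token");
        }
        return $this->directory . '/' . $token;
    }
}
```

Remote engines would follow the same shape, except that retrieve() first downloads the file to a local temporary path before returning it as an SplFileInfo.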

Remote services like AWS S3 and SFTP can cause significant performance issues, since every retrieval has to transfer the file over the network. To help with that, I use a CachedStorageEngine implementation. It accepts two StorageEngineInterface arguments: one as the upstream engine, and one as the local cache. For example:

new CachedStorageEngine(
    new AWSS3StorageEngine(new Aws\S3\S3Client(...), 'bucket.example.com', 'my-prefix'),
    new LocalStorageEngine('/tmp/s3-bucket.example.com-cache')
);

And since CachedStorageEngine is just another implementation of StorageEngineInterface, it can be used interchangeably within the application with performance being the only difference.
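One possible shape for CachedStorageEngine is a read-through, write-through cache, sketched below. Because each engine chooses its own tokens, this sketch assumes an in-memory map from upstream tokens to cache tokens; a production version would need to persist that map (or let the cache engine accept a caller-supplied token) so cached files are found again on later requests.

```php
<?php
interface StorageEngineInterface
{
    public function store(SplFileInfo $file);
    public function retrieve($token);
    public function purge($token);
}

// Sketch: wraps an upstream engine and a (faster) cache engine.
// Assumption: the upstream token => cache token map lives only in memory.
class CachedStorageEngine implements StorageEngineInterface
{
    private $upstream;
    private $cache;
    private $cacheTokens = array();

    public function __construct(StorageEngineInterface $upstream, StorageEngineInterface $cache)
    {
        $this->upstream = $upstream;
        $this->cache = $cache;
    }

    public function store(SplFileInfo $file)
    {
        // The upstream token is the one the application keeps.
        $token = $this->upstream->store($file);
        // Write-through: keep a local copy so the next retrieve is cheap.
        $this->cacheTokens[$token] = $this->cache->store($file);
        return $token;
    }

    public function retrieve($token)
    {
        if (isset($this->cacheTokens[$token])) {
            try {
                return $this->cache->retrieve($this->cacheTokens[$token]);
            } catch (Exception $e) {
                // Cached copy vanished (e.g. /tmp was cleaned); fall back to upstream.
                unset($this->cacheTokens[$token]);
            }
        }
        // Cache miss: fetch from upstream once, then serve from the cache.
        $file = $this->upstream->retrieve($token);
        $this->cacheTokens[$token] = $this->cache->store($file);
        return $this->cache->retrieve($this->cacheTokens[$token]);
    }

    public function purge($token)
    {
        $this->upstream->purge($token);
        if (isset($this->cacheTokens[$token])) {
            $this->cache->purge($this->cacheTokens[$token]);
            unset($this->cacheTokens[$token]);
        }
    }
}
```

With this arrangement, only the first retrieve of a given token touches the upstream service; repeated retrieves in the same process are served from the local copy.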

Application Usage

Using dependency injection, each of the storage backends becomes an independent service, configured according to the application's requirements. The application code then contains no storage-specific calls such as copy, file_get_contents or fopen, and looks something like this:

// storage service for photos
$storage = $dic->get('photo_storage');

// save a new photo
$photo = new PhotoRecord();
$photo->setAssetToken(
    $storage->store($request->files->get('upload'))
);

// use the photo
$image = (new Imagine\Gd\Imagine())->open(
    $storage->retrieve($photo->getAssetToken())
);

// delete the photo
$storage->purge($photo->getAssetToken());
$photo->delete();

Since retrieve always returns an SplFileInfo instance, the result can be handled like any local file, as the open call in the example demonstrates.
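The $dic in the example can be any dependency injection container. As a hypothetical illustration, the minimal container below registers each storage backend as a lazily-built, shared service. ServiceContainer, its set()/get() methods and the /var/data/photos path are assumptions; a real project would more likely use an existing container such as Symfony's DependencyInjection component.

```php
<?php
// Hypothetical minimal container: services are built on first get() and reused.
class ServiceContainer
{
    private $factories = array();
    private $instances = array();

    public function set($name, callable $factory)
    {
        $this->factories[$name] = $factory;
        unset($this->instances[$name]); // re-registering replaces any built instance
    }

    public function get($name)
    {
        if (!isset($this->instances[$name])) {
            if (!isset($this->factories[$name])) {
                throw new OutOfBoundsException("Unknown service: $name");
            }
            $this->instances[$name] = call_user_func($this->factories[$name], $this);
        }
        return $this->instances[$name];
    }
}

$dic = new ServiceContainer();

// Switching photo storage to S3 (or a cached S3 setup) means changing only this
// factory. LocalStorageEngine is the engine named in the list above; it is not
// defined in this snippet, and the factory only runs when the service is requested.
$dic->set('photo_storage', function (ServiceContainer $c) {
    return new LocalStorageEngine('/var/data/photos');
});
```

Because the factory is lazy, a request that never touches photos never constructs (or connects to) the photo storage backend.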

Complicating Things

The asset storage interface itself is fairly primitive, but it allows for some more complex configurations:

switching storage engines is easy with dependency injection, since application code doesn't need to change
complex storage rules can be combined with meaningful tokens, for example storing very large files on a different disk and using a token prefix to identify that class of file
a fallback storage engine can walk a chain of storage engines until one of them is able to store or retrieve a token
operations can be deferred internally via a queue manager (e.g. instead of uploading files to S3 immediately and waiting for the upload to finish, write them locally and create a job to upload them in the background)
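As an example of the fallback idea, here is a sketch of a hypothetical FallbackStorageEngine. It assumes engines signal failure by throwing exceptions, that purge is a no-op for unknown tokens, and that tokens from different engines don't collide (UUIDs make that safe), so retrieve can simply ask each engine in turn.

```php
<?php
interface StorageEngineInterface
{
    public function store(SplFileInfo $file);
    public function retrieve($token);
    public function purge($token);
}

// Sketch: tries each engine in order until one succeeds.
class FallbackStorageEngine implements StorageEngineInterface
{
    private $engines;

    public function __construct(array $engines)
    {
        $this->engines = $engines;
    }

    public function store(SplFileInfo $file)
    {
        $last = null;
        foreach ($this->engines as $engine) {
            try {
                return $engine->store($file);
            } catch (Exception $e) {
                $last = $e; // e.g. disk full or service unavailable; try the next engine
            }
        }
        throw new RuntimeException('No storage engine could store the file', 0, $last);
    }

    public function retrieve($token)
    {
        foreach ($this->engines as $engine) {
            try {
                return $engine->retrieve($token);
            } catch (Exception $e) {
                // Not found (or unreachable) in this engine; keep searching.
            }
        }
        throw new OutOfBoundsException("Token not found in any engine: $token");
    }

    public function purge($token)
    {
        // The token may live in any engine, so ask every engine to purge it.
        foreach ($this->engines as $engine) {
            $engine->purge($token);
        }
    }
}
```

Ordering the engines from cheapest to most expensive (e.g. local disk, then S3) gives a simple tiered setup without any change to application code.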

Summary

Abstracting storage logic out of my application code makes my life much easier, both as a developer and as a systems administrator, when managing where files are located and relocating them as necessary.