Websites often have a lot of different assets and files for the various areas of a website - content management systems, photo galleries, e-commerce product photos, etc. As a site grows, so does storage demand and backup requirements, and as storage demands grow it typically becomes necessary to distribute those files across multiple servers or services.
One method for managing disparate file systems is to use custom PHP stream wrappers and configurable paths, but some extensions don’t yet support custom wrappers for file access. An alternative that I’ve been using is an object- and service-oriented approach that keeps my application code independent of the storage configuration.
At the core of my design is the asset storage interface, which looks something like:
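A minimal sketch of such an interface follows. The exact method signatures are my guess, though the text below confirms that storing returns a token and that retrieval returns an SplFileInfo:

```php
<?php

// Sketch of the asset storage interface. Method names are illustrative;
// the key contract is: store() yields an opaque token, retrieve() takes
// that token and returns an SplFileInfo pointing at a local copy.
interface StorageEngineInterface
{
    /**
     * Store the given file and return an opaque token
     * that can be used to retrieve the file later.
     */
    public function store(\SplFileInfo $file): string;

    /**
     * Retrieve a previously stored file by its token.
     */
    public function retrieve(string $token): \SplFileInfo;
}
```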
The storage engine is responsible for generating a token that can be used for later retrieval. Generally, I simply have it generate a UUID as the token; however, tokens could carry storage-specific meaning.
Sample Storage Engines
I’ve used several base implementations:
- LocalStorageEngine - the simplest storage, using a local/NFS filesystem
- AWSS3StorageEngine - using AWS S3 for storage
- SftpStorageEngine - using PHP’s ssh2 module to access files on servers via SFTP
- AtlassianConfluenceStorageEngine - managing documents within Confluence wikis
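As an illustration of how small these engines can be, here is a possible LocalStorageEngine. This is my own sketch, not the article’s code; it uses a random hex string where a real implementation might generate a proper UUID, and it satisfies the store/retrieve contract described above:

```php
<?php

// A possible LocalStorageEngine: files live under a base directory,
// keyed by a generated token. Satisfies the storage interface's
// contract: store() returns a token, retrieve() returns an SplFileInfo.
class LocalStorageEngine
{
    public function __construct(private string $baseDir)
    {
        if (!is_dir($this->baseDir)) {
            mkdir($this->baseDir, 0755, true);
        }
    }

    public function store(\SplFileInfo $file): string
    {
        // A random hex string stands in for a real UUID here.
        $token = bin2hex(random_bytes(16));
        copy($file->getPathname(), $this->baseDir . '/' . $token);
        return $token;
    }

    public function retrieve(string $token): \SplFileInfo
    {
        $path = $this->baseDir . '/' . $token;
        if (!is_file($path)) {
            throw new \RuntimeException("Unknown token: $token");
        }
        return new \SplFileInfo($path);
    }
}
```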
Remote services like AWS S3 and SFTP can cause significant performance issues. To help with that, I use a CachedStorageEngine implementation. It accepts two StorageEngineInterface arguments: one as the upstream engine, and one as the local cache. For example:
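A sketch of how such a CachedStorageEngine might look (my implementation, based on the description above; the in-memory token map is a simplification, since a real implementation would persist the upstream-to-cache token mapping):

```php
<?php

// Sketch of a CachedStorageEngine: wraps a slow upstream engine
// (e.g. S3 or SFTP) and a fast local engine used as a cache.
class CachedStorageEngine
{
    /** @var array<string,string> upstream token => cache token */
    private array $map = [];

    public function __construct(
        private object $upstream, // slow remote engine
        private object $cache     // fast local engine
    ) {}

    public function store(\SplFileInfo $file): string
    {
        // Store upstream; the upstream token is the one callers keep.
        return $this->upstream->store($file);
    }

    public function retrieve(string $token): \SplFileInfo
    {
        if (isset($this->map[$token])) {
            // Cache hit: serve the local copy without touching upstream.
            return $this->cache->retrieve($this->map[$token]);
        }
        // Cache miss: fetch from upstream, then warm the cache.
        $file = $this->upstream->retrieve($token);
        $this->map[$token] = $this->cache->store($file);
        return $this->cache->retrieve($this->map[$token]);
    }
}
```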
Because CachedStorageEngine is just another implementation of StorageEngineInterface, it can be used interchangeably within the application, with performance being the only difference.
Using dependency injection, each of the storage backends becomes an independent service, configured according to the application’s requirements. The application then has no storage-specific calls, and the code looks something like:
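A hypothetical example of what that application code might look like (the function names are mine; the engine is duck-typed here so the snippet stands alone, but in practice it would be type-hinted against the storage interface and injected by the container):

```php
<?php

// Application code sees only the storage abstraction. Whether the
// injected engine is local, S3-backed, or cached is irrelevant here.
function saveUpload(object $storage, string $uploadedPath): string
{
    // Store the file and keep the returned token (e.g. in the database).
    return $storage->store(new \SplFileInfo($uploadedPath));
}

function readAsset(object $storage, string $token): string
{
    // retrieve() returns an SplFileInfo, so the result can be opened
    // and read like any local file.
    $file = $storage->retrieve($token)->openFile('r');
    return $file->fread($file->getSize());
}
```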
Since retrieve will always return an SplFileInfo instance, it can be referenced and handled like a local file (as demonstrated by the open call in the example).
The asset storage interface itself is fairly primitive, but it allows for some more complex configurations:
- by using dependency injection, it becomes extremely easy to switch storage engines, since application code doesn’t need to change
- complex storage rules can be combined with meaningful tokens to, for example, store very large files on a different disk, using a token prefix to identify that class of file
- a fallback storage class can work through a chain of storage engines until one is able to store or retrieve a token
- operations can be deferred internally via a queue manager (e.g. instead of storing a file to S3 immediately and waiting for the upload, write it locally and create a background job to upload it)
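The fallback idea above can be sketched in a few lines. The class name and error handling are my assumptions; the point is that a chain of engines is itself just another engine:

```php
<?php

// A possible fallback engine: tries each engine in order until one
// succeeds, so unreachable backends degrade gracefully.
class FallbackStorageEngine
{
    /** @param object[] $engines ordered list of storage engines */
    public function __construct(private array $engines) {}

    public function store(\SplFileInfo $file): string
    {
        foreach ($this->engines as $engine) {
            try {
                return $engine->store($file);
            } catch (\Throwable $e) {
                // This backend failed; try the next one.
            }
        }
        throw new \RuntimeException('No storage engine could store the file');
    }

    public function retrieve(string $token): \SplFileInfo
    {
        foreach ($this->engines as $engine) {
            try {
                return $engine->retrieve($token);
            } catch (\Throwable $e) {
                // Token not found here; try the next backend.
            }
        }
        throw new \RuntimeException("Token not found in any engine: $token");
    }
}
```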
By abstracting storage logic out of my application code, my life is much easier, both as a developer and as a systems administrator, when managing where files are located and handling any relocations as necessary.