Week 19

August 6, 2017

Hi team,

This week has been about deploying all that has changed since the start of your holiday a month ago and patching up what broke. In addition, two new major features are included - a secure method of communicating with Mindbender via the web and the publishing of file sequences - and major space optimisations via hardlinks.

Table of contents

What broke?
HTTPS
Sequences
Hardlink
Task Distribution, continued
Next Week

What Broke?

Two words: backwards compatibility.

Due to a significant change (i.e. changing name to Avalon) I had trouble maintaining compatibility in a way that didn’t negatively affect future maintainability of the project. I weighed this against the low number of projects currently in circulation at Mindbender and made the decision to break it.

What this means in practice is that (1) already loaded assets won’t show up in the new Manager and thus can’t be updated and (2) the new Loader will have trouble providing you with previously published assets. If this becomes a problem, let me know and I’ll put together a migration script for you to apply to relevant Maya scenes.

Those are the known breakages. Beyond them, there are likely a few minor problems scattered across the pipeline that I’ll need you to look out for. None of these should prevent old and future projects from functioning properly, so just let me know and we’ll smoke them out.

Moving forward, the goal is to maintain backwards compatibility with every project starting now; a property that is expected to grow increasingly important as (1) projects overlap, such as the long-running Food Thief, (2) we want to archive and restore old projects and (3) we share assets between projects.

HTTPS

Last week I started looking into what it would take to implement basic security measures for cross-site file synchronisation and struggled with the concept of TLS, or “Transport Layer Security” and basic user authentication.

Over the weekend, a thought occurred to me that I’ll attempt to elaborate on.

The problem I was having was that the top-level domain http://mindbender.com was managed by an external server, outside the offices of Mindbender, and I wanted to serve multiple sub-domains of this domain - e.g. https://logs.mindbender.com and https://files.mindbender.com - from a different IP address and a single computer within the walls of Mindbender. Dealing with certificates threw me off and complicated things; I was unable to figure out how to serve certificates to multiple subdomains at once, especially when the top-level domain didn’t have a certificate. I had not fully understood that each domain needed its own unique certificate to play ball.

Then the thought occurred to make a new sub-top-level domain, such as https://pipeline.mindbender.com, give this a certificate and chain additional subdomains on to it, e.g. https://logs.pipeline.mindbender.com. Not the most visually appealing or even memorable, but it does provide a convenient namespace in which to put all sorts of domains.

I gave this a try using nothing but Caddy and presto, it worked! Certificates were acquired automatically and things generally Just Worked (tm).

Shortly thereafter I discovered an extended syntax of Caddy that enabled me to reverse proxy multiple sub-domains from a single computer without a namespace, including relaying requests from the internet to another computer within our own local network.

Big win. The layout looks like the image attached above. Note the single point of security (Caddy) covering multiple addresses.

To sum it up, I’m using Caddy to (1) serve files, (2) provide an interactive file-manager and (3) reverse proxy and manage certificates via Let’s Encrypt; Sentry to aggregate log messages and exceptions encountered per-machine, per-organisation (Colorbleed and Mindbender); Papertrail to aggregate log messages generated by each of the server applications; and Logspout to pass them along from each individual Docker instance that ultimately acts as the processing hub of servers.

GitHub

Sequences

Sequences include any asset consisting of 2 or more files and publishing them required some ingenuity.

With traditional assets, what you ultimately load is a file, referred to as a “representation”. A file has a name and an extension, the extension signifying the format in which the contained data is stored, such as .abc for Alembic caches and .ma for a generic Maya scene file, used for rigs amongst other things.

This maps well to how files are typically managed by the operating system, which associates relevant software with the suffix of each file.

With sequences, things are no different. When publishing a sequence of files - such as ["background.1000.exr", "background.1001.exr"] - we treat their parent directory as the representation.

v001/
  background.exr/
    background.1000.exr
    background.1001.exr
    background.1002.exr
    ...

Even though background.exr is a directory, it is managed no differently from how any other representation is managed. That is, when a representation is handed to one of your Loader plug-ins it is handed the resolved absolute path to it, which in this case just so happens to be the path to a directory.
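As a rough sketch of what this means on the loading side, the function below takes the resolved absolute path of a representation and returns the files behind it. The name frames_of is hypothetical and not part of the pipeline; the only assumption taken from the text is that a sequence is a directory of frames and anything else is a single file.

```python
import os


def frames_of(representation_path):
    """Return the files behind a representation, in sorted order.

    Assumption: a sequence is a directory of frames and a regular
    representation is a single file, as described above.
    """
    if os.path.isdir(representation_path):
        # Sequence: every entry in the directory is one frame.
        return sorted(
            os.path.join(representation_path, name)
            for name in os.listdir(representation_path)
        )

    # Single file: the representation is the file itself.
    return [representation_path]
```

A plug-in written this way does not need to know in advance whether it was handed .../v001/background.exr as a directory or as a file; it simply iterates over whatever comes back.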

On the other end, publishing, a general notion of sequences was introduced as well.

With a traditional representation, the output file is produced by the host application during extraction, and its temporary filename is stored as a string until it reaches integration, where it is aligned with the path template of your project. Once extracted, the integrator simply takes this name into account and copies (actually hardlinks, more on this below) the file from its temporary location to its public, accounted-for and immutable location.

With sequences, a branch was added in the code to treat entries of type list as a series of files - a sequence - and hardlink each individual member of this sequence into the public location, whilst only entering the single sequence into the database to conform with the overall object model.
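The branch described above could look something like the following. This is a minimal sketch and not the actual integrator: the function name integrate and its arguments are made up for illustration, and the database step is reduced to returning the one path that would be recorded.

```python
import os


def integrate(extracted, destination):
    """Hardlink extracted output into `destination` (a directory).

    `extracted` is a single path (str) for a regular representation,
    or a list of paths for a sequence. Returns the one path that
    would be entered into the database.
    """
    os.makedirs(destination, exist_ok=True)

    if isinstance(extracted, list):
        # Sequence: link every member, but report only the directory,
        # so the database sees a single representation.
        for source in extracted:
            target = os.path.join(destination, os.path.basename(source))
            os.link(source, target)
        return destination

    # Single file: link it and report the file itself.
    target = os.path.join(destination, os.path.basename(extracted))
    os.link(extracted, target)
    return target
```

Note that os.link requires the source and destination to live on the same file system, which is assumed to hold for the temporary and public locations.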

As detailed in last week’s post, publishing of rendered sequences is a two-tier process, the first being a submission to a job scheduler and the second publishing the fruits of this submission. In order to identify sequences amongst one or more series of files, the open source Python library clique is used.
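To give a feel for what identifying sequences involves, here is a standard-library sketch of the idea. It is not clique's implementation or API; the function name group_sequences and the assumed naming convention head.frame.extension are illustrative only, and clique handles far more patterns than this.

```python
import re
from collections import defaultdict

# Assumed naming convention: <head>.<frame>.<extension>
FRAME_PATTERN = re.compile(r"^(?P<head>.+)\.(?P<frame>\d+)\.(?P<tail>[^.]+)$")


def group_sequences(names):
    """Split names into sequences and leftovers.

    Returns (sequences, remainder), where sequences maps
    (head, extension) to a sorted list of frame numbers.
    """
    groups = defaultdict(list)
    remainder = []

    for name in names:
        match = FRAME_PATTERN.match(name)
        if match is None:
            remainder.append(name)
            continue
        key = (match.group("head"), match.group("tail"))
        groups[key].append((int(match.group("frame")), name))

    sequences = {}
    for key, members in groups.items():
        if len(members) < 2:
            # A sequence consists of 2 or more files.
            remainder.extend(name for _, name in members)
        else:
            sequences[key] = sorted(frame for frame, _ in members)

    return sequences, remainder
```

Given background.1000.exr, background.1001.exr and notes.txt, the two frames end up under the key ("background", "exr") and notes.txt is left in the remainder.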

Alternative

Another way of managing sequences is the way Clique sees them. That is, when encountering multiple names following a particular pattern, e.g. myfile.1000.png where 1000 increments by one per file, it could convert them to a single name containing a variable - e.g. myfile.%d.png.

This way, we can replace the folder acting as a representation and store this variable in the name of the subset.

subset=myfile.%d
representation=png

The same property applies where we’d know whether a representation is a sequence or singular by the presence of %d as we would by looking at whether it is a file or folder.
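A small sketch of how this alternative could be read back, assuming names of the form myfile.%d.png. The helper names describe and frame_name are hypothetical, and the returned keys simply mirror the subset/representation split shown above.

```python
import os


def describe(pattern_name):
    """Split e.g. 'myfile.%d.png' into subset and representation."""
    subset, extension = os.path.splitext(pattern_name)
    return {
        "subset": subset,
        "representation": extension.lstrip("."),
        # The presence of %d plays the role that "is it a folder?"
        # plays in the directory-based approach.
        "is_sequence": "%d" in subset,
    }


def frame_name(pattern_name, frame):
    """Expand the variable into a concrete file name."""
    return pattern_name % frame
```

Here describe("myfile.%d.png") yields the subset myfile.%d and the representation png, and frame_name("myfile.%d.png", 1000) gives back myfile.1000.png.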

Hardlink

Another low level but significant update to be made this week is related to performance and disk optimisation.

Prior to this week, each publish involved two phases: (1) extraction, the act of serialising data from an application to a temporary location on disk, and (2) copying this data into the final location where it can then be reached by other artists. Note that there are two write operations here, one from the software and another a copy, resulting in two identical representations on disk and thus twice the required disk space.

This week, in collaboration with @tokejepsen, I replaced the copy operation with hardlinking.

A hardlink is in many ways identical to a copy, except that it occupies no additional space and takes the same, near-instant amount of time to make regardless of file size.

How?

To answer that I’ll first have to digress for a moment into the world of file system mechanics.

To a file system, all data is a linear series of ones and zeroes. A range of ones and zeroes - e.g. bytes 2034 to 3552 - indicate what you know as a file. A file then is merely a shortcut to this range.

What’s more, some file systems are designed in such a way that they can keep track of how many references there are to any given range. For example, if there are two files referencing the same range then this range is said to have a “reference count” of 2. To you, these files are duplicates. They may have different names, but looking inside either of them reveals the same information.

Here is the interesting bit. Yes, reading either file yields the same results, but so does editing either file. In practice what is happening is that you are merely editing this single range on disk, using two references to it. What’s most interesting is that these two references don’t actually occupy any additional space*.

* They do occupy an infinitesimal amount of space to keep track of the range itself, along with a few other tidbits that we won’t get into, such as modified date and author.
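The behaviour described here can be observed directly from Python. The sketch below creates a file, hardlinks it under a second name and reports what the operating system sees; the file names are made up, and the function name hardlink_demo is only for illustration.

```python
import os
import tempfile


def hardlink_demo():
    """Create a file and a hardlink to it, then report what the OS sees."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "extracted.abc")
        published = os.path.join(root, "published.abc")

        with open(original, "w") as f:
            f.write("version 1")

        # No data is copied; a second name is added to the same data.
        os.link(original, published)

        # Editing through one name is visible through the other.
        with open(published, "a") as f:
            f.write(" edited")
        with open(original) as f:
            content = f.read()

        return {
            "reference_count": os.stat(original).st_nlink,
            "same_file": os.path.samefile(original, published),
            "content_via_original": content,
        }
```

Calling hardlink_demo() reports a reference count of 2, confirms that both names point at the same file, and shows the edit made through the published name when reading the original.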

To us, in this particular case, this seemingly insignificant property of a deceptively plain and uninteresting file system yields a 50% reduction of disk space used and a 10-50% reduction in time spent publishing for any and all assets produced at Mindbender (taking into account that most time is spent actually generating the data, not writing it). A property most helpful with many sequences of large files, such as renders.

Task Distribution

With rendering in place, we’ve got a pipeline for distributing arbitrary jobs to a farm of computers. That enables us to start looking into submitting more than just renders, such as caches and dailies.

Rendering today relies on a particular property at Mindbender that will not always be true at other companies, nor at Mindbender some time in the future, which is that the workstation from which a submission is made is virtually identical to the worker being assigned the task. That is to say, each artist is submitting jobs to and from workstations located within the physical office of Mindbender in Sweden, a workstation running Microsoft Windows with access to a common network and disk.

This assumption enables us to easily and accurately replicate the environment from the submitting machine to a remote worker machine. But it doesn’t fully account for every scenario, the most pressing one being jobs submitted to computers outside the network or to a different operating system, such as Linux. Furthermore, if we are to facilitate jobs of a different nature than renders, then odds are we will want to take advantage of some of the benefits of Linux, headless workstations and relevant technology such as Docker. Docker would enable virtually unlimited parallelisation of tasks small and large without the overhead of a dedicated machine per worker, but would be limited to running Linux.

We will also want to facilitate jobs submitted to and from outside the local network, such as a dailies submission made from one of our artists in Brazil.

Each of these breaks the above assumption and requires further thought, but once settled would enable us to submit virtually any kind of job to a farm, freeing up local resources and reducing iteration times. Once we get there, we can start talking about what the interface for submitting arbitrary jobs should look like and how artists are to interact with it.

For that, here’s what I’ve got in mind.

At the moment, publishing any kind of asset is a matter of hitting publish. Submitting a render to Deadline aligns with that as does publishing the resulting image sequence. But where things tear is when we want to enable an artist to either publish locally or submit for remote publishing.

To solve this, I’ll implement a dedicated option in addition to the menu item “Publish” called something along the lines of “Submit”. With it, publishing would be submitted as opposed to made directly.

That way, we reap a few benefits right off the bat.

The workflow remains the same
We can validate both local and remote publishes via the same mechanism
Any kind of job can be run either locally or remotely via the same interface

Technically, this would involve dynamically managing the registered host to include/exclude a keyword e.g. remote. This keyword would expose relevant plugins for submission to remote publishing via the standard Pyblish host mechanism.
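The idea can be sketched without Pyblish itself. Below, plug-ins declare the hosts they support and a discovery step exposes only those matching the currently registered hosts, which is roughly how host filtering behaves in Pyblish. Everything here is a stand-in: the plug-in classes, hosts_for and discover are hypothetical names, and in the real pipeline the registration would go through Pyblish's own host registry rather than a plain list.

```python
class ExtractAlembic:
    """Hypothetical plug-in that runs inside Maya."""
    hosts = ["maya"]


class SubmitToFarm:
    """Hypothetical plug-in only relevant when submitting remotely."""
    hosts = ["remote"]


def hosts_for(action, base_hosts=("maya",)):
    """Return the hosts to register for a menu item.

    Assumption: "Submit" adds the keyword "remote" on top of the
    hosts already registered, whereas "Publish" leaves it out.
    """
    hosts = list(base_hosts)
    if action == "submit":
        hosts.append("remote")
    return hosts


def discover(plugins, registered_hosts):
    """Expose plug-ins whose hosts overlap with the registered ones."""
    return [
        plugin for plugin in plugins
        if "*" in plugin.hosts or set(plugin.hosts) & set(registered_hosts)
    ]
```

With this arrangement, discover([ExtractAlembic, SubmitToFarm], hosts_for("publish")) exposes only the local plug-in, while hosts_for("submit") additionally exposes the submission plug-in, all through the same mechanism.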

Next Week

The pipeline is in a solid enough state to account for all projects in the foreseeable future, expected 6-12 months, at which point we can re-evaluate where we are at and where to go next.

Next week I’ll start winding things down and prepare for less involved oversight of the pipeline as I transition into other projects. My expectation is for the pipeline to continue to evolve over time and I intend to be there as things progress. I’ll be on Slack indefinitely for support and will implement and manage various features as they become relevant.

What is most important to you is that you continue to test the pipeline with all scenarios you yourself think will become relevant over the course of the coming year.