# Week 7

## May 7, 2017

Last week, I introduced Launcher and the notion of an “inventory” with which you would define the contents of your project, including metadata.

This week, we’ll continue in this direction and have this file define your project in a database, and talk pros and cons in doing that. I’ll also talk about the changes to Loader and where its headed.

# Asset Database

With the advent of a database, the organisation of data has seen an upgrade.

To the artist, the workflow remains the same but to future development the horizon has significantly widened.

Multiple locations

As far as tools are concerned, there is only ever one location for all data - the database. Even though files are sent across continents, all artists use tools that communicate with a single source of truth. This is what will enable browsing of material not locally available and the automatic download of it. Even from home, given the right login information. Will get to this!

Complex Search

We are now able store an infinite amount of metadata per asset and make complex search queries on it, with little or no cost to performance. This will have an immediate effect on most tools - Launcher, Loader and Manager - and open the floodgates to upcoming functionality including (1) shotbuilding and (2) tracking the use of assets in other assets.

### What is a database?

Think of it like another filesystem. Like another mapped drive, N:\. Only on this drive, files are much smaller and when you search for and open files it is able to make optimisations based on how often you access a particular set or type of data.

In practice, every file on M:\ will have a corresponding entry on N:\.

On performance, the filesystem excels at small amounts of large data, whereas the database excels at large amounts of small data. Large data, such as an animation cache (10-10,000 MB). Small data, such as the names of all projects for an application like Launcher, or the latest version of a particular asset for tools like Manager (0.001-1 KB).

Before

In practice, what this means is that whenever a request for small data was made, artists working with large data suffered a performance hit, and vice versa. It also limited the scope of how much data a tool could effectively request and make on its own.

After

With this new structure, tools operate separately from applications like Maya, resulting in an immediate performance increase, but more importantly a wider scope of what tools may accomplish.

### What has changed?

Before, all data was located in one location - the filesystem. Now, data is split between the filesystem and a separate database.

Technical details aside, what matters to you is that it (1) enables searches of (2) complex queries.

Currently, searching is done through a carefully designed directory layout. With it, you are able to easily locate assets for a given project, or versions for a published subset.

But even though data is easily found this way, the performance cost has already made itself known via both the Loader and Manager. Both of which repeatedly make even the simplest of queries - listing a directory - and yet still take a noticeable time to accomplish.

As we progress, the requirement on search will continue to expand. The immediate use for this performance gain lies in the user interfaces we build. Being able to instantly gain access to all thumbnails for every project, without bogging down the system for those looking to gain access to files, results not only in more appealing interfaces, but in a new generation of tools simply not possible before.

On complexity and performance, consider making a search for everything that match the following criteria.

• It is an asset
• It is a character
• It belongs to project A
• It has at least one version
• The one version has at least one representation
• The representation is compatible with Maya
• The asset is used on at least 5 shots
• The asset was made less then 5 days ago
• The asset has at least one comment with the word “Awesome” in it

A complex but not unreasonable query. Searching the filesystem for this criteria could take hours, if not days on sufficiently complex projects, not to mention that it burdens the very same mechanism artists use to store and retrieve files, causing delays for them as you search, and for your search as they load or save.

The data stored in the database as of this moment is:

• The data previously stored in the .metadata.json files of published assets.
• Project and asset metadata, such as start and end frame of a given shot.

### What is the cost?

Of equal importance to the benefits of using a database is the cost. What are we giving up in order to gain these seemingly brilliant advantages, why didn’t we do it from day one and why isn’t everyone doing it?

The answer lies in complexity. Here are some new things to worry about.

• Files and metadata can go out of sync
• Need dedicated server
• Another thing to backup

In addition, when things goes wrong with a file, odds are past experience would guide you to a solution. Problems in the database mean you require someone able to deal with that.

### Compatibility

Current and past projects will work with this future workflow, but future projects will not work with the past version of the pipeline.

The creation of new projects has been simplified, with one central location for sculpting your project (the inventory) as opposed to individual configuration (.bat) files per asset. Current and past on the other hand projects require migration.

# Migration

With assets stowed away in a database, interacting with it is no longer a matter of browsing files and folders.

Until there is a more friendly desktop- or web-interface, the .inventory.toml and .config.toml files will act as a bridge between assets and the mysterious database.

The workflow goes like this.

1. Load .inventory.toml and .config.toml from database
2. Modify
3. Save

Existing projects must be migrated, here is how the workflow looks for it.

1. Extract project.json
3. Repeat workflow for .inventory.toml and .config.toml

The extraction produces all metadata from your existing assets on disk and prepares it for upload to the database. The --upload command then uploads this file.

Prior to uploading, you may want to inspect the generated project.json file and potentially append to it. Alternatively, you can complete the upload and generate a standard issue .inventory.toml and .config.toml fom it afterwards.

See details on each of the two workflows below.

### Migrating an existing project

Once uploaded, you’ll be able to continue adding data by working normally, and reap the benefits added by the database.

mindbender.inventory

I’ve prepared a tool to help ease the migration.

It will automatically scan an existing project and upload relevant data to the database, without affecting the original data. Not affecting the original data means it us safe to run on an existing project and at any point return to using the previous mb.bat workflow.

I expect the initial projects to reveal any bugs that need squashing where having the ability to revert back to what works is important.

mb-shell.bat

From the new mb-shell.bat found in the mindbender-setup project, type the following.

$cd my/existing/project$ python -m mindbender.inventory --extract --silo-prefix f02_prod/
$python -m mindbender.inventory --upload  The process should take a minute or two and once finished you’ll be able to launch applications, load and publish assets ambles per usual. Only this time data is (transparently) split between files and database entries. IMPORTANT: If at any point you find that the updated pipeline is preventing you from accomplishing the tasks you would be able to accomplish using the original workflow, let me know and continue using mb.bat till the problem has been squashed. ### Managing projects Last week I introduces the inventory and the workflow remains the same, only this time you load and save your configuration to the database. If you haven’t created an .inventory.toml and .config.toml for your project yet, go ahead and load a default one from the database. IMPORTANT: The below will overwrite your existing .inventory.toml and .config.toml files. mb-shell.bat From mb-shell.bat, type the following. $ cd my/existing/project
$python -m mindbender.inventory --load  This command will create the .inventory.toml and .config.toml files for the project whose directory you are currently in. .inventory.toml schema = "mindbender-core:inventory-1.0" [assets.Bruce] label = "Bruce Wayne" [film.1000] edit_in = 101 edit_out = 321 [film.2000] edit_in = 101 edit_out = 182  Go ahead and edit this file to your liking, adding and removing assets as you please, along with adding metadata to it. .config.toml schema = "mindbender-core:config-1.0" [metadata] name = "hulk" fps = 25 resolution_width = 1920 resolution_height = 1080 label = "The Hulk" [apps.maya2016] label = "Autodesk Maya 2016" [apps.nuke10] label = "The Foundry Nuke 10.0" [tasks.modeling] [tasks.animation] [tasks.rigging] [tasks.lookdev] [tasks.layout] [template] work = "{root}/{project}/{silo}/{asset}/work/{task}/{user}/{app}" publish = "{root}/{project}/{silo}/{asset}/publish/{subset}/v{version:0>3}/{subset}.{representation}"  Once you are happy with your changes, go ahead and save it. $ python -m mindbender.inventory --save


This will take the changes you made and update the database accordingly.

# Dependencies

Until now, all dependencies have been pure Python and cross-compatible with all operating systems and Python 2 and 3.

I’ll go great lengths to delay having to walk this path for as long as possible, but just to give you (and myself) a taste of the inevitable future as we scale, here’s the gist of the cliff that lies ahead.

With the inclusion of MongoDB, we now have our first dependency that no longer fit into this narrow requirement.

• PyMongo contains a (optional) compiled extension
• PyMongo cannot be vendored, as it refers to itself via the global namespace, e.g. from pymongo import ...
• PyMongo has dependencies in the global scope, one of which contains an (optional) compiled extension. This makes it impossible to vendor without altering the library.

On top of this, in addition to being dependent on application suites like Autodesk Maya, we now introduce MongoDB which requires not only a compiled version per OS, but also a dedicated server with guaranteed up-time.

For these reasons, we say welcome to the world of dependency management.

Dependency Type Examples
Application Maya, Nuke, MongoDB
Library PyMongo, PyQt5

### Applications

Application-type dependencies are currently managed manually, either by installing the software by hand on each computer of by writing a pre-installed image to disk.

As we scale, the overhead of this approach will likely overcome the gain. This is where external tooling - known as IT Automation - comes into play.

• https://ansible.com
• https://chef.io
• https://puppet.com

Listed in no particular order, what these do is automate the installation and updating of software across a (large) network of computers. This is what companies such as Framestore and friends use to update 500+ workstations and ensuring the have an identical set-up.

### Libraries

Software libraries, in addition to requiring all of what applications do, often also depend the application in which they are used, in addition to also depend on one another. This increases complexity by two orders of magnitude, because now you need an individual library for every application.

• OS
• Application
• Library
• Dependency

For example, for PySide to run everywhere, let’s say you have 3 operating systems, 3 applications, 3 versions of each application and 3 libraries that depend on different versions of PySide.

3 * 3 * 3 * 3 * PySide = 81 individual versions


That’s right, to fulfill the requirements of this environment, we’ll need to download and compile 81 individual versions of PySide, which means we’ll need a dozen individual versions of various compilers for the operating systems and of course, booting and actually running the compilation in the given operating system because compiling for Linux from Windows isn’t hip.

Now multiply that for every dependency, in this case PyMongo and PyQt5.

Requirements

• Each dependency MUST run on any OS
• Each dependency MUST run on any Application
• Each dependency MUST run with any dependent library

Compiled dependencies require compilation for every OS, multiplied by the number of applications used. For example, in order to use PyMongo with Maya on Windows, it must be explicitly compiled for it. And then another is compiled for Maya on Linux, and Nuke on Windows and so forth.

Our options

• pip
• anaconda
• ecosystem
• rez
• roll our own

Each of these have their own pros/cons, the first candidate I’ll try within the coming weeks is Rez.

### Delay

For this pipeline, I’ve carefully chosen to depend on libraries without the requirement on compiled extensions, with a few exceptions.

• PyQt5, we’ll need this for every OS, but only for standalone use with no inter-dependency on other libraries. Which means we’ll need 3 versions, maximum.
• PyMongo, these offer compiled extensions, but they are optional and included for performance. I assume performance without this extension is will be acceptable and expect a seamless transition for both of these once we introduce proper dependency management.
• Mongo Server, We’ll need this on one OS, on a single computer, of a single version. With IT automation in place, I expect another seamless transition between versions that won’t affect surrounding tooling.

### Choosing a Database

There are quite a few choices out there with regards to hosting lots of small data, like the start and end frame of a shot or frames per second of a project.

Picking the right one is challenging and can take time. Knowing which one fits best I think is accomplished by first using it, and that’s the part that takes time. The hard bit is working through the kinks and keeping and eye on what the alternatives does better (or worse).

Based on ease of use, I’ve chosen to try MongoDB first.

database.insert_one({
"type": "project",
"name": "Hulk"
})
database.insert_one({
"type": "Asset",
"name": "Bruce",
"parent": "id1"
})

# Find a document where "type" is "asset", and "parent" is "id1"
asset = database.find_one({"type": "asset"})
assert asset["name"] == "Bruce"


The above is the actual syntax for interacting with this particular database, simple.

Here are a few other considerations I made, relative my impression of MongoDB.

Database Notes
MongoDB Document database, prevalent and well documented. It isn’t perfect..
OrientDB Graph/document database, with events, but less known and less documented.
Elasticsearch Relational database, free database but authentication (x-pack) is a paid addon.
RethinkDB Like Mongo, but better and with support for events. Alas, it was recently discontinued..
Postgres Relational database, tried and true. Used by ftrack at one point.
Node4j Graph database, possibly a good fit for the hierarchical data (project/asset/subset/version), but a little frighteningly new and also not free.