Deep Learning – from Prototype to Production

Erez Kashi

Erez Kashi

Erez is a large scale and real time Machine Learning engineer.

Erez Kashi | 30 Jul 2017 | Big Data

Tags: deep learning, docker, Nvidia, python, tensorflow

About 8 months ago my team and I were facing the challenge of building our first Deep Learning infrastructure. One of my team members (a brilliant data scientist) was working on a prototype for our first deep model. The time arrived to move forward to production. I was honored to lead this effort.

Our achievements: we built an infrastructure that ranks over 600K items/sec, our deep models have beaten the previous models by a large margin. This pioneer project has led the way for the subsequent Deep Learning projects at Taboola.

So the prototype was ready, and I was wondering: how to go from a messy script to a production ready framework? In other words, if you are into establishing a deep model pipeline this post is for you.

This blog post is focused on the training infrastructure, without  the inference infrastructure.


Assume you have basic knowledge in:

  • Python
  • Train Machine Learning model
  • TensorFlow
  • Docker

Let’s take one step back and think…Ideally such a training infrastructure should support:

  • Comfortable development environment
  • Portability among environments
  • Run on CPU/GPU
  • Simple & Flexible configuration
  • Support Multiple Network Architectures
  • Packaging the model
  • Hyper Parameter Tuning

One mantra to keep in mind:

‘Speedy Gonzales Time to Market’ == ‘no over engineering!’ == ‘Keep it simple’

Enough with the baloney, let’s get down to business:

Technology stack

BigQuery, Google Storage (GS), Python, Pandas, TensorFlow, Docker, and some other Python nice libraries: fabric, docker-py, pyformance & spacy

Comfortable development environment

This makes the whole team move faster, plus putting a new team member into action is super easy. To achieve this we did 2 things:

The first was using a small dataset for sanity training, a JSON file with 2K rows. Used both for testing on a developer machine and for sanity automation tests. This is very helpful for quick testing your code changes.

The second is a project management script (just one big Python script).

We implemented a simple command line tool using ‘fabric’ (Python library). Here are few example commands:

Using docker-py library was very useful for the Docker stuff.

CI tasks were also added to this script (used by Jenkins build agent).

High level flow

high level flow

Dataset is generated from raw data tables in BigQuery and stored as a JSON file in GS.

Then the training process kicks in: download the dataset, train the model, and export the model back to GS (to some other path).

Fit dataset into memory

Not always possible, but it was for us. Our dataset contains about 5M rows (~ 4GB JSON file). The JSON file is loaded into pandas’ DataFrame for pre-processing, then split into train & test. This approach is simple to implement, gaining fast training run time. If the dataset does not fit into the memory, a quick (maybe temporary) solution could be subsampling it.

Portability among environments

No more “It works on my machine”! By using Docker, training runs the same on the developer machine, automation and cloud environments. Training input & output is stored in GS, no matter where the training process ran.

Running on CPU/GPU

  • Install both TensorFlow GPU and CPU versions (in the project management script: use the GPU for production training and the CPU for development and testing)
  • Use NVIDIA Docker base image – note that the CUDA version should match the TensorFlow version (see more instruction on TensorFlow’s  website)

  • Use nvidia-docker command line tool – a thin wrapper for Docker command line. It enables the Docker container to use the GPU

Simple & flexible configuration

Flexible for research, live models & hyper parameter tuning. Configuration should be simple to reason – I have seen too many configuration mechanisms get out of hand. I think we found a fair trade off between flexibility and simplicity:

First, no hierarchies! (keep it as simple as possible).

For example:

Hierarchies are Bad:

Flat is Good:

Config values are set in 3 levels:

Level 1 – NetConfig class:

As you can see default values are hard coded as part of the configuration class.

Level 2 – variant_config.json:

Here we define our model variants. Only overriding non default configurations.

The configuration file is held with the code, and baked into the Docker image. A new Docker image is built for every configuration change.

Level 3 – environment variable:

Using environment variable enables overriding a specific configuration for a single training run. We use it for research and hyper parameter tuning.

Support Multiple Network Architectures

training flow image

Supporting multiple network architectures is a key concept for us. It enables many team members to work in parallel on different architectures (and set A/B tests on live traffic).

So how do we support multi network architectures? Let’s take a closer look into ‘build network graph’ step:

‘build_graph_deep_v1’ & ‘build_graph_rnn_v1’ are just static methods that contain  the TensorFlow code for defining the graph.

‘build_model_graph’ returns an OpsWrapper instance, holding the TensorFlow ops for the training loop:

Note that the ‘build_model_graph’ has 3 inputs: net_config (discussed earlier), dataset_shapes and tf_session. The dataset_shapes object is created during the preprocessing step. It contains features vector information that is used to build the network graph. For instance, the cardinality of a categorical feature is used to define the size of the embedding lookup table.

The tf_session in just an instance of tf.Session class.

Packaging the Model

Packaging the model TensorFlow files inside a zip with other extra files, can help debug and monitor the training process. What extra files did we store?

  • TensorBoard files
  • Graph image of train & test offline metrics
  • Offline metrics CSV:
  •            >  row for each epoch
  •            >  add all kinds of offline metrics, such as train & test error & loss
  • DeFacto configuration – used to train the model
  • Dataset metadata – for example dataset size
  • Other things you may find useful for troubleshooting & debugging

Hyper parameter tuning

Hyper parameter tuning is critical. A big improvement on model performance can be gained by tuning parameters such as, batch size, learning rate, embedding dimension, hidden layer depth, width, and so on….

We started by implementing a simple Python script that runs a random search on the hyper parameter space. It runs sequentially using a single GPU. We just ran it for days. To get faster and better results we moved to the cloud, running in parallel on multiple machines & GPUs.

All-in-all we had lots of fun working on this project (at least I did). We ended up with a relatively simple and straightforward infrastructure, being used by the whole team, raising somewhere between 10-15 live A/B tests in parallel.

That is it! I hope you find it useful.

Special thanks to my team: Yedid, Aviv, Dan, Stavros, Efrat and Yoel.