Machine Learning: The Hard Parts (ok fine — A Hard Part)
It’s a rite of passage in the ML world to get entangled in the hair-tearing depths of dependency management, and I have a feeling that I’m not the only one in this boat.
Raise your hand if you have ever shared some awesome ML code with someone, only to have them tell you something along the lines of “I’m getting a ModuleNotFoundError”.
For me, this became history on my previous project, thanks to a command-line tool that we used: 🎉🎉🎉 batect 🎉🎉🎉! In this article, I’ll show you how batect can simplify your life and improve developer productivity. I’ll go in this order:
- Why should we care about batect?
- What is batect, and what does it do? (in a nutshell)
- How can I use it in my project? (code-along time!)
To illustrate my point, our code-along exercise will get you to train a deep learning model using tensorflow.keras and deploy the model to Google Cloud Platform (GCP). (Credits: GCP “hello-keras” tutorial)
1. Why should we care about batect?
Have you ever:
- Cloned a really cool repo doing some really cool ML, hoping to try it out, only to get stuck even before you start because of something like error: command ‘gcc’ failed with exit status 1?
- Accidentally installed pandas into your OS-level Python because you forgot to activate a virtual environment?
- Spent more than a day getting set up on a new project?
A few years ago, I was on a project where it took me one whole week just to get the project set up. I spent so many hours figuring out how to uninstall whatever versions of Python and Node.js I had so that I could install the specific versions of Python (I think it was 3.6.8) and Node.js that the project required on my Mac.
We eventually optimised the whole process using some clever shell scripts and got setup time down to one day. It was all dandy until, one day, a Windows user joined our team. He was the first on our team to use Windows, and we spent many days solving the problem all over again to get everything set up on his computer. (You could flip the Mac and Windows users in this equation and the story would still ring true.)
Imagine this alternate reality: you’re onboarding a new teammate, and your instructions to them are simply to “check out the repo, run ./go, and you’re done”.
batect gives your team this capability, as I hope to show you in the remainder of this article.
2. batect: a command-line tool that lets you easily dockerise any task (e.g. training an ML model, or running unit tests)
batect lets you define your development tasks (testing, linting, training a model, etc.), and it’ll run those tasks within Docker containers that you specify. Because it runs your task in a Docker container, the host machine (i.e. the computer that you’re using to run the task) doesn’t need any dependency other than Docker (and Java, which batect itself needs).
This gives you three huge benefits:
- Time saved. If it works on one computer, it will work on any other computer (e.g. your teammates’ machines, or your continuous integration (CI) pipeline)
- Just Works™ on macOS, Windows and Linux
- Simplified dev setup. Even if you’re a shell scripting guru, this means you no longer need to figure out how to write a setup script that works on three operating systems for every. single. project.
3. How do I use it? (It’s code-along time!)
Let’s see batect in action! Feel free to code along and experience batect’s awesomeness for yourself 😎
1. Check out the code and set up
git clone https://github.com/davified/batect-tensorflow-demo

# install dependencies needed by batect (i.e. Docker and Java):
bin/go.sh
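Incidentally, bin/go.sh only needs to take care of host-level dependencies. The sketch below is an illustrative guess at what such a go script could check for; the actual script in the demo repo may do more (e.g. install things for you):

#!/usr/bin/env bash
# Illustrative go script: verify that the only host-level dependencies
# (Docker, plus Java, which batect needs) are available before anything else.
set -euo pipefail

command -v docker >/dev/null 2>&1 || { echo "Please install Docker first: https://docs.docker.com/get-docker/"; exit 1; }
command -v java >/dev/null 2>&1 || { echo "Please install Java (batect needs a JVM to run)"; exit 1; }
echo "All good! You can now run ./batect <task>"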
2. Install dependencies for our ML project using batect
# install dependencies
./batect setup
setup is a custom batect task that I defined in batect.yml (see YAML snippet below). Essentially, it runs bin/setup.sh (a script that I wrote for this project to pip install our dependencies) within a Docker container that has all the dependencies needed by tensorflow (e.g. python3, gcc, g++, etc.).
# batect.yml
containers:
  python_tensorflow: # this can be named anything
    image: tensorflow/tensorflow:2.1.0-py3
    working_directory: /code
    volumes:
      - local: .
        container: /code

tasks:
  setup:
    description: Setup and install dependencies
    run:
      container: python_tensorflow
      command: bin/setup.sh
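For reference, bin/setup.sh could look roughly like the sketch below. This is an illustrative sketch rather than the repo’s exact script; it assumes the Python dependencies are pinned in a requirements.txt, and it creates the virtual environment inside the mounted /code directory so that it survives between container runs:

#!/usr/bin/env bash
# Illustrative sketch of bin/setup.sh: create a virtual environment in the
# mounted project directory and pip install the project's dependencies.
set -euo pipefail

python3 -m venv .venv              # lives in /code, i.e. on the host volume
source .venv/bin/activate

pip install --upgrade pip
pip install -r requirements.txt    # assumes dependencies are listed here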
3. Train ML model
./batect train_model
Let’s take a look at the definition of the train_model task:
# batect.yml
containers:
  # ... (same as before)

tasks:
  train_model:
    description: Train ML model
    run:
      container: python_tensorflow
      command: bin/train-model.sh
Because this is a tutorial about batect, I want to highlight what just happened:
- We ran: ./batect train_model
- In batect.yml, train_model is defined as: run bin/train-model.sh in the tensorflow/tensorflow:2.1.0-py3 Docker container
- bin/train-model.sh: activates the virtual environment (created in ./batect setup) and runs python -m src.trainer.task <...args>, which trains a deep learning model and saves it to the local-training-output/ directory (a sketch of what such a script could look like follows below)
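Since the exact arguments passed to src.trainer.task aren’t shown here, treat the following as an illustrative sketch of bin/train-model.sh rather than the repo’s actual script; the --job-dir flag is an assumption borrowed from the GCP hello-keras tutorial:

#!/usr/bin/env bash
# Illustrative sketch of bin/train-model.sh: train the model inside the
# tensorflow container and write the output to local-training-output/
set -euo pipefail

source .venv/bin/activate                          # created by ./batect setup

# --job-dir is an assumed argument; the real script may pass different flags
python -m src.trainer.task --job-dir=local-training-output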
4. Run unit tests
./batect unit_test
Let’s make this a mini-exercise: can you trace through unit_test in batect.yml and figure out how it did what it did?
# batect.yml
containers:
  # ... (same as before)

tasks:
  unit_test:
    description: Run unit tests
    run:
      container: python_tensorflow
      command: bin/unit-test.sh
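Once you’ve traced it through, here’s a sketch of what bin/unit-test.sh could contain, for comparison. I’m assuming pytest as the test runner, which may not match the demo repo exactly:

#!/usr/bin/env bash
# Illustrative sketch of bin/unit-test.sh: run the test suite inside the same
# tensorflow container that we use for training.
set -euo pipefail

source .venv/bin/activate          # created by ./batect setup
python -m pytest                   # assumes tests are discoverable by pytest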
At this point, I want to highlight that we’ve just trained a deep learning model using tensorflow and run automated tests for it, without growing a single grey hair over messy OS-level dependencies (e.g. gcc, g++) on macOS, Windows or Linux!
5. Let’s deploy this bad boy!
If you’re coding along for this bit, you’ll need to create a GCP project and download the credentials for deploying resources to your GCP project (instructions here)
Your palms might be getting sweaty just thinking about deployment: installing the gcloud CLI and figuring out all the messy details. Let’s see what deployment can look like with batect:
# train model on GCP
./batect train_model_on_gcloud

# wait for the model training job to complete:
# https://console.cloud.google.com/ai-platform/dashboard?project=YOUR_PROJECT_ID

# deploy model version
# (you can find the JOB_NAME in the terminal output of the ./batect train_model_on_gcloud command)
JOB_NAME='REPLACE_ME_WITH_JOB_NAME' ./batect deploy_model
Notice a few things:
- You didn’t need to install gcloud to call gcloud commands in bin/deploy-model.sh (because we told batect to use the gcloud container to run that)
- The complexity of deploying an ML model to GCP AI Platform is encapsulated in bin/deploy-model.sh. You’ve solved this problem once, and future teammates will never need to worry about it again. They (or, more probably, your team’s CI pipeline) will simply need to run ./batect deploy_model to deploy a new ML model (a sketch of the kind of script this could be follows below)
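To give you a flavour of what bin/deploy-model.sh might encapsulate, here’s a hypothetical sketch. The model, version and bucket names are placeholders I’ve made up, and the export path is assumed from the hello-keras tutorial; the real script in the repo may well differ:

#!/usr/bin/env bash
# Hypothetical sketch of bin/deploy-model.sh: serve the SavedModel produced by
# the training job on GCP AI Platform.
set -euo pipefail

MODEL_NAME="keras_demo_model"                          # placeholder
VERSION_NAME="v1"                                      # placeholder
MODEL_DIR="gs://YOUR_BUCKET/${JOB_NAME}/keras_export"  # assumed export path

# Create the model resource (only needs to happen once)
gcloud ai-platform models create "${MODEL_NAME}" --regions=us-central1 || true

# Create a new version that serves the exported SavedModel
gcloud ai-platform versions create "${VERSION_NAME}" \
  --model="${MODEL_NAME}" \
  --origin="${MODEL_DIR}" \
  --runtime-version=2.1 \
  --python-version=3.7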
And as a bonus, because batect encapsulates the execution runtime using Docker, we don’t need to spend time configuring our CI agents (probably Linux machines) to have the same dependencies as our machines (probably macOS or Windows). You can see it for yourself: the CI pipeline for this project just calls batect and doesn’t need to install any project-level dependencies (not even python3!)
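In practice, a CI job for this project can boil down to running the same batect tasks we ran locally; the exact CI tool and job layout are up to you:

# hypothetical CI steps: the agent only needs Docker and Java preinstalled
./batect setup
./batect unit_test
./batect train_model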
In a nutshell: batect works on my machine, and also everywhere else (as long as the machine has Docker and Java). Woohoo! 🎉🎉🎉
That’s all folks!
In less than an hour, we set up our computer for development, ran unit tests, trained a deep learning model using tensorflow and deployed it to GCP. And we did it without any of the pain that I described at the start of this article. (Time for a drink / donut!)
In my view, batect enhanced my developer experience and the productivity of my team. This tool made it a lot easier for us to adopt agile practices such as continuous integration and continuous delivery, and I’ll definitely use it for my next project 😎
- I hope you’ve enjoyed this article. I’d love to hear your thoughts and feedback! Let me know by creating issues on the GitHub project or by leaving a comment here. Cheers!
- Want to use it in your project? Check out batect’s Getting Started guide, or get some inspiration from how I set up a project template from scratch
- Love everything here except the overkill with tensorflow? Check out batect-ml-template!