Deploying a Containerized Streamlit Application in 13(ish) Steps

One of the biggest challenges in data science is deploying a machine learning (ml) application to production. This phase is often complicated by the fact that the code base is sensitive to the operating system it’s running on, and that there can be a number of other complex dependencies needed for a machine learning application to run. Enter docker.

Docker in a Nutshell

Docker is used to package other software into standardized units, called containers, for development, shipment and deployment. These containers differ from the more familiar virtual machine (VM) in that containers are an abstraction at the application layer and thus share the same host operating system (OS).

Comparison of Containers (Right) vs Virtual Machines (Left). Credit: Docker Documentation

As a result, containers are much more lightweight and portable while maintaining interoperability across platforms. The blueprint for a container is called a docker image. When a docker image runs, a container is created. Many files constitute a docker image, but chief among them is the Dockerfile. From the documentation:

A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.

To summarize:

The relationship between a docker file, docker image, and docker container. Credit: JFrog

Creating a containerized ml application alleviates one of the main headaches in the deployment process.

Containerizing a Streamlit Application

To explore how to use docker, we’ll first need a use-case. In a previous project, I created a ML-powered Streamlit application that, upon user input, provides the usage summary, churn likelihood, and drivers of churn for a single customer:

A churn prediction application for the fictitious digital media company, Sparkify

I’ve wanted to explore deploying this dashboard for a while and thought this would be an excellent opportunity to learn docker.

The technical details of the project don’t concern us here, we’re only interested in containerizing and deploying the Streamlit dashboard. However, if you’re interested in Spark, Streamlit, SHAP, or Azure Databricks, there is a detailed README containing the project summary, analysis, and results. The Github is available here.

Steps:

  1. Install docker.
  2. Clone the Github repository locally and cd into the root of the project folder:
git clone https://github.com/smit5490/sparkify-customer-retention.git

3. Review the Dockerfile in the repository:

Dockerfile to containerize Streamlit application

Luckily, it’s relatively straightforward to deploy a Streamlit dashboard using docker. Let’s walk through each line in the Dockerfile:

FROM python:3.8

The FROM instruction initializes a new build stage and sets the Base Image for subsequent instructions. A valid Dockerfile must start with a FROM instruction. In our case, we are using the official Python v3.8 image as our base image from docker hub.

COPY . /app

The COPY instruction copies new files or directories from <src> and adds them to the filesystem of the container at the path <dest>. In our case, we’re copying the local git repository contents to the app folder in the container.

WORKDIR /app

The WORKDIR instruction sets the working directory for any RUNCMDENTRYPOINTCOPY and ADD instructions that follow it in the Dockerfile. In our case, we’re setting our working directory to be our app folder we just created and populated.

RUN pip3 install -r docker_requirements.txt

The RUN instruction will execute any commands in a new layer on top of the current image and commit the results. The resulting committed image will be used for the next step in the Dockerfile. Here, we’re installing the Streamlit dashboard’s Python package dependencies.

EXPOSE 8501

The EXPOSE instruction informs Docker that the container listens on the specified network ports at runtime. When running our container locally, we’ll use port 8501. When deploying this application, Heroku uses its own port which can change from run to run. A PORT environmental variable is available at run time to route applications appropriately.

CMD streamlit run --server.port $PORT app.py

The main purpose of a CMD is to provide defaults for an executing container. There are many ways to run CMD. In our case, we’re using the shell form. You can read more about this command here. Here, we’re telling the container to run the app.py Streamlit application on a specified PORT (per the environmental variable).

There are other docker-related files in the repository including a .dockerignore which, like a .gitignore file, ignores the specified files and folders from being copied over into the image. Excluding any unnecessary files will reduce the image size and increase portability.

4. Build an image. On the command line (in the project’s root directory), type:

docker build -t sparkifyapp:latest -f Dockerfile .

This command tells docker to build an image with a name:tag (-t) of sparkifyapp:latest using the file (-f) Dockerfile in the working directory (.) It might take a few minutes for the image to be created:

Output after building an image.

5. Run the image:
docker run -p 8501:8501 -e PORT=8501 sparkifyapp:latest 

This command tells docker to run the sparkifyapp:latest image, expose port (-p) 8501, and create an environmental variable (-e), PORT, and set it to 8501.

6. Open the Streamlit application by navigating to http://localhost:8501 on any web browser.

A churn prediction application for the fictitious digital media company, Sparkify

To get a list of running containers type docker ps. To stop a container type docker stop NAME. The NAME can be found in the container list.

It’s great that we were able to build and run a containerized application locally, but we should also deploy this container somewhere else. An easy place to test things out is Heroku.

Deploying to Heroku

Heroku lets you deploy, run and manage applications written in Ruby, Node.js, Java, Python, Clojure, Scala, Go and PHP.

7. Sign-up for Heroku. There is a free account option which allows users to spin up small applications and is sufficient for many simple proof-of-concepts and experimentation.

8. Download the Heroku CLI. I used this on a Mac, so I installed via homebrew:

brew tap heroku/brew && brew install heroku

Login to Heroku:

heroku login

9. Create a Heroku Application

heroku create <app_name>

The application name must be globally unique (to Heroku) and should be the same name as your image. It will also be used as part of the url to the streamlit dashboard:

https://<app_name>.herokuapp.com

10. Login to the Container Registry:

heroku container:login

In order to deploy a containerized application on Heroku, we must register our container with Heroku’s Container Registry.

11. Push docker image to Heroku’s Container Registry. First tag the docker image and push it to the registry with this naming template:

docker tag sparkifyapp:latest registry.heroku.com/<app_name>/web
docker push registry.heroku.com/<app_name>/web

There’s one major quirk at this step that I had as a result of using a Mac with Apple silicon; the docker image didn’t compile correctly without specifying what platform to use when building the image. Stack Overflow came to the rescue and provided some guidance on how to re-build the image before tagging and pushing to the registry:

docker build --platform=linux/amd64 -t sparkifyapp:latest -f Dockerfile .

After re-building the docker image with the platform specified, I was able to push it to the registry. Depending on your internet connection, this might take a while. It’s about a 2GB image that gets pushed to the registry.

12. Release container to the web.

heroku container:release web

13. Access the deployed dashboard. Navigate to the web url or type heroku open. My application can be viewed at: https://sparkifyapp.herokuapp.com Sometimes it might take a minute for the application to render; Heroku’s compute instances need a little time to spin up the container.

And that’s it! In 13(-ish) steps, it’s possible to deploy a containerized data science application on Heroku.

Additional Readings:

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: