Building a Docker Container Image – CHTC

Linux containers are a way to create a self-contained environment that includes software, libraries, and other tools. CHTC currently supports running jobs inside Docker containers. This guide describes how to create a Docker image that you can use to run jobs in CHTC. For information on using this image for jobs, see our Docker jobs guide.

Please note that all the steps below should be executed on your own computer, not on CHTC.

Docker images can be created using a special file format called “Dockerfile”. This file has commands that allow you to:

  • Use a pre-existing Docker image as a base
  • Add files to the image
  • Run installation commands
  • Set environment variables

You can then “build” an image from this file, test it locally, and push it to DockerHub, where HTCondor can use the image to create containers in which to run jobs. Different versions of the image can be labeled with different version “tags”.

This guide has:

  1. Step-by-step instructions
  2. Examples

1. Configure Docker on your computer

If you haven’t already, create a DockerHub account and install Docker on your computer. You’ll want to look for Docker Community Edition for your operating system. Docker sometimes takes a while to start, especially the first time. Once Docker starts, it won’t open a window; you will only see a small whale-and-container icon in one of your computer’s toolbars. To use Docker, you’ll need to open a command-line program (such as Terminal or Command Prompt) and run commands there.

2. Explore Docker containers (optional)

If you’ve never used Docker before, we recommend exploring a pre-existing container and testing the installation steps interactively before creating a Dockerfile. See the first half of this guide: Exploring and testing a Docker container

3. Create a Dockerfile

A Dockerfile is a plain text file with keywords that add items to a Docker image. There are many keywords that can be used in a Dockerfile (documented on the Docker website here: Dockerfile keywords), but we’ll use a subset of these keywords following this basic schema:

  • Starting point: Which Docker image do you want to start with?
  • Additions: What needs to be added? Folders? Data? Other software?
  • Environment: What variables (if any) are set as part of the software installation?

Create the file

Create a blank text file named Dockerfile. If you plan to create multiple images for different parts of your workflow, you should create a separate folder for each new image with a Dockerfile inside each of them.

Choose a base image with FROM

You usually don’t want to start building your image from scratch. Instead, you’ll want to choose a “base” image to add things to.

You can find a base image by searching DockerHub. If you are using a scripting language such as Python, R, or Perl, you can start with the “official” image for that language. If you’re not sure what to start with, a basic Linux image (Debian, Ubuntu, and CentOS are common examples) is often a good place to start.

Images often have labeled versions. In addition to choosing the image you want, make sure you choose a version by clicking on the “Tags” tab of the image.

Once you’ve decided on a base image and version, add it as the first line of your Dockerfile, like this:

    FROM repository/image:tag

Some images are maintained by DockerHub itself (these are the “official” images mentioned above) and do not have a repository. For example, to start with CentOS 7, you could use

    FROM centos:7

while starting from one of HTCondor’s HTC Jupyter notebook images might look like

    FROM htcondor/htc-minimal-notebook:2019-12-02

When possible, you should use a specific tag (not the automatic latest tag) in FROM statements.

Here are some base images that you may find useful for building:

  • CentOS
  • Ubuntu
  • Python / Anaconda / Miniconda
  • R / Tidyverse
  • Tensorflow
  • PyTorch

Install software packages with RUN

The next step is the most challenging. We need to add commands to the Dockerfile to install the desired software. There are a few standard ways to do this:

  • Use a Linux package manager. This is usually apt-get for Debian-based containers (e.g. Ubuntu) or yum for RedHat Linux containers (e.g. CentOS).
  • Use a software-specific package manager (such as pip or conda for Python).
  • Follow the software’s own installation instructions (usually a progression of configure, make, make install).

Each of these options will be prefixed with the RUN keyword. You can chain commands together with the && operator. To break a long command across lines, place a backslash (\) at the end of each line. RUN can execute any command inside the image during the build, but keep in mind that the only things that persist in the final image are changes to the file system (new and modified files, directories, etc.).
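As a minimal sketch of these points, the Dockerfile below chains commands with && and backslashes, and then shows that shell state such as the current directory or an exported variable does not carry over from one RUN instruction to the next. The ubuntu:20.04 base image, the cleanup step, and the MY_SETTING variable are assumptions chosen only for illustration.

```dockerfile
# Assumed base image, used here only for illustration
FROM ubuntu:20.04

# One RUN instruction made of three chained commands.
# The final cleanup is optional; it removes the package index files
# so they are not stored in the image.
RUN apt-get update \
 && apt-get -y install wget \
 && rm -rf /var/lib/apt/lists/*

# Each RUN starts a fresh shell, so this cd and export affect
# only this one instruction (MY_SETTING is a hypothetical variable).
RUN cd /tmp && export MY_SETTING=1

# Here the working directory is no longer /tmp and MY_SETTING is unset,
# because neither was a change to the file system.
RUN pwd && echo "MY_SETTING=${MY_SETTING:-unset}"
```

If a later step needs a variable or a working directory, the ENV and WORKDIR keywords record it in the image itself rather than in a single shell session.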

For example, suppose your job executable ends up running Python and needs access to the numpy and scipy packages, as well as the unix wget tool. Below is an example of a Dockerfile that uses RUN to install these packages using the system package manager and Python’s built-in package manager.

    # Build the image based on the official Python version 3.8 image
    FROM python:3.8

    # Our base image is based on Debian, so we use apt-get as the system package manager
    # Use apt-get to install wget (-y answers the confirmation prompt,
    # which nobody can answer during a build)
    RUN apt-get update \
     && apt-get -y install wget

    # Use RUN to install Python packages (numpy and scipy) via pip, Python's package manager
    RUN pip3 install numpy scipy

If you need to copy specific files (such as source code) from your computer into the image, place the files in the same folder as Dockerfile and use the COPY keyword. You can also download files within the image using the RUN keyword and commands like wget or git clone.

For example, suppose you need to use JAGS and the rjags package for R. If you have the JAGS source code downloaded next to the Dockerfile, you can compile and install it inside the image like this:

    FROM rocker/r-ver:3.4.0

    # COPY the JAGS source code into the image under /tmp
    COPY JAGS-4.3.0.tar.gz /tmp

    # RUN a series of commands to unpack the JAGS source code, compile it, and install it
    RUN cd /tmp \
     && tar -xzf JAGS-4.3.0.tar.gz \
     && cd JAGS-4.3.0 \
     && ./configure \
     && make \
     && make install

    # Install the R package rjags
    RUN install2.r --error rjags

Configure the environment with ENV

The software may depend on certain environment variables being configured correctly.

A common situation is that, if you install a program in a custom location (such as a home directory), you may need to add that directory to the image’s system PATH. For example, if you installed some scripts in /home/software/bin, you could use

    ENV PATH="/home/software/bin:${PATH}"

to add them to your PATH.
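To make the PATH example concrete, here is a small sketch that copies a script into /home/software/bin and then adds that directory to PATH. The script name my_tool.sh is hypothetical (it stands for any executable script sitting next to the Dockerfile), and the python:3.8 base image is only an assumption.

```dockerfile
# Assumed base image, used here only for illustration
FROM python:3.8

# my_tool.sh is a hypothetical script located in the same folder as this Dockerfile.
# COPY creates /home/software/bin in the image if it does not exist yet.
COPY my_tool.sh /home/software/bin/my_tool.sh
RUN chmod +x /home/software/bin/my_tool.sh

# Prepend the custom directory to the PATH stored in the image
ENV PATH="/home/software/bin:${PATH}"

# Optional check: the build stops here if the script cannot be found by name
RUN command -v my_tool.sh
```

Because ENV is recorded in the image, the updated PATH applies both to later build steps and to anything run in containers created from the image.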

You can set multiple environment variables at once:

    ENV DEBIAN_FRONTEND=noninteractive \
        LC_ALL=en_US.UTF-8 \
        LANG=en_US.UTF-8 \
        LANGUAGE=en_US.UTF-8

4. Build, name, and tag the image

So far we haven’t built the image; we’ve just been listing instructions for how to build it in the Dockerfile. Now we are ready to build the image!

First, decide on a name for the image, as well as a tag. Tags are important for tracking which version of the image you’ve created (and are using). A simple tagging scheme would be to use numbers (e.g. v0, v1, etc.), but you can use any system that makes sense to you.

Because HTCondor caches Docker images per tag, we strongly recommend that you never use the latest tag and that you always build images with a new, unique tag that you then explicitly specify in new jobs.

To build and tag the image, open a terminal (Mac/Linux) or command prompt (Windows) and navigate to the folder containing the Dockerfile:

    $ cd directory

(Replace directory with the path to the appropriate folder.)

Then, make sure Docker is running (there should be an icon in the status bar, and running docker info shouldn’t indicate any errors) and run:

    $ docker build -t username/imagename:tag .

Replace username with your Docker Hub username and replace imagename and tag with the values of your choice. Note the . at the end of the command (to indicate “the current directory”).

If you get errors, try to determine what you may need to add or change to your Dockerfile, and then run the build command again. Debugging a Docker build is largely the same as debugging any software installation process.

5. Test locally

This page describes how to interact with your new Docker image on your own computer, before attempting to run a job with it in CHTC:

  • Exploring a Docker container on your computer

6. Push to DockerHub

Once the image has been successfully built and tested, you can push it to DockerHub to make it available for running jobs in CHTC. To do this, run the following command:

    $ docker push username/imagename:tag

(where you once again replace username/imagename:tag with what you used in the previous steps).

The first time you push an image to DockerHub, you may need to run this command beforehand:

    $ docker login

It should prompt you for your DockerHub username and password.

Reproducibility

If you have a free account on Docker Hub, any container image you’ve pushed there will be scheduled for deletion if it is not used (pulled) at least once every 6 months (see the Docker Terms of Service).

For this reason, and just because it’s a good idea in general, we recommend creating an archive file of your container image and placing it in any space you use for long-term, backed up storage of data and research code.

To create an archive file of a container image, use this command, renaming the archive file and container to reflect the names you want to use:

    $ docker save --output archive-name.tar username/imagename:tag

It’s also a good idea to archive a copy of the Dockerfile used to generate a container image along with the archive file of the container image itself.

7. Running jobs

Once your Docker image is in Docker Hub, you can use it to run jobs on CHTC’s HTC system. See this guide for more details:

  • Running Docker jobs in CHTC

Examples

This section contains several sample Dockerfiles covering more advanced use cases.

Install a custom Python package from GitHub

Let’s say you have a custom Python package hosted on GitHub, but it’s not available on PyPI. Since pip can install packages directly from git repositories, you can install your package like this:

    FROM python:3.8
    RUN pip3 install git+https://github.com/<RepositoryOwner>/<RepositoryName>

where you would replace <RepositoryOwner> and <RepositoryName> with the desired targets.
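pip also accepts a tag, branch, or commit after an @ at the end of a git URL, which may help keep rebuilt images consistent with one another. The sketch below assumes you want that kind of pinning; <TagOrCommit> is a hypothetical placeholder in the same style as the two placeholders above.

```dockerfile
# The full python:3.8 image is expected to include git, which pip needs here;
# slimmer variants of the image may not.
FROM python:3.8

# Replace all three placeholders before building. <TagOrCommit> is a
# hypothetical placeholder for a git tag, branch name, or commit hash.
# The URL is quoted so the shell does not treat < and > as redirections
# if a placeholder is accidentally left in.
RUN pip3 install "git+https://github.com/<RepositoryOwner>/<RepositoryName>@<TagOrCommit>"
```

Pinning to a tag or commit follows the same reasoning as using explicit image tags: a rebuild months later installs the same code instead of whatever is newest on the default branch.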

QIIME

This Dockerfile installs QIIME2 based on these instructions. It is assumed that the 64-bit Linux miniconda installer has been downloaded to the directory with the Dockerfile.

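    FROM python:3.6-stretch

    # COPY the Miniconda installer into the image under /tmp
    COPY Miniconda3-latest-Linux-x86_64.sh /tmp

    # Create a home directory for the installation and point HOME at it
    RUN mkdir /home/qiimeuser
    ENV HOME=/home/qiimeuser

    # Install Miniconda non-interactively, then create the QIIME2 environment
    RUN cd /tmp \
     && ./Miniconda3-latest-Linux-x86_64.sh -b -p /home/qiimeuser/minconda3 \
     && export PATH=/home/qiimeuser/minconda3/bin:$PATH \
     && conda update conda \
     && conda create -n qiime2-2017.10 --file https://data.qiime2.org/distro/core/qiime2-2017.10-conda-linux-64.txt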
