Linux containers are a way to create a self-contained environment that includes software, libraries, and other tools. CHTC currently supports running jobs inside Docker containers. This guide describes how to create a Docker image that you can use to run jobs in CHTC. For information on using this image for jobs, see our Docker jobs guide.
Please note that all the steps below should be executed on your own computer, not on CHTC.
Docker images can be created using a special file format called a “Dockerfile”. This file has commands that allow you to:
- use a pre-existing Docker image as a base
- add files to the image
- run installation commands
- set environment variables
You can then “build” an image from this file, test it locally, and push it to DockerHub, where HTCondor can use the image to create containers in which to run jobs. Different versions of the image can be labeled with different version “tags”.
This guide has:
- Step-by-step instructions
- Examples
1. Configure Docker on your computer
If you haven’t already, create a DockerHub account and install Docker on your computer. You’ll want to look for Docker Community Edition for your operating system. Docker can take a while to start, especially the first time. Once Docker starts, it won’t open a window; you will only see a small whale-and-containers icon in one of your computer’s toolbars. To use Docker, you’ll need to open a command-line program (such as Terminal or Command Prompt) and run commands there.
2. Explore Docker containers (optional)
If you’ve never used Docker before, we recommend exploring a pre-existing container and testing the installation steps interactively before creating a Dockerfile. See the first half of this guide: Exploring and testing a Docker container
3. Create a Dockerfile
A Dockerfile is a plain text file with keywords that add elements to a Docker image. There are many keywords that can be used in a Dockerfile (documented on the Docker website here: Dockerfile keywords), but we’ll use a subset of these keywords following this basic outline:
- Starting point: Which Docker image do you want to start with?
- Additions: What needs to be added? Folders? Data? Other software?
- Environment: What variables (if any) are set as part of the software installation?
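To make the outline concrete, here is a minimal sketch of how its three parts might map onto Dockerfile keywords. The base image tag, the hypothetical file my_script.py, and the /opt/app directory are illustrative placeholders, not CHTC requirements.

```dockerfile
# Starting point: the image to build on (use a specific tag)
FROM python:3.8

# Additions: copy a file from the Dockerfile's folder and install software
# (my_script.py is a hypothetical file name)
COPY my_script.py /opt/app/my_script.py
RUN pip3 install numpy

# Environment: variables that should be set in every container
ENV PATH="/opt/app:${PATH}"
```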
Create the file
Create a blank text file named Dockerfile. If you plan to create multiple images for different parts of your workflow, you should create a separate folder for each new image with a Dockerfile inside each of them.
Choose a base image with FROM
You usually don’t want to start building your image from scratch. Instead, you’ll want to choose a “base” image to add things to.
You can find a base image by searching DockerHub. If you are using a scripting language such as Python, R, or Perl, you can start with the “official” image of these languages. If you’re not sure what to start with, using a basic Linux image (Debian, Ubuntu, and CentOS are common examples) is often a good place to start.
Images often have labeled versions. In addition to choosing the image you want, make sure you choose a version by clicking on the “Tags” tab of the image.
Once you’ve decided on a base image and version, add it as the first line of your Dockerfile, like this:
    FROM repository/image:tag
Some images are maintained by DockerHub itself (these are the “official” images mentioned above) and do not have a repository name. For example, to start with CentOS 7, you could use
    FROM centos:7
while starting from one of HTCondor’s HTC Jupyter notebook images might look like
    FROM htcondor/htc-minimal-notebook:2019-12-02
When possible, you should use a specific tag (not the automatic latest tag) in FROM statements.
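Leaving off the tag does not mean “no version”: Docker substitutes the latest tag, which can point to a different image later on. A minimal sketch of the difference, using the official Python image as an example:

```dockerfile
# Avoid: with no tag, this is the same as FROM python:latest, and the
# image it refers to changes whenever a newer Python image is published
# FROM python

# Prefer: an explicit version tag (more specific tags, such as a full
# patch version, pin the image down even further)
FROM python:3.8
```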
Here are some base images that you may find useful to build on:
- CentOS
- Ubuntu
- Python / Anaconda / Miniconda
- R / Tidyverse
- Tensorflow
- PyTorch
Install packages with RUN
The next step is the most challenging. We need to add commands to the Dockerfile to install the desired software. There are a few standard ways to do this:
- Use a Linux package manager. This is usually apt-get for Debian-based containers (e.g. Ubuntu) or yum for Red Hat-based containers (e.g. CentOS).
- Use a software-specific package manager (such as pip or conda for Python).
- Follow the software’s own installation instructions (usually a sequence of configure, make, and make install).
Each of these approaches uses commands prefixed with the RUN keyword. You can chain several commands in one RUN instruction with the && symbol, and you can split a long instruction across lines by placing a backslash (\) at the end of each line. RUN can execute any command within the image during the build, but keep in mind that the only things that persist in the final image are changes to the file system (new and modified files, directories, etc.).
For example, suppose your job executable ends up running Python and needs access to the numpy and scipy packages, as well as the unix wget tool. Below is an example of a Dockerfile that uses RUN to install these packages using the system package manager and Python’s built-in package manager.
    # Build the image based on the official Python version 3.8 image
    FROM python:3.8

    # Our base image is Debian-based, so we use apt-get as the system package manager
    # Use apt-get to install wget (-y answers "yes" to the confirmation prompt)
    RUN apt-get update && apt-get install -y wget

    # Use RUN to install Python packages (numpy and scipy) via pip, Python's package manager
    RUN pip3 install numpy scipy
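For longer installations, the && and backslash conventions described above can be combined in a single RUN instruction. In this sketch (the base image and packages are just examples), apt’s downloaded package lists are deleted in the same RUN instruction that created them, which keeps them out of the final image; deleting them in a separate, later RUN would not shrink the image, because each instruction’s file-system changes are saved as their own layer.

```dockerfile
FROM ubuntu:20.04

# One RUN instruction split over several lines with trailing backslashes.
# && runs each command only if the previous one succeeded, so the build
# stops at the first failure.
RUN apt-get update && \
    apt-get install -y wget git && \
    rm -rf /var/lib/apt/lists/*
```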
If you need to copy specific files (such as source code) from your computer into the image, place the files in the same folder as Dockerfile and use the COPY keyword. You can also download files within the image using the RUN keyword and commands like wget or git clone.
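Both approaches can be sketched as follows. The folder name src and the <RepositoryOwner>/<RepositoryName> placeholders are hypothetical and should be replaced with your own; the python:3.8 image is used here because it already includes git.

```dockerfile
FROM python:3.8

# Copy a folder that sits next to the Dockerfile into the image
# ("src" is a hypothetical folder name)
COPY src /opt/src

# Or download code while the image is being built
# (replace the placeholders with a real repository)
RUN git clone https://github.com/<RepositoryOwner>/<RepositoryName>.git /opt/<RepositoryName>
```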
For example, suppose you need to use JAGS and the rjags package for R. If you have the JAGS source code downloaded next to the Dockerfile, you can compile and install it inside the image like this:
    FROM rocker/r-ver:3.4.0

    # COPY the JAGS source code into the image under /tmp
    COPY JAGS-4.3.0.tar.gz /tmp

    # RUN a series of commands to unpack the JAGS source code, compile it, and install it
    RUN cd /tmp && tar -xzf JAGS-4.3.0.tar.gz && \
        cd JAGS-4.3.0 && ./configure && make && make install

    # Install the R package rjags
    RUN install2.r --error rjags
Configure the environment with ENV
The software may depend on certain environment variables being configured correctly.
A common situation is that if you install a program in a custom location (such as a home directory), you may need to add that directory to the image’s system PATH. For example, if you installed some scripts in /home/software/bin, you could use
    ENV PATH="/home/software/bin:${PATH}"
to add them to your PATH.
You can set multiple environment variables at once:
    ENV DEBIAN_FRONTEND=noninteractive LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8 LANGUAGE=en_US.UTF-8
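ENV is needed because only file-system changes survive a RUN instruction: a variable set with export inside RUN disappears as soon as that instruction finishes. A small sketch of the difference (MY_VAR is a hypothetical variable name):

```dockerfile
FROM python:3.8

# export only lasts for this one RUN instruction; MY_VAR is NOT
# saved in the image or visible to later instructions
RUN export MY_VAR=hello && echo "$MY_VAR"

# ENV is saved in the image: it is set for every later instruction
# and in every container started from the image
ENV MY_VAR=hello
RUN echo "$MY_VAR"
```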
4. Build, name, and tag the image
So far we haven’t created the image; we’ve just been listing instructions in the Dockerfile for how to build it. Now we are ready to build the image!
First, decide on a name for the image, as well as a tag. Tags are important for tracking which version of the image you’ve created (and are using). A simple tagging scheme would be to use numbers (e.g. v0, v1, etc.), but you can use any system that makes sense to you.
Because HTCondor caches Docker images by tag, we strongly recommend that you never use the latest tag, and that you always build images with a new, unique tag that you then explicitly specify in new jobs.
To build and tag the image, open a terminal (Mac/Linux) or command prompt (Windows) and navigate to the folder containing the Dockerfile:
    $ cd directory
(Replace directory with the path to the appropriate folder.)
Then, make sure Docker is running (there should be an icon in your status bar, and running docker info shouldn’t show any errors) and run:
    $ docker build -t username/imagename:tag .
Replace username with your Docker Hub username, and replace imagename and tag with values of your choice. Note the . at the end of the command (it indicates “the current directory”).
If you get errors, try to determine what you may need to add or change to your Dockerfile, and then run the build command again. Debugging a Docker build is largely the same as debugging any software installation process.
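One common debugging approach, sketched here with the JAGS example from earlier, is to temporarily split a long chained RUN instruction into several smaller ones. The build output then shows exactly which step failed, and when you rebuild, Docker can reuse its cache for the unchanged steps that already succeeded. Once everything works, the steps can be chained back together.

```dockerfile
FROM rocker/r-ver:3.4.0
COPY JAGS-4.3.0.tar.gz /tmp

# Each step is its own RUN instruction while debugging, so a failure
# points at one command and earlier steps are cached on the next build
RUN cd /tmp && tar -xzf JAGS-4.3.0.tar.gz
RUN cd /tmp/JAGS-4.3.0 && ./configure
RUN cd /tmp/JAGS-4.3.0 && make
RUN cd /tmp/JAGS-4.3.0 && make install
```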
5. Test locally
This page describes how to interact with your new Docker image on your own computer, before attempting to run a job with it in CHTC:
- Exploring a Docker container on your computer
6. Push to DockerHub
Once the image has been successfully built and tested, you can push it to DockerHub to make it available for running jobs in CHTC. To do this, run the following command:
    $ docker push username/imagename:tag
(where you once again replace username/imagename:tag with what you used in the previous steps).
The first time you push an image to DockerHub, you may need to run this command beforehand:
    $ docker login
It should prompt you for your DockerHub username and password.
Reproducibility
If you have a free account on Docker Hub, any container images you’ve pushed there will be scheduled for deletion if they are not used (pulled) at least once every 6 months (see Docker Terms of Service).
For this reason, and just because it’s a good idea in general, we recommend creating an archive file of your container image and placing it in any space you use for long-term, backed up storage of data and research code.
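To create an archive file of a container image, use this command, renaming the archive file and container to reflect the names you want to use:
    $ docker save username/imagename:tag | gzip > imagename_tag.tar.gz
The image can later be restored from the archive with docker load -i imagename_tag.tar.gz.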
It’s also a good idea to archive a copy of the Dockerfile used to generate a container image along with the archive file of the container image itself.
7. Running jobs
Once your Docker image is in Docker Hub, you can use it to run jobs on CHTC’s HTC system. See this guide for more details:
- Running Docker jobs in CHTC
Examples
This section contains several sample Dockerfiles covering more advanced use cases.
Install a custom Python package from GitHub
Let’s say you have a custom Python package hosted on GitHub that isn’t available on PyPI. Since pip can install packages directly from git repositories, you can install your package like this:
    FROM python:3.8
    RUN pip3 install git+https://github.com/<RepositoryOwner>/<RepositoryName>
where you would replace <RepositoryOwner> and <RepositoryName> with the desired targets.
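For reproducibility, pip can also install from a specific branch, tag, or commit by adding @ and the git reference to the end of the URL. In this sketch, v1.0 is a hypothetical tag name; use one that exists in your repository (a full commit hash is the most stable choice).

```dockerfile
FROM python:3.8

# Install the package as it was at a specific git tag (v1.0 is hypothetical),
# so a later rebuild installs the same version as long as the tag isn't moved
RUN pip3 install git+https://github.com/<RepositoryOwner>/<RepositoryName>@v1.0
```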
QIIME
This Dockerfile installs QIIME 2 based on these instructions. It assumes that the 64-bit Linux Miniconda installer has been downloaded to the directory containing the Dockerfile.
    FROM python:3.6-stretch

    COPY Miniconda3-latest-Linux-x86_64.sh /tmp
    RUN mkdir /home/qiimeuser
    ENV HOME=/home/qiimeuser

    # bash runs the installer even if the copied file is not marked executable;
    # -y answers conda's confirmation prompts during the non-interactive build
    RUN cd /tmp && \
        bash Miniconda3-latest-Linux-x86_64.sh -b -p /home/qiimeuser/minconda3 && \
        export PATH=/home/qiimeuser/minconda3/bin:$PATH && \
        conda update -y conda && \
        conda create -y -n qiime2-2017.10 --file https://data.qiime2.org/distro/core/qiime2-2017.10-conda-linux-64.txt