Training cluster setup

Last updated: January 4, 2023

Access to the training cluster

This course runs on a training cluster. You access it over SSH (secure shell) with the ssh command.

Open a terminal emulator:

Windows: MobaXterm (install the free version)
macOS: Terminal
Linux: xterm or the terminal emulator of your choice

In the terminal, using the username and password that you received in our first Zoom session, type:

$ ssh userxxx@uu.c3.ca
# enter your password (nothing is displayed while you type it)

You are now in our training cluster.
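Optionally, on macOS or Linux you can save the connection details in ~/.ssh/config so that a short alias replaces the full address. The sketch below only prints such an entry; the function name ssh_config_entry and the alias trainingcluster are hypothetical, while Host, HostName and User are standard OpenSSH client configuration keywords.

```shell
# Hypothetical helper: print an ~/.ssh/config entry for the training cluster.
# $1 is your username (e.g. the userxxx account you received).
ssh_config_entry() {
  printf 'Host trainingcluster\n'      # alias you will type after "ssh"
  printf '    HostName uu.c3.ca\n'     # real address of the training cluster
  printf '    User %s\n' "$1"          # login name used for this host
}
```

Usage, with your own username:

$ ssh_config_entry userxxx >> ~/.ssh/config
$ ssh trainingcluster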

Load necessary modules

To use Python for ML on the cluster, you will need to load the relevant modules.

This is done with the Lmod tool through the module command. Here are some key commands:

# Get help on the module command
$ module help

# List modules that are already loaded
$ module list

# See which modules are available for Python
$ module avail python

# Load the module for Python version 3.8.2
$ module load python/3.8.2
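If you would rather not retype this at every login, a small shell function can do it for you. This is a sketch: the name load_course_modules is hypothetical, and the function only checks that the module command exists before loading the same python/3.8.2 module shown above.

```shell
# Hypothetical helper: load the Python module used in this course.
load_course_modules() {
  # Lmod provides `module` as a shell function; stop early if it is missing,
  # e.g. when this is run on a laptop instead of on the cluster.
  if ! command -v module >/dev/null 2>&1; then
    echo "module command not found: run this on the cluster" >&2
    return 1
  fi
  module load python/3.8.2
}
```

For example: $ load_course_modules && module list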

At this point, there are no GPUs in our training cluster (we will add some soon!). For this course, the Python module is therefore the only one you need to load. When working on the Compute Canada clusters, however, using GPUs also requires loading the modules cuda, possibly cudacore (depending on which cuda module you load), and cudnn (the NVIDIA CUDA Deep Neural Network library, a GPU-accelerated library of primitives for deep neural networks).
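This is not needed in the training cluster, but on a Compute Canada cluster with GPUs the extra modules can be loaded together. The sketch below assumes the default cuda and cudnn versions are compatible with each other; the function name is hypothetical, and you should run module spider cudnn to see which versions exist and what they require (add cudacore to the list if your cuda version needs it).

```shell
# Hypothetical helper: load GPU modules on a Compute Canada cluster.
# Assumption: the default cuda and cudnn versions work together.
load_gpu_modules() {
  if ! command -v module >/dev/null 2>&1; then
    echo "module command not found: run this on a cluster" >&2
    return 1
  fi
  module load cuda cudnn
}
```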

Install the necessary Python wheels in a virtual environment

You also need Python packages.

For this, create a virtual environment in which you will install packages with pip.

Do not use Anaconda
While Anaconda is a great tool on personal computers, it is not appropriate on the Compute Canada clusters: its binaries are not optimized for those clusters, and its library paths are inconsistent with their architecture. Anaconda also installs packages in $HOME, where it creates a very large number of small files, and it can cause conflicts by modifying .bashrc.
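If you have used Anaconda before, you can check whether conda init has modified your .bashrc. conda init wraps its changes between a "# >>> conda initialize >>>" line and a "# <<< conda initialize <<<" line; the hypothetical function below only looks for the opening marker.

```shell
# Hypothetical helper: succeed if a shell startup file (default: ~/.bashrc)
# contains the block written by `conda init`.
has_conda_init() {
  local rc_file="${1:-$HOME/.bashrc}"
  [ -f "$rc_file" ] && grep -qF '>>> conda initialize >>>' "$rc_file"
}
```

For example: $ has_conda_init && echo "conda init block found in ~/.bashrc"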

Create a virtual environment:

$ virtualenv --no-download ~/env

Activate your virtual environment:

$ source ~/env/bin/activate
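Besides the (env) prefix in your prompt, the activate script exports the VIRTUAL_ENV variable (and deactivate unsets it), so you can check from a script whether an environment is active. The function name below is hypothetical.

```shell
# Hypothetical helper: report the active virtual environment, if any.
venv_status() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: ${VIRTUAL_ENV}"
  else
    echo "no virtual environment active"
    return 1
  fi
}
```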

Update pip:

(env) $ pip install --no-index --upgrade pip

Install the packages you need in the virtual environment:

(env) $ pip install --no-index matplotlib torch torchvision tensorboard
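To confirm that the installation worked, you can try importing each package with the environment's Python. This is a sketch: the function name check_imports is hypothetical, and it relies on the import names matching the package names, which holds for matplotlib, torch, torchvision and tensorboard.

```shell
# Hypothetical helper: try to import each named module with python3
# (inside an active environment, python3 resolves to the environment's Python).
check_imports() {
  local missing=0 pkg
  for pkg in "$@"; do
    if python3 -c "import ${pkg}" 2>/dev/null; then
      echo "ok ${pkg}"
    else
      echo "missing ${pkg}"
      missing=1
    fi
  done
  return "$missing"   # non-zero if any import failed
}
```

For example: (env) $ check_imports matplotlib torch torchvision tensorboard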

If you want to exit the virtual environment, run:

(env) $ deactivate

Issues

If you have trouble accessing the training cluster or installing the Python packages in a virtual environment, please join the debug session, where we will help you get up and running.
