Practice Training cluster setup
Access to the training cluster
This course runs on a training cluster, which you access with the secure shell (ssh) command.
Open a terminal emulator:
Windows:
MobaXterm (install the free version)
macOS:
Terminal
Linux:
xterm or the terminal emulator of your choice
In the terminal, type the following, using the username and password that you received in our first Zoom session:
$ ssh userxxx@uu.c3.ca
# Enter your password (nothing is displayed as you type)
You are now in our training cluster.
Load necessary modules
To use Python for ML on the cluster, you will need to load the relevant modules.
This is done with the Lmod tool through the module command. Here are some key commands:
# Get help on the module command
$ module help
# List modules that are already loaded
$ module list
# See which modules are available for Python
$ module avail python
# Load the module for Python version 3.8.2
$ module load python/3.8.2
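After loading the module, you may want to confirm that the interpreter on your path is the one you expect. The following is a minimal sketch, not part of the course material: the helper name version_matches and the expected (3, 8) pair are assumptions chosen to match the module loaded above.

```python
import sys


def version_matches(expected=(3, 8), actual=None):
    """Return True if the major and minor version numbers equal `expected`.

    `version_matches` is a hypothetical helper used for illustration only.
    `actual` defaults to the version of the interpreter running this script.
    """
    if actual is None:
        actual = sys.version_info
    return tuple(actual[:2]) == tuple(expected)


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    if not version_matches():
        print("This is not Python 3.8: check the output of `module list`.")
```

Save it under a name of your choice (for example check_python.py, a hypothetical file name) and run it with python check_python.py.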
At this point, we do not have GPUs in our training cluster (we will soon!). For this course, the Python module is thus the only one you need to load. When working on the Compute Canada clusters however, in order to use GPUs, you will also need to load the modules cuda, possibly cudacore (depending on which cuda module you are loading), and cudnn (NVIDIA CUDA Deep Neural Network library, a GPU-accelerated library of primitives for deep neural networks).
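On a cluster that does have GPUs, one way to check that the modules were loaded correctly is to ask PyTorch whether it can see a CUDA device. This is a sketch under assumptions: it presumes PyTorch may or may not be installed in the active environment, and the function name gpu_status is hypothetical. On the training cluster, it should report that no GPU is visible.

```python
import importlib.util


def gpu_status():
    """Describe whether PyTorch is installed and whether it can see a GPU.

    `gpu_status` is a hypothetical helper used for illustration only.
    """
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch  # imported lazily so the script also runs without PyTorch

    if torch.cuda.is_available():
        return "GPU available"
    return "no GPU visible"


if __name__ == "__main__":
    print(gpu_status())
```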
Install the necessary Python wheels in a virtual environment
You also need Python packages.
For this, create a virtual environment in which you will install packages with pip.
Do not use Anaconda
While Anaconda is a great tool on personal computers, it is not an appropriate tool when working on the Compute Canada clusters: binaries are unoptimized for those clusters and library paths are inconsistent with their architecture. Anaconda installs packages in $HOME, where it creates a very large number of small files. It can also create conflicts by modifying .bashrc.
Create a virtual environment:
$ virtualenv --no-download ~/env
Activate your virtual environment:
$ source ~/env/bin/activate
Update pip:
(env) $ pip install --no-index --upgrade pip
Install the packages you need in the virtual environment:
(env) $ pip install --no-index matplotlib torch torchvision tensorboard
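To confirm that the installation worked, you can ask Python whether it can locate each package without actually importing it. The sketch below is illustrative only: the helper name missing_packages is an assumption, and the package list simply mirrors the pip command above.

```python
import importlib.util

# Import names of the packages installed with pip above
PACKAGES = ["matplotlib", "torch", "torchvision", "tensorboard"]


def missing_packages(names):
    """Return the names that the import system cannot locate.

    `missing_packages` is a hypothetical helper used for illustration only.
    find_spec() looks a top-level package up without importing it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All packages found.")
```

Run it with the virtual environment activated; otherwise it inspects the packages of the module's Python rather than those in ~/env.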
If you want to exit the virtual environment, run:
(env) $ deactivate
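If you are unsure whether a virtual environment is currently active (the (env) prefix in the prompt is the usual hint), Python itself can tell you. This is a minimal sketch: the function name in_virtualenv is hypothetical, and the fallback to sys.real_prefix is there on the assumption that the cluster might provide an older virtualenv release.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True if the interpreter runs inside a virtual environment.

    `in_virtualenv` is a hypothetical helper used for illustration only.
    Inside an environment, sys.prefix points to the environment directory
    and differs from the prefix of the underlying Python installation.
    """
    if prefix is None:
        prefix = sys.prefix
    if base_prefix is None:
        # Older virtualenv releases set real_prefix instead of base_prefix
        base_prefix = getattr(
            sys, "real_prefix", getattr(sys, "base_prefix", sys.prefix)
        )
    return prefix != base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter prefix:", sys.prefix)
```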
Issues
If you have issues accessing the training cluster or installing the Python packages in a virtual environment, please join the debug session, where we will help you get up and running.