Practice: Installing PyTorch and logging in to the training cluster

Last updated: January 4, 2023

All installations are local (on your machine). The only thing you need to worry about with the training cluster is to make sure that you can ssh into it.

Python and package manager

First of all, you need Python 3.7 or higher.
Then you need a package manager to install Python packages.
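
If you are unsure which Python version you have, a quick check along these lines can help. This is only a minimal sketch: the helper name python_is_recent_enough is made up for illustration, and the (3, 7) minimum simply mirrors the requirement stated above.

```python
import sys


def python_is_recent_enough(version_info=sys.version_info, minimum=(3, 7)):
    """Return True if the (major, minor) version is at least `minimum`."""
    # Hypothetical helper: only the major and minor numbers are compared.
    return tuple(version_info[:2]) >= tuple(minimum)


# Print the running interpreter's version and whether it meets the minimum.
print(sys.version.split()[0], "OK" if python_is_recent_enough() else "too old")
```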

There are many ways to go about this, and you are free to choose an alternative method, particularly if you already have Python installed and are used to installing new packages.

For those new to Python, I suggest installing Anaconda:

Go to the Anaconda Installers section of this page, then download and install the 64-bit installer for your operating system (unless you still have a 32-bit version of Windows, but those are becoming rare).

Notes:

  • Anaconda is extremely convenient on personal computers. It is also huge since it installs a whole suite of scientific Python packages. If you are looking for a much leaner installation than Anaconda, Miniconda3 will only install Python 3.7, the package manager conda, and their dependencies.
  • Those on Linux, depending on their distro, can also install Python packages with their distribution package manager.
  • While Anaconda is a good option on personal computers, this is NOT the way to go once you move to the Compute Canada clusters.

Python packages

Then, you need 3 Python packages: PyTorch, TorchVision, and Matplotlib.

Matplotlib

Matplotlib is already packaged in Anaconda. So those who installed Anaconda are good with that one. Those who chose an alternative method can find information on how to install it here.

PyTorch and TorchVision

Use this page to find the command you need to run based on your operating system and whether or not you have CUDA (this only concerns you if you have an Nvidia GPU).

I recommend using the Stable (1.5) build.

Examples:

  • You are on Windows, you installed Anaconda as suggested above, and you do not have CUDA, then:

Launch the Anaconda Prompt (for instance by typing it in the main Windows search box or by going to Start | Anaconda3 | Anaconda Prompt ) and type:

conda install pytorch torchvision cpuonly -c pytorch

  • You are on a Mac and you do not have CUDA, then run:

conda install pytorch torchvision -c pytorch

  • Etc.

Test the installation

Make sure that all works well by launching Python, loading the torch package and creating a 2-dimensional tensor of zeros:

[In]

import torch

torch.zeros(3, 4)

[Out]

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
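
To check all three packages at once, something like the following could be saved as a small script and run with Python. It is a sketch rather than an official check: the function name installed_versions is an assumption made for this example, and it only reports whether each package can be imported and which version string it exposes.

```python
import importlib


def installed_versions(names=("torch", "torchvision", "matplotlib")):
    """Map each package name to its version string, or None on ImportError."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # the package could not be imported here
        else:
            # Fall back to "unknown" if the module has no __version__ attribute.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions().items():
    print(name, version if version is not None else "MISSING")
```

Any package reported as MISSING still needs to be installed with the commands given earlier.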

Note: if you have a CUDA-enabled GPU on your machine and you want to test whether it is accessible to PyTorch, you can run:

[In]

torch.cuda.is_available()

[Out]

True
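
A common way to use this check is to pick the device once and create tensors on it, so that the same script runs with or without a GPU. The sketch below assumes nothing beyond the calls already shown; pick_device_name is a hypothetical helper, and the import is guarded so the script reports nothing rather than crashing when PyTorch is absent.

```python
def pick_device_name(cuda_available):
    """Return the device string to hand to torch.device()."""
    return "cuda" if cuda_available else "cpu"


try:
    import torch
except ImportError:
    torch = None  # PyTorch is not installed in this environment

if torch is not None:
    # Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
    device = torch.device(pick_device_name(torch.cuda.is_available()))
    x = torch.zeros(3, 4, device=device)
    print(x.device)
```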

TensorBoard

TensorBoard is a web-based visualization toolkit developed by the TensorFlow team; it can also be used with PyTorch.

Install it with:

pip install tensorboard
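
To confirm that TensorBoard and PyTorch work together, you could log a few values with PyTorch's SummaryWriter. This is a minimal sketch under stated assumptions: the log directory runs/install_check, the tag dummy/loss, and the helper dummy_losses are all invented for this example, and the logged numbers are made up rather than coming from any real training.

```python
def dummy_losses(n_steps):
    """Made-up, steadily decreasing values standing in for a training loss."""
    return [1.0 / (step + 1) for step in range(n_steps)]


try:
    from torch.utils.tensorboard import SummaryWriter
except ImportError:
    SummaryWriter = None  # torch and/or tensorboard missing in this environment

if SummaryWriter is not None:
    writer = SummaryWriter(log_dir="runs/install_check")  # hypothetical directory
    for step, loss in enumerate(dummy_losses(10)):
        # add_scalar(tag, value, step) records one point of the curve.
        writer.add_scalar("dummy/loss", loss, step)
    writer.close()
```

Running tensorboard --logdir runs from the same directory should then show the curve in your browser.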

Log in to the training cluster

Lastly, you need to make sure that you can ssh into our training cluster.

Open a terminal emulator:

Windows:   MobaXterm
macOS:   Terminal
Linux:   xterm or the terminal emulator of your choice

Use the user name and password that you received by email:

$ ssh userxxx@cassiopeia.c3.ca
# enter password (it is blind typing)

You are now in our training cluster.

You will learn tomorrow how to run ML jobs on Compute Canada clusters. For this course, while using our training cluster, all you have to do before running sbatch jobs containing your PyTorch scripts is activate a Python virtual environment that we have already created and which contains the torch, torchvision, and matplotlib packages:

$ source /project/shared/bio2/bin/activate

From here on, you can launch sbatch jobs to run Python scripts on this cluster. As it is a small training cluster, please refrain from running Python directly on the login node or from running interactive jobs with salloc.

Issues

If you have issues installing PyTorch and/or logging in to the training cluster, please sign up for the debug session to make sure that you are ready for our next Zoom meeting tomorrow morning.
