Practice: First exposure to PyTorch

Last updated: January 4, 2023

PyTorch is an open-source Python package, primarily developed by Facebook's AI Research lab, that provides GPU-accelerated tensor computing and tools for deep learning research.

Loading PyTorch

from __future__ import print_function
import torch

GPU support

As you start playing with PyTorch, you will notice that it resembles NumPy in many ways. One of the differences is that PyTorch brings GPU support.

Example:

To create a 2-dimensional tensor filled with random numbers from the standard normal distribution on CPU:

torch.randn(2, 3, device = torch.device('cpu'))

This is similar to NumPy's ndarray (the NumPy class for n-dimensional arrays):

import numpy
numpy.random.randn(2, 3)

except that you can do the same on GPU:

torch.randn(2, 3, device = torch.device('cuda'))

Being able to run the same code on CPU and GPU is very convenient: you can develop code on a machine that doesn't have a dedicated GPU, then run it on the Compute Canada cluster's GPUs.
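
A convenient pattern (a minimal sketch, not taken from the lesson code) is to select the device at runtime, so the same script runs unchanged on both kinds of machines:

import torch

# use the GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.randn(2, 3, device = device)
print(x.device)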

Application to a basic neural net

The code below, from Justin Johnson's PyTorch examples, builds a fully-connected ReLU network with one hidden layer and no biases. If you have watched the videos in the previous lesson, you should have some sense of what that means.

Reminders:

A fully-connected or dense network means that, on each layer, every neuron is connected to all the neurons of the previous layer as well as all the neurons of the next layer.

ReLU is a rectified linear unit: a neuron using the rectifier or \(f(x) = \max(0, x)\) as its activation function.
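
Applied to a tensor, ReLU simply zeroes out the negative entries. A tiny illustration (not part of the lesson code) using the same clamp() call as the network below:

import torch

t = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(t.clamp(min=0))   # should print tensor([0.0000, 0.0000, 0.0000, 1.5000])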

This extremely simple network is trained to predict some random output y from random input x.

The forward pass, loss function, and backward pass are all coded manually, so that you can really see what is going on.

from __future__ import print_function
import torch

# for those without a GPU
device = torch.device('cpu')

# for those with a CUDA-enabled GPU
# (keep only one of these two assignments; the second one overrides the first)
device = torch.device('cuda')

# N is the batch size
# D_in is the dimension of the input layer
# H is the dimension of the hidden layer
# D_out is the dimension of the output layer
N, D_in, H, D_out = 64, 1000, 100, 10

# Create 2d random input and output data
# Similar to numpy.random.randn()
x = torch.randn(N, D_in, device = device)
y = torch.randn(N, D_out, device = device)

# Randomly initialize the weights
# from the input layer to the hidden layer
w1 = torch.randn(D_in, H, device = device)

# Randomly initialize the weights
# from the hidden layer to the output layer
w2 = torch.randn(H, D_out, device = device)

learning_rate = 1e-6

# We are going over 500 epochs
for t in range(500):

  # 1/ Forward pass: compute predicted y
  # torch.mm() is the equivalent of numpy.dot()
  h = x.mm(w1)
  h_relu = h.clamp(min=0)
  y_pred = h_relu.mm(w2)

  # 2/ Compute and print loss
  # Loss is a scalar, stored in a PyTorch tensor
  # We can get its value as a Python number with loss.item()
  # Sum of squares of the errors
  # (Error = difference between predicted value and real value)
  loss = (y_pred - y).pow(2).sum()
  print(t, loss.item())

  # 3/ Backpropagation to compute gradients of w1 and w2 with respect to loss
  # This comes from the calculation of derivatives
  # (explained in the 3Blue1Brown 4th Video at 4:12)
  grad_y_pred = 2.0 * (y_pred - y)
  grad_w2 = h_relu.t().mm(grad_y_pred)
  grad_h_relu = grad_y_pred.mm(w2.t())
  grad_h = grad_h_relu.clone()
  grad_h[h < 0] = 0
  grad_w1 = x.t().mm(grad_h)

  # 4/ Update weights using gradient descent
  w1 -= learning_rate * grad_w1
  w2 -= learning_rate * grad_w2
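
For reference, the gradients computed in step 3 come from the chain rule: the loss is \(L = \sum (y_{pred} - y)^2\), so \(\partial L / \partial y_{pred} = 2\,(y_{pred} - y)\); this quantity is then propagated backwards through the two matrix products and through the ReLU, whose derivative is taken as 0 where \(h < 0\) and 1 elsewhere.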

Practice

Run this code, trying to understand what each line does.
It might be useful to explore the created objects along the way with, for instance, type() or print().
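
For instance (a suggestion only, not part of the lesson code), you could inspect a few attributes of the tensors created by the script:

print(type(x))     # <class 'torch.Tensor'>
print(x.shape)     # torch.Size([64, 1000])
print(x.dtype)     # torch.float32
print(x.device)    # cpu or cuda:0, depending on the device you selected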

If the calculations do not make sense to you, watch the videos of the previous lesson again.
If you are puzzled by the code syntax, revisit the Python and/or NumPy tutorials (if you are very familiar with NumPy, you might find this list of equivalences between NumPy and PyTorch by Kentaro Wada useful).

Finally, draw a diagram of this neural network and place D_in, H, D_out, w1, w2, x, and y on it.

In our Zoom session tomorrow morning, we will discuss this code and start building from it.
