Description
PyTorch is a deep learning framework that puts Python first. It provides Tensors and Dynamic neural networks in Python with strong GPU acceleration.
Software Category: data
For detailed information, visit the PyTorch
website.
Available Versions
The current installation of PyTorch
incorporates the most popular packages. To find the available versions and learn how to load them, run:
module spider pytorch
The output of the command shows the available PyTorch
module versions.
For detailed information about a particular PyTorch
module, including how to load the module, run the module spider
command with the module’s full version label. For example:
module spider pytorch/1.12.0
Module | Version |
Module Load Command |
pytorch | 1.12.0 |
module load apptainer/1.2.2 pytorch/1.12.0
|
pytorch | 2.0.1 |
module load apptainer/1.2.2 pytorch/2.0.1
|
pytorch | 2.4.0 |
module load apptainer/1.2.2 pytorch/2.4.0
|
PyTorch Jupyter Notebooks
Jupyter Notebooks can be used for interactive code development and execution of Python scripts and several other codes. PyTorch Jupyter kernels are backed by containers in the corresponding modules.
Accessing the JupyterLab Portal
- Open a web browser and go to: https://ood.hpc.virginia.edu.
- Use your “Netbadge” credentials to log in.
- On the top right of the menu bar of the Open OnDemand dashboard, click on
Interactive Apps
.
- In the drop-down box, click on
JupyterLab
.
Requesting access to a GPU node
To start a JupyterLab session, fill out the resource request webform. To request access to a GPU, verify the correct selection for the following parameters:
- Under Rivanna Partition, choose “GPU”.
- Under Optional GPU Type, choose a GPU type or leave it as “default” (first available).
Click Launch
to start the session.
Editing and Running the Notebook
Once the JupyterLab instance has started, you can edit and run your notebook as described here.
PyTorch Slurm jobs
The following is a Slurm script template. The commented numbers correspond to the items in the ensuing notes.
#!/bin/bash
#SBATCH -A mygroup
#SBATCH -p gpu # 1
#SBATCH --gres=gpu:1 # 1
#SBATCH -c 1
#SBATCH -t 00:01:00
#SBATCH -J pytorchtest
#SBATCH -o pytorchtest-%A.out
#SBATCH -e pytorchtest-%A.err
module purge
module load apptainer pytorch/2.0.1 # 2
apptainer run --nv $CONTAINERDIR/pytorch-2.0.1.sif pytorch_example.py # 3
Notes:
-
The Slurm script needs to include the #SBATCH -p gpu
and #SBATCH --gres=gpu
directives in order to request access to a GPU node and its GPU device. Please visit the Jobs Using a GPU section for details.
-
To use the pytorch container, load the apptainer and pytorch modules. You may choose a different version (see module spider
above).
Do not load the cuda
or cudnn
modules since these libraries are included with pytorch.
-
The --nv
flag sets up the container’s environment to use a GPU when running a GPU-enabled application. The run
command executes the default command defined in the container, which in this case is python
. What follows after the *.sif
is passed as arguments. In summary, the apptainer command can be translated as: “Use the python
interpreter inside the pytorch container to execute pytorch_example.py
with GPU enabled.”
PyTorch Interactive Jobs (ijob)
Start an ijob. Note the addition of -p gpu
and --gres=gpu
to request access to a GPU node and its GPU device.
ijob -A mygroup -p gpu --gres=gpu -c 1
module purge
module load apptainer pytorch/2.0.1
apptainer run --nv $CONTAINERDIR/pytorch-2.0.1.sif pytorch_example.py
Interaction with the Host File System
The following user directories are overlayed onto each container by default on the HPC system:
- /home
- /scratch
- /nv
- /standard
- /project
Due to the overlay, these directories are by default the same inside and outside the container with the same read, write, and execute permissions. This means that file modifications in these directories (e.g. in /home) via processes running inside the container are persistent even after the container instance exits. The /nv
and /project
directories refer to leased storage locations that may not be available to all users.