Compiling for a GPU

A GPU can accelerate a program, but the code must be written and compiled specifically for it. Several options are available for building GPU-enabled programs.

OpenACC

OpenACC is a directive-based standard for programming accelerators such as GPUs.
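As a rough illustration of the directive style, the sketch below writes a small C source file containing one OpenACC-annotated loop and compiles it with the NVIDIA HPC SDK C compiler. The file name saxpy.c and the loop itself are hypothetical examples, not taken from this page, and the compile step assumes an nvhpc module has already been loaded so that nvc is on the PATH.

```shell
# Write a small OpenACC example (saxpy.c is a hypothetical file name).
cat > saxpy.c <<'EOF'
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int n = 1 << 20;
    float *x = malloc(n * sizeof(float));
    float *y = malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    /* Ask the compiler to offload this loop to the accelerator. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = 2.0f * x[i] + y[i];
    }

    printf("y[0] = %f\n", y[0]);
    free(x);
    free(y);
    return 0;
}
EOF

# -acc enables OpenACC; -Minfo=accel reports which loops were offloaded.
# The guard skips compilation when nvc is not available in this shell.
if command -v nvc >/dev/null 2>&1; then
    nvc -acc -Minfo=accel -o saxpy saxpy.c
fi
```

Without the -acc flag the directive is ignored and the loop runs on the CPU, so the same source can serve as both the serial and the accelerated version.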

Available NVIDIA CUDA Compilers

Module  Version  Module Load Command
cuda    10.2.89  module load cuda/10.2.89
cuda    11.4.2   module load cuda/11.4.2
cuda    11.8.0   module load cuda/11.8.0
cuda    12.2.2   module load cuda/12.2.2
cuda    12.4.1   module load cuda/12.4.1

Module  Version  Module Load Command
nvhpc   24.1     module load nvhpc/24.1
nvhpc   24.5     module load nvhpc/24.5
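A minimal sketch of using one of the modules listed above: the helper below loads a toolkit and prints the version of the compiler it provides. The function name load_toolkit is hypothetical. It assumes the module command is available in your shell, as it normally is on the HPC system, and that the cuda modules provide nvcc and the nvhpc modules provide nvc.

```shell
# Hypothetical helper: load a toolkit module and print its compiler version.
load_toolkit() {
    name="$1"       # module name from the tables: cuda or nvhpc
    version="$2"    # version from the tables, e.g. 12.4.1
    module load "${name}/${version}" || return 1
    case "$name" in
        cuda)  nvcc --version ;;   # CUDA C/C++ compiler
        nvhpc) nvc --version ;;    # NVIDIA HPC SDK C compiler
    esac
}
```

For example, `load_toolkit cuda 12.4.1` or `load_toolkit nvhpc 24.5`. Checking the reported version is a quick way to confirm that the intended toolkit, and not an earlier one still loaded in the session, is the one on your PATH.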

GPU architecture

According to the CUDA documentation, “in the CUDA naming scheme, GPUs are named sm_xy, where x denotes the GPU generation number, and y the version in that generation.” The documentation contains details about the architecture and the corresponding xy value. The compute capability is x.y.
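The relationship between the two notations can be sketched in a few lines of shell. The function name cc_to_sm is hypothetical; it only removes the dot from a compute capability written as x.y and adds the sm_ prefix.

```shell
# Hypothetical helper: convert a compute capability (x.y) to its sm_xy name.
cc_to_sm() {
    # Remove the dot between the generation (x) and the version (y).
    xy=$(printf '%s' "$1" | tr -d '.')
    printf 'sm_%s\n' "$xy"
}
```

For example, `cc_to_sm 8.6` prints `sm_86`.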

Please use the following values when compiling CUDA code on the HPC system.

Type        GPU        Architecture  Compute Capability  CUDA Version
Datacenter  V100       Volta         7.0                 9+
            A100       Ampere        8.0                 11+
            A40        Ampere        8.6                 11+
            RTX A6000  Ampere        8.6                 11+
GeForce     RTX2080Ti  Turing        7.5                 10+
            RTX3090    Ampere        8.6                 11+

For example, to build only for the V100 and A100, pass these flags to nvcc:

-gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80
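When several GPU types are targeted, the flag list grows quickly, so it can help to generate it. The sketch below builds one -gencode pair per xy value. The function name gencode_flags is hypothetical, and it assumes you pass the two-digit xy values (compute capability without the dot).

```shell
# Hypothetical helper: build nvcc -gencode flags from a list of xy values.
gencode_flags() {
    flags=""
    for xy in "$@"; do
        # Append one -gencode pair per target, separated by single spaces.
        flags="${flags:+$flags }-gencode arch=compute_${xy},code=sm_${xy}"
    done
    printf '%s\n' "$flags"
}
```

Calling `gencode_flags 70 80` reproduces the flags shown above. A compile line could then look like `nvcc $(gencode_flags 70 80) -o app app.cu`, where app and app.cu are placeholder names.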