Running CUDA Code on AMD Graphics Cards
CUDA is the most widely used platform for accelerating massively parallel computations in both applied and research settings.
In 2016, AMD introduced ROCm, a platform that closely mirrors CUDA. The ROCm counterparts of the CUDA modules are listed in the table below, taken from AMD’s official website.
Correspondence between CUDA and ROCm platform modules
CUDA platform module | ROCm platform module |
cuBLAS | rocBLAS |
cuFFT | rocFFT |
cuSPARSE | rocSPARSE |
cuSOLVER | rocSOLVER |
AMG-X | rocALUTION |
Thrust | rocThrust |
CUB | rocPRIM |
cuDNN | MIOpen |
cuRAND | rocRAND |
EIGEN | EIGEN |
NCCL | RCCL |
ROCm also includes the HIPIFY tools, which automatically translate source code written for the CUDA platform into code that can be compiled for ROCm. One disadvantage of the platform is that it targets only Linux.
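To give a feel for what such an automatic translation involves, here is a toy sketch that renames a few CUDA runtime calls to their HIP equivalents with sed. It is only an illustration: the snippet being translated and the variable names (cuda_snippet, hip_snippet) are made up for this example, and hipify-clang, which is used later in this article, parses the code with Clang rather than doing plain text substitution.

```shell
#!/bin/sh
# Toy illustration only: the real HIPIFY tools cover far more of the CUDA API
# and handle cases that a simple text substitution cannot.

# A made-up CUDA fragment (not taken from the test project).
cuda_snippet='#include <cuda_runtime.h>
cudaMalloc(&d_a, bytes);
cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
cudaFree(d_a);'

# Swap the runtime header, then rename every identifier of the form cudaXxx to hipXxx.
hip_snippet=$(printf '%s\n' "$cuda_snippet" | sed \
    -e 's|<cuda_runtime.h>|<hip/hip_runtime.h>|' \
    -e 's/cuda\([A-Z]\)/hip\1/g')

printf '%s\n' "$hip_snippet"
```

The renamed calls (hipMalloc, hipMemcpy, hipFree) keep the same argument lists as their CUDA originals, which is why this kind of porting can be largely mechanical.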
Let’s proceed directly to porting code and comparing the performance of platforms.
Test configuration
Component | PC 1 | PC 2 |
Operating system | Windows 10 Pro 21H1 | Ubuntu 22.04 (kernel 5.15.0-53-generic) |
CPU | 2× Intel Xeon Gold 6132 | Intel Core i5-12600K |
RAM | 4× 16 GB DDR4 | 1× 32 GB DDR4 |
GPU | GeForce RTX 3070 8 GB | Radeon RX 6800 XT 16 GB |
1. Installing CUDA on Windows OS
Go to the NVIDIA website (https://developer.nvidia.com/cuda-downloads) and download the latest version of the CUDA Toolkit for your platform. The screenshot below shows the minimum set of installer components needed to compile and run CUDA code on Windows.
Minimum Required Installation Configuration
2. Installing ROCm on Linux OS
Let’s walk through installing ROCm on Ubuntu 22.04. The installation guide (https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.3/page/How_to_Install_ROCm.html) also lists installation methods for some other Linux distributions.
2.1 Download the installer package and install it.
sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/5.3/ubuntu/jammy/amdgpu-install_5.3.50300-1_all.deb
sudo apt-get install ./amdgpu-install_5.3.50300-1_all.deb
2.2 Installing the required ROCm components
sudo amdgpu-install --usecase=dkms,rocm,rocmdevtools,lrt,hip,hiplibsdk,mllib,mlsdk
Errors may appear during installation, but they should not affect the operation of the platform. I’m not 100% sure that this is the minimum set of use cases required; I arrived at it through trial and error.
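Because the installer can print errors even when the result works, a quick sanity check helps. The sketch below only tests whether a few ROCm tools exist and are executable. It assumes the /opt/rocm-5.3.0/bin path used later in this article; the ROCM_BIN variable and the check_tool helper are names invented for this example.

```shell
#!/bin/sh
# Assumed install location (the same path is used in step 4); override ROCM_BIN if yours differs.
ROCM_BIN="${ROCM_BIN:-/opt/rocm-5.3.0/bin}"

# Hypothetical helper: report whether a tool exists and is executable.
# $1 = directory, $2 = tool name
check_tool() {
    if [ -x "$1/$2" ]; then
        echo "found:   $1/$2"
    else
        echo "missing: $1/$2"
    fi
}

for tool in hipcc hipify-clang rocminfo; do
    check_tool "$ROCM_BIN" "$tool"
done
```

If all three tools are reported as found, running rocminfo should additionally list the GPU as an agent, which is a reasonable sign that the driver and runtime can see the card.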
2.3 Installing CUDA.
To port CUDA code to ROCm, you also need to install the CUDA Toolkit. The easiest way is to download and run the installer with the commands below. (Other CUDA versions and installation methods can be found at https://developer.nvidia.com/cuda-downloads)
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run
CUDA installation configuration
3. Compiling source code on Windows OS
As a test example, let’s take code from GitHub that multiplies matrices of random 32-bit integers (https://github.com/lzhengchun/matrix-cuda).
The PowerShell commands below download and compile the source files. After they finish, the executable file “a.exe” will appear in the source code directory.
git clone https://github.com/lzhengchun/matrix-cuda
cd matrix-cuda
nvcc ./matrix_cuda.cu
4. Converting CUDA code to ROCm code and compiling it on Ubuntu OS
CUDA code is converted to ROCm code with HIPIFY, a utility that ships with the ROCm platform (the name comes from HIP, the programming language of the ROCm platform).
git clone https://github.com/lzhengchun/matrix-cuda
cd matrix-cuda
/opt/rocm-5.3.0/bin/hipify-clang matrix_cuda.cu
After executing these commands, the file matrix_cuda.cu.hip, which contains the source code for the ROCm platform, will appear in the directory next to matrix_cuda.cu.
The code is compiled for the ROCm platform with the hipcc compiler. After executing the command below, the executable file “a.out” will appear in the source code directory.
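Before compiling, it can be worth checking that the translation left no CUDA runtime identifiers behind. The sketch below counts lines that mention hipXxx and cudaXxx style names in the generated file. The count_api_lines helper and the HIP_FILE variable are invented for this example, and a plain grep is only a rough check, not something HIPIFY itself provides.

```shell
#!/bin/sh
# File produced by the conversion step; pass a different path as the first argument if needed.
HIP_FILE="${1:-matrix_cuda.cu.hip}"

# Hypothetical helper: count lines containing an identifier that starts with
# the given prefix followed by a capital letter (e.g. hipMalloc, cudaFree).
# $1 = prefix, $2 = file
count_api_lines() {
    # grep -c exits with status 1 when the count is 0, so ignore the exit status.
    grep -c -- "$1[A-Z]" "$2" || true
}

if [ -f "$HIP_FILE" ]; then
    echo "lines with HIP identifiers:  $(count_api_lines hip "$HIP_FILE")"
    echo "lines with CUDA identifiers: $(count_api_lines cuda "$HIP_FILE")"
else
    echo "file not found: $HIP_FILE"
fi
```

A nonzero count of CUDA identifiers does not necessarily mean the build will fail, but those lines are the first place to look if it does.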
/opt/rocm-5.3.0/bin/hipcc matrix_cuda.cu.hip
5. Platform Performance Comparison
Matrix size | CUDA runtime (RTX 3070) | ROCm runtime (RX 6800 XT) |
1000×1000 | 2.536 ms | 5.812 ms |
10000×10000 | 195.123 ms | 297.219 ms |
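To make the gap easier to read, the ratio of the two runtimes can be computed directly from the table. The sketch below uses awk for the arithmetic; the ratio helper is a name invented for this example. Keep in mind that the two runs used different GPUs, CPUs, and operating systems, so the ratio compares the two test machines as a whole and not only the platforms.

```shell
#!/bin/sh
# Hypothetical helper: ratio of ROCm runtime to CUDA runtime, rounded to two decimals.
# $1 = CUDA time in ms, $2 = ROCm time in ms
ratio() {
    awk -v cuda="$1" -v rocm="$2" 'BEGIN { printf "%.2f\n", rocm / cuda }'
}

# Timings copied from the table above.
echo "1000x1000:   ROCm/CUDA = $(ratio 2.536 5.812)"
echo "10000x10000: ROCm/CUDA = $(ratio 195.123 297.219)"
```

On these numbers the ROCm run takes about 2.29 times as long as the CUDA run for the small matrices and about 1.52 times as long for the large ones, so the relative gap narrows as the problem size grows.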