This repository holds my study notes and hands-on projects for CUDA-based GPU programming. It covers:
- A Canny edge detector and a KLT tracker implemented from scratch in CUDA
- GPU Programming Specialization course by Johns Hopkins University
- Selected examples from various textbooks
My CUDA implementation is 3× faster than OpenCV's CPU implementation, even though OpenCV already uses 16 threads and SIMD.
All code is tested on Ubuntu 22.04, using CUDA 12.4 and OpenCV 4.10.0.
My laptop has one NVIDIA RTX A3000 GPU, one Intel CPU with 16 logical CPUs (8 cores, 2 threads per core), and 32 GB of RAM.
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
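The hardware figures above can be reproduced on another machine with something like the following. This is only a sketch: it assumes `lscpu` (from util-linux) and `nvidia-smi` (shipped with the NVIDIA driver) are installed, and the helper names `filter_cpu_fields`, `show_cpu`, and `show_gpu` are made up for illustration.

```shell
#!/usr/bin/env bash

# Keep only the CPU topology fields quoted above.
# Newer lscpu versions indent some fields, so leading whitespace is allowed.
filter_cpu_fields() {
    grep -E '^[[:space:]]*(CPU\(s\)|On-line CPU\(s\) list|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)):'
}

# Print the CPU topology, or a short notice if lscpu is unavailable.
show_cpu() {
    if command -v lscpu >/dev/null 2>&1; then
        lscpu | filter_cpu_fields
    else
        echo "lscpu not found"
    fi
}

# Print GPU name, driver version, and total memory as CSV.
show_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv \
            || echo "nvidia-smi failed (is the driver loaded?)"
    else
        echo "nvidia-smi not found"
    fi
}

show_cpu
show_gpu
```

Logical CPUs = sockets × cores per socket × threads per core, so the output above gives 1 × 8 × 2 = 16.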
Make sure CUDA is installed and environment variables are set:
export PATH=/usr/local/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH

# Install FFMPEG and dependencies
sudo apt update && sudo apt install -y libavcodec-dev libavformat-dev libavutil-dev libswscale-dev libgtk2.0-dev libcanberra-gtk-module
# Download opencv and opencv_contrib
wget -O opencv.zip https://github.com/opencv/opencv/archive/refs/tags/4.10.0.zip \
&& wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/refs/tags/4.10.0.zip
# Extract both
unzip opencv.zip && unzip opencv_contrib.zip
cd opencv-4.10.0
# Build and install
rm -rf build/ install_opencv/ \
&& cmake -S . -B build/ \
-GNinja \
-DCMAKE_INSTALL_PREFIX=./install_opencv \
-DCMAKE_BUILD_TYPE=RELEASE \
-DCMAKE_CXX_STANDARD=17 \
-DCMAKE_CUDA_STANDARD=17 \
-DWITH_CUDA=ON \
-DWITH_FFMPEG=ON \
-DWITH_OPENMP=ON \
-DWITH_OPENCL=ON \
-DWITH_GTK=ON \
-DWITH_GTK_2_X=ON \
-DBUILD_opencv_hdf=OFF \
-DBUILD_TESTS=OFF \
-DBUILD_PERF_TESTS=OFF \
-DBUILD_EXAMPLES=OFF \
-DOPENCV_GENERATE_PKGCONFIG=ON \
-DOPENCV_EXTRA_MODULES_PATH=../opencv_contrib-4.10.0/modules \
-DBUILD_LIST=core,cudev,imgproc,imgcodecs,videoio,highgui,video,cudaarithm,cudafilters,cudaimgproc,cudawarping \
&& cmake --build build/ --parallel $(nproc) && cmake --install build/

- GPU Programming Specialization offered by Johns Hopkins University
- An Even Easier Introduction to CUDA
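
Once the build and install commands above have finished, a quick sanity check can confirm that the toolchain is on `PATH` and that the local OpenCV install is visible. This is a rough sketch, not part of the original setup. It assumes it is run from the `opencv-4.10.0` directory, that the install step placed `opencv4.pc` under `install_opencv/lib/pkgconfig` (the exact subdirectory may differ), and that `pkg-config` is installed. `check_tool` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Report whether a command is visible on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: NOT found on PATH"
    fi
}

check_tool nvcc    # expected under /usr/local/cuda-12.4/bin after the PATH export
check_tool cmake
check_tool ninja   # needed because the build is configured with -GNinja

# Assumption: OPENCV_GENERATE_PKGCONFIG=ON wrote opencv4.pc to this directory.
PC_DIR="$PWD/install_opencv/lib/pkgconfig"
if [ -f "$PC_DIR/opencv4.pc" ]; then
    # Should print the installed OpenCV version.
    PKG_CONFIG_PATH="$PC_DIR" pkg-config --modversion opencv4
else
    echo "opencv4.pc not found in $PC_DIR"
fi
```

If `nvcc` is reported as missing, the `PATH` export was probably not applied in the current shell.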
