lionlai1989/GPU_Programming_Specialization

GPU Programming Specialization

This repository holds my study notes and hands-on projects for CUDA-based GPU programming. It covers:

  • Canny Edge Detector and KLT Tracker with CUDA from scratch
  • GPU Programming Specialization course by Johns Hopkins University
  • Selected examples from various textbooks

My CUDA implementation runs 3× faster than OpenCV, even though OpenCV already uses 16 threads and SIMD.

Video Comparison

KLT Tracker naive

Prerequisites

All code is tested on Ubuntu 22.04, using CUDA 12.4 and OpenCV 4.10.0.

My laptop has one GPU (NVIDIA RTX A3000), one Intel CPU with 16 logical CPUs (8 cores, 2 threads per core), and 32 GB of RAM.

CPU(s):                   16
  On-line CPU(s) list:    0-15
    Thread(s) per core:   2
    Core(s) per socket:   8
    Socket(s):            1
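
The "16 CPUs" figure is the number of logical CPUs, which follows from the other lscpu fields. A minimal sketch of that arithmetic, with the values copied by hand from the listing above (the variable names are only illustrative):

```shell
# Values copied from the lscpu listing above.
sockets=1
cores_per_socket=8
threads_per_core=2

# Physical cores and logical CPUs (hardware threads).
physical_cores=$(( sockets * cores_per_socket ))
logical_cpus=$(( physical_cores * threads_per_core ))

echo "physical cores: $physical_cores, logical CPUs: $logical_cpus"
```

`nproc` reports the logical count, so `--parallel $(nproc)` in the OpenCV build would be expected to start 16 jobs on this machine.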

CUDA Toolkit 12.4

Make sure CUDA is installed and environment variables are set:

export PATH=/usr/local/cuda-12.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH
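
To confirm that the exports took effect, a small check like the following may help. This is a sketch only: `CUDA_DIR` and `path_contains` are hypothetical names introduced here, and the default path assumes the same install location as the exports above.

```shell
# Assumption: CUDA lives where the exports above point. Override CUDA_DIR if not.
CUDA_DIR="${CUDA_DIR:-/usr/local/cuda-12.4}"

# Return 0 if directory $2 is an entry of the colon-separated list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains "${PATH:-}" "$CUDA_DIR/bin"; then
    echo "PATH: ok"
else
    echo "PATH: $CUDA_DIR/bin is missing"
fi

if path_contains "${LD_LIBRARY_PATH:-}" "$CUDA_DIR/lib64"; then
    echo "LD_LIBRARY_PATH: ok"
else
    echo "LD_LIBRARY_PATH: $CUDA_DIR/lib64 is missing"
fi

# Report the compiler version if nvcc can be found.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found on PATH"
fi
```

The check only prints messages and never aborts, so it is safe to paste into a shell startup file while debugging the setup.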

OpenCV 4.10.0 (with FFMPEG + CUDA support)
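
The commands in this section call `wget`, `unzip`, `cmake`, and (through `-GNinja`) `ninja`, none of which is installed by the `apt` line below. A quick pre-flight check could look like this. The helper name `missing_tools` is hypothetical, and the tool list is inferred from the commands in this section.

```shell
# Print every tool from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools used by the download and build commands in this section.
missing=$(missing_tools wget unzip cmake ninja)

if [ -n "$missing" ]; then
    echo "Missing tools:"
    echo "$missing"
else
    echo "All build tools found"
fi
```

On Ubuntu the corresponding packages are usually `wget`, `unzip`, `cmake`, and `ninja-build`.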

# Install FFMPEG and dependencies
sudo apt update && sudo apt install -y libavcodec-dev libavformat-dev libavutil-dev libswscale-dev libgtk2.0-dev libcanberra-gtk-module

# Download opencv and opencv_contrib
wget -O opencv.zip https://github.com/opencv/opencv/archive/refs/tags/4.10.0.zip \
&& wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/refs/tags/4.10.0.zip

# Extract both
unzip opencv.zip && unzip opencv_contrib.zip

cd opencv-4.10.0

# Build and install
rm -rf build/ install_opencv/ \
&& cmake -S . -B build/ \
      -GNinja \
      -DCMAKE_INSTALL_PREFIX=./install_opencv \
      -DCMAKE_BUILD_TYPE=RELEASE \
      -DCMAKE_CXX_STANDARD=17 \
      -DCMAKE_CUDA_STANDARD=17 \
      -DWITH_CUDA=ON \
      -DWITH_FFMPEG=ON \
      -DWITH_OPENMP=ON \
      -DWITH_OPENCL=ON \
      -DWITH_GTK=ON \
      -DWITH_GTK_2_X=ON \
      -DBUILD_opencv_hdf=OFF \
      -DBUILD_TESTS=OFF \
      -DBUILD_PERF_TESTS=OFF \
      -DBUILD_EXAMPLES=OFF \
      -DOPENCV_GENERATE_PKGCONFIG=ON \
      -DOPENCV_EXTRA_MODULES_PATH=../opencv_contrib-4.10.0/modules \
      -DBUILD_LIST=core,cudev,imgproc,imgcodecs,videoio,highgui,video,cudaarithm,cudafilters,cudaimgproc,cudawarping \
&& cmake --build build/ --parallel $(nproc) && cmake --install build/
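
Because the build installs into a local `install_opencv` directory and sets `OPENCV_GENERATE_PKGCONFIG=ON`, other projects can locate it through `pkg-config`. The sketch below assumes it is run from inside `opencv-4.10.0` and that the `.pc` file lands in `lib/pkgconfig` under the prefix, which is the usual default but may differ on some systems. `OPENCV_PREFIX` is a hypothetical variable name.

```shell
# Assumption: run from opencv-4.10.0, where ./install_opencv was created.
OPENCV_PREFIX="${OPENCV_PREFIX:-$PWD/install_opencv}"

# Put the local install first so pkg-config and the loader find it.
export PKG_CONFIG_PATH="$OPENCV_PREFIX/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
export LD_LIBRARY_PATH="$OPENCV_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Report the installed version, or say where the .pc file was expected.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists opencv4; then
    echo "OpenCV version: $(pkg-config --modversion opencv4)"
else
    echo "opencv4.pc not found via pkg-config (looked in $OPENCV_PREFIX/lib/pkgconfig)"
fi
```

If the check succeeds, `pkg-config --cflags --libs opencv4` yields the compiler and linker flags for building against this install.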
