Build OpenCV 4.4.0 with CUDA (GPU) Support on Windows 10 (Without Tears)

Hello Everyone!
In this article, I am going to explain step by step how you can build OpenCV 4.4.0 from source with CUDA GPU acceleration on Windows 10. I will try to make it as brief as possible. So let’s start!

Step 0: Prerequisites

  1. Download and install CUDA and cuDNN as explained here.
  2. Download and install CMake GUI from here.
  3. Download and install Visual Studio Community Edition from here. Install with Desktop Development for C++ option.
  4. Download OpenCV source from here.
  5. Download OpenCV contrib from here. Make sure its version matches your OpenCV version (4.4.0).
  6. Extract the OpenCV and OpenCV contrib zip files.
  7. Create an empty folder called build.
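Before moving on, it can help to confirm that the command-line tools from the list above are reachable. The sketch below is only a convenience check, assuming that the CUDA Toolkit installer put nvcc on PATH and that you chose to add cmake to PATH during the CMake installation; it cannot detect cuDNN or Visual Studio.

```python
import shutil

# Tool names are assumptions about what the installers expose on PATH:
# nvcc comes with the CUDA Toolkit, cmake with CMake.
REQUIRED_TOOLS = ["nvcc", "cmake"]


def find_tools(tools):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in find_tools(REQUIRED_TOOLS).items():
    status = path if path else "NOT FOUND (check the installation and PATH)"
    print(f"{tool}: {status}")
```

A missing cmake entry is not fatal for this guide, since the steps below use the CMake GUI rather than the command line.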

Step 1: Building OpenCV using CMake GUI

  1. Open CMake GUI and, under "Where is the source code", browse for the OpenCV source folder.
  2. Under "Where to build the binaries", browse for the build folder that we created above.
  3. Click on Configure, select x64 as the platform, and hit Finish.
  4. New options will appear in CMake, highlighted in red. Tick these checkboxes: WITH_CUDA, OPENCV_DNN_CUDA, ENABLE_FAST_MATH
  5. In the same window, go to OPENCV_EXTRA_MODULES_PATH, browse for the OpenCV contrib directory, and point it to the modules subfolder.
  6. Hit Configure again. You will see new options highlighted in red. Tick the CUDA_FAST_MATH checkbox. In the CUDA_ARCH_BIN property, remove any compute capability that your NVIDIA GPU model does not support. You can find a list of compatible compute capabilities for your GPU model here.
  7. Hit Configure and then Generate.
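For reference, the same options can also be passed to cmake on the command line. The sketch below only assembles and prints the argument list; it is not the method used in this article. The folder names, the generator string "Visual Studio 16 2019", and the CUDA_ARCH_BIN value 6.1 are assumptions for illustration and must be adapted to your own setup.

```python
from pathlib import Path


def cmake_configure_command(source_dir, build_dir, contrib_dir, arch_bin,
                            generator="Visual Studio 16 2019"):
    """Return a cmake argument list mirroring the options ticked in Step 1."""
    # CMake accepts forward slashes on Windows, so normalise the path.
    modules = (Path(contrib_dir) / "modules").as_posix()
    return [
        "cmake",
        "-S", str(source_dir),   # OpenCV source folder
        "-B", str(build_dir),    # empty build folder
        "-G", generator,         # assumed Visual Studio version
        "-A", "x64",             # platform selected in the GUI
        "-D", "WITH_CUDA=ON",
        "-D", "OPENCV_DNN_CUDA=ON",
        "-D", "ENABLE_FAST_MATH=ON",
        "-D", "CUDA_FAST_MATH=ON",
        "-D", f"CUDA_ARCH_BIN={arch_bin}",
        "-D", f"OPENCV_EXTRA_MODULES_PATH={modules}",
    ]


# Hypothetical folder names and compute capability, for illustration only.
cmd = cmake_configure_command("opencv-4.4.0", "build",
                              "opencv_contrib-4.4.0", "6.1")
print(cmd)
```

To actually run the configuration, the list could be handed to subprocess.run(cmd, check=True) from a prompt where cmake is on PATH.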

Step 2: Making OpenCV with Visual Studio

  1. Go to the build folder and open the OpenCV.sln file with Visual Studio.
  2. Once it has opened, change the configuration from Debug to Release in the toolbar at the top.
  3. In the Solution Explorer panel on the right-hand side, expand CMakeTargets.
  4. Right-click on ALL_BUILD and click on Build.
  5. Once that is done, right-click on INSTALL and click on Build.
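The two builds above can also be triggered without opening Visual Studio, through cmake's build mode. This is a hedged sketch that only prints the commands; the folder name build matches the one created in Step 0, and everything else follows the Release configuration and the ALL_BUILD and INSTALL targets used above.

```python
def cmake_build_commands(build_dir, config="Release"):
    """Return one cmake --build command per target, in the order used in Step 2."""
    return [
        ["cmake", "--build", str(build_dir), "--config", config, "--target", target]
        for target in ("ALL_BUILD", "INSTALL")
    ]


for command in cmake_build_commands("build"):
    print(" ".join(command))
```

Each printed command can be run as-is from a terminal in the folder that contains build, provided cmake is on PATH.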

That’s it. You have successfully built and installed OpenCV with CUDA GPU acceleration on Windows 10. It should work with any language that OpenCV supports, but I have only tested it with Python 3.8 so far.
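A quick way to confirm from Python that the new build is the one being imported is to inspect the build report and count the CUDA devices. The sketch below assumes that the build report contains a line starting with "NVIDIA CUDA" when CUDA support is compiled in; cuda_build_line is a hypothetical helper written for this example, not part of OpenCV.

```python
def cuda_build_line(build_info):
    """Return the 'NVIDIA CUDA' line of an OpenCV build report, or None."""
    for line in build_info.splitlines():
        if line.strip().startswith("NVIDIA CUDA"):
            return line.strip()
    return None


try:
    import cv2
except ImportError:
    cv2 = None

if cv2 is None:
    print("cv2 could not be imported; check that the build is on your Python path.")
else:
    print("OpenCV version:", cv2.__version__)
    print("Build report:", cuda_build_line(cv2.getBuildInformation()))
    # Returns 0 when OpenCV was built without CUDA or no usable GPU is found.
    print("CUDA devices:", cv2.cuda.getCudaEnabledDeviceCount())
```

If the device count is 0 or the build report line is missing, Python is most likely picking up a different OpenCV installation, such as one installed through pip.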

Verify GPU Computation Speedup

A short script that performs a simple matrix multiplication on the CPU using NumPy and on the GPU using OpenCV gives a quick check of the speedup.
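A minimal sketch of such a comparison, not necessarily the exact script behind the timings in this article, could look like the following. It assumes 1024 x 1024 float32 matrices, uses cv2.cuda.gemm for the GPU product, and skips the GPU part when cv2 or a CUDA device is unavailable. The helper names best_time and benchmark are made up for this example.

```python
import time

import numpy as np

try:
    import cv2
except ImportError:
    cv2 = None


def best_time(fn, repeats=5):
    """Run fn several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def benchmark(size=1024, repeats=5):
    """Time a size x size float32 matrix product on the CPU and, if possible, the GPU."""
    rng = np.random.default_rng(0)
    a = rng.random((size, size), dtype=np.float32)
    b = rng.random((size, size), dtype=np.float32)
    results = {"cpu_numpy": best_time(lambda: a @ b, repeats)}

    gpu_ready = (
        cv2 is not None
        and hasattr(cv2.cuda, "gemm")
        and cv2.cuda.getCudaEnabledDeviceCount() > 0
    )
    if gpu_ready:
        gpu_a = cv2.cuda_GpuMat()
        gpu_b = cv2.cuda_GpuMat()
        gpu_a.upload(a)  # host-to-device copies are not part of the timing
        gpu_b.upload(b)
        # Computes 1.0 * gpu_a * gpu_b + 0.0 * src3; src3 is unused, so None is passed.
        results["gpu_opencv"] = best_time(
            lambda: cv2.cuda.gemm(gpu_a, gpu_b, 1.0, None, 0.0), repeats
        )
    return results


for name, seconds in benchmark().items():
    print(f"{name}: {seconds * 1e3:.2f} ms")
```

The absolute numbers from this sketch will differ from the ones quoted here, since they depend on the CPU, the BLAS library behind NumPy, and the GPU.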

You should see output similar to mine, although the exact numbers depend on your GPU model.

4.45 ms ± 123 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.06 s ± 50 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

That’s roughly a 460x speedup (2.06 s versus 4.45 ms)!

For a step-by-step walkthrough and the GPU usage verification scripts, you can refer to the following video. It explains every step in great detail, along with possible errors and their solutions.

I am a PhD scholar specializing in deep learning for natural language processing and computer vision, with a particular interest in learning with minimal data.

M. Haroon Shakeel
