Image resize with OpenCV 3.x and CUDA

This prototype implements and benchmarks different algorithms for downsampling grayscale and color images to arbitrary sizes, using C++, CUDA, and OpenCV 3.x.

In image analysis, downsampling is a fundamental transformation: it significantly decreases processing time while introducing little or no error into the system.

Several algorithms exist, ranging from simply skipping pixels, which produces aliased images, to more sophisticated reduction kernels that trade speed for quality. Popular image processing libraries such as OpenCV implement it as a simple geometric transformation, e.g. resampling using pixel area relation, where source areas are shrunk independently.

Three different methods are implemented and compared against each other in the prototype:

  • The OpenCV 3.x CPU-based method cv::resize() from the imgproc module.
  • The OpenCV 3.x GPU-based method cuda::resize() from the cudawarping module.
  • A custom method, downscaleCuda(), implemented in CUDA to run in parallel on the GPU. This method simply merges blocks of input pixels into single points in the output image. For instance, a 4x downscale in both x and y merges each 4×4 block of the original image into a single pixel of the output image.
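As a reference for the merge step, here is a serial C++ sketch of the same block-average logic (a hypothetical helper, not the post's actual code; the real downscaleCuda kernel performs this per-output-pixel work in parallel on the GPU):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Serial reference of the block-merge downscale: every f x f block of the
// grayscale input is averaged into one output pixel.
std::vector<uint8_t> downscaleRef(const std::vector<uint8_t>& src,
                                  int width, int height, int f) {
    const int ow = width / f, oh = height / f;
    std::vector<uint8_t> dst(ow * oh);
    for (int oy = 0; oy < oh; ++oy) {
        for (int ox = 0; ox < ow; ++ox) {
            unsigned sum = 0;
            for (int dy = 0; dy < f; ++dy)
                for (int dx = 0; dx < f; ++dx)
                    sum += src[(oy * f + dy) * width + (ox * f + dx)];
            dst[oy * ow + ox] = static_cast<uint8_t>(sum / (f * f));
        }
    }
    return dst;
}
```

A 4x downscale of an 8×8 image thus yields a 2×2 image, each pixel holding the mean of a 4×4 block.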

The code is built and run on an MSI laptop with a GeForce GTX 970M (3 GB GDDR5, 13 streaming multiprocessors).

OpenCV CPU implementation

In this first test, the image is loaded from disk and downscaled by a factor of 4 in both x and y using cv::resize.

OpenCV GPU implementation

In this second test, the image is loaded from disk and downscaled by a factor of 4 in both x and y using cuda::resize.
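A minimal sketch of the GPU variant (an illustrative helper; it requires OpenCV built with CUDA support and a CUDA-capable device). Note the explicit upload and download steps, which are not part of the measured processing time:

```cpp
#include <opencv2/cudawarping.hpp>

// Downscale by an integer factor on the GPU with cuda::resize.
cv::Mat downscaleGpu(const cv::Mat& src, int factor) {
    cv::cuda::GpuMat d_src, d_dst;
    d_src.upload(src);  // host -> device copy (excluded from timings)
    cv::cuda::resize(d_src, d_dst,
                     cv::Size(src.cols / factor, src.rows / factor),
                     0, 0, cv::INTER_LINEAR);
    cv::Mat dst;
    d_dst.download(dst);  // device -> host copy (excluded from timings)
    return dst;
}
```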

Own CUDA implementation

In this third test, the image is loaded from disk and downscaled by a factor of 4 in both x and y using the custom CUDA method.
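The post does not list the downscaleCuda source, but a kernel implementing the block-merge described above might look like the following sketch (function names and launch configuration are assumptions):

```cuda
// Hypothetical block-merge kernel: one thread computes one output pixel
// by averaging the corresponding factor x factor block of the input.
__global__ void downscaleKernel(const unsigned char* src, unsigned char* dst,
                                int srcWidth, int dstWidth, int dstHeight,
                                int factor) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= dstWidth || y >= dstHeight) return;

    unsigned int sum = 0;
    for (int dy = 0; dy < factor; ++dy)
        for (int dx = 0; dx < factor; ++dx)
            sum += src[(y * factor + dy) * srcWidth + (x * factor + dx)];
    dst[y * dstWidth + x] = (unsigned char)(sum / (factor * factor));
}

// Launch with one thread per output pixel, e.g. 16x16 thread blocks:
//   dim3 block(16, 16);
//   dim3 grid((dstWidth + 15) / 16, (dstHeight + 15) / 16);
//   downscaleKernel<<<grid, block>>>(d_src, d_dst, srcW, dstW, dstH, 4);
```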

Execution times of the three implementations on simple_room-wallpaper-4096×3072.jpg (2.3 MB, 4096×3072 pixels) at a 4x downscale

Each of the three implementations was run 10 times in a row. Excluding the peaks from the output, the OpenCV CPU implementation took on average ~6.5 ms, the OpenCV GPU implementation ~1.65 ms, and the custom CUDA implementation ~1.60 ms.

None of these times includes the time spent loading the image into CPU or GPU memory, only the time spent computing the downscale. In normal usage an image is loaded once and many transformations are applied to it, so the loading time is negligible relative to the total processing time.

Profiling of the OpenCV GPU and custom CUDA implementations using the NSight profiler

The profiling also shows that the time spent on computation is about 1.5 ms for both methods.

Original and rescaled images generated with the OpenCV GPU and custom CUDA methods


Original image

Resized to 1/4th: rescaled image using OpenCV GPU

Resized to 1/4th: rescaled image using the custom CUDA method


Both GPU versions achieve similar speeds. While the OpenCV implementation is straightforward and easy to use, the custom CUDA implementation offers more flexibility, such as sharing GPU memory between CUDA kernels without copying data back and forth between the CPU and GPU. Further speedups can also be achieved using features of the CUDA runtime such as streams, shared memory, or pinned memory.


