Histogram calculation with CUDA and OpenCV 3.X

This prototype tests different implementations of image histogram calculation using C++, CUDA, and OpenCV 3.x.

An image histogram is a type of histogram that acts as a graphical representation of the tonal distribution in a digital image. It plots the number of pixels for each tonal value. By looking at the histogram for a specific image, a viewer can judge the entire tonal distribution at a glance.
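To make the definition concrete, here is a minimal CPU sketch in plain C++ that counts pixels per tonal value. It assumes an 8-bit single-channel image stored as a contiguous buffer, which gives 256 bins. The name computeHistogram is hypothetical and does not appear in the prototype's sources.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original prototype): counts how many
// pixels of an 8-bit grayscale buffer fall on each of the 256 tonal values.
std::array<std::uint32_t, 256> computeHistogram(const std::uint8_t* pixels,
                                                std::size_t count) {
    assert(pixels != nullptr || count == 0);
    std::array<std::uint32_t, 256> bins{};  // value-initialized: all bins are 0
    for (std::size_t i = 0; i < count; ++i) {
        ++bins[pixels[i]];  // the pixel intensity is the bin index
    }
    return bins;
}
```

For a dark image most of the counts end up in the low-index bins, and for a bright image in the high-index bins.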

Figure: histogram examples. Left: an image with a high density of dark pixels. Right: an image with a high density of white pixels.

Three different methods are compared to each other in this prototype:

  • OpenCV 3.x CPU based method cv::calcHist from the imgproc module.
  • OpenCV 3.x GPU based method cv::cuda::calcHist from the cudaimgproc module.
  • A custom method, calcHistCuda(), implemented in CUDA to run in parallel on the GPU.

OpenCV CPU implementation:

In this first test, the CPU version from OpenCV cv::calcHist is tested.

OpenCV GPU implementation:

In this second test, the GPU version from OpenCV cv::cuda::calcHist is tested.

Own CUDA implementation:

In this third test, the custom CUDA implementation calcHistCuda() is tested.

Execution times of the three implementations on simple_room-wallpaper-4096×3072.jpg (2.3 MB, 4096×3072 pixels):

Both OpenCV versions are much faster than the simple CUDA version, which uses built-in atomics.
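The post does not show how the execution times were collected. One plausible way to time each implementation, sketched here with std::chrono wall-clock timing, is to run the call several times and average. The helper name meanMillis is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` `repetitions` times and returns the mean
// wall-clock time per run in milliseconds.
double meanMillis(const std::function<void()>& work, int repetitions) {
    assert(repetitions > 0);
    using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = Clock::now();
    for (int i = 0; i < repetitions; ++i) {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / repetitions;
}
```

When timing GPU code this way, the callable should wait for the device to finish (for example with cudaDeviceSynchronize()) before returning, because kernel launches are asynchronous and the host clock would otherwise stop too early.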

Profiling of the OpenCV GPU and own CUDA implementation using NSight profiler:

The profile also shows that the OpenCV GPU version spends about 1.46 ms on computation, which is 4% of its total compute time, compared to 96% for the custom CUDA implementation.

hist_compute_profiling

Original image and generated histogram images

simple_room-wallpaper-4096x3072

hist_compute_OpenCvCpu

hist_compute_OpenCvGpu

hist_compute_Cuda

Conclusion:

The OpenCV GPU version is at least 10 times faster than the custom CUDA implementation. However, the custom implementation still has room for optimization through more advanced CUDA features, such as shared memory for both the kernel and the image, streams for moving data between CPU and GPU, or pinned memory.
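To illustrate why shared memory is a promising optimization, the following sketch emulates the idea on the CPU in plain C++. It is not the prototype's kernel. It assumes an 8-bit single-channel image, and all names (HistogramResult, histogramGlobalOnly, histogramPrivatized) are hypothetical. Each "block" fills a private histogram first, standing in for a per-block shared-memory histogram, and only then merges its 256 bins into the global result. The counter tracks how many updates would reach global memory on a GPU.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Histogram = std::array<std::uint32_t, 256>;

struct HistogramResult {
    Histogram bins{};               // final 256-bin histogram
    std::size_t globalUpdates = 0;  // updates that would hit global memory
};

// Naive scheme: every pixel updates the global histogram directly
// (on a GPU this corresponds to one global atomic add per pixel).
HistogramResult histogramGlobalOnly(const std::vector<std::uint8_t>& pixels) {
    HistogramResult result;
    for (std::uint8_t value : pixels) {
        ++result.bins[value];
        ++result.globalUpdates;
    }
    return result;
}

// Privatized scheme: each block of `blockSize` pixels accumulates into a
// local histogram (standing in for shared memory), then merges it.
HistogramResult histogramPrivatized(const std::vector<std::uint8_t>& pixels,
                                    std::size_t blockSize) {
    assert(blockSize > 0);
    HistogramResult result;
    for (std::size_t begin = 0; begin < pixels.size(); begin += blockSize) {
        const std::size_t end = std::min(begin + blockSize, pixels.size());
        Histogram local{};  // private per-block histogram, starts at 0
        for (std::size_t i = begin; i < end; ++i) {
            ++local[pixels[i]];
        }
        for (std::size_t bin = 0; bin < local.size(); ++bin) {
            result.bins[bin] += local[bin];  // one global update per bin
            ++result.globalUpdates;
        }
    }
    return result;
}
```

Both functions produce the same histogram, but the privatized variant issues 256 global updates per block instead of one per pixel. For a 4096×3072 image split into blocks of 65,536 pixels, that is 192 × 256 = 49,152 global updates instead of 12,582,912. On a real GPU the actual gain also depends on contention, occupancy, and the block size chosen.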

Sources:

https://bitbucket.org/coldvisionio/coldvision-library/src/eb87c764386877dea9af4d48b1a2d7618d94d3ec/samples/2_imaging/hist_compute/

Resources:

https://en.wikipedia.org/wiki/Image_histogram

http://www.cambridgeincolour.com/tutorials/histograms1.htm

http://opencv-code.com/tutorials/drawing-histograms-in-opencv/

http://docs.opencv.org/master/d8/dbc/tutorial_histogram_calculation.html#gsc.tab=0

http://docs.opencv.org/master/d6/dc7/group__imgproc__hist.html#ga4b2b5fd75503ff9e6844cc4dcdaed35d&gsc.tab=0

http://docs.opencv.org/master/d8/d0e/group__cudaimgproc__hist.html#gaaf3944106890947020bb4522a7619c26&gsc.tab=0

https://github.com/Itseez/opencv/blob/master/modules/cudaimgproc/test/test_histogram.cpp

https://laconsigna.wordpress.com/2011/04/29/1d-histogram-on-opencv/
