This prototype tests different implementations of real-time, feature-based object detection using SURF, KNN, FLANN, OpenCV 3.x and CUDA.
Object detection is the process of finding instances of real-world objects such as faces, bicycles, and buildings in images or videos. Object detection algorithms typically use extracted features and learning algorithms to recognize instances of an object category. It is commonly used in applications such as image retrieval, security, surveillance, and automated vehicle parking systems.
It is one of the fundamental problems in computer vision, and many techniques have been developed to solve it, including feature-based object detection, Viola-Jones object detection, SVM classification with histogram of oriented gradients (HOG) features, and image segmentation with blob analysis. Other methods for detecting objects with computer vision include gradient-based, derivative-based, and template matching approaches. More details can be found in this post.
This post covers testing the SURF algorithm, the FLANN-based matcher (Fast Library for Approximate Nearest Neighbors), and their OpenCV 3.x CPU and GPU/CUDA implementations. The SURF algorithm falls into the first category, feature-based object detection.

Detecting a reference object (left) in a cluttered scene (right) using feature extraction and matching. RANSAC is used to estimate the location of the object in the test image.
SURF
SURF stands for Speeded Up Robust Features and is an algorithm which extracts distinctive keypoints and descriptors from an image. Details on the algorithm can be found here and on its OpenCV implementation here and here. A set of SURF keypoints and descriptors can be extracted from an image and then used later to detect the same object. SURF uses an intermediate image representation called the Integral Image, which is computed from the input image and is used to speed up the calculation of sums over any rectangular area. It is formed by summing up the pixel values from the origin up to each (x,y) coordinate of the image; the sum over any rectangle can then be obtained with only four lookups. This makes the computation time invariant to the size of the area and is particularly useful when working with large images. The SURF detector is based on the determinant of the Hessian matrix. The SURF descriptor describes how pixel intensities are distributed within a scale-dependent neighborhood of each interest point detected by the Fast Hessian.
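As a minimal sketch of the integral image idea (an illustration only, not part of the project code; the image path is hypothetical), the sum of pixel values over any axis-aligned rectangle can be computed with four lookups after a single call to cv::integral:

#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <iostream>

int main()
{
	// "scene.jpg" is a hypothetical input path used only for this illustration
	cv::Mat img = cv::imread( "scene.jpg", cv::IMREAD_GRAYSCALE );
	if( !img.data ) return 1;

	cv::Mat ii;
	cv::integral( img, ii, CV_32S ); // ii has size (rows+1) x (cols+1)

	// Sum of the pixels inside the rectangle [x, x+w) x [y, y+h),
	// computed with only 4 lookups regardless of the rectangle size
	int x = 10, y = 20, w = 50, h = 40;
	int sum = ii.at<int>(y + h, x + w) - ii.at<int>(y, x + w)
	        - ii.at<int>(y + h, x)     + ii.at<int>(y, x);
	std::cout << "Box sum: " << sum << std::endl;
	return 0;
}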
Object detection using SURF is scale and rotation invariant, which makes it very powerful. It also does not require the long and tedious training needed by cascaded Haar classifier based detection. The detection time of SURF is a little longer than Haar, but in most situations a few tens of milliseconds more per detection is not a problem. Since the method is rotation invariant, objects can be detected successfully in any orientation, a case where the Haar classifier fails miserably.

Feature keypoints are represented by the circle centers and the descriptors by their radius and orientation
The main steps of the SURF algorithm are:
- Find interest points in the image using Hessian matrices
- Determine the orientation of these points
- Use basic Haar wavelets in a suitably oriented square region around each interest point to find intensity gradients in the X and Y directions. The square region is divided into 16 sub-squares, and each sub-square yields 4 features, so the SURF descriptor for every interest point is 64 dimensional.
- For a description of the 4 features, please refer to the paper – they are basically sums of gradient changes. (A short sketch of checking the descriptor dimensionality follows this list.)
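As a quick, assumed sketch (not part of the project code), the 64-dimensional descriptor can be verified with the same OpenCV interface used later in this post; with the default extended=false parameter SURF::create() produces 64-element descriptor rows, while extended=true would produce 128-element rows:

// img is assumed to be a grayscale cv::Mat loaded beforehand
Ptr<xfeatures2d::SURF> surf = xfeatures2d::SURF::create( 100 /* minHessian */ );
vector<KeyPoint> keypoints;
Mat descriptors;
surf->detectAndCompute( img, noArray(), keypoints, descriptors );
// Each row of 'descriptors' corresponds to one keypoint; with extended=false (default)
// the row length is 64, i.e. the 64-dimensional SURF descriptor described above.
CV_Assert( descriptors.empty() || descriptors.cols == 64 );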
SURF is a non-free algorithm. Other feature detector algorithms implemented in OpenCV include SIFT, FAST, BRIEF, and ORB, as listed here.
Feature Matching with FLANN (Fast Library for Approximate Nearest Neighbors)
After unique keypoints and descriptors have been extracted from both images, object and scene, they must be matched. OpenCV provides two matching strategies, the Brute-Force Matcher and the FLANN Matcher, as explained here and here.
The Brute-Force matcher is simple: it takes the descriptor of one feature in the first set, compares it with all features in the second set using some distance calculation, and returns the closest one.
The FLANN library contains a collection of algorithms optimized for fast nearest-neighbor search in large datasets and for high-dimensional features, and it works much faster than the Brute-Force Matcher for large datasets.
Both matchers implement the same interface and provide different strategies for selecting the best matches. The strategies are DescriptorMatcher::match(), which returns only the best match; DescriptorMatcher::knnMatch(), which returns the best k matches using the k-nearest neighbors algorithm; and DescriptorMatcher::radiusMatch(), which returns training descriptors not farther than the specified distance. The three OpenCV implementations are explained in detail here and here.
In this post the second method, DescriptorMatcher::knnMatch(), is used.
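For reference, here is a minimal sketch of the three interfaces; descriptors_object and descriptors_scene are assumed to have already been computed (as in the code further below), and the 0.25f radius is an arbitrary example value:

FlannBasedMatcher matcher;

// match(): only the single best match per query descriptor
vector<DMatch> best_matches;
matcher.match( descriptors_object, descriptors_scene, best_matches );

// knnMatch(): the best k matches per query descriptor (k = 2 enables the ratio test used below)
vector< vector<DMatch> > knn_matches;
matcher.knnMatch( descriptors_object, descriptors_scene, knn_matches, 2 );

// radiusMatch(): all training descriptors not farther than maxDistance from the query descriptor
vector< vector<DMatch> > radius_matches;
matcher.radiusMatch( descriptors_object, descriptors_scene, radius_matches, 0.25f );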
OpenCV libraries
SURF is a non-free algorithm and is therefore included in the separate repository opencv_contrib. For Linux it can be installed as explained here.
To get the code running with OpenCV 3.1, these libraries must be included in the path.
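The headers below are a sketch of what the code in this post typically needs; the exact set may vary with the OpenCV 3.x build. The xfeatures2d headers come from opencv_contrib and the cuda headers require a CUDA-enabled build:

#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>         // imread, imwrite
#include <opencv2/imgproc.hpp>           // line
#include <opencv2/highgui.hpp>           // imshow, waitKey (optional)
#include <opencv2/calib3d.hpp>           // findHomography
#include <opencv2/features2d.hpp>        // FlannBasedMatcher, drawMatches
#include <opencv2/xfeatures2d.hpp>       // SURF (opencv_contrib, non-free)
#include <opencv2/cudafeatures2d.hpp>    // cuda::DescriptorMatcher
#include <opencv2/xfeatures2d/cuda.hpp>  // cuda::SURF_CUDA

using namespace cv;
using namespace cv::xfeatures2d;
using namespace std;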
OpenCV CPU
This first implementation uses only CPU methods of OpenCV 3.X.
/**
 * It searches for an object inside a scene using SURF for keypoint and descriptor detection, and FLANN+KNN for matching them.
 * This implementation runs only on the CPU.
 *
 * @param objectInputFile Path of the image containing the object to be searched.
 * @param sceneInputFile Path of the image where the object is to be searched.
 * @param outputFile Path of the image where the matching is to be saved.
 * @param minHessian The Hessian threshold of the SURF algorithm.
 */
void processWithCpu(string objectInputFile, string sceneInputFile, string outputFile, int minHessian = 100)
{
	// Load the images from disk
	Mat img_object = imread( objectInputFile, IMREAD_GRAYSCALE ); // SURF works only with grayscale images
	Mat img_scene = imread( sceneInputFile, IMREAD_GRAYSCALE );
	if( !img_object.data || !img_scene.data ) {
		std::cout << "Error reading images." << std::endl;
		return;
	}

	// Start the timer
	GpuTimer timer;
	timer.Start();

	vector<KeyPoint> keypoints_object, keypoints_scene; // keypoints
	Mat descriptors_object, descriptors_scene; // descriptors (features)

	//-- Steps 1 + 2: detect the keypoints and compute the descriptors, both in one method
	Ptr<SURF> surf = SURF::create( minHessian );
	surf->detectAndCompute( img_object, noArray(), keypoints_object, descriptors_object );
	surf->detectAndCompute( img_scene, noArray(), keypoints_scene, descriptors_scene );

	//-- Step 3: Match descriptor vectors using the FLANN matcher
	FlannBasedMatcher matcher; // FLANN - Fast Library for Approximate Nearest Neighbors
	vector< vector< DMatch> > matches;
	matcher.knnMatch( descriptors_object, descriptors_scene, matches, 2 ); // find the best 2 matches of each descriptor

	timer.Stop();
	printf( "Method processImage() ran in: %f msecs.\n", timer.Elapsed() );

	//-- Step 4: Select only good matches
	std::vector< DMatch > good_matches;
	for (int k = 0; k < std::min(descriptors_scene.rows - 1, (int)matches.size()); k++) {
		if ( (matches[k][0].distance < 0.6*(matches[k][1].distance)) &&
				((int)matches[k].size() <= 2 && (int)matches[k].size() > 0) ) {
			// take the first result only if its distance is smaller than 0.6*second_best_dist
			// that means this descriptor is ignored if the second distance is bigger or of similar size
			good_matches.push_back( matches[k][0] );
		}
	}

	//-- Step 5: Draw lines between the good matching points
	Mat img_matches;
	drawMatches( img_object, keypoints_object, img_scene, keypoints_scene,
			good_matches, img_matches, Scalar::all(-1), Scalar::all(-1),
			vector<char>(), DrawMatchesFlags::DEFAULT );

	//-- Step 6: Localize the object inside the scene image with a square
	localizeInImage( good_matches, keypoints_object, keypoints_scene, img_object, img_matches );

	//-- Step 7: Show/save matches
	//imshow("Good Matches & Object detection", img_matches);
	//waitKey(0);
	imwrite(outputFile, img_matches);
}
OpenCV GPU
This second implementation uses both GPU and CPU methods of OpenCV 3.X.
/**
 * It searches for an object inside a scene using SURF for keypoint and descriptor detection, and BruteForce+KNN for matching them.
 * This implementation runs partly on the GPU (SURF+KNN) and partly on the CPU (best-match selection + localization).
 *
 * @param objectInputFile Path of the image containing the object to be searched.
 * @param sceneInputFile Path of the image where the object is to be searched.
 * @param outputFile Path of the image where the matching is to be saved.
 * @param minHessian The Hessian threshold of the SURF algorithm.
 */
void processWithGpu(string objectInputFile, string sceneInputFile, string outputFile, int minHessian = 100)
{
	// Load the images from disk
	Mat img_object = imread( objectInputFile, IMREAD_GRAYSCALE ); // SURF works only with grayscale images
	Mat img_scene = imread( sceneInputFile, IMREAD_GRAYSCALE );
	if( !img_object.data || !img_scene.data ) {
		std::cout << "Error reading images." << std::endl;
		return;
	}

	// Copy the images into GPU memory
	cuda::GpuMat img_object_Gpu( img_object );
	cuda::GpuMat img_scene_Gpu( img_scene );

	// Start the timer - the time for moving data between GPU and CPU is included
	GpuTimer timer;
	timer.Start();

	cuda::GpuMat keypoints_scene_Gpu, keypoints_object_Gpu; // keypoints
	cuda::GpuMat descriptors_scene_Gpu, descriptors_object_Gpu; // descriptors (features)

	//-- Steps 1 + 2: detect the keypoints and compute the descriptors, both in one method
	cuda::SURF_CUDA surf( minHessian );
	surf( img_object_Gpu, cuda::GpuMat(), keypoints_object_Gpu, descriptors_object_Gpu );
	surf( img_scene_Gpu, cuda::GpuMat(), keypoints_scene_Gpu, descriptors_scene_Gpu );
	//cout << "FOUND " << keypoints_object_Gpu.cols << " keypoints on object image" << endl;
	//cout << "Found " << keypoints_scene_Gpu.cols << " keypoints on scene image" << endl;

	//-- Step 3: Match descriptor vectors using the BruteForce matcher
	Ptr< cuda::DescriptorMatcher > matcher = cuda::DescriptorMatcher::createBFMatcher();
	vector< vector< DMatch> > matches;
	matcher->knnMatch(descriptors_object_Gpu, descriptors_scene_Gpu, matches, 2);

	// Downloading results GPU -> CPU
	vector< KeyPoint > keypoints_scene, keypoints_object;
	//vector< float > descriptors_scene, descriptors_object;
	surf.downloadKeypoints(keypoints_scene_Gpu, keypoints_scene);
	surf.downloadKeypoints(keypoints_object_Gpu, keypoints_object);
	//surf.downloadDescriptors(descriptors_scene_Gpu, descriptors_scene);
	//surf.downloadDescriptors(descriptors_object_Gpu, descriptors_object);

	timer.Stop();
	printf( "Method processImage() ran in: %f msecs.\n", timer.Elapsed() );

	//-- Step 4: Select only good matches
	//vector<Point2f> obj, scene;
	std::vector< DMatch > good_matches;
	for (int k = 0; k < std::min(keypoints_object.size()-1, matches.size()); k++) {
		if ( (matches[k][0].distance < 0.6*(matches[k][1].distance)) &&
				((int)matches[k].size() <= 2 && (int)matches[k].size() > 0) ) {
			// take the first result only if its distance is smaller than 0.6*second_best_dist
			// that means this descriptor is ignored if the second distance is bigger or of similar size
			good_matches.push_back(matches[k][0]);
		}
	}

	//-- Step 5: Draw lines between the good matching points
	Mat img_matches;
	drawMatches( img_object, keypoints_object, img_scene, keypoints_scene,
			good_matches, img_matches, Scalar::all(-1), Scalar::all(-1),
			vector<char>(), DrawMatchesFlags::DEFAULT );

	//-- Step 6: Localize the object inside the scene image with a square
	localizeInImage( good_matches, keypoints_object, keypoints_scene, img_object, img_matches );

	//-- Step 7: Show/save matches
	//imshow("Good Matches & Object detection", img_matches);
	//waitKey(0);
	imwrite(outputFile, img_matches);

	//-- Step 8: Release objects from the GPU memory
	surf.releaseMemory();
	matcher.release();
	img_object_Gpu.release();
	img_scene_Gpu.release();
}
Localization
Localization is done after the keypoints and feature descriptors have been extracted from both the object and scene images. It searches for the right position, orientation and scale of the object in the scene based on the good_matches. This method also draws lines between the good matching keypoints and, if the object was detected, a rectangle around the place where it was found. For finding the best transformation (homography) matrix, random sample consensus (RANSAC) is used.
// It searches for the right position, orientation and scale of the object in the scene based on the good_matches.
void localizeInImage(const std::vector<DMatch>& good_matches,
		const std::vector<KeyPoint>& keypoints_object,
		const std::vector<KeyPoint>& keypoints_scene,
		const Mat& img_object, const Mat& img_matches)
{
	//-- Localize the object
	std::vector<Point2f> obj;
	std::vector<Point2f> scene;
	for (int i = 0; i < good_matches.size(); i++) {
		//-- Get the keypoints from the good matches
		obj.push_back(keypoints_object[good_matches[i].queryIdx].pt);
		scene.push_back(keypoints_scene[good_matches[i].trainIdx].pt);
	}

	try {
		Mat H = findHomography(obj, scene, RANSAC);

		//-- Get the corners from the image_1 ( the object to be "detected" )
		std::vector<Point2f> obj_corners(4);
		obj_corners[0] = cvPoint(0, 0);
		obj_corners[1] = cvPoint(img_object.cols, 0);
		obj_corners[2] = cvPoint(img_object.cols, img_object.rows);
		obj_corners[3] = cvPoint(0, img_object.rows);
		std::vector<Point2f> scene_corners(4);

		perspectiveTransform(obj_corners, scene_corners, H);

		//-- Draw lines between the corners (the mapped object in the scene - image_2 )
		line(img_matches, scene_corners[0] + Point2f(img_object.cols, 0),
				scene_corners[1] + Point2f(img_object.cols, 0), Scalar(255, 0, 0), 4);
		line(img_matches, scene_corners[1] + Point2f(img_object.cols, 0),
				scene_corners[2] + Point2f(img_object.cols, 0), Scalar(255, 0, 0), 4);
		line(img_matches, scene_corners[2] + Point2f(img_object.cols, 0),
				scene_corners[3] + Point2f(img_object.cols, 0), Scalar(255, 0, 0), 4);
		line(img_matches, scene_corners[3] + Point2f(img_object.cols, 0),
				scene_corners[0] + Point2f(img_object.cols, 0), Scalar(255, 0, 0), 4);
	} catch (Exception& e) {}
}
Execution times of the two implementations
Only the time spent extracting the keypoints and feature descriptors and the time for matching them are taken into account. The time for loading images into CPU and GPU memory is ignored.
The console output shows that the GPU version finishes in about 20 ms for images no bigger than 800×600, which means the GPU implementation can run in real time. The CPU version finishes in about 200 ms and could process only a few frames per second.
CPU::Processing object: ../data/img_color_1_object1.jpg and scene: ../data/img_color_1.jpg ...
Method processWithCpu() ran in: 200.673447 msecs, object size: 640x480, scene size: 640x480
GPU::Processing object: ../data/img_color_1_object1.jpg and scene: ../data/img_color_1.jpg ...
Method processWithGpu() ran in: 20.629408 msecs, object size: 640x480, scene size: 640x480
CPU::Processing object: ../data/img_color_2_object1.jpg and scene: ../data/img_color_2.jpg ...
Method processWithCpu() ran in: 260.762726 msecs, object size: 270x270, scene size: 800x600
GPU::Processing object: ../data/img_color_2_object1.jpg and scene: ../data/img_color_2.jpg ...
Method processWithGpu() ran in: 21.733664 msecs, object size: 270x270, scene size: 800x600
CPU::Processing object: ../data/img_color_3_object1.jpg and scene: ../data/img_color_3.jpg ...
Method processWithCpu() ran in: 193.915268 msecs, object size: 256x292, scene size: 800x454
GPU::Processing object: ../data/img_color_3_object1.jpg and scene: ../data/img_color_3.jpg ...
Method processWithGpu() ran in: 17.539713 msecs, object size: 256x292, scene size: 800x454
CPU::Processing object: ../data/img_color_4_object1.jpg and scene: ../data/img_color_4.jpg ...
Method processWithCpu() ran in: 253.040451 msecs, object size: 256x202, scene size: 1024x765
GPU::Processing object: ../data/img_color_4_object1.jpg and scene: ../data/img_color_4.jpg ...
Method processWithGpu() ran in: 30.642303 msecs, object size: 256x202, scene size: 1024x765
CPU::Processing object: ../data/img_color_5_object1.jpg and scene: ../data/img_color_5.jpg ...
Method processWithCpu() ran in: 258.356354 msecs, object size: 256x216, scene size: 1024x768
GPU::Processing object: ../data/img_color_5_object1.jpg and scene: ../data/img_color_5.jpg ...
Method processWithGpu() ran in: 29.302015 msecs, object size: 256x216, scene size: 1024x768
Original image and detected images
Conclusion
SURF is a powerful and fast algorithm for distinctive feature extraction. Because it uses only neighboring pixels to extract keypoints and feature descriptors, it carries no information about the shape of the object or the relative positions of the keypoints. For correct detection it must therefore be combined with another algorithm which takes the relative positions and shape into consideration.
Resources:
http://de.mathworks.com/discovery/object-detection.html
http://aaaipress.org/Papers/Symposia/Fall/1993/FS-93-04/FS93-04-013.pdf
https://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework
https://en.wikipedia.org/wiki/Histogram_of_oriented_gradients
https://en.wikipedia.org/wiki/Blob_detection
https://en.wikipedia.org/wiki/Feature_detection_(computer_vision)
https://en.wikipedia.org/wiki/Speeded_up_robust_features
http://www.cs.ubc.ca/research/flann/
Computer Vision – The Integral Image
https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
http://campar.in.tum.de/twiki/pub/Chair/TeachingWs09MATDCV/SURF_paper.pdf
http://cs.au.dk/~jtp/SURF/report.pdf
http://opensurf1.googlecode.com/files/OpenSURF.pdf
http://www.kafu-academic-journal.info/journal/4/119/
http://asrl.utias.utoronto.ca/code/gpusurf/
http://docs.opencv.org/3.0-beta/modules/xfeatures2d/doc/nonfree_features.html
http://docs.opencv.org/3.0-beta/doc/tutorials/features2d/feature_homography/feature_homography.html
http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_surf_intro/py_surf_intro.html
http://docs.opencv.org/master/db/d06/classcv_1_1cuda_1_1SURF__CUDA.html
http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_matcher/py_matcher.html
http://docs.opencv.org/3.0-beta/modules/features2d/doc/common_interfaces_of_descriptor_matchers.html
http://docs.opencv.org/2.4/modules/flann/doc/flann_fast_approximate_nearest_neighbor_search.html
http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_video/py_meanshift/py_meanshift.html
http://study.marearts.com/2015/06/opencv-30-rc1-example-source-code-for.html
http://robocv.blogspot.de/2012/02/real-time-object-detection-in-opencv.html
https://achuwilson.wordpress.com/2011/08/05/object-detection-using-surf-in-opencv-part-1/
https://www.intorobotics.com/how-to-detect-and-track-object-with-opencv/
http://blog.christianperone.com/2015/01/real-time-drone-object-tracking-using-python-and-opencv/