MCCF - Hamed Kiani (Ph.D.)

Multi-Channel Correlation Filters


Modern descriptors such as HOG and SIFT are now commonly used in vision for detecting patterns in images and video. From a signal processing perspective, this detection process can be efficiently posed as a correlation/convolution between a multi-channel image and a multi-channel detector/filter, which results in a single-channel response map indicating where the pattern (e.g. an object) has occurred. In this work we propose a novel framework, which we refer to as a multi-channel correlation filter, for learning a multi-channel detector/filter efficiently in the frequency domain, both in terms of training time and memory footprint. To demonstrate the effectiveness of our strategy, we evaluate it across a number of visual detection/localization tasks, where we show (i) superior performance to current state-of-the-art correlation filters, and (ii) superior computational and memory efficiency compared to state-of-the-art spatial detectors.
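As a minimal sketch of the correlation described above (not the authors' MATLAB code), the single-channel response can be computed by correlating each channel in the frequency domain and summing over channels. The function names are hypothetical, the DFT implies circular (wrap-around) boundaries, and the filter is assumed to be zero-padded to the image size. A direct spatial loop is included only as a reference.

```python
import numpy as np

def correlate_multichannel_fft(x, h):
    """Circular correlation of a K-channel image x with a K-channel filter h.

    x, h: real arrays of shape (K, H, W); h is assumed zero-padded to the image size.
    Returns the single-channel (H, W) map
        y[u, v] = sum_k sum_{m, n} x[k, (m + u) % H, (n + v) % W] * h[k, m, n].
    """
    X = np.fft.fft2(x, axes=(-2, -1))
    Hf = np.fft.fft2(h, axes=(-2, -1))
    # Per channel, spatial correlation is X * conj(H) in the frequency domain;
    # summing over channels collapses the K responses into one map.
    Y = np.sum(X * np.conj(Hf), axis=0)
    return np.real(np.fft.ifft2(Y))

def correlate_multichannel_direct(x, h):
    """Same response computed with an explicit spatial loop (reference only)."""
    _, H, W = x.shape
    y = np.zeros((H, W))
    for u in range(H):
        for v in range(W):
            # Shift so that shifted[k, m, n] = x[k, (m + u) % H, (n + v) % W].
            shifted = np.roll(x, shift=(-u, -v), axis=(1, 2))
            y[u, v] = np.sum(shifted * h)
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 9))    # e.g. 3 feature channels on an 8 x 9 grid
y = correlate_multichannel_fft(x, x)  # filter equal to the image itself
print(np.unravel_index(np.argmax(y), y.shape))  # peak at zero shift
```

The FFT route costs O(K D log D) for D pixels and K channels, versus O(K D^2) for the direct loop, which is why detection by correlation is usually done in the frequency domain.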

We propose an extension to canonical correlation filter theory that efficiently handles multi-channel signals. Specifically, we show that, when posed in the frequency domain, the task of multi-channel correlation filter estimation forms a sparse banded linear system. Further, we demonstrate that this system can be solved much more efficiently than with spatial-domain methods.
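The frequency-domain training step can be sketched as below, assuming a ridge (squared-norm) regularizer with weight `lam` and circular correlation; the exact objective and regularizer used in the paper may differ, and `train_mccf` / `train_spatial_reference` are hypothetical names. Mathematically, once the unknowns are grouped by frequency, the system decouples into D = H·W independent K×K solves, one per frequency; the brute-force spatial solver is included only to check that both give the same filter on a tiny problem.

```python
import numpy as np

def train_mccf(xs, ys, lam=1e-2):
    """Ridge multi-channel correlation filter, solved independently per frequency.

    xs: (N, K, H, W) real training images; ys: (N, H, W) desired responses.
    Minimizes sum_i ||y_i - sum_k corr(x_ik, h_k)||^2 + lam * sum_k ||h_k||^2.
    """
    N, K, H, W = xs.shape
    X = np.fft.fft2(xs, axes=(-2, -1)).reshape(N, K, H * W)  # (N, K, D)
    Y = np.fft.fft2(ys, axes=(-2, -1)).reshape(N, H * W)     # (N, D)
    # At frequency j, with a_i = X[i, :, j] and g = conj(H[:, j]):
    #   (sum_i conj(a_i) a_i^T + lam I) g = sum_i conj(a_i) Y_i(j)
    A = np.einsum('nkd,nld->dkl', np.conj(X), X)   # (D, K, K)
    b = np.einsum('nkd,nd->dk', np.conj(X), Y)     # (D, K)
    A = A + lam * np.eye(K)[None, :, :]
    g = np.linalg.solve(A, b[..., None])[..., 0]   # D independent K x K solves
    Hf = np.conj(g).T.reshape(K, H, W)
    # Real inputs give a Hermitian-symmetric spectrum, so the filter is real.
    return np.real(np.fft.ifft2(Hf, axes=(-2, -1)))

def train_spatial_reference(xs, ys, lam=1e-2):
    """Brute-force spatial ridge regression on the same objective (tiny sizes only)."""
    N, K, H, W = xs.shape
    rows, targets = [], []
    for i in range(N):
        for u in range(H):
            for v in range(W):
                # Row (i, u, v) is the shifted sample, so y_i[u, v] = <row, h>.
                rows.append(np.roll(xs[i], shift=(-u, -v), axis=(1, 2)).ravel())
                targets.append(ys[i, u, v])
    A, t = np.array(rows), np.array(targets)
    h = np.linalg.solve(A.T @ A + lam * np.eye(K * H * W), A.T @ t)
    return h.reshape(K, H, W)

rng = np.random.default_rng(1)
xs = rng.standard_normal((4, 2, 5, 6))   # N=4 samples, K=2 channels, 5 x 6 grid
ys = rng.standard_normal((4, 5, 6))
h = train_mccf(xs, ys, lam=0.1)
print(np.allclose(h, train_spatial_reference(xs, ys, lam=0.1)))  # True
```

The spatial system has K·D unknowns, so a dense solve scales roughly as O(K^3 D^3), while the per-frequency form needs D solves of size K×K, roughly O(D K^3) plus the FFT cost.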
We characterize theoretically and demonstrate empirically how our multi-channel correlation approach affords substantial memory savings when learning on multi-channel signals. Specifically, we show that the memory cost of our approach is not linear in the number of samples, allowing for substantial savings when learning detectors across large amounts of data.
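One way to see why the memory cost need not grow with the number of samples is to accumulate the per-frequency statistics one sample at a time. The sketch below assumes the same ridge objective and circular correlation as a plain least-squares formulation; the class `StreamingMCCF` and its methods are hypothetical. Its state is a (D, K, K) matrix and a (D, K) vector, whatever the number of samples added.

```python
import numpy as np

class StreamingMCCF:
    """Accumulates per-frequency sufficient statistics one training sample at a time.

    Storage is D*K*K + D*K complex numbers (D = H*W), independent of the sample count.
    """

    def __init__(self, K, H, W):
        self.K, self.H, self.W = K, H, W
        D = H * W
        self.A = np.zeros((D, K, K), dtype=complex)  # sum_i conj(a_i) a_i^T per frequency
        self.b = np.zeros((D, K), dtype=complex)     # sum_i conj(a_i) Y_i per frequency
        self.count = 0

    def add(self, x, y):
        """x: (K, H, W) multi-channel sample; y: (H, W) desired response."""
        X = np.fft.fft2(x, axes=(-2, -1)).reshape(self.K, -1)  # (K, D)
        Y = np.fft.fft2(y).ravel()                               # (D,)
        self.A += np.einsum('kd,ld->dkl', np.conj(X), X)
        self.b += (np.conj(X) * Y).T
        self.count += 1

    def solve(self, lam=1e-2):
        """Return the real (K, H, W) filter for ridge weight lam."""
        A = self.A + lam * np.eye(self.K)
        g = np.linalg.solve(A, self.b[..., None])[..., 0]
        Hf = np.conj(g).T.reshape(self.K, self.H, self.W)
        return np.real(np.fft.ifft2(Hf, axes=(-2, -1)))

    def state_bytes(self):
        return self.A.nbytes + self.b.nbytes

rng = np.random.default_rng(2)
model = StreamingMCCF(K=2, H=5, W=6)
for _ in range(1000):
    model.add(rng.standard_normal((2, 5, 6)), rng.standard_normal((5, 6)))
print(model.state_bytes())  # 2880, the same as after the first sample
```

By contrast, a spatial learner that keeps the full set of training samples in memory needs storage proportional to N·K·D, which grows linearly with the number of samples N.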
We apply our approach across a range of detection and localization tasks, including eye localization, car detection and pedestrian detection. We demonstrate (i) superior performance to current state-of-the-art single-channel correlation filters, and (ii) superior computational and memory efficiency compared with spatial detectors (e.g. linear SVM) of comparable detection performance.

An example of multi-channel correlation/convolution, where a multi-channel image x is correlated/convolved with a multi-channel filter h to give a single-channel response y. By posing this objective in the frequency domain, our multi-channel correlation filter approach aims to provide a computationally and memory-efficient strategy for estimating h given x and y.

Car detection demo

Eye detection demo

Facial landmark localization on LFW dataset (HOG channels)

(a) Facial feature localization rate at d < 0.10, and (b) mean localization error normalized by the interocular distance. d is the localization threshold, defined as a fraction of the interocular distance.
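The normalized error in (b) and the localization rate in (a) can be sketched as follows, assuming Euclidean pixel distances and an interocular distance measured between ground-truth eye centres; the function names are hypothetical and the exact evaluation protocol behind the reported numbers may differ.

```python
import numpy as np

def normalized_errors(pred, gt, left_eye, right_eye):
    """Per-image landmark error divided by the interocular distance.

    pred, gt: (N, 2) predicted and ground-truth landmark positions in pixels.
    left_eye, right_eye: (N, 2) ground-truth eye centres used for normalization.
    """
    err = np.linalg.norm(pred - gt, axis=1)
    iod = np.linalg.norm(left_eye - right_eye, axis=1)
    return err / iod

def localization_rate(d, threshold=0.10):
    """Fraction of images whose normalized error d is strictly below the threshold."""
    return float(np.mean(d < threshold))

pred = np.array([[10.0, 10.0], [30.0, 0.0]])
gt = np.array([[10.0, 12.0], [25.0, 0.0]])
left = np.array([[0.0, 0.0], [0.0, 0.0]])
right = np.array([[40.0, 0.0], [40.0, 0.0]])
d = normalized_errors(pred, gt, left, right)  # 2 px and 5 px errors over a 40 px interocular distance
print(localization_rate(d, 0.10))  # 0.5
```

Evaluating `localization_rate` over a grid of thresholds gives a localization-rate-versus-threshold curve.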

Facial feature localization performance: localization rate versus threshold.

Visualizing facial feature localization: the first and second rows show successful localizations, and the third row shows failed localizations.

Car detection on the MIT StreetScene dataset compared with prior filters (HOG channels)

Car detection rate as a function of threshold (pixels).

Visualization of car detection results. First and second rows: true detections; third row: false detections. The red, blue and green boxes represent detections by our method, MOSSE and ASEF, respectively.

Pedestrian detection on the Daimler dataset compared with SVM (HOG channels)

Comparison of our method with SVM + HOG: (a) pedestrian detection rate at FPR = 0.10 versus the number of training images, and (b) ROC curve of detection rate as a function of false positive rate (8000 training images).
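Reading off a detection rate at a fixed false positive rate, as in panel (a), can be sketched like this. It is an assumption that detectors are evaluated as scores on labelled positive and negative windows; `detection_rate_at_fpr` is a hypothetical name. The threshold is chosen so that at most `target_fpr` of the negatives score above it.

```python
import numpy as np

def detection_rate_at_fpr(pos_scores, neg_scores, target_fpr=0.10):
    """True positive rate at a threshold admitting at most target_fpr of the negatives."""
    neg = np.sort(np.asarray(neg_scores, dtype=float))[::-1]  # descending
    k = int(np.floor(target_fpr * len(neg)))  # negatives allowed above the threshold
    # A window counts as detected if its score is strictly above the (k+1)-th highest
    # negative score, so at most k negatives pass (fewer if there are ties).
    thr = neg[k] if k < len(neg) else -np.inf
    return float(np.mean(np.asarray(pos_scores, dtype=float) > thr))

neg = np.arange(10.0)                 # scores of 10 negative windows
pos = np.array([5.5, 8.5, 9.5, 3.0])  # scores of 4 positive windows
print(detection_rate_at_fpr(pos, neg, 0.10))  # 0.5
```

For the full ROC curve in panel (b), scikit-learn's `roc_curve` computes true and false positive rates over all thresholds.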

Training time of our method compared with SVM as a function of training set size.

Minimum required memory (MB) of our method compared with SVM as a function of the number of training images.

UIUC car detection results (HOG channels)

Accuracy and detection speed of our method compared to the state of the art. At comparable detection accuracy, our method is almost 30 times faster than the state of the art for single-scale car detection on the UIUC dataset.

Detection rate at EER and detection time (sec) of our method compared to state-of-the-art approaches. Our method achieves very competitive results with much faster detection speed.

Detection results of our method on the UIUC cars dataset. The proposed method is able to find the cars in street images captured under uncontrolled conditions with challenging intra-class variations, highly textured backgrounds, extreme lighting and scale changes. The ground truth and the detected cars are shown by the red and dashed blue boxes, respectively.

INRIA horse detection results (HOG channels)

Recall versus FPPI ROC curve of our method compared to the state-of-the-art approaches on the INRIA horses dataset.
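A recall-versus-FPPI curve can be sketched from pooled detections, assuming each detection has already been labelled as a true or false positive by an overlap criterion (the criterion used for the reported results is not stated here); `recall_fppi_curve` is a hypothetical name.

```python
import numpy as np

def recall_fppi_curve(scores, is_tp, num_gt, num_images):
    """Recall and false positives per image (FPPI) as the score threshold is lowered.

    scores: detection confidences pooled over all test images.
    is_tp:  one boolean per detection, True if it matched a ground-truth object.
    num_gt: total number of ground-truth objects; num_images: number of test images.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind='stable')
    tp = np.asarray(is_tp, dtype=bool)[order]
    recall = np.cumsum(tp) / num_gt        # matched objects so far / all objects
    fppi = np.cumsum(~tp) / num_images     # false positives so far / images
    return recall, fppi

recall, fppi = recall_fppi_curve([0.9, 0.8, 0.7, 0.6], [True, False, True, False],
                                 num_gt=4, num_images=2)
print(recall, fppi)
```

Each entry corresponds to one threshold, from the highest-scoring detection down to the lowest; plotting `recall` against `fppi` gives the curve.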

Detection results of our method on the INRIA horses dataset. Our proposed method is robust to cluttered backgrounds, illumination and scale changes. The ground truth, true positive and false positive detections are shown by the blue, green and red boxes, respectively.