Data-Driven 3D Voxel Patterns for Object Category Recognition

Introduction

Despite the great progress achieved in recognizing objects as 2D bounding boxes in images, it is still very challenging to detect occluded objects and to estimate the 3D properties of multiple objects from a single image. In this paper, we propose a novel object representation, the 3D Voxel Pattern (3DVP), that jointly encodes the key properties of objects, including appearance, 3D shape, viewpoint, occlusion and truncation. We discover 3DVPs in a data-driven way and train a bank of specialized detectors for a dictionary of 3DVPs. The 3DVP detectors are capable of detecting objects with specific visibility patterns and of transferring meta-data from the 3DVPs to the detected objects, such as 2D segmentation masks, 3D poses, and occlusion or truncation boundaries. The transferred meta-data allows us to infer the occlusion relationships among objects, which in turn improves object recognition results. Experiments are conducted on the KITTI detection benchmark [1] and the outdoor-scene dataset [2]. We improve state-of-the-art results on car detection and pose estimation by notable margins. We also verify the ability of our method to accurately segment objects from the background and localize them in 3D.
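To make the idea of discovering visibility patterns from data a little more concrete, here is a rough sketch under several assumptions. Each object is represented as a flattened voxel grid whose cells carry one of five hypothetical labels (empty, visible, self-occluded, occluded by another object, truncated). Two patterns are scored by how often their labels agree over cells that are occupied in either one, and patterns are then grouped with a simple greedy threshold rule. The label codes, `pattern_similarity`, and `greedy_cluster` are illustrative inventions; the similarity measure and clustering algorithm used in the paper may well differ.

```python
from typing import List, Sequence

# Hypothetical per-voxel label codes (assumption, not the paper's encoding).
EMPTY, VISIBLE, SELF_OCCLUDED, OCCLUDED, TRUNCATED = range(5)


def pattern_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of voxels occupied in either pattern whose labels agree.

    A simplified stand-in for a 3DVP similarity measure (assumption).
    """
    if len(a) != len(b):
        raise ValueError("voxel patterns must share the same grid")
    occupied = [(x, y) for x, y in zip(a, b) if x != EMPTY or y != EMPTY]
    if not occupied:
        return 1.0  # two empty grids are treated as identical
    agree = sum(1 for x, y in occupied if x == y)
    return agree / len(occupied)


def greedy_cluster(patterns: List[Sequence[int]], threshold: float) -> List[int]:
    """Assign each pattern to the first cluster whose exemplar is similar enough.

    The first pattern placed in a new cluster becomes that cluster's exemplar.
    Returns one cluster id per input pattern.
    """
    exemplars: List[int] = []  # indices of cluster exemplars in `patterns`
    assignment: List[int] = []
    for i, p in enumerate(patterns):
        for cid, e in enumerate(exemplars):
            if pattern_similarity(p, patterns[e]) >= threshold:
                assignment.append(cid)
                break
        else:  # no existing exemplar matched: start a new cluster
            exemplars.append(i)
            assignment.append(len(exemplars) - 1)
    return assignment


# Usage on three toy 4-voxel patterns.
fully_visible = [VISIBLE, VISIBLE, SELF_OCCLUDED, EMPTY]
also_visible = [VISIBLE, VISIBLE, SELF_OCCLUDED, SELF_OCCLUDED]
front_occluded = [OCCLUDED, OCCLUDED, SELF_OCCLUDED, EMPTY]
print(greedy_cluster([fully_visible, also_visible, front_occluded], 0.7))  # [0, 0, 1]
```

The two mostly visible patterns fall into one group, while the pattern whose front voxels are occluded starts its own group. In this toy setting each group would correspond to one entry in a dictionary of visibility patterns, for which a specialized detector could be trained.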

Publication

Yu Xiang, Wongun Choi, Yuanqing Lin and Silvio Savarese. Data-Driven 3D Voxel Patterns for Object Category Recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. pdf, bibtex, technical report, KITTI results, Code

Annotations

The 3D voxel exemplar annotations we built from KITTI are available here (~320 MB).

For each car in the training set of the KITTI detection benchmark, we provide its 2D segmentation mask and its 3D voxel model, obtained by aligning a 3D car model to the 3D cuboid annotated in KITTI.
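The alignment step can be pictured roughly as follows. This sketch assumes the car model's points are pre-normalized to an object frame with x in [-0.5, 0.5] (length), y in [-1, 0] (height, with y pointing down and the bottom at 0), and z in [-0.5, 0.5] (width). It then scales, rotates, and translates them using the cuboid parameters of the KITTI object label format (height h, width w, length l, bottom-center location, and rotation_y about the camera Y axis). The normalization convention and the helpers `align_to_kitti_box` and `voxelize` are hypothetical; the released annotations may have been produced differently.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float, float]


def align_to_kitti_box(model_pts: List[Point], h: float, w: float, l: float,
                       location: Point, rotation_y: float) -> List[Point]:
    """Place normalized model points inside a KITTI-style 3D cuboid.

    Assumes the normalization described above; the rotation follows the
    KITTI convention R_y = [[cos, 0, sin], [0, 1, 0], [-sin, 0, cos]].
    """
    c, s = math.cos(rotation_y), math.sin(rotation_y)
    tx, ty, tz = location
    out: List[Point] = []
    for x, y, z in model_pts:
        # Scale normalized coordinates to the annotated box dimensions.
        xs, ys, zs = x * l, y * h, z * w
        # Rotate about the camera Y axis, then translate to the box location.
        xr = c * xs + s * zs
        zr = -s * xs + c * zs
        out.append((xr + tx, ys + ty, zr + tz))
    return out


def voxelize(model_pts: List[Point], n: int) -> Set[Tuple[int, int, int]]:
    """Return the (i, j, k) cells of an n x n x n grid hit by normalized points."""
    cells: Set[Tuple[int, int, int]] = set()
    for x, y, z in model_pts:
        i = min(int((x + 0.5) * n), n - 1)  # clamp points on the far face
        j = min(int(-y * n), n - 1)
        k = min(int((z + 0.5) * n), n - 1)
        cells.add((i, j, k))
    return cells


# Usage: the front-center point of a 4.0 m long car facing along +x.
print(align_to_kitti_box([(0.5, 0.0, 0.0)], h=1.5, w=1.6, l=4.0,
                         location=(1.0, 1.7, 10.0), rotation_y=0.0))
```

Voxelizing in the normalized object frame keeps every car on the same fixed-size grid regardless of its annotated size and pose, which is what makes per-voxel comparisons between cars possible.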

We split the KITTI training set into a training set and a validation set for cross-validation. Our split ensures that images in the two sets come from different videos. The image IDs in the training set are listed here (3,682 images). The image IDs in the validation set are listed here (3,799 images).
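A video-disjoint split like this one can be sketched as follows, assuming a mapping from each image ID to the video it was taken from is already available (how to obtain that mapping for KITTI is not covered here). Whole videos are assigned to one side or the other, so frames from the same video never appear in both sets. Because the two sets above are of similar size, the sketch also balances them by greedily giving each video, largest first, to the side that currently holds fewer images. This balancing rule and the function name `video_disjoint_split` are assumptions, not the authors' documented procedure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def video_disjoint_split(image_to_video: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split images into two sets such that no video contributes to both.

    Videos are assigned greedily, largest first, to whichever side currently
    holds fewer images, which keeps the two sets roughly balanced.
    """
    by_video: Dict[str, List[str]] = defaultdict(list)
    for image_id, video_id in image_to_video.items():
        by_video[video_id].append(image_id)

    sides: Tuple[List[str], List[str]] = ([], [])
    # Sort by size (descending), then by video id for a deterministic result.
    for video_id in sorted(by_video, key=lambda v: (-len(by_video[v]), v)):
        target = sides[0] if len(sides[0]) <= len(sides[1]) else sides[1]
        target.extend(sorted(by_video[video_id]))
    return sides[0], sides[1]


# Usage with a toy mapping of six images drawn from three videos.
mapping = {"000000": "a", "000001": "a", "000002": "b",
           "000003": "c", "000004": "c", "000005": "c"}
train, val = video_disjoint_split(mapping)
print(train)  # ['000003', '000004', '000005']
print(val)    # ['000000', '000001', '000002']
```

Splitting by video rather than by image matters because consecutive frames of one video are nearly identical; a per-image random split would leak near-duplicates into the validation set and inflate the measured accuracy.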

References

[1] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, pages 3354–3361, 2012.
[2] Y. Xiang and S. Savarese. Object detection by 3D aspectlets and occlusion reasoning. In 3dRR, pages 530–537, 2013.

Acknowledgements

We acknowledge the support of NSF CAREER grant N.1054127, ONR award N000141110389, and DARPA UPSIDE grant A13-0895-S002.

Contact: yuxiang at umich dot edu

Last update: 6/16/2015