Deep Metric Learning via Lifted Structured Feature Embedding

Introduction

Learning the distance metric between pairs of examples is of great importance for learning and visual recognition. With the remarkable success from the state of the art convolutional neural networks, recent works [1, 2] have shown promising results on discriminatively training the networks to learn semantic feature embeddings where similar examples are mapped close to each other and dissimilar examples are mapped farther apart. In this paper, we describe an algorithm for taking full advantage of the training batches in the neural network training by lifting the vector of pairwise distances within the batch to the matrix of pairwise distances. This step enables the algorithm to learn the state of the art feature embedding by optimizing a novel structured prediction objective on the lifted problem. Additionally, we collected Stanford Online Products dataset: 120k images of 23k classes of online products for metric learning. Our experiments on the CUB-200-2011 [3], CARS196 [4], and Stanford Online Products datasets demonstrate significant improvement over existing deep feature embedding methods on all experimented embedding sizes with the GoogLeNet [5] network.

Publication

Hyun Oh Song, Yu Xiang, Stefanie Jegelka and Silvio Savarese. Deep Metric Learning via Lifted Structured Feature Embedding. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. pdf, bibtex, technical report

Code and Dataset

The github repository for this project is here.

The Stanford Online Products dataset (2.9G) is here.

References

S. Bell and K. Bala. Learning visual similarity for product design with convolutional neural networks. In SIGGRAPH, 2015.
F. Schroff, D. Kalenichenko, and J. Philbin. Facenet: A unified embedding for face recognition and clustering. In CVPR, 2015.
C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The caltech-ucsd birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.
J. Krause, M. Stark, J. Deng, and F.-F. Li. 3d object representations for fine-grained categorization. ICCV 3dRR-13, 2013.
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.

Acknowledgements

We acknowledge the support of ONR grant #N00014-13-1-0761 and grant #122282 from the Stanford AI Lab-Toyota Center for Artificial Intelligence Research.

Contact : hsong at cs dot stanford dot edu

Last update : 5/20/2016