Santiago, Chile, December 17th, 2015
in association with the
Object recognition and scene understanding have long been central goals of computer vision research. Changes in lighting, viewpoint, articulation, and deformation, together with intra-class differences and occlusion, lead to enormous appearance variation, making the problem highly challenging. Advances in machine learning and image feature representations have led to great progress in 2D pattern recognition approaches. Recognizing objects in the physical 3D world from different sensors, in turn, has wide applications in robotics, autonomous driving, security, virtual reality, and human-computer interaction. When modeling scenes, objects, and their relations in 3D, we must answer several fundamental questions. What representations are useful to model objects and scenes in 3D? How can we effectively learn 3D object representations from images or videos? What level of supervision is required? Can we use 3D CAD models to help 3D object recognition and scene understanding? How can we infer spatial knowledge of the scene and use it to aid in recognition? How can both depth sensors and RGB data be used to enable more descriptive representations for scenes and objects?
Following the success of the 3dRR workshops at ICCV 2007, ICCV 2009, ICCV 2011, and ICCV 2013, we are pleased to organize the fifth edition of 3dRR in conjunction with ICCV 2015. The workshop is an opportunity to bring together experts from multiple areas of computer vision and to provide an arena for stimulating debate. We believe the complementary viewpoint offered by studies of human vision can provide additional insight into this fundamental problem. Specific questions we aim to address include:
3D Object Representation
- How can we find better representations of the 3D geometry of object instances or categories to further improve recognition?
- How can we use the 3D object representation for recognition?
- How can we use synthetic 3D training data (3D CAD models) in addition to real images to learn better object representations?
3D Object Recognition
- How can we recognize 3D properties of objects from images or videos, such as 3D pose, 3D location and 3D shape?
- How can we infer the spatial layout of multiple objects in 3D or reason about the occlusion relationships between objects in 3D?
- How can we track objects in 3D from videos?
3D Reconstruction and Recognition
- Can recognition and 3D reconstruction be run simultaneously to enhance each other?
- How much can semantic information help 3D reconstruction, and vice versa?
- How detailed does the 3D reconstruction need to be in order to achieve satisfactory recognition?
Combining Depth and RGB Sensors
- How can we represent and recognize object categories using both RGB and depth sensors?
- How can we estimate scene surfaces and physical interactions with RGBD data?
- How can depth and RGB data help extract object functional parts and affordances?
Spatial Inference
- How can we represent and infer the depth and orientation of surfaces and free space in indoor and outdoor scenes?
- How can alternative representations, such as depth maps and surface layout estimates, be combined to improve robustness?
Spatial Constraints and Contextual Recognition
- How can we exploit different degrees of 3D spatial constraints (e.g., the ground plane) for recognition?
- How can 3D spatial constraints be used for joint recognition of scenes and the objects within them?
Human Vision
- What can we learn from what we know about our own visual system?
- How do we humans represent 3D objects or the 3D environment? Can this inspire computational work?