Building Strong Geometric Priors for Total Scene Understanding

The overarching goal of this NSF-funded project (IIS-1618806) is to develop integrated models for fusing recognition and geometric image understanding.

This project is exploring how capabilities for geometric image understanding can change the way people approach the problem of automatically interpreting the semantic content of individual photos or videos. By developing algorithms for accurately localizing cameras from images that integrate other sources of geo-spatial data, such as 3D models of buildings and maps of urban areas, the project aims to significantly improve the ability of computer vision systems to understand image content. Utilizing strong prior information for scene understanding has a wide range of important practical applications. An assistive robot providing elderly care in a home should leverage knowledge of the appearance and location of objects in its immediate environment while adapting to changes on multiple time scales (a coffee cup sitting on the table moves much more frequently than the table itself). A network of self-driving cars could benefit significantly from dynamically updated urban maps built from the stream of data collected by the cars and other cameras (e.g., adapting behavior to a temporary lane closure that changes typical car and pedestrian traffic patterns). The project involves students in research spanning a range of traditional disciplines and is engaging a wider audience across the UC Irvine campus in understanding and applying these technologies to novel social and scientific applications.

Incorporating geometric context into scene understanding has largely been pursued under very weak prior assumptions on scene geometry and camera pose. This research investigates an alternative approach in which scene priors (including affordances and semantic attributes) are represented in 3D geo-spatial model coordinates rather than in 2D image space. Importantly, this representation allows for direct integration of non-visual data such as GIS maps. The project is developing the algorithms and datasets needed to integrate such data with a continual stream of images, producing a strong, temporally evolving (4D) scene prior that can improve the accuracy of camera pose estimation, monocular geometry, object detection, and semantic segmentation.
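As a rough illustration of what it means to keep scene priors in 3D model coordinates, the sketch below projects labeled 3D points (for example, building corners from a GIS-derived model) into an image once a camera pose is known. It assumes an ideal pinhole camera with no lens distortion. The function names, label strings, and parameters are hypothetical and are not taken from the project's code.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_point(
    point_world: Vec3,
    rotation: Sequence[Sequence[float]],  # assumed 3x3 world-to-camera rotation
    translation: Vec3,                    # assumed world-to-camera translation
    focal: float,                         # focal length in pixels
    principal: Pixel,                     # principal point (cx, cy) in pixels
) -> Optional[Pixel]:
    """Project one 3D point into the image with a pinhole model.

    Returns None when the point lies on or behind the camera plane.
    """
    # Move the point into the camera frame: x_cam = R * x_world + t
    cam = [
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if cam[2] <= 0:
        return None
    # Perspective division followed by the intrinsic scaling and offset
    u = focal * cam[0] / cam[2] + principal[0]
    v = focal * cam[1] / cam[2] + principal[1]
    return (u, v)


def project_labeled_prior(
    labeled_points: Sequence[Tuple[str, Vec3]],
    rotation: Sequence[Sequence[float]],
    translation: Vec3,
    focal: float,
    principal: Pixel,
    width: int,
    height: int,
) -> List[Tuple[str, float, float]]:
    """Return (label, u, v) for every labeled 3D point visible in the image."""
    visible = []
    for label, point in labeled_points:
        uv = project_point(point, rotation, translation, focal, principal)
        if uv is None:
            continue
        u, v = uv
        # Keep only projections that land inside the image bounds
        if 0 <= u < width and 0 <= v < height:
            visible.append((label, u, v))
    return visible
```

Because the labels live in model coordinates, the same set of points can be reprojected into any newly localized image, and non-visual sources such as map annotations can be added to the prior without reference to a particular photo.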


M. Lee, C. Fowlkes, "Spacetime Localization and Mapping", to appear, ICCV, Venice, (October 2017). [pdf]

S. Kong, C. Fowlkes, "Recurrent Scene Parsing with Perspective Understanding in the Loop", Technical Report, (May 2017). arXiv:1705.07238 [pdf]

S. Kong, C. Fowlkes, "Low-rank Bilinear Pooling for Fine-Grained Classification", CVPR, (July 2017). arXiv:1611.05109 [pdf]

R. Diaz, C. Fowlkes, "Cluster-wise Ratio Tests for Fast Camera Localization", Int. Workshop on Visual Odometry and Computer Vision Applications Based on Location Clues, CVPR, (July 2017). arXiv:1612.01689 [pdf]

S. Wang, S. Wolf, C. Fowlkes, J. Yarkony, "Tracking Objects with Higher Order Interactions using Delayed Column Generation", AISTATS, (April 2017). arXiv:1512.02413 [pdf]

R. Diaz, M. Lee, J. Schubert, C. Fowlkes, "Lifting GIS Maps into Strong Geometric Context", WACV, (2016). arXiv:1507.03698 [pdf]

R. Díaz, S. Hallman, C. Fowlkes, "Detecting Dynamic Objects with Multi-View Background Subtraction", ICCV, Sydney, Australia (December 2013). [pdf]

R. Díaz, S. Hallman, C. Fowlkes, "Multi-View Background Subtraction for Object Detection", Scene Understanding Workshop, Portland, OR, (June 2013).

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.